Published In

Journal of Child Language

Register (Linguistics) -- Variation, Corpora (Linguistics) -- Research


In their conceptual framework for linguistic literacy development, Ravid & Tolchinsky synthesize research studies from several perspectives. One of these is corpus-based research, which has been used for several large-scale research studies of spoken and written registers over the past 20 years. In this approach, a large, principled collection of natural texts (a 'corpus') is analysed using computational and interactive techniques to identify the salient linguistic characteristics of each register or text variety. Three characteristics of corpus-based analysis are particularly important (see Biber, Conrad & Reppen 1998): (1) a special concern for the representativeness of the text sample being analysed, and for the generalizability of findings; (2) overt recognition of the interactions among linguistic features: the ways in which features co-occur and alternate; (3) a focus on register as the most important parameter of linguistic variation: strong patterns of use in one register often represent only weak patterns in other registers. Corpus studies have documented the linguistic differences among spoken and written registers in English and other languages. Further, by analysing systematic corpora produced by students at different stages, these same techniques have been used to track the patterns of extended language development associated with literacy. Two major patterns emerge from studies in this research tradition: (1) adult written language is dramatically different from natural conversation; and (2) written language is by no means homogeneous: rather, there are major linguistic differences among written registers. Thus, the developmental acquisition of linguistic literacy requires control over the patterns of register variation, in addition to a mastery of the mechanics of the written mode.


This is the publisher's final PDF. This paper was published in Journal of Child Language (


