Usage patterns in the development of Hebrew grammatical subjects

The grammatical subject is a multi-faceted linguistic notion embedded in morphology, syntax and discourse-pragmatics. In Hebrew, grammatical subjects are associated with two distinct word orders, differing along grammatical, semantic, and pragmatic axes. The study examines the growth and proliferation of Hebrew grammatical subjects in the spontaneous speech of preschool children in a Usage-Based perspective, taking into account inflectional, syntactic and semantic properties of the clause side by side with discursive information. The corpus used for this study consisted of the recordings of 54 children in six age groups from two to eight years, engaged in triadic peer talk. Subject, predicate morphology and word order were coded, and utterances were coded according to their conversational roles. Using cluster analysis, each age group was found to have a characteristic usage pattern of subjects with associated syntactic, semantic and discursive properties, underscoring the acquisition and development of grammatical subjects in Hebrew. The usage patterns emerging from the current corpus are taken as a manifestation of the Discourse Profile Constructions notion: probabilistic form-function correlations consisting of multiple sources of formal and functional information, pairing a usage pattern of clauses with a unified construal and discourse function. Three Discourse Profile Constructions emerged from the data: joint action planning, conversational narrative, and conversational presentation. Each of these was associated with a different patterning of lexical or pronominal subjects coupled with predicates with specific temporal features and different word order. These findings suggest that gaining command of the subject category is linked to communicative functions in development.


Introduction
Syntactic development has been the topic of developmental psycholinguistic investigation since its inception (Brown 1973). The scope of what is considered syntactic development has been broad, focusing in young speakers on inflectional morphology, wh-questions, subject-aux inversion (e.g., Santelmann et al. 2002; Goldberg 2006), head-complement order, and verb-argument structure (e.g., Tomasello 2003; McClure et al. 2006; Ambridge & Blything 2016). Studies on older, preschool children's syntax are few (Frizelle et al. 2018), and have mostly targeted the acquisition of relative clauses (Friedmann & Novogrodsky 2004; Arnon 2010; Kirjavainen et al. 2017; Haendler & Adani 2018). Researchers of both language development and processing agree that learning how grammatical subjects are expressed and used is critical for young children, as subjects are related to virtually every aspect of syntactic and pragmatic development. Most previous studies have examined the realization of grammatical subjects (overt vs. null), mainly in dyadic contexts, at the early emergence of syntax (Borer & Wexler 1987; Bloom 1990; Valian 1990; 1991; Grinstead 1998; Levy & Vainikka 2000; Hacohen & Schaeffer 2007). Fewer [...]
Dattner, Elitzur, et al. 2019. Usage patterns in the development of Hebrew grammatical subjects. Glossa: a journal of general linguistics 4(1): 129. 1-28. DOI: https://doi.org/10.5334/gjgl.928
[...] 1978; Ravid 1995; Schwarzwald 2002). While the current study focuses on syntactic phenomena, verb agreement is relevant here, as grammatical subjects can be either lexical or pronominal, including pronominal omission. The issue of subject realization and omission in Hebrew has been widely studied in the literature, mainly in the context of pro-drop discussions. For example, it was shown to be related to accessibility (Ariel 1998), licensing (Borer 1989), and information structure (Melnik 2006).
However, the prototypical behavior of subjects in Modern Hebrew can be generalized as shown in Tables 1 and 2. These tables show the Hebrew agreement system for finite (past, present and future tense) main verbs, in terms of type of agreement (Table 1: number, gender, and person), and the options for pronominal omission (Table 2).
Past and future verbs agree with their subjects in person, number and gender, while present tense verbs are marked only for number and gender agreement (Table 1). Pronominal subject omission is generally possible with 1st and 2nd person pronouns in past and future tense, and is prototypically not allowed in all other positions (Table 2). Hebrew may allow third person pronominal omission in some restricted contexts, such as adjunct subordinate clauses, with particular semantic types of verbs (Melnik 2007), or in some accessibility-related scenarios (Ariel 1998). In the following sections, we exemplify the main patterns of subject realization in Modern Hebrew.
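The prototypical pattern described above can be restated as a small lookup. The following is a minimal sketch, not part of the original study: the names `AGREEMENT_FEATURES` and `allows_pronoun_omission` are hypothetical, and the restricted third person omission contexts (Melnik 2007; Ariel 1998) are deliberately left out.

```python
# Prototypical Hebrew agreement and pronoun omission, as summarized in the text.
# All names here are hypothetical; restricted third person omission contexts
# (adjunct clauses, particular verb types, accessibility) are not modeled.

AGREEMENT_FEATURES = {
    "past": ("person", "number", "gender"),
    "future": ("person", "number", "gender"),
    "present": ("number", "gender"),  # no person marking in the present tense
}


def allows_pronoun_omission(tense: str, person: int) -> bool:
    """Return True if a pronominal subject can prototypically be omitted."""
    if tense not in AGREEMENT_FEATURES:
        raise ValueError(f"unknown tense: {tense!r}")
    if person not in (1, 2, 3):
        raise ValueError(f"unknown person: {person!r}")
    # Omission is generally possible only for 1st and 2nd person
    # subjects of past and future tense verbs.
    return tense in ("past", "future") and person in (1, 2)


if __name__ == "__main__":
    for tense in AGREEMENT_FEATURES:
        for person in (1, 2, 3):
            print(tense, person, allows_pronoun_omission(tense, person))
```

Under these assumptions, omission lines up with the cells in which the verb itself carries person marking and the subject is a speech act participant.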

Default word orders and subjects
Modern Hebrew has two default word orders, which differ along several grammatical and lexical axes and are highly relevant to the notion of subject (Berman 1980; Ravid 1995; Goldenberg 1998; Kuzar 2012). Framed in typological and discursive parlance, these orientations are two alignments (Comrie 1978; Dixon 1979), reflecting Dryer's (1997) SV vs. VS contrastive orders, which he regards as a fundamental property of language. One orientation is the Subject-First construction with A/S marked grammatical subjects followed by predicates on which they confer agreement. Another is the Predicate-First construction (discussed in Melnik 2006), which is close to the semi-ergative, or active alignments (Bubenik 1989; Du Bois 2017), with clause-final, non-canonical, sometimes O-like marked subjects.

The Subject-First construction
Two sub-constructions are included under the Subject-First construction, the habitat of typical grammatical subjects in Hebrew: the transitive AVO and the intransitive SV constructions. The first one, AVO, refers to transitive clauses typically starting with an [...]
The second Subject-First construction is the intransitive SV structure. SV constructions, too, start with grammatical subjects, followed by an intransitive predicate on which they confer agreement (see (3) below). SV predicates may contain an intransitive lexical verb, as in (3-a)-(3-b); or else they are copular constructions with grammatical verbs that agree with the grammatical subject (3-c):
In sum, Hebrew Subject-First constructions (2)-(3) host pre-verbal pronominal and lexical grammatical subjects, agreement-conferring NPs, expressing the range of thematic roles associated with the notion of subject (Van Valin 2004). They also express a non-constrained range of predicative content (Ravid & Cahana-Amitay 2005), both verbal and copular.

The Predicate-First construction
In contrast to Subject-First constructions, Predicate-First constructions lack canonical grammatical subjects (Berman 1980), and offer little marking of agreement (Ravid 1995; Preminger 2009; Haspelmath & Sims 2010). The NP slot occupied by grammatical subjects in this construction can only contain lexical subjects. These constructions are also restricted in their range of predicative content to non-dynamic or presentative construals (Ziv 1988; Melnik 2006; Dattner 2008). Two constructions are included under the Predicate-First label. The first comprises existence/possession and modal/experience clauses expressing non-dynamic, mainly present-tense oriented content. These are demonstrated in (4) below.
(4) Existence/possession and modal/experience clauses
a. yesh li et ha-xacer haxi niflaa b-a-shxuna.
   be to.me acc the-yard.fm(S) most wonderful.fm in-the-neighborhood.
   'I have the most wonderful yard in the neighborhood.'
b. carix le-calcel b-a-paamon.
   necessary to-ring in-the-bell.
   'You have to ring the bell.'
The second Predicate-First construction is an early-emerging class of presentative constructions (Berman 1985; Dromi & Berman 1986) with lexical verbs and subjects (5).
(5) afa lo pitom cipor gdola.
    flew.fm to.him suddenly bird.fm(S) big.fm.
    'A big bird suddenly flew away from him.'
The fact that grammatical subjects do not constitute a homogeneous, coherent category in Hebrew poses a challenge to young speakers who base their learning on usage patterns in conversation (Fischer 2015). They have to learn about grammatical subjects from occurrences of two different word orders with distinct morphological, syntactic, semantic and discursive functions, where subjects make more and less canonical appearances. Therefore, the current study takes a developmental perspective on the characterization of the notion of subject in spoken Hebrew interaction (Fox & Thompson 1990; Berman & Slobin 1994; Dasinger & Toupin 1994; Helasvuo & Kyröläinen 2016).

Grammatical subjects in spoken interaction
A context-bound study that takes dialogical conditions into account has proven relevant in many cases, especially in accounting for the emergence of pragmatically dependent elements (such as the emergence of pronouns, conversational contexts containing turn-taking, interaction, and third party reference). These and other interactive phenomena draw on triadic or polyadic skills not used in dyadic contexts (Forrester 1988). Deictic forms, for example, may be learned through overhearing, as may the understanding of naming practices (Brener 1983; Oshima-Takane 1988).
In adult-child conversation, adult interlocutors use scaffolding procedures to clarify and interpret children's utterances, while making the meaning of their own language forms accessible to the child by basing them in shared context or linking them to previous discourse. Thus, adults provide children with models of how their intentional meanings should find conventional expression (Murase et al. 2005;Howard et al. 2008).
The context of the current analysis is the natural conversation of Hebrew-speaking preschool and school children with their age peers, rather than with their caregivers. Peer talk offers a unique window on development, as children in peer interactions do not receive elaborate adult feedback facilitating linguistic communication (Blum-Kulka et al. 2010; Schuele 2010). Therefore, peer interactions provide more realistic distributions of language structures and reliable information on linguistic usage (Hoff 2010; Veneziano 2010). Peer talk moreover renders children active participants in communicative events, tasked with expressing intentions and meanings to their interlocutors while taking into account their discursive roles (Forrester & Cherrington 2009). As conversational agents, children participate in the construction of communicatively valid behaviors and are placed upfront for observing and treating those of the more experienced partner (Snow & Ferguson 1977; Gallaway & Richards 1994). Furthermore, utterances in conversation may have different roles. That is, an utterance can initiate a conversation (or a section within one), or it can function as a response to a previous utterance by an interlocutor. Utterances can also be structured on the basis of previous utterances, thus resonating with them (Du Bois 2007). These qualities of utterances provide children with different opportunities to manifest their involvement in conversation, given that each type of utterance has a different function. We assume children's emerging conversational skills go hand in hand with grammatical acquisition (Serratrice 2005), making peer talk a particularly felicitous framework for studying the development of grammatical subjects.

Seeking grammatical subjects in development
Despite its obvious importance in the acquisition of syntax, the developmental literature does not contain an abundance of studies looking into the emergence and consolidation of grammatical subjecthood. Grammatical subjects make their appearance in child language investigation under three main umbrellas. One perspective, prevalent in the generative framework, focuses on subject slots as empty categories (De Villiers 1995), especially in relation to the agent and patient roles. According to de Villiers, the inventory of possible lexical NPs in the language is paralleled by a set of empty categories, which need to be learned (or parametrized) by the child, given the different syntactic and lexical properties in different languages. The acquisition problem according to this view is for the child to determine this inventory and the syntactic conditions under which empty categories are allowed. The omission of pronominal subjects by young children, termed null subjects (e.g., want water), has thus been a central topic of generative acquisition studies, explained either as due to children having grammatical abilities different from those of adults (Borer & Wexler 1987; Clahsen 1990; Valian 1991), or by assuming an extended period during which English allows null subjects (Hyams & Wexler 1993). Bloom (1993) strongly supports the parametric view, whereby children start off representing overt subjects as obligatory (as in English), but as a result of simple positive input (subjectless sentences in the speech of adults), children who are exposed to a null subject language can switch their grammars to pro-drop (as in Italian or Spanish, see Shin 2016) or topic-drop (as in Chinese). Finally, from a Hebrew-specific perspective, Armon-Lotem (2008) regards the acquisition of the Hebrew finite verb as related to the morphosyntactic features of the subject of the sentence.
She examines the emergence of verb morphology together with overt subjects in the speech of several Hebrew-speaking children from about 1;6 to 3 years of age. This study indicates that Hebrew number, gender and person marking first appears towards the end of the second year of life, while bare subjects first occur much earlier; however, no distributional evidence is offered.
A second perspective on learning grammatical subjects regards it as part of the acquisition of the verb's argument structure (Tomasello 1992;Gleitman & Gillette 1995), as a window on children's growing knowledge of verb semantics and tendency towards agentive structures (Matthei 1987;Brooks & Tomasello 1999;Alishahi & Stevenson 2008). These studies mainly focused on the number and type of verb complements across early development, frequency and meaning of specific constructions, and implications for learning (Bidgood et al. 2014;Ambridge & Blything 2016), including impaired populations (Fletcher & Ingham 1995). For example, Goldberg (2006) considers the verb as having more predictive value than the grammatical subject in generalizing argument structure constructions for learners. However, no study to date has systematically investigated the evidence for the consolidation of grammatical subject structures and functions from early to the later preschool years.
A third approach to learning grammatical subjects focuses on children's early ability to detect the pragmatic and discourse conditions for subject realization and omission, as shown by, e.g., Allen (2000) for Inuktitut, and by Guerriero et al. (2001) for Japanese. One insight is that in languages with fixed word order, such as English, children rely on word order and pragmatic knowledge in comprehending active transitive sentences, whereas in languages with more flexible word order, the primary source of information about the agent/subject role is verb agreement (Abbot-Smith & Serratrice 2015; Serratrice & Allen 2015). Important to our current topic, Paradis & Navarro (2003) show that the sources of information available to young native learners regarding subject realization at the pragmatics/syntax interface include both distributions of overt and null subjects in the ambient language and the specific events that children experience in on-going interaction. Paradis and Navarro show that, in line with other studies of Spanish and Catalan acquisition (Grinstead 1998), monolingual children in their study grasped the appropriate frequency and the functional discourse-pragmatic determinants of subject realization in Spanish at two years of age. This conclusion is shared by Salazar Orvig et al. (2010) in their study of clitic pronouns in dyadic conversations; they suggest that children's uses of pronouns reflect early pragmatic skills, acquired by the third year of life. They also suggest that the grammatical values of morphological devices are not acquired prior to their discursive value, but rather that these pragmatic functions are associated with the grammatical level from the onset.
Finally, peer talk studies of children aged 12-34 months demonstrate how children build and co-construct grammar dialogically in terms of the social-interactive projects they are trying to accomplish, negotiating how they stand vis-a-vis one another (Goodwin & Kyratzis 2012;Köymen & Kyratzis 2014).
Within these three perspectives on the development of grammatical subjects, previous research mostly examined isolated parameters. Consequently, research questions regarded the study parameter as a dependent variable, asking what affects its realization. For instance, the realization of the subject (as lexical vs. pronominal vs. zero, Serratrice 2005), or the development of a specific argument structure construction, can be studied in this way. However, engaging in language is an inherently multifactorial, context-sensitive, social activity. That is, parameters do not evolve in isolation. Thus, the current paper proposes an integrated approach to the development of grammatical subjects, seeking usage patterns that capture the contextual conditions for each type of grammatical subject in different constructions. In this we follow Croft's (2001) radical approach to grammatical phenomena, and suggest that the grammatical subject is not a unified notion. Rather, it is built bottom-up on a construction-specific basis. By revealing the subject-related usage patterns associated with each age group, we will be able to portray the developmental path through which such an exemplar-based grammatical category is created, and a non-unified, probabilistic notion of grammatical subject is formed. Since the notion of subject is a multi-faceted phenomenon with relevance to morphological, syntactic and discourse-pragmatic information, we analyze its occurrence in children's peer-talk conversations across preschool ages, using cluster analysis to reveal multifactorial usage patterns. We expect to find usage patterns which are sets of utterances, grouped together according to statistically based within-cluster similarity and between-cluster dissimilarity. We hypothesize that the distribution of usage patterns in the corpus will not be uniform across age groups, such that each usage pattern will characterize a particular age range in the data.
That is, for every age range there should be a usage pattern that is used significantly more frequently than others. Thus we will be able to characterize each age range in terms of its usage patterns of grammatical subjects. In order to do so, we utilize the notion of Discourse Profile Constructions.

Form, function, and Discourse Profile Constructions
Language use is inherently contextual, concrete, and discriminatory (Ramscar et al. 2010). Form and function, therefore, are intimately related, in the most obvious way, especially in the case of grammatical form (morphology and syntax). A verb in the past tense form, for example, has the function of construing a state of affairs that took place in a time before the time of speech. Such construal is mostly connected with narrative discourse. On the other hand, a verb in the future tense form has the function of discussing the future; that is, it is used to predict, plan, and convey intentions. In the same way, a subject in the first person plural form has the function of talking about the interlocutors as a unified entity, and a subject in the second person singular has the function of profiling the interlocutor as a topic in the current discourse. Thus, form is more than isolated, context-free morphology or syntax; form is always motivated by function, to the extent that morphemes, words, and sentences have meanings. That is, there is a functional (i.e., meaningful) difference between a first person plural pronominal subject and a first person singular subject, and there is a functional difference between a verb in the past form and a verb in the present form.
Such an approach to language use and language function as highly concrete stems from the Usage-Based approach to language, in which language knowledge, as well as linguistic phenomena, are the product of praxis. The Usage-Based view can be traced back to Wittgenstein, who put forward the proposition that, for many cases, "the meaning of a word is its use in language" (Wittgenstein 1953). Usage, however, is a very fluid concept. Taking the notion of usage seriously, then, we need to adopt a model that is able to quantify the role of usage in building language knowledge. The Discourse Profile Construction hypothesis is such a model (Dattner 2019). Discourse Profile Constructions (henceforth, DPCs) are a manifestation of this view in accounting for the usage from which the meanings, or the functions, of grammatical constructs emerge. DPCs are emergent form-function correlations that consist of multiple sources of concrete formal and functional information, conventionally pairing a usage pattern of clauses with a unified construal or discursive function. A usage pattern in this context is a frequent co-occurrence of concrete elements of form and function (e.g., a first person singular subject and a past tense verb), realized as a cluster of tokens with high within-cluster similarity and high between-cluster dissimilarity. Moreover, Dattner (2019) shows that DPCs are extensions of the Construction Grammar notion of Argument Structure Construction (Goldberg 1995; Perek 2015). From a usage-based, discursive point of view, Argument Structure Constructions can be broadened to include concrete forms and functions, emphasizing the existence of basic discursive scenarios that correspond with concrete exemplars of usage patterns.
Note that other models try to answer the question of what is a possible sentence in a language, or "to what extent is the human faculty of language an optimal solution to minimal design specifications, conditions that must be satisfied for language to be usable at all" (Chomsky 2001). The approach taken in the present paper assumes that for modeling the grammar of a language the question should be why does the speaker choose the concrete form A over form B to convey a construal of the world X. Thus, the DPC hypothesis states that DPCs are the basic clausal form-function correlations in the language, constituting the main usage-based source from which grammatical meaning emerges. Consequently, the DPC hypothesis does not account for what can be done with language; rather, it accounts for what is done with language.

The corpus
The peer talk corpus (Zwiling 2009) consists of a total of eight hours of recordings of 54 children in six age groups: 2;0-2;6, 2;6-3;0, 3;0-4;0, 4;0-5;0, 5;0-6;0, and 7;0-8;0, engaged in triadic conversations, three triads per age group. All participants were native Hebrew speakers from medium-high SES, with no language disorders or other developmental problems. Triadic conversation was selected for this study as requiring complex turn-taking, the usage of pronominal reference, and the comprehension of the role of a third person in conversation (Salazar Orvig et al. 2010). Each of the 18 triadic sessions was conducted in one of the children's homes, audio-recorded and transcribed. Each triadic session was 30 minutes long, except for the youngest group triads (ages 2;0-2;6), which were 40 minutes long in order to allow for a larger production of language. One triad in each age group was given instructions to celebrate a birthday for a puppet, while the other two triads were not given any instructions. The researcher did not engage in the children's spontaneous interaction.
The peer-talk corpus comprises 36,490 words, in 11,870 utterances. Among these, the current paper examines a sub-corpus composed of only those utterances containing all pronominal (both overt and zero) and lexical grammatical subjects. This is the Subject Corpus (SC), containing 5,694 utterances in 11,283 words. Table 3 provides the number of words and utterances in the two corpora by age group.

Method
Every utterance was coded according to the following criteria. (1) Role in conversation: As noted above, conversational data allows one to explore whether (and how) speakers use different patterns in conversation which correlate with different functions. Thus, the following roles of utterances were coded: Initiated, Repeating, Response, or Resonating utterances. This coding was conducted separately by the first and second authors, with a discussion resolving disagreement. (2) Predicate temporality (tense): past, present, and future. Imperative and infinitive verbs were excluded as irrelevant to the current analysis of subjects.
(3) Subject features: (i) lexical or pronominal (both overt and zero), (ii) person and number, and (iii) order relative to predicate. Table 4 provides a summary of the coding parameters.
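The coding criteria above amount to a small categorical record per utterance. The following is a minimal sketch of such a record, not the authors' actual coding tool: the class name, field names and value labels are hypothetical, and the `"none"` order value for zero subjects is an assumption, since the text does not say how order was coded when no subject is pronounced.

```python
from dataclasses import dataclass

# Allowed values follow the three coding criteria in the text.
# The labels themselves are hypothetical.
ROLES = ("initiated", "repeating", "response", "resonating")
TENSES = ("past", "present", "future")
SUBJECT_FORMS = ("lexical", "overt pronoun", "zero pronoun")
PERSONS = (1, 2, 3)
NUMBERS = ("singular", "plural")
ORDERS = ("pre-verbal", "post-verbal", "none")  # "none" is an assumption


@dataclass(frozen=True)
class CodedUtterance:
    role: str
    tense: str
    subject_form: str
    person: int
    number: str
    order: str

    def __post_init__(self):
        # Reject any value outside the coding scheme.
        checks = (
            (self.role, ROLES),
            (self.tense, TENSES),
            (self.subject_form, SUBJECT_FORMS),
            (self.person, PERSONS),
            (self.number, NUMBERS),
            (self.order, ORDERS),
        )
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")

    def categories(self):
        """Return 'variable=value' labels, one per coded variable."""
        names = ("role", "tense", "subject_form", "person", "number", "order")
        return [f"{name}={getattr(self, name)}" for name in names]


if __name__ == "__main__":
    # A hypothetical first person plural, future tense utterance.
    utterance = CodedUtterance(
        "initiated", "future", "zero pronoun", 1, "plural", "none"
    )
    print(utterance.categories())
```

Keeping every variable categorical, as in this sketch, is what makes a categorical multivariate analysis applicable to the coded data.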

Revealing usage patterns using cluster analysis
Taking a multivariate exploratory statistics perspective, we sought usage patterns in an unsupervised statistical learning manner, specifically using Hierarchical Clustering on Principal Components (henceforth, HCPC, with the FactoMineR R package; Lê et al. 2008). [...] Thus, it functions as a pre-processing step that allows clustering to be computed on categorical data. Second, the MCA functions as a method for reducing noise in the data. Without such noise, the clustering analysis is more stable, as it is done on the signal itself. In the present case, utterances were clustered together into groupings, which we describe according to the variables each utterance was coded for. We finally used Correspondence Analysis (CA) to reveal associations between age groups and usage patterns, as age was not part of the HCPC calculation. CA is based on a contingency table representing relationships between two variables, calculating χ² distances (Yelland 2010; Husson et al. 2017). For a comprehensive illustration of revealing usage patterns using cluster analysis, see Dattner (2015; 2019).
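To make the χ² distance underlying CA concrete, the following minimal sketch computes it between the row profiles of a contingency table. It is an illustration in plain Python rather than the FactoMineR procedure used in the study; the function name is hypothetical and the counts in `TOY_TABLE` are invented for the example, not taken from the corpus.

```python
# Chi-square distance between row profiles of a contingency table:
#   d(i, k)^2 = sum_j (p_ij - p_kj)^2 / c_j
# where p_ij is the share of row i falling in column j, and c_j is the
# share of the grand total falling in column j (the column mass).

TOY_TABLE = {  # invented counts: rows are groups, columns are patterns
    "younger": [10, 30, 60],
    "middle": [40, 40, 20],
    "older": [45, 40, 15],
}


def chi2_row_distances(table):
    """Return a dict mapping (row_a, row_b) to their chi-square distance."""
    labels = list(table)
    n_cols = len(table[labels[0]])
    grand_total = sum(sum(row) for row in table.values())
    col_mass = [
        sum(table[r][j] for r in labels) / grand_total for j in range(n_cols)
    ]
    profiles = {r: [c / sum(table[r]) for c in table[r]] for r in labels}
    distances = {}
    for a in labels:
        for b in labels:
            squared = sum(
                (profiles[a][j] - profiles[b][j]) ** 2 / col_mass[j]
                for j in range(n_cols)
            )
            distances[(a, b)] = squared ** 0.5
    return distances


if __name__ == "__main__":
    for pair, d in chi2_row_distances(TOY_TABLE).items():
        print(pair, round(d, 3))
```

Rows with similar profiles end up close to each other, which is the sense in which groups with shared usage patterns are plotted near one another on a CA map. The sketch assumes every row and column has a non-zero total.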

Cluster analysis: Subject-related usage patterns
A Hierarchical Clustering on Principal Components analysis suggested an optimal division of the utterances in the data into five clusters. The optimal number of clusters was chosen according to growth of inertia, such that the difference in explained variation between dividing the data into four versus five clusters is greater than the difference between dividing the data into five versus six clusters. The cluster dendrogram is presented in Figure 1, and a description of each cluster according to the categories most strongly associated with it is given below, summarized in Table 5.
Cluster one comprised 846 tokens, which were 14.85% of the 5,694 utterances in the data. It is a multifactorial usage pattern, mostly associated with initiated utterances containing a first person plural pronominal subject and a future tense verb (see Table 5 for a description of category distributions). It has the functions of (i) profiling the at-the-scene peer group as a single entity (hence the first person plural), and (ii) planning, suggesting and conveying intentions (hence the future tense verb). The following is a representative example for cluster one:
(6) Cluster one:
    bo nire et ze.
    come will.see.1.pl acc this.
    'Let's look at this one.'
Cluster two was the largest cluster of tokens in the data, comprising 3,054 utterances, constituting 53.63% of the utterances in the data. It is a usage pattern associated mostly with second person singular pronominal subjects, initiated and response utterances, and present tense verbs, and less so with first person singular. These characteristics are coupled with the functions of (i) highlighting the role of the interlocutor in the current state of affairs, and (ii) interpreting and commenting on current situations:
Cluster three consists of 475 tokens, which are 8.34% of the utterances in the data. This usage pattern is mostly associated with post-verbal lexical subjects, resonating utterances, and present tense verbs. Each of these forms is a marker of a particular function: the present tense verb marks the function of commenting on the current state of affairs, and the post-verbal subject is used to introduce an out-of-the-scene entity to the discourse (hence the use of a lexical subject, see Ariel 2004).
Cluster four was very small, consisting of 60 tokens, which are 1.05% of the utterances in the data. It was mostly associated with post-verbal lexical singular subjects in repeating utterances. This form, as was the case in cluster three, is linked with introducing a new entity into the discourse, as in the following:
The usage pattern realized as cluster five consists of 1,259 tokens, constituting 22.11% of the utterances in the data. It is mostly associated with post-verbal (plural) lexical subjects and present tense verbs, in initiated utterances. The main difference between cluster five's usage pattern and the usage patterns of clusters three and four is the role of the utterance in the conversation. While clusters three and four consist of resonating and repeating utterances (respectively), cluster five's usage pattern is linked to initiated utterances. Such utterances constitute a more natural place in the discourse for introducing new entities, as in:
(10) Cluster five:
     beseder, yesh lanu po dapim.
     fine, be to.us here pages.
     'OK, here we have some sheets of paper.'
In the following section we will look at the distribution and correspondence of each usage pattern with regard to age groups.
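As a quick arithmetic check on the cluster sizes reported above, the following sketch recomputes each cluster's share of the 5,694 utterances. The variable and function names are hypothetical. The reported percentages are reproduced when shares are truncated, rather than rounded, to two decimals, so the sketch truncates using integer arithmetic.

```python
# Cluster sizes as reported in the text.
CLUSTER_SIZES = {1: 846, 2: 3054, 3: 475, 4: 60, 5: 1259}

TOTAL = sum(CLUSTER_SIZES.values())


def share_in_hundredths(count, total):
    """Share of total in hundredths of a percent, truncated (not rounded)."""
    return count * 10000 // total


SHARES = {k: share_in_hundredths(n, TOTAL) for k, n in CLUSTER_SIZES.items()}

if __name__ == "__main__":
    print("total utterances:", TOTAL)
    for cluster, hundredths in SHARES.items():
        print(f"cluster {cluster}: {hundredths // 100}.{hundredths % 100:02d}%")
```

The five clusters partition the Subject Corpus exactly, so the sizes sum to the 5,694 utterances analyzed.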

Usage patterns in the development of the grammatical subject as a grammatical category
The cluster analysis presented above was done without taking into account the association of each utterance with a specific age group. That is, for the purpose of the cluster analysis, the tokens in the data were considered to be a homogeneous corpus of 5,694 utterances, each coded for the same set of categorical variables. This allowed us to reveal usage patterns in the data regardless of the speaker's age. However, given each token's identification and cluster, we were able to conduct a post-processing analysis, looking for correspondences between cluster-based usage patterns and age groups. Figure 2 presents a Correspondence Analysis of age group and clusters.
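The post-processing step described above starts from a cross-tabulation of each token's age group and cluster. The following is a minimal sketch of that cross-tabulation, under the assumption that every token is available as an (age group, cluster) pair. The function name and the sample tokens are hypothetical and do not come from the corpus.

```python
from collections import Counter

AGE_GROUPS = ["2;0-2;6", "2;6-3;0", "3;0-4;0", "4;0-5;0", "5;0-6;0", "7;0-8;0"]
CLUSTERS = [1, 2, 3, 4, 5]


def contingency_table(tokens):
    """Count tokens per (age group, cluster); rows are age groups."""
    counts = Counter(tokens)
    # Counter returns 0 for unseen pairs, so empty cells are filled in.
    return {age: [counts[(age, c)] for c in CLUSTERS] for age in AGE_GROUPS}


if __name__ == "__main__":
    # Hypothetical tokens: (age group, cluster assigned by the clustering).
    sample = [
        ("2;0-2;6", 3), ("2;0-2;6", 3), ("2;0-2;6", 4),
        ("3;0-4;0", 1), ("3;0-4;0", 3),
        ("7;0-8;0", 2), ("7;0-8;0", 2),
    ]
    for age, row in contingency_table(sample).items():
        print(age, row)
```

A table of this shape, age groups by clusters, is the kind of input a Correspondence Analysis takes.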
The Correspondence Analysis map in Figure 2 depicts a clear developmental path in terms of similarity between age groups, expressed in shared cluster-based usage patterns of the grammatical subjects in the data. The youngest age group is located far from all other age groups, indicating that two-year-olds share very few usage patterns with the older children. Half a year older, the 2;6-3;0 children are more similar to the 3;0-4;0 and 4;0-5;0 age groups than to the 2;0-2;6 group. The 4;0-5;0 group is located a little higher on the map, due to its higher usage-based similarity to the older children in the corpus. Finally, the 5;0-6;0 and 7;0-8;0 groups are located near each other, with the 5;0-6;0 group functioning as a bridge between the 4;0-5;0 and the 7;0-8;0 age groups. That is, based on shared usage patterns of grammatical subjects in conversation, we can detect a gradient of three developmental stages: (i) 2;0-2;6 year olds, (ii) 2;6-5;0 year olds, and (iii) 5;0-8;0 year olds. This developmental path can be further examined through the usage pattern that mostly corresponds to each group. Note, however, that "mostly corresponds" does not mean "exclusively used": all usage patterns can be found in all age groups at some frequency. Rather, a characterization of an age group according to a usage pattern is based on a scalar, probabilistic distribution. The following sections present the developmental path of the grammatical subject usage patterns as realized by the correspondence analysis.

2;0-2;6 year olds
The youngest children in our database (2;0-2;6 years) are characterized by the usage patterns in clusters three and four (see Figure 2 and Table 5): post-verbal lexical subjects, in repeating and resonating utterances. The post-verbal position in Hebrew is mostly used to introduce and discuss new entities in the discourse (Melnik 2006). The fact that the 2;0-2;6 age group is linked to the usage patterns of clusters three and four indicates that these children hardly engaged in conversation, producing few new grammatical structures and not maintaining a continuous discourse topic. This is exemplified in the following excerpt (each turn represents a different speaker):
Moreover, looking at the conversations in this age group from a qualitative perspective, we can see that the youngest children in our corpus tended to re-use given grammatical structures (realized as resonating and repeating utterances) in order to continuously introduce concrete, extant entities, denoted by their size and color, to the conversation in skeletal clauses devoid of verbs, as in the following examples:

2;6-5;0 year olds
The next age groups that share similar usage patterns according to the CA can be characterized by the usage patterns of cluster three (resonating utterances in the present tense with post-verbal lexical subjects, also characterizing the younger children) and cluster one (initiated utterances containing a first person plural pronominal subject and a future tense verb). Examples (14) and (15) below exemplify the usage patterns of clusters three and one in these age groups, respectively:
These examples indicate how the peer talk of children between the ages of two and a half and five years differs from that of their younger peers. First, they used nominal labels for new entities rather than their tangible characteristics (e.g., cake rather than the small one). Moreover, their clauses tended to include finite verbs and inflected nominals in addition to the skeletal verbless clauses used by younger participants.
The 2;6-5;0 year olds already used resonating rather than repeating utterances, and showed a less restricted use of grammatical subjects, mainly including first person plural. That is, they re-use accessible structures to produce new and extended verbalizations, and they initiate conversations with reference to the group of interlocutors (1st person pl.), mainly discussing mutual planning and intentions in a collaborative future tense, as in (15).
While we defined this age range uniformly (2;6-5;0), we can nevertheless detect a developmental path within this range, albeit a milder one. The 3;0-5;0 groups are less structurally restricted in terms of characterization by a specific cluster, as represented by their distance from the 2;6-3;0 group on the CA map in Figure 2. Unlike their younger peers, the 3;0-5;0 year olds also initiate conversations using new structures rather than producing resonating utterances. In (17), an interlocutor from the 3;0-4;0 group is addressed explicitly by name in a well-formed question and responds with an inflected verb describing the act of handing out tableware for a planned meal. In the exchange that follows, the speakers use the predicate-first construction to describe the current situation, but contrary to the youngest age group, this usage targets explicitly labeled objects (glasses and plates), includes a logical conjunction (because), and already demonstrates the characteristic absence of agreement between the predicative adjective (missing) and the post-verbal subject. In the CA map (Figure 2) we can see that the 3;0-4;0 and 4;0-5;0 groups are also characterized by the use of cluster two, which is less linked with the 2;6-3;0 group. Cluster two contains initiated and response utterances, with first and second person singular pronominal subjects in the past tense.
Example (18)

5;0-8;0 year olds
The oldest age range is mostly characterized by cluster five's usage pattern: initiated utterances in the present tense, with post-verbal (plural) lexical subjects:
(20) beseder, yesh lanu po dapim.
     fine, be to.us here pages.
     'OK, here we have some sheets of paper.'
The groups in this age range are located in the top half of the CA map due to their use of post-verbal lexical subjects, shared with the youngest group. However, while the young children used this structure in repeating (cluster 4, see example (12) above) or resonating (cluster 3, see example (13)) utterances, the older groups use it to appropriately introduce new entities (both concrete/present and abstract/absent) into the discourse in initiated utterances (examples (20) and (21)):
(21) tihye mesiba nehederet.
     will.be.3.fm.sg party.fm great.fm.
     'There will be a great party.'
Although the older age groups are distinguished by cluster five, their proximity to the 3;0-5;0 age groups indicates that they too use the common patterns of clusters 1 and 2: initiated and response utterances with first person (plural and singular) and second person pronominal subjects, in the past, present and future tenses (as in (22) and (25)-(26)):
(22) axshav nikax balonim.
     now will-take.1.pl balloons.
     'Now we'll take balloons.'
That is, the oldest age range in the current data shows a non-restricted use of the grammatical subject category in conversation, appropriately utilizing each usage pattern for the task for which it is most suitable.

Usage patterns as Discourse Profile Constructions
Pairings of usage patterns and discourse functions, like the pairings revealed in our results, are described by Dattner (2015;2019) as constituting Discourse Profile Constructions. Discourse Profile Constructions (DPCs) are emergent form-function correlations that consist of multiple sources of formal and functional information, conventionally pairing a usage pattern of clauses with a unified construal or discursive function. DPCs are claimed to be the basic clausal form-function correlations in the language, and to constitute the main usage-based source from which grammatical meaning emerges (Dattner 2019).
Three DPCs emerged from the data regarding the use of grammatical subjects in conversation. Note that the labels for the DPCs are ad hoc in that they are built bottom-up. That is, there is no predefined inventory of DPCs in the grammar of the language that the child needs to acquire. Rather, DPCs are probabilistic clusters of similar usage events. Thus, a DPC will emerge only if a set of tokens is linked to a single discursive function frequently enough, and this function has discriminatory and discursive importance. Importantly, these DPCs are relevant to a conversational type of linguistic interaction. In other types of linguistic data (e.g., written expository texts, or child-directed speech by parents), different DPCs might emerge with regard to the use of grammatical subjects, joining the heterogeneous category of subjects in Hebrew.
The first DPC is the joint action planning DPC, emerging from the tokens of cluster one: initiated utterances with a pre-verbal, first person plural pronominal subject (S/A) and a future tense verb, as in:
(23) bo'u na'ase hacba'a.
     come.2.pl will.do.1.pl voting.
     'Let's vote.'
This is a pattern of grammatical features (constituting both form and function) that serves a discursive function within conversation: it is used to plan joint actions, with reference to the interlocutors as a unified group of agents. According to the present corpus, this DPC is frequently used only from age 3;0 and up.
The second DPC is the basic conversational narrative DPC, emerging from the tokens in cluster two: pre-verbal 1st and 2nd person singular pronominal subjects (S/A), in present and past tense, in both initiated and response utterances:
This pattern is used to serve the communicative function of describing current actions, commenting on past events, and telling narratives, while identifying each of the interlocutors as an agent and keeping continuous conversational topics. That is, it is a usage pattern composed of concrete usage events in which these grammatical features and functions corresponded with each other. The central location of this usage pattern on the CA map in Figure 2 is an indication of its wide use by all age groups in the data, except for the youngest group (2;0-2;6). That is, it is a basic conversational pattern of initiation and response, concerning the interlocutors themselves (in 1st and 2nd person singular), for commenting and narrating in the present tense, or in a hypothetical past tense (showing a well-developed inflectional system and the use of adverbials such as suddenly or one moment to enhance the pretend situations):
The third DPC identified here is the conversational presentation DPC, emerging from the tokens of cluster five: post-verbal lexical subjects (mainly S) with verbs in the present tense:
     li, le-Ana'el ve-le-Ofri yesh manuy le-hacagot.
     to.me, to-Ana'el and-to-Ofri be subscription to-shows.
     'I, Ana'el, and Ofri have a theater subscription.'
This pattern is used to introduce new topics to the discourse and to refer to less accessible entities outside of the conversational arena. Importantly, we show the learning curve of this DPC: the youngest children use it mostly with repeating utterances (cluster 4). That is, they are not introducing new entities, but rather repeating the presentation of such entities by their peers (as in (12) above). The mid-range children use this DPC mainly in resonating utterances (cluster 3).
That is, while not repeating, they still re-use an immediately available structure:
     A: yesh li gam et ha-kadur ha-ze.
        be to.me too acc the-ball the-this.
        'I also have this ball.'
     B: yesh lexa gam et ha-sha'on.
        be to.you too acc the-watch.
        'You also have the watch.'
Finally, the older groups appropriately use this DPC in initiated utterances (cluster 5). That is, once presented to the discourse, an off-stage entity (the star in (29) below) becomes accessible and is further referred to using a pronominal object.
(29) A: sheli kvar ne'ebad ha-koxav.
        mine already was.lost.3.ms.sg the-star.
        'My star is already gone.'
     B: ulay kvar nimca oto.
        maybe already will.find.1.pl acc.it.
        'Maybe we'll find it.'
Summing up the discussion of our results, we see that each usage pattern serves a particular communicative function. Importantly, note that each function can potentially be served by other grammatical features as well. However, we show that each function is prototypically served by a single usage pattern, which is a combination of morphological, syntactic, semantic and pragmatic information. This is a concrete, discourse-oriented, probabilistic view of the links between form and function in language, made possible by assuming a similarity-based categorization of grammatical notions realized as Discourse Profile Constructions.
The associations between DPCs and age groups demonstrate the development of the grammatical subject category as a heterogeneous, exemplar-based category (Abbot-Smith & Tomasello 2006) vis-a-vis conversational skills: At the young age of 2;0-2;6 years, Hebrew-speaking children hardly relate to their interlocutors, and hardly keep track of a coherent topic, hence the use of the usage patterns of clusters 3 and 4. That is, they do not engage in conversation. Conversation begins to sprout at the age of 2;6 years according to the present corpus, with the mid-range ages referring mainly to themselves and their interlocutors as subjects (cluster 1, and to some extent cluster 2). Full command of conversation appears only in the older groups, aged 5;0-8;0 years, realized as a non-restricted use of usage patterns, and command of all types of subjects within the grammatical subject category. The older children in the corpus talk about real as well as hypothetical events, in which either they or their interlocutors may be realized as subjects, as well as entities which are not part of the speech event itself. This suggests a less restricted discourse, containing reference to the interlocutors and to the off-stage scenario, presenting entities to the discourse and maintaining them, initiating conversations as well as responding, planning, commenting, and telling narratives. The present study thus shows that distributional information of multiple factors is important for categorization, facilitating the learning of the grammatical subject category (cf. Romberg & Saffran 2010). Du Bois, Kumpf, and Ashby (2003) note that "[t]o understand grammar, find out how it is used." In order to account for the development of the grammatical subject in Hebrew, the present study shows how it is used in conversation, in the peer talk of children 2-8 years old.
We show that seeking a homogeneous abstract notion of grammatical subject is not a felicitous approach; rather, we identify three Discourse Profile Constructions of subjects in Hebrew, corresponding to three different usage patterns and three discursive functions. We show that these DPCs are not evenly distributed across age groups, but rather that each pattern is correlated with a particular age range, ordered according to a developmental path of conversational and grammatical skills.

Conclusion
Language can be metaphorically conceptualized as a toolbox consisting of unique formal devices used to convey a message, discriminating a particular construal of a state of affairs from all other possible construals. Just as tools extend the body, enabling it to accomplish tasks it could not otherwise perform, linguistic usage patterns extend the mind. Language development, thus, is learning to properly use each linguistic device for the purpose for which it is most suitable. Continuing with the toolbox metaphor, a novice user might use a hammer for driving both a nail and a bolt into a wall, while a professional user will use a hammer for the former and a screwdriver for the latter. That is, a temporary equilibrium state of learning is reached when a user (a speaker in our case) uses a variety of tools for a variety of tasks. The present paper shows that different patterns of linguistic forms conventionally correspond with different conversational functions to form Discourse Profile Constructions, and that young children use a small set of patterns for all tasks at hand, while older children seem to have mastered a wider range of tools, better fitting each usage pattern to a different task.