Where to from here? Increasing language coverage while building a more diverse discipline

Our original target article highlighted some significant shortcomings in the current state of child language research: a large skew in our evidential base towards English and a handful of other Indo-European languages, which partly has its origins in a lack of researcher diversity. In this article, we respond to the 21 commentaries on our original article. The commentaries highlighted the importance of attending both to the typological features of languages and to the environments and contexts in which languages are acquired, with many commentators providing concrete suggestions on how to address the data skew. In this response, we synthesise the main themes of the commentaries and make suggestions for how the field can move towards both improving data coverage and opening up to traditionally under-represented researchers.

In our original target article, we reported fairly sobering statistics concerning both the linguistic coverage of child language research and the geography of research production. In our study of the four major international child language journals (Journal of Child Language, First Language, Language Acquisition, and Language Learning and Development), we found: (i) these journals have published data on an estimated 103 languages, with a large skew in the number of papers published in favour of English and other closely related Indo-European languages and (ii) most research is produced by researchers based in wealthy countries in the Global North. While we also reported on the range of topics covered and monolingualism versus multilingualism, the commentaries on our article focused exclusively on issues related to language coverage and linguistic and researcher diversity. This is what we will focus on here.
Before responding to the commentaries, we first express our gratitude to the authors of the 21 commentaries. It is a privilege to discuss these issues, which go to the core of our work. That so many of our colleagues took the time to thoughtfully engage with the article shows that the field is ready to tackle these critical issues. We first clear up two points that recurred in several commentaries and follow with a discussion of the importance of taking linguistic diversity into account in child language research. We then turn to concrete solutions to address the related problems of linguistic and researcher diversity, synthesising the many excellent suggestions made by our commentators.

There's more data out there
Several commentators (Arunachalam et al., 2022; de León, 2022; Hellwig, 2022; Henke, 2022; Lillo-Martin & Hochgesang, 2022; Slobin, 2022) noted that there are published data on acquisition in languages not found in the four journals we surveyed, which would increase the number of languages for which we have data, a point we touched on only briefly. This is very true. We limited our search to the four major international journals in the field to make our project tractable: they provided an easily accessible 'sample' of the field, whereas an exhaustive search would have been overly ambitious. However, we cannot pretend these journals have not contained systemic biases that have prevented work on lesser-studied languages from making it into their pages (Arunachalam et al., 2022; Paradis, 2022; Singh, 2022), and so we applaud the efforts of commentators like Henke in conducting exhaustive searches within language families (Chee & Henke, in press). Following Arunachalam et al. (2022), one productive way forward would be to create similar overviews, combined with open, searchable and editable bibliographies of source materials, for all language families for which there are child language data. Such stocktaking could breathe new life into work that may otherwise be lost to changes in publishing trends or simply to the ravages of time (for an example of a review of signed language and gesture research in Australia that contains an editable bibliography, see Green et al., 2022). It would also foster the inclusion of research written in languages other than English, sidestepping another clear limitation of our analyses (Arunachalam et al., 2022; de León, 2022).
It would also be remiss of us not to address an important point raised by Slobin (2022): our estimate of the proportion of language coverage crucially did not take into account that many languages spoken today are not being acquired by children. Sadly, we have lost the opportunity to study the acquisition of a great number of the 7000 or so languages spoken today, because the number of languages being transmitted to children has steadily declined over time. However, even if we halved that figure to 3500 and revised our coverage estimate upwards to a generous 300 languages to account for work not published in the four journals, we would still have data on only around 8.6% (300/3500) of the languages currently being acquired by children. Unfortunately, the skew towards English and closely related Indo-European languages would remain.

For whom do we conduct research?
A second theme that ran through several commentaries was an apparent tension between the scientific enterprise of child language research and the beneficiaries of our work. This was most explicitly articulated by Henke (2022), but was also addressed by Foushee and Casillas (2022) and Havron et al. (2022). We agree with Henke on the importance of the broader impact of research, especially when that research is on low-resource languages, and particularly when the language is spoken by traditionally marginalised groups. In these cases, there is a crucial need for a stronger connection between basic research, of which members of the child language community (broadly construed) are the primary producers, and the application of that knowledge by educational and health professionals. We have both worked on these issues in different capacities (e.g. Amora et al., 2020; Freire et al., 2022). In our experience, the most productive contribution we as child language researchers can make in applied contexts is to provide an understanding of how acquisition proceeds in a language, which can inform the development of speech and language assessments or educational materials.
Gaining a first approximation of how acquisition proceeds is the goal of the Sketch Acquisition Project (Hellwig et al., 2021), with the production of community materials being an explicit outcome of each acquisition sketch (the sketch idea itself is indebted to the pioneering work of Dan Slobin and colleagues; see also Pye, 2021, 2022). Some commentators were mildly cautious about the scientific value of this kind of language description (Vihman, 2022; see also Christiansen et al., 2022a). We agree that careful, typologically informed crosslinguistic comparisons are crucial to scientific progress in the field (see next section), but we must not lose sight of the cultural importance of a language to its community. Our work has multiple beneficiaries. Once we see language as a crucial determinant of identity, wellbeing and societal participation (among other things), as well as a repository of cultural knowledge (as understood in Language Documentation; Hellwig, 2022), the scientific value of a language, for want of a better term, becomes only one consideration when deciding where to direct our research efforts. During the current International Decade of Indigenous Languages (2022-2032), led by the United Nations Educational, Scientific and Cultural Organization (UNESCO), our field can strive to play an important role in the preservation, revitalisation and support of Indigenous languages worldwide (https://en.unesco.org/idil2022-2032).
This discussion raises a broader issue: the nexus between description and theory (Arunachalam et al., 2022; Karasik & Kuchirko, 2022; Pye, 2022). As a field, we tend to privilege theory over description. There is very good reason to hold theoretical advances in high regard: theory is the foundation of science, synthesising a collection of (sometimes seemingly disparate) facts into a coherent set of concepts and principles that allow prediction. We cannot do without it. But a theory is nothing without an adequate and representative set of observations, and it is here that we are still lacking (for a parallel discussion in the evolutionary social sciences, see Clark Barrett, 2020a). Good descriptive data have always played a major role in child language research, with major initiatives like the Child Language Data Exchange System (CHILDES; MacWhinney, 2000) allowing us, among other things, to directly observe the input children have at their disposal to acquire language, and the range of unique problems they must solve along the way (as in the Eegima demonstrative system; Sagna et al., 2022). Further description better maps the problem space facing the child and gives us a first look at how children navigate it, providing crucial data for the further refinement of theory, which is sorely needed in the psychological and cognitive sciences (see Scheel, 2022).¹ We are encouraged by the responses of Arunachalam et al. (2022) and Paradis (2022), who both identified the need for journals to widen the scope of what counts as a publishable contribution.

Harnessing cultural and linguistic diversity
Many of the commentaries underlined the main point of our article: crosslinguistic research advances our understanding of the acquisition process. As Karasik and Kuchirko (2022) point out, we are not alone in our over-reliance on data from the Anglosphere and Europe. Their commentary, drawing on reflections from the field of motor development, reveals how cultural conceptions of development and childhood shape the child's environment, in ways that challenge generalisations made on the basis of culturally restricted data.
Commentaries on child language echoed this point. The target language and the culture in which it is embedded influence the acquisition process from the earliest observable point in development. Vihman (2022) provides a striking example: Mandarin-acquiring children show a seemingly rare pattern of syllable substitution during the single-word period, a pattern that is nonetheless perfectly aligned with their experience of the input language. Another comes from Chen and Narasimhan (2022), whose commentary addressed a feature of language that is chronically understudied in acquisition: prosody. Their discussion of Chen's (2018) work on how children acquiring different languages use prosody to mark focus is a model for experimental investigations of how typological variables can influence development. Standardised methods and tasks are incredibly important when attempting crosslinguistic and cross-cultural comparisons (although they may not be feasible in all cultures; Karasik & Kuchirko, 2022; see also Hellwig, 2020). Toolkits like those developed by Chen, and those regularly produced by the former Language and Cognition Department at the Max Planck Institute for Psycholinguistics, which yielded a wealth of crosslinguistic data (see http://fieldmanuals.mpi.nl/), would be one productive way to bring together researchers from diverse linguistic and cultural backgrounds to increase data coverage for targeted components of language. Recent examples for acquisition include Deen et al. (2016) and Gagarina and Bohnacker (2022).
The commentaries by Berman (2022), Edward (2022), Hellwig (2022), de León (2022), Lillo-Martin and Hochgesang (2022), Sagna et al. (2022) and Sultana (2022) all discussed specific issues concerning languages or language families they have worked on, revealing the rich insights we gain when we study what Pye (2022) calls 'the dark matter of the linguistic universe' (p. 799). Together, these commentaries highlight two crucial variables we are in danger of taking for granted when we limit ourselves to English and a handful of other Indo-European languages: the language environment (see also Foushee & Casillas, 2022) and linguistic diversity. Language socialisation research, such as that conducted by de León (2022) and by pioneers like Ochs and Schieffelin (1984), reveals the diverse nature of children's early communicative experiences and how these experiences may or may not influence development (see also Casillas et al., 2020). Studies in this area are scarce, and as a result we lack a comprehensive understanding of the range of children's early communicative experiences. Such studies are crucial: social experiences are the bedrock upon which language is built, and understanding how their diversity influences early language is core to any complete account of acquisition.
The discussion of linguistic diversity shows the value of attending to typological diversity in building a more accurate picture of acquisition processes. Sultana's (2022) work on Bangla shows that what have been interpreted as optional infinitives in Germanic languages might be grammatically admissible 'near-misses', thereby revealing the learning mechanisms guiding acquisition (see Freudenthal et al., 2015). Berman's (2022) pioneering work on Hebrew reveals the exquisite skill with which children map from form to function in a language-specific manner, while also showing that many aspects of acquisition have a prolonged developmental course. Research on signed languages, which are themselves highly diverse (e.g. the use of legs as articulators and of a larger signing space in African signed languages; Edward, 2022), highlights the flexibility of the language faculty and forces us to think carefully about the role of input in acquisition (Lillo-Martin & Hochgesang, 2022). Both Rochanavibhata and Marian (2022) and Yip and Matthews (2022) remind us that research on acquisition in multilingual contexts reveals the complex ways in which speakers and signers master the linguistic and cultural repertoires of their languages. Multilingualism research demonstrates how linguistic systems can combine to form integrated joints across the two (or more) languages (i.e. crosslinguistic transfer; see Serratrice, 2013), with genetic relatedness (or lack thereof) being no barrier to this process.
The commentaries by Pye (2022), Slobin (2022) and Christiansen et al. (2022a) dig deeper into the necessary connection between linguistic typology and child language research. Slobin (2022) and Christiansen et al. (2022a) point to the importance of making comparisons at multiple levels: crosslinguistic (i.e. inter-typological comparisons across language families), intra-typological (i.e. comparisons within language families) and intra-language (e.g. studies of dialect variation or, indeed, of the acquisition of mixed languages in multilingual contexts; see O'Shannessy, 2015). Christiansen et al. (2022a) argue that intra-typological comparisons of well-known languages will yield important insights into the acquisition process, a point with which few would disagree. We wholeheartedly agree that such comparisons are valuable, whether the languages are well described and have a significant body of existing child language research, as with the Continental Scandinavian languages, or a critical mass of fieldwork allows careful comparison, as in the Mayan languages (de León, 2022; Pye, 2022; see also Foushee & Casillas, 2022). However, it is important to remember that confining ourselves to well-studied languages, which, as we established in our target article, originate primarily in Western Europe, limits the phenomena we can study. Not moving beyond these languages risks building models of acquisition that are not crosslinguistically applicable.
This risk is borne out in Pye's (2022) analysis of the six theoretical articles published in a 2021 special issue of Journal of Child Language, of which only one addressed acquisition in lesser-studied (and typologically different) languages (Arnon, 2021). The reason for this likely lies in the field's scientific quest to identify common underlying mechanisms for acquisition. No doubt, whatever these mechanisms look like, they will be common to all humans, since we all share the same neurological adaptations for language. However, in the broadest sense, learning algorithms will produce different solutions depending on their input, and so the developmental pathway through a given language will be language-specific (see Berman's [2022] point about elongated learning trajectories in Hebrew). Taking typological diversity into account means that nativist accounts need to specify a sufficiently flexible innate toolkit and, more generally, test their assumptions about what is universal to language. Learning-based accounts, on the other hand, must explain how learning mechanisms interact with language-specific input, and predict that representations for language will be language-specific. Thus, while different approaches place the burden of accounting for diversity at different points in the acquisition process, it remains a key desideratum for all of them. The framework of Christiansen et al. (2022b) usefully maps out how comparisons at multiple levels of difference can allow us to take advantage of the 'living laboratory' of linguistic diversity. However, to build comprehensive theories of acquisition, the field will need to increase data coverage from its current low base, an issue to which we now turn.

Moving forward
We are at a point in the history of the cognitive and psychological sciences where we are critically evaluating the degree to which our disciplines represent the entire spectrum of human experience (e.g. Cheon et al., 2020; Clark Barrett, 2020b; Henrich et al., 2010; Medin et al., 2017; Nielsen et al., 2017; Roberts et al., 2020; Singh et al., 2022; Thalmayer et al., 2021). The recurring finding of studies in this space is that we have fallen short of building a representative set of culturally and linguistically diverse research findings. In our target article, we argued that, given the rapid rate at which languages are disappearing, there is some urgency to broaden language coverage while we still can. In this final section, we sketch some pathways forward, drawing upon the many constructive suggestions offered by commentators.
As some commentators pointed out (Foushee & Casillas, 2022; Havron et al., 2022; Slobin, 2022), the number of languages for which we have data is so low that the solution cannot simply be to roll up our sleeves and get to work. Indeed, borrowing from Havron et al.'s (2022) commentary, any solution must be SMART (i.e. Specific, Measurable, Attainable, Relevant and Time-bound). Taking into account Foushee and Casillas' (2022) point about the different ways in which one may define child language research as diverse, we add that any solution must be flexibly SMART, and that there will likely be many pathways to achieving greater language (and cross-cultural) coverage. We consider the following suggestions, which draw upon many commentaries and are therefore already the work of many, to be the beginning of a community-wide conversation about the direction of the field.

Diversity as a guiding theoretical construct
We cannot get data from every language, and for many researchers, collecting data from understudied languages may not be feasible (Christiansen et al., 2022a), but we can avail ourselves of the cross-cultural and crosslinguistic literature, both within the field and in related disciplines. Pye (2022) makes the important suggestion that child language courses devote some time to typology, to which we add that a firm foundation in anthropology would also be useful. As researchers, we should use cross-cultural and crosslinguistic facts to constrain our theories and our interpretations of data. Asking the simple question 'How does this idea work cross-culturally and crosslinguistically?' costs nothing but may help us avoid proposing overly narrow theoretical concepts or over-interpreting data. Following similar suggestions in adjacent fields (Nielsen et al., 2017; Singh et al., 2022; for an extended discussion see Simons et al., 2017), we encourage journals to ask authors to explicitly address the generalisability of their results given the target language(s) and language-learning environment, regardless of whether the work is on a well-studied or understudied language.

Increasing data coverage
Even if collecting data from every language currently being acquired by children is not a realistic goal, it should not dampen efforts to increase our current data coverage with the specific aim of making it more representative of socio-cultural and typological diversity. As we noted in the target article (see also Vihman, 2022), this is not an issue that is easily separated from our lack of diversity in author country affiliation. In this section, we outline suggestions for increasing data coverage, while leaving more specific suggestions for increasing representation until the next section. Suffice it to say, we consider the following to be best implemented within a more diverse discipline where native speakers and signers play a significant role in the research process.
In the first instance, it will be important to identify, and set goals to investigate, relevant socio-cultural and typological dimensions. Socio-cultural variation will need to be treated with nuance, avoiding the blunt dichotomies that frequently pervade the psychological sciences (e.g. WEIRD vs Non-WEIRD, Collectivist vs Individualist), which are unlikely to adequately capture important details of a child's socio-communicative environment (e.g. see Clancy & Davis, 2019; Singh et al., 2022). Linguistic typology gives us a set of dimensions on which languages vary, and it would be a fruitful exercise to take stock of where our knowledge is lacking. We provided some examples in our target article (e.g. tone, polysynthesis), and Slobin (2022) suggested more (head-marking vs dependent-marking, verb-framed vs satellite-framed, verb specificity). Concerted efforts to synthesise existing findings and fill typological gaps in our knowledge, as has already been done for features like ergativity (Bavin & Stoll, 2013), would be a welcome contribution to the literature.
These goals primarily pertain to the scientific enterprise, which will involve both individual and coordinated efforts. An important question concerns the types of data that are needed. As we pointed out in our target article, an important first step in an understudied language would be to collect naturalistic data, because they allow the simultaneous observation of many variables in situ. Here, we recommend language documentation approaches to acquisition as a first step (Hellwig, 2022; Hellwig et al., 2021; Pye, 2021, 2022), in the hope that some sketch corpora may evolve into bigger projects (Vihman, 2022). Comparable elicited data derived from materials adapted to a language in a culturally sensitive way, such that they yield externally valid data, are also important (e.g. Chen, 2018; Deen et al., 2016; Gagarina & Bohnacker, 2022). This could be one way for the field to develop large-scale collaborations (e.g. Katsos et al., 2016; The ManyBabies Consortium, 2020). We note, however, that such endeavours should be sensitive to the many issues around testing in a new cultural context, including the need to flexibly adapt methods (see Hellwig, 2020; Karasik & Kuchirko, 2022; Singh et al., 2022; for an extended discussion of the problems in assuming that identical research methods used across cultures will produce equivalent and externally valid data, see Kline et al., 2018). As a field, we will need to be mindful of such issues, and should not automatically treat their negotiation as a barrier to dissemination when the work does not meet the strict standards of laboratory work.

Researcher inclusion and research dissemination
We were heartened to read about the many ongoing or soon-to-be implemented initiatives to increase research on understudied languages within the pages of Journal of Child Language (Paradis, 2022) and Language Acquisition (Arunachalam et al., 2022). Goals like increasing linguistic and researcher diversity require a good deal of vision and nuance in decision-making, and in this sense we are in safe hands. However, Singh's (2022) cogent analysis of intersectional visibility and the role of power and privilege in the discipline reminds us that we will need to navigate many potholes in the road ahead. The reality is that the existing structure of the discipline renders work on understudied languages less visible and excludes many.
Some of these barriers force us to reflect upon unconscious biases. Unfortunately, we too have been on the receiving end of the cultural misattribution that Singh (2022) describes, where our work on understudied languages has been unfairly criticised on grounds that would not be levelled at work on more commonly studied languages. To reiterate Singh's point (for more discussion see Causadias et al., 2018; Kline et al., 2018), cultural misattribution is a bias to view work on well-represented groups as reflecting 'basic, acultural aspects of development', whereas work on under-represented groups is 'often invoked as evidence for sociocultural variation rather than for fundamental processes' (p. 815). This uneven playing field can be perpetuated if research production is dominated by a privileged few, as we found in our analysis of researcher affiliation. It is unlikely that we can completely flatten the hierarchy of research production, but there are several things we can do to promote greater inclusiveness.
The first domain, and the one over which we have the most control, comprises our professional organisations, conferences and journals. As Singh (2022) notes, the editorial boards of the journals we sampled consist mostly of scholars from the United States and United Kingdom/Europe. An easy way to increase visibility is thus to create more diverse editorial boards, thereby distributing decision-making across a wider range of experiences and perspectives. Conference organisers could prioritise diverse programmes by placing papers on understudied languages and/or by under-represented groups in more prestigious presentation slots, thereby giving the work greater exposure.² Both of these suggestions are achievable short-term goals.
As we pointed out in our target article, and as Vihman (2022) and Singh (2022) also discussed, initiatives that attract diverse students and language workers into child language projects are the best way to improve linguistic and researcher diversity. This requires a multifaceted approach and thus the input of many; we can only hope to contribute to this conversation here. One way is to promote the discipline in institutions in countries that do not traditionally conduct child language research, or whose existing research, for many of the reasons discussed above, is not promoted more widely in international journals. Horizontal partnerships between universities in countries with traditions of child language research and universities or institutes in countries where the research tradition is less strong would be one mechanism to open up the field. This solution could be straightforwardly implemented where local researchers could (or already do) collect data on a national or majority language (e.g. Thai, see Rochanavibhata & Marian, 2020; or India's scheduled languages), that is, where languages have state support and there are institutions that conduct research.
However, many languages spoken by minority and often marginalised communities lack state support, and these communities stand to benefit more from knowledge of how children acquire their languages and the contexts in which they do so. Havron et al. (2022) raise the important issue of not falling into colonial traps in our attempts to promote research on understudied languages. There is a growing literature on the decolonisation of linguistics (e.g. Charity Hudley et al., 2019, and commentaries), which we can only touch on here but which we highlight because of its importance. Our discipline is currently dominated by the Global North, mirroring the dominance of the West and Western epistemologies in academic research. Because most of us come from the Global North (with all the trappings of living in a wealthy society), there is a danger that the power and privilege we possess will lead our work with marginalised communities to replicate patterns of colonialism. For example, many Indigenous communities who have worked with non-Indigenous academics view aspects of linguistic fieldwork as a form of epistemic violence, because their cultural knowledge has been taken and crystallised into an abstract form that is almost always inaccessible to them (e.g. a grammar), often with very few tangible benefits flowing back (see Woods, 2022). This replicates the uneven power relations that permeate the lives of marginalised communities, adding another layer of disenfranchisement. The same argument can be made for any minority or marginalised group (e.g. immigrant communities).
Creating equal, horizontal relationships with language communities is key to conducting fair and equitable research, which entails a different set of research practices and norms than might be familiar to many child language researchers. This includes but is not limited to (i) jointly creating research questions that respond to community needs or concerns about children's language, (ii) respecting, understanding and incorporating the epistemic traditions of the community into the work (see Singh, 2022) and (iii) in a world where data sharing is becoming the norm, understanding that communities often have long-standing norms about the protection of cultural knowledge, and that placing restrictions on who accesses that knowledge may be one component of the process of self-determination (Eira, 2007; Woods, 2022). A closer alignment with the community may mean changes to the research process, but in the best of cases it could also respond to the needs of the community, thus increasing the impact of the research (Henke, 2022), while beginning to redress our diversity problem through the training of native-speaker researchers (for an extended discussion see Medin et al., 2017).

Incentivising research on understudied languages
Finally, it is unlikely that we will achieve greater breadth of language coverage and researcher diversity without changing the reward structures of the academy (Singh, 2022). The reality of research on understudied languages is that, in many cases, it requires more work than research on well-studied languages. There are many reasons for this. For example, research in remote field contexts can be expensive and time-consuming, particularly when working with naturalistic data. Even in more accessible places, a lack of language resources (e.g. existing corpora, standardised assessments) can limit data interpretation. Thus, in addition to the structural biases present in publishing, there are clear disincentives for individuals to work on understudied languages. Beyond promoting the work in our journals with initiatives like those outlined by Arunachalam et al. (2022) for Language Acquisition, we must therefore strive to level the playing field. At the institutional level and for funding bodies, the added value of work on understudied languages must be communicated to decision-makers. One way to do this could be official statements by professional associations, like the International Association for the Study of Child Language and the Society for Language Development, outlining the scientific and practical need for this work, which researchers and community workers could cite in job, tenure and funding applications. Where possible, targeted grants for data collection, like those offered by other professional bodies, could be provided to individuals and groups to collect data and create language materials. In our journals, we could create a separate paper format for describing datasets on understudied languages, thereby giving researchers more opportunities to benefit from their hard work in a language that administrators understand.

Prospects
In our original target article, we concluded our abstract by saying that 'despite a proud history of crosslinguistic research, the goals of the discipline need to be recalibrated before we can lay claim to a truly representative account of child language acquisition'. Thanks to the vision and hard work of many of the field's pioneers, we know a good deal about a range of diverse languages. However, the analyses in our target article revealed a deeper problem not simply concerning the number of languages for which we have data, but also the volume of work that is conducted on well-studied languages compared to lesser-studied languages, which is linked at least in part to a lack of researcher diversity in the field. In this response to the commentaries on our article, we have sketched a broad roadmap that we hope will refocus the field's interest in linguistic diversity, and which we also hope will open up the discipline to a more diverse set of voices. We eagerly anticipate the progress to be made if we are collectively able to reimagine the field in this direction.