Genetic pathways involved in human speech disorders.

Rare genetic variants that disrupt speech development provide entry points for deciphering the neurobiological foundations of key human capacities. The value of this approach is illustrated by FOXP2, a transcription factor gene that was implicated in speech apraxia, and subsequently investigated using human cell-based systems and animal models. Advances in next-generation sequencing, coupled with de novo paradigms, facilitated discovery of etiological variants in additional genes in speech disorder cohorts. As for other neurodevelopmental syndromes, gene-driven studies show blurring of boundaries between diagnostic categories, with some risk genes shared across speech disorders, intellectual disability and autism. Convergent evidence hints at involvement of regulatory genes co-expressed in early human brain development, suggesting that etiological pathways could be amenable to investigation in emerging neural models such as cerebral organoids.


Introduction
Following decades of speculation over genetic contributions to distinctive human communication skills, advances in molecular methods enabled scientists to begin identifying critical genomic factors [1]. Much research so far has focused on linkage mapping and association screening of developmental speech and language impairments, revealing that while such disorders have a complex genetic architecture, a significant subset of cases involve rare high-penetrance variants disrupting single genes [2]. Here, we discuss the importance of rare variants as entry points for studying neurobiological pathways, describe how next-generation sequencing and gene-driven studies are transforming this field, and argue that emerging cell-based models of human brain development will be crucial for a fuller understanding of how gene disruptions yield speech disorders.

Molecular perspectives on speech: the example of FOXP2
FOXP2 was the first gene for which rare variants could be implicated in a monogenic speech disorder (primarily characterized by childhood apraxia of speech; CAS; Table 1). Since the initial report describing a causative point mutation in a multigenerational family, as well as a translocation disturbing the gene in an independent case [3], different genetic disruptions of FOXP2 have been identified in multiple cases of speech/language disorder, both inherited and de novo [4,5]. The discovery of FOXP2 led to an array of studies of its functions in the brain (Figure 1) [2,5].
FOXP2 encodes a transcription factor with a high degree of evolutionary conservation (both for protein sequences and neural expression patterns), facilitating functional analyses in animal models [6]. Conditional knockout and targeted knockdown/overexpression strategies in mice and birds are being used to dissect roles of FoxP2 in different parts of the brain (Figure 1). Studies of mouse models build on a well-established genetic toolkit, as well as rich literature on brain development, and can therefore teach us about gene function for conserved molecular mechanisms and behaviors. Mice are known to produce sequences of ultrasonic vocalizations, but their abilities to learn these appear limited, and the relevance of such behaviors for gaining insights into biology of human speech is much debated [7]. In contrast, although birds are more distantly related to humans than are mice, some species of songbird have sophisticated skills for auditory-guided vocal learning, which involves integration of auditory processing and motor learning, showing parallels to processes underlying speech. Moreover, there is evidence that birdsong and speech are coded in somewhat analogous brain circuitries [8].
cortical Foxp2 show abnormalities in tests of social behavior and cognitive flexibility [13,14]. Single-cell transcriptomics in cortical-specific mouse knockouts suggests that the gene contributes to development and function of dopamine-receptor expressing neurons [13].
Within the rodent striatum, Foxp2 is predominantly expressed in D1-receptor-positive medium spiny neurons; studies of global heterozygous knockout mice revealed effects on inhibitory presynaptic strength of these cells, implicating the gene in excitation/inhibition balance of pathways underlying motor-skill learning [15]. Striatal-specific Foxp2 knockouts show increased variability in skilled motor behaviors, assessed via operant lever-pressing tasks [9]. Viral-based manipulations (knockdown versus overexpression) of this brain region in adult mice demonstrate post-developmental roles of Foxp2 in regulating corticostriatal synapse functions and associated behaviors [16]. Moreover, knockdown/overexpression experiments targeting Area X (a striatal nucleus involved in vocal production learning of male zebra finches) underline the importance of this gene for learning of song by juvenile birds [17], and its maintenance in adulthood [18]. Regarding cerebellar functions, mice with Purkinje-cell specific knockouts of Foxp2 display slower sequencing in lever-pressing tasks, and reduced performance on tests of skilled locomotion. In vivo electrophysiology indicates that Foxp2-deficient Purkinje cells have increased intrinsic excitability, and show abnormal firing properties during limb movement [9].
According to the latest human cell-based studies (Figure 1), FOXP2 is part of a broader interacting network of brain-expressed transcription factors [19], promoting pathways for neuronal maturation via chromosomal remodeling, while repressing genes that would maintain a neural progenitor state [20]. Of the molecules known to be regulated by and/or interact with FOXP2, many are themselves associated with brain-related disorders [19,20]. Therefore, the FOXP2 interactome could provide useful inroads for defining and characterizing neurobiological pathways involved in speech development. An example is the close paralogue FOXP1, which is coexpressed with FOXP2 in a subset of brain structures, where the transcription factors can heterodimerize to potentially co-regulate targets. Rare variants disrupting human FOXP1 cause a phenotype that is broader and more severe than FOXP2-related disorder, including features of autism and/or intellectual disability (ID) [21]. Human cell-based analyses of an etiological missense variant in the DNA-binding domain of FOXP1, equivalent to the most studied mutation of FOXP2, showed comparable functional effects, suggesting that it is the differences in neural expression patterns of the two paralogues that account for distinctive phenotypes of the associated disorders [22]. Taken together, these molecular studies uncover distinct roles for FOXP2 in different brain regions that implicate the gene in development and function of cortico-striatal and cortico-cerebellar circuitries [9-20], converging with identification of subtle cortical, striatal and cerebellar abnormalities in patients with FOXP2 disruptions [10,11].
For example, integrating data from different model systems, a recurrent finding is that striatal FoxP2 helps modulate neuronal plasticity involved in complex motor skills of various kinds (locomotor behaviors, manual skills and/or vocalizations) [9,15,16], consistent with cell-based studies showing roles of this transcription factor in neuronal differentiation and maturation [20]. Hence, the development, plasticity and maturation of the relevant circuits may be crucial for proficient speech, not only during early development [9,15,17], but also at post-developmental stages [16,18]. Of note, FoxP2 is also expressed in other brain structures where its roles have been less well studied, including the thalamus [23] and amygdala [24]. Moreover, the demonstration that this transcription factor belongs within a strongly interconnected network
With links to human speech disorder, and high conservation throughout the animal kingdom, FOXP2 has also received attention from the field of evolutionary biology. One prominent focus has been on two amino-acid substitutions which occurred on the human lineage after splitting from the chimpanzee, and which are reported to affect striatal-dependent neurophysiology and behaviors when introduced into transgenic mice [25]. However, initial evidence of positive selection acting on intronic regulatory sequences of FOXP2 in recent hominin evolution [26] was not supported by subsequent systematic next-generation sequencing of global populations [27]. The details are beyond the scope of the current article, but are discussed further elsewhere (e.g. Ref. [28]).

Genomic screening of disorder cohorts identifies novel risk variants
As illustrated by FOXP2, initial insights into the roles of rare DNA variants in developmental speech disorders came from analyses of pedigrees with multiple affected relatives across successive generations [3]. In another example of this strategy, genetic mapping in families with multiple cases of persistent stuttering (Table 1) has implicated variants in genes involved in intracellular trafficking [29], findings that were followed up using animal models [30].
The past decade has seen emergence of another way to identify high-penetrance variants disrupting human brain development, relying not on multiplex pedigrees, but instead based around affected probands with a normal family history. Large-scale genomic screening revealed that de novo mutations (disruptive DNA variants found in an affected child, but absent from unaffected parents) account for a substantive proportion of cases of severe undiagnosed developmental disorders, ID, and autism spectrum disorders (ASD), among other major human disease phenotypes [31,32]. For speech/language traits, progress has lagged behind, in part because challenges for disorder ascertainment and diagnosis have precluded systematic recruitment of large well-phenotyped cohorts [2]. Lack of consistency in criteria for detecting and classifying childhood language disorders led to establishment of a special initiative, CATALISE, in which experts worked toward consensus for the field [33]. However, issues continue to be debated by some researchers/practitioners, for example over relevance of information on general cognitive performance when diagnosing language difficulties. For disorders severely affecting speech production, like CAS, best-practice diagnostic guidelines are available from professional societies, like the American Speech-Language Hearing Association (e.g. https://www.asha.org/Practice-Portal/Clinical-Topics/Childhood-Apraxia-of-Speech/), but there remains considerable variation in how such terms are applied in practice, both clinically and for research. Identification of rare causal DNA variants could also be enhanced by incorporating data from quantitative phenotyping, as has proved effective for other developmental disorders [34].
Genetics of human speech disorders. den Hoed and Fisher
[Figure: Human phenotype; Cellular models; Animal models]
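To make the de novo definition above concrete, the following minimal sketch classifies a single variant in a parent-child trio. It is an illustration only: the function names, the VCF-style genotype strings ('0/0', '0/1', './.') and the category labels are assumptions introduced here, not the method of any cited study, and real pipelines typically add quality filtering and independent validation.

```python
from typing import Optional


def alleles(genotype: str) -> list:
    """Split a VCF-style genotype such as '0/1' or '0|1' into its alleles."""
    return genotype.replace("|", "/").split("/")


def is_called(genotype: Optional[str]) -> bool:
    """A genotype is usable only if it exists and has no missing ('.') allele."""
    return genotype is not None and "." not in alleles(genotype)


def carries_variant(genotype: str) -> bool:
    """True if any allele differs from the reference allele '0'.

    Simplification: assumes a biallelic site, so all non-reference
    alleles are treated as the same variant.
    """
    return any(a != "0" for a in alleles(genotype))


def classify_trio(child: str,
                  mother: Optional[str] = None,
                  father: Optional[str] = None) -> str:
    """Label a variant as 'de novo', 'inherited', 'not carried' or 'unresolved'."""
    if not is_called(child):
        return "unresolved"
    if not carries_variant(child):
        return "not carried"
    parents = (mother, father)
    # A variant seen in either parent cannot be de novo.
    if any(is_called(p) and carries_variant(p) for p in parents):
        return "inherited"
    # Without both parental genotypes, de novo status cannot be established.
    if not all(is_called(p) for p in parents):
        return "unresolved"
    return "de novo"


if __name__ == "__main__":
    print(classify_trio("0/1", "0/0", "0/0"))  # child only
    print(classify_trio("0/1", "0/1", "0/0"))  # mother also carries it
    print(classify_trio("0/1"))                # proband-only sequencing
```

The "unresolved" outcome corresponds to proband-only designs, in which de novo status cannot be assigned until parental genotypes are obtained.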
So far, a handful of phenotype-driven genome-screening studies reported rare variants in speech/language disorder cohorts, including developmental language disorder (DLD, previously often referred to as SLI) and CAS (Table 1; Figure 2). With modest sample sizes, the number of causal variants identified is small. For example, the SLI consortium performed whole exome sequencing (WES) in 43 unrelated DLD probands from the UK, identifying a de novo missense variant in GRIN2A, inherited co-segregating stop-gain variants in OXR1 and MUC6, and putative pathogenic variants in a few other genes, including SRPX2 and ERC1, previously implicated in speech-related disorders [35]. WES was applied only to probands, not parents; testing for de novo/inherited status was performed post-hoc using Sanger sequencing. An earlier study of this cohort used SNP-array data to investigate copy number variants (CNVs) in 127 cases, 385 first-degree relatives and 269 population controls. DLD cases carried more CNVs than controls, and the CNVs were of higher average size, but this overall increased burden was mainly driven by common events [36]. Subsequent array-based analyses of 58 severe DLD probands, 159 relatives and 76 controls, from Sweden, found that rare CNVs tended to be larger in probands, and that (both for probands and siblings) more coding genes were affected [37]. In that study, 4.8% of cases (2 of 42 tested) carried de novo CNVs, and 6.9% (4 of 58) had clinically significant rearrangements [37], including two cases of 16p11.2 deletion, a CNV originally identified in ASD, which has since been linked to speech/language deficits [38].
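The percentages quoted above follow directly from the reported counts, and a short calculation also shows how imprecise such estimates are at these sample sizes. In the sketch below, the counts (2 of 42; 4 of 58) come from the text; the Wilson score interval is an addition made here for illustration and was not reported in the cited study, and the function names are hypothetical.

```python
import math


def percent(count: int, total: int, digits: int = 1) -> float:
    """Proportion expressed as a percentage, rounded to `digits` decimals."""
    return round(100 * count / total, digits)


def wilson_interval(count: int, total: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = count / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    print(percent(2, 42))  # 4.8  (de novo CNVs among cases tested)
    print(percent(4, 58))  # 6.9  (clinically significant rearrangements)
    for count, total in [(2, 42), (4, 58)]:
        low, high = wilson_interval(count, total)
        print(f"{count}/{total}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Note that the two proportions use different denominators (the 42 cases tested for de novo status versus all 58 probands), so they are not directly comparable.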
The first whole genome sequencing (WGS) study of a speech disorder investigated nineteen probands from the USA with a primary diagnosis of CAS [39]. For nine probands, WGS could also be carried out for unaffected parents, leading to identification of de novo single-nucleotide variants disrupting CHD3, SETD1A and WDR5 in three cases. In the other ten probands (for whom parental DNA was unavailable) novel loss-of-function variants were found in KAT6A, SETBP1, ZFHX4, TNRC6B and MKL2. Through analyses of Brainspan RNA-sequencing data these CAS-related genes were found to belong to a co-expression module with high expression during early human brain development (Figure 2) [39]. More recently, WES and WGS in 34 Australian probands ascertained for CAS identified twelve rare high-confidence etiological variants, nine of which were de novo [40]. In coexpression analyses using Brainspan, the ten genes highlighted in this later study (DDX3X, EBF3, GNB1, MEIS2, SETBP1, UPF2, ZNF142, GNAO1, CDK13, POGZ) showed strong overlap with the early brain-expressed gene network from the earlier WGS study of CAS, consistent with a shared pathway [39,40].
Molecular and genetic bases of disease
[Figure: Genes; Speech/language phenotypes; Speech/language impairments; Human development; Expression; Neurobiological pathways; Brain development; Genomic screening studies; Sequencing of cohorts with speech/language disorder]

Insights from gene-driven studies
Genome screening of CAS/DLD cohorts uncovers novel genetic disruptions linked to speech disorders, but initial evidence implicating a particular gene may come from one or perhaps a few index cases. Such findings are followed up with a gene-first approach, using information-sharing across global networks of clinical geneticists to identify independent high-risk variants in that gene, ideally regardless of routes used for proband recruitment. These efforts increase understanding of the consequences of gene disruption, evaluating variant pathogenicity through in silico analyses and lab-based experiments (e.g. in cellular models), and gathering data on phenotypic profiles observed in people who carry them (Figure 2).
Often when a mutation is found in an index case with a speech disorder, analyses of additional etiological variants through gene-driven studies reveal a variable spectrum of phenotypic consequences in different individuals, including those with more severe impairments affecting multiple cognitive domains, which is evidence of both heterogeneity and pleiotropy (Table 2). For instance, following identification of a de novo microdeletion spanning BCL11A in a child with severe speech impairments and mild intellectual delays [41], heterozygous missense, nonsense, and frameshift variants were shown to cause a distinct syndrome involving ID (mild to severe; most cases showing moderate dysfunction) and global developmental delays, with persistence of fetal hemoglobin representing a non-neural biomarker [42]. More recently, a de novo missense variant of POU3F3 in a child with severe developmental speech/language disorder, ASD, and mild ID, led to a gene-driven study of 19 mutation cases, who showed a wide range of functioning, most having borderline-to-moderate levels of ID and/or developmental delays [43]. All had delayed expressive language, and almost all had received speech therapy; oral motor problems, word-finding difficulties, and social communication issues were common.
Variants uncovered in WGS/WES screens of CAS cohorts [39,40] have facilitated subsequent gene-driven studies defining novel syndromes. Identification of a missense variant disrupting the helicase domain of CHD3 in a proband from the first WGS screen of CAS [39] led researchers to gather 34 other individuals with de novo variants in the gene; overlapping features included global developmental delay and/or ID, with many showing macrocephaly and a distinctive facial phenotype [44]. Speech/language problems were common, but occurred against a wide background of levels of general cognitive dysfunction, without an obvious relationship between the specific mutation and severity.
Next-generation sequencing of CAS cohorts also identified variants in genes already investigated in earlier gene-driven studies, for which loss-of-function variants had been linked to an array of neurodevelopmental disorders, such as SETD1A [45]. Etiological variants found in probands ascertained for CAS thus expand the phenotypic spectrum associated with several known neurodevelopmental disorder genes. These observations are in line with a broad consensus that single-gene disorders often show variable co-occurrence of diverse neurodevelopmental features, and that pleiotropy is a major theme, with the same gene being implicated across multiple different syndromes, in ways that are not yet fully understood [46]. Curiously, FOXP2 appears to stand out somewhat; while new cases have expanded the profile of deficits and range of severity associated with rare disruptions [4,5], disproportionate effects on speech and language skills are consistently noted. We argue that valuable insights about speech neurobiology can be gleaned from an integrated approach, one that not only focuses on the most specific cases of disorder, but also considers data from genes linked to distinct speech phenotype profiles in only a subset of the affected people, and/or genes in shared neuromolecular pathways. Table 2 gives selected examples from the literature, with explanations of why each gene could be of interest, including evidence of known interactions with FOXP transcription factors [3,21,22,35,39,40,41,43,44,45,47-57].

Effects of speech-related regulatory genes on early brain development
The number of genes implicated in developmental speech disorders is still too low for comprehensive enrichment analysis, but it is intriguing that unbiased screening of CAS cohorts converged on regulatory genes co-expressed during early brain development [39,40], with transcription factors and chromatin remodelers being prominent in gene-driven studies in this area [42-45,50,53]. Moreover, proteomic analyses of FOXP transcription factors identified protein-protein interactions with other brain-expressed regulatory molecules linked to neurodevelopmental diseases [19]. Involvement of regulatory genes is a common theme in etiology of brain-related disorders, including ID [58], and experimental studies show that chromatin remodeling is crucial for differentiation and maturation of the developing brain [59-61]. So far, searches for rare gene disruptions underlying speech disorders have mainly focused on protein-coding variants, but the field could benefit from newly emerging deep-learning tools to help identify potential risk variants affecting chromatin state (DeepSEA [62]; ExPecto [63]).
As shown for FOXP2, animal models and cellular assays can increase understanding of gene (dys)function. Nonetheless, for disorders disturbing human capacities like speech, and that involve regulatory genes with impacts on early brain development, it could be especially valuable to also adopt more physiologically relevant models. Brain organoids [64], grown in the lab from human stem cells, display species-specific developmental programs [65] and capture the complex cellular diversity of the developing human cortex [66], although see [67] for important limitations. Applying such methods to patient-derived cells is illuminating pathogenic mechanisms in neurodevelopmental disorders, including idiopathic autism [68]. Long-term and pre-patterned cultures can model complex events, including neuronal activity and cellular migration [69,70], with recent studies demonstrating neuronal network formation [71,72]. Ever more sophisticated gene-editing technologies (CRISPR and beyond) allow researchers to insert causal variants into isogenic cell lines and/or repair mutations in patient-derived tissue, while single-cell transcriptomics facilitates systematic analyses of molecular and cellular consequences. Application of this powerful new toolkit to rare variants implicated in developmental speech disorders could shed light on fundamental neurogenetic pathways underlying unique aspects of human biology.

Table 2. Selected examples of genes that could be of interest for studying the neurobiology of human speech, including information on gene function, phenotypes associated with gene disruption in humans, and rationale for highlighting. This is not intended as a comprehensive list of all potentially relevant genes, but an illustration of the broader approach discussed in the text.
A mutation case Identified in unbiased screening of a CAS cohort [40]. Speech phenotypes further described in a recent gene-driven study [57]

Conflict of interest statement
Nothing declared.

17. Norton P, Barschke P, Scharff C, Mendoza E: Differential song deficits after lentivirus-mediated knockdown of FoxP1, FoxP2 or FoxP4 in Area X of juvenile zebra finches. J Neurosci 2019. 1250-1219
Experimentally reducing the expression of different FoxP genes in a key part of the songbird brain gives novel insights into evolutionarily conserved effects of these transcription factors on vocal behaviors.