Genome-wide analysis of runs of homozygosity in Pakistani controls with no history of speech or language-related developmental phenotypes

Abstract
Background: Analysis of runs of homozygosity (ROHs) in controls provides a convenient resource for reducing false-positive associations between ROHs or genetic variants and simple or complex disorders in individuals from the same population. Evidence for the relevance of ROHs to speech- or language-related traits is limited because population-matched, behaviourally defined controls are lacking and family-based studies are few.
Aim: This study aims to identify common ROHs in the Pakistani population, focusing on the total length and frequency of ROHs of variable sizes, shared ROHs, and their genomic distribution.
Subjects and methods: We performed homozygosity analysis in PLINK of 86 individuals (39 males, 47 females) with no history of speech- or language-related phenotypes (controls), genotyped with the Illumina Infinium QC Array-24.
Results: ROHs of 1-<4 megabases (Mb) were frequent in unrelated individuals. ROHs over 20 Mb were observed in six individuals. Over 30% of the identified ROHs were shared among several individuals, reflecting the effect of consanguinity in the Pakistani population.
Conclusion: Our findings provide a foundation for family-based genetic studies of consanguineous families with speech- or language-related disorders, helping to narrow homozygous regions of interest and ultimately identify pathogenic variants.


Introduction
Familial aggregation and high heritability estimates indicate that genetic factors contribute to speech and language disorders (Kang and Drayna 2011). Family studies point to Mendelian causes, that is, rare alleles of large effect, whereas the limited population-based genome-wide association studies (GWAS) suggest that associated risk alleles may explain part of the missing heritability (Kang and Drayna 2011; Reader et al. 2014; Andres et al. 2019; Nudel et al. 2020; Andres et al. 2021; Benchek et al. 2021). Both approaches have yielded several candidate genes, but the molecular basis of speech and language disorders remains largely unknown. Moreover, these findings have been largely limited to European and English-speaking populations. To fully understand the genetic basis of speech and language disorders, genetic studies of non-Western and non-English-speaking populations are needed, but behaviourally tested control samples have so far been unavailable (Mountford et al. 2019; Andres et al. 2021).
Children of consanguineous marriages, whose parents are genetically related, are expected to carry a high proportion of homozygous genome (Smith 1974). Phenotypically characterised controls can provide essential information about the autozygosity commonly present in a population, so that the Mendelian basis of complex disorders in consanguineous families can be studied further through homozygosity mapping. The effect of kinship on the aetiology of genetic disorders is well understood. However, its influence is difficult to assess in speech and language-related disorders, for which the Mendelian mode of inheritance is unknown (Raza et al. 2010; 2012; Andres et al. 2019). Different populations show divergent distributions of long and short ROHs (Bittles and Black 2010), so it is essential to catalogue such ROH variation in phenotypically characterised control individuals. ROHs common in the population may either indicate a protective effect or help to identify risk alleles when population-matched cases or families are investigated. For example, genomic regions that harbour few ROHs in phenotypically characterised controls might contain loci that are enriched in cases, or that co-segregate in families with specific language impairment (SLI) or related speech and language disorders. Homozygosity mapping in controls may therefore help to prioritise uncommon ROHs in homozygosity mapping studies of a particular disease-associated population (Pemberton et al. 2012).
Runs of homozygosity (ROHs) are long stretches of homozygous genotypes in the genome, inherited together as haplotypes (Gibson et al. 2006). In family-based genetic studies of inherited disorders, long ROHs may carry a segregating variant allele and indicate a recessive mode of inheritance (Gamsiz et al. 2013). Consanguineous marriages, that is, marriages between genetically related individuals, are common in many populations; in Pakistan, for example, the consanguinity rate is 62% (Bittles 2001; Jabeen and Malik 2014). Longer ROHs are more likely in consanguineous populations because individuals with shared ancestors are more likely to inherit the same chromosomal segments, unbroken by crossover events, through both parents. In contrast, ROHs tend to be shorter in outbred populations, where many generations of recombination and random mating break up shared haplotypes. Long ROHs are sometimes observed in outbred individuals, possibly because of linkage disequilibrium (LD) or unusual mutation and recombination rates at specific genomic locations (Gibson et al. 2006; Li et al. 2006).
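To make the link between recent shared ancestry and ROH length concrete, the sketch below uses a standard population-genetics approximation that is not part of this study: if crossovers follow a Poisson process (Haldane map, no interference), autozygous segments transmitted through m meioses have lengths in centimorgans that are approximately exponential with mean 100/m. Offspring of first cousins are linked to their shared grandparents through 6 meioses, giving a mean of about 16.7 cM. The conversion of roughly 1 cM per Mb is an assumed genome-wide rule of thumb, and the function names are hypothetical.

```python
import math


def meioses_in_loop(cousin_degree: int) -> int:
    """Meioses linking a child to the shared ancestors of its k-th cousin parents.

    Each parent sits (cousin_degree + 1) generations below the shared
    ancestor and contributes one further meiosis to the child, so
    first cousins (degree 1) give 2 * (1 + 2) = 6 meioses.
    """
    return 2 * (cousin_degree + 2)


def mean_segment_cm(meioses: int) -> float:
    """Mean autozygous segment length in cM under a Haldane map."""
    return 100.0 / meioses


def prob_longer_than(threshold_mb: float, meioses: int, cm_per_mb: float = 1.0) -> float:
    """P(segment length > threshold) under the exponential approximation.

    cm_per_mb = 1.0 is an assumed genome-wide average; local
    recombination rates vary widely along the genome.
    """
    threshold_cm = threshold_mb * cm_per_mb
    return math.exp(-meioses * threshold_cm / 100.0)


if __name__ == "__main__":
    for degree in (1, 2, 3):
        m = meioses_in_loop(degree)
        print(f"cousin degree {degree}: {m} meioses, "
              f"mean ~{mean_segment_cm(m):.1f} cM, "
              f"P(>4 Mb) ~{prob_longer_than(4, m):.2f}")
```

Under these assumptions the expected segment shrinks as the shared ancestor recedes (6, 8 and 10 meioses for first, second and third cousins), which is why long ROHs are read as a signature of recent consanguinity.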
Depending on the degree of shared parental ancestry in ethnic groups, the frequency and length of ROHs vary between populations (Woods et al. 2006; Pemberton et al. 2012). Previous studies have compared the extent of homozygosity between populations with varying levels of consanguinity (Li et al. 2006; Kirin et al. 2010; Pemberton et al. 2012; Ceballos et al. 2018), reporting the prevalence, distribution, and location of ROHs, which can help to infer population history, structure, and natural selection (Li et al. 2006; Kirin et al. 2010; Szpiech et al. 2013). Estimates of the distribution of ROHs of different sizes in European subpopulations revealed variation between communities in the total number and length of ROHs. ROHs of up to 4 Mb were common in individuals from more isolated, endogamous communities (whose members reproduce with others of the same ethnicity), whereas ROHs > 10 Mb were rare in outbred European communities (McQuillan et al. 2008). Similarly, the HapMap project, which included 209 unrelated individuals from four populations (Han Chinese, Japanese, Yoruba from Ibadan, and CEPH), showed widespread homozygous segments (1393 ROHs > 1 Mb) in genomic regions with low recombination and high linkage disequilibrium; the largest ROH, 17.9 Mb, was found in a Japanese individual (Gibson et al. 2006). All these studies included randomly collected participants and did not consider a family history of speech and language-related impairment or other developmental disorders. Such samples can be excellent epidemiological resources but are not ideal for studying complex genetic diseases, because no phenotypic information is available and participants may have one or more of the phenotypes of interest. Their data therefore cannot serve as a reference for estimating ROH frequencies in affected individuals with a phenotype under study, such as SLI. It is thus crucial to identify common ROHs through homozygosity mapping in a phenotypically characterised control group from the population of interest, to estimate the distribution of ROHs of variable length and provide context for future family-based genetic studies of individuals from the same population. In the current study, we recruited individuals from Pakistan with no history of speech and language-related disorders, who were then assessed with speech and language measures and confirmed as phenotypic controls.
The ROH approach offers an opportunity to explore the genetic architecture of the human genome and to understand the role of ROHs as risk factors for simple and complex clinical traits. Several previous studies have highlighted the importance of identifying ROHs, using family-based and population-based case-control designs to estimate the risk conferred by ROHs for different phenotypes, especially autosomal recessive disorders. Genetic studies in consanguineous and isolated populations supported a genome-wide effect of homozygosity on complex phenotypes such as Alzheimer's disease, blood pressure, schizophrenia, autism, intellectual disability, psychosis, rheumatoid arthritis, and breast and prostate cancer (Lencz et al. 2007; Nalls et al. 2009; Enciso-Mora et al. 2010; Yang et al. 2012; Gamsiz et al. 2013; Lin et al. 2013). In contrast, no significant ROH effect was found for other complex phenotypes, such as bipolar disorder, Parkinson's disease, major depression, colorectal cancer, and childhood acute leukaemia (Spain et al. 2009; Vine et al. 2009; Wang et al. 2009; Hosking et al. 2010; Power et al. 2014). A whole-genome homozygosity association study of 178 unrelated schizophrenia cases and 144 controls of European ancestry identified 9 risk ROHs, 4 of which contained genes associated with schizophrenia (Lencz et al. 2007). Similarly, a schizophrenia study that included cases and controls from multiple countries and ancestries reported an association of ROHs with schizophrenia (Keller et al. 2012). Conversely, a population-based study of 506 individuals with bipolar disorder from the United Kingdom found no significant association of ROHs with bipolar disorder (Vine et al. 2009).
ROH detection has been used successfully in family-based gene hunting for simple Mendelian and complex disorders (Schraders et al. 2010; Noronha and Chauffaille 2018). Detecting ROHs has facilitated the mapping of loci and genes containing causative variants for simple and complex diseases, including hearing loss (Schraders et al. 2010), intellectual disability (ID; Gamsiz et al. 2013), mental retardation (Najmabadi et al. 2007), and specific language impairment (SLI; Andres et al. 2019). Novel mutations in the GAN, GBA2, and ZFYVE26 genes were found in 4 of 12 consanguineous Turkish Roma families with hereditary spastic paraplegia or Charcot-Marie-Tooth disease (Kancheva et al. 2016). In a study of Dutch and Pakistani families with non-syndromic hearing loss, a homozygous region of 2.0 Mb containing the causative gene was identified in the Dutch family (Schraders et al. 2010). Homozygosity mapping in 5 Saudi Arabian families with hearing loss showed that five different genes were involved in the disorder (Imtiaz et al. 2011).
Given the importance of ROHs for simple and complex diseases, this study aimed to determine the frequencies of ROHs of different sizes and their genomic distribution in the Pakistani population, using a sample of individuals (N = 86) with no history of speech- or language-related disorders. The findings should be useful for population-based and family-based genetic studies of speech and language-related disorders that use low-density SNP arrays.

Subjects and methods
Our study was approved by the University of Kansas Institutional Review Board (IRB #STUDY00143136) and is part of an ongoing research project funded by the National Institutes of Health (NIH). Pakistani normal control (PKNC) individuals, hereafter referred to as controls, were recruited from public and private schools and through personal contacts in the Punjab Province of Pakistan. To enrol individuals from schools, we provided schoolteachers with guidelines to identify children not at risk of neurodevelopmental disorders (NDD) and/or hearing impairment. In addition, we collected each individual's family history using the family history questionnaire developed by Dr. Rice (1998), which specifically targets a history of speech and language-related impairments. Informed consent was obtained from all participants, and parents provided consent for individuals under 18. Based on the information from teachers and the family history questionnaire, 100 controls (45 males, 55 females) fulfilled the study criteria and provided saliva samples. Of these 100 controls, 84 were unrelated and 16 were related. Nine of the related individuals were siblings and first cousins from a single family; the remaining seven formed three sibling sets of two or three individuals from different families. Saliva samples were collected using Oragene-Discover OGR-500 kits (DNA Genotek), and DNA was extracted according to the manufacturer's protocol (https://www.dnagenotek.com/us/products/collection-human/oragene-discover/500-series/OGR-500.html).

SNP genotyping and quality control
We genotyped the DNA samples of the 100 controls using the Illumina Infinium QC Array-24. Genotyping was outsourced to the Genetic Resources Core Facility of the Johns Hopkins University School of Medicine (https://grcf.jhmi.edu/genotyping/). The Infinium QC Array-24 is a cost-effective, low-density SNP array that has proven efficient for sample-specific variant calls and for detecting consanguinity, sex, and ethnicity (Ponomarenko et al. 2017). It has been widely used in association studies and has proven sufficient for detecting genetic linkage and association (Ponomarenko et al. 2017; Andres et al. 2019; 2020; Pinese et al. 2020). The array contains 15,949 SNPs distributed evenly throughout the genome at an average density of 0.5 megabases (Mb). Of these, 11,994 SNPs are on the autosomes, and the rest are on the sex chromosomes and the mitochondrial genome. Four CEPH samples were included as positive controls during genotyping. Genotyping failed for three controls (one related and two unrelated individuals), leaving genotype data for 97 controls. We excluded the data of related individuals from the analysis, and only the data of 86 unrelated individuals (39 males, 47 females) were used in the ROH analysis.
Genotype clustering was performed in GenomeStudio v2.0 according to the quality parameters recommended by Illumina. In GenomeStudio, the intensities of all autosomal SNPs were checked using the clustering option (Zhao et al. 2018). Missing data amounted to 0.11%, and a subset of SNPs was reviewed manually, including SNPs with cluster separation < 0.4, AB T Dev > 0.065, AA R Mean < 0.3, AB R Mean < 0.3, BB R Mean < 0.3, or call rate < 99%. We then performed quality control (QC) in PLINK before determining ROH frequencies: we excluded SNPs with a low call rate (< 95%) or missing genotypes (113 SNPs) and SNPs deviating from Hardy-Weinberg equilibrium proportions (p < 0.001; 30 SNPs). After QC, 11,851 autosomal SNPs remained for analysis.
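The two PLINK filters described above can be illustrated with a short re-implementation. The sketch below is ours, not PLINK's code: it computes a per-SNP call rate and an exact Hardy-Weinberg test of the kind PLINK uses for its --hwe filter (the SNP-HWE exact test of Wigginton, Cutler and Abecasis), then applies the thresholds stated in the text (call rate >= 95%, HWE p >= 0.001). It assumes per-SNP genotype counts are already available; the function names are hypothetical.

```python
def call_rate(n_called: int, n_missing: int) -> float:
    """Fraction of samples with a genotype call at this SNP."""
    total = n_called + n_missing
    return n_called / total if total else 0.0


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact Hardy-Weinberg test for one biallelic SNP.

    Enumerates the probability of every possible heterozygote count given
    the observed allele counts, then sums those no more likely than the
    observed count.
    """
    n_homr, n_homc = min(n_hom1, n_hom2), max(n_hom1, n_hom2)
    n = n_het + n_homr + n_homc
    rare = 2 * n_homr + n_het  # copies of the minor allele
    if n == 0 or rare == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the most likely heterozygote count (same parity as `rare`).
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    start_homr = (rare - mid) // 2
    start_homc = n - mid - start_homr
    # Walk down: fewer heterozygotes, one more homozygote of each type per step.
    hets, homr, homc = mid, start_homr, start_homc
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Walk up: more heterozygotes.
    hets, homr, homc = mid, start_homr, start_homc
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def keep_snp(n_het: int, n_hom1: int, n_hom2: int, n_missing: int,
             min_call_rate: float = 0.95, min_hwe_p: float = 0.001) -> bool:
    """Apply the call-rate and HWE thresholds described in the text."""
    n_called = n_het + n_hom1 + n_hom2
    return (call_rate(n_called, n_missing) >= min_call_rate
            and hwe_exact_p(n_het, n_hom1, n_hom2) >= min_hwe_p)
```

The SNP counts reported above reconcile: 11,994 autosomal SNPs minus 113 removed for call rate and 30 removed for HWE leaves 11,851.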

Analysis of runs of homozygosity
We used PLINK v1.9 to perform ROH analyses in the controls. PLINK detects ROHs with a sliding-window approach: a window of a set number of SNPs slides along the genome, and homozygous regions are captured from the windows (Purcell et al. 2007). PLINK's default parameters can be modified to suit the genotyping data used. The defaults are suited to high-density SNP arrays or whole-genome sequencing (WGS) data but are not optimised for medium- and low-density SNP arrays. Following published recommendations, we modified the default parameters to capture ROHs of variable length and frequency in our low-density array data (Table 1) (Kirin et al. 2010; Ceballos et al. 2018). The homozyg-density and homozyg-gap parameters strongly affect the outcome. We set the SNP density to 450 kilobases (kb; at least 1 SNP per 450 kb) and reduced homozyg-gap from the default of 1000 kb to 900 kb. We set the sliding window to match the preferred minimum ROH length (300 kb). Because genotyping errors or missing genotypes can break up an otherwise continuous homozygous segment and lead to underestimation of ROHs, we allowed one heterozygous SNP (the default) and up to 5 missing genotypes in each window (Table 1). Using these parameters, we determined ROHs of different lengths and compared their frequencies in the unrelated controls (N = 86) (Table 1).
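The sliding-window logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not PLINK's implementation: genotypes for one chromosome of one individual are coded 0 or 2 (homozygous), 1 (heterozygous) or None (missing); a window counts as homozygous if it contains at most `window_het` heterozygous and `window_missing` missing calls; a SNP joins a run when at least a fraction `threshold` of the windows covering it are homozygous; and runs are split at large inter-SNP gaps and filtered by SNP count, length and density. Parameter names echo PLINK's --homozyg-* options, but the default values here are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass
class Roh:
    start_bp: int
    end_bp: int
    n_snps: int

    @property
    def length_kb(self) -> float:
        return (self.end_bp - self.start_bp) / 1000.0


def split_at_gaps(positions: Sequence[int], max_gap_kb: float) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) index ranges containing no inter-SNP gap > max_gap_kb."""
    start = 0
    for i in range(1, len(positions)):
        if (positions[i] - positions[i - 1]) / 1000.0 > max_gap_kb:
            yield start, i
            start = i
    if positions:
        yield start, len(positions)


def snps_in_homozygous_windows(genos: Sequence[Optional[int]], window_snp: int,
                               window_het: int, window_missing: int,
                               threshold: float) -> List[bool]:
    """Flag SNPs for which >= threshold of the covering windows are homozygous."""
    n = len(genos)
    w = min(window_snp, n)
    window_ok = []
    for s in range(n - w + 1):
        win = genos[s:s + w]
        n_het = sum(1 for g in win if g == 1)
        n_missing = sum(1 for g in win if g is None)
        window_ok.append(n_het <= window_het and n_missing <= window_missing)
    flags = []
    for i in range(n):
        # the window starting at s covers SNPs s .. s + w - 1
        covering = window_ok[max(0, i - w + 1):min(i, n - w) + 1]
        flags.append(sum(covering) / len(covering) >= threshold)
    return flags


def call_rohs(positions, genotypes, window_snp=50, window_het=1, window_missing=5,
              threshold=0.05, min_snp=100, min_kb=1000.0, max_gap_kb=1000.0,
              max_kb_per_snp=50.0) -> List[Roh]:
    """Simplified sliding-window ROH caller for one chromosome of one individual."""
    rohs = []
    for a, b in split_at_gaps(positions, max_gap_kb):
        flags = snps_in_homozygous_windows(genotypes[a:b], window_snp, window_het,
                                           window_missing, threshold)
        i = 0
        while i < len(flags):
            if not flags[i]:
                i += 1
                continue
            j = i
            while j + 1 < len(flags) and flags[j + 1]:
                j += 1
            roh = Roh(positions[a + i], positions[a + j], j - i + 1)
            # keep runs with enough SNPs, enough length and enough SNP density
            if (roh.n_snps >= min_snp and roh.length_kb >= min_kb
                    and roh.length_kb / roh.n_snps <= max_kb_per_snp):
                rohs.append(roh)
            i = j + 1
    return rohs


if __name__ == "__main__":
    # Toy chromosome: 200 SNPs every 100 kb, alternating het/hom except SNPs 50-149.
    pos = [i * 100_000 for i in range(200)]
    geno = [1 if i % 2 else 0 for i in range(200)]
    for i in range(50, 150):
        geno[i] = 0
    print(call_rohs(pos, geno, window_snp=10, min_snp=20, max_kb_per_snp=450))
```

Because a SNP needs only a small fraction of homozygous covering windows to be included, a called run can extend a few SNPs past the true homozygous stretch and absorb a boundary heterozygote; PLINK applies further rules that this sketch omits.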

Identification of shared/consensus ROHs
An ROH shared by unrelated individuals may start and end at different positions in each individual. It is therefore essential to define the consensus region over which the ROHs of unrelated individuals overlap (Christofidou et al. 2015).
In the current study, we identified overlapping, shared ROHs of different sizes present in two or more controls, using Microsoft Excel to arrange the data manually and identify the shared ROHs.
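As a programmatic alternative to the manual spreadsheet step, the sketch below shows one way to find regions shared by several individuals. It assumes ROHs are given as (individual, chromosome, start, end) tuples with half-open coordinates, and that an individual's own ROHs do not overlap one another, so the overlap depth equals the number of individuals sharing a stretch. The consensus of a set of overlapping ROHs is taken as their intersection, from the latest start to the earliest end. Function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (individual, chromosome, start_bp, end_bp); intervals treated as half-open
Segment = Tuple[str, str, int, int]


def shared_regions(segments: Iterable[Segment],
                   min_individuals: int = 2) -> List[Tuple[str, int, int, int]]:
    """Return (chromosome, start, end, depth) stretches covered by >= min_individuals ROHs.

    Assumes one individual's ROHs never overlap each other, so the overlap
    depth equals the number of individuals sharing the stretch.
    """
    events = {}
    for _individual, chrom, start, end in segments:
        events.setdefault(chrom, []).extend([(start, +1), (end, -1)])
    shared = []
    for chrom in sorted(events):
        depth, prev = 0, None
        # sorting puts an end (-1) before a start (+1) at the same position
        for pos, delta in sorted(events[chrom]):
            if prev is not None and pos > prev and depth >= min_individuals:
                shared.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return shared


def consensus(segments: Iterable[Segment]) -> Optional[Tuple[int, int]]:
    """Region common to all given ROHs (latest start to earliest end), or None."""
    segs = list(segments)
    start = max(s[2] for s in segs)
    end = min(s[3] for s in segs)
    return (start, end) if start < end else None


if __name__ == "__main__":
    toy = [("A", "6", 10_000_000, 30_000_000),   # hypothetical ROHs
           ("B", "6", 20_000_000, 40_000_000),
           ("C", "6", 25_000_000, 35_000_000)]
    for row in shared_regions(toy):
        print(row)
    print(consensus(toy))
```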

Analysis of ROHs in controls
We observed ROHs of variable lengths in the controls (Table 2): 158 ROHs < 1 Mb (mean size 0.63 Mb) and 982 ROHs > 1 Mb. The most common size class was 1-<4 Mb, with 487 ROHs. In total, 171 ROHs of 4-<6 Mb, 128 ROHs of 6-<8 Mb, and 15 ROHs > 20 Mb were found (Table 2). The longest ROH, 42.1 Mb, was found on chromosome 6 in individual PKNC-407. The mean number of ROHs of 1-<4 Mb per individual was higher than that of ROHs of 4-<6 Mb. The mean total number of ROHs (NROH) and the mean total sum of ROH lengths (SROH) for each size class are shown in Figure 1.
ROHs of 1-<4 Mb were very frequent, accounting for 41.73% of all identified ROHs (Figure 2). ROHs over 4 Mb were less frequent than ROHs < 4 Mb, and ROHs < 1 Mb made up 13.54% of all ROHs. ROHs of 4-<6 Mb (14.65%) were more frequent than ROHs of 6-<8 Mb (10.97%) and 8-<12 Mb (11.65%) (Figure 2).
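The per-class summaries in Table 2 and Figures 1 and 2 amount to binning each ROH by length and then counting and summing per class. The sketch below assumes a list of (individual, length in Mb) pairs, such as could be derived from PLINK's .hom output. The <1, 1-<4, 4-<6, 6-<8 and 8-<12 Mb boundaries come from the text, while grouping 12-<20 Mb into a single class and placing exactly 20 Mb in the top class are assumptions. Percentages are each class's share of all ROHs, and NROH and SROH are averaged over all individuals, including those with no ROH in a class. Names are hypothetical.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable, Tuple

# Upper edges (Mb) of the size classes. The <1, 1-<4, 4-<6, 6-<8 and 8-<12 Mb
# classes follow the text; a single 12-<20 Mb class is an assumption.
EDGES = [1, 4, 6, 8, 12, 20]
LABELS = ["<1", "1-<4", "4-<6", "6-<8", "8-<12", "12-<20", ">=20"]


def size_class(length_mb: float) -> str:
    """Map an ROH length to its size-class label."""
    return LABELS[bisect.bisect_right(EDGES, length_mb)]


def summarise(rohs: Iterable[Tuple[str, float]],
              n_individuals: int) -> Dict[str, Dict[str, float]]:
    """Per size class: count, % of all ROHs, mean NROH and mean SROH (Mb) per individual."""
    counts: Counter = Counter()
    sums: Counter = Counter()
    for _individual, length_mb in rohs:
        label = size_class(length_mb)
        counts[label] += 1
        sums[label] += length_mb
    total = sum(counts.values())
    return {
        label: {
            "n": counts[label],
            "percent": 100.0 * counts[label] / total if total else 0.0,
            "mean_nroh": counts[label] / n_individuals,
            "mean_sroh_mb": sums[label] / n_individuals,
        }
        for label in LABELS
    }


if __name__ == "__main__":
    toy = [("a", 0.5), ("a", 2.0), ("b", 3.0), ("b", 5.0), ("b", 25.0)]  # hypothetical
    for label, row in summarise(toy, n_individuals=2).items():
        print(label, row)
```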

Shared ROHs in unrelated controls
We determined the number of ROHs of different sizes shared among unrelated individuals. Most shared ROHs in unrelated controls were 1-<4 Mb (Table 3). Shared ROHs represented 31.21% of all ROHs observed in the controls.

Genomic distribution of ROHs
We analysed the size distribution of ROHs on the autosomes. ROHs were distributed non-uniformly throughout the genome; chromosomes 1, 2, 6, and 7 carried more ROHs than the other autosomes (Figure 3), and the number of ROHs was consistent with chromosome length. The largest accumulated ROH length was on chromosome 2, about 823.42 Mb in total, followed by chromosomes 1 and 6 with 448.73 Mb and 448.24 Mb, respectively. The largest number of ROHs of 1-<4 Mb was observed on chromosome 6, covering 158.53 Mb. Most ROHs > 20 Mb were found on chromosomes 15 and 6, totalling 97.66 Mb and 95.13 Mb, respectively (Figure 3).
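Per-chromosome totals such as those reported above come from grouping ROHs by chromosome and summing their lengths across all individuals. A minimal sketch, assuming (chromosome, length in Mb) pairs and an optional minimum length that restricts the tally, for example, to ROHs > 20 Mb; the function name and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accumulate_by_chromosome(rohs: Iterable[Tuple[str, float]],
                             min_mb: float = 0.0) -> Dict[str, Tuple[int, float]]:
    """Count and total length (Mb) of ROHs longer than min_mb on each chromosome.

    Returns chromosomes ordered by accumulated length, largest first.
    """
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for chrom, length_mb in rohs:
        if length_mb > min_mb:
            counts[chrom] += 1
            totals[chrom] += length_mb
    ranked = sorted(counts, key=lambda c: totals[c], reverse=True)
    return {chrom: (counts[chrom], totals[chrom]) for chrom in ranked}


if __name__ == "__main__":
    toy = [("2", 10.0), ("2", 3.0), ("6", 42.0), ("15", 25.0), ("15", 30.0)]
    print(accumulate_by_chromosome(toy))             # all ROHs
    print(accumulate_by_chromosome(toy, min_mb=20))  # ROHs > 20 Mb only
```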

Discussion
We performed homozygosity analysis in Pakistani controls with no history of speech and/or language-related phenotypes to examine the frequencies of ROHs of variable sizes using a low-density SNP array (Infinium QC Array-24). We observed ROHs of varying sizes, with ROHs of up to 8 Mb occurring at the highest frequency, a pattern typical of consanguineous populations; this underlines the relevance of these findings for family-based gene mapping in speech and language-related disorders. ROHs were distributed non-uniformly across the genome in the Pakistani population, with the longest and most frequent ROHs spanning regions on chromosomes 2, 6, and 15. The high rate of consanguinity in the study population could explain the large number of ROHs of up to 8 Mb.
Pakistan is the fifth most populous country in the world, with a consanguinity rate estimated at approximately 62% (Jabeen and Malik 2014; Hina and Malik 2015). High consanguinity rates favour longer ROHs of uncertain significance in the population, and these may appear in family-based studies of specific phenotypes, including speech and language-related disorders. Information on ROHs in the target population (Pakistani individuals) is therefore pertinent to future family-based ROH analyses of individuals from Pakistan with speech and/or language-related disorders. Our study showed that ROHs of 1-<4 Mb are remarkably common in the Pakistani population, representing 41.73% of the ROHs observed in our sample. The increased number of ROHs > 1 Mb arises from shared haplotypes frequently inherited from both parents. In the Human Genome Diversity Project (HGDP), the genomic distribution of ROHs was studied in 1043 individuals from 51 populations (Kirin et al. 2010). Clusters of ROHs of variable sizes, especially 0.5-2 Mb, were frequently observed. Long homozygous stretches (> 4 Mb) were three times more abundant in the consanguineous populations of South and West Asia than in African and other Eurasian populations, and a greater percentage of individuals from South and West Asia carried ROHs > 16 Mb, reflecting consanguinity in these populations (Kirin et al. 2010). Some African populations, such as the Kikuyu (Eastern Niger-Congo), Amhara (Horn of Africa), and Yoruba (Guinea), showed low frequencies or a complete absence of ROHs > 1.5 Mb, whereas other African populations, such as the Fula (Western Africa) and Somali (Horn of Africa), showed high frequencies of ROHs > 1.5 Mb. This difference is explained by consanguineous practices in some African populations (Ceballos et al. 2019). In our phenotypically characterised controls, ROHs of 4-<6 Mb and 6-<8 Mb accounted for 14.65% and 10.97% of all ROHs, respectively, a high percentage that could be explained by consanguinity. In the future, these results should help to determine ROHs associated with speech and language-related phenotypes in the Pakistani population using a family-based approach.
Previous studies have reported that short ROHs do not arise from inbreeding but are common in all populations. Short ROHs typically make up the largest share of ROHs and cover more of the genome than long ROHs. ROHs of different sizes are distributed unevenly across the genome, with some regions showing a high prevalence of ROHs and others a complete absence (McQuillan et al. 2008). In the current study, chromosomes 2, 6, 1, and 7 contained more ROHs than the other chromosomes, and the largest total length of ROHs > 20 Mb was observed on chromosome 15. A detailed analysis of the genomic distribution of ROHs in 23 European populations confirmed their uneven distribution across the genome; regions with the highest percentage of ROHs, termed "ROH islands", were found on chromosomes 3, 4, and 14 in European populations (Nothnagel et al. 2010).
Our results show that ROHs cover many genomic regions and are highly frequent in this consanguineous population. In a study of individuals with autosomal recessive neurodevelopmental disorders whose parents were first cousins, 11% of the genome was homozygous (Woods et al. 2006). Another study of a Pakistani community in the United Kingdom examined the impact of first-cousin marriage on the prevalence of autosomal recessive disorders (Christianson et al. 2006). The risk of autosomal recessive disorders among British Pakistanis increased with the inbreeding coefficient, and high consanguinity rates therefore significantly increased the prevalence of inherited single-gene disorders in the Pakistani population (Christianson et al. 2006).
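The expected homozygosity of offspring of consanguineous unions can be computed with Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over common ancestors A, where n1 and n2 are the numbers of generations from each parent up to A and F_A is A's own inbreeding coefficient. For offspring of first cousins this gives F = 1/16, about 6.25% of the genome expected to be autozygous, a useful pedigree-based baseline for the 11% figure cited above; observed values can exceed such expectations when background relatedness from more distant ancestry is not captured by the pedigree. The sketch uses exact fractions and assumes non-inbred common ancestors; the function and constant names are hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Tuple

# (n1, n2, F_A): generations from parent 1 and parent 2 up to a common
# ancestor A, and A's own inbreeding coefficient.
Path = Tuple[int, int, Fraction]


def inbreeding_coefficient(paths: Iterable[Path]) -> Fraction:
    """Wright's path-counting formula: F = sum (1/2)^(n1 + n2 + 1) * (1 + F_A)."""
    total = Fraction(0)
    for n1, n2, f_ancestor in paths:
        total += Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_ancestor))
    return total


# Common pedigrees, assuming non-inbred common ancestors (F_A = 0).
FIRST_COUSINS = [(2, 2, Fraction(0))] * 2         # two shared grandparents
SECOND_COUSINS = [(3, 3, Fraction(0))] * 2        # two shared great-grandparents
DOUBLE_FIRST_COUSINS = [(2, 2, Fraction(0))] * 4  # four shared grandparents
UNCLE_NIECE = [(1, 2, Fraction(0))] * 2           # uncle's parents = niece's grandparents

if __name__ == "__main__":
    for name, paths in [("first cousins", FIRST_COUSINS),
                        ("second cousins", SECOND_COUSINS),
                        ("double first cousins", DOUBLE_FIRST_COUSINS),
                        ("uncle-niece", UNCLE_NIECE)]:
        f = inbreeding_coefficient(paths)
        print(f"{name}: F = {f} ({float(f):.2%})")
```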
Multiple studies have used the ROH approach to identify risk genes for inherited disorders such as autism, ID, and SLI (Gamsiz et al. 2013; Lin et al. 2013; Gandin et al. 2015; Andres et al. 2019). Recently, homozygosity analysis of 14 consanguineous families from Pakistan identified overlapping homozygous regions associated with SLI (Andres et al. 2019). A population-based study of the Chilean population used loss of heterozygosity (LOH) together with parametric and non-parametric linkage analyses to investigate the genetic basis of SLI. Detailed analysis of a chromosome 7q locus, replicated in that study with a linkage score of 1.24, revealed a two-SNP haplotype at high frequency in individuals with SLI (Villanueva et al. 2011). A case-control study of the Taiwanese Han population characterised an ROH on 11q22.3 in patients with autism spectrum disorder; this region spans several risk genes, including NPAT and ATM, which are of interest because of their role in speech delay and language impairment (Lin et al. 2013).
ROH analyses have identified disease-associated risk factors and mutations in several genes. A case-control study of the Irish population revealed several potential risk loci for amyotrophic lateral sclerosis (ALS), with ARHGEF1 as a candidate gene (McLaughlin et al. 2015). Similarly, homozygosity analyses of Russian children with neurodevelopmental disorders (autism, epilepsy, and intellectual disability) revealed disease-associated ROHs of 1-1.6 Mb on 7q21, 7q31, 11p15, and 15p11, indicating that short homozygous regions should be investigated to determine causative mutations in children with neurodevelopmental disorders (Iourov et al. 2015). Determining the length of homozygous regions is critical for defining disease-associated loci, and these lengths may differ between populations. Population-based ROH frequencies and sizes therefore provide valuable information for judging the significance of ROHs in population-matched family-based studies. Cataloguing population-based ROHs lays the foundation for mapping family-based ROHs that carry causative or risk variants for simple and complex inherited diseases.
Multiple analytic tools are available for ROH analysis (Gusev et al. 2009; Seelow et al. 2009; Quinodoz et al. 2021). PLINK is one of the most widely used programs for population-based samples, with default parameters suited to high-density SNP array and whole-genome data (Ceballos et al. 2018). With medium- and low-density SNP arrays, the default PLINK parameters can compromise the accurate detection of ROHs. Compared with deep-coverage WGS data and high-density SNP arrays, low-density SNP array data are more abundant and affordable, and obtaining ROHs from them requires less computational effort. Meyermans et al. (2020) evaluated the effect of PLINK's default parameters on ROH detection with a medium-density SNP array. They concluded that minimal SNP density and maximal gap are essential parameters for ROH detection with medium-density genotypes; if the SNP density setting was too low, genome coverage of the ROH analysis was limited. Likewise, the maximal gap, scanning window length, and window threshold settings affect the outcome and should be adjusted to the density of the SNP array used (Meyermans et al. 2020). We therefore used PLINK's default parameters with some alterations to detect ROHs with the low-density SNP array. We set the SNP density to 1 SNP per 450 kb to capture ROHs across the whole genome; with density values below 450 kb, the analysis did not capture ROHs from the entire genome. The homozygous gap also affects ROH detection. The default homozyg-gap of 1000 kb is too large for detecting ROHs in low-density SNP genotyping data, while genome coverage reportedly drops below 95% for gaps < 500 kb (Meyermans et al. 2020). We therefore reduced homozyg-gap from the default of 1000 kb to 900 kb. Ceballos et al. observed a discrepancy in the number of short ROHs (0.3-1 Mb) captured with whole-genome sequencing and high-density array data, and argued that it could be corrected for SNP array data by setting the PLINK parameter homozyg-snp to 30 instead of the default of 50 (Ceballos et al. 2018). Accordingly, to capture short ROHs with the low-density SNP array, we further relaxed the minimum number of SNPs required to call an ROH (homozyg-snp 10).
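For reproducibility, the PLINK call can be assembled in a script. The sketch below builds a PLINK 1.9 --homozyg command with the values stated in the text (homozyg-density 450, homozyg-gap 900, homozyg-snp 10, one heterozygous and five missing calls per window). Reading the 300 kb minimal ROH length as --homozyg-kb 300 is our assumption, Table 1 remains the authoritative parameter list, and the file prefixes are hypothetical. PLINK writes per-segment results to a file named after the --out prefix with a .hom extension.

```python
import shlex
from typing import List


def plink_roh_command(bfile: str, out: str, plink: str = "plink",
                      density_kb: int = 450, gap_kb: int = 900,
                      min_kb: int = 300, min_snp: int = 10,
                      window_het: int = 1, window_missing: int = 5) -> List[str]:
    """Return an argv list for PLINK 1.9 ROH detection on the autosomes."""
    return [
        plink,
        "--bfile", bfile,                                # QC'd binary fileset (.bed/.bim/.fam)
        "--autosome",                                    # restrict to autosomal SNPs
        "--homozyg",                                     # run ROH detection
        "--homozyg-density", str(density_kb),            # at least 1 SNP per N kb on average
        "--homozyg-gap", str(gap_kb),                    # split runs at gaps > N kb
        "--homozyg-kb", str(min_kb),                     # minimum ROH length (assumed 300 kb)
        "--homozyg-snp", str(min_snp),                   # minimum SNPs per ROH
        "--homozyg-window-het", str(window_het),         # heterozygous calls allowed per window
        "--homozyg-window-missing", str(window_missing), # missing calls allowed per window
        "--out", out,
    ]


if __name__ == "__main__":
    cmd = plink_roh_command("pknc_qc", "pknc_roh")  # hypothetical file prefixes
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

Keeping the parameters as named arguments makes it easy to rerun the analysis with alternative density or gap settings when comparing coverage, as Meyermans et al. (2020) did.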

Conclusion
In the current study, we conducted a genome-wide estimation of ROHs in unrelated individuals from the consanguineous Pakistani population using a low-density SNP genotyping array. ROHs of 1-<4 Mb were the most abundant in these individuals with no history of speech- or language-related impairments. Because ROHs are potential risk factors for common and complex diseases, ROH estimates from controls will allow population-based estimation of the risk arising from recessive genetic variants. Estimates of the ROHs commonly present in the control population will help to detect and fine-map ROH regions that may contain causative variants for simple and complex diseases in affected individuals from the same population. The current study provides a unique resource for future population-matched family-based and population-based homozygosity analyses of speech and language-related phenotypes.

Figure 1. Mean total number of ROHs (NROH) and mean total sum of ROH lengths (SROH) for different size classes in Pakistani controls. Sizes are shown in megabases (Mb).

Figure 2. Percentage of ROHs of different sizes among all ROHs identified in control individuals.

Table 1. Parameters for the detection of runs of homozygosity in PLINK.

Table 2. Total numbers of ROHs of different sizes in controls.

Table 3. ROHs of different sizes shared among controls.

Figure 3. Genomic distribution of ROHs on autosomal chromosomes. The sizes of ROHs are shown in megabases (Mb).

A high-density SNP array (3 million SNPs) was used in a study of the Chinese Han population, revealing short ROHs of 0.3-1 Mb covering a region of 510 Mb (Ceballos et al. 2018). Our study observed a total of 158 ROHs < 1 Mb spanning a 99.99 Mb region of the genome. The low frequency of ROHs < 1 Mb in our study could be attributed to the low-density SNP array.