Genomic diversity and differentiation of Alu insertion polymorphisms in a native British and four South Asian migrant populations

Abstract Background: Alu insertions are bi-allelic and primate-specific, which makes them useful markers for studying genetic variation, migration patterns, forensic analyses, paternity, and evolutionary heritage; however, population-specific studies are limited. Aim: The objective of this study is to document the level and extent of genetic variation at 39 different Alu loci in five populations (British, Indian Punjabi, Indian Gujarati, Pakistani, and Bangladeshi) from the East Midlands region of the UK. Genetic data on migrant populations are currently limited. Subjects and Methods: DNA samples (n = 543) were analysed for 39 Alu insertion polymorphisms using specific primers and standard protocols. Data were analysed for population and forensic genetic parameters. Results: All studied Alus were polymorphic in the British White population, while the South Asian migrant populations had a variable number of monomorphic loci. The highest heterozygosities and lowest match probabilities were observed in the British sample, while the Bangladeshi sample had the lowest heterozygosity and a higher match probability. Conclusion: The analysed Alu insertions (TPA25, Ya5NBC123, Ya5NBC182, Ya5NBC241, and Ya5NBC242) are highly polymorphic and variable among migrant populations. These loci could be useful for population genomic and differentiation studies.


Background
Alu insertions (characterised by the presence of an AluI restriction site) are DNA sequences of around 300 bp in length that are highly abundant (> 1 million repeats) in the human genome, contributing roughly 11% of the DNA sequence (Batzer and Deininger 1991; Batzer et al. 1994; Deininger 2011). Alu insertions are identical by descent and are members of a larger group of transposable elements called Short Interspersed Nuclear Elements (SINEs), which distribute themselves randomly throughout the genome (Chadli et al. 2009). Because Alus lack removal mechanisms, once inserted they propagate in the genome through generations. Alu insertions are stable, bi-allelic, and primate-specific, which makes them useful markers for studying genetic variation, migration patterns, forensics, paternity, and evolutionary heritage (Stoneking et al. 1997; Batzer and Deininger 2002; Mastana et al. 2003; Watkins et al. 2003; Deininger 2011; Laybourn et al. 2016; Singh et al. 2016; Mastana et al. 2017). Indian populations often have endogamous marriages and strict religious and cultural traditions that restrict the diversity within native and migrant populations (Pemberton et al. 2012; Mastana 2014; Sankaran et al. 2017; Brearley et al. 2020). The purpose of this study is to investigate the levels of genetic variation in a sample of four migrant South Asian populations and one native White British/European population using a panel of selected Alu polymorphisms, and to assess the usefulness of these Alus for studying genomic variation and genetic differentiation among populations.

Sample and methods
Genomic DNA samples from five different populations were collected to analyse the level of genetic variation. A native British population was analysed alongside four South Asian populations living in the East Midlands: Indian Punjabi (Sikhs), Indian Gujarati, Bangladeshi, and Pakistani. According to the 2011 census, roughly 12% of the region's non-white population were Indian, followed by Pakistani (4.4%) and Bangladeshi (3%) (England and Wales 2011 Census). Blood samples were collected from voluntary donations at various sites and local events, in parallel with the genetic studies occurring in the East Midlands region at the time. The participants were between the ages of 18 and 60 and confirmed to be unrelated within three generations. Ethnic backgrounds were defined using a questionnaire concentrating on each of the participant's four grandparents. The participants gave written consent before donating a blood sample. Institutional ethics and the NHS blood donation service approved the collection and analysis of the anonymous samples for genetic analyses. The final East Midlands Alu dataset contained samples from five different ethnic backgrounds: British White (n = 113), Indian Punjabi (n = 133), Indian Gujarati (n = 92), Bangladeshi (n = 100), and Pakistani (n = 105). We hypothesised that there would be significant genetic variation at different Alu polymorphisms amongst the East Midlands populations, based on their different population and geographical origins, marriage, and migration patterns.
In this study, 39 different Alu loci were chosen for laboratory analysis based on their level and extent of genetic variation among different populations in previous studies (Watkins et al. 2003). DNA extraction was carried out using the salting-out method. The extracted DNA was amplified using specific Alu primers, and PCR products were electrophoresed on 2% agarose gels. The gels were scored based on the presence of Insertion or Deletion bands, giving three possible genotypes at each locus (Insertion/Insertion, Insertion/Deletion, and Deletion/Deletion). All genotypes were scored by two independent researchers, and any discrepant genotypes were repeated. Ten percent of genotypes were repeated randomly to ensure that genotyping was consistent and repeatable. All data were collated in an Excel sheet for further analysis.
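As an illustration of how scored gel genotypes translate into allele frequencies, the following is a minimal sketch using the standard gene-counting method for a bi-allelic locus. The function names and the genotype counts are hypothetical and are not taken from the study data.

```python
def allele_frequencies(n_ii, n_id, n_dd):
    """Return (insertion, deletion) allele frequencies from genotype counts.

    n_ii, n_id, n_dd are the numbers of Insertion/Insertion,
    Insertion/Deletion, and Deletion/Deletion individuals at one locus.
    """
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("no genotypes were scored at this locus")
    # Each individual carries two alleles; heterozygotes carry one insertion.
    p_ins = (2 * n_ii + n_id) / (2 * n)
    return p_ins, 1.0 - p_ins


def is_monomorphic(n_ii, n_id, n_dd):
    """True when only one allele is observed in the sample."""
    p_ins, _ = allele_frequencies(n_ii, n_id, n_dd)
    return p_ins in (0.0, 1.0)


# Hypothetical counts for one locus in one population (not study data).
example_p, example_q = allele_frequencies(n_ii=30, n_id=50, n_dd=20)
```

A locus at which every sampled individual is Deletion/Deletion gives an insertion frequency of zero, so `is_monomorphic` would flag it.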

Data management and statistical analysis
The genotype data were used to carry out the population genetic analyses and forensic index calculations. The genotype counts, allele frequencies, observed and expected heterozygosities, inbreeding coefficients, Nei's genetic distance, and pairwise genetic identity were determined using the GenAlEx 6.5 Excel add-on (Peakall and Smouse 2012). A Multidimensional Scaling (MDS) plot based on all 39 Alu loci was constructed using the NtSys programme to show the observed distances and dissimilarity amongst the five populations (Rohlf 2018). Multiple software packages were used to cross-check the consistency of the statistical analyses.
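The per-locus parameters listed above can be sketched with their textbook definitions. This is not GenAlEx code, and GenAlEx's own output options (for example, sample-size-corrected heterozygosity) may differ. It assumes a bi-allelic locus, takes the inbreeding coefficient as F = 1 - H_O/H_E, and takes the match probability as the sum of squared observed genotype frequencies. The function name and the counts are hypothetical.

```python
def locus_summary(n_ii, n_id, n_dd):
    """Summarise one bi-allelic locus from its three genotype counts."""
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("no genotypes were scored at this locus")
    p = (2 * n_ii + n_id) / (2 * n)      # insertion allele frequency
    q = 1.0 - p                          # deletion allele frequency
    h_obs = n_id / n                     # observed heterozygosity
    h_exp = 1.0 - (p * p + q * q)        # expected heterozygosity (= 2pq)
    # F is undefined at a monomorphic locus, where h_exp is zero.
    f = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    # Chance that two random individuals share the same genotype.
    match_probability = sum((c / n) ** 2 for c in (n_ii, n_id, n_dd))
    return {
        "p_insertion": p,
        "h_obs": h_obs,
        "h_exp": h_exp,
        "f": f,
        "match_probability": match_probability,
    }


# Hypothetical counts for one locus (not study data).
example_summary = locus_summary(n_ii=30, n_id=50, n_dd=20)
```

Under these definitions, a lower match probability means a more discriminating locus, and a positive F indicates a deficit of heterozygotes relative to Hardy-Weinberg expectations.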

Results
Table 1(a-e) presents the results for all the genetic parameters in the Pakistani, Indian Punjabi, Indian Gujarati, Bangladeshi, and British White populations. All the loci in the British population were found to be polymorphic; however, several loci in the migrant South Asian populations were not. Monomorphic loci were not included in further data analysis and comparisons. The Bangladeshi and Indian Punjabi samples had seven and six loci, respectively, that were monomorphic for the deletion allele. Ya5NBC132 and Ya5NBC157 were monomorphic for the deletion allele in all South Asian groups.
Thirteen Alu loci presented a substantial departure from Hardy-Weinberg equilibrium (HWE) with a significant chi-squared value (p < 0.05 to 0.001), but none of the loci remained significant after Bonferroni correction (Bonferroni-corrected p-value = 0.0001). The highest chi-squared values were recorded at Ya5NBC157 and Ya5NBC241 in the British White population, followed by Ya5NBC45 in the Bangladeshi sample (Table 1). Observed (H_O) and expected (H_E) heterozygosities are displayed in Table 1(a-e). Table 2 presents the mean observed and expected heterozygosities, which were highest in the British White population and lowest in the Bangladeshi sample. As expected, all South Asian populations had lower observed heterozygosities. Interestingly, and contrary to expectations, the highest mean inbreeding coefficient estimate was seen in the British White sample and the lowest in the Gujarati population.
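The HWE test described above can be sketched as a chi-squared goodness-of-fit test with 1 degree of freedom for a bi-allelic locus. The function names and genotype counts below are hypothetical. The choice of 39 tests in the Bonferroni example is an illustrative assumption (one test per locus), since the number of tests behind the reported corrected value is not stated here.

```python
import math


def hwe_chi_square(n_ii, n_id, n_dd):
    """Return (chi2, p_value) for a 1-df HWE test at a bi-allelic locus."""
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("no genotypes were scored at this locus")
    p = (2 * n_ii + n_id) / (2 * n)
    q = 1.0 - p
    if p in (0.0, 1.0):
        # Monomorphic loci cannot be tested (expected counts are zero).
        raise ValueError("locus is monomorphic")
    observed = (n_ii, n_id, n_dd)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts (not study data): a strong heterozygote deficit.
example_chi2, example_p_value = hwe_chi_square(n_ii=40, n_id=20, n_dd=40)
# Illustrative assumption: one test per locus across 39 loci.
example_threshold = bonferroni_threshold(alpha=0.05, n_tests=39)
```

A chi-squared value above 3.84 corresponds to p < 0.05 at 1 degree of freedom, which is the uncorrected cut-off; a locus counts as significant after correction only if its p-value also falls below the Bonferroni threshold.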
As expected, the genotype and allele frequencies were variable between individual loci as well as between populations. Multiple pairwise chi-square tests were conducted to assess differences between the study populations. However, because the multiple loci (39) and populations lead to multi-dimensional relationships between populations, the allele frequency data were used to compute Nei's unbiased (sample-size-corrected) genetic distance matrix to evaluate population differences and affinities (Table 3). The British White sample had a significant genetic distance from all South Asian samples, with the greatest distances from the Bangladeshi and Pakistani samples (0.077), while the Pakistani and Punjabi samples showed the smallest distance (0.002). The genetic distance tree diagram (not included here, but available) and an independent multidimensional scaling plot of the populations (Figure 1) confirmed the patterns observed in the distance matrix, with the British White population being distinct from the other study populations and the Pakistani and Punjabi populations placed close together in the middle.
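For readers unfamiliar with the distance measure, the sketch below assumes that "Nei's unbiased genetic distance" refers to the sample-size-corrected form of Nei (1978), D = -ln(J_XY / sqrt(J_X J_Y)), applied to bi-allelic loci. It also assumes a single sample size per population across all loci, which is a simplification. The function name and the frequencies are hypothetical and are not study data.

```python
import math


def nei_unbiased_distance(freqs_x, freqs_y, n_x, n_y):
    """Nei's (1978) unbiased distance between two populations.

    freqs_x, freqs_y: insertion allele frequency at each locus.
    n_x, n_y: number of individuals sampled in each population.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("need the same non-empty set of loci in both populations")
    j_x = j_y = j_xy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        hom_x = px * px + (1 - px) ** 2      # sum of squared allele frequencies
        hom_y = py * py + (1 - py) ** 2
        # Sample-size correction applied within each population.
        j_x += (2 * n_x * hom_x - 1) / (2 * n_x - 1)
        j_y += (2 * n_y * hom_y - 1) / (2 * n_y - 1)
        j_xy += px * py + (1 - px) * (1 - py)
    n_loci = len(freqs_x)
    identity = (j_xy / n_loci) / math.sqrt((j_x / n_loci) * (j_y / n_loci))
    return -math.log(identity)


# Hypothetical insertion frequencies at three loci (not study data).
pop_a = [0.55, 0.20, 0.80]
pop_b = [0.50, 0.25, 0.75]
pop_c = [0.10, 0.70, 0.30]
d_ab = nei_unbiased_distance(pop_a, pop_b, n_x=100, n_y=100)
d_ac = nei_unbiased_distance(pop_a, pop_c, n_x=100, n_y=100)
```

Because of the sample-size correction, the estimate can be slightly negative for two samples with nearly identical frequencies; such values are conventionally read as zero distance.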

Comment
This is the first study to analyse a large battery of Alu polymorphisms (39) to assess genetic composition and population relationships among a British White and four migrant South Asian populations in the East Midlands region of the UK. Similarities in allele frequencies among the migrant South Asian samples indicated a strong common genetic heritage and ancestry, together with differing levels of endogamy. This investigation emphasised the importance of understanding population similarities and differences and the effect of different religious traditions on genetic composition (Laybourn et al. 2016). The results from this study supported those of studies of other Alu and other genetic polymorphisms (STRs, SNPs), demonstrating significant genetic variation between South Asian subpopulations and British samples. The process of South Asian groups migrating to Europe promotes genetic drift, which reduces the gene pool size and risks the loss of certain Alus (Wang et al. 1998). The analysed Alus had a high level of discrimination and could therefore be effective loci for analysing genetic variation among individuals and populations. Analysis of roughly 50 Alus has been shown to provide the most accurate genetic estimates (Watkins et al. 2003). Many of the loci included in this dataset are widely under-studied. Therefore, more research is needed to address the usefulness of Alus in population genomics.
In conclusion, Alu insertions are stable and easy-to-genotype genetic markers which could be used as an effective tool in population genomics and in the exploration of population relationships, migration patterns, and ancestry. The results of this study emphasised the genetic divergence between European and

Table 1. Distribution of genotypes, allele frequencies and population genetic parameters.
a Shows initial chi-square value higher than 3.84 for 1 degree of freedom, indicating significant departure from Hardy-Weinberg Equilibrium (HWE).