Autosomal Short Tandem Repeat (STR) Variation Based on 15 Loci in a Population from the Central Region (Riyadh Province) of Saudi Arabia

Awad E Osman1*, Habiba Alsafar2, Guan K Tay3, Jasem BJM Theyab4, Mohamed Mubasher1, Nezar Eltayeb-El Sheikh1, Hanan AlHarthi1, Michael H. Crawford5 and Gehad El Ghazali6
1PCLM, King Fahad Medical City, Riyadh 11525, Saudi Arabia
2Khalifa University of Science Technology and Research, Abu Dhabi, United Arab Emirates
3Centre for Forensic Science, The University of Western Australia, Crawley, Western Australia
4Department of Sociology and Social Work and Anthropology, Kuwait University, Kuwait
5Laboratory of Biological Anthropology, University of Kansas, USA
6Sheikh Khalifa Medical City, PaLMS, SEHA, Abu Dhabi, United Arab Emirates


Background
Short Tandem Repeats (STRs) are nucleotide sequences with repeat motifs of variable length (2 to 8 bp) that are polymorphic (i.e. the number of repeats varies between individuals) [1,2]. STRs represent about 3% of human DNA and occur approximately once in every 10,000 nucleotides. STRs are an invaluable tool in genetic fingerprinting during forensic investigations that involve significant biological evidence. Their high degree of polymorphism makes them informative [3], especially when multiple loci are considered simultaneously. Newer STR kits contain over 20 STR loci that are amplified in a single multiplex reaction. Additionally, STR markers have been widely used in medical applications, such as assessing engraftment after allogeneic bone marrow transplantation by calculating the ratio of donor to patient DNA, and in the study of population genetics. The use of STRs depends on allele frequency distributions, which vary between populations [1][2][3][4].
Various DNA-based techniques have been used to identify the genetic differences in human populations. STR loci are useful and preferred because of their small size, relatively low incidence of mutation and widespread distribution [5]. Genetic studies, in particular those based on STR applications, have been very important for developing an appreciation of the extent of genetic polymorphism that exists between different populations. Throughout history, society has been stratified on the basis of caste, class, clan, race, region, religion, ethnicity, gender, age and socioeconomic status. It is ethnicity and racial discrimination that distinguish one nation from another. Ethnicity is, as defined by Macionis, "a shared cultural heritage and people define themselves or others as members of an ethnic category based on common ancestry, language or religion that gives them a distinctive social identity" [6]. The same is the case within the Arab world, which has maintained a unique ethnographic identity, historical background, ancestry, cultural traits, social norms, moral values, religious beliefs and genealogy. At present, the total Arab population is estimated at about 325 million, increasing at a rate of approximately 2.3% annually. The Arab population of the Middle East and North Africa is distributed throughout 17 different countries, namely: Algeria, Bahrain, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Morocco, Oman, Palestine, Qatar, Saudi Arabia, Syria, Tunisia, the United Arab Emirates, and Yemen.
In Saudi Arabia, approximately 87% of the population is of Arab descent, a group of people who have inhabited this region for thousands of years. The remaining 13% of the inhabitants migrated into the country during the last 400 years [7]. The rate of consanguineous marriage is relatively high and has been estimated at around 57.7% [8]. The Saudi population can be characterized as a specific ethnic subset that presumably reflects a unique set of allele frequencies and genotypes, which could differentiate it from other populations in nearby geographical areas and worldwide; i.e. the allele frequencies will differ from those of other populations, and the genetic distances between these populations will vary depending on the time of divergence from the common human ancestor.
Therefore, this study was conducted to investigate the genetic variation of populations from Saudi Arabia using 15 autosomal STR markers currently used in forensic and paternity testing. DNA was collected from a sample of 190 unrelated volunteers residing in the central region (Riyadh Province) of Saudi Arabia. The allele frequencies of the 15 STR markers were calculated using gene counting methods and compared with frequencies from other populations previously studied.

Methods

Population
A total of 190 healthy unrelated individuals from the central region (Riyadh Province) of Saudi Arabia were randomly recruited at King Fahad Medical City (KFMC) (Riyadh, Saudi Arabia) from among the donors for patients being prepared for bone marrow transplantation. Using the KFMC hospital record system, the selection of study subjects excluded any two individuals who, or whose fathers or mothers, descended from the same parent. Thus, first and second cousins were not included in the study. The age of the study population ranged between 10 and 45 years (median of 25) at the time of blood collection. There were 131 (68.9%) males, and all participants were of Saudi origin. The sample size was selected to provide sufficient analytical power, in terms of degrees of freedom, to determine allele frequencies that would allow testing of the Hardy-Weinberg Equilibrium (HWE) assumption based on Fisher's Exact Test. The study received ethical approval from the Institutional Review Board at KFMC.

DNA extraction, PCR and Fragment analysis for STR markers
Genomic DNA was extracted from whole blood samples in EDTA anticoagulant using a MagNA Pure Compact instrument (Roche Diagnostics GmbH, Mannheim, Germany). PCR amplification was performed according to the manufacturer's instructions on a GeneAmp® PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) using the AmpFlSTR® Identifiler® PCR amplification kit (Applied Biosystems, Foster City, CA, USA), which includes the D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA loci. The PCR product was separated on the 3130xL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) and analyzed using GeneMapper® Software v3.2 (Applied Biosystems, Foster City, CA, USA). For quality assurance purposes, multiple DNA samples with known combined STR genotypes were run in parallel with the study samples, and appropriate results were obtained for these control samples on every occasion.

Statistical analysis
The allele frequencies were calculated from the number of alleles detected at each locus. Departures from Hardy-Weinberg Equilibrium (HWE) (observed heterozygosity (Ho), expected heterozygosity (He) and p-value, using Fisher's Exact Test) and Gene Diversity indices (GD) were assessed using Arlequin software version 3.5.1.3 [9]. Forensic and population genetic parameters, including power of discrimination (PD), Polymorphic Information Content (PIC), matching probability (MP) and probability of exclusion (PE), were derived using Powerstats software version 1.2 [10]. The overall significance level for testing the HWE hypothesis was set at 0.05 and adjusted for multiple comparisons using the Bonferroni criterion. Genetic relationships between groups were assessed through a multidimensional scaling (MDS) plot of the genetic distance matrix, produced with NTSYS 2.1 software, to determine the genetic differences and similarities between this study sample and published data from other populations [11].
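The gene counting and forensic parameters described above can be illustrated with a minimal Python sketch. It is not the Arlequin or Powerstats implementation: the formulas (MP as the sum of squared genotype frequencies, PD = 1 - MP, and PE = h^2(1 - 2hH^2) with h and H the observed heterozygote and homozygote proportions) are the commonly cited textbook definitions and are assumed here, and the genotypes are hypothetical rather than study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Gene counting: every individual contributes two allele copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return {allele: count / total for allele, count in counts.items()}


def forensic_parameters(genotypes):
    """Return MP, PD, PE and observed heterozygosity for one locus.

    Assumed definitions (not taken from the paper):
      MP = sum of squared genotype frequencies
      PD = 1 - MP
      PE = h^2 * (1 - 2 * h * H^2), h = heterozygotes, H = homozygotes
    """
    n = len(genotypes)
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    mp = sum((count / n) ** 2 for count in genotype_counts.values())
    h = sum(1 for a, b in genotypes if a != b) / n
    big_h = 1 - h
    pe = h ** 2 * (1 - 2 * h * big_h ** 2)
    return {"MP": mp, "PD": 1 - mp, "PE": pe, "Ho": h}


# Hypothetical genotypes at a single locus (repeat numbers), not study data.
example_genotypes = [
    (8, 8), (8, 11), (8, 9), (9, 11),
    (8, 8), (11, 11), (8, 11), (8, 9),
]

example_freqs = allele_frequencies(example_genotypes)
example_params = forensic_parameters(example_genotypes)
```

In this toy sample, allele 8 accounts for 8 of the 16 allele copies, so its frequency by gene counting is 0.5.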

Results and Discussion
The 15 autosomal STR loci were amplified successfully in all samples tested. A total of 150 alleles were identified across the 15 STR loci, and their corresponding frequencies were calculated (Table 1). The highest allele frequencies observed were for allele 8 of TPOX (55.8%) and allele 12 of D13S317 (41.1%). The most polymorphic loci observed in this study were D18S51, FGA and D21S11, defined by 17, 15 and 15 alleles respectively. The Polymorphic Information Content (PIC) was 0.57 or greater at all 15 loci, suggesting that the markers are highly polymorphic and would be informative for differentiating individuals of Saudi descent. The degree of polymorphism at each locus can also be expressed in terms of heterozygosity along with the PIC value [12]. The highest values were observed for the D19S433 locus (He=0.86977; PIC=0.85), while the lowest values were found for TPOX (He=0.6212; PIC=0.57).
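As an illustration of how He and PIC follow from allele frequencies, the sketch below uses the standard definitions He = 1 - sum(p_i^2) and PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The optional n/(n-1) small-sample correction, with n the number of gene copies, is an assumption about how an unbiased estimate might be obtained and is not confirmed by the text. The frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs, n_individuals=None):
    """He = 1 - sum(p_i^2); optionally apply the n/(n-1) correction,
    where n = 2 * n_individuals is the number of gene copies sampled."""
    he = 1 - sum(p * p for p in freqs)
    if n_individuals is not None:
        n = 2 * n_individuals
        he *= n / (n - 1)
    return he


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Hypothetical allele frequencies at one locus (must sum to 1).
toy_freqs = [0.5, 0.3, 0.2]
toy_he = expected_heterozygosity(toy_freqs)
toy_pic = polymorphic_information_content(toy_freqs)
```

PIC is never larger than He for the same frequencies, because it additionally subtracts the cross term, which is consistent with the He and PIC pairs reported above.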
After applying the Bonferroni correction to Fisher's exact test, a deviation from Hardy-Weinberg equilibrium (HWE) was detected for the TH01, D5S818 and FGA loci. A deviation from HWE was also detected in a previous Saudi study of HLA allele frequencies [13]. That deviation was attributed to the high rates of consanguineous marriage among Saudis or to the Wahlund effect, either of which reduces heterozygosity. A previous Saudi study found that 2 out of 8 STR loci did not conform to HWE, which is consistent with our data [14]. As shown in Table 2, the HWE p values for the STR loci available from previous reports on Arab-related populations were compared with the data from this study. It is worth noting that some alleles previously considered variant alleles, such as 34.2 (D21S11), 16.2 (D18S51), 12.2 (D16S539), 29 and 22.2 (FGA), and 7.3 and 8.3 (TH01), were also identified at frequencies ranging from 0.3 to 1.6% in the Saudi population studied in this project.
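The Bonferroni adjustment amounts to dividing the overall significance level by the number of tests, which for 15 loci gives 0.05 / 15, or about 0.0033. A minimal sketch follows. The p-values and locus labels are hypothetical placeholders, not the values in Table 2, and the function name is illustrative.

```python
def bonferroni_flags(p_values, alpha=0.05, n_tests=None):
    """Flag tests whose p-value is at or below alpha / n_tests.

    p_values: dict mapping a label to its unadjusted p-value.
    n_tests defaults to the number of p-values supplied.
    """
    if n_tests is None:
        n_tests = len(p_values)
    threshold = alpha / n_tests
    flags = {label: p <= threshold for label, p in p_values.items()}
    return threshold, flags


# Hypothetical p-values for three of 15 loci (labels are placeholders).
toy_p_values = {"locus_A": 0.0004, "locus_B": 0.0210, "locus_C": 0.4800}
toy_threshold, toy_flags = bonferroni_flags(toy_p_values, n_tests=15)
```

Here `locus_B` would be significant at an unadjusted 0.05 level but not after correction, which shows how the adjustment guards against false positives when 15 loci are tested at once.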
This study was based on a random sample of unrelated healthy volunteers of Arab descent who reside in the Riyadh district of Saudi Arabia. Allelic frequencies of 15 STR markers and their population genetic/forensic parameters were analyzed, and the results showed that the polymorphic nature of the loci examined was sufficient to allow the [...] were also identified by a previous study that used 8 STR loci to study unrelated subjects in Saudi Arabia [14].
Other studies show that the most frequent allele at forensic STR loci varies by population, for example 12 (0.4093) at D5S818 in the Slovenian population [15], 15 (0.4143) at D3S1358 in Bolivians [16], 8 (0.4890) at TPOX among Wallachians in South Romania [17] and 8 (0.424) at TPOX in the Adaima community from Egypt [18]. In this study, the most frequent alleles were 8 (0.558) at TPOX, 12 (0.411) at D13S317, 12 (0.385) at CSF1PO, 11 (0.382) at D16S539 and 10 (0.358) at D7S820. The most polymorphic marker in this study was D18S51, spanning 17 tandem repeat alleles. Allele 13 of this locus was the most common, with a frequency of 0.268. Interestingly, a study of Tunisians found markers D19S433 and D21S11 to be the most polymorphic, each spanning 18 alleles [4]. In contrast, a lesser degree of polymorphism was reported for Iraqi individuals than for Saudis and Tunisians [19], attributed to less admixture.
For the sample used in this study, the observed values of heterozygosity ranged from 0.621 at TPOX to 0.869 at D19S433. This low degree of homozygosity suggests random mating in the study population and less consanguinity. Despite the high rates of consanguineous marriage in the Saudi population [8], this study was able to clearly identify seven markers that can be used for characterizing the genetic makeup of individuals of Arab descent in Saudi Arabia.
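One common way to quantify the relationship between heterozygosity and consanguinity discussed here is the fixation index F = 1 - Ho/He, where a positive value indicates a heterozygote deficit of the kind expected under inbreeding or population substructure. This index is not reported in the study; the sketch below is offered only as an illustration, with hypothetical input values.

```python
def fixation_index(h_observed, h_expected):
    """F = 1 - Ho / He.

    F > 0: fewer heterozygotes than expected (possible inbreeding or
           Wahlund effect); F close to 0: consistent with random mating;
    F < 0: heterozygote excess.
    """
    if h_expected <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1 - h_observed / h_expected


# Hypothetical values, not taken from the study tables.
toy_f_deficit = fixation_index(0.70, 0.80)   # heterozygote deficit
toy_f_neutral = fixation_index(0.80, 0.80)   # no deficit
```

A locus where observed heterozygosity equals the expected value gives F = 0, whereas the first toy locus gives F = 0.125.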
Furthermore, it can be argued that the property of independent inheritance applies to the two pairs of markers located on the same chromosome, namely CSF1PO and D5S818 on chromosome 5, and TPOX and D2S1338 on chromosome 2. This was reflected in the disparate statistical values for the HWE assumption, which in each case rendered one marker statistically significant while the other was not.
A Multidimensional Scaling (MDS) plot was constructed to illustrate the genetic distances between 14 populations (Figure 1). The analysis of these populations showed clustering into four groups: (1) Asian, (2) Caucasian, (3) African and (4) Middle Eastern subpopulations. The Saudi group in this study clustered with populations from Yemen, Iraq, Qatar, Oman and Bahrain. In contrast, African populations such as the Hutu and Kenyans, as well as the Pakistani, Bangladeshi and Punjabi populations and those of Caucasian origin such as the Belgians, Croatians and Georgians, were located furthest from the sample in this study.
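An MDS plot starts from a matrix of pairwise genetic distances computed from allele frequencies. The text does not state which distance measure was used, so the sketch below assumes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)), purely for illustration. The population names and frequencies are hypothetical.

```python
import math
from itertools import combinations


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus -> dict mapping allele -> frequency.
    Only loci present in both populations are used.
    """
    shared_loci = pop_x.keys() & pop_y.keys()
    if not shared_loci:
        raise ValueError("populations share no loci")
    jx = jy = jxy = 0.0
    for locus in shared_loci:
        fx, fy = pop_x[locus], pop_y[locus]
        alleles = fx.keys() | fy.keys()
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


def distance_matrix(populations):
    """Return {(name_a, name_b): distance} for every unordered pair."""
    return {
        (a, b): nei_standard_distance(populations[a], populations[b])
        for a, b in combinations(sorted(populations), 2)
    }


# Hypothetical single-locus frequencies for three made-up populations.
toy_populations = {
    "pop_1": {"TPOX": {8: 0.5, 11: 0.5}},
    "pop_2": {"TPOX": {8: 1.0}},
    "pop_3": {"TPOX": {8: 0.5, 11: 0.5}},
}
toy_distances = distance_matrix(toy_populations)
```

The resulting pairwise distances could then be passed to any MDS routine to obtain two-dimensional coordinates for plotting; populations with identical frequencies, such as `pop_1` and `pop_3` here, have a distance of zero and would coincide on the plot.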

Conclusions
The TPOX, D13S317, CSF1PO, D16S539 and D7S820 markers were found suitable for forensic analysis and paternity testing, and can also be used for chimerism studies after allogeneic bone marrow transplantation in the Saudi population. In addition, the variable genetic distances between this population and other Arab-related groups might be explained by admixture with populations of other ethnic origins. Further analyses of Arab populations are needed to understand the interrelationships between the different ethnic groups of the region.