Allele Frequencies and Forensic Data of 25 STR Markers for Individuals in Northeast Brazil

DNA markers such as short tandem repeats (STRs) can be used to investigate genetic diversity based on levels of heterozygosity within and between populations. The present study aimed to determine the allele frequency distribution, together with forensic and genetic data, of 25 STR loci in the population of Bahia, northeastern Brazil. Data were obtained from a sample of 384 unrelated individuals living in Bahia. DNA from buccal swabs or fingertip punctures was used to amplify and detect the 25 markers. The most polymorphic loci were SE33 (43 alleles), D21S11, and FGA (21 alleles each). The least polymorphic were TH01 (6 alleles), TPOX, and D3S1358 (7). The analysis revealed high genetic diversity, with an average value of 0.813 for the analyzed population. The present study covers more loci than previous Brazilian STR marker studies and will contribute to future research on population genetics in Brazil and worldwide. The results allowed the establishment of haplotypes found in the forensic samples of Bahia State to serve as a reference in the elucidation of criminal cases and paternity tests, as well as in population and evolutionary investigations.


Introduction
Advances in molecular biology have provided powerful tools for reconstructing genetic history. DNA contains valuable information that includes sequences from the evolutionary past that can be extracted from any type of biological material [1,2]. Human DNA sequences are now being used to study how distinct cultural groups are genetically related [3].
The identification of DNA markers such as short tandem repeats (STRs) or microsatellites, which constitute highly variable genetic loci, has become the test of choice for genetic linkage analysis [4]. The approach is based on the identification of repetitive human DNA regions, which are characterized by size variation. A microsatellite region or locus consists of a motif of one to six base pairs repeated several times in tandem, and such loci are found throughout the genome. These are co-dominant markers and therefore can be used to investigate genetic diversity based on levels of heterozygosity within and between populations [5].
Moreover, the identification of polymorphisms is of great importance for the reconstruction of historical human migrations [5]. Since its discovery, Brazil has been a major focus of immigration. The Brazilian population is derived from the admixture of Portuguese, Africans, other Europeans, Japanese, Brazilian natives, and many other nationalities. In geographically separated populations, the allelic distribution of DNA markers is often different. In Bahia, the highly admixed population is derived from Brazilian natives, Atlantic West Africans, and Europeans, with a predominance of African ancestry owing to the arrival of enslaved Africans [6]. This diversity reveals an intense gene flow reflecting a peculiar pattern of geographic diversity, a very interesting scenario for genetic studies [7].
Bioinformatics tools provide statistical data for population genetics through forensic parameters such as genetic diversity (GD), polymorphic information content (PIC), heterozygosity (H), and the p-value of the test for Hardy-Weinberg equilibrium [3,5,7]. These parameters play a fundamental role in population genetics by identifying similarities and differences between and within populations [7].
Forensic analyses and studies of the distribution of allele frequencies, GD, and PIC in heterogeneous populations are necessary to create databases for reference populations and to obtain information on the genetics of the population under study. To assess the genetic diversity and gene flow in this population, the present study determined the distribution of allele frequencies, in addition to forensic and genetic data, for 25 STR loci in the Bahia population.

Samples and DNA Extraction
A total of 384 samples from unrelated individuals living in Bahia, northeastern Brazil, were obtained as secondary data from paternity cases performed between 2016 and 2017 at the Laboratório de Investigação de Vínculo Genético do Centro de Diagnóstico do Grupo de Apoio à Criança com Câncer (CDG). Patients had previously given consent to the paternity tests. DNA was extracted from buccal swabs or directly from fingertip puncture using FTA® cards (Whatman™ Bioscience, Cambridge, UK) according to the manufacturer's protocol [8]. Samples were quantified by real-time PCR to assess the quantity and quality of selected DNA. The isolated DNA was stored at 4 °C until amplification. The study was approved by the Research Ethics Committee of the Universidade Salvador (UNIFACS): CAAE: 68946617.9.0000.5033.

Statistical Analysis
Data from the 384 individuals from the state of Bahia were analyzed to obtain statistical data and forensic parameters. After the allele profiles were obtained, the allele frequencies of the 25 markers were determined by the relative frequency method, that is, from the number of times each allele (repeat number) was observed in the samples. The data were analyzed using GenAlEx version 6.5 software [9] to calculate both forensic and statistical parameters, including: the number of alleles (Nall), observed heterozygosity (Ho), expected heterozygosity (He), total heterozygosity or genetic diversity (GD), polymorphic information content (PIC), match probability or identity probability (PI), exclusion probability (PE), and the p-value of the test for Hardy-Weinberg equilibrium.
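The relative frequency method and the two heterozygosity measures can be illustrated with a minimal sketch. It assumes diploid genotypes stored as pairs of repeat numbers; the function name `locus_summary` and the toy genotypes are hypothetical and are not taken from the study data, and the formulas are the standard textbook definitions rather than a reproduction of the GenAlEx implementation.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarise one STR locus from a list of (allele_a, allele_b) genotypes."""
    n = len(genotypes)
    # Gene counting: every individual contributes two allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: c / (2 * n) for allele, c in counts.items()}
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions: 1 - sum(p_i^2).
    he = 1.0 - sum(p * p for p in freqs.values())
    return {"n_alleles": len(freqs), "freqs": freqs, "Ho": ho, "He": he}


# Hypothetical genotypes for illustration only (not study data).
toy_genotypes = [(6, 7), (7, 9.3), (6, 6), (9.3, 9.3), (7, 8)]
summary = locus_summary(toy_genotypes)
print(summary["n_alleles"], summary["Ho"], round(summary["He"], 2))
```

In this toy sample, three of the five individuals are heterozygous, so Ho is 0.6, while the allele frequencies (0.3, 0.3, 0.3, and 0.1) give an He of 0.72.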

Results
The allele frequencies for the 25 STR loci studied in Bahia's population are presented in Table 1. A total of 347 alleles were detected, with 51 of those qualifying as rare alleles (allele frequency < 0.005). The highest number of rare alleles was found in SE33 (six rare alleles), which also had the highest total number of alleles. For each analyzed locus, Table S1 presents the range of alleles, the number of alleles obtained in each locus, the allele frequency, and the gender distribution identified by the locus AMEL. The most polymorphic loci were SE33 (43 alleles), D21S11 (21 alleles), and FGA (21 alleles). TH01 (six alleles) was the least polymorphic locus. The allele frequencies ranged from 0.0013 to 0.39583 (Table S1). Table 2 presents comparisons between the most frequent alleles in Bahia (northeastern Brazil) and in two studies carried out in different periods in Brazil and Portugal. Of the 16 loci in common between Bahia and Portugal (D3S1358, VWA31, D16S539, CSF1PO, TPOX, D8S1179, D21S11, D18S51, TH01, FGA, D5S818, D13S317, D7S820, SE33, PENTA D, and PENTA E), only six (CSF1PO, TPOX, D21S11, D5S818, D7S820, and PENTA E) presented the same result for the analyzed parameter, demonstrating both the distances between the Portuguese and Brazilian populations, and the formative relationships within the total population. The forensic parameters obtained for the 25 loci are presented in Table 3.
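The reported frequency range and the rare-allele threshold can be related to allele counts with a short sketch. It assumes that all 384 individuals were typed at the locus in question, giving 768 allele copies; the names `frequency` and `is_rare` are hypothetical, and the count of 304 copies is inferred here from the reported maximum frequency rather than stated in the text.

```python
N_INDIVIDUALS = 384
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # 768 allele copies per autosomal locus
RARE_THRESHOLD = 0.005             # rare-allele cut-off used in the Results


def frequency(count, total=N_CHROMOSOMES):
    """Relative frequency of an allele observed `count` times."""
    return count / total


def is_rare(count, total=N_CHROMOSOMES):
    """True if the allele falls below the rare-allele threshold."""
    return frequency(count, total) < RARE_THRESHOLD


# Lowest possible non-zero frequency: an allele seen exactly once.
min_freq = frequency(1)

# Largest count that still qualifies as rare under the 0.005 threshold.
max_rare_count = max(c for c in range(1, N_CHROMOSOMES + 1) if is_rare(c))

print(round(min_freq, 4), max_rare_count, round(frequency(304), 5))
```

Under this assumption, an allele observed once has a frequency of about 0.0013, which matches the reported minimum; alleles observed at most three times count as rare; and 304 copies out of 768 reproduce the reported maximum of 0.39583.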

Discussion
This is the first study in Brazil to report on 25 markers analyzed to evaluate the genetic variability of a specific human population. The allelic frequencies were compared with those of other regions of Brazil. No relevant differences were observed in comparison with the frequencies reported by studies performed in Rio Grande do Norte, Paraíba, Pernambuco, Santa Catarina, Mato Grosso do Sul, Rio Grande do Sul, Rio de Janeiro, Amazonas, and Paraná, or by one further study of the Brazilian population as a whole [10][11][12][13][14][15][16][17][18][19][20]. No comparisons were made for CD4, D8S639, and PENTA D, owing to the lack of studies analyzing these markers.
In addition to allelic analysis, the present study provides statistical and forensic data for the 25 analyzed STR loci in Bahia's population. Other Brazilian studies of this kind analyzed an average of 8 to 13 loci [10][11][12][13][14][15][16][17][18][19][20]. The present study thus provides a broader quantitative basis for the genetic data of the Brazilian population.
Genetic diversity (GD) values ranged from 0.7316 (TPOX) to 0.9339 (SE33) (Table 3). These high values reveal the high genetic variability of Bahia's population, since the heterozygosity values approached their maximum values [22].
The largest discrepancy between Ho and He was 0.046 (TPOX). He values ranged from 0.731 (TPOX) to 0.934 (SE33), while Ho values ranged from 0.685 (TPOX) to 0.971 (SE33). Ho was higher than He in 11 loci (D8S639, CSF1PO, D21S11, DS2441, D19S433, TH01, FGA, D5S818, SE33, D1S1656, and D2S1338), which is indicative of outbreeding [22], since the samples were obtained at random. The range obtained for the Ho and He values reveals that Bahia's population presents high genetic variability.
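The discrepancy between Ho and He can be reproduced from the two loci for which the text gives both values. The sketch below also computes the fixation index F = 1 - Ho/He, a standard summary of the same comparison that is not reported in the text and is included only as an illustration; the function names are hypothetical.

```python
def heterozygosity_gap(ho, he):
    """Expected minus observed heterozygosity (positive = heterozygote deficit)."""
    return he - ho


def fixation_index(ho, he):
    """F = 1 - Ho/He; negative values indicate an excess of heterozygotes."""
    return 1.0 - ho / he


# (Ho, He) pairs quoted in the text for the two extreme loci.
loci = {"TPOX": (0.685, 0.731), "SE33": (0.971, 0.934)}

for name, (ho, he) in loci.items():
    gap = heterozygosity_gap(ho, he)
    f = fixation_index(ho, he)
    print(f"{name}: He - Ho = {gap:.3f}, F = {f:.3f}")
```

For TPOX the gap is 0.046, as reported, with a positive F of roughly 0.063, whereas SE33 shows a negative gap and a negative F, which corresponds to the heterozygote excess interpreted above as outbreeding.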
Previous small-scale forensic studies in other northeastern states, such as Rio Grande do Norte, Paraíba, and Pernambuco, have shown similar values for the discrepancy between observed and expected heterozygosity [10][11][12]. Because few studies exist, particularly on larger scales, it is not possible to establish average values for the forensic parameters analyzed across the Brazilian territory.
However, when comparing studies in other states such as Santa Catarina, Rio Grande do Sul, Rio de Janeiro, Mato Grosso do Sul, and Amazonas, it is possible to identify a preliminary average discrepancy between Ho and He which is lower than 0.05 for the Brazilian population [13][14][15][16][17], revealing a high genetic variation in the studied populations.
The same type of study done on a small scale with other populations and regions worldwide shows that populations with a history of miscegenation, such as those of Brazil, Africa, and northern Europe, present substantial genetic diversity and low discrepancies between He and Ho. Populations with low miscegenation present low heterozygosity [23,24]. A study done in Guangxi Zhuang, China, revealed a high discrepancy between these parameters (0.4975). This was attributed to the lack of geographic spreading and breeding among the native populations [23]. As highlighted in the present study, Bahia's population is genetically varied, with a history of high miscegenation.
The values for PIC obtained in this study were higher than 0.6 for every locus analyzed, ranging from 0.6905 (TPOX) to 0.93 (SE33), which means that each of the 25 loci analyzed was highly polymorphic and would contribute to the genetic variation of the analyzed population. Previous studies reporting PIC in other populations have shown similar results for some of the analyzed loci: TPOX, CSF1PO, PENTA E, FGA, and TH01 [11][12][13]. Because the present study analyzed more loci than the published studies, further comparisons were not possible. This reveals the importance and necessity of completing more studies on these loci in the analysis of miscegenated populations, especially in Brazil.
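For reference, PIC can be computed from allele frequencies as in the following sketch. It uses the widely cited definition of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the text does not state which formula was applied, so this is an assumption, and the frequencies used are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for a list of allele frequencies."""
    # Probability that two random allele copies are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical loci with equally frequent alleles (not study data).
two_alleles = [0.5, 0.5]
ten_alleles = [0.1] * 10

print(pic(two_alleles), round(pic(ten_alleles), 3))
```

With two equally frequent alleles, PIC cannot exceed 0.375, whereas ten equally frequent alleles give about 0.891, which illustrates why multi-allelic STR loci can reach the values above 0.6 reported here.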
Values for probability of identity (PI) within the studied loci ranged from 0.008 (SE33) to 0.113 (TH01). TH01, DS2441, TPOX, and CSF1PO presented the highest PI values in our analysis, ranging from 0.1046 (CSF1PO) to 0.113 (TH01). Previously available studies on TH01, TPOX, and CSF1PO in northeastern Brazil have shown similar PI values for these same loci [10][11][12]. PI values lie between 0 and 1, with lower values indicating greater discriminating power [9]. Given the low values obtained, every locus analyzed in this study discriminated well between genotypes in the studied population, confirming the high genetic variability of Bahia's population.
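A minimal sketch of how PI follows from allele frequencies is given below. It assumes the common single-locus definition PI = 2(sum p_i^2)^2 - sum p_i^4, which is the probability that two unrelated individuals share the same genotype under Hardy-Weinberg proportions, and it combines loci by multiplication, which assumes that the loci are independent. The function names are hypothetical, and the combined value uses only the two extreme PI values quoted above, not the full set of loci.

```python
from math import prod


def probability_of_identity(freqs):
    """Single-locus PI from allele frequencies under Hardy-Weinberg proportions."""
    sum_sq = sum(p ** 2 for p in freqs)
    sum_fourth = sum(p ** 4 for p in freqs)
    return 2 * sum_sq ** 2 - sum_fourth


def combined_pi(per_locus_pi):
    """Multi-locus PI, assuming the loci are statistically independent."""
    return prod(per_locus_pi)


# Hypothetical locus with two equally frequent alleles (not study data).
print(probability_of_identity([0.5, 0.5]))

# Product of the lowest and highest single-locus PI values quoted in the text.
print(combined_pi([0.008, 0.113]))
```

Even the two quoted loci alone give a combined PI of about 0.0009, which illustrates how quickly the probability of a chance match falls as more loci are added.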
For the analyzed group, the PE ranged from 0.771 (TH01) to 0.989 (SE33). The PE values were higher than 0.7 for every locus, confirming the great genetic variability in the genetic profile of Bahia's population. Previous studies undertaken in northeastern Brazil that reported PE share only three loci with the present study (TH01, TPOX, and CSF1PO) [10][11][12]. The PE averages for these loci in those studies were 0.5713 (TH01), 0.4223 (TPOX), and 0.4876 (CSF1PO). The higher PE values obtained here for these loci suggest that Bahia's population maintains a higher genetic variation than other populations in the region.

Conclusions
When compared to other populations, an analysis of the data confirms that Bahia's population is genetically diverse.
When comparing the allele frequency of the 25 STR markers in Bahia's population with other studies done in different populations in northeastern Brazil, no significant differences were found, revealing great similarities between populations. When forensically analyzed, Bahia's population reveals great genetic variety, a sign of significant miscegenation in its genetic formation. The allelic frequencies of the studied population revealed contributions from Native Americans, Africans, and the Portuguese. The results also show that studies in specific populations are more likely to produce reliable results. To better understand the genetic diversity of this particular Brazilian population, more studies on the populations that contributed to its formation are needed.
Finally, the present study provided forensic and statistical data that may help guide future research on STR markers and forensic studies in miscegenated populations, such as that of Brazil.

Data Availability Statement:
The data used to support this study are included in this paper and as Supplementary Materials.