Estimating Asian Contribution to the Brazilian Population: A New Application of a Validated Set of 61 Ancestry Informative Markers

Estimates of different ancestral proportions in admixed populations are very important in population genetics studies, especially for the detection of population substructure effects in case-control association studies. Brazil is one of the most heterogeneous countries in the world, both from a socio-cultural and a genetic point of view. In this work, we investigated a previously developed set of 61 ancestry informative markers (AIM), aiming to estimate the proportions of four different ancestral groups (African, European, Native American and Asian) in Brazilian populations. To the best of our knowledge, this is the first study to use a set of AIM to investigate the genetic contribution of all four main parental populations to the Brazilian population, including the Asian contribution. All selected markers were genotyped through multiplex PCR and capillary electrophoresis. The set was able to successfully differentiate the four ancestral populations (represented by 939 individuals) and identify their genetic contributions to the Brazilian population. In addition, it was used to estimate individual interethnic admixture of 1050 individuals from the Southeast region of Brazil, and it showed that these individuals present a higher European ancestry contribution, followed by African, Asian and Native American ancestry contributions. Therefore, the 61-AIM set has proved to be a valuable tool to estimate individual and global ancestry proportions in populations mainly formed by these four groups. Our findings highlight the importance of using sets of AIM to evaluate population substructure in studies carried out in admixed populations, in order to avoid misinterpretation of results.

the city of Santos (located in São Paulo state, in Southeastern Brazil), brought by the Kasato-Maru ship (Takenaka 2003). Currently, Brazil is home to approximately 1.5 million Japanese, the largest Japanese community outside Japan (Suarez-Kurtz 2010). This shows the importance of this parental group to the formation of the Brazilian population. However, most studies still consider only Europeans, Native Americans and Africans as the main contributors to the admixed Brazilian population, and this could lead to misinterpretation of results. Therefore, when carrying out association studies in the Brazilian population, it is important to consider that: i) the current Brazilian population was mainly formed by the admixture of four continental populations (Native Americans, Europeans, Africans and Asians); and ii) association studies carried out in admixed populations should be carefully analyzed to avoid misinterpretation due to population substructure, which can be done through ancestry informative markers (AIM). These markers, also known as population-specific markers, are genetic markers with great variation in allelic frequency between different continental populations (i.e., a certain allele can range from exclusive absence to exclusive presence in different populations) (Parra et al. 2003).
Estimates of genetic ancestry proportions in admixed populations are not only fundamental to control the effect of population substructure in association studies, but they can also be useful in other types of investigations, being considered more accurate than physical traits (Ramos et al. 2016). AIM sets have been developed and used to answer questions related to epidemiology, forensic anthropology, pharmacogenetics and population genetics (Parra et al. 1998).
The aim of this study was to evaluate how efficient a set of 61 INDEL-type AIM would be in estimating the Asian contribution (represented by Japanese) in a sample of individuals from the São Paulo population.

Investigated Samples
This study included 939 samples to represent the parental groups that contributed to the formation of the Brazilian population and 1050 individuals from the admixed population of São Paulo state, Brazil. The samples considered as parental groups included: 200 Sub-Saharan African individuals, from Angola, Mozambique, Zaire, Cameroon and the Ivory Coast; 290 European individuals, mainly from Portugal and Spain; 246 Native American individuals from tribes in the Brazilian Amazon (Tiriyó, Waiãpi, Zoé, Urubu-Kaapor, Awa-Guajá, Parakanã, Wai Wai, Gavião, Zoró); and 203 Japanese individuals residing in the North region of Brazil who were either: i) immigrants born in Japan; or ii) Brazilian individuals with Japanese parents or grandparents. With the exception of the Japanese group, all samples were collected in the country of origin and have been previously described (Santos et al. 2010; Ramos et al. 2016).
Furthermore, to validate the use of the AIM set for estimating Asian ancestry, we employed it in the analyses of 1050 individuals from São Paulo state, in Southeastern Brazil. This population was formed by the admixture of European (higher contribution), African and Native American populations, as well as, more recently, by a significant number of Japanese individuals. Therefore, the São Paulo population is suitable to be analyzed in this study.
All participants have authorized the collection of their biological samples by signing a consent form and the ethical aspects of this study have been approved by the Ethics Committee (Santos et al. 2010).

Multiplex PCR and Fragment Analysis
Genotyping of the 61-AIM set was performed by multiplex PCR (two runs), followed by capillary electrophoresis with fragment analysis. For the capillary electrophoresis and fragment analysis, we used the following protocol for each sample: 1.0 µL of the PCR product was added to 8.5 µL of deionized formamide HI-FI (Life Technologies) and 0.5 µL of GeneScan 500 LIZ size standard (Life Technologies). Separation of DNA fragments was performed using an ABI PRISM 3130 Genetic Analyzer, and GeneMapper ID v3.2 software (Life Technologies) was used for peak reading.

Statistical Analyses
We used Arlequin v3.5 software (Excoffier and Lischer 2010) to calculate the fixation index (FST) and to analyze Hardy-Weinberg Equilibrium (HWE) and the allelic frequencies of the markers in the studied samples. Graphs with the schematic representation of individual admixture estimates and the Discriminant Analysis of Principal Components (DAPC) were produced in the R environment, using the adegenet package (R Core Team 2018; Jombart and Ahmed 2011). For all of the other ancestry analyses, we used the Structure v2.3.4 software (Pritchard et al. 2000; Falush et al. 2003; Falush et al. 2007; Hubisz et al. 2009). A P-value ≤ 0.05 was considered statistically significant.
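To make the allele-frequency and HWE steps concrete, the following is a minimal Python sketch for a single biallelic INDEL marker. It is an illustration only: it uses a simple Pearson chi-square test with one degree of freedom rather than the testing procedure implemented in Arlequin, and the function names and the genotype counts in the usage note are hypothetical.

```python
# Illustrative sketch, not the Arlequin procedure: allele frequencies and a
# chi-square check of Hardy-Weinberg proportions for one biallelic marker.

CRITICAL_CHI2_1DF_005 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q), the frequencies of alleles A and B, from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic comparing observed and HWE-expected counts."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker) to avoid dividing by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def deviates_from_hwe(n_aa, n_ab, n_bb):
    """True when the statistic exceeds the 5% critical value."""
    return hwe_chi_square(n_aa, n_ab, n_bb) > CRITICAL_CHI2_1DF_005
```

With hypothetical counts of 25, 50 and 25 for the three genotypes, the observed counts match the expected ones exactly and no deviation is flagged; a sample with no heterozygotes at all (50, 0, 50) is flagged.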

Data and reagent availability
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables. Supplemental material available at Figshare: https://doi.org/10.25387/g3.7040102.

RESULTS
Table 1 shows allelic frequency data for all investigated markers in the four parental populations and in the admixed population of São Paulo. There was no deviation from HWE in the investigated populations regarding the genotype distributions of the markers. These data were used to estimate delta values (δ), which are positive values that correspond to the frequency differences between two populations. The delta values for the comparisons of the four parental populations are presented in Table 2.
The highest mean δ was observed in the comparison between NAM and AFR (0.40), and the lowest between ASN and NAM (0.24), which suggests a greater proximity between NAM and ASN, corroborating the results of Ribeiro-dos-Santos et al. (2013). The mean δ values of the ASN/EUR and ASN/AFR comparisons do not differ much from the means observed between the other, previously investigated continental populations: EUR and NAM (0.35), EUR and AFR (0.33) and NAM and AFR (0.40). These values seem to be within the expected range and they corroborate the work of Santos et al.
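The delta values discussed above are absolute differences in allele frequency between two populations, averaged over the markers of the panel. A minimal Python sketch of that calculation follows; the function names are illustrative, and the frequencies in the usage note are hypothetical rather than taken from Table 1.

```python
# Illustrative sketch: per-marker delta and mean delta between populations.
from itertools import combinations


def delta(p_pop1, p_pop2):
    """Absolute allele-frequency difference for one marker (always >= 0)."""
    return abs(p_pop1 - p_pop2)


def mean_delta(freqs_pop1, freqs_pop2):
    """Mean delta over markers; both lists must follow the same marker order."""
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need one frequency per marker")
    deltas = [delta(a, b) for a, b in zip(freqs_pop1, freqs_pop2)]
    return sum(deltas) / len(deltas)


def pairwise_mean_delta(freqs_by_population):
    """Mean delta for every pair of populations in a {name: [freqs]} mapping."""
    return {
        (a, b): mean_delta(freqs_by_population[a], freqs_by_population[b])
        for a, b in combinations(sorted(freqs_by_population), 2)
    }
```

For example, with three hypothetical markers, `mean_delta([0.1, 0.5, 0.9], [0.3, 0.5, 0.1])` averages the per-marker differences 0.2, 0.0 and 0.8. A marker is more informative for a given pair of populations the closer its delta is to 1.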
Considering that the δ results indicate that this AIM panel is suitable to estimate interethnic admixture with Asian ancestry in individuals from admixed populations, we performed additional analyses. We quantified the level of standard error (SE) using data obtained in the analyses performed with the Structure v2.3.4 software. As described by Halder et al. (2008), SE represents the bias caused by the nature of allele frequency distributions, not the bias that could be generated by the process of sample selection, and it can be defined as population SE (total ancestry from the noncontributing populations to individuals) or ancestry SE (total contribution from one noncontributing population to other populations). Table 3 shows the results of these analyses. In the four parental populations, the estimates of interethnic admixture assigned more than 99% of ancestry to the expected group, which puts the mean population SE below 1%.
The ancestry SE was also very low (less than 1%), which means that none of the parental populations could have contributed more than 1% to the formation of the other three populations. This suggests that the panel of 61 AIM could be employed in estimates of individual interethnic admixture involving the Asian ancestral population. Figure 1 presents the estimates of genetic contribution for each individual included in the study and shows that the panel of 61 AIM successfully differentiates European, African, Native American and Asian populations.
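The two SE measures can be illustrated with a short Python sketch. It reflects one reading of the definitions given above and is not the exact procedure of Halder et al.: population SE is taken as the ancestry that a parental group receives from the three other clusters, and ancestry SE as the total that one cluster contributes to the other parental groups. The function names and the mean ancestry values in `EXAMPLE` are hypothetical and do not come from Table 3.

```python
# Illustrative sketch under the assumptions stated in the text above.
POPULATIONS = ("AFR", "EUR", "NAM", "ASN")


def mean_ancestry(q_rows):
    """Average per-individual ancestry dicts (e.g. parsed from Structure output)."""
    n = len(q_rows)
    return {pop: sum(row[pop] for row in q_rows) / n for pop in POPULATIONS}


def population_se(mean_q_by_group):
    """For each parental group, ancestry assigned to the other three clusters."""
    return {
        group: sum(share for pop, share in mean_q.items() if pop != group)
        for group, mean_q in mean_q_by_group.items()
    }


def ancestry_se(mean_q_by_group):
    """For each cluster, its summed contribution to the other parental groups."""
    return {
        pop: sum(
            mean_q[pop]
            for group, mean_q in mean_q_by_group.items()
            if group != pop
        )
        for pop in POPULATIONS
    }


# Hypothetical mean ancestry vectors for the four parental groups.
EXAMPLE = {
    "AFR": {"AFR": 0.995, "EUR": 0.002, "NAM": 0.001, "ASN": 0.002},
    "EUR": {"AFR": 0.002, "EUR": 0.994, "NAM": 0.002, "ASN": 0.002},
    "NAM": {"AFR": 0.001, "EUR": 0.002, "NAM": 0.993, "ASN": 0.004},
    "ASN": {"AFR": 0.001, "EUR": 0.002, "NAM": 0.003, "ASN": 0.994},
}
```

With these hypothetical inputs, every value returned by `population_se(EXAMPLE)` and `ancestry_se(EXAMPLE)` stays below 0.01, which is the pattern the text describes for a panel that separates the parental groups well.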
Furthermore, we performed the DAPC analysis in the four parental populations, which generated four distinct clusters (Figure 2). It is noteworthy that the populations described as more genetically similar (Native Americans and Asians) are still clearly separated.
For the investigated markers, the pairwise FST values were 0.32 between European and Native American; 0.32 between European and African; 0.26 between European and Asian; 0.43 between Native American and African; 0.23 between Native American and Asian; and 0.35 between African and Asian. These values indicate a great degree of differentiation between Native American and Asian populations, and an even greater degree between the other pairs of populations (all with P-value < 0.05).
To further investigate this finding, we applied the 61-AIM set to estimate interethnic admixture in samples from 1050 individuals from the São Paulo population that have reported Asian ancestry. Then, we performed analyses considering three and four parental populations (EUR/AFR/NAM; EUR/AFR/ASN and EUR/AFR/NAM/ASN) (Table 4). European ancestry was the largest contribution in the São Paulo sample, with similar percentages regardless of the analysis performed: i) four ancestral populations (67.5%), ii) three ancestral populations without ASN (68.4%) and iii) three ancestral populations without NAM (67.8%). There is also a small variation in the AFR contribution among the three analyses: 16.1% when considering the four ancestral populations, 19.6% when considering three ancestral populations without ASN and 18.2% when considering three ancestral populations without NAM. A schematic representation of the individual admixture estimates in the Brazilian admixed population (São Paulo) is presented in Figure 3.
In addition, we present the estimates of individual interethnic admixture in Figures S1, S2 and S3. Both European and African proportions showed a great similarity in these estimates, when considering three or four ancestral populations.
Here, we highlight that the NAM and ASN contributions show similar percentages when analyzed separately (12% and 14%, respectively), but when they are considered in the same analysis, they are notably different (6.6% and 9.8%, respectively), as shown in Table 4. These percentages suggest that the contribution of these populations, when analyzed together, is split between them.
When we analyzed the contribution from all four populations, we observed that 0.6% of the studied individuals from the Southeast region presented at least 30% of Asian contribution. Of these individuals, 3% had more than 70% of Asian contribution.
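The arithmetic behind this observation can be checked with a few lines of Python, using only the percentages quoted in the text for Table 4. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Illustrative check of the global ancestry percentages quoted in the text.
TABLE4 = {
    "EUR/AFR/NAM/ASN": {"EUR": 67.5, "AFR": 16.1, "NAM": 6.6, "ASN": 9.8},
    "EUR/AFR/NAM": {"EUR": 68.4, "AFR": 19.6, "NAM": 12.0},
    "EUR/AFR/ASN": {"EUR": 67.8, "AFR": 18.2, "ASN": 14.0},
}


def total_percentage(model):
    """Sum of all ancestry components of one model; should be close to 100."""
    return sum(TABLE4[model].values())


def combined_nam_asn(model):
    """NAM plus ASN share in a model (a missing component counts as zero)."""
    shares = TABLE4[model]
    return shares.get("NAM", 0.0) + shares.get("ASN", 0.0)


if __name__ == "__main__":
    for name in TABLE4:
        print(name, round(total_percentage(name), 1), round(combined_nam_asn(name), 1))
```

Each model sums to about 100%. In the four-population model, NAM and ASN together account for about 16.4%, which is more than either three-population model assigns to its single NAM or ASN component (12% and 14%); the remainder appears mostly as a higher AFR share in the three-population models (19.6% and 18.2% versus 16.1%).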

DISCUSSION
In this study, we aimed to determine whether a 61-AIM set with reported efficiency in estimating individual interethnic admixture in three ancestral groups (European, African and Native American) (Santos et al. 2010; Ramos et al. 2016) would be able to infer Asian ancestry and measure the Asian component in Brazilian admixed populations. For this purpose, in addition to groups representing European, Native American and African ancestries, we included a population from Northern Brazil with known Asian origin and a population from Southeastern Brazil with an admixture history involving the four populations (EUR, AFR, ASN and NAM).
To validate the AIM panel, we employed an approach that has been previously used (Halder et al. 2008; Santos et al. 2010) and observed that this set provides reliable estimates of the admixture of the four ancestral populations that are the main contributors to the formation of the Brazilian population (Native American, African, European and Asian). This was further strengthened by the results of the DAPC analysis with the set of markers, in which all populations are visibly separated. Moreover, the obtained FST values indicate large or very large genetic differentiation between the four populations, according to the guidelines proposed by Wright (Wright 1978). Corroborating previous studies (Wallace et al. 1985; Schurr et al. 1990; Horai et al. 1994), our results show a greater similarity between Native American and Asian populations, when compared to the other continental groups investigated here. For instance, the mean δ between Asian and Native American populations (0.244) is lower than the corresponding measures between Asian and European populations (0.307) and between Asian and African populations (0.353).
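The reading of FST values against Wright's guidelines can be sketched in Python as follows. The thresholds are the commonly quoted qualitative ranges attributed to Wright (below 0.05 little, 0.05 to 0.15 moderate, 0.15 to 0.25 large, above 0.25 very large differentiation); the treatment of values that fall exactly on a boundary is an assumption of this sketch. The helper `fst_two_populations` uses the simple heterozygosity-based definition FST = (HT - HS) / HT with equal population weights, which is not the estimator computed by Arlequin, and all function names are illustrative.

```python
# Illustrative sketch: a simple two-population FST and Wright's qualitative scale.

def fst_two_populations(p1, p2):
    """FST = (HT - HS) / HT for one biallelic marker, equal population weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def wright_category(fst):
    """Qualitative label for an FST value (boundary handling is an assumption)."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:
        return "large"
    return "very large"


# Pairwise FST values as reported in the Results section.
REPORTED_FST = {
    ("EUR", "NAM"): 0.32, ("EUR", "AFR"): 0.32, ("EUR", "ASN"): 0.26,
    ("NAM", "AFR"): 0.43, ("NAM", "ASN"): 0.23, ("AFR", "ASN"): 0.35,
}

CATEGORIES = {pair: wright_category(v) for pair, v in REPORTED_FST.items()}
```

Applied to the reported values, the NAM/ASN pair (0.23) falls in the "large" range and the five remaining pairs fall in the "very large" range, in line with the interpretation given in the text.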
Although there is a greater proximity between Native American and Asian groups, the AIM panel provides a robust and specific estimate of ASN individual ancestry in admixed populations, successfully separating the ASN population from the others. This was strengthened by the results of the analyses of the samples from São Paulo, in which we were able to identify 0.6% of global Asian ancestry in this admixed population from the Southeast region of Brazil. This result differs from the registered figure of 1.9% of Asian contribution in this region (IBGE 2008). However, these registered data were estimated based on self-declaration criteria, which differ from measures of genomic ancestry.
In conclusion, we demonstrated that this INDEL panel not only can be used to genetically distinguish different continental populations (specifically Europeans, Africans, Native Americans and Asians), but it can also identify substructure in admixed populations. When applied to such populations, this panel allows estimates of the individual and global interethnic admixture regarding the genetic contributions of these ancestry groups. Moreover, it was able to efficiently separate Asian and Native American populations, despite their proximity when compared to the other continental populations. Therefore, we showed in this study that the 61-AIM panel is a useful tool that could be valuable in studies in Brazilian populations, which is extremely important to avoid misinterpretations of the findings in association studies. To the best of our knowledge, this is the first study to apply an AIM panel to estimate the individual contribution of these four main parental populations (European, Native American, African and Asian) in an admixed population from Brazil.