Genetic ancestry and ethnic identity in Ecuador

Summary
We investigated the ancestral origins of four Ecuadorian ethnic groups (Afro-Ecuadorian, Mestizo, Montubio, and the Indigenous Tsáchila) to gain insight into the relationship between ancestry, culture, and the formation of ethnic identities in Latin America. The observed patterns of genetic ancestry are largely concordant with ethnic identities and historical records of conquest and colonization in Ecuador. Nevertheless, a number of exceptional findings highlight the complex relationship between genetic ancestry and ethnicity in Ecuador. Afro-Ecuadorians show far less African ancestry, and the highest levels of Native American ancestry, seen for any Afro-descendant population in the Americas. Mestizos in Ecuador show high levels of Native American ancestry, with substantially less European ancestry, despite the relatively low Indigenous population in the country. The recently recognized Montubio ethnic group is highly admixed, with substantial contributions from all three continental ancestries. The Tsáchila show two distinct ancestry subgroups, with most individuals showing almost exclusively Native American ancestry and a smaller group showing a pattern characteristic of Mestizos. Considered together with historical data and sociological studies, our results indicate the extent to which ancestry and culture interact, often in unexpected ways, to shape ethnic identity in Ecuador.


Introduction
The South American country of Ecuador is home to a multiethnic society that emerged from contact among the numerous Indigenous communities that inhabited the region for millennia, European colonizers (Spanish conquistadors and mercantile immigrants), and enslaved Africans brought to the New World by force. 1 Similar to other Latin American nation-states, interactions among these groups over the last five centuries have led to the consolidation of ethnic identities that were initially grounded in varying degrees of admixture. Examining the genetic ancestry of different groups in Ecuador can give us insight into how ethnic identities, both old and new, are shaped and have possibly changed from the original castas definitions instituted during colonial times. 2,3 Here, we investigate the ancestral origins of four officially recognized Ecuadorian ethnic groups: Afro-Ecuadorian, the majority Mestizo ethnic group, the newly recognized Montubio ethnic group, and the Indigenous Tsáchila. This is the first genetic study of the Montubio and Tsáchila populations.
The Ecuadorian census defines five major ethnic groups: Afro-Ecuadorian, Indigenous, Blanco, Mestizo, and Montubio. 4,5 Mestizos, historically defined as the descendants of Europeans and Indigenous Americans, constitute the largest part (71.9%) of the Ecuadorian population according to the current census (2010). Individuals who self-identify as Mestizo live primarily in urban areas and are native speakers of Spanish. In addition to Mestizos, Ecuador has fourteen distinct Indigenous groups, which inhabit different parts of the country. Ecuadorians who identify as Indigenous make up 7.0% of the population. In this report, we focused on the relatively understudied Tsáchila Indigenous group, who are the native inhabitants of the Santo Domingo province of Ecuador and are speakers of the Tsáfiqui language. Montubios were officially recognized as a distinct ethnocultural group by the government of Ecuador in 2001 and represent 7.4% of the total population. Recognition of the Montubio as a distinct ethnic group followed years of struggle, including a protracted hunger strike that drew widespread attention to their cause. 6 Montubios are thought to descend from Indigenous groups who traditionally inhabited the coastal regions of Ecuador and later admixed with Spanish settlers and enslaved or freed Africans, starting in the colonial era. [7][8][9] The extent of ancestral contributions from each of these three groups to the modern Montubios is an open question. Afro-Ecuadorians, who live primarily in the provinces of Esmeraldas and Guayas, make up 7.2% of the Ecuadorian population. The presence of Afro-Ecuadorians in Esmeraldas dates to 1553, when a group of 23 Africans escaped from a stranded slave ship and mixed with the local Indigenous groups to establish an autonomous community. From these small beginnings, the region continued to receive an influx of escaped slaves, from both Ecuador and Colombia, giving rise to a large and independent Afro-descendant population. 10,11
Nation building in colonial-era Latin America was explicitly aware of, and informed by, notions of race and ancestry. An emphasis was placed on the delineation of new racial (ethnic) groups formed by various combinations of the three continental ancestry groups that came together in the New World: African, European, and Native American. This movement reached its apogee with the Spanish Sistema de Castas (Caste System). 2,3 Under this racialized classification scheme, numerous groups were defined by specific combinations of admixture, often in a very granular way across multiple generations. The Sistema de Castas was inherently hierarchical, with European (Spanish) ancestry at the top and Native American or African ancestry at the bottom. High levels of Spanish ancestry were almost always associated with higher social status. The related concept of mestizaje refers to the underlying process of racial and cultural mixing, also with an implicit preference for Spanish ancestry and culture. [12][13][14][15] In Ecuador, the process of mestizaje played out in a very particular way, whereby the initial ancestry-based definition of Mestizo slowly gave way to a more culture-based definition that was tied to language, education, and social status. Given the conflation of ancestry, culture, and social status with ethnicity in Ecuador, the ancestral origins and makeup of different ethnic groups have been obscured. Genetic studies, considered in the context of this unique history, can shed light on the extent to which ancestry does or does not contribute to Ecuadorians' ethnic identities. Indeed, there is a growing interest among Ecuadorians in better understanding the ancestral composition of these socially constructed ethnic groups, particularly with respect to historically marginalized Indigenous and African identities. [16][17][18]
Previous studies have characterized the genetic origins of the different ethnic groups in Ecuador, 16,17,19 but they have described only a small part of the vivid landscape of genetic diversity in Ecuador. These efforts were limited either by (1) the populations sampled, which often included only Mestizos or did not specify the Ecuadorian sub-population; or (2) their use of limited numbers of Ancestry Informative Markers (AIMs), which do not allow for deep characterization of genetic ancestry. In this study, we use genome-wide variant data from four distinct Ecuadorian populations, which allowed us to infer fine-scale population structure and the ancestral origins of ethnic groups that collectively represent most of the Ecuadorian population. In addition to the widely studied majority Mestizo population, we characterized the genetic ancestry of the previously understudied and historically neglected Afro-Ecuadorian, Montubio, and Tsáchila minority groups.

Population terminology
The terminology used to describe the populations studied here is intended to distinguish ethnicity from genetic ancestry. For ethnicity, we use the names of the officially recognized Ecuadorian ethnic groups following the 2010 census: Afro-Ecuadorian (translated from the Spanish Afroecuatoriano), Indigenous (translated from the Spanish Indígena), Blanco, Mestizo, and Montubio. 4,5 We use the broad term Afro-descendant (translated from the Spanish afrodescendiente), which is widely used in Latin America to refer to the descendants of African people who arrived in the Americas via the trafficking of enslaved persons during the colonial era. 20 While many individuals in Latin America have some degree of African ancestry, Afro-descendants identify as having direct ancestral and cultural connections to Africa. We use the officially recognized name Tsáchila for the Indigenous ethnic subgroup studied here. We use the broad term Indigenous to describe the original, native inhabitants of the Americas, including but not limited to Ecuador. For genetic ancestry, we use continental ancestry group labels (African, European, and Native American), following conventions of the scientific literature on genetic ancestry. [21][22][23][24] The presence of Native American ancestry in any individual does not necessarily imply tribal affiliation or identity with a specific Indigenous group.

Donor sample collection
DNA samples were characterized for 300 sample donors from four Ecuadorian ethnic groups across seven locations: Afro-Ecuadorian (91), Mestizo (35), Montubio (82), and Tsáchila (92). Afro-Ecuadorian samples were taken in the city of Esmeraldas in Esmeraldas province and in the Chota Valley, located between provinces of Imbabura and Carchi. Montubio samples were taken in the cities of Portoviejo, Jipijapa, and Chone in the Manabí province. Mestizo samples were taken in Quito, Pichincha. Tsáchila samples were taken in Santo Domingo de los Colorados of the province of Santo Domingo de los Tsáchilas. Blood samples were obtained from donors by the finger-prick method and collected on FTA cards (GE Healthcare Life Sciences). Donors provided demographic information for themselves, their parents, and their grandparents: surnames, place of birth, place of residence, ethnic self-identity, and spoken language. The same technician was responsible for collecting all donor blood samples and demographic information. Donor blood samples and demographic data were collected in accordance with the ethical standards of the 1964 Declaration of Helsinki and its later amendments. All samples were provided voluntarily, deidentified, and securely archived. All donors approved and signed the Informed Consent Form (Data S1), and donor sampling was approved by Translational Medicine Unit of Faculty of Medical Sciences at Central University of Ecuador, in Quito, Ecuador. Genome analysis of the samples was also approved by the Institutional Review Board of the Georgia Institute of Technology.

Genome characterization and analysis
DNA was extracted from FTA card blood spots using QIAGEN's DNeasy Blood & Tissue Kits. Genome-wide genotyping was performed for 22 Mestizo samples using the Multi-Ethnic Global Array 25 (MEGA), which covers approximately 1.7 million variants. The remaining 278 samples were genotyped using the Illumina Global Screening Array (GSA), which covers approximately 690,000 variants. Genome-wide genotype data from Ecuadorian populations were merged with different reference panels for characterizing (1) global and local continental ancestry, and (2) African, European, and Native American subcontinental ancestry, yielding 3 different genome-wide variant datasets for subsequent analysis. The workflow for genotype data harmonization and analysis is illustrated in Figure S1, and the reference populations used in this study are listed in Table S1. Genome-wide genotype data from Ecuador are available upon request from the corresponding authors.
For characterizing global and local continental ancestry, the Ecuadorian genome-wide genotype data were merged and harmonized with whole genome sequence data from global reference populations representing four continental population groups (African, East Asian, European, and Native American) characterized as part of the 1000 Genomes Project, 26 using PLINK v.1.90 27 and bespoke scripts. The Ecuadorian and reference datasets were merged to include only variants present in both datasets, with missingness and minor allele frequency filters of 5% and 1%, respectively. Variant strand flips and identifier inconsistencies were corrected as needed. The merged and harmonized dataset contained 371,355 genome-wide variants. The PLINK implementation of the KING algorithm was used to test for kinship among individuals from the harmonized dataset and to exclude one member of each pair of samples with a kinship coefficient > 0.25. 28 Next, the merged dataset of unrelated individuals was pruned for linkage disequilibrium (LD) using the "--indep" command in PLINK 1.9 with a window size of 50 kb, a step size of 5 variants, and a variance inflation factor (VIF) threshold of 2. The program ShapeIT version 2.r837 29 was used to phase the merged and harmonized variant dataset. Phasing was performed on all individuals at the same time and without reference haplotypes. Each chromosome was phased separately, and the X chromosome was phased using the "-X" flag. Together, these steps yielded a final variant dataset for continental ancestry inference covering 220,009 genomic sites for 275 Ecuadorian samples and 1,728 reference population samples.
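To make the 5% missingness and 1% minor allele frequency thresholds concrete, the following is a minimal Python sketch of how such per-variant filters behave. It is not the PLINK implementation used in the study; the function names and the toy genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a missing call) are assumptions made purely for illustration.

```python
from typing import List, Optional

# Hypothetical encoding: alternate-allele count per sample, None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with no genotype call at this variant."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency over the non-missing diploid calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(
    calls: List[Genotype],
    max_missing: float = 0.05,  # 5% missingness threshold from the text
    min_maf: float = 0.01,      # 1% minor allele frequency threshold from the text
) -> bool:
    """Keep a variant only if it is well genotyped and sufficiently polymorphic."""
    return (
        missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    )


# Toy variants across 40 samples (illustrative data only).
common_variant = [0] * 30 + [1] * 9 + [None]   # 2.5% missing, polymorphic
monomorphic_variant = [0] * 40                 # no minor allele observed
poorly_called_variant = [1] * 30 + [None] * 10 # 25% missing

kept = [
    passes_filters(v)
    for v in (common_variant, monomorphic_variant, poorly_called_variant)
]
```

In this sketch only the first toy variant is retained: the second fails the frequency filter and the third fails the missingness filter.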
The final continental ancestry variant dataset was further harmonized with two additional reference panels for characterizing subcontinental ancestry. We separately merged the final variant dataset with (1) 1,235 African genomes across 37 additional African reference populations, 30 and (2) 251 Native American genomes across 23 Native American reference populations 31 to yield an African-harmonized dataset and a Native American-harmonized dataset (Table S1). The African-harmonized variant dataset covers 157,746 genomic sites, and the Native American-harmonized variant dataset covers 56,937 genomic sites. For European subcontinental affinity characterization, the final variant dataset was used directly, since the 1000 Genomes Project contains five European reference populations, covering all of the main source regions for immigration to the Americas. Distinct African and Native American reference panels, and the resulting merged datasets, were created because the African and Native American reference panels were characterized on different genotyping technologies, often leading to a small overlap of genetic variants after harmonization.

Ancestry and admixture analysis
Principal component analysis (PCA) of the final continental ancestry dataset was performed in PLINK with the "--pca" option, and the first two PCs for all samples were plotted using the ggplot2 package 32 in R v.3.5.1. 33 ADMIXTURE v.1.30 34 was used to characterize samples' genome-wide ancestry fractions for four continental ancestral population components (African, East Asian, European, and Native American) using 1000 Genomes Project reference population samples (Table S1). ADMIXTURE was run in the unsupervised mode with default settings and K = 4. Admixture entropy values are measured as the Shannon entropy (S) of the four ancestry component fractions:
$$S = -\sum_{i=1}^{4} p_i \log p_i$$
where $p_i$ is the population fraction for ancestry component i.
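As an illustration of the admixture entropy measure, here is a minimal Python sketch of the Shannon entropy of a set of ancestry fractions. The function name is hypothetical, and the natural logarithm is an assumption, since the base is not stated in the text. The example inputs are the group-mean ancestry fractions reported in the Results, used here only to show how the measure behaves.

```python
import math
from typing import Sequence


def admixture_entropy(fractions: Sequence[float]) -> float:
    """Shannon entropy S = -sum(p_i * log(p_i)) of ancestry fractions.

    Fractions are renormalized to sum to 1, and zero components contribute
    nothing (the limit of p * log(p) as p -> 0 is 0). The natural log is an
    assumption; the source text does not state the base.
    """
    total = sum(fractions)
    if total <= 0:
        raise ValueError("ancestry fractions must sum to a positive value")
    probs = [f / total for f in fractions]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Order: African, East Asian, European, Native American (group means from the text).
afro_ecuadorian_mean = [0.495, 0.008, 0.138, 0.359]
tsachila_mean = [0.019, 0.003, 0.107, 0.871]

s_afro = admixture_entropy(afro_ecuadorian_mean)
s_tsachila = admixture_entropy(tsachila_mean)
```

Entropy is zero for a genome drawn from a single ancestry component and reaches its maximum, log(4), when all four components contribute equally, so the more evenly admixed Afro-Ecuadorian mean profile scores higher than the predominantly Native American Tsáchila mean profile.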
A modified version of RFMix 22,35 was used to characterize local ancestry patterns for the three main continental ancestral population components observed in the Ecuador samples (African, European, and Native American) on the final dataset. RFMix was run using African and European reference population samples from the 1000 Genomes Project, and samples from Peru were used as a surrogate for Native American ancestry. RFMix was run for 12 generations in the "PopPhased" mode with a minimum node size of five, and with the "--use-reference-panels-in-EM" option for two rounds of expectation maximization (EM), to assign continental ancestry for haplotypes genome-wide. Haplotype ancestry assignments were made for regions where the RFMix ancestral certainty was at least 95%.
The RFMix ancestry assignments were used to generate masked genomes, each of which only contain haplotypes from one of the three main continental ancestry groups, for subsequent subcontinental ancestry inference using either the program Chromopainter version 2 36 (African and European) or ADMIXTURE (Native American). Chromopainter was run on each of the African and European ancestry-specific masked genomes separately for each individual, comparing to either the corresponding harmonized variant dataset (African) or the final variant dataset (European). Non-negative least-squares (NNLS) was used to convert Chromopainter output painting vectors to percent ancestry estimates using the R package nnls version 1.4, 37 as we described previously. 22 Subcontinental ancestry inference validation was performed by generating simulated admixed genomes via Monte Carlo simulation of reference population haplotypes. For the simulation of admixed genomes, reference populations were divided into training and validation samples. Training samples were used to simulate admixed genomes, and validation samples were used for subcontinental ancestry inference on the simulated admixed genomes. Simulated admixed genomes were generated using genome-wide haplotype boundaries, defined using interpolated genetic map positions based on the 1000 Genomes Project, and haplotypes were randomly selected from reference population training samples across 20%-80% admixture proportions in 5% increments, generating 20 simulated genomes for each 5% increment. For sets of simulated admixed genomes, the simulated (expected) ancestry proportions were compared to the observed ancestry proportions inferred via the NNLS approach, using Pearson correlation, to estimate the accuracy of the NNLS method for subcontinental ancestry inference. 
Simulated admixed genomes were generated and validated in this way for African subcontinental ancestry inference (West Africa, West Central Africa, Southwest Africa, and East Africa) and European subcontinental ancestry inference (North/Central Europe and South Europe).
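The conversion of painting vectors to ancestry percentages can be pictured as a non-negative least-squares fit of one target vector against a set of reference vectors. The sketch below is a minimal pure-Python illustration that uses projected gradient descent, not the algorithm implemented in the R nnls package used in the study. The function names, the three-region toy vectors, and the iteration count are all assumptions for illustration.

```python
from typing import List, Sequence


def nnls_projected_gradient(
    reference_vectors: Sequence[Sequence[float]],
    target: Sequence[float],
    iterations: int = 2000,
) -> List[float]:
    """Minimize 0.5 * ||sum_k x_k * ref_k - target||^2 subject to x_k >= 0."""
    n_refs = len(reference_vectors)
    n_dims = len(target)
    # Sum of squared entries equals trace(A^T A), an upper bound on its largest
    # eigenvalue, so 1 / trace is a safe (non-diverging) step size.
    step = 1.0 / sum(v * v for ref in reference_vectors for v in ref)
    x = [1.0 / n_refs] * n_refs
    for _ in range(iterations):
        fitted = [
            sum(x[k] * reference_vectors[k][j] for k in range(n_refs))
            for j in range(n_dims)
        ]
        residual = [fitted[j] - target[j] for j in range(n_dims)]
        gradient = [
            sum(reference_vectors[k][j] * residual[j] for j in range(n_dims))
            for k in range(n_refs)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[k] - step * gradient[k]) for k in range(n_refs)]
    return x


def to_ancestry_fractions(coefficients: Sequence[float]) -> List[float]:
    """Rescale non-negative coefficients so that they sum to 1."""
    total = sum(coefficients)
    return [c / total for c in coefficients]


# Toy painting profiles for three hypothetical reference regions.
references = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# A simulated genome built as 60% of the first profile and 40% of the second.
simulated = [0.6 * a + 0.4 * b for a, b in zip(references[0], references[1])]

estimated = to_ancestry_fractions(nnls_projected_gradient(references, simulated))
```

On this toy input the fit recovers approximately 60%, 40%, and 0% for the three regions, which mirrors the logic of comparing simulated (expected) and inferred (observed) proportions in the validation described above.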
Subcontinental ancestry for Native American populations was discerned using Native American masked genomes with the Native American-harmonized dataset using ADMIXTURE run in unsupervised mode, with values of K ranging from 2 to 12. Cross-validation error values were calculated for each value of K and used to select the optimal value of K = 10 for subsequent analysis. Native American subcontinental ancestry was characterized using ADMIXTURE, because haplotypes in Native American populations are quite distinct, owing to very high levels of population structure among these populations, 38 and the Chromopainter approach is not suited for these data. Phylogenetic analysis of Native American reference populations, and the Native American ancestry component of the Ecuadorian populations, was performed by calculating pairwise population F_ST values with smartpca from the EIGENSOFT package version 7.2.1. 39 Pairwise F_ST values were used to make a neighbor-joining tree 40 with the program MEGA6, 41

Admixture timing analysis
The TRACTS program 43,44 was used to infer the timing of admixture events in the admixed populations from the ancestry-specific haplotypes (i.e., ancestry tracts) defined by RFMix. For the Afro-Ecuadorian and Montubio three-way admixed populations, three possible orderings of admixture were evaluated with TRACTS: (1) European, Native American, and African; (2) European, African, and Native American; and (3) African, Native American, and European. For the Mestizo two-way admixed population, two possible orderings of admixture were evaluated with TRACTS: (1) European and Native American, and (2) Native American and European. For each ordering, TRACTS evaluated possible admixture timing from 14 to six generations ago, in 1,000 bootstrap attempts. From the bootstrap attempts, the most likely series of admixture events was chosen to represent the population.

Sex-biased admixture inference
Sex bias in admixture for the different Ecuadorian population groups was inferred by comparing the ancestral composition of the X chromosome to the autosomes as previously described. 21,45 For each Ecuadorian sample, the normalized difference between each ancestral component for the X chromosome versus the autosomes (ΔAdmix) is defined as:
$$\Delta\mathrm{Admix} = \frac{F_{anc,X} - F_{anc,auto}}{F_{anc,X} + F_{anc,auto}}$$
where $F_{anc,total}$, $F_{anc,X}$, and $F_{anc,auto}$ refer to genome-wide, X chromosome, and autosomal ancestry fractions, respectively.
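A minimal Python sketch of this normalized difference follows, taking the X chromosome and autosomal fractions for a single ancestry component in a single sample. The function name, the handling of a zero denominator, and the example values are assumptions for illustration and do not come from the study data.

```python
def delta_admix(f_anc_x: float, f_anc_auto: float) -> float:
    """Normalized X-versus-autosome difference for one ancestry component.

    Computes (F_X - F_auto) / (F_X + F_auto). Returning 0.0 when the ancestry
    component is absent from both compartments is an assumption of this sketch.
    """
    denominator = f_anc_x + f_anc_auto
    if denominator == 0:
        return 0.0
    return (f_anc_x - f_anc_auto) / denominator


# Hypothetical sample: more Native American ancestry on the X chromosome
# than on the autosomes, and the reverse for European ancestry.
native_american_delta = delta_admix(f_anc_x=0.75, f_anc_auto=0.60)
european_delta = delta_admix(f_anc_x=0.20, f_anc_auto=0.35)
```

The statistic ranges from -1 to 1. A positive value indicates excess ancestry on the X chromosome and a negative value indicates excess ancestry on the autosomes, so the hypothetical sample above would show a positive Native American value and a negative European value.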

Results
Continental genetic ancestry
Individuals from four Ecuadorian ethnic groups were sampled from seven sites around the country: Afro-Ecuadorian from Esmeraldas and the Chota Valley, Mestizo from Quito, Montubio from Manabí, and Tsáchila from Santo Domingo de los Colorados (Figure 1A). Genome-wide genotypes for individual sample donors were characterized and compared to global reference populations from Africa, the Americas, Asia, and Europe. PCA was used to visualize the genetic relatedness among individuals from the Ecuadorian and reference populations (Figures 1B and 1C; Figure S2). African, European, and Native American continental ancestry groups are clearly separated as three poles of diversity on the two-dimensional PCA plot, and Ecuadorian populations are clustered between the three continental ancestry groups. Most Afro-Ecuadorians fall close to the African pole, as would be expected; however, a number of Afro-Ecuadorians cluster very closely with the Native American pole. Mestizo individuals fall along the European-Native American axis, with little apparent African admixture. It should be noted that Mestizos sampled from Ecuador fall much closer to the Native American than the European pole, pointing to a relatively high Native American contribution to their genetic ancestry. The Montubios lie mostly along the European-Native American axis, with some of them extending toward the African pole, indicative of low levels of African admixture in this population. Finally, most of the Tsáchila individuals cluster tightly at the Native American pole, but some individuals extend toward the European pole, suggesting levels of admixture similar to those observed for some Mestizos.
The program ADMIXTURE was used to quantify the levels of continental ancestry (African, East Asian, European, and Native American) for the four Ecuadorian populations compared to other admixed American populations (Figure 2; Figure S3). On average, Afro-Ecuadorians have 49.5% African genetic ancestry, 35.9% Native American ancestry, 13.8% European ancestry, and 0.8% East Asian ancestry. Some of the individuals who identify as Afro-Ecuadorian have high fractions of African ancestry, while others show almost completely Native American ancestry. The Afro-Ecuadorian population shows the highest overall variance in ancestry components for the Ecuadorian groups. Mestizo individuals have mostly Native American ancestry (66.1%), with some European admixture (30.0%), and small amounts of African (2.4%) and East Asian genetic ancestry (1.5%). The Montubios show primarily Native American ancestry (51.4%), followed by substantial European ancestry (38.1%), along with lower levels of African ancestry (9.9%) and very little East Asian ancestry (0.7%). It should be noted that the Montubio population has 4.1 and 5.2 times as much African ancestry as the Mestizo and Tsáchila populations, respectively. Tsáchila individuals in the data have high overall Native American ancestry (87.1%), followed by European (10.7%), African (1.9%), and East Asian ancestry (0.3%). The Tsáchila population includes two distinct groups of individuals, one group with almost entirely Native American ancestry and a second admixed group. The overall continental genetic ancestry fractions for the Ecuadorian populations, compared to six other admixed American populations, are summarized in Table 1.
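The fold differences in African ancestry quoted above follow directly from the reported group means. A short Python check, using only the percentages given in this paragraph (the variable names are illustrative):

```python
# Mean African ancestry (%) per group, as reported in the text.
african_ancestry_pct = {
    "Montubio": 9.9,
    "Mestizo": 2.4,
    "Tsachila": 1.9,
}

# Ratio of Montubio mean African ancestry to each comparison group.
fold_vs_mestizo = african_ancestry_pct["Montubio"] / african_ancestry_pct["Mestizo"]
fold_vs_tsachila = african_ancestry_pct["Montubio"] / african_ancestry_pct["Tsachila"]

print(f"Montubio vs Mestizo:  {fold_vs_mestizo:.1f}x")
print(f"Montubio vs Tsachila: {fold_vs_tsachila:.1f}x")
```

Rounded to one decimal place, the two ratios reproduce the 4.1-fold and 5.2-fold differences stated in the text.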
Timing and sex bias for continental admixture
Continental ancestry was inferred at the local level by assigning genome-wide haplotype origins corresponding to the three major ancestry components of the Ecuadorian populations (African, European, and Native American) using a modified version of the program RFMix. 35 East Asian ancestry was not considered for local ancestry inference owing to the very low levels observed for the Ecuadorian populations studied here. The patterns of local ancestry were then used to infer the timing of continental admixture for the Ecuadorian ethnic groups based on the size distributions of ancestry-specific haplotypes using the program TRACTS. This analysis relies on the fact that the sizes of ancestry-specific haplotypes in admixed genomes decay over time owing to recombination. Three-way admixture models were run with TRACTS for the Afro-Ecuadorian and Montubio populations, and a two-way admixture model was run for the Mestizo population. The Afro-Ecuadorian population shows evidence of initial rounds of admixture between Native American and European ancestry components 10 and 9 generations ago, followed by two more recent pulses of African admixture 6 and 5 generations ago (Figure 3A). The Montubio population shows primarily Native American and European admixture 11 and 10 generations ago, with a much smaller round of African admixture occurring 9 and 8 generations ago (Figure 3B). The Mestizo population shows two pulses of Native American and European admixture 10 and 9 generations ago (Figure 3C).
The chromosomal distributions of ancestry-specific haplotypes can also be used to evaluate sex-biased patterns of continental admixture. Since X chromosomes spend twice as much time in females as in males, whereas autosomes spend equal time in each sex, excess continental ancestry on the X chromosome indicates female-biased admixture, whereas excess continental ancestry on the autosomes points to male-biased admixture. We previously developed the ΔAdmix parameter, which quantifies differences in ancestry between the X chromosome and the autosomes, to test for sex-biased ancestry in admixed American populations. 45 The four Ecuadorian ethnic groups studied here all show evidence of sex-biased admixture, each with its own characteristic pattern (Figure 3D). However, all four of the populations show a similar pattern of male-biased European ancestry coupled with female-biased Native American ancestry. This pattern is most pronounced in the Mestizo and Montubio populations. The Tsáchila group shows a less-pronounced pattern of sex-biased ancestry, with all three median ΔAdmix values very close to zero. However, when the admixed Tsáchila individuals (less than 90% Native American ancestry) are analyzed separately, they do show a strong pattern of male-biased European ancestry and female-biased Native American ancestry (Figure S4).

Subcontinental genetic ancestry
Ancestry-specific haplotypes were leveraged to perform fine-scale, subcontinental ancestry inference for the Ecuadorian samples, with separate analyses run for each of the three continental ancestry components. To do so, continental ancestry-specific genomes were generated by masking haplotypes that correspond to two of the three continental ancestry components. This process yielded three masked genomes per sample: an African-haplotype-only genome, a European-haplotype-only genome, and a Native American-haplotype-only genome. Each ancestry-specific genome was then compared against corresponding reference populations from Africa, Europe, and the Americas to explore the subcontinental ancestral origins for the Ecuadorian ethnic groups.

African origins
African subcontinental ancestry for the Ecuadorian ethnic groups, along with other admixed American populations, was characterized using a panel of 42 African reference populations, 37 of which were sampled from six of the seven main western African regions involved in the transatlantic slave trade. [46][47][48][49] We grouped these colonial-era African regions into three broad regions based on geographic and genetic affinity of the reference populations: West Africa, West Central Africa, and Southwest Africa (Figure S5). West Africa includes reference populations sampled from Gambia, Sierra Leone, and the Ivory Coast, corresponding to the colonial-era slave trading regions of Senegambia, Sierra Leone, and the Windward Coast. West Central Africa includes reference populations sampled from Benin and Nigeria, corresponding to the Bight of Benin. Southwest Africa includes reference populations sampled from Cameroon, Gabon, and Angola, corresponding to the Bight of Biafra and the Loango Coast. We also included East African and Rainforest Hunter-Gatherer (RFHG) African populations for comparison.
The African reference populations from these regions show distinct patterns of ancestry, with coherent patterns of ancestry seen for West African and West Central African populations and diverse ancestry seen for the Bantu-speaking populations of Southwest Africa (Figure 4A; Figure S5). The observed genetic population structure closely mirrors the geographic distribution of the African reference populations, with the cosmopolitan Yaounde population showing admixture between the West Central African and nearby Southwest African groups. The East African and Rainforest Hunter-Gatherer populations show distinct patterns of genetic ancestry. The Banbongo population from Gabon shows a mix of Bantu and Rainforest Hunter-Gatherer ancestry.
The Ecuadorian groups show varied patterns of African ancestry, with affinities to different admixed American populations (Figure 4A; Table 2). The Afro-Ecuadorians show primarily West Central African ancestry, with similarity to populations from modern-day Benin and Nigeria, followed by West African ancestry, with similarity to populations from Gambia, the Ivory Coast, and Sierra Leone. This pattern of African ancestry is closest to the patterns seen for the African American reference populations and populations from English-speaking countries in the Caribbean. 49 The Montubio and Tsáchila show primarily West African ancestry followed by a West Central African component, which is most similar to populations from Mexico, Central America, Colombia, and Venezuela along with Spanish-speaking countries in the Caribbean. The Mestizo population shows the most distinct pattern of African ancestry, with more Southwestern and Bantu ancestry, but this may be an artifact of the small amount of African ancestry seen for this population; there were only 8 Mestizos with enough African ancestry (>3%) to allow for subcontinental ancestry analysis. We validated our approach to African subcontinental ancestry inference using simulation of admixed genomes containing different combinations of ancestry for the four main African regions analyzed here. Simulated genomes were generated for a range of ancestry fractions (20%-80%) for each of the four African regions, and our subcontinental ancestry inference approach was applied to the simulated admixed genomes. Ancestry inferences for all four of the main African regions show high levels of accuracy when region-specific simulated (expected) ancestry values are compared to observed values generated via the subcontinental ancestry inference approach used here (R² = 0.99; Figure S6).

European origins
All four Ecuadorian populations show a pattern of European ancestry that is mostly consistent with Spanish ancestry, similar to what is seen for other modern Latin American populations from Colombia, Mexico, Peru, and Puerto Rico (Figure 4C; Table 2). While this is unsurprising, it does serve as a positive control for our approach to subcontinental ancestry analysis. The Afro-Ecuadorian population has the highest level of Northern and Central European ancestry, albeit as a minor fraction, which may reflect immigration of laborers from Jamaica starting in the late 19th century. 50 Admixture simulations were used to validate the ability of our subcontinental ancestry inference method to distinguish North and Central European ancestry from South European ancestry. Simulated (expected) and observed European subcontinental ancestry fractions show high correspondence, in support of our approach (R² = 0.96).

Native American origins
Given the complex demographic history of Indigenous populations in the Americas, and the high levels of population structure seen for Native American ancestry reference populations, we were not able to directly quantify Native American admixture proportions for the Ecuadorian populations in the same way that was done for their African and European ancestry components. Native American ancestry in the Ecuadorian populations was analyzed via ADMIXTURE, to get a qualitative view of their ancestry composition, and with phylogenetic analysis to infer the most closely related reference populations. The Native American origins of the Ecuadorian ethnic groups were characterized using a panel of 23 reference populations from Mesoamerican, Central American, Colombian, Amazonian, and Andean tribes (Figure S7), in comparison with modern admixed populations from Colombia, Peru, and Puerto Rico. ADMIXTURE showed an optimal number of K = 10 ancestry components (Figure S8), and the four Ecuadorian groups form a single, closely related cluster, to the exclusion of all the other admixed American populations and all of the Native American reference populations (Figure 5A). One of the Native American ancestry components (dark red) for the Ecuadorian groups is most pronounced in the Tsáchila and shared mostly, albeit to a small extent, with Indigenous and modern Colombian populations. The minor Native American components for the Ecuador populations (green, yellow, and purple) are also most closely related to nearby Colombian populations and appear to correspond to Andean Indigenous ancestry. Interestingly, the primary Native American ancestry component of the Tsáchila (light red) is substantially less abundant in other Ecuadorian groups and largely absent from the Native American reference populations used here.
Thus, similar to what was seen at the continental level, the Tsáchila show two distinct Native American ancestry components, with the less-abundant pattern far more similar to what is seen for the other Ecuadorian groups. The primary Native American component for the Tsáchila may represent an Indigenous source population for Ecuador, for which we do not currently have a reference population, or it could reflect high levels of genetic drift and resulting structure for this population. Phylogenetic analysis and the outgroup f3 statistic confirm that the Native American component of the Ecuadorian groups is most closely related to Andean Indigenous populations followed by Colombian Indigenous populations (Figure 5B; Figure S9).
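The logic of the outgroup f3 statistic can be sketched as follows. For an outgroup O, a target population A, and a reference population B, the statistic is the average over SNPs of (o - a)(o - b), where o, a, and b are allele frequencies; larger values indicate more shared genetic drift between A and B. The sketch below is the simple uncorrected estimator, without the finite-sample bias correction or the standard errors that dedicated software provides. The function name and all allele frequencies are hypothetical and are not data from this study.

```python
def outgroup_f3(freq_out, freq_a, freq_b):
    """Uncorrected outgroup f3: mean over SNPs of (o - a) * (o - b)."""
    products = [(o - a) * (o - b) for o, a, b in zip(freq_out, freq_a, freq_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.10, 0.50, 0.90, 0.30]
target = [0.40, 0.20, 0.60, 0.70]      # e.g., an admixed study population
ref_close = [0.45, 0.25, 0.55, 0.65]   # reference sharing more drift with target
ref_far = [0.15, 0.45, 0.85, 0.35]     # reference sharing less drift with target

scores = {
    "ref_close": outgroup_f3(outgroup, target, ref_close),
    "ref_far": outgroup_f3(outgroup, target, ref_far),
}

# The reference with the highest score is inferred to be the closest relative.
best = max(scores, key=scores.get)
print(best, scores)
```

Ranking reference populations by this score is what allows statements of the form "most closely related to population X, followed by population Y".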

Discussion
A major aim of this study was to consider Ecuadorian ethnic identity in the context of the genetic ancestry and origins of the people that make up the country's officially recognized ethnic groups. An emphasis was placed on previously understudied and historically marginalized groups, including Afro-Ecuadorian, Montubio, and Tsáchila populations. The genetic ancestry of the majority Mestizo ethnic group was considered in light of historical knowledge on Ecuadorian population dynamics and the cultural forces related to mestizaje and assimilation.
High Native American ancestry in Afro-Ecuadorians
Contrary to our expectations, Afro-Ecuadorians did not show a large majority of African ancestry, although it was the single largest ancestry component on average (49.5%; Table 1), and they did show a high overall level of Native American ancestry (35.9%; Table 1). Afro-Ecuadorians show substantially less African ancestry than seen for the African American and African Caribbean reference populations characterized here, and they have the highest level of admixture seen for any of the four Ecuadorian ethnic groups (admixture entropy = 1.48; Table 1). Even more strikingly, this population has a number of individuals who show very high levels of Native American ancestry with little or no African ancestry (Figures 1 and 2). To our knowledge, this has not been observed for any other Afro-descendant population in the Americas. 21-24,51-54 The high levels of Native American ancestry seen for Afro-Ecuadorians may reflect the historical legacy of the autonomous Afro-descendant communities established in Esmeraldas starting in the 16th century, which included both escaped slaves and members of local Indigenous communities. 10,11 This finding underscores the extent to which ethnic identity in Ecuador can serve as a marker of shared culture rather than common ancestry.
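Admixture entropy summarizes how evenly ancestry is distributed across source populations. The exact definition used for Table 1 is not given in this section; the sketch below assumes Shannon entropy in bits over three continental ancestry proportions, under which the maximum possible value is log2(3), approximately 1.58. The function name and the example proportions are hypothetical and are not intended to reproduce the values in Table 1.

```python
import math


def admixture_entropy(proportions):
    """Shannon entropy (in bits) of ancestry proportions that sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    # Components with zero ancestry contribute nothing to the entropy.
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# Hypothetical profiles: (African, European, Native American).
single_source = [1.0, 0.0, 0.0]     # no admixture
even_three_way = [1 / 3, 1 / 3, 1 / 3]  # maximally admixed
skewed = [0.8, 0.1, 0.1]            # one dominant ancestry

for name, profile in [("single", single_source),
                      ("even", even_three_way),
                      ("skewed", skewed)]:
    print(name, round(admixture_entropy(profile), 3))
```

Under this assumed definition, a population dominated by a single ancestry scores near 0, while a population with balanced contributions from all three ancestries approaches the maximum.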

Social construction of Mestizos in Ecuador
The Mestizo ethnic group shows an average of 66.1% Native American ancestry, which is higher than any other admixed American population studied here except for Peru. As with the Afro-Ecuadorians, there are a number of individuals who identify as Mestizo but have almost entirely Native American ancestry (Figures 1 and 2). After Bolivia (62%), Peru (24%) has the second-largest Indigenous population in South America, whereas the Indigenous Amerindian group in Ecuador makes up only 7% of the population. Thus, one may not expect to see such high levels of Native American ancestry in Ecuador's majority Mestizo population. The pattern we observe could be explained by historical records and sociological studies of Ecuador that indicate a relatively low founding immigrant population from Spain coupled with cultural forces that led many Indigenous people to adopt a Mestizo identity. 8,18,55 In much of Ecuador, Mestizo came to imply someone who was fluent in Spanish and who lived in and around urban centers, irrespective of their ancestry. Indigenous people who migrated to cities and learned Spanish would either be ascribed, or adopt, a Mestizo identity, and in so doing gain access to a broader, shared national identity. This cultural assimilation had the effect of marginalizing Indigenous identity and communities, while also leaving out Afro-Ecuadorians whose markers of ancestry were harder to ignore. Our genetic ancestry results are consistent with the social construction of a Mestizo identity in Ecuador that is distinct from ancestrally grounded Mestizo identities in other Latin American countries.

Montubio ancestry and admixture
The complex relationship between ethnic identity and ancestry in Ecuador is exemplified by the newly recognized Montubio ethnic group. Montubios live primarily in the coastal region of Ecuador (in the provinces of Manabí, Guayas, Los Rios, and El Oro), and their cultural identity is tied to a distinctly rural and agrarian lifestyle. Given their recent origins and recognition, this group stands out as a counterexample to the narrative that ethnic identities in Latin America were formed in the early colonial period, via the process of mestizaje, and have remained largely unchanged over centuries. Some scholars have claimed that Montubios are essentially another Mestizo group, with respect to mixed Spanish and Native American ancestry, albeit with a distinct cultural heritage. 7 However, other scholars have pointed to African contributions to Montubio ancestry and culture, consistent with their rural location in the province of Manabí, which also has a large Afro-Ecuadorian population. 56 Our results highlight the admixed nature of the Montubio population. The main ancestry component is Native American, followed closely by European ancestry, with a smaller but not insubstantial African component (Figures 1 and 2). Montubios show the second-highest overall level of admixture seen for any of the admixed American populations studied here (admixture entropy = 1.40; Table 1). The high levels of Native American ancestry in the Montubio are similar to what is seen for all other Ecuadorian ethnic groups and consistent with the relatively low number of European immigrants who contributed to the  (Figures 1 and 2). This same grouping can be seen for the Native American subcontinental ancestry of the Tsáchila (Figure 5A). There is a primary Native American ancestry component that is entirely unique to the Tsáchila (light red) and a secondary component that much more closely resembles the Native American ancestry of the Ecuadorian Mestizos (dark red).
The Tsáchila individuals studied here were sampled from the city of Santo Domingo de los Colorados located in the province of Santo Domingo de los Tsáchilas, which, as the name suggests, is the historical homeland of the group. Even the name of the city, "de los Colorados" (of the dyed), refers to the Tsáchila ethnic group and their custom of covering themselves in the red juices of achiote seeds to prevent smallpox infection. Thus, it may be the case that residents of Santo Domingo de los Colorados with Spanish and Native American ancestry, who may be expected to ethnically identify as Mestizo, identify as Tsáchila. This would be an interesting example of individuals from a relatively high-status majority group choosing to identify with a historically oppressed Indigenous group. On the other hand, Tsáchila ethnic identity may provide social advantages for Mestizo individuals who reside in their Indigenous homeland. The distinction between ancestry and ethnic identity among the Tsáchila of Santo Domingo de los Colorados also suggests the possibility that ethnic identity in Ecuador is strongly influenced by local geographic origins and culture.

Conclusions
The results reported here show how genetic ancestry covaries with ethnicity in Ecuador: Mestizos and Montubios are primarily admixed with Spanish and Native American ancestry, whereas the Tsáchila and the Afro-Ecuadorians show the highest levels of Native American and African ancestry, respectively. All four ethnic groups show evidence of sex-biased admixture with greater levels of male European and female Native American ancestry, consistent with the historical record. Nonetheless, we find the exceptions to these general trends to be the most interesting and revealing findings. We observed several unexpected patterns of genetic ancestry for different ethnic groups, which underscore the extent to which ethnic identity in Ecuador is shaped by both culture and ancestry. Mestizos show surprisingly high levels of Native American ancestry, when considered together with the size of the Indigenous population in the country, pointing to the role of language and cultural assimilation in the formation of this ethnic group. The Afro-Ecuadorians show the lowest levels of African ancestry, and the highest levels of Native American ancestry, seen for any Afro-descendant population in the Americas. This population includes a number of individuals with almost entirely Native American ancestry, pointing to the possibility of a distinctly African cultural identity for the region, shaped by its unique history.

Data and code availability
The Ecuador genotype data supporting the current study have not been deposited in a public repository because the data are not public. The data are available from the corresponding author on request.