The diverse genetic origins of a Classical period Greek army

Significance By studying genome-wide data from 54 individuals from eighth- to fifth-century Sicily, we gain insights into the composition of Classical Greek armies (ca. fifth c. BCE) and the populace of a Greek colony. The presence of mercenaries in Greek armies fighting in the Mediterranean, as early as 480 BCE and with origins as far away as northern Europe and the Caucasus, is absent from historical texts and thus so far underappreciated in ancient classical scholarship. Our interdisciplinary study both underlines the value of integrating genetic studies to complement archaeological and historical research and highlights the importance of warfare in facilitating continental-scale human mobility, cultural contact, and cooperation in the Mediterranean of the Classical period.

Greek tyrannies became increasingly prominent on Sicily, and a division emerged between Greek poleis opposed to, and allied with, Phoenicians; specifically, Carthage (17,18). In the early 5 th century, the tyrant of Himera, Terillus, opted for alliance with Carthage. When Terillus was ousted by Himerans in 483 BCE (Herodotus 7.165), Carthage agreed to aid him in regaining control, and in 480 BCE, dispatched a large force to retake the city. Faced with the threat of Carthaginian invasion in 480 BCE, Himerans successfully rallied outside forces, with specific mentions of soldiers coming from Syracuse and Agrigento (Herodotus 7.165-166; . This Greek alliance was successful in defending Himera in 480 BCE (Diod. 11.22; 10, 19).
An act of revenge, Carthage returned to Himera in 409 BCE with a large mercenary army (Diod. 13.59; (15, 20). Although Syracuse dispatched a relief force, a small number of whom aided Himera early in the battle, most Syracusan soldiers turned back out of concern for their own city's defense (Diod. 13.59-61; Xen. Hel. 1.1). Many citizens of Himera evacuated, and the remainder unsuccessfully attempted to defend their city (10) (Diod. 13.62; Xen. Hel. 1.1) (15,19). The Greek loss in the Battle of Himera in 409 BCE marked the destruction and abandonment of the polis (Diod. 13.62) (10).
Until recently, knowledge of the two Battles of Himera came from historical documents, many of which were written hundreds of years after they occurred (e.g., 13.62). In 2009, excavations of a necropolis near the ancient city of Himera revealed a series of mass graves that are likely associated with the Battles of Himera (15,21,22).
There were three cemeteries in use at the site of Himera: the East, the West, and the South (which remains unexcavated). The East necropolis was in use for the entire occupation of the colony, whereas the West necropolis was only in use during the 6 th and 5 th centuries BCE (23). Preliminary analyses of differences between the East and West necropoleis indicate that individuals interred in the East necropolis had lower δ13C and δ15N values and higher prevalence of skeletal pathology, suggesting that individuals in the East necropolis may have had differential access to resources, likely as a result of lower social status or due to changing access to resources through time (24).
Except for one individual excavated at the East necropolis, all skeletal remains included in this study are all from the West necropolis of Himera, including single interments of members of the civilian population, and multiple interments of individuals interpreted as casualties of the ancient battles in 480 BCE and 409 BCE (15) (Table S1). Mass graves #1-7 are associated with the Battle of 480 BCE. Objects from within the graves, and in other burials disturbed by the mass graves, place mass graves #1-7 in the late 6 th -early 5 th centuries. Mass grave #8/9, a continuous, large interment that represents a single mass burial (22), is associated with the Battle of 409 BCE and contains objects dating to the 5 th century. Mass graves #1-7 are consistently orderly with bodies in extended, supine position on an East-West axis. They range in size from 18 bodies (grave #1/2, which came to be recognized as a single interment after excavation had begun), to just two bodies (grave #7). In contrast, mass grave #8/9, which contains 69 bodies associated with the 409 BCE battle, is less orderly, with bodies in multiple layers and positioned to fit the grave rather than vice versa, suggesting hastier burial. Evidence suggesting the mass graves are casualties from the Battles of Himera and not, for example, victims of epidemics, include the fact that they comprise young-to mid-aged adult males, many have signs of perimortem trauma, including weapon wounds, and weapon points are embedded in some of the bodies (22). Their deliberate burial in the Greek city's own necropolis suggests these individuals are Greek, and not enemy, casualties (15). Retrieval and burial of corpses was of paramount social and religious importance in Greek society at this time (25). We know nothing of the manner in which members of the Carthaginian army were buried, but it was typical in Greek warfare for the victor to allow the enemy access to the battlefield to remove its dead (25,26). *Mass graves 8/9: 69 skeletons recovered in mass grave 8/9 but 88 bodies estimated based on taphonomic analysis: 5/88=6% Mass graves #1-4 and #5-7 are spatially segregated in Himera's West necropolis, forming two clusters in the western portion. All are along a N-S axis, with bodies oriented E-W. Mass graves #1/2, 3 and 4 each contain more than 10 individuals, whereas mass graves #5-7 each contain fewer than 10 individuals. Individuals in mass graves #1-4 tend to have partially abducted right humeri, evidence of having been dragged, whereas individuals in mass graves #5-7 do not (36). The individuals in mass graves #1-4 are also significantly younger than those in mass graves #5-7, and mass graves #5-6 contain grave goods, unlike the other 480 BCE graves (22). Mass grave #8/9 is the largest and densest of the mass graves, containing 69 individuals (27). Unusually for the necropolis, mass grave #8/9 is laid out on an E-W axis, and bodies within it are oriented in numerous positions, of which N-S is most common. An effort was made to sample more than one individual from every mass grave. Individuals were selected for analysis chiefly on the basis of skeletal completeness, prioritizing partial skeletons over highly fragmented remains. In 2016, only petrous pyramids were collected. If a petrous pyramid was identified separated from the rest of the skull, the entire petrous pyramid was collected. In 2017 the sampling strategy expanded to include teeth, which are more numerous than petrous pyramids at Himera. Teeth were chosen based on presence of intact enamel crowns and of roots, and presence of a remaining antimere (Datasets S1 and S2).
Sixty-three individuals from the mass graves were studied by the Bioarchaeology of Mediterranean Colonies Project in 2016 including demographic and paleopathological analyses. The 70 individuals that were not studied osteologically in 2016 almost always were represented by only poorly preserved bones or fragments that were not observable for skeletal stress markers. Of 63 individuals studied in 2016, approximately one-third had less than 15% of cranial bones present and three had no teeth. Therefore, the sample of soldiers included here (n=19) makes up approximately 14% of the total number of individuals documented in the mass graves (n=133), but approximately 30% of the skeletons available for osteological, isotopic and genetic analysis with sufficient cranial or dental remains. Although the number of individuals analyzed from every mass grave associated with 480 BCE is the same -two from each, excepting mass grave #1/2 for which four individuals were sampled -representativeness is higher for mass graves #5-7 by virtue of their smaller sizes (fewer than 10 individuals in each); for example, mass grave #3 contains 22 individuals and only two were genetically analyzed (9%), while mass grave #7 contains two individuals and both were analyzed (100%). The five individuals genetically analyzed from mass grave #8/9, associated with 409 BCE, represent approximately 7% of the total number of individuals (n=69 with identifiable remains and n=88 estimated taphonomically) and 55% of the total number studied osteologically in 2016. Members of the civilian sample were selected prioritizing partial skeletons over highly fragmented remains, and in an effort to include individuals buried in each of the two most common body positions at Himera: supine and flexed. In Sicily, it is unlikely that the two different body positions reflect ethnic identity and ancestry (see discussion in Shepherd (28)), but genetic evidence is lacking. Approximately 10,000 skeletons have been excavated from the West necropolis at Himera, meaning the 11 individuals from the civilian sample reported here are a preliminary, and not a representative, estimation of the genetic diversity of the population.
The following individuals from Himera provided genome-wide data -

Giulio Catalano and Luca Sineo
The sample of natives of Polizzello and Baucina (described below), respectively 6 th -5 th century BCE (Hellenized) and Iron Age, serve as a comparison to understand, with respect to the population of Himera, the indigenous size and the level of variability of the population. The site of Polizzello is located on a mountain 800 meters high just a short distance from Mussomeli (CL) in the central interior of Sicily. Archaeological excavations, started in the 1920s, have provided a long prehistoric and historic sequence from the Early Bronze Age to Medieval period (29). The human remains analyzed here were recovered in four rock-cut tombs excavated in 2004 by the Soprintendenza ai Beni Culturali e Ambientali of Caltanissetta. Typological analyses of burial style and grave inclusions indicate that the skeletal sample is attributed to the Iron Age period (900-700 BCE) when Polizzello was one of the most important indigenous Sicanian settlement in central-south Sicily (29). According to scholars, the site of Polizzello could be considered a proto-urban center characterized by a strong pastoral component. Archaeological surveys have attested the presence of Aegean trace in handicrafts and funerary architecture supporting the existence of contacts with Aegean-Cyprian cultures. Moreover, elements of Phoenicans derivation were also found. Together with these exotic features, Polizzello also developed a local production that seems to derive from previous Sicanian elements found in Mid-Late Bronze Age cultural phases of Thapsos and North Pantalica. The peculiar tradition found in Polizzello seems also the result of impermeability of Sicanian settlements of central-south Sicily to the penetration of Siculo-Ausonian and Italic elements (30). With the emergence of extensive contacts with Greek colonies probably started around the early 7th century BCE, a process of acculturation took place in western Sicily and the cultural uniqueness of Sicanian settlements was contaminated by Greek components.
Based on anthropological studies, the presence of at least 224 individuals was estimated, including adults and infants of both sexes. Palaeopathological analyses have revealed high percentages of cribra variants and dental enamel hypoplasia (31).
The osteological collection of Polizzello is housed in the Museum of Mussomeli (Caltanissetta).
Genome-wide data in this study came from 19 petrous bones:

Giulio Catalano, Luca Sineo and Britney Kyle
Together with Polizzelo, Baucina was chosen as a comparative sample for Himera because these are two of the most significant sites from which quantitative information (number of individuals) of ancient indigenous people was available. The necropolis is located on the Monte Falcone mountain (695 meters) near the modern town of Baucina. The site, excavated by the Soprintendenza and the University of Palermo several times, was a vast indigenous center between the 6 th century and above all the 5 th century BCE (32). Although dates for the skeletal assemblage at Baucina are not certain, with the exception of a few pieces of prehistoric pottery fragments and pieces of obsidian, the finds from the site all date to the 6 th -5 th c. BCE (33). Furthermore, the presence of Greek colonial pottery from the end of the 6th century BCE indicates contact with Greek settlements, likely Himera due to its proximity, and the presence of Punic amphora suggest commercial relations with nearby Solunto and Palermo. Thus, "Punic, Greek and indigenous traditions [are thought to have] coexisted" at the site, making it an important location to examine relationships between these groups (32). Simple burials and collective tombs of the type known as a 'grotticella' (chamber tombs) also show significant influences by the culture of the neighbouring Greek colony of Himera. The human skeletal remains found in the necropolis are estimated to derive from at least 50 individuals.
We use individuals with fewer than 50,000 1240K SNPs covered in exploratory analyses and in group-based analyses, but do not individually formally model them.

Exploratory analyses
We tested for pairs of related individuals using the software READ (38), and found that in our dataset only two IA individuals from Polizzello (I13378 and I13394) share a likely 2 nd degree relationship (Fig. S8). ANOVA between the different groups show significant differences in the variances of P0 (p=3.22E-8), and multiple one-tailed T-tests confirm this is due to the significantly lower mean P0 in the Sicily_IA group (Fig. S8B). This might suggest greater genetic homogeneity in this group compared to the inhabitants and combatants of Himera.
We merged the dataset above with 3291 modern-day individuals from 109 worldwide populations genotyped on the Affymentrix Human Origins (HO) array, 96 newly reported in this study (Dataset S3). We used the smartpca function of EIGENSOFT (39)  Most samples plot in PCA space together with Aegean and Sicilian LBA individuals with the remarkable exception of seven soldiers of the 480 BCE battle that plot together with Central Europeans, Northeastern Europeans, IA Steppe nomads and Caucasian populations, respectively. For the 10 low-coverage samples whose data intersects with fewer than 25,000 HO SNPs, 8 attributed to the civilian population of Himera and 2 480 BCE soldiers, we additionally plot ellipses representing the 95% jackknife confidence region (Fig. S12). The civilian samples overlap in their confidence distribution with individuals from the BA Aegean, Sicily_IA and Sardinian Punic individuals, and are inconsistent with populations from Western and Northern Europe, the Eurasian Steppes, the Caucasus, Near East or North Africa, suggesting that the genetic heterogeneity seen in the soldiers from the 480 BCE battle is due to their specific military context and not an effect of them being drafted from an already extremely heterogeneous local Himeran population. More subtle genetic heterogeneity in the Himeran civilian sample is nevertheless hinted at by the two higher coverage samples I20163, I20168 and I20166 whose 95% CI regions do not overlap, with the former two clustering more closely with Sicily_IA and the latter more closely with the BA Aegean. The 95% CI region of the low-coverage individuals I17870 and I17872 from the 480 BCE battle is consistent with modern-day individuals from Central Europe and the Caucasus, respectively.
We performed clustering using unsupervised ADMIXTURE (41), using all samples used in the worldwide PCA and ancient reference samples that overlap more than 50,000 HO SNPs. We pruned SNPs in linkage disequilibrium with one another with PLINK using the parameter --indeppairwise 200 25 0.4, which left us with 282,184 SNPs. We performed an ADMIXTURE analysis for values of k between 2 and 15, and carried out 5 replicates at each value of k. We retained the highest likelihood replicate at each k and mainly discuss results for k = 6, as it serves to visually discriminate the relevant ancestral components (Figs. 2B and S10).
Most individuals exhibit at k=6 three major components, maximized in WHG, EEF and CHG, respectively, with the proportions qualitatively similar to that seen in the PCA. Notably, the two individuals that cluster with IA Steppe nomads have two additional components, maximized in Han and Karitiana; these components also appear in almost all Steppe nomads in highly variable proportions. Some of the lower coverage civilian genomes exhibit a small proportion of the component maximized in the southern African Ju|'hoan which is also found as a minor component in the Punic individuals from Sardinia and Ibiza as well as Eastern Mediterranean and North African individuals. This might point towards some genetic influence through contact with the Phoenician/Punic sphere, although in principle it could also reflect statistical noise due to the lower coverage of these individuals. In the higher coverage civilian individuals this component also appears in I20163 (although not I20166 or I20168), increasing confidence that this is a true signal and not just an artefact of low coverage.

Genetic clustering and outlier detection
To determine suitable analysis groupings and assess genetic homogeneity within and between groups of individuals from the same chronological and archaeological contexts, we used the qpWave tool in ADMIXTOOLS (42) Fig. S13; we consider models with a p-value <0.01 as rejected. We confirm that cladality is not rejected for most pairs of individuals within a chronological and archaeological grouping, with the notable exception of 5 subgroups that can be identified within the soldiers of the 480 BCE battle. These subgroupings mirror genetic differentiation also seen in PCA and ADMIXTURE analyses. Genetic heterogeneity is furthermore evident within groups, as some pairwise models are rejected.
We then tested each individual for which pairwise qpWave tests with the others was rejected against the rest of the individuals grouped together, and treat the individual as a genetic outlier if this model is rejected, an approach to outlier detection previously used by Fernandes, et al. (43). We use 16 reference populations (P16) Mbuti.DG, Russia_Ust_Ishim_HG.DG, CHG, EHG, Spain_ElMiron, Czech_Vestonice16, Russia_MA1_HG.SG, Israel_Natufian, Jordan_PPNB, Turkey_N_Barcin, WHG, Iran_GanjDareh_N, Russia_Samara_EBA_Yamnaya, Morocco_EN.SG, Greece_Crete_BA and Russia_Shamanka_Eneolithic.SG, adding Greece_LBA in tests of groupings that chronologically post-date them. As f-statistics computed by qpfstats are expected to differ somewhat from the direct computation and we chose a different set of outgroup populations, we obtain in some cases different results here from those that have previously been published.
We additionally use the program qpfmv, which is a specialized implementation of qpWave that tests if every pair of samples Ii, Ij within an chronologically and archaeologically defined grouping is a clade with respect to the set of populations Pk, asking if all the pairwise f4 statistics f4(Ii, Ij ;O, Pk) are 0 up to noise, where O is an outgroup. In our tests we used Mbuti.DG as the outgroup and as reference populations the rest of the populations used with qpWave. qpfmv tests for rejection of the null hypothesis of homogeneity using a Hotelling T 2 statistic. This test indicates whether there remains any subtle genetic inhomogeneity in the grouping, even after outlier removal.
For the purpose of defining source populations in subsequent analyses, we reanalyze 17 previously published individuals (43): 11 individuals dated to the EBA, 2 from the MBA and 4 from the LBA:

Sicily_EBA
The qpfmv test (P16) rejects genetic homogeneity among all samples with a p-value <<0.01. In pairwise qpWave models based on the qpfstats output, using the same outgroups as in the original qpWave test without the chronologically later Greece_LBA, we identify 3 potential outliers for which cladality is rejected with other individuals (I3124, I8561, I11443; Table S2). We reran the test, grouping the other eight samples that formed a clade. In this test, cladality to the main grouping was not rejected for I3124 (p=0.058). Cladality was strongly rejected for I8561 and I11443 (p=14.3E-14 and 1.5E-42, respectively). We therefore treat these two individuals as genetic outliers from the Sicily_EBA group. Repeating the qpfmv analysis on the remaining group, homogeneity is rejected (p=6.4E-04), suggesting an inhomogeneous population overall which is consistent with the variation seen on PCA and ADMIXTURE and recent admixture. Table S2. P-values for the qpWave models between Sicily_EBA individuals. Rejected models with p<0.01 marked in red.

Sicily_MBA
In pairwise qpWave models (P16), we cannot reject cladality between the two samples (p=0.868). The qpfmv test cannot reject genetic homogeneity (p=0.8) and we group both individuals in further analyses.

Sicily_LBA
In the qpWave test (P16+Greece_LBA), we detect one sample I10371 for which pairwise qpWave models result in rejection (Table S3). Comparing this sample against the group of other three samples that form a clade, we find that cladality is still rejected and this individual represents a genetic outlier (p=1.85E-03). Before outlier removal, the qpfmv test rejects genetic homogeneity with p=1.01E-04. After removing I10371, we cannot reject homogeneity for the remaining 3 samples (p=0.077). We therefore treat I10371 as an outlier. For the following groups we report new data:

Sicily_IA
This group comprises 21 individuals from two Sicilian IA sites. They cluster closely together on PCA space also occupied by most Sicilian MBA and LBA individuals (Figs. 2A and S11), and are similar in their proportions of genetic components determined by ADMIXTURE (Fig. 2B). Two individuals (I13125 and I13382) have evidence of genetic heterogeneity in that pairwise qpWave models (P16+Greece_LBA) with respectively two or more other individuals in this cluster are rejected (p-values <0.007; Table S4). When we model these individuals as clades with the combined groups of the other individuals, the models for I13125 and I13382 are not rejected (p=0.0325 and 0.0165, respectively) and so we subsequently group all Sicily_IA individuals. However, the qpfmv test rejects genetic homogeneity among all samples with a p-value <<0.01, and the result remains after removing the two potential outliers. This suggests the presence of subtle genetic differences between the samples, that qpfmv is able to pick up due to joint analysis of all the individuals.

Sicily_Himera_480BCE_1
Pairwise qpWave tests (P16+Greece_LBA) confirm the extreme genetic variation in the soldiers of the 480 BCE battle (Table S5), already shown in PCA and ADMIXTURE (Figs. 2, S2 and S11) and as expected genetic homogeneity is rejected by qpfmv (p<<0.01). One pair of individuals (I10945/W0494 and I10948/W2587) that falls within the "Aegean-like" cluster appears inconsistent with forming a clade in qpWave with a p-value of 6.13E-04. However, these individuals are each cladal with the combined groups of the other individuals (p=0.0253 for I10945/ W0494 and p=0.936 for I10948/W2587) and so we include them in the genetic analysis grouping of "Aegean-like" individuals which we refer to as Siciliy_Himera_480BCE_1. However, the presence of subtle inhomogeneity within this group is suggested by qpfmv, with a p-value of 1.14E-3.  (Fig. 2B). This pair of individuals (I10946/W1771 and I10950/W0814) forms a clade in the qpWave model with a p-value of 0.869 to the exclusion of any other pairwise models formed with other 480 BCE soldiers (Fig. S13).

Sicily_Himera_480BCE_3
This group comprises two individuals archaeologically identified as soldiers of the 480 BCE battle that cluster on PCA closest to ancient Eastern Baltic populations (Figs. 2A and S11), and have like these the highest proportion of the component maximized in WHG in ADMIXTURE compared to other contemporaneous groups (Fig. 2B). This pair of individuals (I10943/W0396 and I10949/W0403) forms a clade in the qpWave model with a p-value of 0.256, to the exclusion of any other pairwise models formed with other 480 BCE soldiers.

Sicily_Himera_480BCE_4
This group comprises two individuals archaeologically identified as soldiers of the 480 BCE battle that cluster on PCA closest to ancient nomadic populations of the Western Steppe (Figs. 2A and S11), and also carry the component maximized in modern-day Han in ADMIXTURE that is found in varying proportions in most ancient Steppe nomads (Fig. 2B). This pair of individuals (I10944/W0461 and I10947/W1774) forms a clade in the qpWave model with a p-value of 0.572, to the exclusion of any other pairwise models formed with other 480 BCE soldiers.

Sicily_Himera_480BCE_5
A single individual archaeologically identified as a soldier of the 480 BCE battle clusters on PCA closest to ancient and modern populations from the Caucasus and Iran (Figs. 2A and S11), and also carries the highest proportion of the genetic component maximized in CHG in ADMIXTURE compared to the other ancient Sicilians (Fig. 2B). This individual (I10951/W0653) is not cladal with any of the other 480 BCE soldiers according to the qpWave models and thus is treated as his own group.

Sicily_Himera_409BCE
This group comprises five individuals archaeologically identified as soldiers of the 409 BCE battle that cluster on together with LBA Aegeans and the soldiers of Himera Group 1 (Figs. 2A and S11), and are similar in their proportions of genetic components determined by ADMIXTURE (Fig. 2B). All pairs of individuals appear cladal in our qpWave models (p>0.05; Table S6) and homogeneity is not rejected by qpfmv (p=0.120) and we group them together for further analyses.  After we established genetic groups and outliers, we investigated their genetic affinities and ancestry with outgroup f3-statistics of the form f3(Sicilian group, X; Mbuti.DG) to measure shared genetic drift between Sicilian groups and ancient populations X relative to the outgroup Mbuti.DG. We show the results in Fig. S16.

Ancestry modeling with qpAdm
We used qpAdm (42) with the parameter allsnps: YES with precomputed f-statistics from qpfstats to estimate proportions of ancestry in each individual (Dataset S5) as well as in the established analysis groupings (Table S8)  We confirm the appearance of an Iran-related ancestry component in the MBA (43) that is not present to the same extent or at all in the previous EBA and makes up 9-13% of the group's ancestry (p=5.13E-02; Table S8). The model of this group involving the least sources does not require input from Russia_Samara_EBA_Yamnaya, although EBA requires 7-10% of this ancestry. The later Sicily_LBA can be modeled (p=8.36E-02) using only the three sources Turkey_N_Barcin (83.9 ± 1.6%), WHG (5.4 ± 1.6%) and Russia_Samara_EBA_Yamnaya (10.6 ± 2.1%). Table S8. Admixture proportions estimated by qpAdm for the most parsimonious models with distal sources. The sources used are Turkey_N_Barcin (1), WHG (2), Iran_GanjDareh_N/CHG (marked with †) (3), Russia_Samara_EBA_Yamnaya (4), and Russia_Shamanka_Eneolithic.SG (5).
As more proximal ancestry sources in the qpAdm analysis, we used diverse Eneolithic to IA populations that appear to have affinity to the analysis groups informed by the results of PCA, ADMIXTURE and outgroup f3-statistics, using the outgroup set P11 and adding additional outgroups Turkey_N_Barcin, WHG, Iran_GanjDareh_N, Russia_Samara_EBA_Yamnaya and Sicily_EBA. We examine all possible 1-, 2-and 3-way models and consider models valid if their fit was p>0.01and report the most parsimonious model that uses the least sources. In cases where several models from the most parsimonious rank are not rejected, we apply a "model competition" approach in which we add all unused sources to the outgroup set P16.
The only model that was not rejected was a 1-way model with Lithuania_BA as the source (p=0.129). This group, comprising 4 Eastern Baltic individuals dated to the 13 th to 7 th century BCE, carries a greater proportion of WHG and EHG ancestry than other contemporary European populations, but lacks the Siberian-related ancestry found in this region in later times and roughly contemporaneously further north (44,45).

Sicily_Himera_480BCE_4
This group yields no feasible models when using only the common 4 distal sources for European populations. Due to the position on the PCA clustering with IA steppe nomads and the additional ancestry component maximized in East Asians (Fig.  2B), we moved Russia_Shamanka_Eneolithic.SG to the sources, accounting for any East Asian-related ancestry also seen in varying proportions in contemporaneous Steppe nomads (46,47), resulting in models that are not rejected. The individuals can be modeled as 20-28% Turkey_N_Barcin, 59-63% Russia_Samara_EBA_Yamnaya and 12-16% Russia_Shamanka_Eneolithic.SG (Dataset S5). As a group, the best-fitting model (p=6.19E-01;  (Table S12). We find a working 1-way model in TianShan_IA_Nomadic_o2 (p=1.53E-01; Table S12), a genetic outlier who appears to carry more Western Eurasian ancestry than the main cluster of contemporaneous nomads sampled from the TianShan region. Since this group is only represented by a single individual, we also report working 2-way models, which include the Mediterranean sources or groups known to have more western Eurasian affinity. We investigate the possible source of western Eurasian admixture further by running the competitive modeling approach. While none of the models return a p-value>0.01, the 2-way models with the highest pvalue (p>3.86E-04), derive ~88% of ancestry from CentralSteppe_IA_Nomadic_Steppe (i.e. individuals associated with nomadic IA cultures excavated in the Central region of the Eurasian steppes, who genetically resemble BA Steppe pastoralists with added East Asian/Siberian gene flow), and ~12% from an Aegean source (represented by Greece_LBA or Sicily_Himera_480BCE_1), consistent with two 3-way models with the highest p-values (1.19E-03 and 1.27E-03, respectively) that require, in addition to ~79% CentralSteppe_IA_Nomadic_Steppe and ~13-15% of an Aegean source, around 6-7% ancestry deriving from CentralSteppe_IA_Nomadic_EA1 (Table S12). The rejection of these models suggests that while a western Eurasian, and possibly Mediterranean source is plausible, we have not used the exact contributing source in our models. This admixture between groups from the steppe and western Eurasia could have taken place in the steppe, since IA Steppe nomads such as the Scythians were genetically very homogeneous and also included individuals of mostly Aegean ancestry (47,48); furthermore, Aegean migrants living in Greek colonies on the shores of the Black Sea are likely to have been in contact with the neighboring nomadic populations. Using more proximal sources of ancestry, we tested 1-, 2-, and 3-way models of Sicily_IA, Greece_Crete_BA, Greece_LBA, Turkmenistan_IA.SG, Armenia_EBA, Armenia_MBA.SG, Armenia_MBA, Armenia_LBA.SG, Iran_Hasanlu_IA.SG, CentralSteppe_IA_Nomadic_Steppe, TianShan_IA_Nomadic, TianShan_IA_Nomadic_o3, CentralSteppe_IA_Nomadic_o, Turkey_IA_o3.SG, WesternSteppe_IA_Nomadic_intermediate, Tajikistan_Ksirov_Kushan and Iran_IA_HajjiFiruz. A single 1-way model with Armenia_MBA as the source was valid (p=0.293).
I20163/W1838 can be fit by four valid 2-way models, most involving a Sicilian source. The second source is either Spain_IA, Israel_MLBA_Canaanite, Italy_Sardinia_IA_Punic_1 or Italy_Sardinia_C_o. The inclusion of the latter three, a Levantine group, Punic individuals with variable North African and European ancestry, and a Chalcolithic Sardinian with North African ancestry, indicate that this individual is the descendant of local Sicilians who also harbors Levantine or North African ancestry -around 9-15%, if Italy_Sardinia_C_o is representative of unadmixed ancestry of this type. The model with the highest p-value gives 61.6 ± 8.7% Sicily_IA and 38.4 % ± 8.7% Italy_Sardinia_IA_Punic_1 (p=0.295). In proximal 1-way models the individual is cladal with preceding Sicilian groups Sicily_MBA, Sicily_LBA and Sicily_IA (Table S15), consistent with its position on the PCA and ancestry components inferred by ADMIXTURE (Fig. 2).

The relationship between the civilians and soldiers of Himera
In most Greek cities the army was composed of citizens who could furnish themselves with armor. The results of the qpAdm analysis, as well as the PCA and ADMIXTURE analyses, which involve low coverage individuals, indicate a higher genetic diversity among the civilians than among Aegean-related soldiers (the groups Sicily_Himera_480BCE_1 and Sicily_Himera_409BCE), although not as high as among the genetic outliers of the 480 BCE battle, who likely included foreign mercenaries. To compare the genetic diversity between the groups of Aegean-related soldiers (including both battles) and the civilians of Himera, we used the software PAST 4.04 (49) to conduct a one-way PERMANOVA (50) on the coordinates of the first two PCs, and show that the centroids of the two groups are not significantly different in their position (p=0.218; 9999 permutations, Euclidean similarity index; Fig. S15). However, the variance within groups, represented by the Euclidean distances of points to the centroid of their respective group, is significantly higher in the civilian sample (p=2.51-05; t-test, Exact permutation; Fig. S15). On face value, this suggests that the soldiers might be a less diverse subset of the civilian population, in accordance with the historical hypothesis. However, as the scatter in the civilian sample could be due to noise created by the low coverage of the majority of these individuals, we repeated the tests with 5 iterations of randomly downsampling each soldier to the coverage of one of the civilian individuals (and dropping one soldier to match the number of civilian samples available). After downsampling, the centroids of the soldiers remain not significantly different from the centroid of the civilian group (p>0.140; 9999 permutations, Euclidean similarity index), while the difference in variance becomes non-significant in 4 of 5 cases (p>0.209 in 4 cases, and p=0.024 in one case; t-test, Exact permutation; Fig. S15). This demonstrates how high missingness can make a population appear more genetically heterogeneous than it really is, when only taking PCA into account. However, qpAdm results (see above) show that at least one of the civilian individuals, I20163/W1838, has an ancestry composition distinct from what we see in any of the soldiers. Overall, the results do not contradict the hypothesis that the general populace comprised a more diverse ancestry than that represented by Aegean-related soldiers who best represent the descendants of Greek settlers. The possibility that the citizen-soldiers who defended Himera against attack in 409 BCE represent a subset of the population more closely related to "Greek" founders than other citizens warrants further investigation.

Local Sicilian ancestry in the Aegean-related soldiers
We find that for each of the battles the largest groups of soldiers (Sicily_Himera_480BCE_1 and Sicily_Himera_409BCE) are primarily related to the Aegean BA. This inference is based on according to their clustering with Greece_BA on the PCA, similar proportions of genetic components in ADMIXTURE analysis, and valid qpAdm models involving Aegean sources. However, these groups do not seem completely genetically homogeneous in the mentioned analyses, as well as in the qpfmv result for Sicily_Himera_480BCE_1, and valid qpAdm models involve also non-Aegean sources (Table S10). It is possible that the heterogeneous nature of the groups can be explained by genetic substructure in their ancestral Aegean populations, as well as admixture of their ancestors with non-Aegean groups such as local Sicilians. This could be indicated by the position of three of the Sicily_Himera_409BCE individuals shifted more closely to indigenous Sicilian groups on the PCA. Indeed, we do not expect both groups to share the same recent population history, as, according to historical sources, the 480 BCE battle involved supporting armies from other Greek poleis, while in the 409 BCE battle Himera was defended by her own inhabitants with no outside support.
Grouped modeling with proximal sources as performed previously might obscure slight differences in individual ancestry. As a simple model, we might interpret the position of the Himeran soldiers on the PCA as a genetic cline between an Aegean-related and an indigenous Sicilian-related source, in which each individual has slightly differing proportions of the two different ancestries. Local admixture between the communities that inhabited the same island and undoubtedly interacted with each other for decades or even centuries is expected. We therefore first model each individual in Sicily_Himera_480BCE_1 and Sicily_Himera_409BCE as a 2-way mixture between Greece_LBA and a local preceding or contemporaneous indigenous Sicilian group; here we use Sicily_LBA and Sicily_IA. We first confirm that we can distinguish the source proxies with our chosen outgroups (P16 + Greece_Crete_BA, Armenia_LBA.SG, Israel_MLBA_Canaanite, Italy_Sardinia_C_o, Sicily_Himera_Civilians_I20163, Sicily_Himera_Civilians_I20166, Sicily_Himera_Civilians_I20168) using qpWave (Table S16). All models of cladality are rejected with p-values <<0.01. We ran qpAdm by adding all unused sources in a given model to the outgroups. In Table S17, we report the results for all 2-way models using Greece_LBA and Sicily LBA or IA as sources. When these are all rejected for a given individual or result in coefficients and standard errors consistent with no ancestry deriving from one of the sources, we additionally test feasible 1-way models. We

Runs of homozygosity and population size
We used the hidden Markov model algorithm implemented in the software package hapROH (51) to detect ROH, which is possible for samples covered at more than 400,000 SNPs ( Fig. S14 and Table S18).
Fewer than half of the individuals exhibit any ROH, indicating a large effective population size (51). Endogamous practices are unlikely, as we find no evidence of recent inbreeding between close relatives, and most individuals do not carry ROH over 4 cM in length. However, in all groups except Sicily_Himera_480BCE_2 and Sicily_Himera_409BCE some of the individuals carry a low amount of ROH in the length range of 4-20 cM. This could signify consanguinity between very distantly related parents which can happen by chance, especially if no cultural institutions are in place to prohibit unions between even distant relatives, or if genetic substructure existed among the population of Sicily.  Sicily_IA Sicily_IA 0 0

Phenotypic analysis
We used the HIrisPlex-S system (52)(53)(54) to predict the pigmentation of individuals included in our study. Individuals with at least one SNP of the system with data could be submitted to HIrisPlex-S for phenotype prediction, and discussion is limited to the individuals for which the system returned inference probabilities for the different phenotypes (n/a was returned if the SNPs were insufficient for a prediction to be made).
We also screened the data for the derived alleles of MCM6 at locus rs4988235 and EDAR at locus rs3827760. We find the presence of the derived alleles in two individuals.
We find 4 in 6 of the sequences covering the MCM6 locus in the individual I10943/W0396 carry the derived allele, indicating that the individual is heterozygous. MCM6 is a regulatory gene that affects the lactase LCT gene and rs3827760 (A>G) is responsible for lactase persistence, the ability to digest milk into adulthood, in the majority of Europeans with this trait (56,57). It is hypothesized that this allele originated among Steppe pastoralists and spread in to Europe with their migration at the transition from Neolithic to Bronze Age (58). The lactase persistence trait was under positive selection and the frequency of the allele started to increase rapidly towards the end of the Bronze Age (59). In the eastern Baltic, a sharp increase is already seen at the end of the Late Neolithic with the variant reaching a frequency over 50% in the 1 st millennium BCE (44,45), while the variant remained less common further south in Europe and its distribution still follows a decreasing north to south cline to this day (60). This observation of the derived allele in I10943/W0396 agrees with his genetic ancestry, with the greatest affinity to BA and IA Eastern Baltic groups. Notably, this individual also has an uncommon physical appearance among the studied samples, with Hirisplex-S estimating highest probabilities for a combination of blond hair, blue eyes and pale skin. This phenotype also rose in frequency in the Eastern Baltic in the 1 st millennium BCE (44,45).