A global analysis of Y-chromosomal haplotype diversity for 23 STR loci

In a worldwide collaborative effort, 19,630 Y-chromosomes were sampled from 129 different populations in 51 countries. These chromosomes were typed for 23 short-tandem repeat (STR) loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATAH4, DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643) using the PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI). Locus-specific allelic spectra of these markers were determined and a consistently high level of allelic diversity was observed. A considerable number of null, duplicate and off-ladder alleles were revealed. Standard single-locus and haplotype-based parameters were calculated and compared between subsets of Y-STR markers established for forensic casework. The PPY23 marker set provides substantially stronger discriminatory power than other available kits but at the same time reveals the same general patterns of population structure as other marker sets. A strong correlation was observed between the number of Y-STRs included in a marker set and some of the forensic parameters under study. Interestingly, a weak but consistent trend toward smaller genetic distances resulting from larger numbers of markers became apparent.


Introduction
Early Y-chromosomal short-tandem repeat (STR) markers used in forensic practice were either discovered in cloning experiments [1,2] or retrieved in silico from the Genome Database (GDB) [3]. These markers include, for example, the nine loci constituting the 'minimal haplotype' (MHT) marker set [4], which still forms the core of all Y-STR kits in current forensic use but at the same time represents a rather heterogeneous and somewhat random choice of markers with different population genetic properties. Meanwhile, the complete euchromatic region of the human Y chromosome has been sequenced [5] and, with the human reference sequence at hand [6], a more systematic search for potentially useful Y-STRs became feasible. Thus, a recent study by Ballantyne et al. [7] identified 167 novel Y-STRs and combined the 13 with the highest mutation rates into a set of so-called "rapidly mutating" (RM) markers. The same study also revealed that between 50% and 100% of pairs of related men (at most 20 meioses apart) can be resolved by at least one mutation of these RM Y-STRs. Such results indicated that a low level of haplotype sharing between patrilineal relatives is characteristic of combinations of RM Y-STRs in general, thereby overcoming a limitation of Y-STR typing of forensic evidence. However, the multi-copy structure of some of the most mutable Y-STRs renders genotyping difficult and often unreliable, so that the RM approach has not yet become fully integrated into forensic casework.
The PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI) is a five-dye Y-STR multiplex designed for genotyping male samples at 23 loci. It is intended to be used in forensic casework, kinship analysis and population genetic studies. Advantageous features such as short fragment length and an uninterrupted repeat structure were taken into account when constructing the kit. Six new markers (DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643), two of which (DYS570 and DYS576) are categorized as "rapidly mutating" [7], were added to an existing panel of 17 markers already contained within the Yfiler kit (Yfiler, Life Technologies, Foster City, CA). The first studies employing the new PPY23 kit revealed a markedly increased haplotype diversity and discriminatory power in comparison to other marker sets, including the MHT, SWGDAM (recommended by the US-based Scientific Working Group for DNA Analysis Methods, www.swgdam.org), PowerPlex Y12 (PPY12) and Yfiler panels [8-10]. Here, we present a much more comprehensive analysis of almost 20,000 Y-chromosomes, sampled from 129 populations in 51 countries worldwide and genotyped between September 2012 and June 2013. The gain in information for forensic casework provided by the PPY23 panel was assessed and compared to that of the Yfiler, PPY12, SWGDAM and MHT panels. Possible population differences [11] were determined based on genetic distances between single populations as well as between continental groups. All haplotype data used in the study are publicly available at the Y Chromosome Haplotype Reference Database (YHRD) website (www.yhrd.org).

DNA sample collection
Between September 2012 and June 2013, a total of 19,630 Y-STR haplotypes were compiled in 84 participating laboratories. In particular, unrelated males were typed from 129 populations in 51 countries worldwide (Fig. 1; Table S1 and Fig. S1). Most of the samples had been typed before for smaller marker sets, mostly the Yfiler panel (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and GATAH4), and the corresponding haplotypes had been deposited in YHRD. All samples were now also typed for the full PPY23 panel (17 markers in Yfiler plus the loci DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643), and samples from 40 populations were typed completely anew. The YHRD accession numbers of the 51 populations are given in Supplementary Table S2. DNA samples were genotyped following the manufacturer's instructions [12], with occasional adaptation to prevailing laboratory practice. Populations were placed into five groups ('meta-populations') according to either (i) continental residency (445 African, 3458 Asian, 11,968 European, 1183 Latin American, 2576 North American) or (ii) continental ancestry, defined as the historical continental origin of the source population (1294 African, 3976 Asian, 12,585 European, 558 Native American, 1217 Mixed American) (Table S2).

Quality control
Each participating laboratory passed a quality assurance test that is compulsory for all Y-STR studies to be publicized by, and uploaded to, YHRD. In particular, each laboratory analyzed five anonymized samples of 10 ng DNA each, using the PowerPlex Y23 kit. The resulting profiles were evaluated centrally by the Department of Forensic Genetics at the Charité - Universitätsmedizin Berlin, Germany. All haplotypes previously uploaded to YHRD were automatically aligned to the corresponding PPY23 profiles and assessed for concordance. Plausibility checks, including the allelic range and the occurrence of intermediate alleles, were performed for the six novel loci (i.e. DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643).

Forensic parameters
Forensic parameters were calculated for all samples (n = 19,630) and for all 23 markers of the PPY23 kit. To this end, DYS389II alleles were encoded by the difference, henceforth labeled DYS389II.I, between the total repeat number at DYS389II and the repeat number at DYS389I. DYS385ab haplotypes were treated as single alleles, thereby ignoring the internal order of their two component alleles. Forensic parameters were calculated for the study as a whole and for meta-populations defined according to the continental or ethnic origin of the samples (see above).
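The two encoding conventions described above can be illustrated with a minimal Python sketch. The function name `encode_haplotype`, the dictionary layout and the example repeat numbers are hypothetical and chosen only for illustration; they are not taken from the study's actual pipeline.

```python
def encode_haplotype(raw):
    """Return a copy of `raw` using the DYS389II.I and unordered DYS385ab encodings.

    `raw` is a hypothetical dict mapping marker names to repeat numbers;
    DYS385ab is given as a pair of repeat numbers.
    """
    encoded = dict(raw)
    # DYS389II.I = total repeat number at DYS389II minus repeat number at DYS389I
    encoded["DYS389II.I"] = raw["DYS389II"] - raw["DYS389I"]
    del encoded["DYS389II"]
    # Sort the two DYS385 alleles so that (14, 11) and (11, 14) count as the same allele
    encoded["DYS385ab"] = tuple(sorted(raw["DYS385ab"]))
    return encoded


# Illustrative (made-up) input values
example = {"DYS389I": 13, "DYS389II": 29, "DYS385ab": (14, 11)}
print(encode_haplotype(example))
```

Sorting the DYS385ab pair is one simple way to ignore the internal order of the two component alleles; any order-independent representation would serve the same purpose.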
In particular, allele frequencies and haplotype frequencies were estimated using the counting method. Single-marker genetic diversity (GD) was calculated as $GD = n\left(1 - \sum_i p_i^2\right)/(n - 1)$, following Nei [13,14], where $n$ and $p_i$ denote the total number of samples and the relative frequency of the i-th allele, respectively. Haplotype diversity (HD) was calculated analogously to GD. Match probability (MP) was calculated as the sum of squared haplotype frequencies. The discrimination capacity (DC) was defined as the ratio between the number of different haplotypes and the total number of haplotypes. To benchmark the practical utility of the PPY23 panel for forensic casework, all haplotype-based analyses were repeated for various subsets of Y-STRs, namely the MHT (9 loci), SWGDAM (11 loci), PPY12 (12 loci) and Yfiler marker panels (17 loci). The Yfiler and PPY23 panels also were compared to one another after confining both panels to Y-STRs with an amplicon length <220 bp.
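As a minimal sketch of how these parameters follow from the counting method, the Python functions below compute GD/HD, MP and DC from a list of observed alleles or haplotypes. The function names, the `project` helper for restricting haplotypes to a marker subset, and the toy data are illustrative assumptions, not the software used in the study.

```python
from collections import Counter


def diversity(items):
    """Nei's diversity n * (1 - sum(p_i^2)) / (n - 1); gives GD for alleles, HD for haplotypes."""
    n = len(items)
    sum_sq = sum((count / n) ** 2 for count in Counter(items).values())
    return n * (1 - sum_sq) / (n - 1)


def match_probability(haplotypes):
    """MP: sum of squared relative haplotype frequencies."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def discrimination_capacity(haplotypes):
    """DC: number of different haplotypes divided by the total number of haplotypes."""
    return len(set(haplotypes)) / len(haplotypes)


def project(samples, markers):
    """Restrict each sample (a dict marker -> allele) to a marker subset, as a hashable tuple."""
    return [tuple(sample[m] for m in markers) for sample in samples]


# Toy data (made up): four samples typed at two markers
toy = [
    {"DYS19": 14, "DYS390": 24},
    {"DYS19": 14, "DYS390": 23},
    {"DYS19": 15, "DYS390": 24},
    {"DYS19": 14, "DYS390": 24},
]
full = project(toy, ["DYS19", "DYS390"])
print(diversity(project(toy, ["DYS19"])))   # single-marker GD
print(diversity(full), match_probability(full), discrimination_capacity(full))
```

Repeating the haplotype-based calculations for a smaller panel then amounts to calling `project` with that panel's marker list before computing HD, MP and DC.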

Population differences
The extent of population genetic structure in our data was assessed by means of analysis of molecular variance (AMOVA). More specifically, genetic distances between groups of males were quantified by $R_{ST}$, thereby taking the evolutionary distance between individual Y-STR haplotypes into account [15,16]. The DYS385ab marker was not included in the AMOVA because it does not allow easy calculation of evolutionary distances. Samples carrying a deletion, a null allele, an intermediate allele (i.e. an incomplete repeat unit), a duplication or a triplication at one or more markers were excluded from the AMOVA (n = 705, 3.6%), leaving 18,925 haplotypes for analysis (Supplementary Table S2). $R_{ST}$ values resulting from continental grouping were compared among the PPY23, Yfiler, PPY12, SWGDAM, and MHT panels. Multidimensional scaling (MDS) analysis served to visualize differences in Y-STR genetic variation between populations and was based upon pairwise linearized $R_{ST}$ values for PPY23, that is, $R_{ST}/(1 - R_{ST})$. MDS is commonly used to investigate genetic similarities between populations and has been described in detail elsewhere [17]. First, MDS analyses were performed for one to 10 dimensions considering either all 129 populations or the 68 European populations alone. For each population set and each dimensionality, a stable MDS solution was obtained by iteratively reducing Kruskal's stress value [17,18] until it remained nearly unchanged (i.e. until the ratio of consecutive stress values was ≥ 0.99). The optimal dimensionality then was determined for each population set by a visual 'scree' test.
All analyses were performed using R statistical software v2.15.3 [19] or Arlequin v3.5.1.2 [20], as appropriate. In particular, Arlequin was employed to estimate $R_{ST}$ values and for randomization-based significance testing of genetic distances (10,000 replicates per comparison) [20]. Covariance components (i.e. percentages of variation) associated with different levels of geographic grouping were tested for statistical significance using a non-parametric permutation approach described by Excoffier et al. [15] (10,000 replicates). For MDS, R package vegan v.2.0-10 was used [21]. Geographic maps were generated in R using packages maps v.2.3-6 [22] and mapdata v.2.2-2 [23]. The latter is based upon an amended version of the CIA World Data Bank II. To perform spatial interpolation, we estimated the spatial model using random Gaussian fields, while conventional kriging was used for interpolation, as implemented in the likfit and krige.conv functions of the geoR package v1.7-4 [24,25].
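The linearization of pairwise distances and the stress-based stopping rule used for MDS can be sketched as follows. This is a plain Python illustration rather than the R/vegan code used in the study; the function names are hypothetical, and the direction of the comparison in `converged` (ratio of consecutive stress values at or above 0.99) is an assumption about how the criterion was applied.

```python
def linearize(rst):
    """Linearized genetic distance R_ST / (1 - R_ST) for a single pairwise value."""
    return rst / (1.0 - rst)


def linearize_matrix(matrix):
    """Apply the linearization element-wise to a square matrix of pairwise R_ST values."""
    return [[linearize(value) for value in row] for row in matrix]


def converged(previous_stress, current_stress, threshold=0.99):
    """Assumed stopping rule: stress is 'nearly unchanged' once current/previous >= threshold."""
    return current_stress / previous_stress >= threshold


# R_ST = 0.009 is the North vs. Latin America value reported for PPY23 in the text
print(linearize(0.009))
print(converged(previous_stress=0.120, current_stress=0.1195))  # hypothetical stress values
```

For small distances the linearized value is almost identical to $R_{ST}$ itself, so the transformation mainly stretches the large African versus non-African distances.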

Single-locus analysis
A high level of genetic diversity was observed in our study at all 23 Y-STRs of the PPY23 panel. Some 521 different alleles were observed in the 19,630 Y-chromosomes analyzed, with a median number of 16 alleles per marker and a range of 10 (DYS391) to 31 (DYS458; Table S3). Of the 17 markers in common with the Yfiler kit, DYS385ab (GD = 0.923) on the one hand, and DYS391 (GD = 0.521) and DYS393 (GD = 0.534) on the other, marked the extremes of the GD distribution (Table S4). Four of the six PPY23-specific markers, namely DYS481, DYS570, DYS576 and DYS643, ranked near the top, with GD values exceeding 0.72. Notably, some loci ranked differently with respect to GD in different continental (Fig. 3b) or ancestry groups (Fig. S2), most prominently with regard to the African meta-population (Table S4). For example, the DYS390, DYS438 and DYS392 markers were found to be less variable in Africa than in Europe. Of the six PPY23-specific markers, all but DYS643 showed similar GD values on most continents. The DYS643 marker was found to be more variable in Africans, but less variable in Native Americans from Latin America, than in the other continental groups (Fig. S2).

Variant and off-ladder alleles
Variant alleles not representing simple repetitions of the respective STR motif occurred at all loci (Table S3). These included null alleles, likely due to a deletion or primer site mutation, intermediate alleles comprising fractional repeats, and copy-number variants such as duplications and triplications of the whole locus. All variant alleles were confirmed by retyping or sequencing at the laboratory that had performed the original STR typing. The proportion of variant alleles differed greatly among loci (Table S3).
Some 75 different intermediate alleles occurred across 18 Y-STR loci and were seen in 550 samples (Table S3). The structure of allele 11.1 at the DYS643 marker (observed in 11 samples in our study) has been reported previously [26] and is already included in the PPY23 allelic ladder.
A total of 133 null alleles were observed at 17 loci (Table S3), which corresponds to an overall frequency of 0.03%. The DYS448 locus showed the highest number of null alleles (n = 59), followed by the PPY23-specific markers DYS576 (n = 14), DYS481 (n = 11) and DYS570 (n = 11). In nine samples, a large deletion was detected at Yp11.2 that encompassed the AMELY region and removed four adjacent loci (DYS570, DYS576, DYS458 and DYS481). All these samples were of Asian ancestry, namely Indians from Singapore, Tamils from Southern India and British Asians with reported origins in Pakistan or India, where this type of deletion is frequent [27,28]. Furthermore, two of the nine samples also carried a null allele at DYS448 [29]. Upon retyping with autosomal kits, all these samples showed a deletion of the AMELY gene locus. Another large deletion, located at Yq11 and encompassing the AZFa region [30], affected two adjacent loci (DYS389I/II and DYS439) and was detected in one African American sample. Concomitant null alleles at three loci were observed in a Han Chinese sample (DYS448, DYS458, GATAH4) and an Indian sample (DYS392, DYS448, DYS549). Neither DYS448 nor DYS456 was amplifiable in an Iraqi sample. Furthermore, null alleles were observed at DYS576 in four samples of Asian ancestry from the UK.
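As a rough check on the overall null-allele frequency of 0.03% quoted in the preceding paragraph, and assuming that the denominator is the total number of allele calls (19,630 samples × 23 markers, which is not stated explicitly in the text), the figure can be reproduced as:

$$
\frac{133}{19{,}630 \times 23} = \frac{133}{451{,}490} \approx 2.9 \times 10^{-4} \approx 0.03\%
$$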
There were 69 copy number variants, mostly duplications, observed at 21 loci (all except DYS438 and DYS549). Copy number variants were most abundant at the markers DYS19 (n = 30) and DYS448 (n = 28), followed by DYS481 and DYS570 (11 each; Table S3). Note that, at DYS385ab, only copy numbers larger than two are conventionally counted. One triplication each of the DYS19 and DYS448 markers was observed in African American samples and a […]. Duplications of several consecutive loci in the AZFa region [31] were detected in three samples: at DYS389I/II and DYS439 in two samples, and additionally including DYS437 in a Hispanic American sample. A previously published duplication affecting the DYS570 and DYS576 markers [10] was found a second time in a German sample from our study.

Haplotype analysis
The 23 markers of the PPY23 panel were evaluated with respect to their haplotype diversity (HD), discrimination capacity (DC) and other forensic parameters such as random match probability (MP). In total, 18,860 different haplotypes were observed (Table 1). Of the 19,630 samples analyzed, 18,237 (92.9%) carried a unique haplotype. The most frequent haplotype was detected 11 times across three different populations, namely the Athapaskans, Estonians and Finns. Finland, Alaska and Kenya had the highest numbers of haplotypes occurring more than once (Table 1). Notably, eight Maasai individuals from Kinyawa (Kenya) and seven Xhosa from South Africa shared an identical haplotype, respectively. Haplotypes that were observed at least four times in a population were found in Reutte (Austria, Tyrolean; n = 1), Finland (Finnish; n = 5), Netherlands (Dutch; n = 1), Xuanwei (China, Han; n = 2), Kinyawa (Kenya, Maasai; n = 5), South Africa (Xhosa; n = 2), Peru (Peruvian; n = 1), Northern Alaska (USA, Inupiat; n = 5) and Western Alaska (USA, Yupik; n = 1) (data not shown).
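Using the definition of DC given under Forensic parameters (number of different haplotypes divided by the total number of haplotypes), the counts reported above translate into the following proportions. This is plain arithmetic on the numbers in the text, shown only to make the relationship explicit:

$$
DC = \frac{18{,}860}{19{,}630} \approx 0.961, \qquad \frac{18{,}237}{19{,}630} \approx 0.929 \ \text{(fraction of samples with a unique haplotype)}
$$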
Of the meta-populations formed according to continental residency, Asia showed the highest DC (>0.97), followed by Europe and Latin America (DC ≈ 0.96), and finally Africa (DC ≈ 0.85;

Comparative analysis of five forensic Y-STR marker sets
We compared the haplotype-based forensic parameters for five different sets of Y-STR markers commonly used in forensic practice, namely MHT, SWGDAM, PPY12, Yfiler and PPY23. Not surprisingly, a strictly monotonic relationship emerged between the number of Y-STR markers in a set and the forensic parameters studied (Table 2). Correspondingly, DC increased from 43.0% for MHT to 96.1% for PPY23 (r = 0.97). HD showed a similar trend (r = 0.81) whereas MP decreased rapidly with increasing marker number (r = −0.81). Similar trends were observed in the meta-populations defined according to both continental origin and ancestry (Table S5). In summary, an increasing number of markers was found to be associated with an almost linear improvement in all forensic parameters used to discriminate among individuals.

Comparison of short amplicon subsets of Yfiler and PPY23
The forensic parameters were compared for Y-STRs with amplicons shorter than 220 bp that are included in Yfiler (DYS456, DYS389I, DYS458, DYS19, DYS393, DYS391, GATAH4, and DYS437) or PPY23 (DYS576, DYS389I, DYS391, DYS481, DYS570, DYS635, DYS393, and DYS458). A substantially stronger discriminatory power of PPY23 compared to Yfiler was evident for these short haplotypes, mostly due to the higher diversity of the PPY23-specific markers DYS576, DYS481, DYS570 and DYS635. In particular, DC and the number of different short haplotypes were nearly twice as high for PPY23 as for Yfiler, whereas MP was more than 4-fold smaller (Table 3).
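The overlap between the two short-amplicon subsets listed above can be made explicit with simple set operations. The sketch below uses only the marker names given in the text; the variable names are illustrative.

```python
# Short-amplicon (<220 bp) marker subsets as listed in the text
YFILER_SHORT = {"DYS456", "DYS389I", "DYS458", "DYS19",
                "DYS393", "DYS391", "GATAH4", "DYS437"}
PPY23_SHORT = {"DYS576", "DYS389I", "DYS391", "DYS481",
               "DYS570", "DYS635", "DYS393", "DYS458"}

shared = YFILER_SHORT & PPY23_SHORT        # markers short in both kits
ppy23_only = PPY23_SHORT - YFILER_SHORT    # short only in PPY23
yfiler_only = YFILER_SHORT - PPY23_SHORT   # short only in Yfiler

print("shared:", sorted(shared))
print("PPY23 only:", sorted(ppy23_only))
print("Yfiler only:", sorted(yfiler_only))
```

Both subsets contain eight markers, four of which are shared, so the difference in discriminatory power rests on the four markers that are unique to each short panel.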

Population structure
At the continental level, by far the largest genetic distances were observed between the African meta-population and the other four groups (all $R_{ST}$ > 0.2 for PPY23, $p < 10^{-4}$). Genetic distances between non-African meta-populations were much smaller although still significant ($p < 10^{-4}$). The smallest genetic distance was noted for North and Latin America ($R_{ST}$ = 0.009 with PPY23; Table 4). Similarly, at the population level, pairs of African and non-African populations showed much larger genetic distances (with $R_{ST}$ > 0.3 in some instances) than pairs of non-African populations or African populations (Fig. 5, Table S6). Upon AMOVA, 85.1% of the overall PPY23 haplotype variation was within populations, 9.1% was among populations within meta-populations, defined according to continental residency, and 5.8% was among meta-populations (Table S7).
With an increasing number of Y-STRs included in a marker set, the genetic distances between meta-populations decreased monotonically. However, the Yfiler panel was exceptional in this regard in that it yielded smaller distances than PPY23 for pairs of African and non-African meta-populations, but larger distances than PPY12 for pairs of non-African meta-populations (Table 4). Nevertheless, corresponding to the general trend, the proportion of variation both within populations and within meta-populations increased with increasing marker number, while the variation among populations decreased (Table S7). All covariance components associated with the different levels of continental groupings were significant ($p < 10^{-4}$) for all marker sets (data not shown).
Multidimensional scaling (MDS) analysis was performed based upon linearized $R_{ST}$, separately for the five marker sets, considering either all 129 populations or the 68 populations of European residency and ancestry alone. When assessed for the PPY23 marker panel, Kruskal's stress value showed a clear 'elbow' with increasing dimensionality in both population sets, pinpointing an optimal trade-off between explained variation and dimensionality. For the worldwide analysis, two MDS components were optimal with PPY23 whereas four components were deemed optimal for the Europeans-only analysis. Both solutions explained the haplotypic variation well, with $R^2$ = 95.1% in the worldwide analysis and $R^2$ = 99.2% in the Europeans-only analysis. For comparability, MDS analyses for other marker panels were carried out with two or four dimensions, respectively. Haplotypic variation among populations within continental groups was lower than between continental groups (Fig. S3). For all five marker sets, the first MDS component clearly separated the African populations from the non-African populations (Fig. 6a, Fig. S4). Moreover, MDS also confirmed the previously reported East-West separation in the Y-STR haplotype variation [32] in the European analysis (Fig. 6b, Fig. S5). Higher MDS components were strongly dependent upon the respective marker set (Figs. S4-S6) and lacked comparably clear population patterns.
Table 4. Pairwise $R_{ST}$ estimates (below the diagonal) and corresponding p values (above the diagonal) between meta-populations defined by continental residency for five forensic marker panels.
Finally, the question was addressed of how closely related selected source and migrant populations might be in terms of their extant Y-STR haplotype spectra. A comparison between Han Chinese from Colorado (USA) and Han Chinese from Beijing, Chengdu (both China) and Singapore, respectively, yielded non-significant PPY23-based $R_{ST}$ values (all ≈ 0) (Table S6). In strong contrast, African Americans from Illinois, the Southwest and the whole of the US were quite distant from Africans from Ibadan (Nigeria) ($R_{ST}$ = 0.10, 0.13 and 0.09, respectively). Although the Tamils from India likely do not represent the true source population, their distance to the Texan Gujarati population was as low as $R_{ST}$ = 0.008, while the distance between the Tamils and a migrant Indian population in Singapore equalled 0.01. Finally, the distance between European Americans from Illinois, Utah and the whole USA on the one hand, and the Irish on the other, was found to be consistently small ($R_{ST}$ = 0.01, 0.04 and 0.02, respectively). A similar trend applied to other European source populations and to European migrant populations in South America. Thus, Argentineans of European ancestry from Buenos Aires, Formosa, Mendoza and Neuquen showed virtually zero genetic distance to Spaniards from Galicia (all three pairwise $R_{ST}$ ≈ 0).

Discussion
In this study, by far the largest collection of Y-chromosomal STR haplotypes worldwide genotyped with the PowerPlex Y23 kit (PPY23; Promega Corporation) was compiled and analyzed. As expected, PPY23 provided higher discriminatory power for forensic purposes than other marker sets in our data. Remarkably, in almost one third of the populations studied, each sample could be identified unambiguously because all haplotypes in the population were unique. Most of the non-unique haplotypes were detected in populations that either passed through a recent bottleneck (e.g. Finland [33]) or that have a high reported degree of endogamy (e.g. Alaskan Natives and Kenyan Maasai). The higher number of unique haplotypes arising with PPY23 is a result of the larger number of markers in the kit and the preferential choice of markers with a higher discriminatory power. In particular, among the five Y-STRs with the highest diversity in our study, both globally and in all meta-populations, three (DYS481, DYS570 and DYS576) were specific to PPY23.
The practical utility of highly polymorphic Y-chromosomal profiles, for example, in biological stain analysis results from the greatly decreased chance of coincidental matches among different individuals. In the case of non-identity, exclusion becomes overwhelmingly likely. On the other hand, use of the PPY23 kit in kinship analysis or familial searching will render these practices increasingly complex because even close relatives may exhibit one or more mismatches, particularly at loci with high mutation rates. For these applications, there should be mandatory use of likelihood-based approaches that take allele frequencies, mutation rates and the presumed degree of relatedness properly into account [34].
The performance of forensic analysis with degraded DNA has also improved with the advent of PPY23. Typically, only partial DNA profiles can be generated from degraded DNA, with a pronounced dropout of longer amplicons. Compared to Yfiler, the short haplotypes of PPY23 (i.e. those comprising the eight markers with amplicons <220 bp) were much more variable. This difference is clearly due to the high mutation rates of four of the six markers specific to PPY23 selected for a short amplicon length. Thus, it is likely that the PPY23 kit will greatly improve the analysis of aged or otherwise damaged DNA samples.
The present study revealed a considerable number of null and duplicated alleles that were caused either by non-allelic homologous recombination between paralogous DNA sequences [35] or, in the case of nulls, by deletions or primer site mutations [36]. Compared to Yfiler, the PPY23 allelic ladder has been enriched with new length variants to accommodate the various intermediate alleles that were observed as well.
Previous population genetic analyses consistently revealed that Y-chromosomal haplotypes have a highly non-uniform geographical distribution characterized by less variation within, and more variation between, population groups than autosomal markers [37]. This difference has been explained by (i) the smaller effective population size of Y chromosomes causing stronger genetic drift, and (ii) haplotype clustering due to widespread patrilocality. Therefore, population structure will be more pronounced in Y-chromosomal genetic databases and must be taken into account when database counts are used to quantify the evidential value of matches in forensic casework [38]. It has been shown, however, that so-called meta-populations may be constructed for Y-STRs that have low haplotypic variation among population groups within a meta-population, but large variation between meta-populations [39]. If necessary, such meta-populations can be defined ab initio using geography as a proxy of genetic relatedness, or by taking ethnic or linguistic data into account.
For all five forensic marker sets studied here, samples of African ancestry were clearly separated genetically from all other continental meta-populations. Pairwise genetic distances, measured by $R_{ST}$, between Africa and the four non-African meta-populations were of similar magnitude. These results confirm a previous study of 40,669 haplotypes from 339 populations typed only for the nine markers of the MHT panel [39]. Moreover, genetic distances between non-African meta-populations were comparatively small. While North and South America still differed to some degree in the first MDS component, Eastern and Western Asia showed notable differences only in the second component. However, since the study here lacked samples from large parts of Northern and Central Asia, reasonable inference about the population structure in Asia as a whole was not possible.
Europe was the most intensively sampled continent in the present study and made up approximately 60% of the overall sample size. A separate MDS analysis of samples of European residency and ancestry recapitulated the outcome of previous studies with smaller marker sets [32,40]. In particular, a clear East-West divide became evident in the first component of the MDS analysis for all five forensic marker sets. Finland and some regions of the Balkans (Croatia, Bosnia-Herzegovina) showed consistently large differences to other European populations in the second MDS component.
It must be emphasized that this population genetic analysis was based upon marker sets that were designed for forensic purposes, and that shared several markers. That all five sets yielded a similar picture of the geographic distribution of Y-STR haplotypes may therefore indicate that, in terms of population structure, the effects of markers included in the MHT (which are common to all five sets) dominate those of more mutable markers, such as PPY23-specific STRs DYS576, DYS570 and DYS481. Indeed, it has been shown recently that haplotypes comprising only rapidly mutating markers lack strong signals of population history (Ballantyne et al., submitted for publication).
In summary, the recently introduced PowerPlex Y23 system provides unprecedented discriminatory power for forensic applications but at the same time shows similar patterns of population structure as established forensic marker sets. These patterns coincide well with groupings according to prior information on geographical and ethnic origin. In some cases, relatively large genetic distances were found for pairs of migrant and potential source populations, such as African Americans and autochthonous Africans. For forensic casework involving these populations, separate reference databases need to be established and used. On the other hand, populations showing small genetic distances, such as Western or Eastern Europeans, Arabs from Iraq and Lebanon or Mestizos from Peru and Bolivia, may be merged into meta-populations for the purpose of reference databases. The annotated PPY23 data used in this study have been fully integrated into the YHRD database as of October 2013 (release 45, www.yhrd.org).
Certain commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.
The authors would like to acknowledge the Promega Corporation for providing financial support for several of the laboratories participating in this study.