Genetic variability of Akhal-Teke horses bred in Italy

Background The Akhal-Teke horse (AKH) is native of the modern Turkmenistan area. It was introduced in Italy from 1991 to 2000 mainly as an endurance horse. This paper characterizes the genetic variability of the whole Italian AKH horse population and evaluates their inbreeding level by analyzing microsatellite markers and mitochondrial D-Loop sequences. Methods Seventeen microsatellite marker loci were genotyped on 95 DNA samples from almost all the AKH horses bred in Italy in the last 20 years. Standard genetic variability measures (Ho, He, FIS) were compared against the same variables published on other eight AKH populations. In addition, 397 bp of mtDNA D-loop region were sequenced on a sub-group of 22 unrelated AKH out of the 95 sampled ones, and on 11 unrelated Arab horses. The haplotypes identified in the Italian population were aligned to sequences of AKH (56), Arab (five), Caspian Pony (13), Przewalskii (two) and Barb (15) horses available in GenBank. The Median Joining Network (MJN), Principal Component Analysis (PCA) and Neighbor-joining (NJ) tree were calculated on the total 126 sequences. Results Nucleic markers showed a high degree of polymorphism (Ho = 0.642; He = 0.649) and a low inbreeding level (FIS = 0.016) in Italian horses, compared to other AKH populations (ranged from −0.103 AKH from Estonia to 0.114 AKH from Czech Republic). High variability was also recorded in the D-Loop region. 11 haplotypes were identified with haplotype diversity (hd), nucleotide diversity (π) and average number of nucleotide differences (k) of 0.938, 0.021 and 6.448, respectively. When all the 126 D-Loop sequences were compared, 51 haplotypes were found, and four were here found only in the Italian AKH horses. The 51 haplotypes were conformed to eight recognized mtDNA haplogroups (A, C, F, G, L, M, P and Q) and confirmed by MJN analysis, Italian horses being assigned to five haplogroups (A, C, G, L and M). Using a PCA approach to the same data, the total haplotypes were grouped into two clusters including A+C+M+P and G+F haplogroups, while L and Q haplogroups remained ungrouped. Finally, the NJ algorithm effectively discretizes only the L haplogroup. All the above data univocally indicate good genetic variability and accurate management of the Akhal-Teke population in Italy.


INTRODUCTION
In the recent past, several molecular markers and techniques have been used to investigate the genetic origin and variability within and across local endangered populations to disclose biodiversity and to support conservation plans (Cothran, van Dyk & van der Merwe, 2001;Bömcke, Gengler & Cothran, 2011;Kusza et al., 2013).
Mitochondrial DNA (mtDNA) has also been widely used to estimate genetic diversity, phylogenetic relationships and maternal origins among closely related modern populations. Particularly for such purposes, the mtDNA D-loop hypervariable region 1 has been sequenced in equids (Beja-Pereira et al., 2004;McGahern et al., 2006;Royo et al., 2005;Prystupa et al., 2012;Zhang et al., 2012;Hristov et al., 2016;Sziszkosz et al., 2016;Cieslak et al., 2010;Cieslak et al., 2017;Cozzi et al., 2017). The distribution of haplogroups of the hypervariable region 1 and the complete sequence of the mtDNA D-loop from samples of horse remains from archaeological sites have been recently analyzed by several study teams to date horse domestication events (Jansen et al., 2002;Lippold et al., 2011;Achilli et al., 2012;Librado et al., 2017). These authors find a large number of maternal lineages in the domestic horse gene pool (Vilà et al., 2001;Cieslak et al., 2010). The geographic structuring of mtDNA is low today, but there is also evidence that some maternal lineages ware regionally confined (Cieslak et al., 2010). Recently, Elsner and collaborators (2017) tracked the dynamics of horse population migrations due to climate change in the Pleistocene-Holocene transition from nine archaeological sites adjacent to the Swiss Jura Mountains, providing a novel approach to the evolutionary study of the impact of human encroachment and habitat change on the development of horse populations.
Akhal-Teke (AKH) is a very ancient horse breed that is native of central Asia, precisely the southern region of modern Turkmenistan, and has adapted well for endurance capacity to cover long distances in harsh climate conditions of Central Asian steppes. The breed during the last 100 years has been selected strongly for speed (Leisson et al., 2011). As reported by several authors using protein markers, microsatellites, Single Nucleotide Polymorphisms (SNP) and mitochondrial DNA, the AKH belongs to the clade of Middle Eastern Breeds that includes Arabian, Caspian pony and Turkoman horses (Cothran, van Dyk & van der Merwe, 2001;Luis et al., 2007;Bömcke, Gengler & Cothran, 2011;Priskin et al., 2010;Achilli et al., 2012;Petersen et al., 2013).
The AKH breed is renowned for the distinguishing metallic shine of its hair. Moreover, natural athleticism makes a well-rounded sports horse of this breed, which is good at dressage, show jumping, eventing, racing, and endurance riding competitions. This ancient breed represents a unique genetic resource that differs from all the other breeds worldwide in terms of type, conformation and use. The pedigree recording started in 1885, and the Studbook was closed in 1932. The AKH was introduced in Italy from 1991 to 2000 mainly as an endurance horse.
This paper conducts a genetic variability analysis based on genomic DNA loci (microsatellite markers) and mtDNA D-loop of AKH horses reared in Italy. Furthermore, a meta-analysis was performed using horse mtDNA sequences available in GenBank, besides a comparison with literature data on the genetic variability of AKH horses bred in other countries.

DNA extraction, genotyping and sequencing
The Italian AKH Association's reference laboratory for genetic testing for parentage control at the Department of Veterinary Medicine (University of Milan) maintains the bio-bank of the biological samples of all AKH horses bred in Italy since they were first imported (107).
All applicable international, national and/or institutional guidelines for the care and use of animals were followed. The bio-banked samples were approved by the Ethics Committee of Università degli Studi di Milano Prot. OPBA_56_2016.
Ninety-five of the 107 blood samples (males and females) were genotyped using 17 microsatellite markers of StockMarks for the Horses R Genotyping Kit (utilized for parentage testing), including 12 from the official ISAG panel (http://www.isag.us/Docs/ EquineGenParentage2016.pdf). All PCR products were run on ABI PRISM R 310 Genetic Analyzer, and fragments were analyzed with the Genescan R and Genotyper R software (http://tools.thermofisher.com/content/sfs/brochures/cms_040300.pdf).
According to the available pedigrees (first generation/parents), the horses with no common mare and stallion (n = 22) were sequenced, and so were 11 additional Arabian horses reared in Italy and bio-banked at the University of Milan. The 379 bp fragment of the mtDNA D-loop region (nucleotide position: 15,382-15,778) was amplified and sequenced by classic Sanger methods, using previously published primer pairs (Cozzi et al., 2004). The PCR products were purified utilizing SEPHADEX R G-50 (SIGMA) and sequenced for both strands by using the Big Dye R Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's guidelines. The PCR products were processed on an ABI PRISM R 310 Genetic Analyzer, and the nucleotide sequence was determined using the ABI PRISM R DNA Sequencing Analysis Software.

Statistical analysis on microsatellite data
Allelic frequencies, genetic equilibrium according to Hardy-Weinberg (HW) (P-val), linkage disequilibrium (LD) and inbreeding coefficient (F IS ) were estimated using the GENEPOP software (Raymond & Rousset, 1995). LD between a pair of loci was also tested for genotypic data using the likelihood-ratio test of Arlequin v.3.5.2.2 software (Excoffier & Lischer, 2010). The number of alleles (NA), the effective number of alleles (NE) and the observed (H o ) and expected (H e ) heterozygosity per locus and on the entire population were calculated with POPGENE v.1.32 (Yeh, Yang & Boyle, 1997). Allelic richness (AR) per locus and in the population was also used to estimate genetic population diversity. AR was standardized for variations in sample size, and was calculated using FSTAT 2.9.3 (Goudet, 2002), based on 79 diploid individuals with no missing genotypes.

Statistical analysis on sequence data
Sequences were checked for the presence of ambiguous bases by using the software Chromas v.2.5.1 (http://www.technelysium.com.au/). In order to define the haplotypes (H-) all obtained sequences were aligned with the complete horse mtDNA sequences X79547 (Xu & Arnason, 1994) and JN398377 (Achilli et al., 2012), widely used as reference sequences, using the ClustalW algorithm implemented in BioEdit (Hall, 1999).
In order to define the genetic relationships among our mtDNA sequences of AKH and those available in Genbank for other AKH lines and for other populations originated from the same Middle Eastern clade, a total of 104 sequences (two reference mtDNA, two Przewalski, five Arabian horse, 15 Barb horse and 13 Caspian pony, 56 Akhal-Teke and the 11 Arabian horses bred in Italy) were considered. All the sequences were trimmed at 254 bp to maximize the number of sequences included. Haplogroup definition was performed according to Achilli et al. (2012). The mtDNA sequences from GenBank database used for haplogroup definition and relationship investigation were reported in Table S1.
Three different approaches were used to investigate the haplotype and haplogroup relationship: i) the Median-Joining Networks (MJN) analysis performed by Network v.5.0 software (http://www.fluxus-engineering.com); the default setting of Epsilon (0) was applied and the mutational hot spots present in the 254bp fragment were down-weighted as suggested by Jansen et al. (2002) and Cieslak et al. (2010); ii) the Neighbor-joining (NJ) tree (Saitou & Nei, 1987) based on the Kimura-2-parameter model distances (Kimura, 1980) was calculated among the haplotypes with MEGA v.7.0.14 software (Kumar, Stecher & Tamura, 2016); the bootstrap analysis, running 1,000 bootstrap replicates, was applied to check the robustness of the resulting dendrogram and a cut-off bootstrap at a significant value of 50 was applied; iii) the Principal Component Analysis (PCA) with a minimum spanning tree based on haplotype sequences was calculated using the PAST v.2.7 software (Hammer, Harper & Ryan, 2001). The pairwise population genetic differentiation (F ST ) was calculated by Arlequin v.3.5.2.2 (Excoffier & Lischer, 2010) and used to evaluate genetic differentiation among Italian AKH, Italian Arabian, and the 93 horse GenBank sequences.
All the horse mtDNA sequences obtained in this study were deposited (GenBank ID: MF580226-MF580244).

Microsatellite markers
The results of microsatellite analyses are reported in Table 1. A total of 112 allelic variants were found on 17 microsatellites, ranging from four (HTG6 and HTG7) to 10 alleles (ASB2). The microsatellite LEX03 (nine alleles), mapping on X chromosome, was excluded from the statistical analyses. The mean number of alleles (NA) calculated on 16 microsatellites was 6.4, whereas the effective mean number of alleles (NE) was 3.1. The analyses of the Hardy-Weinberg genetic equilibrium (P-val), calculated using HW formulae, showed three microsatellites (AHT4, HTG4 and CA425) that were not in genetic equilibrium (P < 0.05). This drift is probably caused by strongly unbalanced allelic frequencies in the small population originally imported to Italy from 1991 to 2000 (24 horses: 17 females and seven males).
According to the position (cM) of the microsatellites as reported by Swinburne et al. (2006), LD analyses were performed only for the markers located on the same chromosome: HMS6 and HTG7 (chr 4), HMS3 and HTG4 (chr 9) and ASB2, HTG6 and HMS1 (chr 15) ( Table S2). LD between a pair of loci was tested in all samples together and by groups of individuals: those imported in Italy from 1991 to 2000 (Founders) and the ones born in Italy after 2000 (New population). Considering all samples together, all pairs of loci were in LD, except for HTG7 vs HMS6 (Table S2). The group of Founders was in equilibrium for all pair of loci; instead, for the New population the LD was observed for ASB2 vs HMS1 (GENEPOP and Arlequin) and ASB2 vs HTG6 (Arlequin) pairs of markers (Table S2). Hence, we calculated the NE based on the observed allelic frequencies that ranged from 1.7 (HTG4) to 5.3 (ASB2), thus revealing that some loci had a very unbalanced representation of allele variants. These included AHT4 and HTG4, two of the three loci that were not balanced, which present an NE value of 2.6 and 1.7, respectively.
Allelic richness (AR) is a parameter that is used to estimate diversity in the population. It ranges from 4 (HMS1) to 9.8 (ASB2), with a mean value of 6.4, comparable to the AR of 6.09 reported by Warmuth et al. (2011).
H o and the H e values ranged from 0.379 (HTG4) to 0.810 (ASB2), with a mean value of 0.642 and, from 0.410 (HTG4) to 0.816 (ASB2), with a mean value of 0.649, respectively The F IS values per locus that ranged from −0.113 (HTG10) (outbreeding) to 0.174 (HTG6) (inbreeding); the overall F IS value for the population was 0.016.
We estimated the diversity parameters on the AKH horses imported in Italy from 1991 to 2000. The values obtained were compared with those estimated in the AKH population born in Italy (after 2000). The two groups of horses are in HW equilibrium and have similar genetic values, except for the F IS value that denotes, in the first nucleus, an apparent excess of heterozygosis Table S3. Table 2 shows the eleven haplotypes found in AKH reared in Italy (H1 to H11). The 22 polymorphic sites identified were 19 parsimony informative and three singleton sites (15495, 15538 and 15709). All the polymorphic sites reported in Table 2 were transition sites. The maximum composite likelihood estimate of the nucleotide substitution pattern, identified for the 22 sequences, were 32.5% (A), 26.6% (T), 29.1% (C), and 11.8% (G), depending on the order of nucleotide composition in the vertebrate mitochondrial genome (A>C>T>G). The A+T content at the mtDNA D-loop region was 59.1% and exceeded G+C content (40.9%). Zhang et al. (2012) for Chinese Native horse breeds and Sziszkosz et al. (2016) for Hungarian Gidran horses report similar rich A+T content values in horse mtDNA, precisely 55.8% and 64.1% respectively.
The values of variability parameter haplotype diversity (hd), nucleotide diversity (π ) and average number of nucleotide differences (k) were 0.938, 0.021 and 6.448, respectively.
The pairwise distance matrix ( Fig. 1) represents the number of nucleotide substitutions in each haplotype, where Hap_R represent the haplotype of the reference sequences. Inter-haplotypic differences ranged from 1 (Hap_R vs Hap_1 and Hap_10 vs Hap_11) to 13 (Hap_4 vs Hap_10 and Hap_4 vs Hap_11).
A total of 51 haplotypes with 36 variable sites (seven singleton and 29 parsimony informative) were identified by the alignment of the 126 mtDNA D-loop sequences, showing an hd value of 0.971. The detailed sequence variations of haplotypes and the identified haplogroups were reported in Table S4.
Four haplotypes (H16, H18, H21 and H22) were found exclusively in Italian AKH and were not reported in GenBank for the same breed. Additionally, our AKH sequences were grouped in five haplogroups A, C, G, L and M, as defined by Achilli et al. (2012).
The MJN ( Fig. 2A) showed the relationships among the 51 haplotypes occurring in the 126 mtDNA sequences, and discretized them in the eight haplogroups. Relationships  among sequences were also defined in the NJ tree (Fig. 2B) and in the PCA analyses ( Fig. 2C) that, nevertheless, are unable to dissect the variation among haplogroups as well as the MJN. Finally, the matrix of pairwise population genetic differentiation (F ST ) showed a remarkable similarity between our AKH and other AKH populations, as expected (Table S4). The NJ tree based on Kimura-2-parameter model distances and performed by applying a cut-off value of 50 for bootstrap value significance clearly separated haplogroups L from the others (62% of bootstrap value) (Fig. 2B). Haplogroups A and Q are divided into a different branch of the dendrogram (Fig. 2B). Particularly, haplotypes H2, which belong to haplogroup A, group together with haplotypes belonging to haplogroups C in another branch of the tree. The H2 located in the same operational taxonomic unit (OTU) of H3 and H50 differs from these by one mutation 15,650 (H2 vs H3) and by two mutations 15,650 and 15,728 (H2 vs H50), respectively. The haplotype H42, formally belonging to The PCA analysis showing a minimum spanning tree based on sequence haplotypes, is displayed in Fig. 2C. The graphic placed in Fig. 2C reports the percentage of variance explained by PCA components. The first component (PC1) that explains most of the variability (21.2%) clearly separated haplogroups G+F and L, whereas the second component (PC2) (17.9%) defines haplogroup Q, confirming the MJN and NJ tree

DISCUSSION
In Italy, AKH horses are reared by few breeders and are mainly trained for shows and endurance races. The small number of animals and the use of the same stallions for reproduction entail a potential risk of decreasing variability. Hence, we analyzed genetic variability by using both microsatellite DNA nuclear markers and mitochondrial DNA variations.
As mentioned above, AKH belong to the clade of Middle Eastern Breeds that involved Arabian horses, Barb horses, Turkoman horses and Caspian pony. The native area of the AKH is considered one of the two hotspots for horse diversity. In fact, the Caspian Area of Western Asia served as refugium for wild horses in the early and mid-Holocene when, due to a lack of suitable habitat as a result of forest expansion, the wild horses migrated to more comfortable areas for this steppe-adapted species (Warmuth et al., 2011). Later, around 3500 BC, the Caspian Area and the Kazakh steppe were horse domestication points from where horses spread across Eurasia (Outram et al., 2009;Cieslak et al., 2010;Warmuth et al., 2012).
The mean number of alleles (MNA) (6.5) obtained in Italian AKH was comparable to those reported by Szontagh et al. (2005) for the AKH population bred in Turkmenistan and in Turkoman horses, using the same panel of 12 microsatellites (Table 3).
The NE value indicates the actual impact of alleles on population variability, precisely, a small value reports an unbalanced frequency of the alleles at a certain locus. The NE is associated to the expected heterozygosity. Indeed, heterozygosity is highest when allelic frequencies are balanced, and every allele contributes to the absolute number of alleles detected per locus.
In the Italian AKH horses we identified a lower effective number of alleles (NE) compared to the other AKH populations. Indeed, in the Italian population the founder effect is still having a strong impact on the population's genetic composition due to the low number of stallions and dams imported from abroad to constitute the first Italian nucleus in 1991-2000. Particularly, the low number of stallions used in the last twenty years by breeders appears to have strongly influenced allele frequencies, as can be derived by the deviation from HW equilibrium for some microsatellite markers as shown in Table 1.
Several authors reported the influence of inbreeding, founder effect and population subdivision on LD. Nevertheless as pointed out by Farnir et al. (2000), McRae et al. (2002) and Miller et al. (2015) after one generation, it can be observed that the major part of the alleles in most of the offspring could be in the same phase as in the parents. The LD here found suggests a possible bottleneck effect occurred when the Founder group has been imported in Italy (Table S2). Despite that, the Italian Akhal-Teke population maintained a large genetic variability and low degree of inbreeding over the years.
The MNA value for Italian AKH, comparing with values obtained by Jiskrová, Vrtková & Prausová (2016) using a panel of 13 microsatellites, is lower respect to the Czech Republic AKH (Table 3). Additionally, the effective number of alleles is similar to those found by Jiskrová, Vrtková & Prausová (2016) in the small AKH population bred in Estonia and in Switzerland (Table 3).
The overall F IS values obtained using the 17 microsatellites panel (0.016) and both 12 and 13 microsatellites panels (0.009 and 0.004), revealed a low level of inbreeding, i.e., the Turkmenian population, indicating the fact that Italian breeders are prone to avoid mating between related individuals as much as possible. This trend can be highlighted by comparing the F IS values in Italian horse to other populations (Table 3). US and Czech AKH horses appear to have a higher level of inbreeding, compared to Italian AKH, which reflect good breeding practice of Italian breeders.
Finally, H o and H e in the Italian population are similar to those observed in the other AKH populations (Table 3), showing that, overall, selection pressure for endurance and athleticism is not intense and genetic variability appear to be conserved in the worldwide AKH population.
The high genetic variability of mtDNA is proof of the broad genetic base of founder mares in the AKH population, though only the sire lines were historically documented (http://www.akhal-teke.org/at-lines.html), whereas no information was recorded for the female lines.
As expected, we found a great similarity between our AKH sequences and all the AKH sequences from GenBank, in contrast with Priskin et al. (2010), who identified significant differences in F ST values (F ST = 0.075, P = 0.027) among modern AKH, and the 18 Kazakh large genetic variability with a low degree of inbreeding has maintained the breed over the years, confirming a well-balanced breeding practice. In fact, microsatellite markers showed a high level of heterozygosity that was comparable to findings in AKH populations reared in other countries.
The mtDNA analyses on the Italian AKH confirmed the breed's high matrilineal variability, and added four new haplotypes that were never described before in the breed. The results also confirmed the close genetic relationship with the other AKH populations and the Middle Eastern horses represented by Arabian horse, Caspian Pony and Barb horse breeds, estimated as one of the most ancient and primitive clades of Equus caballus.