Studies of modern Italian dog populations reveal multiple patterns for domestic breed evolution

Abstract Through thousands of years of breeding and strong human selection, the dog (Canis lupus familiaris) exists today within hundreds of closed populations throughout the world, each with defined phenotypes. A singular geographic region with broad diversity in dog breeds presents an interesting opportunity to observe potential mechanisms of breed formation. Italy claims 14 internationally recognized dog breeds, with numerous additional local varieties. To determine the relationship among Italian dog populations, we integrated genetic data from 263 dogs representing 23 closed dog populations from Italy, seven Apennine gray wolves, and an established dataset of 161 globally recognized dog breeds, applying multiple genetic methods to characterize the modes by which breeds are formed within a single geographic region. Our consideration of each of five genetic analyses reveals a series of development events that mirror historical modes of breed formation, but with variations unique to the codevelopment of early dog and human populations. Using 142,840 genome‐wide SNPs and a dataset of 1,609 canines, representing 182 breeds and 16 wild canids, we identified breed development routes for the Italian breeds that included divergence from common populations for a specific purpose, admixture of regional stock with that from other regions, and isolated selection of local stock with specific attributes.

breeding, giving rise to a myriad of phenotypic variants. However, most modern breeds are 200 years old and are of European ancestry (Club, 2006Lindblad-Toh et al., 2005;Parker et al., 2017).
High-density, genome-wide breed phylogeny analysis has been applied to well-established dog breeds to elucidate the complex structure of breed relationships and their development von Holdt et al., 2010;Mortlock, Khatkar, & Williamson, 2016). The largest phylogenetic study reported to date includes 161 dog breeds and over 1,300 dogs, consisting of 23 cladistic groupings based on genetic similarities (Parker et al., 2017).
Smaller studies exist which have addressed population demography in globally recognized breeds sampled from discrete locations with some success (Bigi, Marelli, Randi, & Polli, 2015;Ceh & Dovc, 2014;Koskinen & Bredbacka, 2000;Oliehoek, Bijma, & van der Meijden, 2009;Parra, Mendez, Canon, & Dunner, 2008;Pertoldi et al., 2013;Pribanova et al., 2009;Suarez, Betancor, Fregel, & Pestano, 2013;Wiener et al., 2017). While such studies provide insight into the status and genetic health of local populations, they do not necessarily consider the preexisting breedspecific genomic patterns developed prior to localization or the impact of import and export from the local breeding pool. Additionally, most prior attempts to address the genetic history and composition of so called "niche" dog populations have used low depth genome coverage incorporating only SNPs, microsatellites, or mitochondrial DNA, and utilizing a small number of breeds which are frequently chosen for their modern population numbers (Ceh & Dovc, 2014;Kang et al., 2009;Pires et al., 2006;Puja, Irion, Schaffer, & Pedersen, 2005;Suarez et al., 2013). These approaches address only superficial relatedness, diminishing the impact of artificial selection, natural divergence, and directed hybridization of breeds, which are evident only through analysis of deep genetic data.
Purebred dog registries, such as the American Kennel Club (AKC) or the Fédération Cynologique Internationale (FCI), attempt to classify dog populations as distinct breeds by enforcing regulations of pedigree tracking, adherence to a written standard, and population size. A genomic pattern for breed status classification based on values from 79 globally recognized purebreeds has recently been defined and includes threshold levels from four homozygosity measurements Dreger, Rimbault, et al., 2016). These metrics demarcate the extremes of single-dog length of homozygosity, ten-dog shared length of homozygosity, coefficient of inbreeding, and rate of shared homozygosity decay represented by standardized pure dog breeds.
The Italian Kennel Club (ENCI) was founded in 1882 with the registration of the Bracco Italiano pedigree (ENCI). Today, ENCI recognizes 16 different Italian breeds (ENCI), of which 13 are utilized in this study. Additionally, seven local landrace populations, defined as distinct dog varieties unique to a specific geographic region with historically limited breeding populations, are included in this study. The latter are not recognized by any purebred canine registry but, nonetheless, may display a genetic pattern consistent with other purebreeds (Alam, Han, Lee, Ha, & Kim, 2012;Puja et al., 2005;Tanabe, 2007;Wijnrocx, Francois, Stinckens, Janssens, & Buys, 2016; Yoo et al., 2017), such as has been observed for one Italian regional population, the Fonni's Dog Dreger, Rimbault, et al., 2016;Sechi et al., 2016). By focusing our analyses on breeds with diverse phenotypes that have all originated in a single country, we aim to employ genetic data to expand upon historical breed formation accounts and define the modes by which humans have produced recognizable and diversified dog breeds. While smaller-scale studies have identified genetic routes of breed development relative to discrete dog breed types (Akkad, Gerding, Gasser, & Epplen, 2015;Parra et al., 2008), these aims have not yet been applied to a large, comprehensive representation of diverse breeds.
In this study, we investigated 23 dog populations of Italian origin (Table 1, Figure 1) and a sampling of seven wild wolves belonging to the Italian gray wolf population which were collected from the Italian Apennine mountain ranges (hereafter referred to as "Apennine wolves"; Table 1). The Italian breeds were derived from populations selected for hunting, tracking, herding, property and livestock guarding, coursing, and companionship, each having a long and distinct history of development in niche Italian regions. We have determined the genetic population structure of these regional populations relative to a large sampling of 161 global dog breeds with well-established relationships (Parker et al., 2017) using data from the ~170K Illumina CanineHD SNP array. With a total of 1,609 domestic dogs, representing 182 breeds, and 16 wild canids, we have assembled the largest and most diverse dataset of canine genomes to determine breed status of domestic dog varieties in a singular geographic region. Utilizing analyses of phylogeny, identity-by-descent haplotype sharing, and admixture, we identify specific instances of three modes by which dog breeds have been formed: (1) specialization of breeds through segmentation within a phenotypically similar population; (2) directed attainment of common species-wide phenotypes within multiple diverse regionally isolated gene pools; (3) introduction of desired characteristics through introgression with distantly related populations.

| Samples
Blood samples from 256 dogs and seven Apennine wolves (Table 1) were collected following the European Rules for Animal Welfare when collected in Italy and approved National Human Genome Research Institute (NHGRI) Animal Care and Use Committee protocols when collected or received in the United States. The dog populations represent 23 breeds or varieties of historical Italian origin. The ENCI recognizes 13 of these populations as purebreeds and nine of which are also recognized by the AKC.
Ten populations are termed "varieties," and represent regional homogeneous populations, generally managed by informal registries and maintained by owners for specific behavioral applications. Three breeds, the Cane Corso, Italian Greyhound, and Neapolitan Mastiff, were sampled from Italian populations to compliment a preexisting collection of American populations of the same breeds. Animals selected for analysis were as dis- Genotyped samples were merged with a larger dataset of 1,346 dogs representing 161 breeds (described in (Parker et al., 2017)) and publically available genotypes from five New Guinea Singing Dogs and three Catahoula Leopard Dogs (Hayward et al., 2016) to produce a dataset of 1,609 dogs representing 182 breeds, seven Apennine wolves, seven global Grey wolf representatives, and two Golden Jackals.

| Data editing
The initial genotype dataset was screened to remove all SNPs with a call rate < 95% and a minor allele frequency<0.01% using Plink 1.9 software (Purcell et al., 2007). Only markers on autosomes were retained for subsequent analysis, resulting in 142,840 SNP variants.
After low-quality marker exclusion, dogs with a low call rate (<90%) were excluded. Duplicated individuals were detected using identityby-state (IBS) with a cutoff of >99%. One animal for each identical pair was excluded from subsequent analysis. Closely related pairs of dogs were identified and pruned using discordant homozygote count (Mendelian error, ME), where a pair of individuals was classified as related if their ME was <100.

| Phylogenetic and genetic distances estimation
All breeds in the dataset were resized to a maximum of 10 randomly selected individuals to avoid size-related bias affecting phylogenetic analysis. For individuals belonging to Cane Corso, Neapolitan Mastiff, and Italian Greyhound breeds, a maximum of 10 individuals were allowed for both sampling locations. Genetic distances between individuals were estimated using the "-distance" function of Plink 1.9 (Purcell et al., 2007) and the "1-ibs," "square," and "flat-missing" modifiers, with 100 bootstraps. Neighbor-joining phylogeny and consensus tree calculation, with Golden Jackal as the outgroup, were conducted using the PHYLIP software package (Felsenstein, 1989). Any dogs that did not group according to their expected breed with >90% bootstrapping confidence were considered to be misclassified and removed from subsequent analyses.
Breeds were assigned to clades, represented by different colors in

| Haplotype sharing
Haplotype sharing, as determined by identity-by-descent estimations among individuals, was calculated using the Beagle v4.1 software (Browning & Browning, 2013). Haplotypes shared between individuals, estimated on all markers, were reconstructed using 100-SNP sliding windows with a step of 40 markers each and allowing for trimming of up to 10 markers. The sum of shared haplotype lengths for each pair of dogs from different clades was detected, allowing for the calculation of the median length of shared haplotypes between all dogs of each pair of cross-clade breeds. All individual pairs that had no haplotype sharing were considered to have a median estimation of shared haplotypes of length = 0. Haplotype sharing was considered to be significant when median values were above the 95th percentile (9,257,455 bp) of all cross-clade breed pairs. The relative age of shared genetic history between breeds was estimated using previously published methods based on median size of haplotype sharing (Parker et al., 2017).   (Pickrell & Pritchard, 2012). The number of admixing events ranged from 0 (no migration events) to N, where N is the number of breeds in the analyzed branch. The migration model with the lowest absolute residual error was considered ideal.
Population genetic structure analysis was performed using the fast-STRUCTURE software (Raj, Stephens, & Pritchard, 2014). Population divisions (K) range from two to the number of breeds within each cladistic analysis. The optimal K value was determined by maximum likelihood of the subdivision (Pritchard, Stephens, & Donnelly, 2000).

| Shared homozygosity, decay rate, and inbreeding coefficient calculation
Shared genetic homozygosity (LnH) and homozygosity decay rates for each breed were calculated from the IlluminaHD SNP chip genotypes using Plink 1.9 (Purcell et al., 2007) as outlined previously Dreger, Rimbault, et al., 2016). Three breeds were collected in both the United States and Italy and maintained for independent analyses. Breeds were pruned to a maximum of ten random individuals, and only breeds with a minimum of ten individuals were used in LnH calculations. Calculation of decay rate was conducted for all breeds with a minimum of five individuals. All Italian breeds, however, were included in the calculations for single-dog homozygosity and SNP-based inbreeding coefficient (F). The breed-specific F was determined by averaging individual F values for all dogs of a single breed. These values were compared to previously reported purebred genetic parameters Dreger, Rimbault, et al., 2016).

| Breed status classification through genomic metrics
In a previous study, we analyzed a population of 79 dog breeds to outline the genetic parameters of homozygosity and inbreeding expected from purebred populations (Dreger, Rimbault, et al., 2016). These metrics were likewise calculated for the Italian breeds listed in Table 2 and compared to the expected purebred values as follows: single dog LnH ≥ 1098.365 Mb, 10 dog shared LnH ≥ 48.092 Mb, homozygosity decay ≤ 0.607, F ≥ 0.133 Dreger, Rimbault, et al., 2016. Fifteen of the Italian dog breeds or populations expressed values within purebred ranges for three or more of the four metrics and presented as a single-breed-specific clade in the bootstrapped phylogeny. We found that 12 of the initial 23 Italian populations can be classified as "breeds," including four not yet recognized by ENCI or AKC ( Table 2). Seven of the initial 23 populations have sufficient measures of homozygosity and inbreeding to qualify as breeds, but fail to cluster as unique breed-specific phylogenetic clades and are therefore categorized as paraphyletic "varieties" of other populations.

Exceptions of note include the two hair-type populations of Segugio
Italianos which, when treated as one population, qualify as a breed.
Also, the Pastore della Sila falls short on measures of breed or variety distinction.
This collection of sample populations allowed visualization of multiple stages of breed formation. While each population is managed by human-driven selection toward a breed standard, the various populations present distinct time points along the road to breed classification. For the purpose of further analysis, breed classification was considered equivalent for all populations. As such, each "breed," "variety," or "nonbreed" was treated as a distinct population in analyses that required prior assignment of identity.

| Phylogeny analysis reveals breed formation through convergent and divergent selection
Following sample size reduction to a maximum of ten dogs per breed,

| Haplotype sharing analysis identifies breed specialization through genetic isolation or purposeful admixture
While phylogenetic analysis displays breed relationships based on sequential differentiation of continually smaller subpopulations, observation of haplotype sharing through identity-by-descent accounts for admixture between distantly related breeds. Across-clade haplotype sharing was calculated for each as determined by the bootstrapped genetic distance phylogeny (Figure 2) to better understand breed development through introgression. Pairwise comparisons of identityby-descent haplotype sharing between individual dogs from different breeds were considered, resulting in a total of 1,240,389 comparisons.
The median shared haplotype value of each breed-to-breed comparison was then calculated; 287 were >9,257,455bp (top 5% of all haplotype sharing), and these measures were considered significant.
While most potential comparisons did not involve an Italian breed, 133 of those observed included one or more Italian breeds (Figure 3, Table S1).
Nine Italian breeds show significant haplotype sharing only with breeds within their own clades ( Figure 3,  (Table S1). Interestingly, however, for all six of the Italian herding breeds, the German Shepherd Dog haplotype is the most predominant.
To our surprise, we detected several instances where a single Italian herding breed shares genetic history with a breed from outside the clade.

| Single clade analysis
Admixture analysis with TREEMIX and structural analysis with fast-STRUCTURE were conducted on each breed group that contains an Italian breed as a separate test of breed relationships. The ideal Silas express the breed-specific signatures at a level of 50%. This implies that these breeds are comprised of dogs that vary from a central breed definition and that the long history of strong selection which defines many established breeds is lacking in these populations.

| Impact of geographic separation on breed identity and differential selection
We additionally examined a set of three breeds for which we collected samples in both the United States and Italy: Italian Greyhounds (Figure 1a), Cane Corsos (Figure 1v), and Neapolitan Mastiffs ( Figure 1w). As population substructure has previously been observed in international populations of breeds (Pedersen, Liu, Leonard, & Griffioen, 2015;Pedersen, Liu, McLaughlin, & Sacks, 2012;Quignon et al., 2007), we hope to characterize genetic divergence between US and Italian populations of Italian breeds, caused by importation bottlenecks, regional popular sire effects (Leroy & Baumung, 2010;Pribanova et al., 2009), or variation in selection. All three were correctly grouped by breed. Within the breeds, however, distinctions emerged. The Italian Greyhounds split into two sub-branches of American and Italian dogs with 90% confidence. The Neapolitan Mastiffs, while forming a distinct breed, did not show separation by country. The Cane Corsos did not group as a breed, rather they are paraphyletic to the Neapolitan Mastiffs, as was noted previously (Parker et al., 2017).
Haplotype sharing analysis was then applied to detect the presence of variable genetic histories between the US and Italian collections, revealing new subtleties (Table S1). First, we found that the American population of the Cane Corso shows significant haplotype sharing with the Rottweiler, reproducing previous results (Parker et al., 2017

| Timing of breed formation
We used the equation described by Parker et al. (Parker et al., 2017), to calculate the number of the years since shared genetic history was observed between breeds, based on the amount of haplotype sharing across breeds. A shared genetic history implies that the two breeds in question were either (1) the same population that diverged into two distinct populations or (2)  importantly, that the genetic foundations of these breeds were laid in the more distant past.

| Italian dog breeds
While most modern dog breeds have been developed in the past 200 years (Club, 2006Lindblad-Toh et al., 2005;Parker et al., 2017), some populations have taken recognizable forms, suited to distinct tasks, for much longer. Modern Italy stakes claim to a minimum of 23 dog varieties, of which 13 are internationally recognized, and 10 are known locally (Table 1). These breeds can be broadly classified into seven phenotypic categories: scent hound, flock guardian, mastiff, and sighthound, as were described by historical Roman authors (Xenophon;Columella, 1954), as well as the more modern hunting, herding, and companion breeds.
We have combined whole-genome SNP data from 263 dogs repre- unavoidably rely on a small source pool (Alam et al., 2012;Calboli, Sampson, Fretwell, & Balding, 2008;Kumpulainen et al., 2017;Pfahler & Distl, 2015;Streitberger et al., 2012;Wijnrocx et al., 2016). As such, dog breeds which perform similar tasks are frequently more closely related to each other than to breeds with different occupations, allowing for the visualization of phylogenetic breed clades sometimes segregating by broad behavior patterns (Parker et al., 2017;Vaysse, et al. 2011;von Holdt et al., 2010;Parker et al., 2004;and Figure 2). Within each of those application-based clades, individual breeds may diverge by specialized skill, appearance, or environmental adaptation. We sought to match methodologies to modern breeds, looking for breed formation commonalities in subsets of phenotypes.
An example of breed development through divergence from a common genetic pool is the Small Spitz breeds. Phylogenetically, the Small Spitz clade (Figure 2) includes the Pomeranian, American Eskimo, and Volpino Italiano (Figure 1p)  Conversely, the Levriero Meridionale from Southern Italy shows no heritage with the Cirneco dell'Etna and its close relatives, the Pharaoh and Ibizan hounds, but rather the Sloughi and Azawakh breeds originally from Northern Africa.
An additional example of genetic data superseding historical lore is that of the Segugio Italiano breed which, since 1989, has been classified as two separate breeds, the Pelo Raso (smooth haired) ( Figure 1j) and Pelo Forte (rough haired) (Figure 1k) (Pallotti et al., 2017). Early writings by the Greek historian Arrian (c. 92-175 AD) describe "Segusian hounds" as "shaggy" (Arrian, 1831), suggesting that the original breed may have more closely resembled the Pelo Forte of today.
The genetic distance phylogeny and structure predictions presented here failed to reproducibly distinguish the two Segugio populations, implying that at the genetic level, they are the same breed. A similar relationship pattern was previously noted between the Tervuren and Groenendael varieties of Belgian Sheepdog, classified as separate breeds in the United States since 1959 (von Holdt et al., 2010;Parker et al., 2004Parker et al., , 2017. Considering both examples, then, it is not surprisingly that recent separation of breeding stock based on phenotype is inconsequential and does not override the genetic patterns laid out by a breed history.

| Italian herding breeds and the German shepherd dog
Herding behavior is suggested to have arisen from multiple geographic and genetic backgrounds (Arnott, et al. 2015;Storteig Horn, 2017;Parker et al., 2017)

| Flock guardian breeds develop along lines of transhumance
The flock guardian breeds of Italy present a surprisingly accurate reflection of human industry. Management of domestic sheep and goats has been a key need since early human occupation, with skeletal remains of domestic caprines and dogs found in the Neolithic settlements of Sicily (Debono Spiteri et al., 2016;Leighton, 1999).
The importance of the sheep industry in Italy continued through the Roman era (Canfield, 1853;Lambert 2014). Due to the topography and climate of the region, transhumance, the practice of moving sheep flocks and the dogs that protected them from mountainous pastures in summer to sheltered lowland pastures during winter, has long been employed. The standard routes, termed "tratturi," between summer and winter pastures were etched into the landscape (Higgs, 1976;Pounds, 1990 (Pounds, 1990), breed formation is more likely to occur by convergent selection, as demonstrated with the guardian breeds of the regions.

| Summary
Human-driven selection has yielded hundreds of dog breed variants.
Utilizing 23 closed dog populations from a distinct geographic location, in-depth demographic analyses relative to a large cohort of established dog breed populations reveals three modes through which selection has been applied: divergence from a large collection of phenotypically similar individuals; independently through repeated selection for a particular purpose without influence of introgression from other similar breeds; and inclusion of foreign breeding stock with desired traits. The present sampling of Italian breeds allows for the analysis of populations passing throughout multiple breed development stages.
The impact of human advancement on breed distribution and development is readily visible in relation to the agricultural practice of transhumance and the spread of livestock guardian breeds along the resultant routes. We have further elucidated a theory for a common agrarian shepherd dog type, regional to Central Europe, which likely existed prior to modern breed differentiation. The genetic signatures of this population can be recognized as having contributed to many regional breeds of European herding dogs, including the wellknown German Shepherd Dog, and the multiple Italian herding breeds adapted to working in the pastoral regions of Northern Italy. The analysis of additional regional dog breeds will expand our understanding of the impacts and routes of artificial selection, revealing ever more interplay between human and canine coevolution.

ACKNOWLEDGMENTS
The authors are grateful to ENCI (Italian Kennel Club) for useful in-