The genomic architecture of mastitis resistance in dairy sheep

Mastitis is the most prevalent disease in dairy sheep with major economic, hygienic and welfare implications. The disease persists in all dairy sheep production systems despite the implementation of improved management practises. Selective breeding for enhanced mastitis resistance may provide the means to further control the disease. In the present study, we investigated the genetic architecture of four mastitis traits in dairy sheep. Individual animal records for clinical mastitis occurrence and three mastitis indicator traits (milk somatic cell count, total viable bacterial count in milk and the California mastitis test) were collected monthly throughout lactation for 609 ewes of the Greek Chios breed. All animals were genotyped with a custom-made 960-single nucleotide polymorphism (SNP) DNA array based on markers located in quantitative trait loci (QTL) regions for mastitis resistance previously detected in three other distinct dairy sheep populations. Heritable variation and strong positive genetic correlations were estimated for clinical mastitis occurrence and the three mastitis indicator traits. SNP markers significantly associated with these mastitis traits were confirmed on chromosomes 2, 3, 5, 16 and 19. We identified pathways, molecular interaction networks and functional gene clusters for mastitis resistance. Candidate genes within the detected regions were identified based upon analysis of an ovine transcriptional atlas and transcriptome data derived from milk somatic cells. Relevant candidate genes implicated in innate immunity included SOCS2, CTLA4, C6, C7, C9, PTGER4, DAB2, CARD6, OSMR, PLXNC1, IDH1, ICOS, FYB, and LYFR. The results confirmed the presence of animal genetic variability in mastitis resistance and identified genomic regions associated with specific mastitis traits in the Chios sheep. The conserved genetic architecture of mastitis resistance between distinct dairy sheep breeds suggests that across-breed selection programmes would be feasible.


Background
Mastitis is an inflammation of the mammary gland usually caused by pathogens, mainly bacteria, developed in the gland cistern after penetration through the teat canal. It is the most prevalent and costly disease in the dairy industry due to reduced and discarded milk, early involuntary culling, veterinary service and labour costs [1,2]. There is also the possibility for the spread of a zoonotic disease as well as the development of microbial resistance to antibiotics [1][2][3]. Mastitis compromises animal welfare causing pain, anxiety, restlessness and changes in feeding behaviour [4]. The issue of mastitis is a central element in the design and implementation of management schemes in dairy sheep farms. The notion is that for every case of confirmed clinical mastitis there are at least 40 other animals in the flock with subclinical mastitis. Reviews of management practices and practical approaches to control sheep mastitis have been published [5,6].
The susceptibility of dairy animals to udder infection is heritable [2,[7][8][9] but the current knowledge of the complex physiological and cellular events that occur in the mammary gland in response to pathogens is limited. In cattle, the common mastitis pathogens, Escherichia coli (E. coli) and Staphylococcus aureus (S. aureus), activate the mammary immune system in different ways, influencing disease severity and chronicity [10]. The recruitment of leukocytes (neutrophils) in the udder in response to infection is measured as the milk somatic cell count (SCC) [11]. Breeding programmes based on animal pedigrees and SCC records (as a correlated trait for mastitis) have been implemented in dairy cattle worldwide [12] as well as the French Lacaune dairy sheep [13,14]. Marker assisted selection or genomic selection based on genotyping of single nucleotide polymorphism (SNP) markers could enhance and expedite the efforts for genetic improvement.
Quantitative trait loci (QTLs) and SNPs associated with SCC have been previously identified based on linkage and linkage disequilibrium analyses of three distinct dairy sheep breeds (French Lacaune, Spanish Churra and Italian Sarda [15][16][17][18][19]). Several QTLs were common (i.e. located within 2 Mb regions) to two or more breeds in these studies. A custom-made 960-SNP DNA array for ovine mastitis resistance containing markers on six chromosomes (2, 3, 5, 12, 16 and 19) was then developed. The array was based on the common QTLs among breeds and some additional well-defined (i.e. confidence interval less than 0.5 Mb) QTL regions with large effects, as well as further genotyping with the Illumina OvineSNP800 Bead chip and re-sequencing of selected QTL regions (Additional file 1: Table S1) within an FP7 European funded project (http://cordis.europa.eu/result/rcn/163 471_en.html). For the design of the custom-made array, SNPs were selected from both the 50 K and the 800 K SNP DNA arrays, as well as the available re-sequencing data; the array had an average density of 1 SNP every 23 Kb. The present study builds on results from previous studies on Lacaune, Churra and Sarda sheep [15][16][17][18][19] in order to dissect the genomic architecture of mastitis resistance in an independent sheep population, the Greek Chios breed. Genotyping was performed with the aforementioned mastitis-specific custommade DNA array. We conducted variance component analyses to estimate genetic parameters and genomic association studies to identify genomic regions controlling mastitis. We also performed pathway analysis and examined gene expression and transcription factor binding site data to identify candidate genes within the relevant genomic regions.

Principal component analysis
French Lacaune, Spanish Churra, Italian Sarda and Greek Chios were among the sheep breeds analysed within the framework of the sheep HapMap project [20]. Principal component analysis (PCA) was performed to examine the relatedness among these dairy breeds using genotypes obtained with the Illumina OvineSNP50 Beadchip within the HapMap project. According to the results, the four dairy breeds are genetically distinct with the first and second PCA clusters explaining 14.4% and 9.9% of the variance, respectively (Additional file 2: Figure S1). Chios is equally distant from any one of the other three breeds, with an average pairwise genetic distance (D st ) of 0.33, confirming its independent population status for the purposes of the present study.

Descriptive statistics of phenotypes
Descriptive statistics for SCC, California mastitis test (CMT) and total viable bacterial count in milk (TVC) are listed in Additional file 3: Table S2. The average frequency of clinical mastitis occurrence (CM) was 5.1% in our studied population. Gram positive (58% Coagulase negative Staphylococci, 22% S. aureus and 7.7% Streptococcus spp) bacteria were mainly responsible for CM. Nevertheless, Gram negative bacteria (9.3% Pasteurella spp. and 3% Proteus) were also isolated from the milk samples.
Smoothed curves illustrating progression for each trait throughout lactation, after adjusting for all relevant sources of systematic variation, are shown in Fig. 1. These curves were based on second order polynomial functions of time (week of lactation) fitted in mixed model analyses of the studied traits. Higher order polynomials did not improve the model fit as attested to by the Wald statistic. As expected, values of SCC and CMT generally increased as lactation progressed [21]. TVC and CM values first showed a relative decrease until week 11 of lactation and then started gradually increasing.

Genetic parameters
Estimates of heritability and repeatability of the studied traits (Table 1) based on monthly records were derived for the entire lactation as well as different stages of lactation defined as early (weeks of lactation 1-7), mid (weeks 8-17) and late (weeks [18][19][20][21][22][23]. Because of the binary nature of CM, only overall lactation heritability and repeatability estimates were possible to derive for this trait. Low to moderate heritabilities (0.09-0.18) were estimated for all the mastitis traits; higher heritability estimates were observed in all cases toward the end of lactation. Moreover, the genetic correlations between the different lactation stages were positive for all studied traits (Additional file 4: Table S3). However, the genetic correlations between early and late lactation were low to moderate (0.26-0.56) suggesting that somewhat distinct genetic mechanisms may control mastitis resistance at different lactation stages. The three mastitis indicator traits (SCC, TVC and CMT) were correlated, both phenotypically and genetically, with each other and with CM, suggesting that any of the three indirect mastitis traits would be a useful predictor of CM. The genetic and phenotypic correlation estimates between traits are presented in Table 2.

Genomic association studies
Genomic association studies were conducted separately for early, mid, late and overall lactation for each trait. Multidimensional scaling analysis (MDS) revealed no substructure in the Chios population. In general, similar genomic associations were detected for CM and the three mastitis indicator traits but distinct associations were observed in the different lactation stages. SNPs significantly associated with the studied mastitis traits are shown in Table 3. Five (four at genome-wide and one at suggestive genome-wide significant level) of the seven QTLs previously identified in other dairy breeds were verified in the independent Chios population (Additional file 1: Table S1). Q-Q plots supported the genomic analyses results. Manhattan plots displaying these association results are shown in Fig. 2; the corresponding Q-Q plots are shown in Additional file 5: Figure S2.
All significant SNP markers that were identified in the genomic association study had also a significant effect  on the corresponding mastitis traits in the ensuing mixed model analyses based on the pedigree genetic relationship matrix among animals. The additive and dominance genetic effects, and the proportion of the total genetic and phenotypic variance explained by each of these SNPs, are summarised in Table 3. Individual significant SNP markers explained up to 27%, 39%, 33% and 39% of the genetic variance of CMT, SCC, TVC and CM, respectively. The significant markers corresponding to the same QTL were in linkage disequilibrium (LD as r 2 = 0.20-0.97), implying that they likely correspond to the same causative variation (Additional file 6: Table S4). Only small LD blocks (less than 200 kb length) were visualised with Haploview, probably due to a high number of recombination events having taken place in the outbred Chios population (Additional file 7: Figure S3). The significant markers were located in close proximity to each other, with most of them being inside neighbouring small LD blocks, and spanned the previously identified QTL regions in Lacaune, Churra and Sarda (Additional file 7: Figure S3).

Annotation of SNP markers and QTL candidate regions
A relatively small number of genes (53 protein-coding and 26 microRNAs) were identified in the candidate regions for mastitis resistance (Additional file 8: Table S5). All significant SNP markers, with two exceptions, were located in intergenic or intronic regions. The exceptions were two SNPs associated significantly with TVC (oar3_ OAR2_209,240,636) and SCC (Oar3_Oar2_208,650,955) in late lactation, which correspond to a missense variant in the exonic region of an enzyme of the citric acid cycle (isocitrate dehydrogenase 1, IDH1) and to a non-coding lincRNA variant, respectively. Annotations of all significant SNP markers for CM, SCC, CMT and TVC, are presented in Additional file 9: Table S6.

Pathway and functional clustering analysis
Based upon the significant heritability estimates and the large amount of genetic variance accounted for by the identified SNPs, we reasoned that the corresponding QTL regions would contain genes contributing to a common pathway associated with mastitis resistance. We therefore identified the sets of annotated genes lying within QTL intervals and sought evidence of gene set enrichment. These genes were enriched for pathways involved in inflammatory and immune responses, both innate and adaptive (Fig. 3a). Moreover, two networks of molecular interactions related to immunological diseases were constructed using the list of genes in the candidate regions (Fig. 3b). We further extracted the gene ontology (GO) terms for each of the genes and performed functional annotation clustering analysis. These genes were organised into 18 clusters, each given an enrichment score (ES) (Additional file 10: Table S7). The first (ES = 2.33) and the third (ES = 1.53) clusters were related to the regulation of apoptosis, and to innate and adaptive immune responses involving the same immune genes identified with the pathway and network analyses (C6, C8, C9, CD28, OSMR, SOCS2, CREB1, ICOS, CTLA4, NRP2, PRPKAA1, DAB2, LIFR, PTGER4, UBE2N, IDH1).

Gene expression analysis
Genes that contribute to mastitis resistance are likely to be expressed in milk somatic cells, in other immune cells, and/or in the mammary gland, and much of the genetic variation is likely to affect their transcriptional regulation. In humans, more than 80% of genes that are induced in stimulated monocytes have shown evidence of heritable variation in their level of expression (eQTL) [22]. To assess the expression profiles of genes located in the candidate regions for mastitis resistance, we obtained data from an ovine transcriptional atlas currently under development at the Roslin Institute, which is based upon the large-scale mRNA sequencing of multiple tissues and cell lines [23]; this development builds upon similar initiatives in other species [24,25]. We also used data from an RNA-seq characterisation of the milk transcriptome of two Spanish dairy sheep breeds, Churra and Assaf, where milk somatic cells had been sampled at 10, 50, 120 and 150 days after lambing [26,27]. A previous study identified genes involved in response to bacterial lipopolysaccharide (LPS), the major component of the outer membrane of Gram-negative bacteria, in a mouse mammary gland model of mastitis [28]. A similar model of mastitis has also been implemented in dairy cattle [29][30][31]. We reasoned that genes involved in mastitis susceptibility would likely be constitutively expressed in the mammary gland and immune tissues and/or differentially expressed in sheep bone marrow-derived macrophages (BMDMs) in response to LPS treatment. Both sources have been extensively analysed in the sheep atlas. In addition, we reasoned that these genes would be differentially expressed in sheep with high, intermediate and low SCC and/or their expression levels would correlate with SCC [32]. The sheep used in the milk transcriptomic study had also been phenotyped for SCC and milk yield [26,27].  Among all the genes located in the candidate regions for mastitis resistance identified in the present study, several (DAB2, IDH1, NDUFS1, MRPL42, FYB, SOCS2, OSMR, RGMB, PPP4R2, EGFLAM and NUDT4) were highly expressed in both mammary gland and various immune cell lines as shown in Additional file 11: Figure S4. These genes, along with others (CTLA4, CCNYL1, CEP83, CHD1, TTC33, METTL21A, PLXNC1, CARD6, FASTKD2, PRKAA1 and PTGER4) were also inducible by LPS in sheep BMDMs ( Fig. 4a and b).

Transcription factor binding site (TFBS) analysis
Selective breeding in dairy sheep has led to the development of selective sweeps associated with milk production traits that are common to different breeds [33]. The apparent commonality of QTL regions affecting mastitis amongst disparate dairy breeds suggest that they may also share specific variants that confer this phenotype. There are currently insufficient whole genome sequences available to identify such variants on a population-wide basis. Nevertheless, we examined the available genomes of six meat sheep (Scottish Blackface x Texel; BFxT), the cross used to create the transcriptomic atlas, and six dairy sheep sequenced as part of the Sheep HapMap project [15] (two of each Churra, Sakiz and Lacaune breed). There is no publicly available Chios genome, but the genetically similar East Aegean Sakiz breed [20] was considered as a proxy in the present study. To focus on likely functional elements, we identified candidate TBFS within 1 kb of the annotated transcription start site [34]. MATCH analysis [35] revealed similar TFBS profiles between dairy and meat sheep in the majority (62%) of the genes located in the candidate regions for mastitis resistance. For the remaining genes, distinct TFBS profiles may be attributed to differences among individual animals rather than breed type (Additional file 13: Table S8). Most of these latter genes (CHD1, PRKAA1, TTC33, CARD6, DAB2, FYB, OSMR, PPP4R2, C6, C9, PTGER4 and LIFR) were also differentially expressed in BMDMs after LPS treatment and/or in sheep with differing SCC. In general, no systematic differences were observed in TFBS profiles between dairy and meat sheep except for the following: (1) Mat1-Mc and AP-1 binding sites upstream of the OSMR and PDZRN3 genes, respectively, only in dairy sheep; (2) an HNF-1 binding site upstream of DAB2, only in meat sheep; and (3) HSF and c-Rel sites upstream of C9, only in meat and dairy sheep, respectively.

Selection of candidate genes
Based on all above results, a total of 14 genes were selected as the most promising candidate genes for mastitis resistance (Table 4) in the present study. Genes were selected using a combination of their known biological function, involvement in immune response pathways and networks, differential expression or enrichment in tissues relevant to mastitis, differences in TFBS, association of their expression levels with SCC and/or any previously known association with mastitis resistance in either dairy sheep or dairy cattle.

Discussion
The present study investigated the genomic architecture of mastitis resistance in Chios dairy sheep and assessed the utility of relevant genomic data in genetic improvement. Five QTLs located on chromosomes 2, 3, 5, 16 and 19 that had been previously identified in three unrelated dairy sheep populations in France, Spain and Italy were found to be segregating in an independent Greek sheep breed, suggesting that genetic variation may persist in diverse populations and that joint genomic selection programmes for enhanced mastitis resistance across breeds are, in principle, feasible. The high proportion of genetic variance explained by the five common QTL regions renders this mastitis-specific 960SNP-array a useful tool in dairy sheep breeding and genetic improvement. The significance of the QTL regions reported in the present study is supported by results of a previous selection mapping study that compared dairy with meat sheep breeds to identify genomic regions under selection [33]. In the latter study, multiple pairs of dairy versus meat sheep breeds were investigated. For two specific pairs a selection signal was identified in the QTL region for mastitis resistance on chromosome 16 reported in the present study; for another pair, a selection signal was also identified close to our QTL regions on chromosomes 2, 3 and 19. The consistency of these selective sweep signals with the QTLs verified in the present study attest to the significance and suitability of the genomic regions included in the custom-made 960SNP array for dairy sheep populations.
Substantial genetic variance was detected in all mastitis traits of our study. Nevertheless, our findings corroborate the notion that mastitis resistance is under mostly polygenic genetic control [7,16,36], since 60-70% of the (See figure on previous page.) Fig. 2 Manhattan plots displaying the genomic association results of the mastitis traits studied in Chios sheep. Manhattan plots for milk somatic cell count (SCC) in early (a), late (b), and overall (c) lactation; for California mastitis test (CMT) in early (d), mid (e), late (f), and overall (g) lactation; for total viable bacterial count in milk (TVC) in early (h), late (i), and overall (j) lactation; for clinical mastitis occurrence (CM) in early (k), mid (l), late (m), and overall (n) lactation. Chromosome location is plotted against -log 10 (P). Red and blue lines, respectively, are thresholds for significance post-Bonferroni correction (P < 0.05) and for suggestive significance (accounting for one false positive per genome scan). No significant results were identified for TVC and SCC in mid lactation genetic variance remained unexplained by the identified QTLs. Previous genetic studies on other breeds, reviewed by Bishop et al. [2], reported similar heritabilities ranging from 0.10 to 0.20 for SCC. Significant positive phenotypic and genetic correlations were detected between the CM occurrence and the three mastitis indicator traits (SCC, CMT and TVC) implying some common underlying mechanisms of resistance between acute clinical episodes and persistent intra-mammary inflammations. Similar correlations between CM and SCC have been reported in the Lacaune dairy sheep [32,37] and cattle [38]. These results confirm the utility of the indirect mastitis measures as predictors of CM in genetic improvement programmes as well as on-farm management practices. Furthermore, the three mastitis indicator traits in the present study appeared to be under generally similar genetic mechanisms of control. Considering that the average incidence of clinical mastitis in a well-managed flock is about 2-3% and the incidence of subclinical mastitis is much higher, the use of indicator traits associated with the latter would facilitate breeding programmes achieving improvements in both subclinical and clinical mastitis resistance. CMT and TVC were studied for the first time as phenotypes for breeding purposes. Implementation of CMT would constitute a useful tool to phenotype mastitis resistance on the farm without requiring any special equipment.
In Lacaune sheep, a previous transcriptomic study of SCC associated 7 genescytotoxic T lymphocyteassociated protein 4 (CTLA4), suppressor of cytokine signalling 2 (SOCS2), oncostatin M receptor (OSMR), FYN oncogene related to SRC (FYN), complement factor B (CFB) and isocitrate dehydrogenase 2 (NADP+), soluble (IDH2)with mastitis susceptibility [32]. In the present study, several of these genes (CTLA4, SOCS2, OSMR), along with some from the same family (FYB, C9, IDH1), were also found to be LPS-inducible in BFxT sheep BMDMs and/or differentially expressed in the milk somatic cells of Churra and Assaf, indicating they are likely to be involved in protective immunity across sheep populations. The SNP marker located within the IDH1 gene Fig. 3 Pathway and network analysis using the IPA software. a The most highly represented canonical pathways derived from genes located within the candidate regions for mastitis resistance in Chios sheep. The solid yellow line represents the significance threshold. The line joining squares represents the ratio of the genes represented within each pathway to the total number of genes in the pathway. b Two networks, both related to immunological disease, that illustrate the molecular interactions between the products of candidate genes selected from QTL regions for mastitis resistance in Chios sheep. Arrows with solid lines represent direct interactions and arrows with broken lines represent indirect interactions. Genes with white labels are those added to the IPA analysis because of their interaction with the target gene products corresponds to a missense mutation but further work is needed to verify if this is the actual causative mutation. For the OSMR and C9 genes, differences in the TFBS in the 1 kb upstream regions were also identified among dairy and meat sheep. The proto-oncogene c-Relwhose binding site was identified in the sequence upstream of C9 in dairy sheepis a member of the nuclear factor kappa b (NFkb) family of transcription factors, whose activity has been previously related with mastitis resistance in dairy cattle [39]. However, further studies are needed to confirm if these promoter variants are functionally important. Moreover, most of the candidate genes for mastitis resistance identified in the present study were previously reported to control mastitis resistance in dairy cattle, including CTLA4 [40], IDH1, OSMR, C6, C7 and C9, mitogen-responsive phosphoprotein, homolog 2 (DAB2), caspase recruitment domain family, member 6 (CARD6) [41,42], and prostaglandin E receptor 4 (PTGER4) [43,44]. This suggests that dairy sheep and cattle may partially share common underlying mechanisms and genes critical to a successful host response to mastitis infection. Based upon their transcriptional profiles and association with SCC, novel functional candidates for mastitis resistance were identified in the present study. These include inducible T-cell co-stimulator (ICOS), leukaemia inhibitory factor receptor A (LIFR), plexin C1 (PLXNC1) and chromodomain-helicase-DNA-binding protein 1 (CHD1). The proteins encoded by these genes have been previously associated with multiple diseases in humans [45][46][47][48][49] and other species [50]. The clear enrichment of LPS-inducible genes in the QTL regions also offers the possibility of using this assay to identify both cis-and trans-acting eQTLs that are relevant to mastitis, similar to a previous study of human monocytes that highlighted loci associated with inflammatory disease susceptibility [22]. The fact that the set of LPS-inducible genes overlaps with genes whose level of expression was correlated with SCC attests to the utility of the assay as a model for mastitis in sheep. However, further studies are needed to confirm the present results and identify the actual causative genes and mutations.

Conclusion
Both innate and adaptive immune responses, along with the induction of specific metabolic genes, are likely to be involved in the genomic architecture of sheep resistance to mastitis. All the mastitis traits analysed in the present study exhibited heritable variation, suggesting that selection and management programmes aiming at enhancing mastitis resistance are feasible. Genetic selection against mastitis may be achieved using primarily indirect (indicator) traits such as SCC, CMT and TVC but a combination of both clinical mastitis and indicator traits would be preferable. A possible implementation scenario might entail marker-assisted selection based on validated selectable markers and the candidate genes identified in the present study. Another option would be the implementation of genomic prediction and selection schemes across different dairy sheep breeds sharing common QTLs for mastitis resistance, leading to enhanced reference population size and greater accuracy.

Ethics statement
The study was approved by the Ethics and Research Committee of the Faculty of Veterinary Medicine, Aristotle University of Thessaloniki, Greece. Permits for access and use of the commercial farms were granted by the individual farm owners, who were members of the Chios Sheep Breeders' Cooperative "Macedonia". During sampling, animals were handled by qualified veterinarians. Permission to qualified veterinarians to perform milk and blood sampling was granted by the National (Greek) Legislature for the Veterinary Profession, No. 344/29-12-2000. For the ovine transcriptional atlas study approval was obtained from The Roslin Institute's and the University of Edinburgh's Protocols and Ethics Committees. All animal work was carried out under the regulations of the Animals (Scientific Procedures) Act 1986.

Principal component analysis
A total of 23 Chios, 103 Lacaune, 120 Churra and 20 Sarda dairy sheep were analysed within the sheep HapMap project [15]. Genotyping and SNP quality control are detailed in Kijas et al. [20]. PCA was performed using PLINK v1.9 [51], which created an identity-by-state matrix for assessing the genetic differences among the four populations, and allowed the average genetic distance between Chios and each of the other three breeds to be estimated.

ICOS
Inducible T-cell co-stimulator 2: 204,851,429-204,873,693 Enhances all basic T-cell responses to a foreign antigen, namely proliferation, lymphokine secretion, the upregulation of molecules that mediate cell-cell interactions, and antibody secretion by B-cells [67]. Adapter protein that functions as a clathrin-associated sorting protein, required for the clathrin-mediated endocytosis of selected cargo proteins. Mediates the activation of transforming growth factor beta (TFG-beta)-stimulated c-Jun N-terminal kinases. May inhibit the canonical Wnt/beta-catenin signalling pathway. Plays a role in the colony stimulating factor 1 (CSF-1) signal transduction pathway. May also act as a tumour suppressor [75][76][77]. in hematopoietic, immunologic, and inflammatory networks [79].

CARD6
Caspase Recruitment Domain 16:33,567,396-33,580,406 CARD6 encodes a microtubule-associated protein that has been shown to interact with receptor-interacting protein kinases and to positively modulate signal transduction pathways converging on activation of the inducible transcription factor nuclear factor kB (NF-kB). It has a role in facilitating apoptosis [80,81].

Animals, sampling and phenotyping
A total of 609 purebred Chios dairy ewes in the first or second lactation, raised in four commercial farms in Greece, were used. Presence or absence (0/1) of CM was recorded by an experienced veterinarian in monthly visits during the first five months of lactation. On the day of visit, two 50 ml milk samples were collected in the milking parlour under aseptic conditions for the measurement of three traits indirectly related with mastitis: CMT, SCC and TVC. CMT was scored on a scale from 0 to 4 [52], with high values indicating the presence of elevated SCC and, potentially, pathogens in milk; this test was performed with a commercial kit according to manufacturer's instructions (Bovi-vet, Kruuse, Germany). SCC was measured with Fossomatic 360 (Foss Electric, Hillerød, Denmark) and expressed as the number of cells/ml of milk. TVC was measured with Bactoscan FC 50 (Foss Electric, Hillerød, Denmark) and expressed as the number of viable bacteria/ml of milk. In total, 2436 individual animal records were collected. Peripheral blood samples were collected from each ewe in 9 ml K 2 EDTA Vacutainer blood collection tubes (BD diagnostics) by jugular venepuncture for genomic DNA extraction. A total of 638 samples with a CMT score greater than or equal to 2 and/or at least 500,000 somatic cells/ml milk (cut-off values selected on the basis of bibliographic evidence [53]), were further tested by selective culturing on MacConkey agar for Gram-negative bacteria, and on Mannitol Salt agar and Blood agar followed by Gram staining for Gram positive bacteria. The plates were incubated at approximately 37°C for 24 h. The next day, the plates were examined for bacterial growth.

Genetic parameter estimation
The distributions of SCC and TVC records were skewed; therefore, records were naturally log-transformed in order to achieve normality of the respective distributions. Monthly animal records were analysed with the following mixed model to derive variance components and genetic parameters for the overall lactation; each trait was analysed separately: Where: Y = record of ewe o in week of lactation (i.e. week post-lambing) m μ = overall mean F = fixed effect of flock i YS = fixed effect of year-season of lambing j α 1 = linear regression on age at lambing (age) L = fixed effect of lactation number k W = fixed effect of week of lactation m b n = fixed regression coefficient on week of lactation m (order n = 2) P n = orthogonal polynomial of week of lactation m (order n = 2) g = random additive genetic effect of ewe o, including pedigree genetic relationships among animals pe = random permanent environment effect of ewe o e = random residual effect A Logit function was fitted to the CM analysis to account for the binary nature (0/1) of the trait. For all traits, the fixed regression on week of lactation yielded a smoothed curve illustrating the trait progression throughout lactation. Heritability and repeatability estimates were derived from the variance components of the random effects for each trait. Residuals from these analyses were kept to be used as dependent variables in the ensuing genomic association studies aiming to assess the effect of SNPs on the respective traits expressed across the entire lactation. Subsequently, bivariate analyses were conducted with the above model to estimate genetic and phenotypic correlations among traits.
In separate analyses of SCC, CMT and TVC, secondorder polynomial functions of week of lactation were fitted to the additive genetic and permanent environment effects of model (1), resulting in separate variance component and genetic parameter estimates by week of lactation. The latter were then combined to derive average heritability and repeatability estimates for early (weeks of lactation 1-7), mid (weeks 8-17) and late (weeks 18-23) lactation. The corresponding residuals were also kept as input variables to the genomic association studies aiming to assess the impact of SNPs on traits expressed in different stages of lactation. These analyses were not possible with the Logit function fitted to the analysis of binary CM. Therefore, a linear model was fitted to this trait in order to derive residuals by lactation stage for the genomic association analysis. All mixed model analyses were conducted with ASReml v4.0 [54].

DNA extraction
Genomic DNA for all samples was extracted from blood buffy coat using a modified protocol (Modified Blood) as described by Psifidi et al. [55].

Genomic association analysis
All animals were genotyped with a customized 960-SNP DNA array containing SNPs located in seven previously identified QTL regions for mastitis resistance on chromosomes 2, 3, 5, 12, 16 and 19. Details of the SNPs comprising the DNA array are presented in Additional file 14: Table S9. SNP genotypes were subjected to quality control measures using the following thresholds: minor allele frequency < 0.05, call rate < 95% and Hardy-Weinberg equilibrium P < 10 −6 . After quality control, 710 SNP markers remained for further analysis. SNP locations were obtained from the Oar v3.1 assembly using the Ensembl genome browser (www.ensembl.org). To account for possible population stratification, a genomic relationship matrix was generated including all individual animals. This genomic relationship matrix was converted to a distance matrix that was used to carry out classical MDS analysis with the R package GenABEL [56]. Individual ewe phenotypes were adjusted for the same fixed effects included in model (1). Phenotypes pertained to the entire course of lactation as well as to different stages (early, mid, late) of lactation as explained above. In all cases, GEMMA v0.94.1 [57] was used to run the genomic association analyses of adjusted phenotypes based on a mixed model that included the genomic relationship matrix among individual ewes as a random effect. After Bonferroni correction for multiple testing, the P ≤ 0.05 significance threshold was set at P ≤ 7.04 × 10 −5 and a suggestive significance threshold (accounting for one false positive per genome scan) was set at P ≤ 1.41 × 10 −3 .
LD among significant SNPs was calculated as an r 2 value using PLINK v1.9 [51] in order to evaluate the extent of LD and identify candidate regions potentially containing causal mutations for mastitis resistance. Furthermore, LD blocks in the regions where significant SNPs were found with GWAS were visualised using the software Haploview v4.2 [58] SNP locus effect confirmation Individual markers found to be significant in the previous step were further examined with a mixed model that included the fixed effects of model (1), the fixed effect of the corresponding SNP locus genotype and the random effect of the animal. Additive (a) and dominance (d) effects, and the proportion of additive genetic variance (PV A ) and total phenotypic variance (PV P ) due to each SNP locus were calculated as follows: where AA, BB and AB were the predicted trait values of the respective genotypic classes, p and q were the allelic frequencies of A and B at the SNP locus, and VA and VP were the additive genetic and total phenotypic variance of the trait. The latter were estimated with model (1). All analyses were run with ASReml v4.0 [54].

Annotation of the SNP markers and the QTL candidate regions
For sheep assembly Oar v3.1, the Ensembl Variant Effect Predictor (VEP) (http://www.ensembl.org/Tools/VEP) and BioMart data mining tools (http://www.ensembl.org/bio mart/martview/) were used to map and annotate the significant SNP markers derived with the genomic association analysis on the reference genome, and to locate genes in the corresponding candidate regions for mastitis resistance.

Pathway and functional annotation clustering analysis
Gene lists in the candidate regions for mastitis resistance were analysed using the Ingenuity Pathway Analysis (IPA) programme (www.ingenuity.com) in order to identify canonical pathways and gene networks constructed by the products of these genes. IPA constructs multiple possible upstream regulators, pathways and networks serving as hypotheses for the biological mechanism underlying the trait. This analysis uses data from a large-scale causal network derived from the Ingenuity Knowledge Base. IPA then infers the most suitable pathways and networks based on their statistical significance, thereby setting a threshold above which the pathways are significant. Gene lists were also analysed against an Ovis aries background using the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.7) [59]. For each gene, we determined its GO terms and performed functional annotation clustering analysis to detect possible enrichment. The ES calculated with DAVID is a modified Fisher's exact test P-value, with increasing ES (>1) reflecting over-representation of that functional category.

Gene expression analysis
Expression levels of each gene located within candidate regions for a mastitis trait were obtained from a transcriptomic atlas of 406 RNA-seq libraries, representing all major organ systems; the atlas was based on a BFxT sheep cross [23] and also included 83 libraries from a Texel sheep trio (ewe, ram and lamb) as described in Jiang et al. [60]. In the present study, we were specifically interested in mammary gland, immune-related tissues and cell lines. Therefore, we extracted data pertaining to the haemolymph nodes, mesenteric, popliteal, prescapular and submandibular lymph nodes, peripheral blood mononuclear cells, blood leukocytes, monocyte-derived macrophages, alveolar macrophages, tonsils, and mammary glands. The atlas also contained data from sheep BMDMs at different time points (0 h, 2 h, 4 h, 7 h and 24 h) after LPS treatment [23], which we used here as a model for mastitis, to identify candidate genes that are likely involved in the inflammatory response to bacterial infection. LPS stimulation mimics Gram-negative pathogens, which accounted for 12% of the mastitis incidence in the Chios sheep in the present study. LPS stimulation also provides a general model for toll-like receptor (TLR) signalling and inflammatory pathology which causes morbidity and production losses associated with mastitis. To supplement this dataset, we obtained expression levels from a previous transcriptomic study of the milk somatic cells of two Spanish dairy sheep breeds, Churra and Assaf [27]. The sheep used in this study had also been phenotyped for SCC and milk yield [26]; these records were made available to the present study.
To generate RNA for library preparation, BMDMs were differentiated for 7 days in recombinant human colony stimulating factor and treated with LPS (Salmonella minnesota Re 595 (L9764; Sigma-Aldrich)) at a concentration of 100 ng/ml. Salmonella is a TLR4 agonist; Staphylococcus aureus, which is the main cause of mastitis in dairy sheep, is known to activate both TLR2 and TLR4 [61] in cattle. BMDMs were harvested into Trizol (Thermofisher) at each different time point post LPS treatment in preparation for RNA extraction. For all samples in the transcriptomic atlas, RNA was extracted using Trizol (Thermofisher) followed by column purification and DNAse treatment using the RNeasy Mini Kit (Qiagen). The resultant RNA was checked for quality using the Agilent Tapestation 2200, and all samples were of high quality with RNA Integrity Numbers (RIN e ) greater than 9. All RNA-Seq libraries from the sheep transcriptional atlas included in this study were TruSeq stranded mRNA libraries (125 bp paired-end). Libraries were prepared by Edinburgh Genomics (https://genomics.ed.ac.uk/) using the TruSeq mRNA Library Preparation kit (Illumina) and sequenced on the Illumina HiSeq v2500 at a depth of >25 million reads per sample.
Expression levels for all samples from both the transcriptomic atlas and the milk somatic cell transcriptome were estimated using Kallisto v0.42.4 [62]. Rather than aligning RNA-seq reads to a reference genome, reconstructing transcripts from these alignments and then quantifying expression as a function of the reads aligned (the conventional means of RNA-seq processing), Kallisto employs a 'lightweight' algorithm, which first builds an index of k-mers from a known transcriptome: the Ovis aries v3.1 cDNAs (ftp://ftp.ensembl.org/pub/release-83/fasta/ ovis_aries/cdna/Ovis_aries.Oar_v3.1.cdna.all.fa.gz, obtained from Ensembl v84 [63]; n = 23,113 transcripts [22,823 protein-coding, 247 pseudogenes, 43 processed pseudogenes]). Expression level is then estimated directly (i.e., in an alignment-free manner) by quantifying exact matches between reads and k-mers. Expression is reported per transcript as the number of transcripts per million, and is summarised to the gene-level as described previously [64].
Differential expression analysis was run on the LPS time series data using the Kallisto output with the R/Bioconductor package 'Sleuth'. Heatmaps were drawn using the heatmap.2 function from the R package gplots v3.0.1. Additional differential expression analyses were conducted between sheep with high, moderate and low levels of SCC, after adjusting for breed (Churra and Assaf). Relevant data for this study are described by Suarez-Vega et al. [22]. Least square mean pairwise comparisons between SCC levels were conducted. Tukey's HSD post-hoc test adjustment was applied at a significance level of 0.05.
Furthermore, the effect of the expression level of each gene (located within a candidate region for mastitis resistance) on SCC and milk yield was assessed using the following linear model: where Y = record of ewe (SCC or milk yield across lactation), μ = overall mean, B = fixed effect of breed i, g = fixed effect of the mean expression of gene j, e = random residual effect.
The significance threshold in this analysis was set at 0.05. Since genes were located within 5 QTL regions, an FDR adjustment for multiple testing was applied and the significance threshold was subsequently re-set to P ≤ 0.0125. The analyses were conducted with ASReml v4.0 [36].

Transcription factor binding site analysis
In order to identify whether differences in TFBS are associated with differences in expression level of the genes found in the candidate regions for mastitis resistance, we extracted and compared the corresponding 1 kb upstream sequence in all genes from both meat and dairy sheep. Specifically, we extracted the sequences from the six sheep used to create the BFxT transcriptomic atlas and six sheep sequenced as part of the Sheep HapMap Project [15] (two Churra, two Lacaune [one dairy and one meat] and two Sakiz dairy sheep; NCBI Sequencing Read Archive (SRA) accession numbers SRR501848, SRR501909, SRR501850, SRR501851, SRR501843 and SRR501878, respectively; BioProject PRJNA160933).
The six atlas sheep (BFxT) had been fully re-sequenced. Illumina Tru-Seq Nano 350 gel-free libraries (125 bp paired end) were prepared, for each sample, by Edinburgh Genomics (https://genomics.ed.ac.uk). The six libraries were run on one lane of the Illumina HiSeq2500 System to generate in total approximately 220 M + 220 M reads resulting in a 10-fold coverage per sample. Details of the re-sequencing methodology within the HapMap sheep project can be found in Kijas et al. [15]. In both cases, the 1 kb upstream regions for each gene were extracted using BEDTools v2.25 [65]. These were then analysed using the MATCH algorithm within the TRANSFAC ® 7.0 Suite, with default parameters, which predicts TFBS in DNA sequences using a library of positional weight matrices [35].

Availability of data and materials
The SNP information of the mastitis specific custom-made 960 SNP array is available in Additional file 14: Table S9. Expression levels for all studied genes in all tissues and cell lines as extracted from the two sheep transcriptomic atlases are presented in Additional file 15: Table S10. In addition, all the expression data comprising the mini ovine (Texel) transcriptional atlas is available in the European Nucleotide Archive (ENA) under accession number PRJEB6169) and the data from the ovine (BFxT) transcriptional atlas has been submitted to the ENA under the accession number PRJEB19199. To supplement the data from these two transcriptomic atlas projects, we also obtain expression levels from a milk transcriptomic study of the milk somatic cells of two Spanish dairy sheep breeds, Churra and Assaf (NCBI BioProject ID PRJNA301615).

Authors' contributions
GBa, AP and GA conceived and designed the genetic study of mastitis in Chios sheep and secured substantial funding. AP, GBr and GBa performed data collection, phenotyping, DNA extractions and genetic parameter analysis. AP and GBa collated and edited the genotyping data and performed the genomic analysis. DAH and ELC conceived and designed the sheep transcriptomic atlas and DAH secured the substantial funding. ELC and SJB created the sheep transcriptomic atlas and analysed the gene expression data of the sheep atlas and the milk transcriptome. MBM generated the BMDM LPS time-course dataset for the transcriptomic atlas. AP, GS and JS performed the pathway and the TFBS analyses. AP, SJB and GS extracted and annotated the re-sequencing data of the HapMap sheep. GBa, DH, GA and AP interpreted these results. GBa and AP wrote the manuscript. All other co-authors provided manuscript editing and feedback. All authors read and approved the final manuscript.