Linkage disequilibrium in Brazilian Santa Inês breed, Ovis aries

Alvarenga, Amanda Botelho; Rovadoscki, Gregori Alberto; Petrini, Juliana; Coutinho, Luiz Lehmann; Morota, Gota; Spangler, Matthew L.; Pinto, Luís Fernando Batista; Carvalho, Gleidson Giordano Pinto; Mourão, Gerson Barreto

doi:10.1038/s41598-018-27259-7

Article
Open access
Published: 11 June 2018

Amanda Botelho Alvarenga¹,
Gregori Alberto Rovadoscki¹,
Juliana Petrini¹,
Luiz Lehmann Coutinho ORCID: orcid.org/0000-0002-7266-8881¹,
Gota Morota ORCID: orcid.org/0000-0002-3567-6911²,
Matthew L. Spangler²,
Luís Fernando Batista Pinto³,
Gleidson Giordano Pinto Carvalho³ &
Gerson Barreto Mourão¹

Scientific Reports volume 8, Article number: 8851 (2018)

Abstract

For genomic selection to be successful, there must be sufficient linkage disequilibrium (LD) between the markers and the causal mutations. The objectives of this study were to evaluate the extent of LD in ovine using the Santa Inês breed and to infer the minimum number of markers required to reach reasonable prediction accuracy. In total, 38,168 SNPs and 395 samples were used. The mean r² and |D′| between adjacent marker pairs were 0.166 and 0.617, respectively. Across chromosomes, mean LD between adjacent marker pairs ranged from 0.135 to 0.194 for r² and from 0.568 to 0.650 for |D′|. The average r² between all pairwise SNPs on each chromosome was 0.018. SNPs separated by 0.10 to 0.20 Mb had an estimated average r² equal to 0.1033. The identified haplotype blocks consisted of 2 to 21 markers. Moreover, estimates of the average coefficient of inbreeding and effective population size were 0.04 and 96, respectively. LD estimated in this study was lower than that reported in other species and was characterized by short haplotype blocks. Our results suggest that a higher density SNP panel is recommended for the implementation of genomic selection in the Santa Inês breed.

Introduction

Genomic information is currently used in animal breeding programs to enable selection for difficult to measure traits, increase the overall rate of genetic gain, and to improve the understanding of genetic and biological causes underlying phenotypic variation. Genomic selection (GS) is an approach which uses genome-wide markers simultaneously to predict breeding values¹. This approach has been shown to increase the rate of genetic gain when pedigree-based selection is suboptimal¹, which is the case for lowly heritable traits. For instance, GS based on simulated data showed an increase in reliability of breeding values for young animals when using genomic (r² > 60%) versus parent average (r² = 32%) information, equivalent to approximately 20 offspring². Furthermore, genetic gain can be increased using genomic information by shortening the generation interval¹. Alternatively, genetic markers scattered across the genome offer an opportunity to conduct genome-wide association studies (GWAS) to characterize genes underlying genetic variation for traits of interest.

The success of GS and GWAS is dependent on linkage disequilibrium (LD) or gametic disequilibrium between the markers and causal mutations³ because generally only the markers are observed and the causal mutations are unknown. The LD between a marker and a causal mutation can be considered as the proportion of causal mutation variance that can be captured by the marker variance^4,5. Through the knowledge of the degree of LD, it is possible to define the density of genetic markers necessary to achieve a certain accuracy of prediction and to determine when the estimates of genetic marker effects should be updated. It has been well documented that simply increasing marker density does not improve prediction accuracies. Although increased marker density improves resolution, it can also decrease power and add noise to the analyses through the use of non-informative SNP. Furthermore, increased marker density can dilute individual marker effects if, for example, two markers are associated with the same QTL and the two markers are in high LD with each other.

LD is defined as a non-random association between alleles at different loci⁶, and it is commonly represented by the |D′| and r² metrics⁷. The extent of LD can vary between and within species due to evolutionary history and population structure mainly characterized by insertions, deletions, chromosomal rearrangements, or inversions⁴. This association between markers and causal mutations may change over time due to recombination and selection⁴, necessitating the re-estimation of marker effects.

Estimates of LD have been reported in ovine for some domestic pure and crossbred populations, as well as in wild sheep, by using microsatellites and SNP markers^{4,8,9,10,11,12,13,14}. Nevertheless, there are few studies that report LD estimates for Brazilian Santa Inês sheep using SNP. Ovine populations have retained a relatively high level of genetic diversity, unlike bovine, which justifies the importance of LD mapping in many breeds within species¹⁵. Moreover, LD estimates between different breeds can be informative relative to the overall diversity level in a species and the selection level applied to them.

Therefore, the aim of the current study was to characterize LD structure in Brazilian Santa Inês sheep for the first time, given its commercial importance for meat production, reproductive efficiency, and tropical adaptation in Brazil, and to compare the LD observed in the Santa Inês breed with that of other breeds. Beynon et al.¹⁶ mentioned the importance of studies focused on breeds as a chance to identify variation and understand the biological mechanisms that enable these breeds to survive in different local environments.

Many studies have evaluated imputation accuracy¹⁷ and the accuracy of genomic estimated breeding values using different marker panel densities in sheep^18,19,20. The appropriate panel density could be specific to each species and breed depending on overall LD structure. Unfortunately, the current genotyping costs in sheep are greater than the economic value of breeding animals²¹. Consequently, we also aimed to provide an estimate of the marker density required for genomic studies in the Santa Inês breed.

Results and Discussion

Descriptive statistics

After quality control (QC), 38,168 autosomal SNPs remained, comprising approximately 53% of the entire panel. The SNPs retained after QC spanned a total of 299.63 megabases (Mb) of the genome, with a mean (standard deviation) distance between adjacent SNP of 0.07 (0.075) Mb. This value was close to that obtained by Liu et al. in Spanish Churra sheep (0.06 Mb)¹⁴. SNPs were evenly distributed throughout the genome, as the mean distances between adjacent markers ranged from 0.064 to 0.085 Mb. The chromosomes differ in size and SNP quantity, with chromosome 24 (OAR24) being the smallest (44.21 Mb). Using the same SNP panel, Liu et al.¹⁴ reported a similar pattern, with OAR24 being the smallest chromosome (44.85 Mb) and OAR2 the largest (263.11 Mb). The number of SNPs per chromosome was proportional to the size of each chromosome. Descriptive statistics of the SNP and LD (r² and |D′|) for each chromosome are presented in Table 1.

Table 1 Descriptive analyses, MAF, F, N_e, and average linkage disequilibrium (r² and |D′|) between adjacent and all pairwise SNP pairs by chromosome.

In addition, 35% of the SNPs (18,716) had minor allele frequency (MAF) lower than 0.20, with a mean MAF over all SNPs of 0.35. According to another sheep study, 33% of the SNPs had MAF lower than 0.20²². Extending our comparison to other species, the mean MAF was relatively higher than those found for Bos taurus indicus, with values ranging from 0.19 to 0.25^23,24. The MAF is important because LD, independent of the metric used, is a function of allelic frequency. In general, low MAF may correspond to a larger difference in allele frequency of coupled alleles, which can result in lower estimates of LD as measured by either r² or |D′|²⁵. Consequently, applying QC and the choice of QC criteria can affect the distribution and extent of LD⁶.

Inbreeding coefficient and effective population size

For a better understanding of the population described in this study, the inbreeding coefficient (F) and effective population size (N_e) were estimated for all chromosomes together and for each chromosome separately, using genomic information. The estimate of F was 0.04, a relatively low coefficient for a population that originated from the same commercial herd. Using pedigree information to estimate the inbreeding coefficient, Pedrosa et al. found values equal to 0.02 in the Santa Inês breed²⁶. Al-Mamun et al. found average inbreeding coefficients for Merino, Border Leicester and Poll Dorset equal to −0.013, 0.09 and 0.02, respectively¹³. A recently published study in ovine found an average (SD) inbreeding coefficient based on excess of homozygosity of −0.008 (0.031), ranging from −0.079 to 0.301¹². Compared with Kijas et al.¹¹ and Liu et al.¹⁴, the F estimated from the Santa Inês breed was lower. Negative inbreeding coefficients occur when the number of observed homozygous loci is lower than expected, suggesting that the population is more heterogeneous than expected, perhaps due to the composite nature of the breed.

In the N_e estimation process, genetic distance between markers was estimated by a fixed ratio across the whole genome of one Mb per centiMorgan (cM). Prieur et al. evaluated three different methods to transform the genetic distance in ovine, and concluded that the estimation process using CRIMAP software (v2.503) was more accurate²⁷. However, Prieur et al. also verified that the ranking for r² and N_e between breeds were not affected by the method used and mentioned that the LD estimator was not different between methods²⁷.

The N_e estimated herein was 96 in the current generation. Kijas et al.¹⁵ observed N_e equal to 520 in the Brazilian Santa Inês breed; however, only 47 animals were used in their study. Pedrosa et al. also estimated N_e using pedigree information and found a relatively low value (76) in Santa Inês²⁶. These differences in N_e can be due to the number of animals used (395 vs. 47 vs. 17,097) and the source of relationship information (genomics vs. pedigree). Al-Mamun et al. found values of N_e ranging from 140 (Border Leicester breed) to 348 (Merino breed)¹³. Brito et al.¹² found values of N_e in the most current generations in multi-breed sheep populations ranging from 125 to 974. Using a Spanish Churra sheep population, García-Gámez et al.²⁸ and Chitneedi et al.²⁹ estimated N_e equal to 159 and 83, respectively.

The presence of artificial selection in the population under study was verified through the reduction of N_e over the generations. In this study, N_e ranged from 1,705 to 28,191 between 16 and 296 generations, respectively, before the current generation. Mastrangelo et al. estimated the N_e at 295 generations ago to be 747 animals in Barbaresca sheep³⁰. Liu et al. observed N_e equal to 4,472 and 160 at 2,000 and 5 generations ago, assuming that one Mb is equivalent to one cM¹⁴. Brito et al.¹² reported estimates of effective population size of 5,537 animals 1,000 generations ago to 687 in the most recent generation. We hypothesize that the large difference in N_e between the current and historic generations could be because the breeds that comprise the composite breed of Santa Inês were divergent historically and, thus, these estimates include multiple divergent breeds. The Santa Inês breed is relatively new, having only begun in the 1950s by non-systematic crossing of the Brazilian Somali, Bergamasca and Morada Nova breeds³¹. This illustrates that the large estimates of historic N_e reflect time points before the formation of the breed, and even before the domestication of ovine.

We also estimated the N_e for each chromosome. Chromosome 6, OAR6, exhibited the smallest N_e, which was in contrast to the results of Liu et al. that reported the smallest N_e for OAR10¹⁴.

Linkage disequilibrium analysis between adjacent SNPs

The average (SD) r² and |D′| values estimated between adjacent SNPs from the 26 autosomal chromosomes were 0.166 (0.2189) and 0.617 (0.3349), respectively. Using the dairy sheep breed Frizarta, Kominakis et al. estimated r² and |Dʹ| equal to 0.18 and 0.50, respectively, at an average inter-marker distance of 0.031 Mb³². Mastrangelo et al. observed average r² (SD) in Sicilian sheep equal to 0.155 (0.2040)³³. Al-Mamun et al. also reported LD estimates from multiple domesticated sheep (Ovis aries) breeds including: Merino (MER), Border Leicester (BL), Poll Dorset (PD) and crossbred populations (i.e., F₁ crosses of Merino and Border Leicester (MxB) and MxB crossed to Poll Dorset (MxBxP)). The authors used the same genotype panel but adopted a different data quality control (MAF < 0.01) and reported a mean r² of 0.12 (MER), 0.20 (BL), 0.19 (PD), 0.13 (MxB) and 0.13 (MxBxP); and mean |D′| of 0.52 (MER), 0.72 (BL), 0.69 (PD), 0.54 (MxB) and 0.55 (MxBxP)¹³. In the Barbaresca sheep breed, the mean r² across autosomes was 0.215, with an average distance between adjacent SNP pairs of 0.063 Mb³⁰.

A study published with multi-breed sheep reported mean (SD) r² of 0.26 (0.100)¹². The estimates of r² are relatively consistent across sheep populations, with the exception of larger r² values reported by Brito et al. Nevertheless, we should consider that the distance between markers was much shorter in Brito et al. than herein (4.74 kb versus 70 kb in the present study), which can be one reason for the increase in r². Additionally, Brito et al. reported LD levels less than 0.10 for SNP located more than 0.04 Mb apart¹². A recent study from Michailidou et al. observed a mean r² equal to 0.121, 0.098, and 0.092 in Boutsko, Chios, and Karagouniko, respectively, with the average intermarker distance 0.27 Mb for all breeds³⁴.

Sheep populations have been associated with lower levels of LD in comparison to other ruminant and nonruminant species. Although the comparison between species is difficult due to differences in genome size as well as in the quality control applied, mean values between adjacent SNPs of 0.32 (r²) and 0.69 (|D′|) were estimated from the Australian Holstein-Friesian cattle population using 9,195 SNP with the mean SNP distance equal to 0.25 Mb⁶. The mean r² values for pigs of the Landrace (87 animals), Yorkshire (96 animals), Hampshire (78 animals) and Duroc (90 animals) breeds were 0.36, 0.39, 0.44, and 0.46, estimated from 40,144, 39,110, 32,370 and 34,129 SNP spaced at average distances of 0.06, 0.06, 0.07, and 0.07 Mb, respectively³⁵.

The average LD (SD) between adjacent SNP within the same chromosome ranged from 0.135 (0.1972) to 0.194 (0.2423) for r² and 0.568 (0.3391) to 0.650 (0.3368) for |D′| (Table 1). Chromosomes 6, 11, 12, 14, 17, 20, 21, 23 and 24 had average r² values below the 0.16 threshold²⁴. Considering r² metrics between adjacent SNPs, chromosomes 2, 10 and 16 had higher levels of LD compared to other chromosomes. The high level of LD present on OAR10 was similar to that observed by Al-Mamun et al.¹³.

Linkage disequilibrium analysis among all pairwise SNPs

The average (SD) for r² and |D′| estimated between all pairwise SNPs on the 26 autosomal chromosomes were 0.018 (0.032) and 0.225 (0.213), respectively. In a study which used microsatellite markers to evaluate LD using chromosomes 1–10 of domestic sheep (Ovis aries) with mean distance between markers ranging from 10 to 40 Mb, a mean (SD) value of 0.211 (0.004) for |D′| was estimated¹⁰. Al-Mamun et al. who also used domesticated sheep (Ovis aries), found mean r² between all pairwise SNPs (0.05 Mb mean distance) of 0.007 (MER), 0.013 (BL), 0.018 (PD), 0.009 (BxM) and 0.012 (BxMxP); and mean |D′| of 0.168 (MER), 0.29 (BL), 0.27 (PD), 0.18 (BxM) and 0.19 (BxMxP)¹³. Additionally, Miller et al. using non-domesticated sheep (Ovis canadensis and Ovis dalli) and the same genotype panel but adopting a different QC (MAF < 0.10), reported a mean r² (SD) of 0.042 (0.067)⁴. Considering the confidence interval obtained for the estimates presented in this study as well as in the studies previously reported, it is possible to assume that estimates of r² and |D′| across all SNP combinations on a chromosome are relatively consistent across sheep populations.

Figures 1 and 2 illustrate r² and |D′|, respectively, as a function of the intermarker distance for chromosomes 1 and 24. Supplementary Fig. S1 and S2 depict r² and |D′|, respectively, for the other chromosomes. Overall, the relationship between LD and intermarker distance suggests that as intermarker distance decreases, LD increases. A notable exception is chromosome 1. On this chromosome, r² presented secondary high peaks around the interval from 100 to 150 Mb (Fig. 1). On all chromosomes, the maximum |D′| was observed between many SNP pairs with high intermarker distances (Fig. 2). We contend that this might occur due to the dependence of |D′| on allele frequency. The unexpected increase in LD between some SNP pairs with larger intermarker distances could also be explained by selection. It is possible that favorable alleles for different traits were selected, resulting in a high degree of LD over longer intermarker distances, even extending to inter-chromosome pairs of SNP. Another potential reason for high r² values when intermarker distance was large is assembly errors, potentially explaining the phenomenon on chromosome 1.

The average (SD) r² between all pairwise SNPs contained on the same chromosome with intermarker distance greater than or equal to 0.10 and lower than 0.20 Mb was 0.1033 (0.0807) across all chromosomes. Zhao et al. observed r² values equal to 0.044, 0.132 and 0.158 in Sunite, German Mutton Merino and Dorper sheep, respectively, in the same marker distance interval³⁶. Additionally, García-Gámez et al. observed r² equal to 0.086 for SNP within the same marker distance interval in a Spanish Churra sheep population²⁸. Similarly, Chitneedi et al. observed an average r² of 0.066 in Spanish Churra sheep using high-density imputed genotypes²⁹.

Using LD categories defined by Espigolan et al., Table 2 shows the average intermarker distances between pairwise SNPs exhibiting low LD (r² ≤ 0.16), medium LD (0.16 < r² < 0.70), and high LD (r² > 0.70)²⁴. Higher levels of r² (greater than 0.70) were found at distances between markers smaller than 0.768 Mb with 3,296 combinations of SNPs (0.01% of all combinations). For medium levels of r² (0.16 to 0.70), distances lower than 5.277 Mb were observed with 273,659 combinations of SNPs (0.849%). Considering low levels of r² (lower than 0.16) distances found were higher than 15.110 Mb with 31,939,376 combinations of SNPs (99.140%).
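The three LD categories above can be expressed as a small classification rule. The following is a minimal sketch, not the procedure used to build Table 2; the function names are hypothetical, and because the text leaves r² = 0.70 unassigned, placing that boundary value in the medium class is an assumption made here.

```python
from collections import Counter


def ld_category(r2: float) -> str:
    """Classify an r2 value as low, medium or high LD.

    Thresholds follow the text: low (r2 <= 0.16), high (r2 > 0.70).
    Assumption: r2 == 0.70 exactly is treated as medium.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if r2 <= 0.16:
        return "low"
    if r2 > 0.70:
        return "high"
    return "medium"


def category_counts(r2_values):
    """Count how many SNP pairs fall in each LD category."""
    return Counter(ld_category(r2) for r2 in r2_values)
```

Applied to a list of pairwise r² values, `category_counts` returns the number of SNP pairs per category, which is the kind of tally summarized in Table 2.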

Table 2 Mean intermarker distance and frequency for each category of linkage disequilibrium (high, medium and low) according to r² metrics.

Relationship between linkage disequilibrium, inbreeding coefficient and effective population size

The relationships between r², |D′|, MAF, F, and N_e are reported in Table 1. The mean MAF was similar across all chromosomes. The correlation between the two measures of LD was 0.75 when LD was estimated between adjacent SNP and 0.97 when estimated among all pairwise SNP. Although |D′| tends to overestimate LD values compared to r² as reported by Zhao et al.³⁷, both LD metrics exhibited the same behavior (Table 1). This is expected since these metrics are defined similarly as a function of allele frequency. The differences between the two metrics (r² and |D′|) are related to the weight applied to the allele frequencies. Given |D′| is entirely dependent on the frequency of the alleles, |D′| possibly inflates LD estimates³⁷. On the other hand, the r² proposed by Hill and Robertson⁷ aims to reduce this frequency dependence.

According to Hill and Robertson⁷, LD (numerator of r²) and F have a linear relationship as shown in the equation below⁷. In a population under selection, the number of homozygotes tends to increase for many favorable alleles. Consequently, the inbreeding coefficient and LD between these selected alleles increase⁷.

$$E({D}^{2})=\frac{1}{15}{p}_{0}(1-{p}_{0}){q}_{0}(1-{q}_{0})[6(1-F)-5{(1-F)}^{3}-\,{(1-F)}^{6}]$$

(1)

where ${D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}$ is the numerator of r², ${\rho }_{A}$ is the probability of allele A at marker 1, ${\rho }_{B}$ is the probability of allele B at marker 2, and ${\rho }_{AB}$ is the probability of the pair of AB markers; ${p}_{0}$ and ${q}_{0}$ are the frequencies of the A and B alleles, respectively, in generation zero or with initial equilibrium. A positive relationship (0.22) was observed between the D² estimated by equation (1) as a function of inbreeding coefficients and the average D² observed between adjacent SNPs on each chromosome. A possible justification for the low correlation could be the relatively limited number of SNPs per chromosome on the panel used in the current study. The SNPs contained on the panel used herein cover only 299.6 Mb out of a total of 2,615.52 Mb, equivalent to 11% of the sheep genome. However, a few negative values were observed (e.g., −0.08) when estimating the correlation between D² estimated by F (equation (1)) and average D² between all pairwise SNPs on the chromosome. Additionally, equation (1) was derived under the assumption of finite and natural populations⁷.
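Equation (1) can be evaluated directly for given initial allele frequencies and an inbreeding coefficient. The sketch below is only an illustration of the formula as printed; the function name `expected_d2` and the example frequencies are hypothetical and are not taken from the study data.

```python
def expected_d2(p0: float, q0: float, f: float) -> float:
    """Evaluate equation (1): the expectation of D^2 given F.

    p0, q0: initial frequencies of alleles A and B (generation zero).
    f: inbreeding coefficient F.
    """
    for name, value in (("p0", p0), ("q0", q0), ("f", f)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    h = 1.0 - f  # the (1 - F) term that appears three times in the bracket
    bracket = 6.0 * h - 5.0 * h ** 3 - h ** 6
    return p0 * (1.0 - p0) * q0 * (1.0 - q0) * bracket / 15.0


# Illustrative call with the F reported in the text and assumed
# allele frequencies of 0.5 at both markers.
example = expected_d2(0.5, 0.5, 0.04)
```

Note that the bracketed term is zero at F = 0, so the formula returns no expected disequilibrium for a non-inbred population starting at equilibrium.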

The expectation of D at generation t can be derived from c (the recombination rate) and ${N}_{e}$. This is given by³⁸:

$$E({D}_{t})=(1-c)(1-\frac{1}{2{N}_{e}})E({D}_{t-1})$$

(2)
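Applying equation (2) repeatedly gives the expected D after any number of generations. The following sketch assumes a constant recombination rate and a constant N_e across generations, which is a simplification introduced here; the function name is hypothetical.

```python
def expected_d(d0: float, c: float, ne: float, generations: int) -> float:
    """Iterate equation (2) for a number of generations.

    d0: expected D in the starting generation.
    c: recombination rate between the two loci (assumed constant).
    ne: effective population size (assumed constant).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("c must lie in [0, 0.5]")
    if ne <= 0:
        raise ValueError("ne must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    decay = (1.0 - c) * (1.0 - 1.0 / (2.0 * ne))  # per-generation factor
    return d0 * decay ** generations
```

Because the per-generation factor is below one, the expected D shrinks faster when recombination is higher or when N_e is smaller.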

A negative correlation between D, which is the numerator of |D′|, and both r² and effective size (N_e) is expected. Considering N_e as an indicator of selection, lower N_e values are a result of high selection pressure, and consequently a reduction in the number of breeding animals and genetic diversity. A negative relationship between average LD between all pairwise SNPs on a chromosome and N_e was observed (−0.16), as expected. However, the correlation between average LD between adjacent SNPs and N_e was positive (0.35). One potential reason for the observed discrepancy is the fact that N_e was estimated based on the LD between all pairwise SNPs rather than LD between adjacent SNPs. For instance, Lindblad-Toh et al. also observed that the effective population size and the inbreeding coefficient were reduced during dog domestication, resulting in a decrease of LD³⁹.

Haplotype blocks

The construction of haplotypes with only two (frequency = 1,879) to twenty-one (frequency = 1) markers was consistent with the low LD among pairwise SNP reported in this study. The mean size of haplotype blocks and the frequency of the number of SNPs for each chromosome are reported in Table 3. Short haplotype blocks in common among breeds have been observed by others¹⁷. The average distance (SD) between markers that formed the haplotype blocks was 0.04 (0.033) Mb. Considering the size of the sheep genome and the average distance between SNP that formed the haplotype blocks, it was possible to indirectly infer the minimum number of markers needed for genomic analyses, which was 61,415 SNPs. However, due to the high standard deviation of the distance between markers that formed the haplotype, it is important to use this number with caution.

Table 3 Summary of mean and standard deviation (SD) of intermarker distance in haplotype blocks for each chromosome and frequency of haplotype blocks size.

Conclusions

The extent of LD among adjacent markers for the Santa Inês breed resembled those of previously reported results in other breeds of domesticated sheep. The mean LD values between all SNP pairs on each chromosome were consistent with domestic and wild sheep (Ovis canadensis and Ovis dalli) and they were lower than the estimates reported in other species. The findings reported in this study will be useful to provide a theoretical reference in determining the number of markers needed for future GS and GWAS in Santa Inês sheep.

Methods

Animal resources, genotyping and quality control

All experimental procedures employed in the present study that relate to animal experimentation were performed in accordance with the resolution number 07/2016 approved by Institutional Animal Care and Use Committee Guidelines from the School of Veterinary Medicine of University Federal of Bahia – UFBA and sanctioned by the president Prof. Claudio de Oliveira Romão to ensure compliance with international guidelines for animal welfare.

The dataset included the genotypes of 396 animals from the Santa Inês sheep breed collected between 2016 and 2017. These animals were fed in confinement for 54 to 92 days on average, during four different periods with slightly different nutritional management. This herd is located at the Experimental Farm of São Gonçalo dos Campos, the city of São Gonçalo dos Campos, Bahia, Brazil, and it is associated with the Federal University of Bahia (UFBA).

To characterize the Santa Inês sheep population, the relationship between animals was estimated using a genomic relationship matrix, G, as described in VanRaden (2008)⁴⁰. The G matrix was constructed by using the PREGSF90 software in the BLUPF90 package^41,42,43. The average relationship between animals (SD) was 0.001 (0.0634), with minimum and maximum values equal to −0.135 and 0.934, respectively. The hierarchically clustered heatmap of the G matrix was constructed using the gplots R package⁴⁴ and is presented in Fig. 3. The heatmap represents the relationship among individuals, with darker shades (red) representing low relationship between animals and lighter tones (light yellow) representing a high degree of relationship. The blocks observed in the heatmap represent individuals with stronger degrees of relationship than the overall mean relationship. By analyzing each block, we observed an overall relationship mean (standard deviation) within all blocks equal to 0.004 (0.0606), varying from −0.023 (0.0291) to 0.079 (0.1514). Random blocks with darker tones within Fig. 3, for example, showed a lower mean (standard deviation) degree of relationship, with a value equal to 0.001 (0.0555). None of the blocks can be considered as an exclusively full-sib or half-sib group⁴⁵, although they include full-sib and half-sib relationships. Inside the most defined diagonal block, for example, 13 full-sib animal pairs and 350 half-sib animal pairs are represented. In the population as a whole, there is one twin animal pair, 38 full-sib animal pairs and 3,089 half-sib animal pairs. The structure of this population can be observed in the distribution shown on the left of Fig. 3, which presents the frequency of pairs by relationship degree. The highest density of animal pairs is near zero, representing the overall low relationship among them. It is also possible to observe a higher density of animal pairs above zero, close to 0.25, 0.5 and 1.0, representing the half-sibs, full-sibs and twins, as well as a mass of pairs below zero. The genetic structure of the sample might influence the LD results. For instance, a population with an elevated level of relationship will probably also have a higher level of inbreeding and, consequently, a higher LD level. Therefore, the complex breeding history of Santa Inês may have influenced the estimates of LD.
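For readers unfamiliar with the genomic relationship matrix, the first method of VanRaden (2008) is commonly written as G = ZZ′ / (2 Σ p_j(1 − p_j)), where Z holds genotypes coded 0/1/2 and centered by twice the allele frequency. The sketch below is a plain-Python illustration under that assumption; it is not the PREGSF90 implementation, the function name is hypothetical, and estimating allele frequencies from the genotyped sample itself is a choice made here for simplicity.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from 0/1/2 genotypes.

    genotypes: list of rows (animals), each a list of allele counts
    (0, 1 or 2 copies of the reference allele) with no missing values.
    Assumption: allele frequencies are estimated from this sample.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    freqs = [
        sum(row[j] for row in genotypes) / (2.0 * n_animals)
        for j in range(n_snps)
    ]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Center each genotype by its expected value, 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in z]
        for zi in z
    ]


# Toy example with three animals and three SNPs (made-up genotypes).
toy_g = genomic_relationship_matrix([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

Off-diagonal elements of such a matrix are centered near zero for unrelated animals, which is why negative relationship values can appear alongside the positive values expected for half-sibs, full-sibs and twins.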

DNA was extracted from tissue samples of the Longissimus dorsi muscle collected from the left hemi-carcass and stored in 2.0 milliliter (ml) Eppendorf tubes. DNA extraction was performed according to protocols for lysis buffer and RNase. A high-density SNP panel (Illumina High-Density Ovine SNP BeadChip®) containing 54,241 SNP was used for genotyping. Chromosomal coordinates for each SNP were obtained from the ovine genome sequence assembly, Oar_v3.1.

Quality control (QC) of the genomic data was performed by the GenABEL R package⁴⁶ for LD analyses⁴⁷. The PREGSF90 interface of the BLUPF90 program^41,42,43 was used to edit the genomic data for F, N_e, MAF, and haplotype analyses. SNPs with a call rate lower than 0.90, MAF lower than 0.05 and p-value lower than 0.1 for the Hardy-Weinberg Equilibrium Chi-square test were excluded. One sample with a call rate lower than 0.9 was also removed. Table 4 summarizes the number of SNPs per chromosome before and after QC. We considered only the autosomal chromosomes (OAR1 to OAR26) in this study resulting in 38,168 SNPs retained for further analysis.
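The SNP exclusion criteria above can be sketched as a per-SNP filter on genotype counts. This is an illustration only and not the GenABEL or PREGSF90 procedure; the function name and argument layout are hypothetical, the Hardy-Weinberg test is assumed to be a 1-degree-of-freedom chi-square without continuity correction, and the thresholds are passed as defaults taken from the text.

```python
import math


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.90, min_maf=0.05, min_hwe_p=0.1):
    """Return True if a SNP passes call rate, MAF and HWE filters.

    n_aa, n_ab, n_bb: counts of the three genotype classes.
    n_missing: number of animals without a genotype call.
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    call_rate = n_called / n_total
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    maf = min(p, 1.0 - p)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    # Chi-square goodness of fit to Hardy-Weinberg proportions (1 df).
    expected = (n_called * p * p,
                2 * n_called * p * (1.0 - p),
                n_called * (1.0 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_value >= min_hwe_p
```

A SNP is retained only if it passes all three criteria, mirroring the exclusion rules stated above.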

Table 4 The number of SNPs per chromosome before and after quality control.

Inbreeding coefficient and effective population size

Inbreeding coefficient (F) was calculated as a function of the expected and observed homozygote difference by using the PLINK software⁴⁸. This is given by

$${F}_{i}=\frac{({O}_{i}-{E}_{i})}{({L}_{i}-{E}_{i})}$$

(3)

where ${F}_{i}$ is the estimated inbreeding coefficient of the i-th animal; ${O}_{i}$ is the number of homozygous loci observed in the i-th animal, ${E}_{i}$ is the number of homozygous loci expected and ${L}_{i}$ is the number of genotyped autosomal loci⁴⁸.
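Equation (3) is straightforward to compute once the observed and expected homozygous counts are known. The sketch below is a simplified illustration rather than the PLINK implementation; the function names are hypothetical, and the expected count is taken as the sum of p² + (1 − p)² over loci, without any small-sample correction.

```python
def expected_homozygous_loci(allele_freqs):
    """Expected number of homozygous loci under Hardy-Weinberg.

    allele_freqs: frequency of one allele at each genotyped locus.
    Assumption: no small-sample correction is applied.
    """
    return sum(1.0 - 2.0 * p * (1.0 - p) for p in allele_freqs)


def inbreeding_coefficient(observed_hom, expected_hom, n_loci):
    """Equation (3): F_i = (O_i - E_i) / (L_i - E_i)."""
    if n_loci <= expected_hom:
        raise ValueError("n_loci must exceed the expected homozygous count")
    return (observed_hom - expected_hom) / (n_loci - expected_hom)
```

For example, an animal with 70 homozygous loci out of 100, where 60 are expected, has F = 0.25, while an animal with only 55 homozygous loci has a negative F of −0.125, which corresponds to the excess heterozygosity discussed in the Results.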

Effective population size (N_e) was obtained by the SNeP software⁴⁹. This software provides a history of the effective population size over past generations, based on the relationship between N_e, linkage disequilibrium represented by r², and recombination rate (c), by using the following equation⁵⁰:

$$E[{r}^{2}]={(1+4{N}_{e}c)}^{-1}$$

(4)

Therefore, by solving equation (4), we have:

$${N}_{e(t)}={(4f({c}_{t}))}^{-1}(E{[{r}^{2}|{c}_{t}]}^{-1}-\alpha )$$

(5)

where ${N}_{e(t)}$ is the effective population size at generation t, which is ${(4f({c}_{t}))}^{-1}$⁵¹; ${c}_{t}$ is the recombination rate in generation t, which is proportional to the physical distance between markers; r² is LD; and $\alpha$ is the adjustment for mutation rate. The parameter $\alpha$ can assume three different values: 1, 2 or 2.2⁵². When we consider $\alpha$ equal to 1, ${N}_{e}c$ tends towards 0 and we assume that there is no selection or mutation. On the other hand, when mutation does occur, the parameter $\alpha$ can be equal to 2 or 2.2. The value of 2.2 comes from the result of the equilibrium expression $\frac{E[{({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}]}{E[{\rho }_{A}(1-{\rho }_{A}){\rho }_{B}(1-{\rho }_{B})]}$ that was equal to $\frac{5}{11}$. In this expression, ${\rho }_{A}$ is the probability of allele A at marker (or SNP) 1, ${\rho }_{B}$ is the probability of allele B at marker (or SNP) 2, and ${\rho }_{AB}$ is the probability of the pair of AB markers, following Ohta & Kimura⁵². Tenesa et al. proposed $\alpha$ equal to two⁵³.

In our study, the ${N}_{e}$ by chromosome was the result of a harmonic mean due to a relatively small number of SNPs in each chromosome. The physical distance was transformed to genetic distance considering one Mb as one centimorgan (cM).
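Equation (5), combined with the one Mb per cM conversion, can be illustrated with a short function. This is a sketch and not the SNeP implementation; the function name is hypothetical, and the mapping function f(c) is assumed to be the identity, so that the recombination rate in Morgans is simply the distance in Mb divided by 100.

```python
def effective_population_size(mean_r2, distance_mb, alpha=1.0, cm_per_mb=1.0):
    """Equation (5): N_e = (4 f(c))^-1 * (E[r^2 | c]^-1 - alpha).

    mean_r2: average r2 of SNP pairs at the given distance.
    distance_mb: physical distance between the markers in Mb.
    alpha: mutation adjustment (1, 2 or 2.2 in the text).
    cm_per_mb: Mb-to-cM conversion (1.0 follows the text).
    Assumption: f(c) = c, with c expressed in Morgans.
    """
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie in (0, 1]")
    if distance_mb <= 0.0:
        raise ValueError("distance_mb must be positive")
    c = distance_mb * cm_per_mb / 100.0  # cM to Morgans
    return (1.0 / (4.0 * c)) * (1.0 / mean_r2 - alpha)
```

With an average r² of 0.2 at 1 Mb, the function returns an N_e of about 100 for α = 1 and about 75 for α = 2, which shows how the choice of α lowers the estimate.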

Linkage disequilibrium analysis

The estimation of LD was performed in two ways for each chromosome: (1) between neighboring pairs of SNPs (adjacent SNPs) and (2) between all pairwise combinations of SNPs (pairwise SNPs), using the function LD in the R package genetics^47,54. The |D′| metric scales the difference between the observed frequency of the allele pair AB, where A is an allele of marker (or SNP) 1 and B an allele of marker 2, and the frequency expected from each allele separately. The |D′| parameter ranges from 0 to 1 and is given by⁵⁵:

$${D}^{\prime}=\frac{D}{{D}_{max}}$$

(6)

and

$$D={\rho }_{AB}-{\rho }_{A}{\rho }_{B}$$

(7)

where

$${D}_{max}=\begin{cases}\min ({\rho }_{A}{\rho }_{b},\,{\rho }_{a}{\rho }_{B}), & D > 0\\ \max (-{\rho }_{A}{\rho }_{B},\,-{\rho }_{a}{\rho }_{b}), & D < 0\end{cases}$$

(8)

Here ${\rho }_{A}$ is the probability of allele A at marker 1, ${\rho }_{a}$ is the probability of allele a at marker 1, ${\rho }_{B}$ is the probability of allele B at marker 2, ${\rho }_{b}$ is the probability of allele b at marker 2, and ${\rho }_{AB}$ is the probability of the allele pair AB. Maximum likelihood was used to estimate ${\rho }_{AB}$ because genotype AB/ab is not distinguishable from genotype aB/Ab⁵⁶.
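Equations (6) to (8) and the phase ambiguity of double heterozygotes can be sketched as follows. The text states only that maximum likelihood was used; the expectation-maximization (EM) iteration below is one common way to obtain such an estimate and is not necessarily the procedure implemented in the genetics package. The 3 × 3 genotype count table and the function names are hypothetical.

```python
def em_haplotype_freq(n, max_iter=200, tol=1e-10):
    """Estimate rho_AB from unphased genotype counts by EM.

    n[i][j] = number of animals with i copies of A and j copies of B.
    Returns (rho_AB, rho_A, rho_B).
    """
    total_haps = 2 * sum(sum(row) for row in n)
    # Haplotype counts that can be read off without ambiguity.
    base = {
        "AB": 2 * n[2][2] + n[2][1] + n[1][2],
        "Ab": 2 * n[2][0] + n[2][1] + n[1][0],
        "aB": 2 * n[0][2] + n[1][2] + n[0][1],
        "ab": 2 * n[0][0] + n[1][0] + n[0][1],
    }
    dh = n[1][1]  # double heterozygotes: AB/ab or Ab/aB
    p_a = (base["AB"] + base["Ab"] + dh) / total_haps
    p_b = (base["AB"] + base["aB"] + dh) / total_haps
    # Start from linkage equilibrium.
    freq = {"AB": p_a * p_b, "Ab": p_a * (1 - p_b),
            "aB": (1 - p_a) * p_b, "ab": (1 - p_a) * (1 - p_b)}
    for _ in range(max_iter):
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        # E-step: share of double heterozygotes expected to be AB/ab.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update haplotype frequencies.
        new = {
            "AB": (base["AB"] + dh * w) / total_haps,
            "ab": (base["ab"] + dh * w) / total_haps,
            "Ab": (base["Ab"] + dh * (1 - w)) / total_haps,
            "aB": (base["aB"] + dh * (1 - w)) / total_haps,
        }
        converged = abs(new["AB"] - freq["AB"]) < tol
        freq = new
        if converged:
            break
    return freq["AB"], p_a, p_b


def d_prime(p_ab, p_a, p_b):
    """D' = D / D_max, with D_max chosen by the sign of D (equation 8)."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    return d / d_max if d_max != 0 else 0.0


# Hypothetical counts: 4 aabb, 2 AaBb, 4 AABB animals.
counts = [[4, 0, 0], [0, 2, 0], [0, 0, 4]]
p_ab, p_a, p_b = em_haplotype_freq(counts)
print(p_ab, d_prime(p_ab, p_a, p_b))
```

Because $D$ and $D_{max}$ always share the same sign in equation (8), the ratio returned by `d_prime` is non-negative, which matches the stated range of 0 to 1 for |D′|.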

The squared correlation between the markers, given by r², is expressed as⁷:

$${r}^{2}=\frac{{D}^{2}}{({\rho }_{A}{\rho }_{a}{\rho }_{B}{\rho }_{b})}$$

(9)

where ${D}^{2}={({\rho }_{AB}-{\rho }_{A}{\rho }_{B})}^{2}$, ${\rho }_{A}$ is the probability of allele A at marker 1, ${\rho }_{a}$ is the probability of allele a at marker 1, ${\rho }_{B}$ is the probability of allele B at marker 2, and ${\rho }_{b}$ is the probability of allele b at marker 2.

In total, four LD estimates were obtained: (1) |D′| between adjacent SNPs; (2) |D′| between all pairwise SNPs; (3) r² between adjacent SNPs; and (4) r² between all pairwise SNPs.
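The difference between the adjacent and the all-pairwise summaries can be illustrated with equation (9) on a toy data set. For brevity this sketch assumes phased 0/1 haplotypes, which sidesteps the maximum-likelihood step needed for unphased genotypes; the data and function names are hypothetical.

```python
from itertools import combinations


def r_squared(p_ab, p_a, p_b):
    """r^2 = D^2 / (rho_A * rho_a * rho_B * rho_b), as in equation (9)."""
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def pair_r2(haps, i, j):
    """r^2 between SNP i and SNP j from phased 0/1 haplotypes."""
    n = len(haps)
    p_a = sum(h[i] for h in haps) / n
    p_b = sum(h[j] for h in haps) / n
    p_ab = sum(h[i] * h[j] for h in haps) / n
    return r_squared(p_ab, p_a, p_b)


def mean_r2(haps, adjacent_only):
    """Average r^2 over adjacent pairs or over all pairs on a chromosome."""
    m = len(haps[0])
    if adjacent_only:
        pairs = [(i, i + 1) for i in range(m - 1)]
    else:
        pairs = list(combinations(range(m), 2))
    values = [pair_r2(haps, i, j) for i, j in pairs]
    return sum(values) / len(values)


# Hypothetical haplotypes: 4 chromosomes typed at 3 SNPs in map order.
haps = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
print(mean_r2(haps, adjacent_only=True), mean_r2(haps, adjacent_only=False))
```

The all-pairwise average includes distant pairs, so it is typically lower than the adjacent-pair average, as in this toy example.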

Haplotype blocks

Haplotype blocks were identified following the approach of Gabriel et al.⁵⁷, as implemented in PLINK⁴⁸. Blocks were partitioned according to whether the upper and lower confidence limits on the estimates of pairwise |D′| fell within certain threshold values. The desired SNP panel density was estimated as the ratio of the length of the entire ovine genome, in megabase pairs, to the distance between the markers that composed the haplotype blocks.
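The two ideas in this paragraph can be sketched in a few lines. The section does not state the threshold values, so the defaults below (lower limit 0.70 and upper limit 0.98 for strong LD, upper limit below 0.90 for historical recombination) are an assumption based on the commonly cited criteria of Gabriel et al. and should be checked against the PLINK settings actually used. The genome length and marker spacing are hypothetical round numbers, not values from this study.

```python
def classify_pair(ci_low, ci_high,
                  strong_low=0.70, strong_high=0.98, recomb_high=0.90):
    """Label a SNP pair from the confidence limits on its |D'| estimate.

    Threshold defaults are assumptions, not taken from the text.
    """
    if ci_low >= strong_low and ci_high >= strong_high:
        return "strong LD"
    if ci_high < recomb_high:
        return "historical recombination"
    return "uninformative"


def required_markers(genome_length_mb, marker_spacing_mb):
    """Panel size as genome length divided by the target marker spacing."""
    return genome_length_mb / marker_spacing_mb


# Hypothetical inputs for illustration only.
print(classify_pair(0.80, 0.99))
print(required_markers(genome_length_mb=1000.0, marker_spacing_mb=0.1))
```

A shorter spacing between markers within blocks implies a proportionally larger panel, which is why short haplotype blocks argue for a denser SNP chip.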

Data availability

Data are available on request.

Declarations

All experimental procedures involving sheep were approved under the Institutional Animal Care and Use Committee guidelines of the School of Veterinary Medicine of the Federal University of Bahia (UFBA) and sanctioned by the president, Prof. Claudio de Oliveira Romão (n° 07/2016). All experiments were performed in accordance with relevant guidelines and regulations.

References

1. Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
2. VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).
3. Pritchard, J. K. & Przeworski, M. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 1–14 (2001).
4. Miller, J. M., Poissant, J., Kijas, J. W., Coltman, D. W. & The I. S. G. C. A genome-wide set of SNPs detects population substructure and long range linkage disequilibrium in wild sheep. Mol. Ecol. Resour. 314–322, https://doi.org/10.1111/j.1755-0998.2010.02918.x (2011).
5. Lu, D. et al. Linkage disequilibrium in Angus, Charolais, and Crossbred beef cattle. Front. Genet. 3, 1–10 (2012).
6. Khatkar, M. S. et al. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics 9, 187 (2008).
7. Hill, W. G. & Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–231 (1968).
8. Kalinowski, S. T. & Hedrick, P. W. Estimation of linkage disequilibrium for loci with multiple alleles: basic approach and an application using data from bighorn sheep. Heredity (Edinb). 87, 698–708 (2001).
9. Meadows, J. R. S., Chan, E. K. F. & Kijas, J. W. Linkage disequilibrium compared between five populations of domestic sheep. BMC Genet. 9, 61 (2008).
10. McRae, A. F. et al. Linkage disequilibrium in domestic sheep. Genetics 160(3), 1113–1122 (2002).
11. Kijas, J. W. et al. Linkage disequilibrium over short physical distances measured in sheep using a high-density SNP chip. Anim. Genet. 45, 754–757 (2014).
12. Brito, L. F. et al. Genetic diversity of a New Zealand multi-breed sheep population and composite breeds’ history revealed by a high-density SNP chip. BMC Genet. 18, 25 (2017).
13. Al-Mamun, H. A., Clark, S. A., Kwan, P. & Gondro, C. Genome-wide linkage disequilibrium and genetic diversity in five populations of Australian domestic sheep. Genet. Sel. Evol. 47, 90 (2015).
14. Liu, S. et al. Estimates of linkage disequilibrium and effective population sizes in Chinese Merino (Xinjiang type) sheep by genome-wide SNPs. Genes and Genomics 1–13, https://doi.org/10.1007/s13258-017-0539-2 (2017).
15. Kijas, J. W. et al. Genome-wide analysis of the world’s sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 10, (2012).
16. Beynon, S. E. et al. Population structure and history of the Welsh sheep breeds determined by whole genome genotyping. BMC Genet. 16, 65 (2015).
17. Ventura, R. V. et al. Assessing accuracy of imputation using different SNP panel densities in a multi-breed sheep population. Genet. Sel. Evol. 48, 71 (2016).
18. Bolormaa, S. et al. Multiple-trait QTL mapping and genomic prediction for wool traits in sheep. Genet. Sel. Evol. 49, 62 (2017).
19. Bolormaa, S. et al. Genomic prediction of reproduction traits for Merino sheep. Anim. Genet. 48, 338–348 (2017).
20. Daetwyler, H. D., Kemper, K. E., van der Werf, J. H. J. & Hayes, B. J. Components of the accuracy of genomic prediction in a multi-breed sheep population. J. Anim. Sci. 90, 3375–3384 (2012).
21. Raoul, J., Swan, A. A. & Elsen, J.-M. Using a very low-density SNP panel for genomic selection in a breeding program for sheep. Genet. Sel. Evol. 49, 76 (2017).
22. Kijas, J. W. et al. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds. PLoS One 4, e4668 (2009).
23. Matukumalli, L. K. et al. Development and characterization of a high density SNP genotyping assay for cattle. PLoS One 4, (2009).
24. Espigolan, R. et al. Study of whole genome linkage disequilibrium in Nellore cattle. BMC Genomics 14, 305 (2013).
25. Wray, N. R. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Res. Hum. Genet. 8, 87–94 (2005).
26. Pedrosa, V. B., Santana, J. L., Oliveira, P. S., Eler, J. P. & Ferraz, J. B. S. Population structure and inbreeding effects on growth traits of Santa Inês sheep in Brazil. Small Rumin. Res. 93, 135–139 (2010).
27. Prieur, V. et al. Estimation of linkage disequilibrium and effective population size in New Zealand sheep using three different methods to create genetic maps. BMC Genet. 18, 68 (2017).
28. García-Gámez, E., Sahana, G., Gutiérrez-Gil, B. & Arranz, J. J. Linkage disequilibrium and inbreeding estimation in Spanish Churra sheep. BMC Genet. 13, (2012).
29. Chitneedi, P. K., Arranz, J. J., Suarez-Vega, A., García-Gámez, E. & Gutiérrez-Gil, B. Estimations of linkage disequilibrium, effective population size and ROH-based inbreeding coefficients in Spanish Churra sheep using imputed high-density SNP genotypes. Anim. Genet. 48, 436–446 (2017).
30. Mastrangelo, S. et al. Genome-wide analysis in endangered populations: a case study in Barbaresca sheep. Animal 11, 1107–1116 (2017).
31. ARCO. Assistência aos rebanhos de criadores de ovinos - Associação Brasileira de Criadores de ovinos. http://www.arcoovinos.com.br/index.php (2017).
32. Kominakis, A., Hager-Theodorides, A. L., Saridaki, A., Antonakos, G. & Tsiamis, G. Genome-wide population structure and evolutionary history of the Frizarta dairy sheep. Animal 11, 1680–1688 (2017).
33. Mastrangelo, S. et al. Genome wide linkage disequilibrium and genetic structure in Sicilian dairy sheep breeds. BMC Genet. 15, 108 (2014).
34. Michailidou, S. et al. Genomic diversity and population structure of three autochthonous Greek sheep breeds assessed with genome-wide DNA arrays. Mol. Genet. Genomics, https://doi.org/10.1007/s00438-018-1421-x (2018).
35. Badke, Y. M., Bates, R. O., Ernst, C. W., Schwab, C. & Steibel, J. P. Estimation of linkage disequilibrium in four US pig breeds. BMC Genomics 13, 24 (2012).
36. Zhao, F. et al. Estimations of genomic linkage disequilibrium and effective population sizes in three sheep populations. Livest. Sci. 170, 22–29 (2014).
37. Zhao, H. H., Fernando, R. L. & Dekkers, J. C. M. Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics 175, 1975–1986 (2007).
38. Hill, W. G. & Robertson, A. The effect of linkage on limits to artificial selection. Genetics 8, 269–294 (1966).
39. Lindblad-Toh, K. et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438, 803–819 (2005).
40. VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).
41. Legarra, A., Aguilar, I. & Misztal, I. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92, 4656–4663 (2009).
42. Misztal, I., Legarra, A. & Aguilar, I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 92, 4648–4655 (2009).
43. Aguilar, I. et al. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93, 743–752 (2010).
44. Warnes, G. R. et al. gplots: Various R programming tools for plotting data. R Packag. version 2, 1 (2009).
45. Visscher, P. M. Whole genome approaches to quantitative genetics. Genetica 136–351, https://doi.org/10.1007/s10709-008-9301-7 (2009).
46. Aulchenko, Y. Package GenABEL - R package reference manual. 143. Available at https://cran.r-project.org/web/packages/GenABEL/index.html (2015).
47. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/ (2013).
48. Purcell, S. et al. PLINK (1.07). PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81, http://pngu.mgh.harvard.edu/purcell/plink/ (2007).
49. Barbato, M., Orozco-terWengel, P., Tapio, M. & Bruford, M. W. SNeP: A tool to estimate trends in recent effective population size trajectories using genome-wide SNP data. Front. Genet. 6, 1–6 (2015).
50. Sved, J. A. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2, 125–141 (1971).
51. Hayes, B. J., Visscher, P. M., McPartlan, H. C. & Goddard, M. E. Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 635–643, https://doi.org/10.1101/gr.387103 (2003).
52. Ohta, T. & Kimura, M. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68, 571–580 (1971).
53. Tenesa, A. et al. Recent human effective population size estimated from linkage disequilibrium. Cold Spring Harb. Lab. Press Hum. 2, 520–526 (2007).
54. Warnes, M. G. & Leisch, F. Genetics: Population genetics (2005).
55. Hill, W. G. Estimation of linkage disequilibrium in randomly mating populations. Heredity (Edinb) 33, 229–239 (1974).
56. Leisch, F., Man, M. & Warnes, M. G. R-Package ‘genetics’ Ver.1.3.8.1. 43 (2013).
57. Gabriel, S. B. et al. The structure of haplotype blocks in the human genome. Science (80-.) 296, 2225–2229 (2002).


Acknowledgements

This work was supported by the São Paulo Research Foundation (FAPESP, Fundação de Amparo à Pesquisa do Estado de São Paulo; processes 2015/25024-5 and 13/04504-3), Brazil. G.B.M. is the recipient of a productivity fellowship from CNPq. We are indebted to the Federal University of Bahia (UFBA, Brazil) for the partnership in sheep production and to the Biotechnology Lab (ESALQ-USP, Brazil) for support in genotyping.

Author information

Authors and Affiliations

Department of Animal Science, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, SP, Brazil
Amanda Botelho Alvarenga, Gregori Alberto Rovadoscki, Juliana Petrini, Luiz Lehmann Coutinho & Gerson Barreto Mourão
Department of Animal Science, University of Nebraska, Lincoln, NE, USA
Gota Morota & Matthew L. Spangler
Department of Animal Science, Federal University of Bahia (UFBA), Salvador, BA, Brazil
Luís Fernando Batista Pinto & Gleidson Giordano Pinto Carvalho


Contributions

A.B.A. and G.B.M. designed the research. A.B.A. analyzed the data and drafted the manuscript. A.B.A., G.A.R. and J.P. participated in genotypic data editing. L.F.B.P., G.G.P.C. and G.B.M. provided the biological material and phenotypes. A.B.A. and G.A.R. participated in the collection of samples for DNA extraction, and L.L.C. contributed laboratory methodologies. G.A.R., J.P., G.M., M.L.S. and G.B.M. corrected the manuscript and contributed important modifications to it. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Gerson Barreto Mourão.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary figures and supplementary tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Alvarenga, A.B., Rovadoscki, G.A., Petrini, J. et al. Linkage disequilibrium in Brazilian Santa Inês breed, Ovis aries. Sci Rep 8, 8851 (2018). https://doi.org/10.1038/s41598-018-27259-7


Received: 15 August 2017
Accepted: 25 May 2018
Published: 11 June 2018
DOI: https://doi.org/10.1038/s41598-018-27259-7
