The power of 28 microsatellite markers for parentage testing in sheep

Background: In sheep breeding, relationships recorded on the farm, such as parent-offspring, full-sib or half-sib relationships, sometimes need to be tested. A panel of 28 microsatellite (MST) markers was tested for its ability to provide accurate pedigree information and to resolve the common problem of significant error in pedigree records in Merino sheep. Three different flocks of Australian Merino sheep were investigated. A private farm flock represented a flock with no records available. The two other flocks were maintained under good management with full record keeping and were selected for high and low parasite resistance, respectively.
Results: In the studied panel, eight MSTs with an average polymorphic information content (PIC) of 0.65 or more were sufficient for accurate and successful DNA-based parentage analysis. The full panel of twenty-eight MST loci was clearly sufficient to provide 100% accurate pedigree and genotyping data. DNA-based pedigree records were constructed and all significant pedigree record errors were eliminated.
Conclusions: These results were used for further study of population genetic parameters such as recombination and haplotyping, which are heavily based on pedigree information. MST-based parentage testing remains available and affordable, at reasonable cost, in most countries and for individual farmers, in comparison with the fast-growing SNP-based parentage technologies.


Introduction
In sheep breeding, relationships recorded on the farm, such as parent-offspring, full-sib or half-sib relationships, sometimes need to be tested. Many studies have shown that the accuracy of pedigree recording is still deficient and that the amount of error in pedigree records made in the field is high [1,2]. Revealing pedigree errors is a fundamental step in animal breeding in order to obtain accurate values of heritability and estimated breeding values (EBVs).
Significant pedigree record errors seem to be a common problem in sheep populations; the resulting lack of accurate pedigree information reduces the genetic progress of the populations whenever these records are used [1,3]. In Merino sheep, incorrect pedigree records have been found for 9, 15 and 24% of singles, twins and triplets, respectively [3]. Errors in ewe pedigree records were usually due to ewes failing to keep their litter together, or to lamb desertion, whereas errors in ram pedigree records were due to lambs being wrongly recorded, at weaning or at other times, as progeny of a particular ram mated to ewes in one paddock [3].
Genotyping of DNA using one or more types of genetic marker has become the most common procedure for paternity testing and pedigree inference in humans and livestock species. Many highly polymorphic MSTs have been reported, and the alleles of these MST loci are often in the 70-250 bp range [4,5]. Rosa et al. [2] evaluated a panel of MSTs for paternity testing in Brazilian sheep and showed that the panel was successful for paternity inference in randomly chosen animals. Crawford et al. [1] studied the reliability of pedigrees in five sheep flocks using protein polymorphisms and MST markers and found that pedigree error ranged from 0.31 to 5%. Barnett et al. [3], using MST markers, found that the overall proportion of Australian Merino lambs with incorrect pedigrees was about 10%. In the same study, the proportion of error was 9.9% for single lambs, 15.2% for twins and 3.9% in ram pedigree records. Furthermore, Parsons et al. [6] reported that DNA-derived pedigrees based on MST markers could be successfully applied in Australian Merino sheep. A panel of MSTs for parentage analysis in Australian Merino sheep was subsequently developed and used. The panel comprised sixteen highly polymorphic MST markers, about half of which had heterozygosity in excess of 80% [7]. The panel was extensively tested in Australian Merino sheep and assigned parentage unambiguously to candidate parents even when they were highly related. DNA-based parentage testing as a pedigree system was expected to eliminate almost any type of error, because pedigree analysis using MSTs has been reported to be very close to 100% accurate [2]. Many different MST panels have been recommended and used for parentage testing in sheep [8]. In this study, a panel of 28 MSTs for establishing parentage in Australian Merino sheep was tested for validation.

Sheep
Three different populations of Australian Merino sheep were investigated. A private farm flock represents a control population (CR). Ewes were self-replacing and superior fine wool rams were purchased. Two other populations were flocks maintained by the Commonwealth Scientific and Industrial Research Organization (CSIRO, NSW). One of these flocks was selected for low parasite resistance (LR) and the other for high parasite resistance (HR). Both flocks originated from the same initial population and have been totally separated from each other and other Merino sheep populations since 1976 [9]. The numbers of sampled sheep in the studied populations are shown in Table 1.

Sampling and DNA extraction
Tissue samples were taken from sheep ears. The samples were digested overnight at 55°C in 0.5 mL digestion buffer with 200 μg proteinase K. Following digestion, genomic DNA was extracted from the tissue using a phenol/chloroform extraction protocol [10]. After extraction, the DNA pellet was dried for 30 min in a 37°C incubator, resuspended in 100 μL TE buffer and then incubated at 55°C for 5 min to aid solubilization. The DNA was quantified and diluted to 10 ng/μL.

Microsatellite genotyping
All studied sheep were genotyped with a panel of twenty-eight microsatellite (MST) markers derived from the ovine, caprine and bovine genomes and located on different chromosomes (Table 2). For the MST markers interrogated by the automated genotyping approach, analysis was performed using an ABI 373XL sequencer [11]. The panel was designed, developed and used as part of an automated progeny testing system for sheep lineage analysis at the McMaster Laboratory, CSIRO, Prospect, Sydney, Australia [7]. The MST markers were grouped into four sets of fluorescently labeled primers. In sets one to three, five primer pairs were used in each set for multiplex amplification. Set four consisted of seven primer pairs. Forward and reverse primers in sets one to three were end-labeled with 6-carboxyfluorescein (6-FAM; blue), tetrachloro-6-carboxyfluorescein (TET; green) or hexachloro-6-carboxyfluorescein (HEX; yellow). In set four, primers were labeled with only one dye [7]. The size standard GX-350-6-carboxytetramethylrhodamine (GX-350 TAMRA; red) was used. Each of the four MST sets was amplified individually, giving four PCR reactions per sample.
PCR reactions of 10 μL were performed in 384-well microtitre PCR plates. Each reaction in the automated genotyping experiments contained 3 μL of 10 ng/μL genomic DNA, 1 μL of 4 mM primer mix, 0.8 μL of 25 mM MgCl2, 1 μL of 2 mM dNTPs, 1 μL of 10× Taq polymerase buffer, 0.1 μL of 5 U/μL Taq polymerase and 3.1 μL of sterile Milli-Q dH2O. Master mixes for each of the four MST sets were prepared individually. Sample DNA was loaded into the wells of the PCR plate and 7 μL of master mix was then added. The plate was placed in a PTC-200 Programmable Thermal Controller (MJ Research, Inc.) and run with the following cycling parameters: initial denaturation at 95°C for 2 min; denaturation at 94°C for 45 s, annealing at 57°C for 45 s and extension at 72°C for 60 s; and final extension at 72°C for 7 min. Initial denaturation and final extension were performed for one cycle each, whereas denaturation, annealing and extension were repeated for 30 cycles. The PCR products of sets one to three were co-loaded in a single well of the gel, and set four was loaded in a separate well.
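The reagent volumes above can be checked and scaled with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: the per-reaction volumes are taken from the text, while the function name `scale_master_mix` and the `overage` allowance for pipetting loss are illustrative assumptions.

```python
# Per-reaction master-mix volumes in microlitres, as listed in the text.
MASTER_MIX_UL = {
    "primer mix (4 mM)": 1.0,
    "MgCl2 (25 mM)": 0.8,
    "dNTPs (2 mM)": 1.0,
    "10x Taq buffer": 1.0,
    "Taq polymerase (5 U/uL)": 0.1,
    "water": 3.1,
}
DNA_UL = 3.0  # genomic DNA at 10 ng/uL, loaded separately into each well


def scale_master_mix(n_reactions, overage=0.10):
    """Return total volume (uL) of each reagent for n_reactions.

    `overage` is a hypothetical safety margin for pipetting loss;
    the text does not state one.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in MASTER_MIX_UL.items()}


# Consistency check against the text: 7 uL of mix plus 3 uL of DNA per 10 uL reaction.
per_reaction_mix = round(sum(MASTER_MIX_UL.values()), 2)
total_reaction = round(per_reaction_mix + DNA_UL, 2)
```

For a full 384-well plate with no overage, `scale_master_mix(384, overage=0.0)` scales each listed volume by 384.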

Statistical analysis
CERVUS [12] was used for parentage analysis. The program is designed for large-scale parentage analysis using autosomal, co-dominant loci. Its modules were used to calculate the allele number, expected (He) and observed (Ho) heterozygosities, polymorphic information content (PIC), probability of exclusion (PE), Hardy-Weinberg chi-square statistics and null allele frequency at each locus. For each offspring tested, the parentage analysis module calculates a LOD score for each candidate parent, finds the two most likely parents and calculates the corresponding Delta score. The LOD score is the natural logarithm of the likelihood ratio, that is, the likelihood that the candidate parent is the true parent divided by the likelihood that the candidate parent is not the true parent. The Delta score (Δ) is the difference in LOD scores between the first and second most likely candidate parents. The final step is to evaluate the confidence of the Δ score against the appropriate criteria. Critical Δ values were obtained by simulation; when the real data were analysed, any most likely candidate parent whose Δ exceeded the critical Δ for 95% confidence was awarded parentage with 95% confidence.
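Two of the per-locus statistics named above, expected heterozygosity and PIC, follow directly from allele frequencies. The sketch below assumes the usual textbook definitions (expected heterozygosity as one minus the sum of squared allele frequencies, with no small-sample correction, and PIC as that quantity minus the pairwise term 2p²q² summed over allele pairs); CERVUS may apply corrections not shown here, and the function names are illustrative.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2); no small-sample correction is applied (assumption)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


# Hypothetical locus with eight equally frequent alleles (not data from this study).
example_freqs = [1.0 / 8] * 8
example_he = expected_heterozygosity(example_freqs)
example_pic = pic(example_freqs)
```

PIC is never larger than the expected heterozygosity for the same allele frequencies, which is why a PIC threshold is a slightly stricter requirement than the same threshold on He.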

Allele frequency and polymorphism
A total of 519 individuals from the three studied populations (Table 1) were genotyped for 28 microsatellite loci located on different chromosomes. The diversity statistics at these loci were high and similar among the populations, with a few exceptions. PIC was also high at all loci across the three flocks. These results were expected for microsatellite loci, which have shown high polymorphism in all species studied so far.

Parentage analysis
Correct genotyping and pedigree data are critical for any genetic investigation, particularly those that rely on pedigree information. It is well known that errors in pedigree records are quite common and that misidentification of genotypes is also possible [13]. Accordingly, CERVUS was used to infer parent-offspring assignments in the three sheep populations from the MST genotypic data. CERVUS is a paternity/maternity allocation program that uses likelihood ratios to assign statistical confidence of parentage to a given set of parents. It comprises three modules in which different simulated and real parameters are estimated in order to perform the parentage assignment. In the first module, CERVUS provided genetic diversity statistics for each of the studied loci, including the PE. PE is the average probability of excluding a single unrelated candidate parent from parentage of a given offspring at one or more loci, assuming that no typing errors occur; it is thus a good predictor of the probability of correct parentage assignment. The results showed that the total PE for the first parent of an arbitrary offspring, given only the genotype of the offspring at the twenty-eight MST loci, was around 0.9999. The total PE for the second parent, given the genotypes of both the offspring and the first parent, was 1.00 (Table 3). Such high probabilities were a good indication that parentage assignment using the twenty-eight MST markers was reliable.
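The two kinds of per-locus exclusion probability described above (first parent with only the offspring genotyped, and second parent with one parent already known) can be sketched from allele frequencies. The code below assumes the closed-form expressions in power sums of allele frequencies that are widely used in the parentage literature; whether the CERVUS version used in this study implements exactly these expressions is an assumption, and the function names are illustrative.

```python
def _power_sum(freqs, k):
    """Sum of allele frequencies raised to the k-th power."""
    return sum(p ** k for p in freqs)


def exclusion_first_parent(freqs):
    """Per-locus PE when only the offspring genotype is known."""
    s2, s3, s4 = (_power_sum(freqs, k) for k in (2, 3, 4))
    return 1.0 - 4.0 * s2 + 2.0 * s2 ** 2 + 4.0 * s3 - 3.0 * s4


def exclusion_second_parent(freqs):
    """Per-locus PE when the offspring and one parent are genotyped."""
    s2, s3, s4, s5 = (_power_sum(freqs, k) for k in (2, 3, 4, 5))
    return (1.0 - 2.0 * s2 + s3 + 2.0 * s4 - 3.0 * s5
            - 2.0 * s2 ** 2 + 3.0 * s2 * s3)


# Hypothetical locus with seven equally frequent alleles (not data from this study).
freqs_7 = [1.0 / 7] * 7
pe_first = exclusion_first_parent(freqs_7)
pe_second = exclusion_second_parent(freqs_7)
```

Knowing one parent always helps: for the same locus the second-parent value is at least as large as the first-parent value, which matches the ordering of the two totals reported in Table 3.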
Simulation, the second module of the program, simulated parentage analysis for different values of the parameters, based on the allele frequency estimates. Assuming that neither parent of an offspring was known, the simulation module estimated critical values of Δ, the statistic used to assess the reliability of parentage assignment. The simulated parameters (Table 4) were then used in the third module, parentage assignment analysis, to evaluate the confidence of assignments made with the real data.
The parentage assignment module calculated the difference in LOD score between the first and second most likely parents (ram or ewe) and compared it with the critical Δ. An example of parentage assignment, adapted from the LR parentage test analysis, is given in Table 5, in which four offspring were matched with their candidate parents when one parent was known. The candidate parent with the highest LOD score is considered the putative parent.
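The assignment rule described above (rank candidates by LOD, take the gap between the top two as Δ, and award parentage when Δ exceeds the critical value) can be sketched as follows. This is a simplified illustration, not the CERVUS implementation: the candidate names, LOD values and critical Δ are hypothetical, and treating a lone candidate's Δ as its own LOD score is a simplifying assumption.

```python
def assign_parent(lod_by_candidate, critical_delta):
    """Return (best_candidate, delta, assigned) for one offspring.

    lod_by_candidate: dict mapping candidate ID to LOD score.
    critical_delta:   threshold from simulation (e.g. for 95% confidence).
    """
    ranked = sorted(lod_by_candidate.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_lod = ranked[0]
    # With a single candidate there is no runner-up; use 0.0 (assumption).
    second_lod = ranked[1][1] if len(ranked) > 1 else 0.0
    delta = best_lod - second_lod
    return best_id, delta, delta > critical_delta


# Hypothetical LOD scores for three candidate rams of one lamb.
candidates = {"ram_A": 5.2, "ram_B": 1.1, "ram_C": -3.0}
best, delta, assigned = assign_parent(candidates, critical_delta=2.0)
```

Using the gap between the top two candidates rather than the top LOD score alone guards against cases where two closely related candidates both fit the offspring genotype well.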

Success rate of parentage analysis
CERVUS estimated the success rate of the assignment at confidence levels of both 95% and 80%. At the beginning of the analysis, genotyping errors were identified. In only fourteen of the 13,235 genotypes were discrepancies found, at one locus, and these were most likely caused by genotype scoring errors or mutation events. These very rare discrepancies (<1%) did not significantly affect the confidence of the pedigree reconstruction, and the fourteen genotypes were not used in further analysis. When these cases were checked by repeating the genotyping of alleles that had not been assigned to either parent, two mutations were found in each of LR and HR and none in CR.
Excluding the fourteen cases of unresolved genotypes and mutations, the success rate of the progeny test was as shown in Table 3. The success rate of the assignment was 100% for the three populations at both the strict and relaxed confidence levels (Table 6). These results were obtained on the assumption that there were no genotyping errors (error rate = 0), and they are equivalent to the simulation results.

The probability of exclusion
The PE increased sharply as extra markers were genotyped; Fig. 1 shows the relationship between PE and the number of loci. As an example, five markers with an average of seven alleles each gave a PE close to 98.30%, while five markers with an average of ten alleles each gave a PE close to 99.60%. Six markers with an average of seven alleles gave a PE close to 98.80%, whereas six markers with an average of ten alleles gave a PE close to 99.82%. The major finding is that the increment in PE becomes insignificant once more than eight MST markers are used, at which point PE is close to 99.60%. These eight markers had an average PIC of around 0.65.
It is important to emphasize that the twenty-eight genotyped loci, with an average of more than eight alleles per locus, raised the PE in the studied populations to the maximum possible value of 1. This means that CERVUS was able to identify all rams, ewes and their offspring with very high precision, and any other allocation of parents or offspring is extremely unlikely.
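The diminishing return from adding markers follows from how per-locus exclusion probabilities combine. The sketch below assumes that loci are independent (unlinked), so that the combined PE is one minus the product of the per-locus non-exclusion probabilities; the per-locus value of 0.5 is hypothetical and is not intended to reproduce the figures quoted above or Fig. 1.

```python
from math import prod


def combined_exclusion(per_locus_pe):
    """Combined PE over independent loci: 1 - product of (1 - PE_l)."""
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)


def exclusion_curve(per_locus_pe):
    """Combined PE after the first 1, 2, ..., n loci, in the order given."""
    return [combined_exclusion(per_locus_pe[:n])
            for n in range(1, len(per_locus_pe) + 1)]


# Hypothetical panel: eight loci, each with a per-locus PE of 0.5.
curve = exclusion_curve([0.5] * 8)
# Gain in combined PE contributed by each additional locus.
gains = [b - a for a, b in zip(curve, curve[1:])]
```

Because each extra locus removes a fixed fraction of the remaining non-excluded probability, the absolute gain shrinks geometrically, so a small panel of highly informative loci captures most of the attainable exclusion power.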

Pedigree recording errors
The DNA-based parentage tests revealed that the error rate in pedigree records was significant: 7.79 and 8.70% for ewes and 5.19 and 5.43% for rams in the LR and HR populations, respectively. The error rate was 4.04% for rams in the CR population (Table 7). The results of parentage testing, rather than the pedigree records, were used for further analysis in the three populations.
Regardless of the management practices employed to record pedigrees, all populations included lambs with incorrect pedigree information for both ewe and ram wherever pedigree records were available. In addition, ewe pedigree errors were more common than ram pedigree errors in the experimental populations. Of all lambs in both populations, 7.69% (Table 7) had incorrect ewe pedigrees. The percentage of errors in the ewe pedigree was slightly higher in HR than in LR. Of all lambs genotyped, 4.85% had an incorrect ram pedigree (Table 7).

Discussion
Accuracy of pedigree and genotyping data is critical for any type of genetic investigation, in particular those that are pedigree related. For example, linkage disequilibrium analysis and segregation distortion analysis are heavily based on pedigree information, and the more accurate the pedigree information and genotypes, the more reliable the results. Only the 100% accurate DNA-based pedigree is used to reconstruct maternal and paternal haplotypes as well as to determine maternal and paternal alleles.

Table 3 Total exclusion probability using twenty-eight microsatellite genotypes.

Table 4 Parentage parameters used in CERVUS parentage analysis, following allele frequency estimations and simulations.
Parameter                                           Value
Percentage of candidate parents typed               100%
Percentage of loci typed                            100%
Rate of mismatching error used                      0, 1 and 10%
Number of tests performed                           10,000
Strict confidence level of parentage assignment     95%
Relaxed confidence level of parentage assignment    80%

The error rate in ewe pedigree recording (7.69%) was higher than that in ram pedigree recording (4.85%) in both selected populations. This might be due to ewes failing to keep their litter together, lamb desertion, lamb separation and lamb stealing [14]. The error rate in ram pedigrees was higher in the HR population than in the LR and CR populations. This is mainly due to incorrect sire information, recorded on the basis of which ewes were seen in the paddock with the putative ram. These results are close to those of Crawford et al. [1], who found pedigree errors ranging from 0.31 to 5% in five sheep populations. Moreover, Barnett et al. [3] found that the overall proportion of Australian Merino lambs with incorrect ewe pedigrees was about 10%, whereas it was 3.9% for ram pedigrees. Significant pedigree record errors therefore seem to be a common problem in sheep populations. The consequent lack of accurate pedigree information will reduce the genetic progress of the populations whenever these records are used.
The only reliable solution to this problem is to provide accurate pedigree information using DNA-based parentage analysis. The results showed that successful parentage assignment for offspring using the twenty-eight MST loci, given the genotypes of the offspring and the parents, was 100% (Table 7). However, the minimum number of MST markers required for parentage assignment is approximately eight, with an average of eight to nine alleles per marker (average PIC = 0.650), which gives a PE close to 99.60%. Thus eight such markers are sufficient for successful parentage analysis. A similar finding was reported for Australian Merino sheep by Barnett et al. [3]. However, MST markers with high PIC are more useful within a panel for pedigree analysis in Merino sheep than others, mainly because Merino sheep populations are not homogeneous in terms of the PIC of MST markers. A panel of fewer MST markers with high PIC would therefore give the same result as a panel of more markers with low PIC. This reduces the cost of MST genotyping and thus the cost of DNA-based parentage assignment. Similar validation and power of MST panels have been reported for bison and cattle [15], domestic horses [16], thoroughbred horses [17], goats [18,19], dairy breeds [20], beef cattle [21] and sheep [22,23].
The 100% accurate pedigree information obtained from DNA-based parentage analysis is recommended in Merino sheep breeding programs for better estimates of EBVs and increased genetic progress. Many sheep breeders have indicated the importance of parentage testing and the effect of misidentification on the estimation of breeding values [22,24,25,26]. Nevertheless, there is some debate over whether DNA-based pedigree is cost-effective in sheep. In this study such an analysis was not performed because it was not one of the project targets. However, it has been reported that DNA fingerprinting using MST markers could be cost-effective for Australian Merino breeders if full pedigrees were used in estimating EBVs by the BLUP procedure [3]. In addition, reviews of the recent common use of MST and SNP marker panels show that they are affordable for researchers and farmers and have future breeding perspectives [24,25,27,28,29]. However, in terms of genetic information, SNPs are biallelic markers and can be considered a step backwards. The promising advantage of SNPs is their greater abundance in the genome. On the other hand, 2-3 SNPs per MST marker are needed to obtain equivalent cumulative exclusion power. In general, 24 SNPs were equivalent to the 12 MST markers recommended for cattle paternity testing by the International Society for Animal Genetics [28]. A typical microsatellite parentage test costs USD 25-35 per sheep. A 100-marker SNP test, on the other hand, can be purchased for USD 15-20. Recently, for many reasons, SNPs have become the focus of efforts to improve sheep parentage testing [29]. However, SNP genotyping technologies are not available and affordable in every country or for every farmer.

Conclusion
Using MST markers was the only reliable solution for providing accurate pedigree information and resolving the common problem of significant error in pedigree records in Merino sheep populations. A panel of eight MST markers with an average PIC of 0.65 or more would be sufficient for accurate and successful DNA-based parentage analysis. In this work, twenty-eight MST loci were used, and they were clearly sufficient to provide 100% accurate pedigree and genotyping data. These data were used to study population genetic parameters such as recombination, haplotyping, linkage disequilibrium (LD) and segregation distortion (SD), which are heavily based on pedigree information.
Nevertheless, there is some debate over whether DNA-based pedigree is cost-effective in Merino sheep. It has been reported that DNA fingerprinting using MST markers could be cost-effective for Australian Merino breeders when full pedigrees are used in the estimation of EBVs by the Best Linear Unbiased Prediction (BLUP) procedure [3].