Use of meta-analysis to combine candidate gene association studies: application to study the relationship between the ESR PvuII polymorphism and sow litter size

This article investigates the application of meta-analysis to livestock candidate gene effects. The PvuII polymorphism of the ESR gene is used as an example. The association of the ESR PvuII alleles with the number of piglets born alive and total number born in the first parity (NBA1, TNB1) and in later parities (NBA, TNB) is reviewed by conducting a meta-analysis of 15 published studies including 9329 sows. Under a fixed effects model, litter size values were significantly lower in the "AA" genotype groups than in the "AB" heterozygotes and "BB" homozygotes. Under the random effects model, the results were similar, although the differences between the "AA" and "AB" genotype groups were not clearly significant for NBA and TNB. Nevertheless, the most noticeable result was the high and significant heterogeneity estimated among studies. This heterogeneity could be attributed to sampling error, genotype by environment interaction, linkage or epistasis, as suggested in the literature, but also to population admixture/stratification. It is concluded that meta-analysis is a helpful analytical tool to synthesise and discuss livestock candidate gene effects. The main difficulty found was the insufficient information on the standard errors of the estimated genotype effects in several publications. Consequently, publishing the standard errors or the exact P-values instead of the test significance level is recommended to guarantee the quality of candidate gene effect meta-analyses.


INTRODUCTION
Meta-analysis can be defined as the application of statistical procedures to collections of empirical findings from individual studies for the purpose of integrating, synthesising, and making sense of them [60]. Although it is not a usual analytical tool in livestock research, it is in medical research, perhaps because of the widely recognised importance of being systematic when reviewing the evidence available on the benefits and risks of medical decisions [54]. Only recently has meta-analysis been introduced to study genes of interest in livestock [15,19,23], although not in the context of candidate gene studies.
The candidate gene approach presents several advantages over genome scanning and can provide strong knowledge on candidate genes, but it also has limitations. Its main weakness is that an association does not necessarily demonstrate that the gene is causally related to the phenotype. An association could also arise because (1) the gene is not itself causal but is close enough to a causal locus to be in linkage disequilibrium with it, or (2) the association is due to confounding by population stratification or admixture [8]. Furthermore, most studies test multiple phenotypes and do not adjust their significance thresholds for multiple comparisons. Consequently, the possible involvement of the candidate gene in the regulation of a trait needs to be validated by other studies [18]. This need for validation makes the application of meta-analysis in candidate gene reviews especially interesting.
The estrogen receptor gene is an interesting example. The PvuII polymorphism of the porcine estrogen receptor gene (ESR) [43] was designated in 1994 as a major gene for litter size. One ESR allele with a 3.7-kb fragment (called the B allele) was reported to be significantly associated with higher total number born (TNB) and number born alive (NBA) in a 50% Meishan synthetic line [44]. Since then, many studies have been published, mainly aimed at establishing whether the association with litter size exists in other populations, but also addressing the role of the ESR gene in other reproductive traits [22,55,56] and in growth and carcass traits [28,37].
Reviews of the relationship between the PvuII ESR polymorphism and litter size tend to consider that a significant association has been clearly demonstrated [2], but several of the most recently published studies, such as those of Drogemuller et al. [13], Gibson et al. [14], Kmiec et al. [25] and Noguera et al. [33], did not find any significant association, and different reasons, mainly of a genetic nature, have been proposed to explain these apparently conflicting results. Beyond the scientific reasons, the gene is also interesting because of its commercial implications: the intellectual property of the PvuII ESR gene polymorphism is protected by patent applications [41,46] and is exclusively licensed to one pig breeding company [40]. This fact, together with the fact that a causative mutation has not been described to date, should also be considered in order to understand the commercial controversy that ESR gene results have attracted [42].
The aim of this paper was to investigate the value of applying meta-analysis to reviews of livestock candidate gene effects. The relationship between the PvuII polymorphism of the ESR gene and pig litter size is used as an example because of the large number of studies published and the controversies arising from the apparently conflicting results obtained.

MATERIALS AND METHODS
The meta-analysis followed the structure of the meta-analysis of Mann and Ralston [31]. Studies in which the PvuII ESR polymorphism has been related to litter size in pigs were identified by electronic searches of Science Citation Index Expanded, CAB Abstracts and Current Contents Connect in February 2004 using several combinations of search terms including "estrogen", "oestrogen", "receptor", "gene", "locus", "polymorphism", "pig" and "swine". The references of the retrieved articles were then also screened.
Four litter size traits were considered in the meta-analysis: number born alive and total number born in the first parity (NBA1, TNB1) and in later or all the parities (NBA, TNB). The criteria for excluding the identified studies and/or populations were the following: (1) mean genotype values or differences for NBA1, TNB1, NBA or TNB were not estimated (i.e., only allele frequencies were reported) [6,7,58] or not reported [22,39]; (2) insufficient information was provided in proceedings or English abstracts [24,36,50,51]; (3) populations were monomorphic [12,13,30,53] or were analysed in other studies using a different number of data or methodology [34,44,45,48] (studies with more information or published in peer-reviewed journals were chosen).
Data were analysed using the Revman 4.2.3 software package available from The Cochrane Collaboration (http://www.cochrane.org). Two comparisons were made: "AB" heterozygotes versus "AA" homozygotes, and "BB" homozygotes versus "AA" homozygotes. An inverse variance method and fixed and random effects models were employed [1]. Differences between genotype groups were estimated from published values of genotype or allele effects. The standard errors of the differences were computed from the available information: (1) the standard errors of genotype effect estimates [11,14,21,30,32,33,57]; (2) the standard deviation and the number of records [25] or, when not available, the number of animals [5] of each genotype class; (3) the standard deviation estimated at the population level and the number of records [29]; (4) the P-values of t-tests and the number of records of each genotype class [13]; (5) the level of significance of t-tests and the number of records of each genotype class [26,47,49,52]. When the number of records of each genotype class was not available, it was deduced from the estimated allele frequencies assuming Hardy-Weinberg equilibrium.
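The ways of recovering a standard error listed above can be illustrated with a minimal Python sketch. It is not the procedure implemented in Revman or necessarily the exact computation used in this work: the function names are hypothetical, the two genotype estimates are assumed to be independent, and the t distribution is replaced by a normal approximation so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def se_from_genotype_ses(se_a, se_b):
    """Case (1): SE of a difference from the SEs of two genotype estimates.

    Assumption: the two estimates are independent (covariance ignored).
    """
    return math.sqrt(se_a ** 2 + se_b ** 2)


def se_from_class_sds(sd_a, n_a, sd_b, n_b):
    """Case (2): SE of a difference from per-class SDs and record counts."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def se_from_population_sd(sd, n_a, n_b):
    """Case (3): SE of a difference from one population-level SD."""
    return sd * math.sqrt(1.0 / n_a + 1.0 / n_b)


def se_from_p_value(diff, p_two_sided):
    """Cases (4) and (5): SE implied by a two-sided P-value.

    Assumption: normal approximation to the t-test. When only a
    significance level is known, passing that level gives a bound on
    the SE rather than the SE itself.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(diff) / z


def hwe_counts(n_total, freq_b):
    """Expected genotype counts under Hardy-Weinberg equilibrium."""
    freq_a = 1.0 - freq_b
    return {
        "AA": n_total * freq_a ** 2,
        "AB": 2.0 * n_total * freq_a * freq_b,
        "BB": n_total * freq_b ** 2,
    }
```

For a study that only reports "significant at the 5% level", `se_from_p_value(diff, 0.05)` returns the largest standard error compatible with that statement; for a non-significant result the same value is the smallest compatible standard error.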
Funnel plots were drawn to look for evidence of publication bias. A funnel plot is a graphical approach to research synthesis that helps to detect selective reporting, that is, the preferential publication of results that are statistically significant or consistent with expectation, which may introduce a profound bias into meta-analysis results [35]. Funnel plots were drawn as plots of the standard error of the effect estimate versus the effect estimate for each study.

RESULTS
In total, twenty-three of the studies identified were not considered and fifteen were included in meta-analysis. The details of the studies included are summarised in Table I.

Number born alive
Ten and eleven studies analysing sixteen and seventeen populations with 8010 and 8012 animals were considered for NBA1 and NBA, respectively. The results are shown in Figure 1. The values for NBA1 were significantly lower (P ≤ 0.005) in the "AA" (number of animals: n = 2638) than in the "AB" (n = 3512) genotype group. The difference was 0.38 under the fixed effects model and 0.41 under the random effects model. The NBA difference estimated between the "AB" (n = 3570) and "AA" (n = 2555) groups was not significant under the random effects model (P = 0.06), although it was under the fixed effects model (0.20 piglets, P = 0.0002). The differences between the "BB" (n = 1860) and "AA" (n = 2638) groups were significant (P ≤ 0.0009) for NBA1: 1.11 and 1.30 under the fixed and random effects models, respectively. Similarly, the NBA differences of 0.22 and 0.61 between the "BB" (n = 1887) and "AA" (n = 2555) genotypes, for the fixed and random effects models respectively, were also significantly different from zero (P ≤ 0.00001).

Total number born
For TNB1 and TNB, ten and twelve studies analysing sixteen and eighteen populations with 7755 and 7932 animals, respectively, were considered.
[Table I footnotes (table not reproduced): 2 Follicular-stimulating hormone beta subunit genotype. 3 All parities excluding first parities. 4 All parities including first parities. 5 Second parities. 6 First and second parities.]
The results are shown in Figure 2. Higher TNB1 values were obtained in carriers of the "B" allele (n = 3460) when compared with "AA" homozygotes (n = 2436). The difference was 0.38 under both models, and the test for overall effect was significant (P ≤ 0.01). For TNB the difference was 0.24 (P < 0.001) under the fixed effects model but not significant under the random effects model [0.22 (P = 0.07)] (n "AB" = 3578; n "AA" = 2467). The fixed effects model assumes that there is a single true effect and that all the studies come from a population of studies measuring this effect. The random effects model assumes that the true effect varies among studies following a normal distribution, and it is more conservative because its confidence intervals are wider. The random effects model gives less weight to larger studies than the fixed effects model does, and when the results of the two models differ substantially, they cannot be considered robust to the assumptions made in the analysis. Nevertheless, although the P-values of the two models differed, the difference in the estimated effect on TNB was quite small, so a priori nothing indicated that combining the studies was inappropriate [1]. The TNB1 difference for the "BB" (n = 1859) versus "AA" (n = 2436) comparison was 1.08 (P < 0.001) under the fixed effects model and 1.21 (P = 0.003) under the random effects model. For TNB these differences were 0.36 (P < 0.001) and 0.66 (P = 0.002) (n "BB" = 1887; n "AA" = 2467).
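The contrast between the two pooling models can be made concrete with a short Python sketch of inverse variance pooling. It is only an illustration under stated assumptions: the between-study variance is estimated with the DerSimonian-Laird moment estimator, which is a common choice but is not named in the text, the function names are hypothetical, and the three example studies are invented numbers rather than data from the studies reviewed.

```python
import math
from statistics import NormalDist


def pool_fixed(effects, ses):
    """Inverse variance pooled estimate and its SE (fixed effects)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def tau_squared(effects, ses):
    """Between-study variance (DerSimonian-Laird estimator, assumed)."""
    df = len(effects) - 1
    if df < 1:
        return 0.0  # a single study carries no information on tau^2
    weights = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool_fixed(effects, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def pool_random(effects, ses):
    """Random effects pooling: tau^2 is added to every study variance."""
    tau2 = tau_squared(effects, ses)
    widened = [math.sqrt(se ** 2 + tau2) for se in ses]
    return pool_fixed(effects, widened)


def two_sided_p(estimate, se):
    """Two-sided P-value of the z-test for the overall effect."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(estimate / se)))


# Hypothetical differences (piglets) and standard errors of three studies.
example_effects = [0.10, 0.50, 0.90]
example_ses = [0.10, 0.20, 0.40]
for name, pool in (("fixed", pool_fixed), ("random", pool_random)):
    est, se = pool(example_effects, example_ses)
    print(f"{name}: estimate = {est:.3f}, SE = {se:.3f}, "
          f"P = {two_sided_p(est, se):.4f}")
```

Because the same between-study variance is added to every study, the weights become more similar across studies, which is why large studies count relatively less and the confidence interval widens under the random effects model.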

Heterogeneity and inconsistency among results
The Chi-square test for heterogeneity was significant for all the traits and contrasts (P ≤ 0.03). The measure of inconsistency among the results, the I² statistic, ranged from 43.8 to 80.2% (Figs. 1 and 2). The I² statistic should be interpreted as the proportion of total variation that is due to heterogeneity rather than sampling error [1]. The results show that there is more variation among the results of the studies than expected by chance. They suggest that the studies may be too different to be combined, at least without trying to work out why they produced such different results from one another.
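As an illustration of how the inconsistency measure relates to the heterogeneity test, the following Python sketch computes Cochran's Q and I² = 100 × (Q − df) / Q, truncated at zero. The function names and the example values are hypothetical, and the Chi-square P-value is deliberately left out because it would require a distribution function that the standard library does not provide.

```python
def cochran_q(effects, ses):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, df):
    """I^2 in percent: share of total variation beyond sampling error."""
    if q <= 0.0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical differences and standard errors of three studies.
effects = [0.10, 0.50, 0.90]
ses = [0.10, 0.20, 0.40]
q = cochran_q(effects, ses)
print(f"Q = {q:.2f} on {len(effects) - 1} df, I2 = {i_squared(q, len(effects) - 1):.1f}%")
```

Under homogeneity Q is expected to be close to its degrees of freedom, so I² stays near zero; values such as those reported above indicate that a large part of the observed variation is not explained by sampling error.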

Funnel plot analysis
Funnel plots were symmetrical for the "AB" versus "AA" genotype group comparisons, providing no clear evidence of selective publication of positive studies on these traits (Fig. 3). When "BB" versus "AA" genotype groups were compared, funnel plots also showed symmetry for first parities. For all parities, more studies reported positive than negative differences in favour of the "BB" genotype, and some bias towards publishing positive B allele effects could be suspected, although we think that the evidence is not clear. The effect of selective reporting on meta-analysis can be addressed in several ways, but funnel graphs allow readers to judge for themselves how well behaved the data are [35].

[Figure 1 (caption, beginning not recovered): ... (2) all the parities (NBA): (a) "AB" heterozygous versus "AA" homozygous, (b) "BB" homozygous versus "AA" homozygous. The confidence interval (CI) for each study is represented by a horizontal line and the point estimate is represented by a square. Weight (%) and overall effect (95% confidence interval) and its tests are shown for fixed and random effects models.]

DISCUSSION
The present meta-analysis has several limitations. The first drawback concerns the diversity among studies in traits (e.g. all or later parities), experimental design, models of analysis of genotype effects, etc. Nevertheless, the main problem is the varying quality of the standard error estimates of genotype effects. Standard errors that were not originally reported and had to be computed in this work to make the meta-analysis possible are over- or underestimated. For example, for Korwin-Kossakowska et al. [26], Rothschild et al. [47], Short et al. [49] and Southwood et al. [52], standard errors were overestimated when the differences were significant, since they represent the maximum standard error compatible with the level of significance reported. Conversely, when differences were not significant, standard errors were underestimated, since they represent the minimum standard error compatible with the level of significance reported.
This problem is especially important because the inverse variance method was used in the present work. However, it could not be overcome given the information available in the published studies. If the study report does not provide the information needed to perform the meta-analysis, the reviewer is forced (1) to exclude the study and risk introducing bias, (2) to impute missing data and risk making a different type of error, or (3) to use a narrative approach to synthesis [1]. Here, the option of imputing approximate, or threshold, values for the missing data was chosen. A first modest conclusion is that referees and journal editors should require the publication of standard errors or exact P-values instead of the significance level of the tests performed. In general, it is particularly unhelpful to state that significant differences between means are achieved at the 5% level without stating (a) the mean values and the sample size, (b) the standard error and (c) the name of the test chosen [38].

[Figure 2 (caption, beginning not recovered; first panel title "(1a) TNB1: AB vs. AA"): ... (2) all the parities (TNB): (a) "AB" heterozygous versus "AA" homozygous, (b) "BB" homozygous versus "AA" homozygous. The confidence interval (CI) for each study is represented by a horizontal line and the point estimate is represented by a square. Weight (%) and overall effect (95% confidence interval) and its tests are shown for fixed and random effects models.]

Despite this serious limitation, the present meta-analysis can help to disentangle the relationship between the ESR PvuII polymorphism and sow litter size. The results show that the difference between B carriers and homozygous non-carriers was around 0.2 in all parities and 0.4 in first parities, for both traits, piglets born alive and total born. The difference between the two homozygotes was close to 0.5 in all parities and close to 1.2 in first parities. It is common practice to infer the mode of gene action (dominant, additive or recessive) at candidate genes when significant effects are found. Here, the ESR PvuII polymorphism did not show a consistent mode of gene action across traits. Under the fixed effects model, additive and dominance effect estimates were, respectively, 0.56 and -0.18 for NBA1, 0.54 and -0.16 for TNB1, 0.11 and 0.09 for NBA, and 0.18 and 0.06 for TNB.
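The additive and dominance estimates can be recovered from the two pooled contrasts. The sketch below assumes the usual definitions, which the text does not spell out: the additive effect is half the difference between the two homozygotes, and the dominance effect is the deviation of the heterozygote from the homozygote midpoint. The function name is hypothetical; the input values are the fixed effects differences reported in the Results.

```python
def additive_and_dominance(diff_ab_aa, diff_bb_aa):
    """Additive (a) and dominance (d) effects from contrasts against AA.

    a = (BB - AA) / 2
    d = AB - (AA + BB) / 2 = (AB - AA) - a
    """
    additive = diff_bb_aa / 2.0
    dominance = diff_ab_aa - additive
    return additive, dominance


# Fixed effects differences ("AB" - "AA", "BB" - "AA") reported in the Results.
fixed_effects_differences = {
    "NBA1": (0.38, 1.11),
    "TNB1": (0.38, 1.08),
    "NBA": (0.20, 0.22),
    "TNB": (0.24, 0.36),
}

for trait, (ab_aa, bb_aa) in fixed_effects_differences.items():
    a, d = additive_and_dominance(ab_aa, bb_aa)
    print(f"{trait}: a = {a:.2f}, d = {d:.2f}")
```

With these definitions, a negative d in first parities means that the heterozygote lies below the homozygote midpoint, whereas in all parities d is positive and of the same order as a, which is the inconsistency in gene action noted above.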
The results also showed high statistical heterogeneity among studies. The origin of this heterogeneity should be clarified before drawing inferences from the results obtained. A common criticism of meta-analyses is that they combine 'apples with oranges' [60]. If there is considerable variation in the results, it may be misleading to quote an average value for the treatment effect [1]. The heterogeneity we observed was not free of standard error estimation problems, but conflicting results have been noted in almost all the studies reviewed. Several explanations have been proposed.
The first hypothesis, frequently discussed in the literature, is the sample size of the experiments carried out. This hypothesis could be considered for several of the experiments reviewed, but it can be rejected for the studies of Rothschild et al. [47], Short et al. [49] and Southwood et al. [52] given the number of pigs involved. Sampling error is therefore not the only cause of the diversity found among the results in the literature. While Rothschild et al. [47] and Short et al. [49] found differences for all traits, Southwood et al. [52] did not find a significant effect of the B allele on NBA and TNB. This lack of significance was mainly due to a second parity drop, which had its largest impact in the more productive genotypes [52]. Rothschild and Plastow [42] explain how this drop was related to the nutritional regime used and indicate that the results obtained are an interesting example of genotype by environment interaction. The results of another large study have been published recently [16,17], which is why they were not included in this meta-analysis, and they reinforce the idea that sample size is not the only source of heterogeneity among studies. In an analysis of data from approximately 1250 sows and 3600 litters, the B allele was found to be disadvantageous for prolificacy relative to the A allele [16].
The second hypothesis is that the ESR gene is not a major gene but a marker. The PvuII polymorphism could be linked with the causative mutation within the ESR gene or closely linked with unknown quantitative trait loci with an effect on litter size. It should be kept in mind that the PvuII site is located in an intron, which makes a difference in the expression or structure of ESR relatively unlikely, and ESR being a marker for litter size more likely [57]. Different linkage relationships may be the reason why estimates vary across populations, although it has been observed that the ESR effect can differ in its magnitude and direction not only across but also within populations [17]. The genome scanning approach has not shown evidence for QTL influencing litter size in the region harbouring the ESR locus on chromosome 1, e.g. [4,13,39,59], but it should be considered that these scans had low power to detect a QTL. The fact that a causative mutation has not been described and made public continues to be the main obstacle to reaching a conclusion on the hypothesis that the PvuII polymorphism is a marker of a gene affecting sow litter size [42].
The third hypothesis in the literature is that of background effects of other genes interacting with the ESR gene, i.e., epistasis. The effect of the PvuII polymorphism would depend on its frequency and on the frequency of alleles at other loci, and it could have a small effect in one population and explain a significant portion of the variance in another population [30]. However, little is known about the magnitude of epistatic variation in sow litter size, and consequently there is no evidence to support this speculative hypothesis.
In summary, the three main hypotheses discussed in the literature to explain the conflicting results of different studies do not seem to give a sufficient explanation for the heterogeneity found in the present work, and additional hypotheses could be considered. One of them is the effect of population admixture/stratification on association analyses. It is well known that admixture/stratification can have a severe impact on association analysis, e.g. [10]. This has been discussed in the analysis of other candidate genes, for example the Melanocortin 4-Receptor in pigs [20], but not in the context of the ESR gene. Only Noguera et al. [33] introduced the idea that population structure, with one boar contributing to a larger extent in one of the two lines analysed, could affect the results, although they rejected this possibility in the light of a further analysis performed excluding its offspring.
In the present meta-analysis, a considerable number of the studies reviewed were performed on mixed populations. For example, the largest population analysed [49] resulted from the mixture of four different lines. Three of the lines were of Large White origin and the fourth was a synthetic line, 3/4 Duroc and 1/4 Large White. The frequency of the B allele was similar in the three Large White lines (range 0.64 to 0.74) but was considerably lower in the 3/4 Duroc line (0.17). This disparity in frequencies could arise because each population has a unique history. It has been postulated that the B allele has a Chinese pig origin and that its presence in Western selected lines may be the result of interbreeding of Chinese and English pigs and of later crosses of the resulting crossed populations with other breeds and populations, e.g. [49]. Under this hypothesis, discrepancies in allele frequencies between populations will be widespread throughout the genome. Consequently, the assumption of no confounding effects could be violated. Indeed, nearly all outbred populations are confounded by genetic admixture at some level; the challenge is not merely to show that it exists, but to avoid the possibility of making erroneous conclusions because of it [3].
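How such a difference in allele frequency can generate an association in a mixed population, even when the genotype has no effect within any line, can be shown with a small deterministic Python sketch. Everything except the 0.17 allele frequency is an assumption made for illustration: the second frequency (0.70) is simply a value inside the reported 0.64 to 0.74 range, the two lines are given equal size, the line means of 11.0 and 10.0 piglets are invented, and genotypes are assumed to be in Hardy-Weinberg proportions within each line.

```python
def genotype_shares(freq_b):
    """Hardy-Weinberg genotype proportions for a given B allele frequency."""
    freq_a = 1.0 - freq_b
    return {"AA": freq_a ** 2, "AB": 2.0 * freq_a * freq_b, "BB": freq_b ** 2}


def pooled_genotype_means(lines):
    """Expected genotype means when lines are analysed as one population.

    lines: list of (relative size, B allele frequency, mean litter size).
    Within a line every genotype has the same mean (no genotype effect).
    """
    means = {}
    for genotype in ("AA", "AB", "BB"):
        weight = 0.0
        weighted_sum = 0.0
        for size, freq_b, line_mean in lines:
            share = size * genotype_shares(freq_b)[genotype]
            weight += share
            weighted_sum += share * line_mean
        means[genotype] = weighted_sum / weight
    return means


# Hypothetical mixture: a high-frequency line with a higher mean litter
# size and a low-frequency line (0.17, as reported) with a lower mean.
mixed_lines = [(0.5, 0.70, 11.0), (0.5, 0.17, 10.0)]
pooled = pooled_genotype_means(mixed_lines)
spurious_bb_vs_aa = pooled["BB"] - pooled["AA"]
print(f"Apparent BB - AA difference: {spurious_bb_vs_aa:.2f} piglets")
```

In this sketch the apparent advantage of the "BB" genotype comes entirely from the fact that most "BB" sows belong to the more prolific line; if the less prolific line carried the higher B frequency, the same mechanism would produce an apparent disadvantage, or mask a real effect.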
All fifteen studies included in the present meta-analysis tested the difference in prolificacy between genotypes using methods based on ANOVA models (mixed models, animal model, etc.). It is known that this kind of test is prone to produce spurious results due to the confounding effects of population admixture/stratification [10]. Unless samples are drawn from populations known to be genetically homogeneous, tests other than those based on ANOVA are recommended in order to avoid misleading results [20]. In a meta-analysis context it should be noted that population admixture/stratification can give rise to false positive but also to false negative effects [9]. This is important for a correct interpretation of the results because usually only positive results are regarded as being potentially confounded by population admixture/stratification. Lander and Schork [27] indicated that a first step to prevent spurious associations arising from admixture/stratification is to perform the association studies within relatively homogeneous populations. If an association can only be found in large mixed populations but not in homogeneous groups, one should suspect admixture/stratification and conclude that no evidence of genetic association exists. Therefore, an additional meta-analysis was performed for the "AB" versus "AA" genotype groups in which results from mixed populations [5,11,21,29,47,49,52] were excluded. Eight studies involving ten populations and a total of 1722 sows were considered. The contrast between the "BB" and "AA" groups was not performed because of the small number of contrasts in non-mixed populations. The test for heterogeneity is not very sensitive at detecting excess variation if there are few studies [1].
In this additional meta-analysis the test for heterogeneity was not significant for NBA1 and TNB1 (P = 0.52 and 0.69 respectively), and for NBA and TNB the heterogeneity was less significant than when all the studies were considered (P = 0.02 versus P ≤ 0.001). However, in contrast with the previous results, it did not show any significant difference between the "AB" and "AA" genotype groups (NBA1: P_f = P_r = 0.80; NBA: P_f = 0.70, P_r = 0.68; TNB1: P_f = P_r = 0.42; TNB: P_f = 0.27, P_r = 0.51; P_f and P_r being the P-values for the fixed and random effects models, respectively). In summary, when studies that analysed mixed populations were excluded, heterogeneity was reduced and no significant association was found. As a result, admixture/stratification could be suspected of affecting the positive results. However, a solid conclusion cannot be drawn from this second meta-analysis because the studies with the largest sample sizes were excluded, since they analysed mixed populations.
In conclusion, meta-analysis proved to be a helpful analytical tool to synthesise and discuss livestock candidate gene effects. The meta-analysis performed shows that there is a large heterogeneity in the results on the association of the ESR PvuII polymorphism with sow prolificacy. Although it is not usually discussed in the livestock framework, population admixture/stratification can be a source of heterogeneity in candidate gene analyses, and it could be worthwhile to analyse populations with different genetic backgrounds separately or to apply methodologies robust to the effect of population structure, e.g. [20]. Finally, the publication of standard errors or exact P-values instead of the test significance level is recommended in order to guarantee the quality of candidate gene effect meta-analyses.