A novel meta‐analytical approach to improve systematic review of rates and patterns of microevolution

Abstract A classic topic in ecology and evolution, phenotypic microevolution of quantitative traits has received renewed attention in the face of rapid global environmental change. However, for plants, synthesis has been hampered by the limited use of standard metrics, which makes it difficult to systematize empirical information. Here we demonstrate the advantages of incorporating meta‐analysis tools to the review of microevolutionary rates. We perform a systematic survey of the plant literature on microevolution of quantitative traits over known periods of time, based on the scopus database. We quantify the amount of change by standard mean difference and develop a set of effect sizes to analyze such data. We show that applying meta‐analysis tools to a systematic literature review allows the extraction of a much larger volume of information than directly calculating microevolutionary rates. We also propose derived meta‐analysis effect sizes (h, LG and LR) which are appropriate for the study of evolutionary patterns, the first being similar to haldanes, the second and third allowing the application of a preexisting analytical framework for the inference of evolutionary mechanisms. This novel methodological development is applicable to the study of microevolution in any taxa. To pilot test it, we built an open‐access database of 1,711 microevolutionary rates of 152 angiosperm species from 128 studies documenting population changes in quantitative traits following an environmental novelty with a known elapsed time (<260 years). The performance of the metrics proposed (h, LG and LR) is similar to that of preexisting ones, and at the same time they bring the advantages of lower estimation bias and higher number of usable observations typical of meta‐analysis.

. Moreover, because different traits from different species are analyze together, haldanes is a more appropriate measure because it expresses all changes in a scale-independent common unit, the standard deviation of the variables (SD). Gingerich (1983) also developed a graphical and statistical framework of analysis (LRI) which allows inference of evolutionary processes from the temporal pattern of evolutionary rates. Briefly, the LRIframework is the linear regression between the log 10 of the absolute values of all rates of change in a temporal series (or the rates of change of a random population of independent, but identically distributed trajectories) and the log 10 of the elapsed time between samples used to compute the rates of change. This analysis provides information about the "intrinsic" rate of change (i.e., 10 exp {the intercept of the regres-sion}) and the predominant mode of evolution (directional change, random change, or stasis). The slope of the linear regression is expected to be zero if change is predominantly directional, −0.5 if change is not different from an unbiased random walk, and −1 if stasis predominates (Gingerich, 1993(Gingerich, , 2009).
The LRI-framework has received major criticism, focused on its accuracy in estimating "intrinsic" rate of change (Hunt, 2012) and also on the interpretation of the slope (Sheets & Mitchell, 2001). Hunt (2006Hunt ( , 2012 shows that LRI estimates accurately the "intrinsic" rate of change only in case of directional change. Hunt (2006Hunt ( , 2012 provides some tools that complement the inference about evolutionary mode made by LRI, which are applicable to our present work. According to Hunt (2012), rate of evolution, like haldanes, is an estimator of the parameter μ s (mean of the generalized random walk, which models directional change); haldane's numerator is a divergence measure, with similar properties as ω (variance of his model of stasis).
In the Hunt's approach, it is expected that in a situation of directional change, log 10 (μ s ) has zero slope with log 10 (time), whereas log 10 (ω) has positive slope; in a situation of unbiased random walk, it is expected that log 10 (μ s ) has negative slope with log 10 (time) and log 10 (ω) has positive slope; in a stasis situation, it is expected that log 10 (μ s ) has negative slope with log 10 (time), whereas log 10 (ω) has zero slope. According to Hunt's method, the "intrinsic" rate of change should be estimated using the model of evolution which best fits the data, whereas the LRI model by definition assumes directional change.
In addition, Sheets and Mitchell (2001) argue that the LRI method has lower sensitivity to detect stasis or directional change from random walk than originally claimed, due to the large range of LRI rates produced by random walks. However, we disagree with this last statement because the expected value for the LRI slope in an unbiased random walk (−0.5) is a theoretical value (Bookstein, 1987) not a random variable, so that the test of hypothesis should be as be proposed by Gingerich, 1993 and not by Sheets and Mitchell. Despite the fact that the theoretical framework for the study of microevolution in quantitative traits was organized by Hendry and Kinnisson already in 1999, few empirical studies have used it, or taken into account its methodology. As a consequence, most reviews of microevolution of quantitative traits in plants have been constrained by the scarcity of reports of evolutionary rates (or the information needed to compute them, i.e., the raw data, or means, sample sizes and SD, or coefficients of variation-CV). Bone and Farres (2001) considered only 78 observations from 26 species (of which three were perennials) and mentioned the difficulty of drawing definitive conclusions on the basis of such a small number of studies. A subsequent review by Westley (2011) included a larger empirical database of taxa, but for plants it only included the data published by Bone and Farres (2001). Crispo et al. (2010) analyzed evolution of phenotypic plasticity in response to anthropogenic disturbance, bringing together 212 outcomes from 13 plant species. These authors take the reported evolutionary rates or compute them from the information mentioned above.
However, as we show below, an important proportion of the relevant literature does not allow these procedures. This scarcity of systematized data contrasts with the rapidly growing number of studies comparing changes in plant quantitative traits. Another issue in carrying out evolutionary syntheses, highlighted by Crispo et al., 2010, is the need to weight the outcomes according to the strength of their underlying sources of data and also to control pseudoreplication caused by considering several outcomes from the same work and species. We thus set out to devise a method that could take advantage of all these insights and bodies of data.
The first goal of this article is the development of a set of ES which give similar information as preexisting microevolutionary metrics, but allow the extraction of much more information from a systematic review of the literature. Our novel use of well-tested metaanalysis methods (Gurevitch, Curtis, & Jones, 2001) has three major advantages. First, the meta-analytical ES standardized mean difference (SMD) is mathematically equivalent to the haldane's numerator, and, within certain limits, SMD is a close approximation to Gingerich's (1993) haldane's formulation (with some specificities, see below), so that the information provided is similar to preexisting microevolutionary metrics. At the same time, the interconversion between different ES allows the extraction of information from data sources that are not directly comparable (e.g., descriptive statistics such as mean, SD, sample size, F-statistic, Pearson moment), so that more information may be incorporated than it was possible in previous reviews. Second, as in any meta-analysis, it works with a standardized variable (the ES), and to weighs each study by its precision, in the global analysis (1/variance, here variance is the square of the standard error of the ES, which combines sample size and (un)balance of the sampling, but not the variance of the variable). This makes it possible to compare different studies in a quantitatively manner and use them together in a single analysis, controlling for pseudoreplication (see "random" parameter from "rma.mv" function of the "metafor" R-package), testing hypotheses about categorical factors (moderators) and continuous explanatory variables (metaregression), and leading to less biased estimates (with associated confidence interval) than traditional analyses (Viechtbauer, 2010;www.metafor-project.org). Third, we derive specific size effects in order to introduce the LRI-framework and some elements of the Hunt's (2006Hunt's ( , 2012 methods into our synthesis.
Our second goal was to illustrate the potential of our metaanalysis approach to the study of microevolution, by applying it to a comprehensive literature survey. To this end, we produced the most comprehensive dataset to date of microevolutionary rates in plant quantitative traits.

as:
where H is the haldane rate, x is the original continuous variable, x 1 , x 2 are the sample means of the variable at times 1 and 2, SD x is the square root of the pooled variance of samples at times 1 and 2, and Δt is the number of generations elapsed between times 1 and 2.
The simplest ES based on mean difference is Cohen's d defined as the difference between sample means divided by the pooled sample standard deviation (i.e., d = (x 2 − x 1 )∕SD x ). The equivalence between the haldane's numerator and the SMD (Cohen's d) is now evident: However, Cohen's d is a biased estimator of the actual mean difference which bias increase with small samples (Hedges, 1981).
Therefore, it is better to use the SMD Hedges' g (Hedges, 1981), which is an unbiased estimator of the mean difference.
where m is the total sample size. When m is large enough (e.g., m = 20), the difference between g and d is negligible.
When Gingerich (1993) introduced his haldane rate for first time, he used ln-transformed variables. This is because, as the means for morphological traits (used in paleontology) get bigger, the variation also increases; instead, ln-transformed variables tend to be homocedastic. This has been the most traditional use of the rate, even when transformation is not always necessary. To distinguish the natural logtransformed definition of haldane from the nontransformed one, we use H for the latter and H g for the log-transformed variable: where v 1 ,v 2 are the means of the natural logarithm (ln) of the original variable (x) in times 1 and 2 and SD v is the standard deviation of the ln of the original variable.
However, ecological timescales (from 0 to 300 years or 1 to 10 generations) are very small in comparison with paleontological timescales (from 10² to 10⁷ or more generations and from thousands to hundred millions years; Gingerich, 1983Gingerich, , 2001. It is therefore reasonable to assume that in ecological timescales Δt tends to zero: And in discrete time intervals: It has been shown (Lewontin, 1966;Lynch, 1990;Wright, 1952) that: Therefore, the combination of equations 6-9, in ecological timescales, leads to: Equations 5-11 show that the shorter the elapsed time, the closer will be H and H g . This may be the reason why Hendry and Kinnison (1999) report that the analysis of rates with and without ln-transformed variables gives similar results. As showed above, H's numerator is the same as the Cohen's d; moreover, according to equation 11 we expect that H g 's numerator (which is computed from the ln of the original variable) and the SMD to give similar results in the elapsed times considered or under certain conditions of distribution of the variable developed in "SMD and H g 's numerator" section.

| Derived effects sizes
In order to incorporate meta-analysis in the study of microevolutionary rates, it is necessary to have suitable effects sizes, which may be derived from the SMD as explained below. Each ES (as magnitude) is a variable made up by all the outcomes obtained from the literature.
Each outcome (as measurement) is an estimation of a realized ES and a measure of the precision of such estimation (its variance). In order to perform a meta-analysis, we need both the estimators of the amount (and rate) of change and the variances of such estimators.
Because in the study of contemporary evolution the absolute value of the change is often more important than its direction, several authors have analyzed the former (Bone & Farres, 2001;Hendry & Kinnison, 1999;Kinnison & Hendry, 2001). However, SMD is distributed as a noncentral t-variate (Hedges, 1981). Then, the absolute value of such distribution is a folded noncentral t-variate. But also, Hedges (1981) explains that the distribution of the unbiased SMD estimator can be approximated well by a normal distribution when the number of observations or the number of studies (which is our case) is large. As we show below, in our database the log transformation of the unbiased SMD approximates well to a normal distribution. Therefore, it is reasonable to assume that meta-analysis assumptions are being met with use of the log 10 (|g|) (LG hereafter). Here we propose a new ES, called "LG," and its variance "V LG ," calculated according to the propagation errors theory (Bevington & Robinson, 2003): We constructed an evolutionary rate which is the ratio between Hedges' g and the time elapsed since the environmental novelty appeared; time may be measured as years or generations. We call this ES "h," bearing in mind that if time is expressed in years, it must not be interpreted as haldanes, except in the case of annual species. We call its variance "V h " which may be calculated as the variance of the product of a variable (g) by a constant (1/Δt) (if we use "years" as time; Bevington & Robinson, 2003): If we use "generations" as time, and there is an associated uncertainty which allows the estimation of the variance of "Δt," then "V h " is (Bevington & Robinson, 2003): Due to the relation between evolutionary rates and elapsed time, already documented by other authors (Gingerich, 1993(Gingerich, , 2009Hendry & Kinnison, 1999), there is a need for a new ES appropriate to the LRI and Hunt framework (Gingerich, 1993(Gingerich, , 2009). Here we propose a new ES, called "LR," and its variance "V LR ," calculated according to the propagation errors theory (Bevington & Robinson, 2003): Here, the concern about the distribution of the proposed ES is also valid. As in the case of the amount of change (g) and in the case of rates of change (h) the absolute value is more important that the direction. Moreover in order to apply LRI and Hunt's approach, we need to log transform such variables, which is only possible in the case of positive values. As we show below, the distribution of LR is also close to a normal distribution.

| Inclusion/exclusion criteria
The first survey retrieved 245 studies from 1972 to 2015, and the second showed 22 studies from 2001 to 2011. We also reviewed papers cited by the 267 studies above. We were interested in quantitative trait microevolution processes occurring in "ecological timescales"; therefore, we considered studies that measure intraspecific change in a quantitative trait and report the time elapsed from the onset of the environmental novelty or refer to a historical or biological event reported in other sources (e.g., a mine opening, a welldocumented biological invasion). We included studies which reported situations where the elapsed time between the environmental change and the sampling was no greater than 300 years. The included studies followed a single population through time since a change in the environment, or compared two or more populations, diverging from an originally single population, of vascular plants by measuring a quantitative trait across two situations, where one of them was a new condition of known age (i.e., allochronic and synchronic designs according to Hendry & Kinnison, 1999

| Calculating ES
When raw data were available (mostly from figures), we computed SMD as in equations 2, 3, and 18; 196 ES (11% of the total) were computed in this way. Also, from these data, we computed H g 's numerator in 140 cases (some variables are not amenable to log transformation).
When available, we used means, standard deviations (or standard error or variance), and sample size to compute the unbiased standardized mean difference (Hedges' g), as detailed above, and its variance (V g ) as follows (Nakagawa & Cuthill, 2007): Out of a total of 1,711 ES, 1,034 (60%) were computed in this way.
When these parameters were not available, we used the correlation or determination coefficient (r or R², or bivariate raw data mostly from figures) and sample size to compute Cohen's d and its variance, as shown below (DeCoster, 2004;Nakagawa & Cuthill, 2007). A total of 248 ES (14.5%) were computed in this way: where n is the sample size, R² is the determination coefficient of a linear model, k is the number of parameters of the model (not including the intercept, if it is the case of a simple regression k will be 1), and R 2 adj is the adjusted R². Cohen's d as follows (DeCoster, 2004): where dfD are the degrees of freedom of the denominator in the calculation of the statistic. Finally, when no other information was available, we used 2 × 2 tables of frequency to calculate Cohen's d from odds ratio (OdR) as follows: where ln(OdR) is the natural logarithm of the odds ratio; n is the absolute frequency specified in the subindex; "1T": positives in the treatment; "0T": negatives in the treatment; "1C": positives in the control; "0C": negatives in the control. Five ES (0.3%) were computed in this.
The odds ratio appears in some studies where authors measured a binomial variable (e.g., germination, presence of roots, survival) in an experiment with plants coming from two sources; this generated the 2 × 2 contingency table. For example, Wu and Bradshaw (1972) measured tolerance to copper in plant populations near an industrial source and populations from noncontaminated sites. Tillers from each site were suspended in a copper or control solution; the response variable was the subsequent development (yes/no) of roots. This provided a metric of the copper tolerance of each population.
From Cohen's d, we computed Hedges' g (equation 3), and finally, Hedges' g was transformed to absolute value, because the direction of change does not concern us, only its amount. Then, from the Hedges' g, we calculated the derived ES LG, h, and LR (equations 13, 14, 17).
When necessary, data were extracted from figures using Engauge Digitizer 4.1 software (Mitchell, 2002). In order to explore possible bias introduced by these different sources of information, we analyzed LG moderated by source type in a mixed model with random factor paperID (see Table 1 for moderator details).
Trim and fill procedure (Duval & Tweedie, 2000) for exploring the publication bias of the database was not applicable. This is because of the procedure assumes a symmetrical distribution. However the expected distribution of a random collection of amounts and rates of change is a normal distribution with mean equal to zero; therefore, its absolute value is a folded normal distribution (Tsagris, Beneki, & Hassani, 2014). As consequence, the log transformation of such folded normal distribution is left-skewed (not symmetrical).

| SMD and H g 's numerator
To determine whether SMD is a good estimator of the H g numerator, and under which conditions, we use empirical data from the literature F I G U R E 1 Flow diagram of the literature survey and data extraction. On left side, the sequence of steps in the selection of studies. On the right (grey) side, the type of data source found in papers and steps toward the ES and H g 's numerator calculation. Numbers in boxes indicate number of outcomes and number of studies, respectively. Number of papers on the right side is not additive because each paper may have different source types. On the arrows, numbers in brackets indicate the equation numbers used in the calculation; the numbers after the brackets, when present, indicate the number of outcomes and studies, respectively survey and simulated data. With empirical data, we evaluated the similarity of the two metrics (H g 's numerator and SMD). According to Lynch (1990), H g 's numerator can be estimated using the equation 24, if the CVs are <0.3: where x 1 is mean of population 1, x 2 is mean of population 2, CV x 2 is the variation coefficient of population 2, CV x 1 is the variation coefficient of population 1, and CV x is the variation coefficient of total population calculated from pooled variance and global mean.
In this way, for the cases in which raw data were not available, we computed H g 's numerator, whenever possible, by using expression (24) and compared it, by correlation and linear regression, with the SMDs (Cohen's d and Hedges' g) computed from the same cases.
It is not possible to know, from the analysis of these empirical data, how some aspects of the distribution of the original variable affected the correlation between H g 's numerator and SMD. These aspects include the relation between mean and SD, specifically whether or not the CV holds constant when mean and SD vary. We therefore generated two sets of simulated data with different mean-SD relations. We compared random pairs of samples. For each comparison, H g 's numerator and SMD were computed, and the correspondence between them was analyzed by simple correlation (Figure 3). The samples to be compared had 10 elements each, from a random normal distribution. In the first set of simulated data, the SD of each sample was set to be proportional to the sample mean, so that the expected CV is held constant. In the second set of simulations, the SD of each sample was set to be independent from its mean and came from a random normal distribution. In both sets of simulations, we covered the range of maximum CV between 3 and 0.01. Each set of simulations included 1,000 iterations. In each of these 1,000 iterations, 10,000 comparisons between samples were performed. Because the calculation of H g 's numerator requires a positive values variable, we set up the simulations so that negative values were rare (no negative values were found in the fist set of simulated data, and only 5.5% of the total in the second one); when negative values did appear, we substituted them by 0.001. Gingerich (2009) showed the use of LRI by mean of a series of simulations. To test the utility of the ES LR in analyzing microevolutionary rates according to the LRI-framework, we performed a simulation developed by Gingerich (2009). The simulation reproduced here is a random walk temporal series. In the simulation, we started with a sample of 30 elements with mean = 100 and variance = 1. In each generation (t), until t = 200, we sampled 30 elements from a mean equal to mean in the previous time (t−1) plus a random deviation (from a normal distribution, mean = 0 and SD = 1.25) and variance = 1. We then established all the possible comparisons from each series and computed H g and LR. Then we constructed a linear regression of log 10 (H g ) as a function of log 10 (time) and a metaregression of LR as a function of log 10 (time). To do this, we used the function "rma.uni" from the "metafor" R-package to perform a random effect model, method REML, weighted by the inverse of variance. Due to limited computing capacity, the metaregression was performed with a random subset of 5,000

| LRI-framework, the Hunt approach, and ES
comparisons (from a total of 20,100 comparisons). The simple linear regression was performed with the same subset than the metaregression to compare the two methods. This procedure was performed to show that metaregression is equivalent to the LRI if all samples have similar weights (due to equal sample sizes). From data of the same simulation, we computed LG, and from the same subset we performed a metaregression of LG as a function of log 10 (time) to illustrate the applicability of the LG ES in the graphical tool described by Hunt.
All simulations and statistical analyses were made in R (R Core Team, 2015). Scripts are provided as supplementary material.

| Summary of results of the literature survey
Through a systematic search and application of meta-analysis techniques, we were able to obtain 1,711 ES, many more times than what

| SMD and H g 's numerator
We had raw data for 196 cases, from which 140 allow to calculate H g 's numerator. Additionally, we had enough information to apply equation 24 to 1,034 cases (out of 1,711). Of these, only 436 met the requirement (CV < 0.3) to apply such equation (Lynch, 1990;Hendry & (24) Kinisson 1999). All these 576 cases were used to test the correspondence between H g 's numerator and SMD with real data. This dataset covers a CV range from 0.0147 to 1.79. The correlations between H g 's numerator and SMDs showed values near 0.98 and were highly significant in both cases (H g *t vs. Hedges' g: r = 0.9797, p < 0.0001; H g *t vs. Cohen's d: r = 0.9825, p < 0.0001). The linear regression showed a tight fit between H g 's numerator and SMDs (Table 2). Hedges' g slightly underestimated the amount of change, whereas Cohen's d overestimated it (Table 2). We also found that the relative difference between SMD and H g 's numerator (i.e., (H g 's numerator -SMD)/H g 's numerator) increases with the CV of the original variable ( Figure   S6). Based on our data, we were unable to test the correspondence between SMD, computed from other ES (such as OdsRatio or r), and H g 's numerator, but we used equivalences previously tested in the literature (DeCoster, 2004).
Using a simulated dataset, we found that SMD was always significantly and positively correlated with H g 's numerator, the correlation coefficient rising when CVs in both samples decrease. The Pearson's correlation coefficient is, on average, >0.8 through the whole range of maximum CV (0.01-3; Figure 3).

| LRI-framework, Hunt approach, and ES
In order to compare LR with the LRI-framework and illustrate the applicability of a tool derived from the Hunt's methods (Hunt, 2006(Hunt, , 2012, we used a random walk temporal series. Figure 4 shows the results from a particular random series, which is close to the average expected for this kind of series according to Gingerich (2009;fractal dimension D = 1.5), and in which we analyzed the linear regression for the log 10 (H g ) and the metaregression for LR and LG ES. Similar (but significantly different) slopes were estimated by LRI and the metaregression (LR), whereas the intercept of the LRI was lower (Table 3).
As predicted by Hunt, in an unbiased random walk, the slope of the rate of change (LR) with elapsed time was negative, whereas the slope of the amount of change (LG) with elapsed time was positive (Table 3).

| DISCUSSION
Patterns of phenotypic evolution in quantitative traits provide useful insight into evolutionary processes. A proper estimation of these patterns requires as much as possible empirical data, expressed in F I G U R E 2 Forest plot of the overall LG ES for each data source type. Segments represent the 95% confidence interval of the overall LG ES for each source type, and the central symbol in each segment represents the mean. Symbol sizes are proportional to the number of outcomes summarized by each source type. On the left, the name of each source type according to  (Hunt, 2006(Hunt, , 2012. In both cases, our two available measures (LG and LR) do not discount the sample error but account for sampling error by weighing the observation according to its precision. These ES contain similar information to that of previous metrics, with the added advantage of being able to use more diverse data (through the application of meta-analysis), and taking into account methodological concerns, pseudoreplication and "representativity" of the studies issues (Crispo et al., 2010) about this kind of synthesis. All these features allow the application of preexisting conceptual frameworks (Gingerich, 1983;Hendry & Kinnison, 1999;Hunt, 2006Hunt, , 2012 to the analysis of a much broader database. As shown in Table 2  with the mean (CV remains relative constant), in order to reduce heteroscedasticity (Hendry & Kinnison, 1999). Otherwise, the use of raw variable is preferred. Only when CV remains constant, the use of SMDs without transforming the data causes an underestimation of the amount of change. This is because, if CV remains constant, the pooled variance of the raw data is larger than the pooled variance of the ln-transformed variable, so that the amount of change relative to SD is smaller.
Considering that the analysis of the random walk temporal series by the LR ES and LRI-framework showed consistent result (Table 3 and Figure 4b), these methods could be both considered valid alternatives. However, metaregression has the advantage of weighting each observation (for estimator variance or sample size) in order to obtain a better parameter estimation. In the current simulation, all samples have the same size, so all comparisons have similar weight in the metaregression and, as expected, the results are similar. This is because we performed the simulation to show the similarity between the two methods. However, in the analysis of real data there will be different sample sizes and variances, and therefore, this weighting becomes relevant for better estimation of parameters and more precise confidence intervals. The possibility of incorporating other moderator variables which can explain the heterogeneity of the data is also an advantage.
These strengths of the meta-analysis become especially relevant when T A B L E 3 Application of the LRI-framework by standard linear model or by metaregression with the dataset of the random path shown in Figure 4a and metaregression of the LG EF Y = β 0 + β 1 *log 10 (time) + Er