Consequences of measurement error in qPCR telomere data: A simulation study

The qPCR method provides an inexpensive, rapid way of estimating relative telomere length across a set of biological samples. Like all laboratory methods, it involves some degree of measurement error. Relative telomere length is estimated by subjecting the actual measurements made (the Cq values for telomere and a control gene) to non-linear transformations and combining them into a ratio (the TS ratio). Here, we use computer simulations, supported by mathematical analysis, to explore how errors in measurement affect qPCR estimates of relative telomere length, both in cross-sectional and longitudinal data. We show that errors introduced at the level of Cq values are magnified when the TS ratio is calculated. If the errors at the Cq level are normally distributed and independent of true telomere length, those in the TS ratio are positively skewed and proportional to true telomere length. The repeatability of the TS ratio declines sharply with increasing error in measurement of the Cq values for telomere and/or control gene. In simulated longitudinal data, measurement error alone can produce a pattern of low correlation between successive measures of relative telomere length, coupled with a strong negative dependency of the rate of change on initial relative telomere length. Our results illustrate the importance of reducing measurement error: a small increase in error in Cq values can have large consequences for the power and interpretability of qPCR estimates of relative telomere length. The findings also illustrate the importance of characterising the measurement error in each dataset (coefficients of variation are generally unhelpful, and researchers should report standard deviations of Cq values and/or repeatabilities of TS ratios), and of allowing for the known effects of measurement error when interpreting patterns of TS ratio change over time.


Supporting Information for Nettle et al. 'Consequences of measurement error in qPCR telomere data: A simulation study'

Section 1: Analytical treatment of measurement error in the TS ratio
This section examines the TS ratio under measurement error, providing analytical results to support simulation findings.
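As specified in the main paper, the ideal (i.e. error-free) Cq values for the telomere assay and single-copy gene relate to the amounts of each kind of DNA present in the sample as given in (1) and (2).
$$\mathrm{Cq}_T = f - \log_2(\mathrm{DNA}_T) \tag{1}$$
$$\mathrm{Cq}_S = f - \log_2(\mathrm{DNA}_S) \tag{2}$$
Here, $\mathrm{DNA}_T$ and $\mathrm{DNA}_S$ are the amounts of telomeric and single-copy gene DNA in the sample, and $f$ denotes a constant set by the chosen fluorescence threshold. The amount of telomeric DNA present is proportional to the amount of single-copy gene DNA present, but scaled by the relative telomere length of the individual; that is, $\mathrm{DNA}_T = a \cdot L \cdot \mathrm{DNA}_S$, where $L$ is relative telomere length and $a$ is a constant.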
Reference Cq values for a standard sample are typically subtracted from the Cqs for the single-copy gene and telomeric assay when calculating TS ratios. The effect of this is simply to rescale the TS ratio; such rescaling can be ignored in what follows without loss of generality, and hence for clarity we do not include this step here (though see main paper for the TS formula with these reference values included).
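By substituting into (7) and (8) and rearranging, we have:
$$\mathrm{TS}_{\text{ideal}} = a L \tag{9}$$
where $L$ is the relative telomere length of the sample and $a$ is the constant of proportionality linking telomeric to single-copy gene DNA. Thus, (9) gives us Result 1: The TS ratio, if measured without error, is proportional to the relative telomere length in the sample.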
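Result 1 can be checked numerically. The Python sketch below is illustrative only and is not part of the authors' R scripts: the function names, the values chosen for a and f, and the use of the TS formula 2^(Cq_scg - Cq_telo) without reference-sample terms are all assumptions made for the example.

```python
import math


def ideal_cq(dna_amount, f=30.0):
    """Error-free Cq for a template amount: f - log2(amount)."""
    return f - math.log2(dna_amount)


def ideal_ts(dna_scg, rel_telomere_length, a=1000.0, f=30.0):
    """Error-free TS ratio, taken here as 2^(Cq_scg - Cq_telo)."""
    # Telomeric DNA scales with single-copy DNA and relative telomere length.
    dna_telo = a * rel_telomere_length * dna_scg
    return 2.0 ** (ideal_cq(dna_scg, f) - ideal_cq(dna_telo, f))


# Same relative telomere length, very different DNA inputs and thresholds:
# the ideal TS ratio stays at a * rel_telomere_length in every case.
for dna_scg, f in ((5.0, 25.0), (50.0, 30.0), (500.0, 35.0)):
    print(dna_scg, f, round(ideal_ts(dna_scg, 1.2, f=f), 6))
```

Both the amount of DNA loaded and the threshold constant cancel out of the ratio, which is why only the relative telomere length (and the constant a) remains.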
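For the measured TS ratio where there is measurement error, we have:
$$\mathrm{TS}_{\text{measured}} = a L \cdot 2^{(\varepsilon_S - \varepsilon_T)} \tag{10}$$
where $\varepsilon_S$ and $\varepsilon_T$ are the measurement errors in the Cq values for the single-copy gene and the telomere reaction respectively. From (10), we have Result 2: The measured TS ratio is proportional to relative telomere length multiplied by $2^{(\varepsilon_S - \varepsilon_T)}$, or two to the power of the difference between the measurement errors in the two Cq values.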
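The error in the measured TS ratio (henceforth $\varepsilon_{TS}$) is the difference between the measured TS ratio, $\mathrm{TS}_{\text{measured}}$, and the ideal or error-free TS ratio, $\mathrm{TS}_{\text{ideal}}$. From (9) and (10):
$$\varepsilon_{TS} = \mathrm{TS}_{\text{measured}} - \mathrm{TS}_{\text{ideal}} = a L \left(2^{(\varepsilon_S - \varepsilon_T)} - 1\right) \tag{11}$$
By inspection of (11), we have Result 3: The error in the TS ratio is proportional to telomere length. This is true even though the errors in the Cq values were assumed to be independent of the amounts of telomere and single-copy DNA in the samples.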
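Results 2 and 3 can be illustrated with a small Python sketch. It is a hypothetical example, not the authors' code: the function names and parameter values are invented, Cq errors are assumed to be additive, and the closed form used is the one implied by Results 2 and 3, namely a times relative telomere length times (2^(error difference) - 1).

```python
import math


def measured_ts(rel_tl, err_scg, err_telo, a=1.0, dna_scg=100.0, f=30.0):
    """TS ratio computed from Cq values that carry additive errors."""
    cq_scg = f - math.log2(dna_scg) + err_scg
    cq_telo = f - math.log2(a * rel_tl * dna_scg) + err_telo
    return 2.0 ** (cq_scg - cq_telo)


def ts_error_computed(rel_tl, err_scg, err_telo, a=1.0):
    """Measured TS minus ideal TS, where the ideal TS is a * rel_tl."""
    return measured_ts(rel_tl, err_scg, err_telo, a) - a * rel_tl


def ts_error_analytic(rel_tl, err_scg, err_telo, a=1.0):
    """Closed form implied by Results 2 and 3."""
    return a * rel_tl * (2.0 ** (err_scg - err_telo) - 1.0)


# Identical Cq errors applied to samples with short and long telomeres:
# the two error calculations agree, and the error grows with telomere length.
for rel_tl in (0.5, 1.0, 2.0):
    print(rel_tl,
          ts_error_computed(rel_tl, 0.10, -0.05),
          ts_error_analytic(rel_tl, 0.10, -0.05))
```

Doubling the relative telomere length doubles the error in the TS ratio, even though the Cq errors themselves are the same in every row.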
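If $\varepsilon_T \sim N(0, \sigma_T)$ and $\varepsilon_S \sim N(0, \sigma_S)$, from properties of the normal distribution:
$$\varepsilon_S - \varepsilon_T \sim N\left(0, \sqrt{\sigma_T^2 + \sigma_S^2 - 2\rho\sigma_T\sigma_S}\right)$$
Here, $\rho$ is the correlation between $\varepsilon_T$ and $\varepsilon_S$. Hence, the distribution of $\varepsilon_{TS}$ is the distribution of:
$$a L \left(2^{N\left(0, \sqrt{\sigma_T^2 + \sigma_S^2 - 2\rho\sigma_T\sigma_S}\right)} - 1\right) \tag{12}$$
From (12), we can make the following inferences for the case where the measurement errors in the Cq values are normally distributed:
• Result 4: Positive correlations between $\varepsilon_T$ and $\varepsilon_S$ reduce the size of measurement errors in the TS ratio. From (12), given that $2\sigma_T\sigma_S$ is positive, increasing $\rho$ will always reduce the size of $\sqrt{\sigma_T^2 + \sigma_S^2 - 2\rho\sigma_T\sigma_S}$, and hence the standard deviation of $\varepsilon_{TS}$.
• Result 5: Perfect positive correlation between the measurement errors of the Cq for telomere and the Cq for the single-copy gene eliminates measurement error in the TS ratio entirely, as long as the extent of measurement error is the same for the two reactions.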
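Results 4 and 5 rest on the standard deviation of the difference between two correlated normal errors. The Python sketch below compares the textbook formula for that standard deviation with a simulation. It is a hedged illustration rather than the authors' code: the function names, the error standard deviations of 0.1, and the method of generating correlated pairs from two independent standard normal draws are assumptions made here.

```python
import math
import random
import statistics


def sd_of_error_difference(sd_telo, sd_scg, rho):
    """SD of (err_scg - err_telo) for normal errors with correlation rho."""
    variance = sd_telo ** 2 + sd_scg ** 2 - 2.0 * rho * sd_telo * sd_scg
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def simulate_error_difference(sd_telo, sd_scg, rho, n=20000, seed=1):
    """Draw n correlated error pairs and return the differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        err_telo = sd_telo * z1
        # Mixing z1 and z2 gives err_scg a correlation of rho with err_telo.
        err_scg = sd_scg * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        diffs.append(err_scg - err_telo)
    return diffs


# Formula versus simulation for increasing correlation between the errors.
for rho in (0.0, 0.5, 0.9, 1.0):
    simulated = statistics.pstdev(simulate_error_difference(0.1, 0.1, rho))
    print(rho, round(sd_of_error_difference(0.1, 0.1, rho), 4), round(simulated, 4))
```

With equal error standard deviations and a correlation of 1, the two errors are identical in every draw, so their difference, and with it the error in the TS ratio, vanishes.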
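If we can assume that telomere length itself is normally distributed, then we can see from (12) that $\varepsilon_{TS}$ is the product of a normally distributed variable and a variable derived from a log-normal one (2 raised to the power of a normally distributed variable is log-normal). Thus, the distribution of $\varepsilon_{TS}$ belongs to the class of normal-log-normal mixture distributions. Such distributions are typically skewed and leptokurtic (Yang 2008).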
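The skew and heavy tails can be seen in a quick simulation. The following Python sketch is an assumption-laden illustration, not the authors' analysis: telomere length is drawn from a normal distribution with mean 1, the two Cq errors are uncorrelated with equal standard deviations, the TS error is computed as a times telomere length times (2^(error difference) - 1), and all parameter values and function names are hypothetical.

```python
import random


def simulate_ts_errors(n=20000, telomere_sd=0.1, error_sd=0.15, a=1.0, seed=1):
    """Errors in the TS ratio for normal telomere lengths and normal Cq errors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        rel_tl = rng.gauss(1.0, telomere_sd)
        # Difference between two uncorrelated Cq errors.
        diff = rng.gauss(0.0, error_sd) - rng.gauss(0.0, error_sd)
        errors.append(a * rel_tl * (2.0 ** diff - 1.0))
    return errors


def central_moment(values, order):
    """Central moment of the given order (population form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (normal = 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


errors = simulate_ts_errors()
print("skewness:", round(skewness(errors), 3))
print("excess kurtosis:", round(excess_kurtosis(errors), 3))
```

Positive values of both statistics correspond to the right skew and leptokurtosis described above; a normal distribution would give values near zero.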

Section 2: Simulation results with correlations between errors
Simulation results reported in the main paper assume that the error in the telomere Cq and the error in the single-copy gene Cq are uncorrelated; that is, in the notation of section 1, the correlation between the two errors is $\rho = 0$. Repeating other simulated results with positive values of $\rho$ produces similar conclusions: increasing $\rho$ attenuates the impact of error in measuring Cqs on the TS ratio, but the effect is slight until $\rho$ is close to 1.

Section 3: How to use the simulation R code
We define a series of R functions, contained in the script 'simulation.functions.r', that return datasets with requested properties containing both the true values of the quantities (Cqs, TS, etc.) and their post-error measured values. This allows the user to determine the differences between true and measured values, and to perform other analyses. All simulation parameter values are user-specifiable. The script 'paper.results.r' reproduces all the figures and simulation results from the main paper.
Datasets consist of observations from n individuals. The steps common to all of the simulation functions are as follows:
• A vector of n true single-copy gene abundances, true.dna.scg, is defined, drawn from a normal distribution with mean b and standard deviation var.sample.size (b is a constant).
• A vector of n relative telomere lengths, true.telo.var, is defined, drawn from a normal distribution with mean 1 and standard deviation telomere.var.
• Hence, the true abundance of the telomere sequence, true.dna.telo, is defined as a*true.dna.scg*true.telo.var. Here, a is a scaling constant representing how many copies of the telomeric sequence there are per single-copy gene in the average sample.
• Ideal Cq values for both reactions are defined as f - log2(true.dna.scg) and f - log2(true.dna.telo), where f is a constant representing the chosen fluorescence threshold.
• Measurement errors in the Cqs are generated from a normal distribution with mean 0, standard deviations given by error.scg and error.telo, and a correlation between the two errors given by error.cor.
• Hence, measured Cqs are generated, which can be compared to the ideal Cq values.
• TS ratios are calculated both on the measured Cqs, and the ideal ones.
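The steps above can be sketched in a few lines of code. The version below is a Python analogue written for illustration and should not be read as the contents of 'simulation.functions.r': the parameter names mirror those in the list (with underscores instead of dots), but the default values, the returned field names, and the way correlated errors are generated are assumptions.

```python
import math
import random


def generate_one_dataset(n=1000, error_telo=0.05, error_scg=0.05, error_cor=0.0,
                         telomere_var=0.1, var_sample_size=10.0,
                         a=10.0, b=100.0, f=30.0, seed=None):
    """Simulate true and measured Cqs and TS ratios for n individuals."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # True quantities for this individual.
        true_dna_scg = rng.gauss(b, var_sample_size)
        true_telo_var = rng.gauss(1.0, telomere_var)
        true_dna_telo = a * true_dna_scg * true_telo_var

        # Ideal (error-free) Cq values.
        true_cq_scg = f - math.log2(true_dna_scg)
        true_cq_telo = f - math.log2(true_dna_telo)

        # Normal measurement errors with correlation error_cor.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        err_telo = error_telo * z1
        err_scg = error_scg * (
            error_cor * z1 + math.sqrt(max(0.0, 1.0 - error_cor ** 2)) * z2)

        measured_cq_scg = true_cq_scg + err_scg
        measured_cq_telo = true_cq_telo + err_telo

        # TS ratios on the ideal and on the measured Cqs.
        true_ts = 2.0 ** (true_cq_scg - true_cq_telo)
        measured_ts = 2.0 ** (measured_cq_scg - measured_cq_telo)

        rows.append({
            "true_telo_var": true_telo_var,
            "true_cq_scg": true_cq_scg,
            "true_cq_telo": true_cq_telo,
            "measured_cq_scg": measured_cq_scg,
            "measured_cq_telo": measured_cq_telo,
            "true_ts": true_ts,
            "measured_ts": measured_ts,
            "error_computed": measured_ts - true_ts,
        })
    return rows


dataset = generate_one_dataset(n=5, seed=1)
for row in dataset:
    print(round(row["true_ts"], 3), round(row["measured_ts"], 3))
```

The defaults keep both normal draws far from zero so that the logarithms are always defined; with a much larger telomere_var or var_sample_size, a non-positive draw would need to be handled explicitly.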
The following functions are available. Specify desired parameter values in the parentheses, e.g. generate.one.dataset(n=10000, error.telo=0.1, error.scg=0.1, error.cor=0). Default values in the simulation functions are generally those given in table 1 of the main paper.
• generate.one.dataset() returns a simple dataset (one telomere measurement per individual) for chosen values of all the variables described in section 1. As well as ideal and measured Cqs, it returns ideal and measured TS ratios. It also returns the difference between the ideal and measured TS ratio, calculated in two ways: directly from the simulated values (error.computed), and using equation (11) of online supplement 1 (error.analytic). Both methods produce the same number; this was included as an additional check of the correctness of the simulation.
• generate.repeated.measure() returns a dataset where telomere lengths from the same individuals are measured twice, via two independent biological samples, and the true telomere length of each individual is assumed not to have changed at all. The data frame it returns is as for generate.one.dataset(), except that there are two of each variable (e.g. true.ts.1, true.ts.2, measured.ts.1, measured.ts.2, etc.).
• calculate.repeatability() calculates the repeatability of the measured TS ratio (intra-class correlation coefficient) when generate.repeated.measure() is implemented using the given values for all the parameters. It requires prior installation of the R package 'irr'.
• compare.repeatability() returns the repeatability of the TS ratio and the repeatability calculated on the raw Cqs for the telomere reaction, for the given parameter values.
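To give a feel for what the repeated-measure and repeatability functions do, here is a rough Python analogue. It is a sketch under stated assumptions and not a port of the R functions: errors in the two Cqs are taken to be uncorrelated, the parameter defaults are invented, and repeatability is computed as a one-way random-effects intra-class correlation, which may differ from the variant computed by the 'irr' package in the authors' script.

```python
import math
import random


def generate_repeated_measure(n=2000, error_telo=0.05, error_scg=0.05,
                              telomere_var=0.1, var_sample_size=10.0,
                              a=1.0, b=100.0, f=30.0, seed=None):
    """Two measured TS ratios per individual; true telomere length is unchanged."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_telo_var = rng.gauss(1.0, telomere_var)  # same at both time points
        measures = []
        for _sample in range(2):  # two independent biological samples
            dna_scg = rng.gauss(b, var_sample_size)
            cq_scg = f - math.log2(dna_scg) + rng.gauss(0.0, error_scg)
            cq_telo = (f - math.log2(a * dna_scg * true_telo_var)
                       + rng.gauss(0.0, error_telo))
            measures.append(2.0 ** (cq_scg - cq_telo))
        pairs.append((measures[0], measures[1]))
    return pairs


def one_way_icc(pairs):
    """One-way random-effects ICC for two measurements per individual."""
    n, k = len(pairs), 2
    grand_mean = sum(x + y for x, y in pairs) / (n * k)
    subject_means = [(x + y) / k for x, y in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2 + (y - m) ** 2
                    for (x, y), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Repeatability of the TS ratio for increasing Cq measurement error.
for error_sd in (0.0, 0.05, 0.1, 0.3):
    pairs = generate_repeated_measure(error_telo=error_sd, error_scg=error_sd, seed=1)
    print(error_sd, round(one_way_icc(pairs), 3))
```

Because the true telomere length is held constant within each individual, any departure of the repeatability from 1 in this sketch is due to measurement error alone.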