Repeated Measures Regression in Laboratory, Clinical and Environmental Research: Common Misconceptions in the Matter of Different Within- and Between-Subject Slopes

When using repeated measures linear regression models to make causal inference in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome: (i) a 10-lb. weight increase in the same adult affects more greatly an increase in cholesterol in that adult than does (ii) one adult weighing 10 lbs. more than a second indicate higher cholesterol in the heavier adult. A 10-lb. weight gain in the first adult more likely reflects a build-up of body fat in that person, while a second person being 10 lbs. heavier than the first could be influenced by other factors, such as the second person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception commonly made using generalized estimation equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression) is that the working correlation structure only influences variance of the parameter estimates. However, only independence working correlation guarantees that the modeled parameters have interpretability. We illustrate this with an example where changing the working correlation from independence to equicorrelation qualitatively biases parameters of GEE models and show that this happens because within- and between-subject slopes for the outcomes regressed on the predictor variables differ. We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare within- and between-subject slopes of predictors and when they do differ, investigate the causal reasons for this.


Introduction
We focus on two common misconceptions that are made in research while fitting repeated measures regression with generalized estimating equations (GEE) and mixed models (MM). Misconception-A: The association between the predictor variable and the outcome across different measures from the same subject (within-subject) is the same as the association of that variable with the outcome between measures from different subjects (between-subject). In fact, these associations often differ, which should be considered when making causal inference. For example, consider weight as the predictor and cholesterol the outcome given the well-known association of higher serum cholesterol and with greater body fat: (i) a 10 lb. increase in the same adult more likely indicates greater difference in serum cholesterol than does (ii) one adult being 10 lbs heavier than a second adult. A 10-pound weight gain in the same adult more likely reflects a build-up of body fat in that person, while the first adult being 10 pounds heavier than the second could be influenced by other factors such as the first adult being taller than the second. Misconception-B: The working correlation structure used in GEE and MM models is only a nuisance factor that impacts precision of model parameter estimates. As illustrated and explained in the next Section (and Table 1), the wrong choice for working correlation structure biases parameter estimates.
Both of these misconceptions are related, but the analytical details are complicated. To explore this further, Section 2 begins with an illustration of Misconception-B in real data. Section 2 also explains how this relates to Misconception-A and why independence working correlation must be used for creation of predictive models using "cross-sectional regression" on repeated measures. Then, Section 3 details how separation into within-and between-subject associations is needed for using repeated measures regression to makes causal inference. Section 4 describes epidemiological mechanisms that can cause within-and between-subject slopes to differ. Section 6 summarizes and explores further implications for statistical practice in applied research.

Cross-Sectional and Between/Within-Subject Linear Models with Repeated Measures
We begin here with some notation. Consider repeated measures on n subjects denoted by i = 1,2, . . . , n. The "subjects" can either be persons with longitudinal repeated measures, or, as is common in environmental epidemiology, can be cities, schools, neighborhoods, census tracks, hospitals, etc. Each subject has J i different observations enumerated by j = 1, . . . , J i . For example, these J i different observations could be taken at times t i1 < t i2 < . . . < t iJi , on the same person when the "subject" is a person or from J i different persons living in the same neighborhood when the "subject" is a neighborhood. For J i constant across i, (i.e., always the same number of repeat measures for a subject), we drop the "i" subscript and denote J. Let us consider that the observations have continuous outcomes Y ij and K predictor (or exposure) variables X ij = X 1,ij , X 2,ij , . . . , X K,ij . When K = 1, we drop the "K" enumeration, using X ij for the only predictor. Linear regression models for E[Y ij | X ij ] or E[Y ij |X ij ] are fit in the analyses described here. However, the overall conclusions we obtain on these linear regression models can be generalized to discrete outcomes (i.e., logistic regression) and survival analyses.
2A Cross-Sectional (CS) Regression. The most commonly fitted linear regression model on repeated measures does not separate within-and between-subject associations and is usually written out as Y ij = α + β 1 X 1,ij + β 2 X 2,ij + . . . + β K X K,ij + ε ij . This is denoted as "cross-sectional (CS) regression" particularly for longitudinal repeated measures. We add a subscripted "CS" to the β's to distinguish these slopes from between-subject (BS) and within-subject (WS) slopes defined in Section 2B. The CS regression model here is thus denoted as Y ij = β CS X ij = α CS + β 1,CS X 1,ij + β 2,CS X 2,ij + . . . + B K,CS X K,ij + ε ij (1) where α CS , β 1,CS , β 2,CS , . . . , B K,CS are parameters (fixed effects), while ε ij is error with E[ε ij ] = 0 that is independent between different subjects i and i', but may be correlated for j = j' within the same subject. It should be noted that the intercept is fixed at the same α CS for each subject. Should the actual intercepts differ between subjects (i.e., be α CS,i ) as random intercepts, then for both MM and GEE, the difference α CS,i − α CS is incorporated into the error term ε ij of (1) and the within-subject correlation of that error [1]. Using (vs. not using) random intercepts does not influence the point estimates of α CS , β 1,CS , β 2,CS , . . . , B K,CS or the variance of these estimates for mixed models [1]. However, for GEE using a different intercept on each subject (with each intercept now adding a new parameter) creates too many parameters for the asymptotic properties of GEE model to hold in our examples (and in general) which destabilizes parameter estimates [2]. Again for K = 1, the subscript for K is dropped and the model is Y ij = α CS + β CS X ij + ε ij . The main goal of CS regression is to first obtain estimatesβ CS for β CS and then inputβ CS into (1) in order to estimate future unobserved Y's from observed X ij 's as β CS X ij . Cross-sectional regression is also used to make adjusted (causal) inference on the covariate associations inβ CS , but, as we show later, doing this may be problematic. Table 1 presents parameter estimates from repeated measure cross-sectional regression (1) to a clinical measure of glomerular filtration rate (EGFR) from the Modification of Diet in Renal Disease Study (MDRD) [3]. Formula (1) with EGFR as the outcome Y and three predictor variables (X 1 , X 2 , X 3 ) = (HIV infection, serum albumin, blood urea nitrogen (BUN)) was fit to 10,782 semi-annual measures of 584 women at the Bronx-site of the Women's Interagency HIV Study (WIHS) [4]. Higher EGFR values indicate better renal function. The models assume that the within-and between-subject associations of the predictor variables are the same. We later show this assumption is incorrect. The parameter estimates of Table 1 were calculated using GEE [1] with both independence (GEE-IND) in columns 2-4 and equicorrelation (GEE-E) columns 5-7 for the working correlation structure of model residuals from repeated measures in the same person. We again note that this model (1) has a fixed intercept across all subjects with the error term being independent between different subjects. However, otherwise in Table 1 (and elsewhere in the paper) the within subject correlation structure of the error is allowed to be either (i) independent within the same subject (GEE-IND) or (ii) to have the same correlation for all outcomes within the same subject (GEE-E). The second condition (i.e., equicorrelation) is equivalent to fitting a random subject intercept model [1].
Most of today's literature providing guidance on fitting repeated measures linear regression (i.e., [5][6][7][8][9][10][11][12][13]) qualitatively describes working correlation as a "nuisance factor" that does not alter model parameters and states that "the working correlation that minimizes variance of parameter estimates should be chosen". However, in Table 1, the parameter estimates for BUN (per g/dL), from GEE-E, of −1.22; 95% confidence interval (CI) (−1.46, −0.99) is both qualitatively and statistically higher than the corresponding GEE-IND estimate of −1.87; 95% CI (−2.12, −1.62). For HIV, the parameter estimates of −3.86, p = 0.0081 from GEE-E is qualitatively lower than that from GEE-IND −2.04 and p = 0.19. Clearly, changing the working correlation from independence to equicorrelation qualitatively and statistically changes the parameter estimates. Thus, this correlation structure is not a nuisance factor.
When faced with such a dilemma of qualitatively and statistically different parameter estimates from the same model fit to the same data with only the working correlation structure changed (as is shown in Table 1), investigators typically go to published guidance on which correlation structure to use. To that end, based on the within-subject correlation of residuals being 0.45 in GEE-E (and in MM-E), and the quasi-likelihood independence criteria goodness of fit statistic (QIC) = 10,836.27 for GEE-E being smaller than the QIC = 10,847.14 for GEE-IND (or the Akaike information criteria goodness of fit statistic (AIC) from a mixed model using equicorrelation (MM-E) of (AIC = 94,934.5) being smaller than AIC = 99,374.5 from a mixed model using independence (MM-IND) as shown in Table A1 in Appendix A), almost all articles providing model fitting guidance [5][6][7][8][9][10][11][12][13] point towards using equicorrelation as the working correlation structure. However, as the rest of Section 2 describes in detail, this guidance is problematic as only the parameter estimates obtained by using independence working correlation can have any meaning for cross-sectional regression.
But first we make two brief asides. First, we note that if MM, rather than GEE are used for Table 1, the corresponding parameter point estimates in Table 1 using independence correlation (MM-IND) and equicorrelation (MM-E) are essentially unchanged [1]. (See Appendix A for details on parameter estimates from MM fit to this data with independence and equicorrelation correlations structures). However, due to non-robustness of MM, GEE is preferable for this specific example. Second, we note that the differences observed in Table 1 occur not only between independence and equicorrelation. Any different choice of correlation structure, such as AR(1), Toeplitz, unstructured, etc. will result in different parameter estimates (results not shown). For simplicity, we focus this article on only two structures: independence and equicorrelation. 2B Between-/Within-Subject Slope (BS/WS) Regression. While investigators almost never consider this in practice, it has long been noted that slopes on changes of X ij within the same subject i differ from cross-sectional slopes on between subject-measure differences in X ij [14][15][16][17]. To illustrate this, consider the cross-sectional model of a laboratory measure cholesterol (Y ij ), which is well known to be higher in people with more body fat. To that end, the predictor is body weight (X ij ) with E[Y ij ] = α CS + β CS X ij . As described in the Introduction, the cross-sectional slope β CS for association of a 10 lbs. weight difference between two different adults for cholesterol is less than the slope for association of a 10 lbs. within-subject weight change for the same adult on cholesterol, which we denote as β WS . Again, the reason β CS is less than β WS is that: (i) a 10 lb. cross-sectional weight difference between two adults often reflects greater height in one of the persons, but (ii) a 10 lbs. weight increase in the same adult is not influenced by height difference and thus is more likely due to more body fat after the 10-lbs weight gain. Thus, since greater body fat is what is directly associated with more cholesterol, the within-person association of a 10-lb. weight increase with cholesterol is greater than the cross-sectional repeated measures association with a 10-lb. weight difference between two persons.
Common within-person height creates a shared within-subject measurement bias from this extraneous factor for subject i (denoted E i ) on weight as a predictor of cholesterol. To that end, many investigators adjust weight for height using body mass index = wt/ht 2 to remove this effect of height on weight. As Figure 1a illustrates, if TX ij = body mass index (wt/ht 2 ) were the true predictor of Y ij , and H i = height (which does not change with j in the same i), then X ij = TX ij * (H i ) 2 contains this shared within-subject measurement bias from common H i which again we denote as E i in Table 1a to confer it is an extraneous within-subject bias. Section 4 describes more settings where β WS = β CS . While for weight it is possible to remove the common shared within-subject bias from height by dividing by ht 2 , this is not the case for less well-understood causal relationships. Therefore, to model and account for a bias such as this, linear regression models fit for making causal inference can decompose the associations into "within-subject" slopes ( ), described above, and "betweensubject" slopes ( ), described below, which capture associations of subjects' central tendencies of the exposure. To do this, subject means of the predictor variables are calculated, where . Then Yij is modeled as a combination of "between-subject" slopes from (that could be influenced by the common person measurement bias in Figure 1) and "within-subject" slopes from deviations of Xk,i j about which will be free of such a bias, since the comparison is within person. ( As described for (1), this is a fixed intercept model that is functionally equivalent to a random intercept model for MM. When K = 1, we have . To illustrate this for our earlier example with Yij = cholesterol and Xij = weight, let , βBS = 0.9 and βWS = 3, Now we make some technical asides. First, the choice of the observed as the "central tendency" of Xk,ij for subject i is necessary as a person's "true average weight" over the entire While for weight it is possible to remove the common shared within-subject bias from height by dividing by ht 2 , this is not the case for less well-understood causal relationships. Therefore, to model and account for a bias such as this, linear regression models fit for making causal inference can decompose the associations into "within-subject" slopes (β WS ), described above, and "between-subject" slopes (β BS ), described below, which capture associations of subjects' central tendencies of the exposure.
To do this, subject means of the predictor variables x i = x 1,i , x 2,i , . . . , x K,i are calculated, where x k,i = ∑ Ji j=i X k,ii /J i . Then Y ij is modeled as a combination of "between-subject" slopes from x k,i (that could be influenced by the common person measurement bias in Figure 1) and "within-subject" slopes from deviations of X k,ij about x k,i which will be free of such a bias, since the comparison is within person.
As described for (1), this is a fixed intercept model that is functionally equivalent to a random intercept model for MM. When K = 1, we have To illustrate this for our earlier example with Y ij = cholesterol and X ij = weight, let α BS/WS = 30, β BS = 0.9 and Now we make some technical asides. First, the choice of the observed x k,i as the "central tendency" of X k,ij for subject i is necessary as µ k,i a person's "true average weight" over the entire time period is unknown, but for J i large enough, x k,i should be close to µ k,i . Thus, while β k,WS only captures association with within-subject change in X k,ij , β k,BS inherently contains some β k,WS from deviation of (x k,i − µ k,i ); especially for small J i . This situation is described for occupational epidemiology research, where often an average of personal exposure measurements is computed as estimate of true exposure of a "subject", defined as either an individual, or group of individuals that share a job [18]. Second, the implicit assumption that β k,WS is well defined may also not always be true. For example, "β k,WS " could differ by time separation t ij -ti j' . Perhaps for k = weight, a weight gain of 10 lbs. in one month creates a shock that hyper-elevates cholesterol, but a 10 lbs. weight gain over 12 months does not, in which case β k,WS (t ij − t ij ) = 1 > β k,WS (t ij − t ij ) = 12 . Third, if the investigator is only interested in the within-subject slopes he/she can substitute as a fixed effect a different subject intercept α WS,i for the between-subject slopes in (2) with the model reducing to Despite these technical caveats, the within-vs. between-subject decomposition in (2) is used to test whether β k.BS = β k.WS so that, as shown in Section 2C, they also equal β k.CS and thus the separated WS vs. BS decomposition can be collapsed to (1). Due to the orthogonal decomposition of X k,ij about x k,i this previous test for collapsing the within-vs. between-subject decomposition is a two-sample z-test of parameter estimates from fitted models comparing β k,BS −β k,WS / Var(β k,BS ) + Var(β k,WS ) to Z 1-α/2 [17]. The within-vs. between-subject decomposition is mostly used for inference on adjusted (causal) associations of the X k,ij 's on Y ij 's. It is typically not used to produce models to estimate future unknown Y ij from known X ij as such estimation often only happens in settings where just one observation per subject is available, hence X k,ij ≡ x k,i .
We refit the analyses of Table 1 to illustrate that the impact of choice of correlation structure (i.e., GEE-IND vs. GEE-E working correlation structure) is eliminated in our example after making a within-vs. between-subject decomposition. Please note that there were no new HIV infections after study entry; so X H IV,ij ≡ x H IV,i meaning that the within-subject association of change of HIV infection status cannot be modeled. For within-subject associations of BUN and albumin, GEE-IND and GEE-E gave identical point estimates, because centering about x k,i makes comparisons entirely within-subject and invariant to these correlation structure choices (although within-subject estimates could differ slightly if autoregressive (AR (1)) or other formulations for intra-subject correlation of residuals had been used). There were small GEE-IND vs. GEE-E differences on the between-subject slopes as was observed elsewhere [19]. For example, the point estimate for between-subject HIV status is −1.16; 95% CI (−4.21, 1.88) in the GEE-IND of Table 2 versus −1.57; 95% CI (−4.47, 1.33) with GEE-E.
From now on, we only examine GEE-IND results for within-between-subject decomposition models, as GEE-E results are similar. For BUN and GEE-IND, the within-subjectβ BUN,WS = −1.11, 95% CI (−1.34, −0.88) is qualitatively and statistically closer to 0 than is the corresponding between-subject slopeβ BUN,BS = −2.72, 95% CI (−3.10, −2.33). However, serum albumin goes the other way: the within-subject slopeβ ALB,WS = −10.70, 95% CI (−12.99, −8.40) is statistically further from 0 than is the corresponding between-subject GEE-INDβ ALB,BS = −3.27 with a 95% CI (−7.88, 1.33) that overlaps 0. The QIC is lower (10,857.62) for equicorrelation than for independence (10,866.64) which perhaps now indicates an advantage to the former correlation structure in this setting where the slopes have been correctly decomposed.
One might wonder how to interpret differences in the within-and between-subject slopes for causal inference, including the reasons that these slopes were different? This in part will depend on the hypotheses of interest (and we did not have any for this illustrative example). However, general rules also apply, although we are unaware of any systematic exploration of reasons why the between-subject slopes β BS (or β BS for K = 1) could differ from within-subject slopes β WS (or β WS for K = 1). and the resultant implications for causal inference. Before outlining these rules, it is important to note an important relationship among cross-sectional, within-subject and between-subject slopes.  ) [17]. For example, with K = 1, if σ x 2 is the population variance of the within-person mean x i and σ 2 X−x is the population variance of the deviations of differences of the repeat measures X ij from their x i , then In the previous example of weight and cholesterol with β BS = 0.9, β WS = 3 and and σ 2 X−x = 100, then from (3) β CS = 0.9*400/(100+400)+3*100/(100+400) = 1.32. If the between-person sample means are more homogeneous in weight with σ 2 X−x = 200 but the within-person σ 2 X−x is still 100, then again using (3) β CS moves closer to β WS ; β CS = 0.9*200/(100+200)+3*100/(100+200) = 1.60.

2D Working Correlation Structures for Model Residuals Other than Independence Can Lead to Unusable Results for Cross-Sectional Regression.
As noted earlier, fitting both MM and GEE repeated measure regression models involves specification of correlation (or working correlation) structure of ε ij within the same subject i. We denote the working correlation structure by matrix V i . Typical choices for V i are the ones we used in the illustrative examples of Tables 1 and 2; equicorrelation (E), with correlation of ε ij and ε ij' for j = j' always the same value ρ (this common value of ρ is estimated in the model fitting process based on the residuals in the model fitting process), and independence (IND), with correlation of ε ij and ε ij' ≡ 0. However, other structures are used such as AR(1) where correlation of ε ij and ε ij' is ρ |j-j'| with the value of ρ being estimated from the residuals [1]. Again, current guidance [5][6][7][8][9][10][11][12][13] emphasizes choosing the V i that most closely fits the true covariance structure of the residuals within i and/or by model fit criteria such as having lowest QIC for GEE and AIC for MM, because doing so often improves precision of the model parameter estimates. However, we just observed that this approach may be wrong for CS regression, because using any correlation structure other than IND can introduce structural bias intoβ CS [20,21] and, unfortunately, AIC and QIC do not account for this bias. 8 of 21 To that end, Pepe and Anderson (1994) [20], developed a general rule for when IND is (and is not) the only correlation structure that should be used for CS regression that we now present. Specifically, they show that if a predictor X ij varies (i.e., takes on different values) within the same subject i and, then, no matter what true correlation structure of ε ij among repeated measures within a subject is, GEE-IND gives unbiased estimates for β CS , but any MM or GEE model not using V i = IND, gives biased estimates of β CS . Thus, the only working correlation structure that should be used to estimate (4) does not hold, then any working correlation structure obtains unbiased estimates for β CS in which case, choosing the V i that most accurately fits the correlation structure of ε ij minimizes the variance ofβ Our paper only focuses on equicorrelation as the alternate to independence in order to keep the presentation from becoming too cumbersome, given the large number of possible correlation structures. However, the previous paragraph and (4) apply to any non-independence correlation structure.
As one (of many) examples of where (4) holds, let k = 1 and Y ij and X ij be the degree of airway obstruction and inhalation of tobacco smoke of subject i at time j, respectively. One would expect that, because smoking effect on the lung is cumulative, historical smoking in a current smoker or non-smoker would lead to poorer lung function. Thus, E[Y ij |X ij' ] for a smoker at time j' < j would poorer irrespective of X ij .
We now present an easier way to visualize (4). If repeated measures j and j' are thought of as "siblings" and the predictors as "exposures" then (4) means that even after considering the "self-exposure" of the current measure j through X ij the outcome Y has "Conditional Dependence On Sibling Exposures" (Co-DOSE) (i.e., on X k,ij ). Thus, the sibling exposure X k,ij could be thought of as a Co-DOSE beyond the "dose" from the "self-exposure". Hence, from now on we use the term Co-DOSE to denote that (4) occurs.
Also, while this point has not been very well made, for CS regression, Co-DOSE in (4) largely occurs if and only if within-and between-subject slopes differ. If within-and between-subject slopes differ for any predictor (i.e., β BS = β WS ) then Co-DOSE (4) happens. However, if the within-and between-subject slopes are equal for all predictors (i.e., β BS = β WS ) then Co-DOSE (4) does not occur.
More details on this and an illustration are given in Appendix B, but one trivial case arises if the predictors are invariant within the same subject (i.e., ) such that the within-subject slopes are not defined (since X ij − x i ≡ 0) and for the same reason Co-DOSE in (4) cannot occur. While the mathematical details are beyond this paper, if β BS = β WS and V i = IND, then non-zero covariance ρ ij > 0 besides adjusting for within-i collinearity of ε ij also over-weights the β WS relative to β BS in (3), thereby pushing CS regression parameter estimates away from β CS towards β WS [17]. Since robust covariance methods exist to adjust for impact of misspecification of V i = IND from collinearity of the residuals ε ij 's on variance estimates, in particular for GEE [1], V i = IND can eliminate bias in estimating β CS while providing conservative variances for the parameter estimates.

2E Implications for Applied Research and Statistical Practice.
Much of what has been presented above is not commonly understood and implemented in applied research and statistical practice. CS models are typically fit, with β CS interpreted to also be β BS and β WS , without checking if these slopes are equal. Non-independence V i is often used for CS regression without checking if Co-DOSE (in (4)) exists. Perhaps in part this occurs because systematic epidemiological descriptions of causal mechanisms for why between-and within-subject slopes can differ are lacking, which hinders awareness of this possibility. We endeavor to fill this gap in Section 3.

Epidemiological Reasons for Between-and Within-Subject Slopes to Differ
To make it easier for investigators to identify what could cause β k,WS = β k,BS (or equivalently Co-DOSE) in a given setting, we classify major reasons why this can happen. For simplicity, let K = 1 unless otherwise noted, as the following principles extend to multivariate settings.
3A. Change Effects. We propose that the effect of a longitudinal within-subject change in the predictor X could have a greater (or less) direct impact on Y than a long-term standing difference in X between two different subjects (hence β WS = β BS ) and define this as a (c.f. short term) "change effect". Returning to the example of weight and cholesterol, consider two identical twins, "A" has lived his adult life at x i = 190 lbs. and "B" at x i = 180 lbs. If "B" undergoes a short-term weight gain of 10 lbs. to 190 (X i j − x i = 10), assuming x i not impacted by the rapid change, while A remains at 190 lbs. (X ij − x i = 0), the shock or corollaries of this rapid change in B may raise his cholesterol level above that of A's even though they both now weigh 190 lbs., meaning that β WS > β BS and Co-DOSE in (4) occurs. However, it should be noted that as was mentioned in Section 2B, in this setting, β WS would be somewhat undefined if, e.g., a 10 lbs. gain in a shorter time period (i.e., 1 month) increases β WS more than does a 10 lbs. gain over a longer time period (i.e., 12 months).
3B Lag Causality of X on Future Y. The effect of historical levels of X on Y may independently project into the future (i.e., beyond that effect of the current level of X). For example, consider an HIV-infected person and two time points t 1 < t 2 ; let X ij be HIV viral load and Y ij be CD4 count. High HIV levels destroy CD4 blood cells into the future. Therefore, as illustrated in Figure 2a, high HIV viral load at t 1 may affect CD4 loss from t 1 to t 2 so that even if the person's HIV viral load is low at t 2 , the high viral load at t 1 is predictive of lower CD4 at t 2 through higher viral load at t 1 having created more CD4 destruction between t 1 and t 2 (i.e., lag causality of X at t 1 on Y at t 2 ). Thus, Y i2 |X i2 at t 2 is not independent of X i1 at t 1 ; Co-DOSE in (4) occurs and the within-and between-subject slopes differ (β WS = β BS ). In Figure 2a,b, E i2 denotes that X i2 differs from X i1 due to an extraneous process that is causing X i to change over time. Lag causality is often considered when serial measures of X represent long-term environmental exposures (such as air pollution and cigarette smoke) that effect chronic conditions Y (such as lung function) are obtained [1,18,22].  3C Reverse-Lag Causality of X on Future Y. The setting in Section 3B also manifests in the opposite direction if X is being used as to estimate Y that is causal for future X. Reversing the previous example with X now being CD4 used to predict HIV viral load as Y, as Figure 2b illustrates, high viral load (Y i1 ) at t 1 may have degraded the CD4 count from t 1 to t 2 . Thus, Y i1 |X i1 at t 1 is not independent of X i2 at t 2 : Co-DOSE in (4) occurs and within-and between-subject slopes differ (β WS = β BS ).
3D Spillover Causality of X on Adjacent Y. An analogous setting to those of 3B and 3C can also manifest in repeated measure cross-sectional settings based on geographical proximities. Let the subjects i now be cities and j enumerate different neighborhoods in these cities. The repeated measures are average air pollution (X ij ) of neighborhood j in city i and average lung function of all residents living within neighborhood j of city i (Y ij ). A resident living in neighborhood j may work in a different neighborhood j' of the same city and thus have "spillover exposure" to air in the neighborhood they work in, for a given city i, thus Y ij |X ij is not independent of X ij' and hence Co-DOSE in (4) occurs.
3E Common Within-Subject Measurement Bias. Shared within-subject measurement bias occurs if all repeat measures from the same subject have the same correlated measurement bias. This was the setting described in Section 2B and Figure 1a with weight as exposure for cholesterol. Here with weight as a surrogate for body fat, the measurement bias was mediated by height with taller adults being heavier independently of body fat than shorter adults, which leads to β WS > β CS and Co-DOSE in (4) when weight was a predictor of cholesterol. In this setting, height is a measurement bias not a confounder as height itself is not associated with cholesterol. We now present a similar setting where the un-modeled variable is a confounder.
3F Common Within-Subject Confounding. Figure 1b shows common within-subject confounding, that causes β WS = β CS and Co-DOSE in (4). This phenomenon is diagrammatically similar to common measurement bias that was described in Section 2B. However, rather than a common measurement bias, the extraneous factor, shared by the repeated measures of the same subject, is a confounder that is associated with both X and Y. For example, let the confounder variable C i be sex of subject i (which does not change with j) not be in the model and the outcome Y ij be a linear score for male pattern baldness at time j with again X ij being weight at time j. Adult men are both on average heavier and, independently of weight, have greater male pattern baldness than do adult women. So C i is associated with both the exposure and the outcome. Here a 10 lbs. weight difference in two adults, but not a within-adult increase of 10 lbs., could be informative of the heavier adult more likely being male. Hence for this example, β WS = 0 (assuming within adult weight does not influence baldness), but β CS > 0 (and thus β BS > 0) as males are more likely to be both heavier and bald compared to women. Hence also β CS > 0, reflecting unaddressed between-subject confounding from heavier adults more likely being men.
Similarly, Mancl, Leroux and DeRouen proposed that in a study with repeated dental predictor and outcome pairs as (X ij ,Y ij ) measured on teeth (i.e., enumerated by j) on the same persons (i.e., enumerated by i) that better compliance with dental treatment by some persons was a confounder that could lead to differences in slopes within and between subjects [19]. In a non-longitudinal setting where i denotes clusters (for example schools) and j denotes repeated subjects within that cluster (for example students), common within-subject confounding is referred to as "contextual effects" [23,24]. For example, as Robinson (1950) [14] observed, when X was race of the student and Y was achievement-score, a higher x i (here: portion of a school's students that were non-White) indicated weaker financial support for that school (weaker financial support being the confounder) and thus worse achievement-scores overall for that school: β BS was negative. However, within the same school, race had no impact on the achievement score (β WS = 0). Begg and Parides [25] identify a similar setting in birthweight and intelligence quotient in families.
3G Measurement Error in X ij Makes E[Y ij |X ij ] Dependent on X ij' In many settings, the predictor we observe is X = TX + M where TX is the true value of the predictor and M is measurement error that is independent of TX (i.e., classical measurement error). It has been shown that, measurement error in X that is either independent of [26], or correlated with Y [27], biases estimates for the slope that relates TX with Y. Measurement error can arise either from imprecision in an analysis instrument, such as in a machine quantifying components of serum, or in data collection process, such as the chemical composition of blood samples being non-informatively influenced by diurnal and other nuisance processes. If X ij is incorrectly quantified due to such measurement error, then Co-DOSE in (4) occurs and the observed within-and between-subject slopes differ, because, as illustrated in Appendix C, the biases being created from the measurement error distribute differentially to different slopes. As Figure 3 shows and the paragraph below it describes using an illustrative example, if X i1 incompletely measures the true state TX i1 (i.e., true BUN) due to classical measurement error as the extraneous influence then X i2, is informative for TX i1 even after considering X i1 . Please note that in Figure 3 there are two times subscripts on the extraneous influence, because E i1 and E i2 are two independent measure errors.
worse achievement-scores overall for that school: βBS was negative. However, within the same school, race had no impact on the achievement score (βWS = 0). Begg and Parides [25] identify a similar setting in birthweight and intelligence quotient in families.

3G Measurement Error in Xij Makes E[Yij|Xij] Dependent on Xij'
In many settings, the predictor we observe is X = TX + M where TX is the true value of the predictor and M is measurement error that is independent of TX (i.e., classical measurement error). It has been shown that, measurement error in X that is either independent of [26], or correlated with Y [27], biases estimates for the slope that relates TX with Y. Measurement error can arise either from imprecision in an analysis instrument, such as in a machine quantifying components of serum, or in data collection process, such as the chemical composition of blood samples being non-informatively influenced by diurnal and other nuisance processes. If Xij is incorrectly quantified due to such measurement error, then Co-DOSE in (4) occurs and the observed within-and between-subject slopes differ, because, as illustrated in Appendix 3, the biases being created from the measurement error distribute differentially to different slopes. As Figure 3 shows and the paragraph below it describes using an illustrative example, if Xi1 incompletely measures the true state TXi1 (i.e., true BUN) due to classical measurement error as the extraneous influence then Xi2, is informative for TXi1 even after considering Xi1. Please note that in Figure 3 there are two times subscripts on the extraneous influence, because Ei1 and Ei2 are two independent measure errors. For example, going back to the analysis of Table 1, let Xij be BUN and Yij be EGFR. Consider two persons who have BUN of Xi1 = 10 mg/dL measured with error today. Also assume that the true BUN state changes slowly. If so, and after 6 months one of these persons measures Xi2 = 20 mg/dL while the other measures Xi2 = 5 mg/dL, we can then surmise that since BUN changes slowly, it is more likely that the true BUN today (TXi1) of the former person is > 10 mg/dL and that of the latter is < 10 mg/dL. Thus, since (i) EGFR (Yi1) directly depends on TXi1 not Xi1, and (ii) Xi2 is informative on TXi1 after For example, going back to the analysis of Table 1, let X ij be BUN and Y ij be EGFR. Consider two persons who have BUN of X i1 = 10 mg/dL measured with error today. Also assume that the true BUN state changes slowly. If so, and after 6 months one of these persons measures X i2 = 20 mg/dL while the other measures X i2 = 5 mg/dL, we can then surmise that since BUN changes slowly, it is more likely that the true BUN today (TX i1 ) of the former person is > 10 mg/dL and that of the latter is < 10 mg/dL. Thus, since (i) EGFR (Y i1 ) directly depends on TX i1 not X i1 , and (ii) X i2 is informative on TX i1 after considering X i1 , then (iii) Y i1 |X i1 is not independent of X i2 and similarly Y i2 |X i2 not independent of X i1 meaning Co-DOSE in (4) occurs and the observed within-between-subject slopes differ. Appendix C shows that measurement error in the exposure that is independent of the outcome pushes both β WS and β BS towards 0, but more so for β WS . Such tempering from averaged measurement error has been proposed as a reason |β WS | < |β BS | was observed in dental research [19] and occupational epidemiology [28,29].
However, if M ij is correlated with Y ij (most likely being correlated with measurement error on Y ij [27]) the tempering of β's from M ij will not be to 0. For example, consider TX = CD8 and TY = CD4 cells which together are the almost exclusive components of serum lymphocytes (TZ) (i.e., TY ≈ TZ − TX. Physiologically, TZ is constrained to create a negative β BS , β WS and β CS for TY ij on TX ij : subjects with a higher CD8 component of serum lymphocytes by converse must a have lower CD4 components. However, the measured lymphocyte count (Z) is subject to a correlated measurement error that equally spreads onto X and Y. For example, if a person is dehydrated, the entire measured lymphocyte (meaning both CD8 = X and CD4 = Y) portion of blood becomes artificially higher due to reduction of the percentage of water in the blood. If a person has a high (or low) measured lymphocyte count Z ij = TZ ij + M ij due to such measurement error, then M ij contributes to both CD4 (X ij ) and CD8 (Y ij ), making both simultaneously artificially higher (or lower). Consequently, within person, a higher measured CD4 count due to positive M ij is associated with higher measured CD8. Because in this case the measurement error is shared, naïve regression analysis tends to draw β WS towards being positive. On the other hand, β BS , which tempers down M ij on both X and Y through averaging as shown in Appendix C, is less affected by the shared bias due to measurement error.
We have only considered classical measurement error so far. The other common type of measurement error is known as Berkson error [30]. It is approximated by some exposure assessment procedures commonly used in environmental and occupational epidemiology (see semi-ecological design and group-based exposure assessment) [18]. While this is an aside to the main points of this paper, when Berkson measurement error exists, only the between-subject slope, β BS , is estimable. More details are in Appendix D.

Predictors Having Co-DOSE Will Bias Adjusted Parameter Estimates of Other Predictors Not Having Co-DOSE When Included Together in Cross-Sectional Regression When V i = IND Is Used
Going back to Table 1, it was shown earlier that the point estimate from GEE-INDβ H IV,CS for the adjusted cross-sectional association of HIV with EGFR is still consistent for β H IV,CS . However, HIV infection status was constant over all replicates within the same subject, and therefore cannot have Co-DOSE in (4) as the entire effect of HIV is mediated between-subject, not within-subject. Consequently, the question arises whether the adjusted estimate from a non-independence correlation structure (say for exampleβ H IV,CS−E ) can be biased for β H IV,CS . Please note that for this section, we usê β XXX,CS andβ XXX,CS−E to denote estimates for adjusted cross-sectional association for variable XXX from models using independence and equicorrelation structures, respectively. The added designation of "E" (CS-E) in the subscript for equicorrelation, but none for independence correlation, is made because the equicorrelation estimate (but not the independence estimate) can be asymptotically biased. The specific question addressed here is: could including BUN and albumin that each have Co-DOSE in the model bias the corresponding estimate for cross-sectional adjusted HIV association from using equicorrelation (β H IV,CS−E ) so that it no longer is consistent for β HIV,CS in the multivariate model, even though HIV itself is not Co-DOSE? This is important, because in Table 1,β H IV,CS of −2.04 95% CI (−5.07 0.98) qualitatively differs fromβ H IV,CS−E of −3.96 (−6.90, −1.03) with onlyβ H IV,CS−E statistically (p < 0.01) differing from 0.
We believe thatβ H IV,CS−E for HIV in Table 1 is biased away from β HIV,CS . To help make this point, Table 3 presents normative data broken down by HIV status of the subjects. First we note from Table 1 Table 3, those who are HIV+ have higher mean BUN (12.94 vs. 12.10, p < 0.0001 from GEE-IND). Thus, the full apparent "negative effect" of the higher BUN in HIV+ subjects from β BUN,CS is underestimated byβ BUN,CS−E and this pushesβ H IV,CS−E down to compensate. Second, similarly, also from Table 1,β ALB,CS−E is biased lower for β ALB,CS (with GEE-Ê β ALB,CS−E = −9.84 <β ALB,CS = −6.21), while from Table 3, HIV+ individuals have lower mean albumin (3.97 vs. 4.14, p < 0.0001 from GEE-IND). Thus, the apparent "positive effect" of the lower albumin in HIV+ subjects from β ALB,CS is overestimated byβ ALB,CS−E , which pushesβ H IV,CS−E further down to compensate. Now we consider these two biases together as illustrated in Figure 4. These two deficits act jointly to pushβ H IV,CS−E downwards from the true adjusted β HIV,CS . Therefore, non-independence V i can bias multivariate cross-sectional parameter estimates of variables that do not carry Co-DOSE in (4) when other variables in the model carry Co-DOSE. 0.0001 from GEE-IND). Thus, the apparent "positive effect" of the lower albumin in HIV+ subjects from is overestimated by , which pushes further down to compensate. Now we consider these two biases together as illustrated in Figure 4. These two deficits act jointly to push downwards from the true adjusted βHIV,CS. Therefore, non-independence Vi can bias multivariate cross-sectional parameter estimates of variables that do not carry Co-DOSE in (4) when other variables in the model carry Co-DOSE.

Discussion
Numerous published papers fit GEE and MM cross-sectional regression models with repeated measures having time varying predictors that either use non-independence working correlations structures or do not state the correlation structure. These papers, which continue to be published, do not show awareness of the points presented in Sections 1-4, above. Specifically, they: We have identified almost 45 such papers including some authored by us prior to becoming aware of these issues. This is almost certainly only a fraction of the total number of such papers.
While it was not covered in our paper, this includes fitting GEE models of binary outcomes where the issues discussed here also apply [19,31]. However, these points are still not well known or emphasized in statistical software documentation and papers providing guidance on GEE and MM analyses (i.e., [5][6][7][8][9][10][11][12][13]).
One problem that impedes acceptance of within-and between-subject decomposition is that it necessitates much more complicated models that are difficult to explain. Still, some air pollution epidemiologic studies have attempted within-and between-subject decompositions using cities as the subject and neighborhoods as the repeated measures within the city [32][33][34]. Most often in these studies, the magnitude was greater for within-subject slope |β WS | > |β BS | but sometimes |β BS | > |β WS | was observed meaning that possibly multiple causes for slope differences are involved. Those papers that did attempt to explain the reasons for the differences described only "common within-subject confounding" (Section 3E) as a potential reason; such as un-modeled pollutants that were correlated between (but not within) cities with the modeled pollutants of interest. Other studies in environmental research have considered the mechanism described in Section 3B, namely, lag causality in longitudinal analyses of association of air pollution on health measures [1]. Nevertheless, having to explain complicated and unknown mechanisms for biases such as these can appear to detract from the main purpose of the research and cast doubt on the overall findings, making the paper harder to publish. In other words, there appears to be neither incentive, nor guidance on how to engage with these issues for applied researchers.
We concur with others [19,20], that cross-sectional regression with repeated measures should use independence as the default working correlation unless justification is given to use other V i . While non-independence V i can improve precision and thus be desirable [21], they can considerably bias estimates for cross-sectional parameters, β CS , including perhaps towards what the investigator wants to see. For example, in Table 1, p < 0.01 was observed for association with HIV with worse EGFR in GEE-E compared to the more appropriate p = 0.19 from GEE-IND. An investigator who was expecting HIV to be associated with worse EGFR might thus be tempted to use the results from GEE-E for this reason.
While showing this is beyond the scope of our paper, when V i is not independence, factors such as the values of J i and magnitude/structure of ε ij strongly influence parameter estimate values for β CS from the miss-fitted cross-sectional models, allowing the miss-fitted estimate to arbitrarily range from β CS to β WS [17]. Standardization is important and, as such factors will arbitrarily vary between studies, parameter estimates of β CS become harder to compare across studies when V i differs at discretion of investigators. Therefore, the working correlation structure used in cross-sectional regressions using repeated measures should always be justified and reported.
We also concur with others [14][15][16][17][18][19][23][24][25] that despite the difficulties in identifying why withinand between-subject slopes differ, causal inference analyses with repeated measures should initially make such decompositions. Investigators should then be wary if there are qualitative differences between β WS and β BS . For example, Table 2 with 584 subjects and 10,782 measurements demonstrated need for β WS , β BS decomposition to make causal inference (as well as for using GEE-IND in cross-sectional regression). However, a smaller study could have been less clear-cut. If the same point estimates for β WS and β BS seen in Table 2 were observed but did not statistically differ, one would be tempted to merge β WS and β BS into a combined β CS at least for some variables, because standard model-fitting practice promotes parsimony when statistical significance is not observed. This would be particularly true if for a given variable, k, neitherβ k,WS norβ k,BS statistically differed from 0, but β k,CS did. If such collapsing is done, it may still be important to reportβ k,WS andβ k,BS for comparison to future studies and target potential mechanisms for within-between-subject slope differences as described in Section 3.
Unfortunately, the within-and between-subject slope decomposition expands required analyses and presentation. Statistical software mostly does not have standard subroutines to do this. Decomposition can be tedious if x k,i is recalculated to maintain orthogonal decomposition of X k,ij as new models are fit if observations are excluded from the J i due to missing values of newly included variables. The fact that the x k,i are ill-defined by averaging the X k,ij rather than being true means for subject i creates confusion about interpretation ofβ BS that can also be influenced by within-subject slopes as was noted in Section 2B.
Whenβ BS andβ WS differ, the causal mechanisms as to why this happens should be explored. For example, in our analysis presented in Table 2 with EGFR as the outcome, for BUN the between-subject slopeβ BUN,BS = −2.72 (from GEE-IND) was statistically further from 0 in the expected direction of association than was the within-subject slopeβ BUN,WS = −1.11. However, the albumin went the other way: between-subject slopeβ ALB,BS = −3.27 was statistically closer to 0 than was within-subject slopeβ ALB,WS = −10.70 with again both slopes being in the expected direction from zero. So what are the potential reasons for this? While lag/reverse-lag causality (Section 3B,C) between BUN and creatinine (the main component of calculated EGFR) could reduce magnitude of β BUN,WS vs. β BUN,BS , this was unlikely given the separation of visits was 6 months and internal biochemistry operates over shorter time periods. However, independent measurement error on BUN (Section 3G) would temper |β BUN,WS | towards 0 relative to |β BUN,BS |. To that end, several articles find greater coefficient of variation [35,36], within-person change [35,36], assay error [36], and sample degradation for BUN vs. albumin measures [37], all of which could reflect BUN having larger independent measurement error than does albumin that would selectively attenuateβ BUN,WS towards 0 (i.e., more than it did tô β ALB,WS ). Conversely, serum creatinine and albumin are both constrained into the intravascular fluid compartment and will non-informatively increase together with greater hydration and decrease with less hydration of this compartment, inducing positively correlated measurement error, as in the case for measured CD4 and CD8 cells in the last paragraph of Section 3G. As creatinine factors inversely into the EGFR calculation, this would constitute negative correlation of measurement error between albumin and EGFR and selectively biasβ ALB,WS to be more negative thanβ ALB,BS . However, BUN, which crosses across all body compartments, is less subject to such correlation in measurement error with creatinine and thus with EGFR. As is illustrated in the previous paragraph, we believe that the systematic epidemiological description of reasons for within-and between subject slopes to differ in Section 3 will provide some basis for future studies to explore this. That may lead to greater recognition and understanding of this phenomenon. However, our list of reasons for these slopes to differ may not be exhaustive. Furthermore, these mechanisms are quite complicated including that limited resources may be available to investigate them in given studies given the other tasks that need to be done and limited funding/personnel.
When between-and within-subject slopes differ, β BS = β WS , it is unclear which is the "least confounded or biased", including the possibility that by "averaging" the different biases in each would make β CS be the least biased. There may be a heuristic perception that by "matching within the same subject", β WS is superior to β BS and β CS , but this is not necessarily true as measurement error in X (Section 3G) and lag/reverse-lag and spillover causality (Section 3B-D) can in fact bias β WS to a larger degree than they do for β BS and β CS .

Conclusions
It has been known for decades by some that when exposures vary within subjects in repeated measures regression then, (i) cross-sectional regression using V i = independence working correlation should be the default for building a model to estimate a future unknown Y as the goal, and (ii) withinand between-subject decompositions of slopes should at least initially be fit when building models for causal inference. Yet this advice rarely makes it into published guidelines and hence is not heeded, perhaps in part due to complexity of the settings where within-and between-subject slopes differ and limited substantive study of the mechanisms that cause such differences. In general, analysts should explore and quantify reasons for biases that can occur in such study designs. To that end, analyses using repeated measures regression should investigate if within-and between-subject slopes differ and when they do, try to identify the reasons for this. Acknowledgments: Data in this manuscript were collected by the Women's Interagency HIV Study (WIHS) Collaborative Study Group at New York City/Bronx Consortium, which was funded by the National Institute of Allergy and Infectious Diseases UO1-AI-35004. We are indebted to the participants of this study, many of whom have now devoted over 15 years of their life to this effect.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results. While mixed models are not appropriate for the cross-sectional regression of this example (and often are not appropriate for cross-sectional regression using repeated measures in general), they are often used for this purpose in practice. We have made the case that the biases described in this paper applying to GEE for CS regression also apply to mixed models CS regression. Thus, Table A1 below presents the parameters for CS regression of the Bronx WIHS example in Table 1 as estimated by mixed models using independence and equicorrelation working correlation structures. To that end, the reader can confirm that the parameter estimates from the mixed models in Table A1 are almost identical to those from the GEE model with the same correlation structure in Table 1. This includes that the estimates obtained using independence correlation here are also qualitatively different than those from using equicorrelation. We caution the reader; however, that the confidence intervals and p-values reported in these tables are meaningless irrespective of the biases reported on in this paper, since, unlike GEE, mixed models are not robust to misspecification of the correlation structure. Miss-specified correlation structure for this example is clearly the case for independence although such a claim is more debatable for equicorrelation.  1 The confidence interval and p-values for independence working correlation structure in particular but also arguably for equicorrelation as well overestimate the precision of the parameter estimates. Unlike GEE, mixed models are not robust to misspecification of the correlation structure. Mixed models may be more appropriate for within-between-subject decomposition models than they are for CS regression provided the correct correlation structure of the residuals is used. Table A2 below thus presents the parameters for within-between-subject decomposition regression of the Bronx WIHS example in Table 2 as estimated by mixed models using independence and equicorrelation. As was the case with Table A1 above compared to Table 1, the reader can again confirm here that the parameter estimates from the mixed models in Table A2 are almost identical to those from the GEE model with the same correlation structure in Table 2. This also includes that the mixed model parameter estimates under independence and equicorrelation are at worst qualitatively similar and often close to identical for the two correlation structures. We caution the reader; however, that, as with Table A1, the confidence intervals and p-values in Table A2 should be interpreted cautiously, because mixed models are not robust to misspecification of correlation structure of the residuals. While independence correlation is clearly not correct (as the within subject residuals for this example had a large positive correlation) it can be argued that equicorrelation might be correct. However, looking into that is beyond the scope of this paper. Figure A1 illustrates using the example of Section 2B (with K = 1) that Co-DOSE in (4) occurs if β WS = β BS . Remember that in this example, β BS = 0.9, β WS = 3, β CS = 1.60. Now let J = 2. So for the between / within-subject decomposition model; However, since X i2 = 220 and x i = 210, as we saw earlier within-between-subject decomposition gives;

Appendix B. Homology between Co-DOSE in (4) Occurring with between-and within-Subject Slopes Being the Same or Differing
is not independent of X i2 since X i2 is informative of where x i falls and the slope for (X ij − x i ) is different than the slope for x i when   Figure A1. Illustration that E{Y i1 } X i1 } is not independent of X i2 in Cross-sectional regression for K = 1 when B BS = β WS in a between/within decomposition model with J i = 2, X i1 = 200 and X i2 = 220.
As J i ≡ 2, in the prior example, the second observation was deterministic for x i . However, for J i > 2, while additional X ij' go into computation of x i these are still informative on relative contributions of β BS x i and β WS (X ij − x i ) on E[Y ij |X ij ].
Whether or not Co-DOSE in (4)  To illustrate this for the classical measurement error setting with K = 1, let there always be the same number of replicates, J, per subject and assume that the true data-generating mechanism, i.e., in the absence of measurement error, involves β = β WS = β BS = β CS . For example, let E[Y ij ] = βTX ij = β(TX ij -µ i ) + βµ i , where µ i is true mean exposure of the i th subject, for simplicity the intercept is 0. However, we only observe X ij = TX ij + M ij , where M ij is measurement error with E[M ij ] = 0, variance σ 2 M that is independent across all i's and j's and also independent from Y ij . It also would often be assumed that M ij~N (0, σ 2 M ), but we do not invoke this assumption here. When we use X ij instead of TX ij in regression, the observed estimates of β WS , β BS , and β CS will not be equal to their true values (i.e., as obtained with TX ij ), but instead will equal different values β * WS , β * BS , and β * CS from X ij being watered down by the independent measurement error as shown below. In this special case that β = β W S = β B S = β CS , we show that we expect β * WS = β * BS = β * CS . We also reproduce a known result that under classical measurement error observed β * 's are attenuated towards 0 with respect to the true β's. Furthermore, let TX ij vary with j within i as follows; TX ij = TC i + TR ij where TC i is a central tendency of TX for subject i, while TR ij is within-subject i repeated visit variation in TX ij across the j's. Let σ 2 C and σ 2 R be variances of TC i and TR ij , respectively. Now, using the identity that the slope of the regression line for Y = α + βX is the covariance of X and Y divided by the variance of X (i.e., Cov(X,Y)/Var(X)), we derive: Var(X ij ) = (σ 2 C + σ 2 R + σ 2 M ), Cov(X ij ,Y ij ) = β CS (σ 2 C + σ 2 R ), so that Var(x i ) = (σ 2 C + σ 2 R /J +σ 2 M /J), Cov(x i ,Y ij ) = (β BS σ 2 C + β WS σ 2 R /J), so that β * BS = (β BS σ 2 C + β WS σ 2 R /J)/(σ 2 C + σ 2 R /J + σ 2 M /J); Var (X ij − x i ) = ((J − 1)/J)(σ 2 R + σ 2 M ), Cov(X ij − x i , Y ij − y i ) = β WS σ 2 R ((J − 1)/J), so that β * W S = β WS σ 2 R /(σ 2 R + σ 2 M ).
Thus, for example if β WS = β BS = β CS = 5 we therefore have from the above formulas Continuing with this numeric example, let σ 2 C = 8, σ 2 R = 2 and σ 2 M = 10 and J = 5. The entire variance of X ij is 20 of which half, 10, is from measurement error, 8 is variation of the central tendency of X between-subjects and 2 is variation of X within-subject. Then β * CS = 5(10)/20 = 2.5, β * BS = 5(8.4)/10.4 = 4.03 and β * WS = 5(2)/12 = 0.83. Considering that, without measurement error, true between and within person slopes are both 5, measurement error has greatly attenuated β * WS = 0.83 towards 0, while β * BS = 4.03 is the least tempered. This happens because β * BS most fully retains the common signal in X, but tempers M through averaging, while β * WS more fully retains M while excluding the between-subject signal in X.

Appendix D. Only β BS Is Estimable When Under Berkson-type Measurement Error
Since we have brought up classical measurement error we should also discuss the other common type of measurement error known as Berkson error. For Berkson error the measurement error is independent of the observed value (i.e., X), but is not independent from the true value (TX). Such a situation is approximated when a common value is reported for all J i replicates in the same subject i. For example consider a study of radiation contamination of the milk supply with i = community and j = child within the community. Now the true value of daily milk consumption for each child, TX ij , is unknown, but the average daily consumption of milk across all children in that community, µ i , is known (or estimated with high degree of certainty as x i , with a caveat noted below) and is thus substituted for X ij . With Berkson-type error, the common within subject mean, µ (or x i ), rather than the different X ij is observed across all j replicates. Thus, when Berkson-type error exists, the fitted model estimates β BS by default and both β WS and β CS (which require knowledge of TX ij ) are not identifiable. However, in practice of observational rather than laboratory studies Berkson-type error may coexist with classical measure error in different ratios. This was described by Berkson (1950) [30] as "modified controlled experimentation". If so, then the estimate for β BS is likely attenuated (i.e., as a β * BS similar to what has been described in Appendix C for classical measurement error) due to the classical measurement error component of x i that is computed from X ij . More formal exploration of this hybrid quasi-Berkson-type error is given in context of occupational epidemiology in Kim et al. [18].