Modeling repeated measures of dichotomous data: Testing whether the within-person trajectory of change varies across levels of between-person factors
Highlights
► With repeated measures models, logistic interactive terms can be misleading.
► Differences in the rate of change should be assessed on an additive scale.
► This can be done using the Stata margins command.
Introduction
Research designs in which an initial wave of respondents or subjects is observed repeatedly over time are increasingly used in the social sciences. The primary goal of these studies is to depict change over time and to identify factors that influence the direction and rate of change. These factors can include time-constant (e.g., gender) or time-changing (e.g., marital status) variables. As Molenberghs and Verbeke (2006, p. 7) have observed, mixed (hierarchical, multilevel) models have become the main tool for the analysis of these kinds of data. Studies across various disciplines have used these methods to examine the influence of language exposure on early vocabulary growth (Huttenlocher et al., 1991), the impact of early teacher effectiveness on trajectories of student achievement (Palardy and Rumberger, 2008), how age, sex, race, class, and place of residence affect patterns of delinquency, substance abuse, and health problems among young persons (Elliott et al., 1989), the relationship between education and change in blood pressure over the life course (Loucks et al., 2011), and the effects of age and race on trajectories of self-esteem among older persons (Shaw et al., 2010).
Bodies of codified knowledge and methodological guidelines for the statistical analysis of repeated measures in such panel designs have developed correspondingly in various disciplines and fields of research (e.g., Molenberghs and Verbeke, 2006; Hsiao, 2003; Singer and Willett, 2003; Walls and Schafer, 2006; Wooldridge, 2002). In this paper, we consider how these repeated-measures data should be modeled and interpreted when the dependent variable is dichotomous and the objective is to determine whether the within-person rate of change over time varies across levels of a between-person factor (or group). With a linear model and a continuous dependent variable, a cross-level product term can be used to estimate and test differences in the rate of change between groups (Singer and Willett, 2003). Some have suggested that a cross-level product term from a logistic model can be used in a similar manner to examine group differences in the trajectory of change when the dependent variable is dichotomous (e.g., Molenberghs and Verbeke, 2006, pp. 282–287; Rabe-Hesketh and Skrondal, 2005, pp. 115–118).
Using both an empirical example and simulated data, we show that using a cross-level product term from a logistic model to evaluate group differences in the rate of change can produce highly misleading results, especially when substantial group differences in baseline prevalence are present. We argue that subgroup differences in the rate of change over time should be assessed on an additive scale (using group differences in the effects of predictors on the probability of an outcome) rather than on a multiplicative scale (using group differences in the effects of predictors on the odds of an outcome). Because standard approaches do not provide an overall estimate or significance test for whether the additive change varies across subgroups in the population, we illustrate how marginal effects on the probability can be estimated based on a logistic model, and then used to estimate and test whether additive changes in the probability vary with baseline status.
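The contrast between the multiplicative (odds) scale and the additive (probability) scale can be made concrete with a small numerical sketch. The Python snippet below is an illustration only: the baseline probabilities, the log-odds slopes, and the function names are hypothetical and are not estimates from the data analyzed in this paper. It shows that a group with a smaller logistic time slope (a negative product term) can still have the larger increase in probability when its baseline prevalence is higher.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))


def change_over_waves(baseline_prob, log_odds_slope, waves):
    """Return (odds_ratio, additive_change) implied by a logistic time slope.

    odds_ratio is the multiplicative change in the odds over `waves` waves;
    additive_change is the change in the probability over the same period.
    """
    end_prob = inv_logit(logit(baseline_prob) + log_odds_slope * waves)
    odds_ratio = math.exp(log_odds_slope * waves)
    return odds_ratio, end_prob - baseline_prob


# Hypothetical values chosen for illustration only (not estimates from any study).
WAVES = 3
LOW_BASELINE, LOW_SLOPE = 0.05, 0.40    # low-prevalence group
HIGH_BASELINE, HIGH_SLOPE = 0.40, 0.30  # high-prevalence group

low_or, low_change = change_over_waves(LOW_BASELINE, LOW_SLOPE, WAVES)
high_or, high_change = change_over_waves(HIGH_BASELINE, HIGH_SLOPE, WAVES)

# A logistic product term corresponds to the group difference in log-odds slopes.
product_term = HIGH_SLOPE - LOW_SLOPE

print(f"product term (log-odds per wave): {product_term:+.2f}")
print(f"odds ratio over {WAVES} waves: low {low_or:.2f}, high {high_or:.2f}")
print(f"change in probability: low {low_change:.3f}, high {high_change:.3f}")
```

Under these assumed values the product term is negative, which read on the odds scale would suggest slower change in the high-prevalence group, while the change in probability is larger in that group. This is the kind of discrepancy that motivates assessing group differences on the additive scale.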
Baseline cognitive impairment and change in IADL disability
To illustrate the nature of the problem, we focus on the analysis of a particular outcome variable in a specific longitudinal panel study. We study changes in a dichotomous measure of disability among elderly respondents followed over three yearly waves after an initial baseline survey. Data are from the Duke University site of the multisite National Institute on Aging (NIA)-funded Established Populations for Epidemiologic Studies of the Elderly (EPESE) program. This was a 6-year annual
Summary of problems identified
The preceding analyses have shown that cross-level product terms in logistic models can be strongly influenced by group differences in baseline prevalence rates. Bollen (1989, chapter 4) and others warn against using standardized regression coefficients (b * Sx/Sy) to compare the effects of regression predictors across subgroups because group differences in metric regression effects are affected by group differences in Sx and Sy. With the use of multiplicative product terms, group differences on
Testing whether additive change in the probability of disability varies across subgroups
The predicted probabilities from the GEE and GLMM models in Table 4 indicated that increases in the probability of disability are greater among the cognitively impaired in our sample data. We require a significance test to determine whether we can reject the null hypothesis of no group differences in the trajectory of additive change for the population. A variety of authors have developed methods for testing whether the effect of one risk factor on a dichotomous outcome varies across levels of
Discussion and conclusions
Using EPESE data on baseline cognitive impairment and changes in IADL disability over time, we showed that when a repeatedly measured outcome is dichotomous, using a product term from a logistic model to assess whether the rate of within-person change varies across subgroups can give substantively implausible and misleading results (i.e., that baseline impairment is protective against subsequent disability). This is the case because a logistic product term measures changes in the odds of an event
Acknowledgments
This work was supported by the National Institutes of Health, National Institute on Aging, Claude D. Pepper Older Americans Independence Center (Grant No. P30 AG028716).
References
et al. Test for additive interaction in proportional hazards models. Annals of Epidemiology (2007)
et al. Interaction: a word with two meanings creates confusion. European Journal of Epidemiology (2005)
Logistic Regression Using SAS: Theory and Application (1999)
Comparing logit and probit coefficients across groups. Sociological Methods and Research (1999)
Fixed Effects Regression Methods for Longitudinal Data Using SAS (2005)
et al. Confidence intervals for measures of interaction. Epidemiology (1996)
et al. The association of age and depression among the elderly: an epidemiologic exploration. Journal of Gerontology: Medical Science (1991)
Structural Equations with Latent Variables. Wiley Series in Probability and Mathematical Statistics (1989)
Stata tip 87: interpretation of interactions in non-linear models. The Stata Journal (2010)
et al. Partial effects in probit and logit models with a triple dummy-variable interaction term. The Stata Journal (2009)
Cognitive impairment as a strong predictor of incident disability in specific ADL-IADL tasks among community-dwelling elders: the Azuchi study. Gerontologist
Multiple Problem Youth: Delinquency, Substance Use, and Mental Health Problems
Screening the elderly. Journal of the American Geriatric Society
Statistical Methods for Social Scientists
Confidence interval estimation of interaction. Epidemiology
Analysis of Panel Data
To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology
Early vocabulary growth: relations to language input and gender. Developmental Psychology
Interaction reaction. Epidemiology
Estimating interaction on an additive scale between continuous determinants in a logistic regression model. International Journal of Epidemiology
Longitudinal data analysis using generalized linear models. Biometrika