Elsevier

Social Science Research

Volume 40, Issue 5, September 2011, Pages 1456-1464

Modeling repeated measures of dichotomous data: Testing whether the within-person trajectory of change varies across levels of between-person factors

https://doi.org/10.1016/j.ssresearch.2011.05.006

Abstract

In this paper, we consider the following question for the analysis of data obtained in longitudinal panel designs: How should repeated-measures data be modeled and interpreted when the outcome or dependent variable is dichotomous and the objective is to determine whether the within-person rate of change over time varies across levels of one or more between-person factors? Standard approaches address this issue by means of generalized estimating equations or generalized linear mixed models with logistic links. Using an empirical example and simulated data, we show (1) that cross-level product terms from these models can produce misleading results with respect to whether the within-person rate of change varies across levels of a dichotomous between-person factor; and (2) that subgroup differences in the rate of change should be assessed on an additive scale (using group differences in the effects of predictors on the probability of disease) rather than on a multiplicative scale (using group differences in the effects of predictors on the odds of disease). Because usual approaches do not provide a significance test for whether the rate of additive change varies across levels of a between-person factor, sample differences in the rate of additive change may be due to sampling error. We illustrate how standard software can be used to estimate and test whether additive changes vary across levels of a between-person factor.

Highlights

► With repeated-measures models, logistic interaction terms can be misleading.
► Differences in the rate of change should be assessed on an additive scale.
► This can be done using the Stata margins command.

Introduction

Research designs in which an initial wave of respondents or subjects is observed repeatedly over time are increasingly used in the social sciences. The primary goal of these studies is to depict change over time and to identify factors that influence the direction and rate of change. These factors can include time-constant (e.g., gender) or time-varying (e.g., marital status) variables. As Molenberghs and Verbeke (2006, p. 7) have observed, mixed (hierarchical, multilevel) models have become the main tool for the analysis of these kinds of data. Examples of studies across various disciplines using these methods include the influence of language exposure on early vocabulary growth (Huttenlocher et al., 1991), the impact of early teacher effectiveness on trajectories of student achievement (Palardy and Rumberger, 2008), how age, sex, race, class, and place of residence affect patterns of delinquency, substance abuse, and health problems among young persons (Elliott et al., 1989), the relationship between education and change in blood pressure over the life course (Loucks et al., 2011), and the effects of age and race on trajectories of self-esteem among older persons (Shaw et al., 2010).

Bodies of codified knowledge and methodological guidelines for the statistical analysis of repeated measures in such panel designs have developed correspondingly in various disciplines and fields of research (e.g., Molenberghs and Verbeke, 2006, Hsiao, 2003, Singer and Willett, 2003, Walls and Schafer, 2006, Wooldridge, 2002). In this paper, we consider how these repeated-measures data should be modeled and interpreted when the dependent variable is dichotomous and the objective is to determine whether the within-person rate of change over time varies across levels of a between-person factor (or group). With a linear model and a continuous dependent variable, a cross-level product term can be used to estimate and test differences in the rate of change between groups (Singer and Willett, 2003).¹ Some have suggested that a cross-level product term from a logistic model can be used in a similar manner to examine group differences in the trajectory of change when the dependent variable is dichotomous (e.g., Molenberghs and Verbeke, 2006, pp. 282–287; Rabe-Hesketh and Skrondal, 2005, pp. 115–118).
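
To make the contrast concrete, the two specifications can be sketched as follows. The notation is ours, not necessarily the authors': T_ti is time (wave) for person i, G_i is a dichotomous between-person factor, and u_0i and u_1i are person-level random effects. Random effects are omitted from the logistic line for brevity (a GLMM would include them; a GEE would not).

```latex
\begin{aligned}
y_{ti} &= \beta_0 + \beta_1 T_{ti} + \beta_2 G_i + \beta_3 (T_{ti} \times G_i) + u_{0i} + u_{1i} T_{ti} + \varepsilon_{ti} \\
\frac{\partial\, \mathrm{E}(y_{ti} \mid T_{ti}, G_i)}{\partial T_{ti}} &= \beta_1 + \beta_3 G_i \\[6pt]
\operatorname{logit} \Pr(y_{ti} = 1 \mid T_{ti}, G_i) &= \gamma_0 + \gamma_1 T_{ti} + \gamma_2 G_i + \gamma_3 (T_{ti} \times G_i) \\
\frac{\mathrm{OR}_{T \mid G = 1}}{\mathrm{OR}_{T \mid G = 0}} &= \frac{e^{\gamma_1 + \gamma_3}}{e^{\gamma_1}} = e^{\gamma_3}
\end{aligned}
```

In the linear model, β3 is the group difference in slopes on the outcome's own scale. In the logistic model, γ3 is a difference in log-odds slopes (e^γ3 is a ratio of odds ratios for a one-unit change in T), and the corresponding change in the probability also depends on γ0 and γ2, that is, on each group's baseline prevalence.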

Using both an empirical example and simulated data, we show that using a cross-level product term from a logistic model to evaluate group differences in the rate of change can produce highly misleading results, especially when groups differ substantially in baseline prevalence. We argue that subgroup differences in the rate of change over time should be assessed on an additive scale (using group differences in the effects of predictors on the probability of an outcome) rather than on a multiplicative scale (using group differences in the effects of predictors on the odds of an outcome). Because standard approaches do not provide an overall estimate or significance test for whether additive change varies across subgroups in the population, we illustrate how marginal effects on the probability can be estimated from a logistic model and then used to test whether additive changes in the probability vary with baseline status.
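
A minimal numerical sketch of the additive-versus-multiplicative point, using hypothetical probabilities (not the EPESE estimates): the group with the higher baseline prevalence gains more on the probability scale, yet its odds ratio for change is smaller. For these four cells, a saturated logistic model would have a product-term coefficient equal to the log of the ratio of odds ratios, so that coefficient is negative. The function name change_contrasts is ours.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def change_contrasts(p_base: dict, p_follow: dict) -> tuple:
    """Return (additive contrast, ratio of odds ratios) for a 2 x 2 group-by-time design.

    p_base and p_follow map a group label (0 = reference, 1 = comparison)
    to the probability of the outcome at baseline and at follow-up.
    """
    # Additive (probability-scale) change within each group
    diff = {g: p_follow[g] - p_base[g] for g in (0, 1)}
    # Multiplicative (odds-scale) change within each group
    odds_ratio = {g: odds(p_follow[g]) / odds(p_base[g]) for g in (0, 1)}
    additive_contrast = diff[1] - diff[0]          # difference in differences of probabilities
    ratio_of_ors = odds_ratio[1] / odds_ratio[0]   # exp(product term) in a saturated logit
    return additive_contrast, ratio_of_ors


# Hypothetical probabilities: group 1 starts with a much higher prevalence
p_base = {0: 0.10, 1: 0.60}
p_follow = {0: 0.20, 1: 0.75}
add, ror = change_contrasts(p_base, p_follow)
print(f"additive contrast    = {add:+.3f}")
print(f"ratio of odds ratios = {ror:.3f} (log = {math.log(ror):+.3f})")
```

Here the additive contrast is +0.05 (an increase of 0.15 versus 0.10), while the ratio of odds ratios is 2.0/2.25 ≈ 0.89. Read on the odds scale, group 1 appears to change more slowly; read on the probability scale, it changes faster.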

Section snippets

Baseline cognitive impairment and change in IADL disability

To illustrate the nature of the problem, we focus on the analysis of a particular outcome variable in a specific longitudinal panel study. We study changes in a dichotomous measure of disability among elderly respondents followed over three yearly waves after an initial baseline survey. Data are from the Duke University site of the multisite National Institute on Aging (NIA)-funded Established Populations for Epidemiologic Studies of the Elderly (EPESE) program. This was a 6-year annual

Summary of problems identified

The preceding analyses have shown that cross-level product terms in logistic models can be strongly influenced by group differences in baseline prevalence rates. Bollen (1989, chapter 4) and others warn against using standardized regression coefficients (b·Sx/Sy) to compare the effects of regression predictors across subgroups because group differences in standardized coefficients reflect group differences in Sx and Sy as well as differences in the metric regression effects b. With the use of multiplicative product terms, group differences on
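
The arithmetic behind Bollen's warning is easy to see with hypothetical summary statistics: two groups with the same metric slope b can have different standardized coefficients solely because Sx and Sy differ. The function name and the numbers below are illustrative assumptions, not values from the paper.

```python
def standardized_coefficient(b: float, s_x: float, s_y: float) -> float:
    """Standardized regression coefficient b* = b * S_x / S_y."""
    return b * s_x / s_y


# Hypothetical summary statistics: the metric effect b is identical in both groups
b = 0.5
groups = {
    "A": {"s_x": 1.0, "s_y": 1.0},
    "B": {"s_x": 2.0, "s_y": 1.5},
}
std = {name: standardized_coefficient(b, **s) for name, s in groups.items()}
for name, value in std.items():
    print(f"group {name}: metric b = {b}, standardized b* = {value:.3f}")
```

The standardized coefficients differ (0.500 versus 0.667) although the metric effect does not. The analogous concern here is that a logistic product term can differ across groups because of group differences in baseline prevalence rather than because of a difference in the change of interest.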

Testing whether additive change in the probability of disability varies across subgroups

The predicted probabilities from the GEE and GLMM models in Table 4 indicated that increases in the probability of disability are greater among the cognitively impaired in our sample data. We require a significance test to determine whether we can reject the null hypothesis of no group differences in the trajectory of additive change for the population. A variety of authors have developed methods for testing whether the effect of one risk factor on a dichotomous outcome varies across levels of
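
The highlights note that the authors carry out this test with Stata's margins command. As a language-neutral illustration of the same general idea (a contrast of predicted probabilities with a delta-method standard error, which is how margins computes its standard errors), the following is a minimal Python sketch. It assumes a logistic model with an intercept, a binary time indicator, a binary group indicator, and their product; the function names, coefficient values, and covariance matrix are hypothetical, and in practice the estimates and their covariance matrix would come from the fitted GEE or GLMM.

```python
import math

import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def design_row(t: float, g: float) -> np.ndarray:
    """Covariate row (intercept, time, group, time x group) for the hypothetical model."""
    return np.array([1.0, t, g, t * g])


def additive_change_contrast(beta, vcov, t0=0.0, t1=1.0):
    """Delta-method test of whether the change in Pr(y = 1) from t0 to t1 differs by group.

    contrast = [p(t1, G=1) - p(t0, G=1)] - [p(t1, G=0) - p(t0, G=0)]
    Returns (contrast, standard error, z, two-sided p-value).
    """
    beta = np.asarray(beta, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    # (time, group, sign) of each predicted probability in the difference in differences
    cells = [(t1, 1.0, +1.0), (t0, 1.0, -1.0), (t1, 0.0, -1.0), (t0, 0.0, +1.0)]
    contrast = 0.0
    grad = np.zeros_like(beta)
    for t, g, sign in cells:
        x = design_row(t, g)
        p = expit(x @ beta)
        contrast += sign * p
        # For the logistic link, dp/dbeta = p (1 - p) x
        grad += sign * p * (1.0 - p) * x
    se = math.sqrt(grad @ vcov @ grad)               # delta-method standard error
    z = contrast / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
    return contrast, se, z, p_value


# Hypothetical estimates (intercept, time, group, time x group); the covariance matrix
# is constructed to be symmetric and positive definite, purely for illustration.
beta_hat = [-2.20, 0.81, 2.60, -0.12]
vcov_hat = [
    [0.006, -0.006, -0.006, 0.006],
    [-0.006, 0.012, 0.006, -0.012],
    [-0.006, 0.006, 0.010, -0.010],
    [0.006, -0.012, -0.010, 0.020],
]
contrast, se, z, p = additive_change_contrast(beta_hat, vcov_hat)
print(f"difference in change of Pr(y=1): {contrast:+.3f} (SE {se:.3f}, z = {z:.2f}, p = {p:.3f})")
```

With these hypothetical inputs the product-term coefficient is negative (-0.12), yet the additive contrast is positive (about +0.05). Two caveats: predictions from a GLMM computed this way condition on random effects of zero rather than averaging over them, and with more waves or covariates the contrast would be formed in the same way from the relevant time points and covariate settings (or averaged over the sample).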

Discussion and conclusions

Using EPESE data on baseline cognitive impairment and changes in IADL disability over time, we showed that when a repeatedly measured outcome is dichotomous, using a product term from a logistic model to assess whether the rate of within-person change varies across subgroups can give substantively implausible and misleading results (i.e., that baseline impairment is protective for subsequent disability). This is the case because a logistic product term measures changes in the odds of an event

Acknowledgments

This work was supported by the National Institutes of Health, National Institute on Aging, Claude D. Pepper Older Americans Independence Center; Grant No. P30 AG028716.

References (47)

  • Rongling Li et al.

    Test for additive interaction in proportional hazards models

    Annals of Epidemiology

    (2007)
  • Anders Ahlbom et al.

    Interaction: a word with two meanings creates confusion

    European Journal of Epidemiology

    (2005)
  • Paul D. Allison

    Logistic Regression Using SAS: Theory and Application

    (1999)
  • Paul D. Allison

    Comparing logit and probit coefficients across groups

    Sociological Methods and Research

    (1999)
  • Paul D. Allison

    Fixed Effects Regression Methods for Longitudinal Data Using SAS

    (2005)
  • Susan F. Assmann et al.

    Confidence intervals for measures of interaction

    Epidemiology

    (1996)
  • Dan Blazer et al.

    The association of age and depression among the elderly: an epidemiologic exploration

    Journal of Gerontology: Medical Science

    (1991)
  • Kenneth A. Bollen

    Structural Equations with Latent Variables. Wiley Series in Probability and Mathematical Statistics

    (1989)
  • Maarten L. Buis

    Stata tip 87: interpretation of interactions in non-linear models

    The Stata Journal

    (2010)
  • Thomas Cornelißen et al.

    Partial effects in probit and logit models with a triple dummy-variable interaction term

    The Stata Journal

    (2009)
  • Hiroko H. Dodge et al.

    Cognitive impairment as a strong predictor of incident disability in specific ADL-IADL tasks among community-dwelling elders: the Azuchi study

    Gerontologist

    (2005)
  • Delbert S. Elliott et al.

    Multiple Problem Youth: Delinquency, Substance Use, and Mental Health Problems

    (1989)
  • Phil Ender

    Margins and the Tao of Interaction. Paper presented at the Boston Stata Conference, July,...

    (2010)
  • Gerda G. Fillenbaum

    Screening the elderly

    Journal of the American Geriatrics Society

    (1985)
  • Eric A. Hanushek et al.

    Statistical Methods for Social Scientists

    (1977)
  • Glenn Hoetker

    Confounded coefficients: extending recent advances in the accurate comparison of logit and probit...

    (2004)
  • David W. Hosmer et al.

    Confidence interval estimation of interaction

    Epidemiology

    (1992)
  • Cheng Hsiao

    Analysis of Panel Data

    (2003)
  • Alan E. Hubbard et al.

    To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health

    Epidemiology

    (2010)
  • Janellen Huttenlocher et al.

    Early vocabulary growth: relations to language input and gender

    Developmental Psychology

    (1991)
  • Jay S. Kaufman

    Interaction reaction

    Epidemiology

    (2009)
  • Mirjam J. Knol et al.

    Estimating interaction on an additive scale between continuous determinants in a logistic regression model

    International Journal of Epidemiology

    (2007)
  • Kung Yee Liang et al.

    Longitudinal data analysis using generalized linear models

    Biometrika

    (1986)