Control of Weekly Time Trend in Time-Stratified Case-Crossover Design

Case-crossover designs have become widespread in biomedical investigations of transient associations. However, the most popular reference-selection strategy the time-stratified scheme may not be an optimum solution to control systematic bias in case-crossover studies. To prove this, we conducted a time series decomposition for daily ozone records and examined the capability of the time-stratified scheme to control for yearly, monthly, and weekly time trends; and observed its failure on the control for the weekly time trend. To solve this issue, we proposed a new logistic regression approach in which we suggest the adjustment for the weekly time trend. We compared the performance of the proposed with that of the traditional method by simulation. We further conducted an empirical study to explore the performance of the new logistic regression approach in examining potential associations between ambient air pollutants and acute myocardial infarction hospitalizations. The time-stratified scheme provides effective control for yearly and monthly time trends but not of the weekly time trend. Uncontrolled weekly time trends could be the dominant source of systematic bias in time-stratified case-crossover studies. In contrast, the proposed logistic regression approach can effectively minimize systematic bias in a case-crossover study.

Compare to general case-control and cohort study designs, these self-matching designs have an obvious advantage of controlling for all time-invariant confounders, either measured or unmeasured. The challenge of a case-only design comes from how to control for time-varying factors. Like a cohort or a case-control study, confounding in a self-matching design arises when there is an unbalanced matching in determinants between hazard and reference periods, leading to various sources of systematic bias. These sources were previously reviewed by Mittleman and Mostofsky (Mittleman & Mostofsky, 2014).
To control for time-varying factors, various reference-selection strategies have been proposed in the literature. Janes, et al. (Janes, Sheppard & Lumley, 2005a) provided a comprehensive review of these strategies and assessed potential bias associated with them. These strategies aim to limit the reference-selection window to a short period that restricts timevarying confounders to be nearly constant across reference-selection windows (Lu, Symons, Geyh & Zeger, 2008). Ambidirectional (Bateson & Schwartz, 1999) and time-stratified (Lumley & Levy, 2000) are among the top two referenceselection strategies of case-crossover studies in the literature and the latter is claimed to be unbiased (Janes, Sheppard & Lumley, 2005b) and optimal (Mittleman, 2005), and intended to avoid potential bias resulted from time trends in the exposure series as well as specific time-varying confounders. study. One is the bias in a point estimate of effect from imperfect control of time-varying factors (the confounding issue). The other is estimation bias of the standard error (SE), the so-called overlap bias (Janes Sheppard & Lumley, 2005b;Austin, Flanders & Rothman, 1989;Robins & Pike, 1990) from correlations among observations. One can reduce bias in point estimate by using a shorter reference-select window, which results in higher overlap bias because a shorter reference-selection window leads to higher correlation among cases and their matched references. In their paper, the authors concluded that the time-stratified case-crossover design is by no means a final solution to the overlap bias issue. As a remedy, a calibration strategy based on permutation (Fisher, 1935;Pitman, 1937) was proposed to overcome the overlap bias issue. Although the authors proved that their strategy can effectively reduce estimation bias. However, its computational burden is high, especially when the sample size is large. In addition, the authors did not explain where the bias roots from.
In this paper, we sought a better solution to potential systematic bias in a case-crossover study. We started with scrutinizing the capability of the time-stratified scheme to control for time-varying factors and found its failure on the control for the weekly time trend. We then suggested an additional adjustment for the weekly time trend in a logistic regression model. Via avoiding a calibration algorithm, the proposed simple modeling strategy can dramatically reduce the computational burden from permutation. We hypothesize that it can be competitive to the calibration method, and it can effectively minimize both overlap bias and bias in point estimate.

Overlap Bias
The term overlap bias was first introduced to case-crossover studies by Lumley and Levy (Lumley & Levy, 2000). They observed that this bias is similar to the friend control bias in matched case-control designs (Austin, Flanders & Rothman, 1989;Robins & Pike, 1990). For example, in the time-stratified design for the simulation (subsection 2.5), a hazard day could be viewed as a random sample from a study period, and 3 to 4 reference days are days in the same year, same month, and matching weekday. The selection of reference days is conditional on hazard days. This reference-selection scheme makes the hazard days (synonymous of cases) and reference days (synonymous of friend or sibling controls) to be selected from different strata within the study period, causing a correlation between exposure levels on hazard and reference periods. Austin, et al. (Austin, Flanders & Rothman, 1989) showed that choosing friends or siblings as matched controls for each case can lead to the so-called overlap bias.
Overlap bias originates from correlation among matched cases and controls and leads to estimation bias of SE, and therefore a biased significance test of the exposure effect. We further investigated where comes from the correlation among observations by using an air pollution time-series dataset.

Limitation of the Time-Stratified Scheme
We used the same publicly available air pollution dataset used by Wang, et al. (Wang, Wang & Kindzierski, 2019). Briefly, daily records of five criteria air pollutants from urban monitoring stations in Edmonton, Alberta, Canada spanning 10 years (from April 1, 2000 to March 31, 2010) were obtained from National Ambient Pollution Surveillance (NAPS) Database (National Air Pollution Surveillance Network, 2013). The five pollutants were carbon monoxide (CO), nitrogen monoxide (NO), nitrogen dioxide (NO 2 ), ozone (O 3 ), and particulate matter with an aerodynamic diameter ≤ 2.5 (PM 2.5 ). Because different pollutants have different concentration units, each of the five pollutant measures was standardized by the interquartile range (IQR, difference between the 75th and the 25th concentration percentile).
To examine the limitations of the time-stratified reference-select scheme, we conducted a time series decomposition for daily records of O 3 . From the daily time series of O 3 , we calculated its yearly average (O y 3 ), monthly average after removing the yearly time trend (O m 3 ), weekly average after removing the yearly, and monthly time trends (O w 3 ) and daily average after removing the yearly, monthly and weekly time trends (O d 3 ). The above process is an usual time series decomposition, such that We have depicted the time series and its components in Figure 1. The potential transient effect of an air pollutant on a health outcome is a within-day or within-a few-days impact of the air pollutant. It should not include any effect from a weekly time trend. We believe that a lack of control of the weekly time trend is the dominant source of an estimation bias and the overlap bias. Correlations among observations are unignored if the weekly trend time-series is highly correlated.

Control of the Weekly Time Trend
Suppose that Y is a response variable where Y = 0/1 indicates exposure to a referent/hazard period in a case-crossover study. The following traditional logistic regression model (model 1) is the typical analytic tool being used in literature to assess a transient association between adverse events and exposure to a hazard (an air pollutant in our case): here logit(p) = log(p/(1 − p) with p = Pr(Y = 1) and β is the transient effect of O 3 under estimation. Again, we use O 3 as example of the air pollutant. For simplicity we consider a model without covariates.
We proposed the following analytic methods. Given a suitable model adjustment for the weekly trend, we sought a better point estimate and a better significance test. The proposed logistic regression model (model 2) is one adjusted for the weekly time trend of O 3 , i.e., here γ 1 , γ 2 , and γ 3 represent effects of yearly, monthly, and weekly time trends, respectively. Adjustment for yearly and monthly time trends is because some cases will match with three references, and the others will match with four references. It leads to a possible unbalanced matching of yearly and monthly time trends in the hazard and reference periods. Given each of the cases match with the same number of controls in a time-stratified study design, the yearly and monthly terms in model 2 can be removed.
Another logistic regression model (model 3) is designed to directly estimate impact of the daily average (O d 3 ) and adjust for the weakly time trend: A transient impact from O 3 is considered to be the effect of O d 3 . It is worth pointing out that models 2 and 3 are equivalent to each other.
Without control of the weekly time trend, correlation among the weekly-average records will enter into the model errors, leading to a relatively higher correlation among the model errors. On the other hand, with control of yearly, monthly, and weekly time trends, correlation among the daily-average records could be quite low for days in the same month but with one week or multiple weeks apart. From this analysis, we conclude that the both the overlap bias and the bias in point estimate root in the same source, i.e., the uncontrolled weekly time trend.

Calibration Using Permutation
From a statistical point of view, the proposed models 2 and 3 are only for "univariate analysis". However, bias could also come from other potential time-varying confounders. An ideal strategy is to observe those time-varying confounders and adjust for them. In this way, model 2 can be rewritten as: Here µZ = µ 1 Z y + µ 2 Z m + µ 3 Z w represents suitable adjustment for time trend from all observed time-varying covariates.
A big challenge comes from unobserved confounders. It is impossible and worthless for investigators to account for all potential time-varying confounders. A possible (but imperfect with computational burden) solution to this challenge is the calibration method suggested by Wang, et al. (Wang, Wang & Kindzierski, 2019). Please referred to the article for details of the calibration algorithm. The basic assumption of the calibration method is that estimated effect (β) can be divided into design bias (b), true effect (β), and random error (e), i.e.
It implies that regardless of whether the true effect (β) is zero or not, the design bias b will be a constant 1 .
Under this assumption, theoretically, we can correctly estimate the design bias b by permutation and completely eliminate it from point estimateβ. An unbiased p-value can be obtained by comparing the estimated effect from original data and null effects from permuted samples. Although this assumption may be questionable in practice, we still believe this calibration technique can be beneficial to us for bias reduction in both point estimate and significance test. We will show its performance by simulation.

Simulation Study Design
We use O 3 as an example to describe our simulation study design. The same design applied to all five air pollutants. Based on the 3,652-day study period (April 1, 2000 to March 31, 2010), we simulated hazard days of adverse events to create different scenarios in order to show the performance of the following modeling strategies.
For each scenario, we replicated 5,000 experiments. For each of these experiments: (a) 5,000 hazard days were sampled randomly with replacement from the study period with a probability proportional to exp(βO d reference days for each hazard day were selected using the time-stratified scheme, and (c) data were analyzed using the above four modeling strategies. All analyses were conducted in R (version 4.1.0, 2021-05-18).
We used the null model (β = 0) to compare the size (under the criterion α 0 = 0.05) of the tests for each modeling strategy. All simulation scenarios under the null hypothesis used settings of γ 1 = γ 2 = γ 3 = γ = 0 or 0.2 2 . The purpose of the simulation analysis was to examine whether type-I errors of the case-crossover model scenarios were close to the nominal level of 50 in 1,000 replications.
We then used the alternative model to compare the power (under the criterion α 0 = 0.05) of the tests for each modeling strategy. Scenarios for this check were simulated by setting γ 1 = γ 2 = γ 3 = γ = 0 and β from 0, 0.1, · · · , 0.3. The purpose of this simulation analysis was to examine rejection rates of significance tests in 5,000 experiments under each scenario.

Empirical Study Design
Using Alberta Health administrative databases, we obtained historical records of hospital admissions for acute myocardial infarction (AMI, ICD-10 code I21-I22 or ICD-9 code 410) for Edmonton urban dwellers for the 10 years (April 1, 2000 to March 31, 2010). In total 9,811 admissions for AMI occurred for the 10 years among patients aged 20 or over and living in the metropolitan areas of Edmonton.
We explored potential transient associations between AMI hospitalizations and the five air pollutants (the same data used in the simulation study). Cohort and ten sub-cohorts populations were defined using the following admission characteristics: • the whole cohort (Whole), • males (Males), • females (Females), • subjects aged ≥ 65 (Age ≥ 65), • subjects aged < 65 (Age < 65), • subjects categorized as ST-Segment Elevation Myocardial Infarction upon hospitalization (STEMI), • subjects categorized as Non-ST Segment Elevation Myocardial Infarction upon hospitalization (NSTEMI), • subjects with pre-existing diabetes (Diabetes), • subjects with pre-existing hypertension (Hypertension), • subjects with a pre-existing irregular heartbeat (Dysrhythmia), and • subjects with the prehistory of heart disease (PIHD).
Records of the whole cohort and each sub-cohort were further divided into temporal subgroups defined by seasons (All season, Spring, Summer, Autumn, and Winter). For each subgroup, (1) hazard days were selected as the kth-day pre-case events (k = 0 − 4), and (2) reference days for each hazard day were selected using the time-stratified scheme. These data were analyzed using the traditional logistic regression model 1 and the proposed logistic regression model 2.

Simulated Study Results
Bias check and size check under the null model (with the setting of β = 0 and γ = 0 or 0.2) were conducted for each of the five pollutants. Point estimates from each of the four modeling strategies are depicted in Figure 2 and Figure 3, and rejection rates (size, under the criterion α 0 = 0.05) of significance tests in 5,000 experiments are reported in Table 1.   Table 1, we can see that in the scenarios with γ = 0, the size of model 1 (before calibration) is far less from the nominal level 5% in 5,000 replications (i.e., the 0.05 criterion), especially for CO, NO, and O3. Considering the point estimates in these situations are almost unbiased (Figure 2-panel A), we can conclude that the bias in size is mainly from correlation among model errors (i.e., overlap bias). It implies that, without control of the weekly time trend, model 1 provides overestimates of standard errors, leading to higher overlap bias. On the other hand, with suitable adjustment of the weekly time trend (model 2), the overlap bias is almost eliminated and significance tests are nearly unbiased. It appears that the calibration method provides reasonable and robust significance tests for both models.
The bottom panel of Table 1 shows that under the setting of γ = 0.2 there is a big problem in using model 1 (before calibration). The large size of significance tests in model 1 (before calibration) is from the uncontrolled weekly time trend, leading to biased point estimates (refer to Figure 3-panel A) as well as a higher correlation among errors of the model. In this scenario, we observe again the superpower of the calibration strategy in bias correction (refer to Figure 3panel B). The proposed logistic regression model 2 is nearly unbiased in significance tests both before and after calibration. The superior performance of model 2 roots from thorough control of the weekly time trend. In this case, the calibration method can do very little in further reducing bias for model 2.  Figure 2, we can see a difference in the variance of estimation bias: the range of top panels is roughly within ±0.06, while that of bottom panels is roughly within ±0.15. This can be explained by the difference between models 1 and 2. In model 1, the time-stratified scheme is used to control for yearly and monthly time trends, so the estimated effect is from the weekly time trend (i.e. weekly average afterl removing yearly and monthly time trends). In contrast, with control of yearly, monthly, and weekly time trends, the estimated effect from model 2 is that of the daily time trend (daily time series after removing yearly, monthly, and weekly time trends).  Table 2.

Comparing the top panels (A and B) and the bottom panels (C and D) in
We observed the following results from the power check panel of Table 2: (a) the power from model 1 before calibration was the poorest because of the overlap bias; (b) model 2 before calibration provided competitive power results with model 2 with calibration, and (c) model 1 with calibration was better than model 1 before calibration in test power but poorer than model 2 before calibration and model 2 after calibration.
From the bias check panel of Table 2, we see that: (a) model 1 (before or after calibration) gets biased estimates under alternatives; (b) model 2 (before or after calibration) gets nearly unbiased estimation; and (c) in all the experiments, the calibration provides smaller or similar estimation bias, regardless whether the assumption (5) holds or not.
In the estimates from model 1, only 793 (582) candidate associations were estimated as positive (negative) effects. Among them, there were 140 (39) candidate associations with p-values ≤ 0.05 (0.01). On the other hand, 634 (741) candidate associations were estimated as positive (negative) effects in the estimates from model 2. Among them there were 75 (10) candidate associations with p-value ≤ 0.05 (0.01). Using Bonferroni correction with a criterion α 0 = 0.05/1375, none of the candidate associations from model 1 or model 2 had a p-value less than the Bonferroni correction criterion (0.000036). The correlation between estimates from model 1 and estimates from model 2 was 0.605.
The 39 associations detected by model 1 with p-value ≤ 0.01 are reported in Table 3. From the table we can see an anomalous seasonal pattern: CO, NO, and NO 2 have positive effects in spring and negative effects in autumn; O 3 has a negative effect in winter and spring, and a positive effect in summer. It does not make sense that a hazard (an air pollutant) has a protective effect in one season while in another season has a harmful effect. This phenomenon suggests that the estimate of the traditional logistic model 1 contains some kind of seasonal trend. We believe this is because that model 1 is unable to control for the weekly time trend of an exposure time series.     On the other hand, with adjustment for the weekly time trend, model 2 tells a different story. The 10 associations detected by model 2 with p-value ≤ 0.01 are reported in Table 4. From the table, we do not observe any anomalous seasonal pattern as shown in Table 3. All pollutant-AMI associations were consistent, irrespective of season: nine estimated effects for CO, NO, O 3 or PM 2.5 were positive, and 1 estimated effect of NO 2 was negative.

Discussion
Key points of the paper are summarized as follows: • Using the ozone time series decomposition, we concluded that the time-stratified reference-select scheme lacks capability on control of weekly time trend; • New strategies were proposed to solve the bias challenge; -Modified logistic regression to control for weekly time trend (recommended); -Calibration approach for control of bias from unobserved covariates; • Comparison was done by simulation and empirical studies between the traditional logistic regression and the proposed approach; -The simulation study showed that control of weekly time trend can eliminate potential overlap bias as well as estimation bias; -The empirical study showed the new approach provides more reasonable association findings compare to the traditional approach.
Using decomposition of an ozone time series dataset, we scrutinized the capability of the time-stratified reference selection scheme to control for yearly, monthly, and weekly time trends when used in case-crossover studies. This scheme provides effective control of yearly and monthly time trends but not of a weekly time trend. We believe that lack of control of the weekly time trend is the dominant source of systematic bias in the existing case-crossover studies using the time-stratified reference selection scheme.
An inability to control for the weekly time trend will lead to overlap bias and possible bias in a point estimate. Modeling adjustment for the weekly time trend can be an effective way to reduce this systematic bias. Further model adjustments or calibration may be necessary if other potential time-varying confounders (observed or unobserved) have to be taken into account.
Through simulation, we show that the typical case-crossover study design (i.e., a time-stratified reference selection scheme combined with a traditional logistic regression) leads to an estimate of the weekly time trend effect, rather than the transient effect under investigation. The difference in results could be big between the typical approach and the proposed method. Based on it, we recommend that researchers be cautious and make necessary adjustments for the weekly time trend in both exposures and covariates in their case-crossover application studies.