Segmented Linear Regression Models for Assessing Change in Retrospective Studies in Healthcare

Introduction In retrospective studies, the effect of a given intervention is usually evaluated by using statistical tests to compare data from before and after the intervention. A problem with this approach is that the presence of underlying trends can lead to incorrect conclusions. This study aimed to develop a rigorous mathematical method to analyse temporal variation and overcome these limitations. Methods We evaluated hip fracture outcomes (time to surgery, length of stay, and mortality) from a total of 2777 patients between April 2011 and September 2016, before and after the introduction of a dedicated hip fracture unit (HFU). We developed a novel modelling method that fits progressively more complex linear sections to the time series using least squares regression. The method was used to model the periods before implementation, after implementation, and of the whole study period, comparing goodness of fit using F-tests. Results The proposed method offered reliable descriptions of the temporal evolution of the time series and augmented conclusions that were reached by mere group comparisons. Reductions in time to surgery, length of stay, and mortality rates that group comparisons would have credited to the hip fracture unit appeared to be due to unrelated underlying trends. Conclusion Temporal analysis using segmented linear regression models can reveal secular trends and is a valuable tool to evaluate interventions in retrospective studies.


Introduction
e National Health Service has a strong culture of quality improvement [1]. A good evidence base is needed to drive this process, and these data are often derived from retrospective studies and audits [2].
An important part of the audit cycle is comparison of data before and after a given intervention [3]. Typically, mean values or ranks of outcome measures before and after the intervention are compared, and differences are tested statistically for significance.
is method assumes stationarity of data which is often not the case. Nonstationarity of data is difficult to distinguish from causal change, and group comparisons are unable to distinguish underlying (secular) trends from intervention-induced change. is can lead to erroneous conclusions [4][5][6] and is illustrated by a simulated example shown in Figure 1. is shows a hypothetical intervention that made no contribution to a change. Group comparison would attribute this to the intervention (p � 0.0002). While this is obvious in this example, more subtle trends obscured by highly variable data are more challenging to identify. Researchers attempt to limit the influence of trends by selecting data closer to the point of the intervention, but this does not annul the effect of a secular trend when one is present [6]. It also sacrifices much of the data, although it should be noted that data spanning long time intervals are more susceptible to confounding parameters that are external to the purpose of the study.
Alternative approaches that accommodate trends use interrupted linear or higher-order regression, but these suffer from bias because the interruption is chosen at the point of the intervention [7,8]. Randomised controlled trials avoid many of these issues but are often expensive and time consuming. Consequently, any method that can improve the reliability of retrospective studies has wide potential application to the wealth of data that such studies make available at relatively low cost. e aim of this study was to describe a novel method of modelling the temporal variation of a time series of data to help identify secular trends in retrospective studies. Such a method should reliably reveal underlying trends spanning the entire study period, without bias to the intervention. Importantly, there should be a mathematically robust technique to determine whether a trend in the data is significant or due to random variation. We developed and applied a novel method to analyse the temporal variation in hip fracture outcomes in a retrospective study that aimed to evaluate the implementation of a dedicated hip fracture unit (HFU).

Hip Fracture
Unit. In July 2015, a level 1 Major Trauma Centre (MTC) in the United Kingdom established a dedicated HFU to free up beds within the MTC. e HFU was located in a nearby district general hospital. Services were reconfigured to include ambulance triage, daily consultantled dedicated theatre lists for proximal femoral fractures, and coordination of the necessary multidisciplinary team of staff at the district hospital. ese changes were all instituted simultaneously on July 1 st , 2015.

Study Group.
We studied 2777 patients who sustained proximal femoral fragility fractures. Of these, 2117 patients (2176 fractures) sustained their fractures prior to commencement of the HFU (study period April 2011-June 2015), and 660 patients (671 fractures) were treated after the introduction of the HFU (study period July 2015-September 2016).

Data.
From a retrospective review of patient notes, the following data were obtained: (1) Time to surgical intervention (hours from time of admission) (2) Median length of hospital stay in days (LOS) (3) Mortality rate at 30 days, 120 days, and 365 days Patient demographic data were also collected. Both hospitals were part of the same trust, and as such the same sources of data collection were used: "e-Oasis" and "Symphony" databases.

Statistical Analysis.
Data before and after the intervention (introduction of the HFU) were compared. e data were analysed using both conventional statistical tests and temporal analysis using a novel model-fitting method. MATLAB (Natick, MA, USA) was used to apply segmented least squares regression to determine parameters of the nested models that are fitted to the time series [9,10]. e models were compared for goodness of fit using F-tests to determine the best-fitting model. e large number of data points (n > 2800) helped overcome limitations arising from the non-normality of data to validate the use of F-tests to compare nested models [11].
For conventional statistical tests, continuous variables (time to surgery and length of stay) were compared using Mann-Whitney tests while categorical data (30-day, 120day, and 365-day mortality rates) were compared using Fisher's exact tests. Demographic data were compared using Mann-Whitney tests (age) and chi-squared tests (gender, pathological fracture, ASA grade, and re-operation). ese tests were applied using the MedCalc software suite (Ostend, Belgium) [12]. Statistical significance was set at p < 0.05.

Segmented Linear Regression Model Fitting.
In this study, we used an adaptation of a segmented linear regression technique previously published by one of the authors (EMV) in the context of learning curve modelling [9,13].
We used least squares regression to fit a set of four progressively more complex models to data of the time series. Our models ranged from a simple plateau to two adjoining straight lines (linear splines) with a single knot at the point in time that minimised the sum of the square of residuals. is range of models accommodates a respectable degree of complexity that can capture changing trends in the study period but also maintains simplicity and meaningfulness in the descriptions of trends. e four models, in increasing order of complexity, are as follows: (i) Plateau model. is is the simplest model. It is a horizontal line (plateau) at the average value of the outcome measure over the study period.   Computational and Mathematical Methods in Medicine where y � i y i /n. (ii) Line model. e next model is a straight line determined by least squares regression of the variable (y) on the time (t): where m � i t i y i / i t 2 i is the gradient and c � ( i y i /n) − (m i t i /n) is the y-axis intercept. (iii) Line-plateau (or plateau-line) model. is model is made up of two sections: a straight line adjoining a horizontal plateau at the kth value of t(t � t k ) or vice versa. e parameters of each section (gradient and intercept) are determined by least squares regression while the adjoining point (t � t k ) is determined as that which minimises the sum of squares of residuals (Appendix A). e line-plateau model is given by y i � m k t i + c k for i � 1 to k and y j � Y k for j � k + 1 to n where m k and c k are the gradient and y-axis intercept of the straight line, respectively, and Y k the value of the plateau.
e plateau-line model is given by y i � c k for i � 1 to k and y j � m k (t j -t k ) + c k for j � k + 1 to n where m k is the gradient of the straight line and c k the value of the plateau. (iv) Line-line model.
is model is made up of two sections: a straight line adjoining a second straight line at the kth value of t (t � t k ). e two straight lines are determined by least squares regression of the variable (y) on the time (t) using a closed form. e adjoining point (t � t k ) is determined as that which minimises the sum of squares of residuals (Appendix B). e resulting lines have equations y i � m1 k t i + c k (i � 1 to k) and y j � m2 k (t j -t k ) + y k (j � k + 1 to n) where m1 k and c k are the gradient and y-axis intercept of the first line and m2 k is the gradient of the last straight line.
After all four models have been fitted, we select the best model as the simplest one unless a more complex one fits the data significantly better. Significance is confirmed by F-tests that compare the sum of the squares of residuals between models. A simple tabular method listing the p values of Ftests conducted pairwise between all four models helps select the best model. is method is illustrated in Table 1. is technique (F-test as applied to nested models) accounts for and prevents, the risk of overfitting whereby higher-order models (e.g., line-line) will always demonstrate improved fit, but at the expense of additional parameters [14].

Application of the Method to the Study.
Our proposed method for selecting the best model was applied over three distinct periods of time: (i) before the intervention (pre-HFU), (ii) after the intervention (post-HFU), and (iii) the entire time series (including both pre-HFU and post-HFU periods). e model describing the entire time series was subsequently compared to the models for the separate pre-HFU and post-HFU periods using F-tests to determine whether separate models offered a significantly better fit.

Time to Surgery.
Conventional tests: median time to surgical intervention decreased after introduction of the HFU. is change was not significant (21.51 hours pre-HFU to 20.75 hours post-HFU, Mann-Whitney test: p � 0.150).
Temporal analysis: the best-fitting model for the entire data set was the line-plateau (Figure 2), illustrating that time to surgery decreased during the initial period of the study, well before the introduction of the HFU. e line-line model was the best model for the period pre-HFU and showed a decreasing trend followed by an increasing trend. e best model post-HFU was the plateau, taking a value at about the average plateau of the line-plateau model fitted on the entire study period. Separate models for the periods before and after the intervention did not offer a better fit compared to using a single model for the entire time series (p � 0.100). As a result, a single model over the entire period offered the best temporal description, suggesting that a decrease in the time to surgery occurred early in the study period, well before the intervention.

Length of Stay.
Conventional tests: median length of stay (LOS) decreased after introduction of the HFU, though not significantly (15 days pre-HFU to 14 days post-HFU, Mann-Whitney test: p � 0.410).
Temporal analysis: the best-fitting model for the entire time series was the line-line model ( Figure 3). It demonstrated a slowly decreasing rate in LOS followed by a sharply decreasing rate in LOS that started about half a year after the introduction of the HFU. e onset of this sharp change in the rate explained the difference in median LOS pre-and post-HFU. Separate pre-HFU analysis and post-HFU analysis demonstrated the single line to be the best model in both. Separate models offered a better fit than the overall model (p � 0.017) confirming that the intervention (HFU) contributed to change by causing an acceleration in the decrease of LOS.
best model is the one that has a row of uninterrupted significant p values (p < 0.05) stretching furthest to the right. In the case of a tie, the lower-order model is preferred. If no significant p values feature in the table, the plateau is chosen as the best model.

Computational and Mathematical Methods in Medicine 3
Temporal Analysis: e single line model was the best model when analysing the entire time period, suggesting that 30-day mortality rate followed a declining trend throughout the study period ( Figure 4). Separate pre-HFU and post-HFU analysis demonstrated that the plateau fit best in both. Separate models did not offer a better fit compared to the overall model (p � 1.000). We concluded that 30-day mortality most probably followed a declining trend over the entire study period which appeared to be responsible for the significant reduction found using group comparison. is means that the significant reduction in mortality rates from pre-intervention to post-intervention cannot be simply credited to the intervention.
Temporal analysis: the single line was the best model for the entire time period ( Figure 5). Separate pre-HFU and post-HFU analysis demonstrated the line model fit best for both. Separate models did not offer a better fit compared to the model for the entire period (p � 0.187). e 120-day mortality rate appeared to follow a declining trend that spanned the entire study period, and the HFU did not appear to have caused a reduction in this measure.

365-Day Mortality.
For this outcome measure, post-HFU data are limited to a smaller sample (n � 316 instead of n � 677) due to fewer patients having been followed-up.
Conventional Tests: ere was a small and non-significant decrease in 365-day mortality (21.46% pre-HFU to 20.57% post-HFU, Fisher's Exact test: p � 0.769) Temporal Analysis: e single line model fits the entire time series best ( Figure 6). Separate pre-HFU and post-HFU analysis demonstrated the best fit models were the line and plateau, respectively. Separate models did not offer a better fit when compared to the overall model (p � 0.380). is suggests that a change in one-year mortality was not due to the intervention. Instead, temporal analysis suggested that it followed a decreasing trend that spanned the entire study period. Other causal factors may be implicated in the background changes in 30-day, 120-day, and 365-day mortality rates (Discussion).

Patient
Demographics. Patient demographics were analysed before and after implementation of the HFU ( Table 2). ere were two statistically significant differences: post-HFU, the rate of pathological fracture was less and the distribution of ASA grade was different.
ese are likely confounding variables which may have also influenced secular trends, further emphasising the importance of temporal analysis in retrospective studies.

Discussion
In this study, temporal analysis provided important insight into the observed changes before and after the intervention. Importantly, it demonstrated that some of the changes in the outcome measures were likely due to trends that occurred over a longer time period, independent of the intervention. In modelling the temporal effect of patient mortality, we used simple linear regression instead of logistic regression. We did this because the latter is better suited when dependence on a number of covariates is sought. As we were evaluating the time dependence of mortality rate, simple linear regression offered the advantage of yielding results in a directly interpretable form.
We retrospectively investigated factors that could explain the observed trends as this would help us in evaluating the reliability of the modelling method. We found several factors which are listed below: (1) Tariff.
e Best Practice Tariff (BPT) was implemented nationwide in April 2010. One criterion was time to surgery within 36 hours (decreased from 48 hours previously) [15].
is could explain the falling trend in time to surgery observed in the early phase of the study in 2011 and 2012.
(2) Orthogeriatrician. e first orthogeriatrician was appointed to the trust in 2006. ree more have since been appointed and are supported by junior staff.  Computational and Mathematical Methods in Medicine 5 hemiarthroplasty was preferred in our trust after 2010 and was shown to decrease length of stay in a separate study [16]. Temporal analysis demonstrated that the HFU significantly accelerated the rate of decrease in length of stay, yet this was missed in group comparisons using rank tests. e change in use of hip implants at our trust between 2009 and 2011 has been shown to decrease length of stay [16], possibly explaining the underlying improvement in this variable before the introduction of the HFU. Improvements in anaesthetic use may have also contributed to this. e underlying decreasing trend in 30-day mortality represented by the single line model over the entire study period and the fact that separate models did not offer a significantly better fit meant that the decrease in 30-day mortality was probably due to a secular trend. Using group comparison tests alone would have misattributed this to the HFU. Temporal analysis of 120-day and 365-day mortality rates similarly demonstrated that an underlying decrease throughout the entire study period was the reason for any observed reduction when using group comparison tests. Interestingly, the post-HFU model for 120-day mortality showed a rapid decline possibly because of a small initial rise subsequently regressing to its normal trend. e secular trends demonstrating improvement in mortality rate may have been a result of some or all of the aforementioned improvements, although other changes not identified here may have contributed as well.

Evaluation of the Proposed Method.
e chosen set of models balance simplicity with the complexity that is required if a single straight line cannot capture underlying trends. e decision to enforce joined segments in models (iii) (lineplateau) and (iv) (line-line) accounted for the expectation that any change within the fitted period is expected to be gradual.
Discontinuity in the variable, such as when an intervention results in a sudden change, can be accommodated by fitting separate models in the phases before and after the intervention. By using linear segments (as opposed to higher-order curves), our models yield meaningful parameters such as plateau values and rates of change. Furthermore, the models are nested which allows statistical comparison with a rigorous tabular method for selection of the best model. e best model offers the most reliable description of the temporal variation of the data.
Our method examined whether the intervention was significant by considering the periods before and after an intervention as separate but also as part of the whole study period. If separate models fit the data significantly better, we concluded that the intervention was significant in bringing change.

Comparison with Other Methods.
e simplest form of temporal analysis is a visual inspection of the data of the time series, though this is of little practical value when change is small and obscured by data featuring high variability. Regression analysis is more reliable in revealing the presence of trends and has been used extensively in time series analysis [18].
An adaptation to linear regression is the interrupted time series (ITS) analysis where separate regressions are attempted for the periods before and after the intervention, allowing for discontinuities that could be due to the intervention [19]. is is a well-established method to test the hypothesis that an intervention causes a significant change in the outcome measure over time. However, fitting separate models before and after the intervention biases the method toward finding change at the intervention, and this may not offer the most effective description of how the time series evolved and what secular trends existed (or how these changed) throughout time. Most applications of the ITS method do not compare separate pre-and post-intervention lines with a single model for the entire time period, nor do they envisage using more than one linear segment for each section.
is can miss more complex trends [7,18]. For example, a change in the outcome measure that occurs well before or after the intervention, can be missed by ITS, which would credit the change to the intervention.
To contrast our proposed method with the ITS, we applied ITS analysis to one of our outcome variables: the time to surgery (Figure 7). e resulting intercept change and slope change of the ITS at the intervention marginally miss significance (change in slope (p � 0.0532) and change in intercept (p � 0.0812)). Nonetheless, as ITS focuses its attention at the intervention, the analysis would have deemed the HFU ineffective while missing the important reduction in time to surgery early in the study. F-tests demonstrated significantly improved goodness of fit when using our proposed method, compared to the ITS (p < 0.0001). is is not surprising given that our method enabled model change at different time points, yielding a lower sum of the square of residuals. In this scenario (time to surgery), our method concurred with the conclusion when using ITS analysis, but the two methods are fundamentally different, and their conclusions may not agree in all cases.
Indeed, our method did not solely evaluate the effect of an intervention (as the ITS does) but described the overall trends within the study period, as well as providing a method to test the significance of change at the intervention.

Limitations.
e first limitation of the proposed method is that it cannot capture change that is more complex than envisaged by two adjoining segments, (for example, multiple segmental trends or exponential trends) [18,20]. In principle, the method can be extended to incorporate three or more segments, but in embracing higher complexity, one risks yielding unnecessary and meaningless information. Expanding to more complex models should be undertaken with caution.
Second, the method requires dedicated computer programming, as most statistical and spreadsheet software do not envisage the fitting of segmented linear regression with variable adjoining points between the segments. Consequently, a basic level of computer programming is required, which limits its use to more advanced healthcare analysts.
Another important limitation is in using F-tests to compare the goodness of fit of the various models when residuals do not meet the assumption of being normally distributed [11]. While this may have an effect on the reliability of p values that are obtained from the F-distribution, particularly when applying the method to mortality rates, it affects all models equally and should not decrease the effectiveness of the method in selecting the best model. Indeed, setting significance at 5% is imposing an arbitrary threshold which affects model selection, and different significance levels will undoubtedly yield different best models in marginal situations.

Conclusion
We proposed a novel, systematic method to model the temporal variation of a time series based on segmented linear regression. Its application to a real healthcare intervention demonstrated the method's ability to identify and describe trends over the study period, without bias to the intervention. We described a mathematically rigorous technique to determine whether trends are significant. e method offers a reliable tool in evaluating interventions and in detecting change, improving the information that can be drawn from retrospective studies.

A. Derivation of the Least Squares Parameters for the Line-Plateau and Plateau-Line Models
e line-plateau model is a straight line y i � m k t i + c k for i � 1 to k adjoining to a plateau y j � Y k for j � k + 1 to n. m k and c k are the gradient and y-axis intercept of the straight line, respectively, and Y k the value of the plateau. For each k from k � 1 to n, parameters m k and c k are found so as to minimise R, the sum of the squares of residuals: where y i are the ordinates of the points of the data series (t i , y i ) and y i are the predicted values of the model y i � m · t i + c, i � 1 to k and y i � Y, i � k + 1 to n. Least squares fitting requires that zR/zm � 0 and zR/zc � 0 which together give the following matrix equation: Least squares fitting requires that zR/zm � 0 and zR/zc � 0 which yields the following matrix equation: where G � n i�k+1 t 2 i − 2t k · n i�k+1 t k + (n − k)t 2 k , H � n i�k+1 t i − (n − k)t k , L � n i�k+1 t i y i − t k · n i�k+1 y i , and F � n i�1 y i . Equation

B. Derivation of the Least Squares Parameters for the Line-Line Model
In the line-line model, a first straight line with equation y i � m1 k t i + c k (i � 1 to k) adjoins a second straight line with equation y j � m2 k (t j -t k ) + y k (j � k + 1 to n).
For each value of k, parameters m1, c, and m2 are found that minimise R: (B.1) Least squares fitting requires that zR/zm 1 � 0, zR/zm 2 � 0 and zR/zc � 0 which results in the following linear equation: n i�k+1 y i , and E � k i�1 t i y i + t k · n i�k+1 y i . Equation (B.2) can be solved using linear algebra to yield values for m1, m2, and c and the process is repeated for all k to find the set of values [m1 k , c k , k, m2 k ] that yields the s minimum value of R for k � 1 to n.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare no conflicts of interest.