Methods for Analyzing Hospital Length of Stay with Application to Inpatients Dying in Southern Thailand

This study investigated length of stay (LOS) for patients who died in hospital in Southern Thailand from 2000 to 2003 with respect to principal diagnosis and demographic, geographic and hospital size factors. Computerized records of 40,498 mortality cases from 167 hospitals in 14 provinces of Southern Thailand between October 2000 and September 2003 were obtained from the Ministry of Public Health, with information on age, gender, principal diagnosis, province and hospital size. Logistic regression and linear regression on log-transformed LOS were used to analyse the data. Patients with injuries as principal diagnosis had the shortest LOS, whereas cancer patients had the longest LOS. Older patients, particularly females, had higher LOS for all diagnoses. LOS increased with hospital size except in the North and North West. Small hospitals in the South West region had the lowest LOS whereas large hospitals in the North West had the highest. The highest proportion of bed days (11.2%) occurred for males aged less than 60 diagnosed with infectious diseases. Males aged less than 60 diagnosed with injuries, digestive diseases or infectious diseases, and males aged at least 60 diagnosed with COPD, accounted for more than double the bed days of female patients in the same groups. Logistic regression with LOS of at least 1 week as the outcome and linear regression on log-transformed LOS gave consistent results. Providing suitable palliative care or allowing patients to choose where to spend the end of their lives, especially for patients with chronic diseases, can reduce hospital resource utilization.


Introduction
Length of hospital stay (LOS) is a common parameter used to indicate health resource utilization, health care cost and severity of disease (Li, 1999; Wang et al., 2002; Lee et al., 2003). Based on a literature review, Martin and Smith (1996) concluded that patient demographics and hospital characteristics were the two major factors that determine patient LOS. Among demographic characteristics, studies have reported that LOS varied according to age and disease group (Goldfarb et al., 1983; McMullan et al., 2004), whereas among hospital characteristics, LOS has been reported to vary by region, hospital size, and health care service (Health Technology Case Study 24, 1983; Xiao, 1997; Clarke, 2002). A hospital stay can end in cure, transfer or death. Many studies have been concerned only with LOS for patients discharged as cured (see, for example, Cabre et al., 2004). However, hospital stay terminated by death is also an important outcome event. Patient care in hospital is the most expensive way of providing palliative care (Huang et al., 2002). Longer stays are more likely to indicate physicians' decisions or administrative inefficiencies than patients' needs (Brownell & Roos, 1995). Reducing LOS is health policy in many countries (Clarke & Rosen, 2001). Setting up proper palliative care or giving patients the opportunity to decide where they want to stay during the last stages of their life can reduce unnecessary hospital LOS. For inpatients spending their final days in hospital in developed countries such as Canada, England and Belgium, LOS has been studied by Huang et al. (2002), Dixon et al. (2004) and van den Block et al. (2007), but similar studies for developing countries are limited.
The development of LOS models for patients dying in hospital will be useful for hospital management, particularly for prioritizing health care policies and improving health services, including the most appropriate allocation of health resources according to differences in LOS with respect to patients' health conditions and demographic and geographic factors.
It is also important to develop appropriate methods for the statistical analysis of LOS data. Generally, LOS has a highly positively skewed distribution (Liu et al., 2001; Lee et al., 2003) and some patients stay in hospital for a very long time. Using linear regression to predict untransformed LOS is likely to seriously violate the assumptions of the model (Li, 1999). Various methods have been suggested to handle outliers. For caesarean delivery LOS, Lee et al. (2003) used median regression. Marazzi et al. (1998) examined the adequacy of models based on lognormal, Weibull and gamma distributions and found that the lognormal model was most appropriate, but cases with LOS less than one day were omitted from their analysis because of computational problems when the LOS is zero.
In this study, two methods were used to handle skewness in the LOS distribution: (a) logistic regression with LOS 7 days or more taken as the outcome, and (b) linear regression on the natural logarithm of LOS after adding an appropriate constant to cope with LOS equal to zero. We thus examined the variation in LOS with respect to principal diagnosis, demographic, and geographic and hospital size factors for patients dying in hospital in Southern Thailand over the 4-year period from 2000 to 2003. This analytical framework provides a straightforward methodology for modeling positively skewed outcomes with a high proportion of zero occurrences.
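The two outcome definitions can be sketched in a few lines of Python. This is only an illustration, not the authors' code (the analysis itself was carried out in R): the function names and the example stays are hypothetical, and d = 0.4 is the constant reported later in the Results.

```python
import math

def long_stay(los_days, threshold=7):
    """Method (a): binary outcome, 1 if LOS is at least `threshold` days."""
    return 1 if los_days >= threshold else 0

def log_los(los_days, d=0.4):
    """Method (b): ln(LOS + d), which stays defined when LOS is zero."""
    return math.log(los_days + d)

# Hypothetical stays in days, including a zero-day stay and a very long stay.
example_los = [0, 1, 3, 7, 12, 45, 400]
binary = [long_stay(x) for x in example_los]
logged = [log_los(x) for x in example_los]
```

The binary outcome discards information about how long a stay lasted beyond the threshold, while the log transform keeps the ordering of all stays and compresses the long right tail.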

Data source and variables
The data for this study comprised 41,134 mortality records routinely reported to the National Health Security Office (NHSO) of the Ministry of Public Health from hospitals in 14 provinces of Southern Thailand according to date of discharge during the 4 fiscal years from October 2000 to September 2003. We excluded 636 cases with no information on patient demographics or principal diagnosis, leaving 40,498 cases for analysis. As gender and age by themselves are known to be insufficient to adequately model LOS (National Center for Health Statistics, 1973), we also used principal diagnosis group and hospital location and size. Two nominal explanatory factors were defined: (1) the combination of diagnosis, gender and age group (54 categories), and (2) the combination of region and hospital size (19 categories). Age and gender were grouped into six categories by dividing age into three groups: 0-59 years, 60-74 years and 75 and over. Principal diagnosis, coded using ICD-10, was regrouped into nine broad classes: injuries, digestive diseases (DD), unspecified septicemia (ICD-10 code A41.9), other infectious diseases (ID), chronic obstructive pulmonary diseases (COPD), respiratory infection (RI), cardiovascular diseases (CVD), cancer, and other diseases. The 14 provinces were reduced to seven regions as follows: Chumphon and Ranong (N), Surat Thani (ST), Phangnga, Phuket and Krabi (NW), Nakhon Si Thammarat (NST), Satun and Trang (SW), Songkhla and Phattalung (CS) and Pattani, Yala and Narathiwat (SE). These seven regions were then combined with hospital size (60 or fewer beds, 61-499 beds, 500 beds or more) to give the second factor as shown in Table 1.
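As a rough sketch of how the two factors described above could be coded, the following Python fragment builds the age groups, the province-to-region mapping and the hospital size classes. The label strings, function names and the "small/medium/large" size names are illustrative assumptions; only the cut-points, groupings and category counts come from the text.

```python
from itertools import product

DIAGNOSES = ["injuries", "DD", "septicemia", "ID", "COPD", "RI", "CVD", "cancer", "other"]
GENDERS = ["male", "female"]
AGE_GROUPS = ["0-59", "60-74", "75+"]

# Province-to-region mapping as listed in the text (14 provinces, 7 regions).
REGION_OF_PROVINCE = {
    "Chumphon": "N", "Ranong": "N",
    "Surat Thani": "ST",
    "Phangnga": "NW", "Phuket": "NW", "Krabi": "NW",
    "Nakhon Si Thammarat": "NST",
    "Satun": "SW", "Trang": "SW",
    "Songkhla": "CS", "Phattalung": "CS",
    "Pattani": "SE", "Yala": "SE", "Narathiwat": "SE",
}

def age_group(age_years):
    """Three age bands: 0-59, 60-74, 75 and over."""
    if age_years < 60:
        return "0-59"
    if age_years < 75:
        return "60-74"
    return "75+"

def hospital_size(beds):
    """Three size classes: 60 or fewer beds, 61-499 beds, 500 beds or more."""
    if beds <= 60:
        return "small"
    if beds <= 499:
        return "medium"
    return "large"

# All levels of the first factor: 9 diagnoses x 2 genders x 3 age groups.
FACTOR1_LEVELS = [f"{d}|{g}|{a}" for d, g, a in product(DIAGNOSES, GENDERS, AGE_GROUPS)]
```

Seven regions and three size classes would give 21 possible combinations; the second factor has 19 categories, so presumably two combinations do not occur in the data.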

Statistical methods
Two statistical models were used to examine the influences of the two factors on length of stay for patients who died in hospital.
In the first model LOS was treated as a binary variable with patients staying in hospital at least 7 days as the outcome of interest. Logistic regression (Hosmer & Lemeshow, 2000; Kleinbaum & Klein, 2002) was then used to estimate the proportion $p_{ij}$ of these outcomes in diagnosis-demographic group $i$ and region-hospital size group $j$ using the model
$$\ln\left(\frac{p_{ij}}{1-p_{ij}}\right) = \mu + \alpha_i + \beta_j \qquad (1)$$
To avoid over-specification of the parameters, for each factor the category having the largest group size was taken as the referent with corresponding parameter 0. To calculate the adjusted prevalence $p_{i\cdot}$ for category $i$ of the first factor, the term $\beta_j$ in equation (1) was replaced by a constant $\beta_0$, chosen to ensure that the sum of the expected number of outcomes equaled the total observed. Similarly, to calculate the adjusted prevalence $p_{\cdot j}$ for category $j$ of the second factor, the term $\alpha_i$ in equation (1) was replaced by a constant $\alpha_0$, again chosen to ensure that the sum of the expected number of outcomes equaled the total observed.
In the second model LOS was treated as a continuous outcome, by taking its natural logarithm after adding a constant $d$ to handle zero days stay, giving the transformed outcome $y = \ln(\mathrm{LOS} + d)$. The linear regression model is thus similar to equation (1), namely
$$E[y_{ij}] = \mu + \alpha_i + \beta_j \qquad (2)$$
Estimates of LOS for different levels of the first factor after adjusting for the second factor were calculated by replacing $\beta_j$ by a constant, analogously to the adjustment described for the logistic model. Confidence intervals for these parameters were obtained by using the standard errors obtained through fitting each model.
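The adjustment step for the logistic model, in which the second-factor term is replaced by a single constant chosen so that expected outcomes match the observed total, can be sketched as a one-dimensional root search. The following Python fragment is a minimal illustration under stated assumptions: the fitted linear terms and group sizes are invented, bisection is just one convenient way to find the constant, and the function names are hypothetical.

```python
import math

def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def solve_constant(linear_terms, counts, observed_total, lo=-20.0, hi=20.0, tol=1e-10):
    """Find c with sum_i counts[i] * expit(linear_terms[i] + c) = observed_total.

    The expected total is increasing in c, so bisection on [lo, hi] converges.
    """
    def expected(c):
        return sum(n * expit(t + c) for t, n in zip(linear_terms, counts))

    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected(mid) < observed_total:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

# Hypothetical fitted values of mu + alpha_i for three categories of the first
# factor, with hypothetical group sizes and an observed total of 309 long stays.
mu_plus_alpha = [-1.2, -0.5, 0.3]
counts = [500, 300, 200]
observed_total = 309

c = solve_constant(mu_plus_alpha, counts, observed_total)
adjusted = [expit(t + c) for t in mu_plus_alpha]  # adjusted prevalence per category
```

Because the same constant is added to every category, the ordering of categories by adjusted prevalence is the same as the ordering of their fitted coefficients; the constant only fixes the overall level.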
All statistical analysis was carried out using the R program (R Development Core Team, 2007).

Results
Median LOS and proportions staying seven days or more are given in Tables 2 and 3 for each determinant. The largest number of patients (8,461) occurred in the cardiovascular disease group. There were 6,261 and 5,698 patients in the other infectious disease and injuries groups, respectively. The numbers of male patients in the digestive disease, COPD and respiratory infection groups were close to double the numbers of female patients in the same groups. The shortest median LOS was found in the injuries group among males aged less than 75 and females aged less than 60, where 56 percent had LOS less than 7 days. The highest median LOS was found in the cancer group among females aged 75 and over. More than half the patients with cancer had LOS of at least 7 days. The South East region had the highest proportion of patients staying 7 days or more, followed by Nakhon Si Thammarat and the Central South. The highest median LOS occurred in the North West region, with more than one third of stays lasting at least one week. Figure 1 shows a histogram of the overall distribution of LOS after transforming the data by adding d = 0.4 and taking natural logarithms. The fit to the lognormal distribution is good for LOS greater than 1 day (apart from three high outliers at 1,179, 1,830 and 2,741 days). However, the lognormal distribution, being continuous, cannot accurately accommodate data where LOS takes the discrete values 0 or 1, although if these shorter lengths of stay were coded in hours rather than days they might well be accommodated by the model curve. Note that if the model is to be used to provide estimates of hospital costs, it is more important to model longer stays accurately.
The data were grouped by principal diagnosis, age, gender, region and hospital size from 40,498 individual cases into 1,026 records. Figure 2 shows a scatter plot of the observed counts and fitted values of LOS at least one week in the left panel and a residual plot in the right panel. The model predicts the proportions in the 1,026 cells quite well, as shown in the residual plot. However, the model could not be expected to accurately predict individual LOS. Figure 3 shows the fitted prevalence of LOS of at least 1 week based on the logistic regression model. The dotted horizontal line on each graph gives the overall prevalence of patients having LOS of at least 7 days (30.9 percent). The top panel shows the prevalence of LOS at least 1 week by diagnosis-demographic group after adjusting for the geographic-hospital size factor. Males aged less than 60 years admitted to hospital for injury comprised the referent group. The prevalence of LOS at least one week increased with age in all disease groups except among females diagnosed with digestive disease. The lowest LOS occurred among males aged less than 60 diagnosed with injuries while the highest LOS occurred among females aged 75 and over diagnosed with cancer.
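The grouping of individual cases into cells (54 diagnosis-demographic levels by 19 region-hospital size levels gives 54 x 19 = 1,026 cells) amounts to counting, for each cell, the number of patients and the number with LOS of at least one week. A minimal Python sketch follows; the record layout and the level labels are hypothetical.

```python
from collections import defaultdict

def aggregate(records, threshold=7):
    """Group (factor1_level, factor2_level, los_days) records into cells.

    Returns a dict mapping (factor1_level, factor2_level) to
    [number of patients, number with LOS >= threshold].
    """
    cells = defaultdict(lambda: [0, 0])
    for f1, f2, los in records:
        cell = cells[(f1, f2)]
        cell[0] += 1
        if los >= threshold:
            cell[1] += 1
    return dict(cells)

# Hypothetical individual records: (diagnosis-demographic level,
# region-hospital size level, LOS in days).
records = [
    ("injuries|male|0-59", "NST-large", 0),
    ("injuries|male|0-59", "NST-large", 9),
    ("cancer|female|75+", "NW-large", 21),
]
cells = aggregate(records)
```

Grouped counts of this form are what a logistic regression on binomial data needs: each cell contributes its number of long stays out of its number of patients.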
The graph in the lower panel of Figure 3 shows the prevalence of LOS at least 1 week by the geographic-hospital size factor after adjusting for the diagnosis-demographic factor. The large hospital in Nakhon Si Thammarat was taken as the referent group. Patients from all hospital sizes in the North, Surat Thani and the South East, from small and medium sized hospitals in Nakhon Si Thammarat, and from small hospitals in the Central South were less likely to have LOS of at least 1 week. Patients from large hospitals in the North West were more likely to have LOS of at least 1 week. LOS tended to increase with hospital size in all regions except the North and the North West.
The second method involved fitting a linear model to the log-transformed outcome ln(0.4+LOS). Figure 4 shows a normal quantile plot for the standardized residuals from this model, and indicates that the normality assumption is reasonable for LOS greater than zero. However, the model gave an r-squared of only 7.8%, confirming that individual LOS cannot be accurately predicted. The components of this r-squared were 1.7% for age and gender alone and 6.6% for age-gender and principal diagnosis combined. Figure 5 shows estimates of LOS from the model. The dotted line on each graph gives the overall mean of LOS (8.9 days). The top graph shows LOS by the diagnosis-demographic factor after adjusting for the geographic-hospital size factor. It shows a pattern of LOS by disease group, gender and age group similar to that in Figure 3. Patients with injuries had the lowest LOS while patients with cancer had the highest LOS.
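To make the r-squared figures concrete, the sketch below computes r-squared for a linear model with a single categorical factor, where the fitted value for each case is simply its group mean. This is a simplification for illustration only: the model in the text has two factors and was fitted in R, and the function name and data here are hypothetical.

```python
from collections import defaultdict

def r_squared_one_factor(groups, y):
    """R-squared of a one-factor linear model (fitted value = group mean)."""
    grand_mean = sum(y) / len(y)
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for g, v in zip(groups, y):
        sums[g] += v
        sizes[g] += 1
    means = {g: sums[g] / sizes[g] for g in sums}
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    ss_resid = sum((v - means[g]) ** 2 for g, v in zip(groups, y))
    return 1.0 - ss_resid / ss_total

# Hypothetical transformed outcomes in two groups.
perfect = r_squared_one_factor(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0])
useless = r_squared_one_factor(["a", "a", "b", "b"], [1.0, 3.0, 1.0, 3.0])
```

A value near zero, as in the second call, means the factor explains almost none of the case-to-case variation, which is the situation the low r-squared in the text describes for individual LOS.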
The graph in the lower panel of Figure 5 shows mean LOS for the geographic-hospital size factor after adjusting for the diagnosis-demographic factor. Again the patterns are similar to those given by the logistic model. Table 4 shows numbers of patients and corresponding expected proportions of bed days given by the model for each category of the diagnosis-demographic factor, thus giving a breakdown of expected contributions to the costs associated with patients dying in hospital in Southern Thailand. Among patients aged less than 60, the highest proportion (11.2%) occurred in males diagnosed with infectious diseases, while among patients aged 60 to 74, males diagnosed with cancer, patients diagnosed with CVD, and males diagnosed with COPD, respectively, had the highest proportions. The highest proportions for patients aged more than 74 years were found in the CVD group. Males diagnosed with injuries accounted for over 4% of all bed days. Males aged less than 60 diagnosed with injuries, digestive diseases or infectious diseases, and males aged at least 60 diagnosed with COPD, accounted for more than double the proportions for female patients in the same groups.
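One plausible reading of the expected proportions of bed days is that each category's expected bed days equal its number of patients times its model-based mean LOS, divided by the total over all categories. The Python sketch below illustrates that arithmetic; the interpretation is an assumption, and the category names, counts and mean stays are invented.

```python
def bed_day_shares(counts, mean_los):
    """Share of total expected bed days per category (count x mean LOS, normalized)."""
    bed_days = {k: counts[k] * mean_los[k] for k in counts}
    total = sum(bed_days.values())
    return {k: v / total for k, v in bed_days.items()}

# Hypothetical categories: many short stays versus fewer long stays.
counts = {"group A": 300, "group B": 100}
mean_los = {"group A": 4.0, "group B": 12.0}
shares = bed_day_shares(counts, mean_los)
```

In this made-up example both groups account for half of the bed days even though one has three times as many patients, which shows why bed-day shares can differ sharply from patient counts.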

Discussion
This study analysed LOS using logistic and linear regression. The first model was used to identify factors associated with LOS of at least 1 week. This method provided reasonable and explainable results, and the patterns of LOS across the diagnosis-demographic and region-hospital size factors were clearly illustrated by the plot of prevalence (Figure 3). The linear model with log-transformed LOS provided results consistent with the logistic model (Figure 5). We used the linear model on log-transformed LOS to calculate average LOS and total number of bed days, as they cannot be calculated from the logistic model.
Spending the remaining time to death in the hospital increases hospital cost (Sashamani & Gray, 2004). In this study, LOS increased with age. Similar results were reported by Himsworth and Goldacre (1999), Brownell and Roos (1995) and McMullan et al. (2004). The simple explanation is that older patients tend to take longer to recover from disease, and most have chronic disease, whereas younger patients tend to have acute disease with shorter duration. However, our result contrasts with the finding by Dixon et al. (2004) that average LOS did not increase with age. The longest LOS was found in females aged at least 75 diagnosed with cancer. Patients with injuries as principal diagnosis had the shortest average LOS. Reducing LOS is a policy for many health care systems (Clarke & Rosen, 2001). Brownell and Roos (1995) indicated that longer stays in hospital were more likely to reflect physicians' decisions or administrative inefficiencies than patient needs, and LOS among patients dying from injuries could not be managed due to the short stay and uncertainty of the discharge status. However, hospital LOS among cancer patients can be reduced by providing palliative care or by giving patients the opportunity to decide where they want to stay during the last stage of their life.
LOS variation by region and hospital size has been reported in many studies (List et al., 1983; Xiao, 1997; Clarke, 2002). In our setting, small hospitals in the South West region had the lowest average LOS whereas large hospitals in the North West had the highest average LOS.
Besides average LOS, total numbers of bed days provide useful information on the financial cost of patients who die in hospital. The use of health care services and resources can thus be identified by considering this aspect, allowing health planning and budget allocations to be fairly distributed to the most suitable groups. In the United States, hospital inpatient stays accounted for nearly one-third of total health care expenses (Machlin & Carper, 2004). Our results show that males aged less than 60 diagnosed with infectious diseases accounted for 11.2% of total bed days, whereas females in the same group accounted for only 5.4%. The higher proportion in males is possibly due to excess deaths for males from HIV/AIDS (Lim & Choonpradub, 2007). Excess male mortality from other diseases related to HIV/AIDS such as tuberculosis and pneumonia was also reported by Rumakom et al. (2002). Males aged less than 60 who died from injuries accounted for 4% of total bed days, possibly due to traffic accidents, for which mortality is higher in males than in females (Bureau of Policy and Strategy, 2002).
A strength of this study is that data were taken for 4 years and from every hospital size in every province in Southern Thailand. Gustafson (1968) mentioned the poor performance of linear regression analysis for LOS due to small sample size. In our study the sample size was large enough for accurate analysis with each statistical method. Extremely positively skewed data and zero days stay were not omitted from the analysis. Outliers can give vital information for health services management in several respects, such as resource allocation or assessment of health interventions (Marazzi et al., 1998). While information concerning physician characteristics was limited in the data used in this study, this may not have unduly biased our results. Westert et al. (1993) noted that the difference in LOS between treating physicians within the same hospital was much less than the difference between hospitals.
In conclusion, providing suitable palliative care or allowing patients to select the place for spending their final time of life, especially for patients with chronic diseases, can reduce hospital resource utilization.