Attempt to Raise the Predictive Accuracy in Binary Logistic Regression Analysis

Objective: Improving the predictive accuracy of binary logistic regression analysis is desirable. This study aimed to clarify whether, in binary logistic regression analysis using Functional Independence Measure (FIM) gain (a 0/1 binary value) as the dependent variable, predictive accuracy increases when FIM at admission (FIMa) is categorized or when multiple predictive formulae are created. Methods: The study population consisted of 2,542 stroke patients admitted to convalescent rehabilitation wards in Japan. We compared the predictive accuracy of FIM gain between a formula using FIMa as quantitative data (A), a formula that categorized FIMa into 4 groups (B), and two predictive formulae (C). Results: The predictive accuracy of these formulae, in descending order, was C (76.3%), B (76.0%), and A (68.4%). Conclusion: Compared with using FIMa as quantitative data, the predictive accuracy of FIM gain was improved either by categorizing FIMa into 4 groups or by creating two predictive formulae.


Introduction
Many reports have used Functional Independence Measure (FIM) [1] gain (FIM at discharge minus FIM at admission) as the dependent variable in multiple linear regression analysis [2]. Binary logistic regression analyses have also been carried out using 1 for FIM gains equal to or greater than the median value and 0 for FIM gains less than the median value [3][4][5][6][7][8][9][10][11]. The deliberate conversion of quantitative FIM gains into 0/1 binary data is thought to be advantageous in that this does not require as much rigor in terms of the type or distribution of data [12].
While multiple regression analysis assumes a linear relationship between the independent variables and the dependent variable, there are in fact many cases where no such linear relationship exists. In particular, no linear relationship is found between FIM at admission and FIM gain [13]. Accordingly, it has been reported that, compared with relying on a single predictive formula, the predictive accuracy of motor FIM (mFIM) gain is increased by stratifying mFIM scores at the time of admission (mFIMa) into two groups and creating a separate predictive formula for each group [14].
In binary logistic regression analysis, as well, stratifying mFIMa to create two predictive formulae may improve the predictive accuracy of mFIM gain. In addition, because it is possible to categorize independent variables in binary logistic regression analysis [12], it may also be possible to heighten the predictive accuracy of mFIM gain by categorizing mFIMa.
This study conducted binary logistic regression analysis with mFIM gain as the dependent variable among stroke patients admitted to convalescent rehabilitation wards in Japan. The aim of this study was to compare the predictive accuracy of mFIM gain (a 0/1 binary value) between three approaches: "mFIMa used as quantitative data", "mFIMa categorized into 4 groups", and "creation of two predictive formulae".

Subjects and Methods
We used patient data from the Japan Rehabilitation Database (JRD) [15]. The subjects were selected from 6,322 stroke patients hospitalized in convalescent rehabilitation wards and registered with the JRD in April 2015. To reduce the influence of exceptional cases that could be seen as outliers, the subjects were limited to patients who fulfilled the following inclusion criteria: age 15 to 99 years, duration from onset to hospital admission of 5 to 90 days, admitted to convalescent rehabilitation wards for 21 to 210 days, total score of 13 to 90 for mFIMa, FIM gain of 0 or higher, and having entries for all items to be examined. The remaining 2,542 patients were included in this study (Figure 1).
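As a rough illustration of how the inclusion criteria above might be applied to patient records, the following Python sketch filters on the stated ranges. The field names (`age`, `onset_to_admission_days`, `length_of_stay_days`, `mfim_admission`, `fim_gain`) are hypothetical and are not the actual JRD column names; the completeness check covers only these fields, whereas the study required entries for all items examined.

```python
from typing import Mapping, Optional

# Hypothetical field names; the real JRD schema is not given in the text.
REQUIRED_FIELDS = (
    "age",
    "onset_to_admission_days",
    "length_of_stay_days",
    "mfim_admission",
    "fim_gain",
)


def meets_inclusion_criteria(patient: Mapping[str, Optional[float]]) -> bool:
    """Return True if the record satisfies the ranges stated in the text."""
    # A missing entry in any required field excludes the patient.
    if any(patient.get(field) is None for field in REQUIRED_FIELDS):
        return False
    return (
        15 <= patient["age"] <= 99
        and 5 <= patient["onset_to_admission_days"] <= 90
        and 21 <= patient["length_of_stay_days"] <= 210
        and 13 <= patient["mfim_admission"] <= 90
        and patient["fim_gain"] >= 0
    )
```

Applying this predicate to each record and keeping those for which it returns True would give the analysis population under these assumptions.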

Study 1: Predictive formula using mFIMa as quantitative data
Binary logistic regression analysis was performed with mFIM gain as the dependent variable and six independent variables: age, modified Rankin Scale before onset, days from onset to admission, mFIMa (quantitative data), cognitive FIMa, and length of stay in hospital. The dependent variable of mFIM gain, as in previous studies [3][4][5][6][7][8][9][10][11], was input as 1 for scores equal to or greater than the median value and 0 for scores less than the median value. Specifically, as the median value for mFIM gain was 18 points, mFIM gains equal to or greater than 18 points were entered as 1, while those from 0 to 17 points were entered as 0.
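The median split of the dependent variable, and the way a fitted logistic formula turns independent variables into a predicted probability, can be sketched in Python as follows. This is a minimal illustration only: the function names are invented here, and no regression coefficients are given in this section, so any intercept or coefficients passed to `predicted_probability` are placeholders rather than the study's estimates.

```python
import math
from statistics import median


def dichotomize_at_median(gains):
    """Code each gain as 1 if it is >= the median of the sample, else 0."""
    cutoff = median(gains)
    return cutoff, [1 if gain >= cutoff else 0 for gain in gains]


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        b * x for b, x in zip(coefficients, values)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

With a sample median of 18 points, gains of 18 or more are coded 1 and gains of 0 to 17 are coded 0, matching the coding described above.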

Study 2: Categorization of mFIMa into 2 groups or 4 groups
mFIMa (independent variable) was categorized into four groups: 13-21 points, 22-30 points, 31-60 points, and 61-90 points. A predictive formula was created using the same independent variables and dependent variable as in study 1. The difference was that, while mFIMa was quantitative data in study 1, it was categorized in study 2.

Study 3: Two predictive formulae using stratified mFIMa scores
mFIMa was divided into two groups of 13-30 and 31-90 points, and a predictive formula was created for each group.

The predictive accuracy of the predicted values for mFIM gain (a 0/1 binary value) obtained in studies 1, 2, and 3 was then compared.

Table 1 shows the basic characteristics of the subjects in this study. The median value for mFIMa was 46 points and the median value for mFIM gain was 18 points.
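The categorization used in study 2, the stratification used in study 3, and the comparison of predictive accuracy can be sketched in Python as below. Several points are assumptions made for illustration and are not stated in the text: the lowest category (13-21 points) is taken as the reference level for dummy coding, predictive accuracy is taken to be the proportion of patients whose 0/1 outcome is classified correctly, and a predicted probability of 0.5 is used as the classification threshold.

```python
def mfima_category(score):
    """Map an mFIMa score to one of the four groups used in study 2."""
    if not 13 <= score <= 90:
        raise ValueError("mFIMa must be between 13 and 90 points")
    if score <= 21:
        return 0  # 13-21 points
    if score <= 30:
        return 1  # 22-30 points
    if score <= 60:
        return 2  # 31-60 points
    return 3      # 61-90 points


def dummy_code(score):
    """Three 0/1 indicators; category 0 is the (assumed) reference level."""
    category = mfima_category(score)
    return [1 if category == k else 0 for k in (1, 2, 3)]


def stratum(score):
    """Study 3 split: 'low' for 13-30 points, 'high' for 31-90 points."""
    return "low" if mfima_category(score) <= 1 else "high"


def accuracy(observed, predicted_probabilities, threshold=0.5):
    """Share of cases whose predicted 0/1 class matches the observed class."""
    predicted = [1 if p >= threshold else 0 for p in predicted_probabilities]
    correct = sum(o == q for o, q in zip(observed, predicted))
    return correct / len(observed)
```

In study 2 the output of `dummy_code` would replace the quantitative mFIMa term in a single formula, whereas in study 3 `stratum` would decide which of the two separately fitted formulae is applied to a patient.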

Discussion
Predictive accuracy was found to be highest in (1) creation of two predictive formulae (76.3%), followed in descending order by (2) the formula in which mFIMa was categorized into 4 groups (76.0%), and (3) the ordinary formula using mFIMa as quantitative data (68.4%).

In reports of binary logistic regression analyses with FIM gain as the dependent variable, FIMa was always used as quantitative data [3][4][5][6][7][8][9][10][11]. The only exception [4] produced a predictive formula restricted to patients with mFIMa scores of less than 50 points. However, we were unable to find any reports comparing the predictive accuracy of the respective techniques of using FIMa as quantitative data, using FIMa as categorized data, and creating two predictive formulae by stratifying FIMa.

Table 4: Reports which used binary logistic regression analysis to predict FIM gain (Abbreviations: FIM, Functional Independence Measure; mFIM, motor FIM; MRFS, Montebello Rehabilitation Factor Score). Motor FIM was not used as an independent variable in two reports [8][9][10][11].

The following considerations may be raised as limitations of the present study. First, the results of categorization and stratification will differ according to the number of divisions made and the scores at which the divisions are placed. Second, while predictive accuracy is listed in three reports [4][5][6], the fact that the subject populations differ means that comparison of predictive accuracy between these reports is not possible.
Future studies should examine which factors other than mFIMa are effective to categorize, and to what extent the predictive accuracy of mFIM gain can be improved by combining the categorization of various factors.