Importance of time in therapeutic range on bleeding risk prediction using clinical risk scores in patients with atrial fibrillation

Bleeding risk with vitamin K antagonists (VKAs) is closely related to the quality of anticoagulation in atrial fibrillation (AF) patients, reflected by time in therapeutic range (TTR). Here we compared the discrimination performance of different bleeding risk scores and investigated whether adding TTR would improve their predictive value and clinical usefulness. We included 1361 AF patients stable on VKA for at least 6 months. Bleeding risk was assessed by the HAS-BLED, ATRIA, ORBIT and HEMORR2HAGES scores. Major bleeding events were recorded after a median of 6.5 years of follow-up. In this period 250 patients suffered major bleeds. Comparison of receiver operating characteristic (ROC) curves demonstrated that HAS-BLED had the best discrimination performance, but adding the ‘labile INR’ criterion (i.e. TTR <65%) to ATRIA, ORBIT and HEMORR2HAGES increased their discriminatory ability and predictive value, with significant improvements in reclassification and discriminatory performance. Decision curve analyses (DCA) showed improved clinical usefulness and a net benefit of the modified risk scores. In summary, in AF patients taking VKAs, the HAS-BLED score had the best predictive ability. Adding ‘labile INR’ to ATRIA, ORBIT and HEMORR2HAGES improved their predictive value for major bleeding, leading to improved clinical usefulness compared to the original scores.

Based on clinical trial cohorts, other risk scores have been proposed as valid in users of either VKAs or non-VKA oral anticoagulants (NOACs), and these do not consider 'labile INR' as a criterion. In clinical trials, however, patients are often carefully selected and followed up regularly, whereas AF patients in 'real world' clinical practice tend to be older, with associated comorbidities and polypharmacy.
In the present study, we first compared the four AF-validated bleeding risk schemas in a large 'real world' cohort of AF patients over a long period of follow-up. Second, we tested whether the predictive values of the ATRIA, ORBIT and HEMORR2HAGES scores, and their clinical usefulness, could be improved by adding a labile INR criterion (defined as TTR <65%).

Methods
From May 1, 2007 to December 1, 2007 we recruited consecutive patients with paroxysmal, persistent or permanent AF who had been stable on oral anticoagulation (OAC) with VKA (INR 2.0-3.0) for at least 6 months, at the single anticoagulation center of a tertiary hospital in Murcia (Southeastern Spain). At baseline, all patients were receiving anticoagulation therapy with acenocoumarol (the commonest VKA used in Spain) and had consistently achieved an INR between 2.0 and 3.0 during the previous 6 months. Hence, TTR at entry was 100% for this cohort, which ensured baseline homogeneity and avoided the bias produced by a low TTR at entry or by initially unstable INRs, especially in an inception cohort. Patients with prosthetic heart valves or AF due to mitral valve stenosis, recent acute coronary syndrome (ACS), stroke (ischemic or embolic), or any hemodynamic instability that led to hospital admission or surgical intervention in the preceding 6 months were excluded.
At baseline, a complete medical history was recorded and stroke risk (CHADS2) and bleeding risk (HEMORR2HAGES) were calculated. Other risk scores (CHA2DS2-VASc for stroke risk; HAS-BLED, ATRIA and ORBIT for bleeding risk) were calculated retrospectively using the clinical variables available in our (prospectively collected) dataset. The TTR at 6 months after entry was calculated using the linear interpolation method of Rosendaal [12]. Good anticoagulation control was defined as a TTR >65%, based on recommendations of the National Institute for Health and Care Excellence (NICE) [13]. Anemia was defined as hemoglobin <13 g/dL in men and <12 g/dL in women.
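The Rosendaal method assumes that the INR changes linearly between two consecutive measurements and reports the share of follow-up time during which the interpolated INR lies inside the target range. The following is a minimal sketch of that idea, not the code used in this study; the function name `rosendaal_ttr`, the input layout (day offsets and INR values as parallel lists) and the inclusive range limits are assumptions made for illustration.

```python
def rosendaal_ttr(days, inrs, low=2.0, high=3.0):
    """Percent of time with linearly interpolated INR inside [low, high].

    days: strictly increasing day offsets of the INR measurements
    inrs: INR values measured on those days
    (hypothetical input layout, chosen for this sketch)
    """
    if len(days) != len(inrs) or len(days) < 2:
        raise ValueError("need at least two (day, INR) measurements")

    time_in_range = 0.0
    total_time = 0.0
    for i in range(len(days) - 1):
        span = days[i + 1] - days[i]
        if span <= 0:
            raise ValueError("days must be strictly increasing")
        x0, x1 = inrs[i], inrs[i + 1]
        if x0 == x1:
            # Flat segment: either entirely inside or entirely outside.
            frac = 1.0 if low <= x0 <= high else 0.0
        else:
            # On a straight line, time in range is proportional to the
            # part of the INR interval that overlaps [low, high].
            seg_lo, seg_hi = min(x0, x1), max(x0, x1)
            overlap = max(0.0, min(seg_hi, high) - max(seg_lo, low))
            frac = overlap / (seg_hi - seg_lo)
        time_in_range += frac * span
        total_time += span
    return 100.0 * time_in_range / total_time
```

With this helper, the 'labile INR' criterion used in the present study would correspond to `rosendaal_ttr(days, inrs) < 65.0`.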
Follow-up was performed by personal interview at each visit to the anticoagulation clinic and through medical records. During this period we recorded all bleeding events, which were categorized as major bleeding (primary endpoint) if they met the following 2005 International Society on Thrombosis and Haemostasis (ISTH) criteria [14]: fatal bleeding, and/or symptomatic bleeding in a critical area or organ, such as intracranial, intraspinal, intraocular, retroperitoneal, intra-articular or pericardial, or intramuscular with compartment syndrome, and/or bleeding causing a fall in hemoglobin level of 20 g/L (1.24 mmol/L) or more, or leading to transfusion of two or more units of whole blood or red cells. Bleeding events, as well as other clinical outcomes, were identified, confirmed and recorded by the investigators.
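The ISTH definition above is a disjunction of criteria, so it can be written as a simple rule. The sketch below is illustrative only: the `BleedEvent` record and its field names are hypothetical and do not describe the study's case report form, and the list of critical sites is limited to those named in the text.

```python
from dataclasses import dataclass

# Critical areas or organs named in the 2005 ISTH definition quoted above.
CRITICAL_SITES = {
    "intracranial", "intraspinal", "intraocular", "retroperitoneal",
    "intra-articular", "pericardial",
    "intramuscular with compartment syndrome",
}

@dataclass
class BleedEvent:
    """Hypothetical record of one bleeding event (field names are assumed)."""
    fatal: bool = False
    symptomatic: bool = False
    site: str = ""
    hb_fall_g_per_l: float = 0.0   # fall in hemoglobin, in g/L
    units_transfused: int = 0      # whole blood or red cell units

def is_major_bleed(event):
    """True if the event meets at least one ISTH major bleeding criterion."""
    critical = event.symptomatic and event.site in CRITICAL_SITES
    return (
        event.fatal
        or critical
        or event.hb_fall_g_per_l >= 20.0
        or event.units_transfused >= 2
    )
```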
This observational registry was approved by the Ethical Committee from University Hospital Morales Meseguer and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. All patients gave informed consent to participation in the study.

Statistical analysis.
Categorical variables are presented as counts and percentages. Continuous variables were tested for normality by the Kolmogorov-Smirnov test and presented as mean ± standard deviation (SD) or median and interquartile range (IQR), as appropriate. The Chi-squared test was used to compare proportions. Cox regression models were performed to determine the association between higher values of the bleeding risk scores (or high risk and medium/high risk when analyzed as categories) and the occurrence of major bleeding.
Kaplan-Meier estimates and analysis by the log-rank test were carried out to assess differences in event-free survival distributions between subgroups of bleeding risk categories. Receiver operating characteristic (ROC) curves were applied to evaluate the predictive ability (expressed as c-indexes) of the four AF-validated bleeding risk scores. Comparisons of ROC curves were carried out by the method of DeLong et al. [15]. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were calculated according to the methods described by Pencina et al. [16]. Additional analyses were carried out by adding one point for TTR <65% to the ATRIA, ORBIT and HEMORR2HAGES scores (as 'labile INR' was already included within the HAS-BLED score), in order to determine whether this resulted in an improvement of the predictive ability for major bleeding.
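To make the score modification and the c-index concrete, the sketch below adds one point for TTR <65% to an existing score and computes the c-index as the area under the ROC curve for a binary outcome, that is, the proportion of (bleed, no bleed) patient pairs in which the patient who bled has the higher score, with ties counted as one half. The function names are hypothetical, the pairwise loop is chosen for readability rather than speed, and censoring is ignored, so this is an illustration of the quantities rather than a reproduction of the study's analysis.

```python
def add_labile_inr(score, ttr_percent, cutoff=65.0):
    """Return the score plus one point when TTR is below the cutoff."""
    return score + (1 if ttr_percent < cutoff else 0)

def c_index(scores, events):
    """Area under the ROC curve for a binary outcome (ties count 0.5)."""
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one patient with and one without an event")

    concordant = 0.0
    for s_event in with_event:
        for s_none in without_event:
            if s_event > s_none:
                concordant += 1.0
            elif s_event == s_none:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))
```

A c-index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the original and modified scores are compared.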
Goodness of fit of the new bleeding risk models was evaluated using the Hosmer-Lemeshow test. Finally, clinical usefulness and net benefit of the new predictive models were estimated using decision curve analyses (DCAs) [17,18]. DCA evaluates how well one risk score identifies patients who will have a major bleed, compared with another score. The x-axis shows threshold values for major bleeding risk, while the y-axis represents the net benefit at each threshold value. The prediction models that are farthest away from the slanted dashed grey line (i.e., assume all patients will have major bleeding) and the horizontal black line (i.e., assume no patient will have major bleeding) demonstrate the highest net benefit.
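The net benefit plotted on the y-axis of a decision curve is, in its standard form, the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability. The sketch below computes it for a single threshold; it assumes that each score has already been converted into a predicted probability of major bleeding (for example by logistic regression), and the function names are hypothetical.

```python
def net_benefit(pred_risk, events, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(events)
    true_pos = sum(1 for p, e in zip(pred_risk, events) if p >= threshold and e)
    false_pos = sum(1 for p, e in zip(pred_risk, events) if p >= threshold and not e)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(events, threshold):
    """Reference strategy: assume every patient will have a major bleed."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    prevalence = sum(1 for e in events if e) / len(events)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

The 'assume none' strategy has a net benefit of zero at every threshold, and at a threshold of zero the 'assume all' strategy equals the event prevalence, which is why the overall event rate can be read from where the slanted line meets the y-axis.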

Results
Relationship to comorbidities. Diabetes mellitus, heart failure, coronary artery disease and prior malignancy were generally more prevalent in high risk groups in all scores (Table 1). Patients at medium/high risk of bleeding using HEMORR2HAGES were more frequently female (p = 0.004), but there was no sex association with other scores. Anaemia was more prevalent with high risk HAS-BLED (p = 0.001). Patients at medium/high risk according to ORBIT and ATRIA more commonly had previous stroke/TIA. As expected, thromboembolic risk according to the CHA2DS2-VASc score was higher amongst high or medium/high bleeding risk categories (p < 0.001 for all scores).
Median TTR was significantly lower in the HAS-BLED high risk group (p < 0.001) and in the ATRIA (p = 0.028) and ORBIT (p = 0.003) medium/high risk groups. The proportion with poor anticoagulation control (TTR <65%) was significantly increased in the high risk HAS-BLED (p < 0.001) and medium/high risk ORBIT (p = 0.019) categories.
Bleeding events. Of the 250 major bleeds, 65.2% occurred in the HAS-BLED high risk category and 82.4% in the HEMORR2HAGES medium/high risk category; in contrast, most major bleeds occurred in the 'low risk' ATRIA and ORBIT categories, with only 29.6% and 34.0% of major bleeds occurring in their respective 'medium/high risk' categories. Odds ratios (OR) for major bleeds using the four bleeding risk scores were calculated. The HAS-BLED high risk category [...] Univariate Cox regression analysis also showed a significant association between the four bleeding risk scores and major bleeds, whether analysed as continuous or categorical variables (Table 3). Survival analysis demonstrated that patients categorized at high risk or medium/high risk showed an increased risk of major bleeding (HAS-BLED: Log-Rank 40.24, p < 0.001; ATRIA: Log-Rank 25.82, p < 0.001; ORBIT: Log-Rank 40.88, p < 0.001 and HEMORR2HAGES: Log-Rank 21.33, p < 0.001) (Fig. 1).
Receiver operating characteristic (ROC) curve analysis showed that all scores predicted major bleeding in patients with AF, with c-indexes of 0.62 (p < 0.001) for HAS-BLED, and 0.54 (p = 0.004), 0.56 (p < 0.001) and 0.54 (p = 0.007) for ATRIA, ORBIT and HEMORR2HAGES, respectively (Supplementary Table 2), with HAS-BLED having the best predictive value. Comparison of the ROC curves according to DeLong et al. [15] demonstrated that HAS-BLED had the best performance of the four scores (Table 4).
When labile INR or poor anticoagulation control (i.e. TTR <65%) was added to ATRIA, ORBIT and HEMORR2HAGES, this modification significantly increased their discriminatory ability and predictive values (Table 5). Comparison of the original and modified scores demonstrated significant improvements in c-indexes for the modified ATRIA, ORBIT and HEMORR2HAGES scores (p < 0.001 for the three scores). Reclassification analysis showed an improvement in sensitivity and significant positive reclassification of the modified scores compared with the original, based on the IDI and NRI (Table 5; Fig. 2). Based on the p values of the Hosmer-Lemeshow test, the new predictive models that include poor anticoagulation control (TTR <65%) were properly calibrated (ATRIA, p = 0.981; ORBIT, p = 0.569 and HEMORR2HAGES, p = 0.294).
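For readers less familiar with the reclassification metrics, the category-based NRI sums the net proportion of patients with events who move to a higher risk category and the net proportion of patients without events who move to a lower one, while the IDI is the change in the difference of mean predicted risk between patients with and without events. The sketch below follows those standard definitions; the function names and the input layout are assumptions made for illustration, and no confidence intervals or p values are computed.

```python
def nri(old_cat, new_cat, events):
    """Category-based net reclassification improvement.

    old_cat, new_cat: ordinal risk categories (e.g. 0 = low, 1 = high)
    events: truthy where the patient had a major bleed
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, e in zip(old_cat, new_cat, events):
        if e:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need patients both with and without events")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

def idi(old_risk, new_risk, events):
    """Integrated discrimination improvement from predicted probabilities."""
    def mean(values):
        return sum(values) / len(values)

    new_e = [p for p, e in zip(new_risk, events) if e]
    old_e = [p for p, e in zip(old_risk, events) if e]
    new_ne = [p for p, e in zip(new_risk, events) if not e]
    old_ne = [p for p, e in zip(old_risk, events) if not e]
    if not new_e or not new_ne:
        raise ValueError("need patients both with and without events")
    return (mean(new_e) - mean(old_e)) - (mean(new_ne) - mean(old_ne))
```

Positive values of either metric indicate that the modified score separates patients with and without major bleeding better than the original score.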
Finally, decision curve analysis (DCA) graphically demonstrates that the overall risk of major bleeding is 19%, based on the intersection of the y-axis and the slanted dashed grey line. As they are farthest away from the slanted dashed grey line (i.e., assume all patients will have major bleeding) and the horizontal black line (i.e., assume no patient will have major bleeding), the modified ATRIA, ORBIT and HEMORR2HAGES scores (that include labile INR) demonstrate improved clinical usefulness and a higher net benefit compared to the original scores (Fig. 3).

Discussion
In this 'real world' study, our principal finding was that in AF patients taking VKAs, the HAS-BLED, ATRIA, ORBIT and HEMORR2HAGES scores are all associated with major bleeding, although HAS-BLED had the best predictive ability. Second, adding labile INR (TTR <65%) to the ATRIA, ORBIT and HEMORR2HAGES scores significantly improved their predictive value for major bleeding, suggesting that these three scores would perform suboptimally in VKA users by not considering 'labile INR' as a criterion for bleeding. Indeed, the modified ATRIA, ORBIT and HEMORR2HAGES scores (that include labile INR) demonstrated improved clinical usefulness and a higher net benefit compared to the original scores.
Given that the VKAs are the commonest OACs in use worldwide, our findings have major implications for bleeding risk assessment in relation to OAC use. Bleeding risk is also not a 'static' process, and patients require re-evaluation at every opportunity over the course of the patient pathway [19]. The appropriate use of bleeding risk scores has been discussed, and these scores are intended to 'flag up' patients potentially at risk of bleeding for more careful review and follow-up. The ATRIA and ORBIT scores, however, categorise most patients as 'low risk' and hence would not 'flag up' patients potentially at risk of bleeding; indeed, most major bleeding events occurred in the 'low risk' categories of the ATRIA and ORBIT scores. Given that bleeding risk can be modified, appropriate use of bleeding scores should focus attention on reversible bleeding risk factors, such as uncontrolled hypertension, excess alcohol and concomitant antiplatelet therapy or NSAID use, as well as labile INRs in a patient taking VKA [19]. These features are fulfilled by HAS-BLED, which has been validated in patients on anticoagulants (whether VKA or non-VKA), aspirin or no antithrombotic therapy; hence the validity of using this bleeding score in all steps of the patient management pathway.
Table 3. Univariate Cox regression analysis between bleeding risk scores and major bleeding events. CI = confidence interval; HR = hazard ratio. *As per every score point.
The ATRIA (Anticoagulation and Risk Factors in Atrial Fibrillation) score was proposed in 2011 to predict bleeding associated with warfarin [27], but none of the risk score criteria includes assessment of quality of anticoagulation control or the concomitant use of antiplatelet therapy. As previously described, the HAS-BLED score has a better performance than ATRIA amongst anticoagulated patients with AF, whether with VKA [23,25] or non-VKA anticoagulants [28], as well as amongst non-anticoagulated patients [29]. More recently, the ORBIT score was derived from an industry-sponsored registry and proposed as a simple score to assess the risk of bleeding in patients with AF regardless of the type of anticoagulant, whether VKA or non-VKA [30]. Although the ORBIT score predicted major bleeding in the large cohort of the ROCKET-AF trial [31,32], the ORBIT score ignores the quality of anticoagulation control as a criterion and has been shown to be inferior to HAS-BLED in predicting bleeding amongst AF patients on VKA and non-VKA anticoagulants [33-35], as well as those who are non-anticoagulated [29].
The association between the TTR and adverse events has been shown in numerous studies. For example, an increased risk of major bleeds has been consistently shown in patients on VKAs with a TTR below 65% [36-39]. Many other clinical factors have been added into bleeding risk stratification schemes, but these have been based on complex scoring systems derived from multivariate analyses and are thus difficult to apply in daily clinical practice [6]. Whilst it is undoubtedly interesting and necessary to develop risk scores for assessing the risk of bleeding irrespective of the anticoagulant, and to make these scores as simple as possible, the VKAs are still the most commonly used OACs worldwide; thus, anticoagulation control is an issue that cannot be ignored in supporting appropriate clinical decision making. Given the close relationship of bleeding to labile INRs and poor TTR, attention to this clinical factor amongst those patients taking a VKA is crucial [7]. Of note, 'labile INR' can also be easily defined using other simple (and easily accessible) parameters, such as the proportion of INRs in range, INR variability, time above range, INR >5 twice, INR >8 once, or INR <2 twice, etc. [13,40]. The results of the present study reinforce this perspective, since the inclusion of 'labile INR' in the ATRIA, ORBIT and HEMORR2HAGES scores would significantly increase their predictive ability and clinical usefulness. Importantly, this suggests that these scores perform suboptimally in VKA patients unless labile INR is considered. Hence, these findings observed in 'real world' AF patients support the results from clinical trial cohorts [33,35].
Limitations. This study is limited by its single centre design, with a Caucasian based population. The dataset was collected prospectively, although we calculated some risk scores (CHA2DS2-VASc, HAS-BLED, ATRIA and ORBIT) and performed the analyses retrospectively, since at the time of patient inclusion these newer scores had not yet been described and hence were not used to 'clinically manage' these patients. At the beginning of the study all patients were treated with acenocoumarol, which has a shorter half-life than other VKAs. However, one strength of our study is the inclusion of consecutive AF patients who were stable on VKA (INR 2.0-3.0) for at least 6 months. Follow-up was also done in an anticoagulation clinic, where at the beginning of OAC therapy patients are carefully followed according to a standardized care protocol. This aspect may have minimized our bleeding events and may limit the generalizability to other settings with less intense follow-up.

Conclusions
In AF patients taking VKAs, the HAS-BLED score had the best predictive ability. Adding labile INR (TTR <65%) to the ATRIA, ORBIT and HEMORR2HAGES scores improved their predictive value for major bleeding, leading to improved clinical usefulness and a higher net benefit compared to the original scores. This suggests that these three scores would perform suboptimally in VKA users by not considering 'labile INR' as a criterion for bleeding.