A novel dataset of predictors of mortality for older Veterans living with type II diabetes

The dataset summarized in this article includes a nationwide prevalence sample of U.S. military Veterans who were aged 65 years or older, were dually enrolled in the Veterans Health Administration and traditional Medicare, and had a previous diagnosis of diabetes (diabetes mellitus) as of December 2005 (N = 275,190) [1]. Our data were originally used to develop and validate prognostic indices of 5- and 10-year mortality among older Veterans with diabetes. We include a range of potential predictors: demographics (e.g., sex, age, marital status, and VA priority group), healthcare utilization (e.g., number of outpatient visits, number of inpatient days), medication history, and major comorbidities. This novel dataset provides researchers with an opportunity to study the associations between a large variety of individual-level risk factors and longevity for patients living with diabetes.



Value of the Data
• Clinical practice guidelines for diabetes treatment state that treatment goals should account for patients' comorbidities and life expectancy. However, there are currently no nationwide, publicly available datasets of longevity for patients with diabetes (diabetes mellitus). The uncertainty of life expectancy for older adults with diabetes can make it difficult for clinicians to work with individual patients to develop ideal treatment plans.
• Our data provide a unique opportunity for researchers to estimate longevity and identify mortality risk factors for patients living with diabetes.
• Findings may then be used to inform clinicians and patients as they participate in shared decision-making and set individualized treatment goals.

Data Description
Approximately 34.2 million Americans currently live with diabetes, of which Type II diabetes (diabetes mellitus) is the most common [2]. Diabetes is associated with increased mortality and an increased risk for other conditions including kidney disease, retinopathy, dementia, nerve damage, and high blood pressure [3]. Potential treatments include lifestyle modifications, oral antihyperglycemic medications, or insulin therapy. Clinical practice guidelines suggest that patients' life expectancy should be taken into account when developing individualized treatment plans [4, 5]. However, there is a dearth of long-term data and clinical risk prediction tools for life expectancy among patients living with diabetes.
We obtained administrative data from 2004 to 2016 for a prevalence sample of Veterans who were dually enrolled with the Veterans Health Administration and traditional Medicare, aged 65 years or older, and had a prior diagnosis of diabetes. The VHA Corporate Data Warehouse (CDW) contains data for every enrolled Veteran including outpatient and inpatient health services utilization, medication history, laboratory tests, diagnosis and procedure codes, and demographic information. Data are captured if a Veteran has an encounter at a VHA facility or with a community-based provider at VHA expense [6]. We also obtained Medicare Standard Analytic Files [7] for the same time period to obtain dates of death and to ensure we had a more comprehensive view of utilization and health conditions. The characteristics of Veterans in our dataset are presented in Table 1.

Experimental Design, Materials and Methods
We used SQL to query the VHA CDW and identify patients with either two outpatient visits or one inpatient visit with an ICD-9 code for diabetes (362.0X, 357.2, 250.X, 366.41) or a prescription for a diabetes medication (excluding metformin-only) during calendar years 2004-2005 [8]. The sample was then limited to patients who met the study's inclusion criteria, including having at least one primary care visit during 2004-2005 with records for routine biomarkers (i.e., blood pressure, body mass index, hemoglobin A1C). Our final sample included 275,190 Veterans; a sample selection flowchart is presented in Fig. 1.
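The diabetes case definition above can be illustrated with a short Python sketch. This is not the authors' SQL; it is a minimal restatement of the stated rule under several assumptions: ICD-9 codes are stored in dotted form, "250.X" and "362.0X" are treated as code prefixes, and "excluding metformin-only" is read as requiring at least one diabetes drug other than metformin. The `PatientRecord` structure and its field names are hypothetical.

```python
from dataclasses import dataclass, field

# Code families taken from the text: 250.X and 362.0X (prefixes), 357.2 and 366.41 (exact).
DIABETES_PREFIXES = ("250.", "362.0")
DIABETES_EXACT = {"357.2", "366.41"}


def is_diabetes_code(icd9: str) -> bool:
    """Return True if a dotted ICD-9 code falls in the diabetes code list."""
    code = icd9.strip()
    return code in DIABETES_EXACT or code.startswith(DIABETES_PREFIXES)


@dataclass
class PatientRecord:
    """Hypothetical container for one patient's 2004-2005 records."""
    outpatient_visits: list[list[str]] = field(default_factory=list)  # ICD-9 codes per visit
    inpatient_visits: list[list[str]] = field(default_factory=list)   # ICD-9 codes per stay
    medications: set[str] = field(default_factory=set)                # diabetes drug classes


def meets_diabetes_definition(p: PatientRecord) -> bool:
    """Two outpatient visits OR one inpatient visit with a diabetes code,
    OR a diabetes prescription other than metformin alone (assumed reading)."""
    outpatient_hits = sum(
        any(is_diabetes_code(c) for c in visit) for visit in p.outpatient_visits
    )
    inpatient_hits = sum(
        any(is_diabetes_code(c) for c in visit) for visit in p.inpatient_visits
    )
    non_metformin = p.medications - {"metformin"}
    return outpatient_hits >= 2 or inpatient_hits >= 1 or bool(non_metformin)
```

Under this reading, a patient whose only evidence is a metformin prescription does not qualify, while a patient on metformin plus insulin does.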
Predictor variables were selected based on their demonstrated associations with mortality among patients living with diabetes in previous research. Data were extracted from CDW tables and Medicare files. Veterans' gender, age, dates of death, and enrollment in either traditional Medicare or Medicare Advantage were determined using Medicare's Master Beneficiary Summary Files. ICD-9 codes, CPT codes, counts of inpatient days, and outpatient visits were also extracted from Medicare's MedPAR and Carrier files. From the CDW, we also extracted Veterans' priority group, demographics, ICD-9 codes, CPT codes, counts of inpatient days, and outpatient visits. Veterans' priority group, age, sex, marital status, race, and ethnicity were retrieved as of December 31, 2005. Priority groups 1 and 4 constitute those with serious service-related disabilities (greater than 50% disability or housebound); groups 2, 3, and 6 are those with non-compensable, low, or moderate disabilities; group 5 comprises those with economic hardships; and groups 7 and 8 have no service-related disabilities and household incomes above certain thresholds. Our dataset also includes binary indicators for whether a Veteran was alive at the five-year mark (December 31, 2011), or at the ten-year mark (December 31, 2016).
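As an illustration of the priority group categories and the survival indicators described above, the following Python sketch encodes both. The category labels, the function name `alive_at`, and the treatment of a death occurring exactly on the mark date (counted as not alive) are assumptions made for the example; the variable names and codings in the released files may differ.

```python
from datetime import date
from typing import Optional

# Grouping of VA priority groups as described in the text (labels are illustrative).
PRIORITY_CATEGORY = {
    1: "serious service-related disability",
    4: "serious service-related disability",
    2: "non-compensable, low, or moderate disability",
    3: "non-compensable, low, or moderate disability",
    6: "non-compensable, low, or moderate disability",
    5: "economic hardship",
    7: "no service-related disability, income above threshold",
    8: "no service-related disability, income above threshold",
}

# Follow-up marks as stated in the text.
FIVE_YEAR_MARK = date(2011, 12, 31)
TEN_YEAR_MARK = date(2016, 12, 31)


def alive_at(death_date: Optional[date], mark: date) -> int:
    """Binary indicator: 1 if no death is recorded or death occurred after the mark.

    Assumption: a death recorded on the mark date itself is coded 0.
    """
    return int(death_date is None or death_date > mark)


# Example: a Veteran who died in 2013 is alive at five years but not at ten.
example_death = date(2013, 1, 1)
five_year_flag = alive_at(example_death, FIVE_YEAR_MARK)
ten_year_flag = alive_at(example_death, TEN_YEAR_MARK)
```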
We used a two-year lookback period (January 1, 2004 to December 31, 2005) to identify Veterans' comorbidities, health services utilization, medication history, and select measures of diabetes complications. We included a variety of predictors that have been previously associated with mortality in either older adults or people living with diabetes [10]. Measures of prior health services utilization included counts of inpatient days and outpatient visits. We included binary variables indicating whether Veterans were prescribed sulfonylureas, meglitinides, metformin, thiazolidinediones, α-glucosidase inhibitors, insulin, or antihypertensive medications (e.g., β-blockers, calcium channel blockers, antihypertensive combinations). ICD-9 and CPT codes were used to create binary indicators for Quan-Elixhauser comorbidities [11] as well as end-stage liver disease, major depression, coronary artery disease, acute myocardial infarction, percutaneous coronary interventions [12], nicotine dependence or smoking cessation, retinopathy, hyperglycemia, lower-limb amputation, and diabetic foot infections [13]. We used CPT codes to create indicators of screenings for retinopathy and ankle-brachial indices [14]. A frailty index ranging from 0 to 1 was also created using 30 variables identified from ICD-9 or CPT codes related to morbidity (e.g., arthritis), functional status (e.g., need for durable medical equipment), cognition and mood (e.g., dementia), sensory impairment (e.g., hearing impairment), or other conditions (e.g., incontinence) [15]. Biomarkers (e.g., BMI, blood pressure, A1C) were calculated as the mean of all measurements during the baseline period. Table 2 displays a complete list of included variables and their coded classifications.
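Two of the derived measures above lend themselves to a brief sketch: the frailty index and the baseline biomarker means. The Python below assumes the frailty index is the proportion of the 30 deficit indicators that are present, which is a common construction for an index bounded by 0 and 1 but is not stated explicitly here; the function names and input shapes are hypothetical and do not reflect the authors' R script.

```python
from datetime import date
from statistics import mean
from typing import Optional

# Two-year lookback window stated in the text.
BASELINE_START = date(2004, 1, 1)
BASELINE_END = date(2005, 12, 31)
N_FRAILTY_ITEMS = 30


def frailty_index(deficits: list[bool]) -> float:
    """Proportion of the 30 deficit indicators that are present (assumed construction)."""
    if len(deficits) != N_FRAILTY_ITEMS:
        raise ValueError(f"expected {N_FRAILTY_ITEMS} indicators, got {len(deficits)}")
    return sum(deficits) / N_FRAILTY_ITEMS


def baseline_mean(measurements: list[tuple[date, float]]) -> Optional[float]:
    """Mean of all measurements dated within the baseline window; None if there are none."""
    values = [v for d, v in measurements if BASELINE_START <= d <= BASELINE_END]
    return mean(values) if values else None


# Example: two A1C values fall inside the window; the 2006 value is ignored.
a1c = [(date(2004, 3, 1), 7.0), (date(2005, 6, 1), 8.0), (date(2006, 1, 5), 10.0)]
a1c_baseline = baseline_mean(a1c)
```

Returning `None` when no measurement falls in the window keeps missing biomarkers distinguishable from a true value of zero.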
For completeness and reproducibility, we have included SQL scripts that were used to extract raw data from the VHA CDW, an R script to prepare the analytic file, and the R script used to estimate the mortality risk prediction models. Pre-processed, deidentified data files are also available in CSV, Stata (.dta), and R (.rds) formats.
We note several limitations with these data. First, our determination of diabetes status may include a small amount of misclassification. The criteria we used to identify patients with diabetes based on VHA electronic health records have been previously validated, achieving high sensitivity (93%) and specificity (98%) compared to patients' self-reported health status [8]. Second, we obtained administrative data from both VHA and Medicare, but our data do not capture health services utilization from other payers. Lastly, while the CDW incorporates death records from several federal sources, a small number of Veteran deaths may remain unreported.

File inventory
• SQL scripts to extract data from the VHA CDW.
• R statistical code to pre-process the data.
• R statistical code to estimate mortality risk prediction models.
• Deidentified individual-level datasets of mortality and patient characteristics (processed).

Ethics Statement
The study was reviewed and approved by the VA Boston Health Care System's Institutional Review Board (protocol #1584905-2). A waiver of informed consent was granted for this database-only study, with identifiable information limited to the minimum required to complete the study. Contacting patients to provide informed consent, in addition to being infeasible due to sample size, would thus increase the risks associated with a breach of confidentiality. Release of deidentified datasets was also authorized as part of this publication.
The Privacy Office of the Veterans Affairs Boston Healthcare System has certified that these datasets are de-identified and may be publicly released as part of this publication.

Data Availability
A Novel Dataset of Predictors of Mortality for Older Veterans Living with Type II Diabetes (Original data) (Mendeley Data).

Declaration of Competing Interest
David Mohr, Paul Conlin, and Kevin Griffith are investigators at the VA Boston Healthcare System. The content is solely the responsibility of the authors and does not necessarily represent the views of the VHA, which did not have editorial input or control over this research. Thus, the views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.