False Alarm? Estimating the Marginal Value of Health Signals

We investigate the marginal value of information in the context of health signals after checkups. Although underlying health status is similar for individuals just below and above a clinical threshold, treatments differ according to the checkup signals they receive. For the general population, whereas health warnings about diabetes increase healthcare utilization, health outcomes do not improve at the threshold. However, among high-risk individuals, outcomes do improve, and improved health is worth its cost. These results indicate that the marginal value of health information depends on setting appropriate thresholds for health warnings and targeting individuals most likely to benefit from follow-up medical care.


Introduction
Prevention of chronic disease has become a key health policy initiative in recent years. For example, the World Health Organization (WHO) provides a road map and menu of policy options that aim to reduce premature deaths due to chronic non-communicable diseases such as cardiovascular disease, cancer, and diabetes (WHO 2013). An important part of prevention is monitoring an individual's health condition and intervening early enough to make a difference in the course of a disease. Traditional approaches include routine health checkups, cancer screening, and disease management programs. More recently, wearable and portable devices are gaining popularity, allowing people to monitor their own health in real time.
Advocates suggest that such real-time health signals will lead to appropriate preventive care and improve health outcomes at a lower cost compared to conventional approaches, although others recognize that such signals are no panacea. While the importance of prevention is hard to deny, relatively little attention has been paid to whether preventive care along different margins is worth its cost. The aim of this paper is to investigate this issue in the context of mandatory health checkups in Japan, focusing on risk for diabetes mellitus (DM). We first examine graphically whether health signals about the risk of developing DM embodied in health checkup reports affect individuals' medical care utilization, health behaviors, and health outcomes. We then econometrically examine whether the additional care triggered by a health signal is worth the cost.
To identify the cost effectiveness or net value of preventive care, we apply a regression discontinuity (RD) design. We exploit the fact that health checkup results just below and above a threshold, e.g., the level of fasting blood sugar (FBS), may be viewed as random. People with measured values just above the threshold may receive more preventive care (such as further diagnostic tests and diabetes-related physician visits) compared to those with values just below the threshold. This additional care may lead to better health outcomes for the individuals just above the threshold, compared to those just below the threshold. By comparing the cost of care and health outcomes of these people, we can assess the cost effectiveness of providing preventive care around the threshold. Our RD analysis is a "fuzzy" one, where we use the status of surpassing a threshold as an instrumental variable for medical care utilization. This alleviates the endogeneity concern that an omitted variable such as the person's health status, rather than the preventive care, affects health outcomes.
Using Japanese data provides several key advantages. First, we can construct unique individual-level panel data, which consist of medical claims data, health survey data, and health checkup data. These data sets can be linked by a patient ID. This rich longitudinal data set allows us to examine how health signals embodied in a checkup affect the individual's medical care utilization and health outcomes after the checkup. Second, an annual health checkup is mandatory in Japan, which mitigates concern about sample selection bias. Typically, health-conscious people are more likely to obtain health information by, for example, participating in health checkups or using wearable devices, and this sample selection is likely to bias estimation results. Third, we have health outcome variables suitable for examining prevention. We apply a Japan-specific risk prediction model, the JJ risk engine (Tanaka et al. 2013), to our data to predict the 5-year risk of mortality and significant DM complications for each individual. These measures allow us to examine directly whether additional preventive care promotes health as measured by medium- and longer-run health outcomes. This is an advantage compared to only examining intermediate health measures such as FBS, HbA1c, and BMI that are more easily available but are also more difficult to interpret (Lipska and Krumholz 2017).
DM is an important case to study because it is a costly and incurable chronic disease of growing incidence and prevalence, and accordingly one of the primary targets for prevention (WHO 2013). DM is often called a "silent killer": individuals at first are asymptomatic and often not aware of the condition, but in the long run suffer from various serious complications, including problems of the eye, heart, kidney, nerves, and feet. Recent research underscores the economic and human cost of DM: in 2014, approximately 422 million adults had diabetes worldwide, incurring costs estimated to total $825 billion per year (NCD-RisC 2016). DM can generally be prevented by early intervention to reduce lifestyle risk factors (such as smoking, unhealthy diet, sedentary lifestyle, and obesity). DM and pre-diabetes can be detected by elevated blood sugar levels (as measured by fasting blood sugar (FBS) or hemoglobin A1c (HbA1c)), a diagnostic test commonly included in regular health checkups. Indeed, in Japan, policymakers consider this so important that in 1972 they mandated that all employees receive annual screening for elevated blood sugar, as we describe below.
We have three main findings. First, at a relatively low diagnosis threshold (i.e., FBS=110 mg/dl) that corresponds to "borderline type" DM in Japan (sometimes called "pre-diabetes"), we find strong evidence that surpassing the threshold significantly increases medical care utilization as measured by DM-related physician visits and DM-related outpatient expenditures, including on medications. This finding indicates that people do respond to health signals by undertaking follow-up visits with physicians, and thus health signals can potentially promote preventive care. However, the absolute impact of the signal is small: exceeding the threshold increases the probability of visiting a physician for DM treatment by only 5 percentage points (albeit representing a 50% increase, i.e. from 10% to 15%). This small magnitude indicates that health signals do not effectively translate into preventive care for the majority of individuals in the present circumstances.
Indeed, we also find no evidence that individuals improve their health-related behaviors (whether on their own or in response to physician advice during preventive care). One of the reasons for this low response rate may be the lack of intervention: currently, after receiving a warning, whether or not to visit a physician is entirely up to the individual; no one monitors response or reminds individuals about the importance of a follow-up visit.
Second, despite the significant increase in medical care utilization at the "borderline threshold", we find no evidence that the additional care improves health outcomes. This is true both for intermediate health measures (such as FBS, BMI, and SBP) and for predicted risks of mortality and serious complications using the JJ Risk Engine. Thus, we conclude that there is no evidence that DM-related medical care is cost effective around this threshold.
The results hold both in the short-run (one year after a checkup) as well as in the medium-to longer-run (three years after a checkup). These results suggest that the threshold may need to be reexamined from the perspective of cost-effectiveness.
Third, at a higher diagnostic threshold (i.e., FBS=126 mg/dl) above which the person is a "diabetic type," we do not find robust evidence that crossing the threshold increases medical care utilization or improves health outcomes. At first glance, these results are surprising, because they indicate that people are less responsive to a signal of higher risk. However, inspections of actual checkup reports revealed that employers rarely flag this threshold in their health reports, and thus most individuals do not receive a health signal when crossing that threshold. Since almost all employers focus on the lower threshold to signal a warning of pre-diabetes, and neglect the threshold signifying the higher risk category of diabetes, we interpret our empirical results as suggesting that policymakers should reconsider the importance of sending a separate signal at each threshold when multiple diagnosis thresholds are of independent clinical significance.
Assessing the cost-effectiveness of health interventions has a venerable, if challenging, history (Garber 2000) ... Almond et al. (2010), estimating marginal returns to medical care for at-risk newborns. Focusing on the "very low birth weight" threshold for newborns, Almond et al. (2010) find that those whose birth weights are just below the threshold receive more medical care and experience lower one-year mortality rates, compared to newborns with birth weight just above the threshold. These discontinuities allow them to conclude that medical care for at-risk newborns is cost effective around the threshold.
2 There is a huge literature on the cost effectiveness of pharmaceuticals and devices based on RCTs. However, such studies are often limited to a single medication or device and very few are able to capture the heterogeneity of treatment effects in the general population. The recent widespread application of RCTs in development economics has also highlighted the strengths and limitations of this approach (see for example Duflo, Glennerster, and Kremer 2007 and Deaton 2010).
3 See Doyle, Graves, and Gruber (2015) for an interesting recent example using random ambulance assignment, and Cawley (2015) for an excellent review highlighting common and creative instruments such as relative distance to a medical care provider offering the treatment, the provider's historic tendency to administer the treatment, day of week of admission, or randomization of treatment for reasons other than research. Soumerai and Koppel (2016) provide a cautionary view.

While inpatient mortality is a salient outcome for at-risk newborns, it is not a feasible metric for cost-effectiveness of many other health interventions such as routine outpatient care, including preventive care. The difficulty of obtaining an appropriate measure of health outcome is a major obstacle in calculating cost effectiveness. We circumvent this problem by calculating predicted risks of mortality and significant DM complications, using the JJ Risk Engine.
Recently, the value of annual physicals has received renewed attention and our study also relates to this strand of research (e.g., Mehrotra and Prochazka 2015; Goroll 2015). For example, systematic reviews have found little evidence that annual physical checkups reduce morbidity or mortality, "though they may be associated with reduced patient worry and increased use of preventive care" (Mehrotra and Prochazka 2015, p. 1485). 4 Our study is also closely related to Zhao, Konishi and Glewwe (2013), Oster (2015), and Kim, Lee and Lim (2017). Using data from the China Health and Nutrition Survey (CHNS), Zhao, Konishi and Glewwe (2013) apply regression discontinuity analyses to estimate the causal effect of diagnosis with hypertension (in the 3-4 years since the previous wave of CHNS) on food consumption and use of anti-hypertensives. They find a significant increase in the use of anti-hypertensives, as well as reduced fat intake, particularly among the higher-income individuals told they had high blood pressure. Their results indicate that health signals from checkups can lead to behavioral change and preventive care, at least along some margins for some specific population groups.
Evidence in higher-income contexts has generally been less encouraging about modifying long-term behavior with individual health signals. For example, studying consumers in the US, Oster (2015) finds that households with a newly diagnosed diabetic (inferred from household scanner data recording purchase of blood sugar testing strips) exhibit little change in their food consumption behavior over the following months.
She suggests that relatively modest "sin taxes" (e.g. on sugary sodas) or subsidies of healthy fruits and vegetables might ultimately be more effective than individual health signals.
4 Our study is also related to health technology assessment (Institute of Medicine 1985) and systematic review of preventive services (e.g. US Preventive Services Task Force 2009), as well as studies of the impact of health information on individuals' health-related behavior such as smoking and healthy diets (Chern, Loehman and Yen 1995; Kim and Chern 1999), or response to diagnosis of hypertension (Neutel and Campbell 2008; Zhao, Konishi and Glewwe 2013).
In a recent working paper, Kim, Lee and Lim (2017) study the impact of screening for diabetes, obesity, and hyperlipidemia under the National Health Screening Program in Korea. They find that health checkup information combined with appropriate intervention (such as a follow-up consultation) can prompt positive behavioral change and improve physical measures such as BMI and waist circumference. Although closely related, our study differs from theirs in two important ways. First, we study whether preventive care is cost effective by taking into account the increase in medical expenditures triggered by health checkups, whereas the focus of Kim, Lee and Lim (2017) is the effect of health signals on health behaviors and health outcomes. Second, for the effects on health outcomes, we study not only intermediate physical measures (such as FBS and BMI) but also predicted risks of mortality or serious DM complications by utilizing risk prediction models. Studying the latter is important because it provides more direct evidence on whether health signals promote final health outcomes in addition to physical measures.
The remainder of the paper is organized as follows. In Section 2, we briefly discuss mandatory health checkups in Japan and the key threshold values for DM diagnosis. Section 3 introduces our empirical model and Section 4 describes our data. In Sections 5 and 6, we report graphical and econometric results, respectively. Section 7 reports results from additional analysis, including long-run effects of preventive care. We conclude our study in Section 8.
In both checkups, people receive a report within one or two months after a checkup. Figure 1 shows an example of a report that an individual receives.
If any checkup measure exceeds a threshold, the report typically gives a warning (such as "H" for high) for the item and recommends a visit to a physician for further consultation. Although conducting a health checkup with specific required screening items is mandatory, the government does not specify threshold values for each physical measure and employers and insurers that conduct a checkup determine their own thresholds. Also, after receiving a health warning, such as "H", whether to visit a physician is up to the individual. The person has no obligation to make a visit, and the employer or the insurer is not obligated to monitor or enforce such a follow-up visit.
If a person makes a physician visit after a checkup, fees for the visit are covered by health insurance, and we observe all the treatments made in our claims data. A physician has to record the name of the health condition for which the visit is made, and this information in the claims data allows us to identify DM-related physician visits. When the physician is not yet definitive about the diagnosis, the physician puts a "suspicion" flag on the diagnosis, which we also observe in our data. Because many physician visits triggered by health checkups may not have a confirmed diagnosis at the time of the initial physician visit, we include DM visits with or without a "suspicion" flag in our empirical analysis.

Threshold values for DM diagnosis
There are several thresholds that could trigger preventive care for DM. Specifically, an individual with FBS greater than or equal to 126 mg/dl is considered a "diabetic type," while an individual with FBS greater than or equal to 110 mg/dl but below 126 mg/dl is regarded as "borderline type." People with borderline type have a high rate of developing DM (Seino et al. 2010). An FBS value below 110 mg/dl is "normal type." 6 An alternative, newer measure used for DM diagnosis is HbA1c, which the Japan Diabetes Society (JDS) adopted in July 2010. HbA1c greater than or equal to 6.5% signifies that an individual is "diabetic type." 7 Additionally, HbA1c between 4.6% and 6.2% is considered the "standard value." 8 To summarize, the clinical thresholds for DM diagnosis are FBS=126 mg/dl or HbA1c=6.5%, and for pre-diabetes or "borderline type," the threshold is FBS=110 mg/dl. Additionally, HbA1c=6.3% is also considered a cutoff value.
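The classification rules above can be sketched as a small lookup. This is only an illustration of the thresholds stated in the text; the function names are hypothetical and not part of the study's code.

```python
def classify_fbs(fbs_mg_dl: float) -> str:
    """Map a fasting blood sugar value (mg/dl) to the type named in the text."""
    if fbs_mg_dl >= 126:
        return "diabetic type"
    if fbs_mg_dl >= 110:
        return "borderline type"
    return "normal type"


def hba1c_is_diabetic_type(hba1c_ngsp_percent: float) -> bool:
    """HbA1c (NGSP, %) at or above 6.5 signifies 'diabetic type'."""
    return hba1c_ngsp_percent >= 6.5
```

Both cutoffs are inclusive ("greater than or equal to"), so a measured FBS of exactly 110 mg/dl falls in the borderline group.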

Empirical distribution of thresholds used in checkup reports
As discussed in the previous section, there are four thresholds (i.e., FBS=126,110 or HbA1c=6.5,6.3) that are most relevant for diagnosing DM.
However, employers do not have to adopt these values, because they are not legally bound to any specific signal to employees and can determine their own thresholds for reporting results of health checkups. Unfortunately, our data does not have information on the clinical threshold(s) that each employer adopts. As an alternative, we searched the Internet and investigated what thresholds are typically used in actual checkup reports.
We found more than 50 checkup reports posted on the Internet that contain FBS and/or HbA1c thresholds. One of our first main findings is that in all reports, only one threshold (or standard range) is specified for each physical exam measure. That is, no report defines two thresholds for one measure, such as both FBS=110 mg/dl and FBS=126 mg/dl. Figure 2 shows the "empirical" distribution of the thresholds obtained from our Internet search. It shows that for FBS, clinical thresholds at 110 mg/dl and 100 mg/dl are both common. FBS=110 mg/dl corresponds to "borderline type," as we discussed in Section 2.2. FBS=100 mg/dl is not a threshold for DM diagnosis, but corresponds to the threshold for metabolic syndrome screening. Understanding the effects of metabolic syndrome screening is also important, but it deserves a thorough investigation beyond the scope of this paper. Thus, we do not study the FBS=100 mg/dl threshold in this paper. FBS=126 mg/dl is the threshold that corresponds to "diabetic type." However, as Figure 2 shows, almost no health checkup reports adopt this threshold. Thus, it appears that in Japan, individuals are not receiving independent signals from their required checkups about both pre-diabetes and diagnosable diabetes FBS values.
5 For example, DM can be diagnosed by examining blood sugar two hours after ingesting 75 grams of glucose. Please see Seino et al. (2010) for more details.
6 Starting in April 2013, JDS defines an FBS value greater than or equal to 100 and less than 110 as "high normal."
7 In this paper, we express HbA1c values based on the National Glycohemoglobin Standardization Program (NGSP) values.
8 Please see the following treatment guideline for DM: http://www.fa.kyorin.co.jp/jds/uploads/Treatment_Guide_for_Diabetes_2014-2015.pdf
HbA1c thresholds exhibit a different pattern. In contrast to the FBS values, many more thresholds are used for HbA1c and, more importantly, these values are in close proximity. This makes it difficult to implement an RD approach because there is not a large enough "window" to identify the impact of each threshold. For example, to empirically examine a discontinuity at HbA1c=6.3% that is part of the "standard range," we need a large number of observations just below and above the threshold. However, because many employers also use the HbA1c=6.0% threshold, only three data points, i.e., HbA1c=6.0, 6.1, and 6.2, can be used to represent the data just below the 6.3% threshold. For these reasons, we use FBS values for our RD analyses in this paper.
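The window argument can be made concrete by counting how many distinct measured values lie between two adjacent thresholds. The sketch below assumes HbA1c is recorded in steps of 0.1 percentage points (as the example values 6.0, 6.1, and 6.2 suggest) and, as a further assumption, that FBS is recorded in whole mg/dl; the function name is hypothetical.

```python
from decimal import Decimal


def values_below_threshold(lower, upper, step):
    """List grid values v with lower <= v < upper, using exact decimal arithmetic."""
    lower, upper, step = Decimal(lower), Decimal(upper), Decimal(step)
    values = []
    v = lower
    while v < upper:
        values.append(v)
        v += step
    return values


# Support just below the HbA1c=6.3% threshold when 6.0% is also a threshold.
hba1c_support = values_below_threshold("6.0", "6.3", "0.1")

# Support just below the FBS=110 mg/dl threshold when 100 mg/dl is also a threshold.
fbs_support = values_below_threshold("100", "110", "1")
```

Under these assumptions the HbA1c window holds three support points while the FBS window holds ten, which is the reason given in the text for preferring FBS.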

Regression discontinuity design
We attempt to identify the effect of preventive care on health outcomes using an RD design. We exploit the fact that health checkup results just below and above a threshold may be viewed as random. We believe this is a reasonable assumption because blood sugar levels can go up and down and it is difficult for individuals to precisely control those levels. Moreover, since we focus on non-DM patients in our analysis, these people are typically neither aware of their blood sugar levels nor have any incentive to manipulate them.
People with measured values just above the threshold may receive more preventive care than those with values just below the threshold, such as further diagnostic tests and diabetes-related physician visits. This additional care may lead to better health outcomes (e.g., lower likelihood of developing diabetes and lower medium-to long-term cardiovascular mortality risk) for the patients just above the threshold, compared to those just below the threshold.
A concern for identifying the effects of medical care on health outcomes is endogeneity. An omitted variable, such as the person's health condition, which may be correlated with the amount of medical care, may also affect health outcomes. If so, this will bias the results. We address this issue by implementing a "fuzzy" version of RD, where we use the status of surpassing a diagnosis threshold as an instrumental variable for medical care utilization. Specifically, in the case of the FBS=110 mg/dl threshold, we estimate the following RD model. The first stage regression is as follows:
$$M_{it} = \alpha_0 + \alpha_1 D^{110}_{it} + f(FBS_{it} - 110) + X_{it}\gamma + \varepsilon_{it} \qquad (1)$$
where $M_{it}$ is medical care utilization of person i in year t, $D^{110}_{it}$ is a dummy variable that equals one if FBS is greater than or equal to 110 mg/dl, $f(\cdot)$ is a flexible function of FBS relative to the threshold, and $X_{it}$ is a vector of covariates. We allow $\varepsilon_{it}$ to be correlated over time. The second stage regression is given by the following:
$$Y_{i,t+1} = \beta_0 + \beta_1 \widehat{M}_{it} + g(FBS_{it} - 110) + X_{it}\delta + u_{it} \qquad (2)$$
where $Y_{i,t+1}$ captures a health outcome of person i in year t+1. As we discuss below, $Y_{i,t+1}$ represents three types of outcome variables. The remaining variables are the same as in Equation (1).
The $\alpha$, $\beta$, $\gamma$, and $\delta$ terms are parameters to be estimated and $\beta_1$ is the RD coefficient, our main interest in this paper.
The fuzzy RD approach is valid if our excluded instrument, $D^{110}_{it}$, satisfies the following conditions. First, it is correlated with $M_{it}$ and, second, it affects health outcomes, $Y_{i,t+1}$, only through $M_{it}$. We will examine these conditions in more detail later, but the second condition requires special attention from the outset. In particular, one may be concerned that the health signal (e.g., "H" for FBS) may not only increase medical care utilization but also independently alter the person's health-related behaviors such as smoking and exercise habits (without a warning from the physician to stop smoking or engage in more exercise), which could also affect health outcomes. If this is true, the second condition above will be violated. Moreover, we note that whether health signals affect health behaviors is in itself an important policy issue. For these reasons, we test this hypothesis in our data by examining changes in reported health-related behaviors the year after the checkup. As we discuss in Section 7.4, health signals appear to have little effect on health behaviors, which supports the assumption required for identification.
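To illustrate the logic of the fuzzy RD, the sketch below computes a deliberately simplified version: a local difference-in-means (Wald) estimate within a window around the threshold. It omits the function of FBS, the covariates, and the clustered standard errors of the full two-stage model, so it is an assumption-laden illustration and not the estimator used in the paper. The function name and record layout are hypothetical.

```python
from statistics import mean


def wald_fuzzy_rd(records, threshold=110, bandwidth=10):
    """Simplified fuzzy RD via the Wald ratio.

    records: iterable of (fbs, care, outcome_next_year) tuples.
    Returns (jump in care, jump in outcome, ratio of the two jumps).
    """
    records = list(records)
    below = [(c, y) for f, c, y in records
             if threshold - bandwidth <= f < threshold]
    above = [(c, y) for f, c, y in records
             if threshold <= f < threshold + bandwidth]
    if not below or not above:
        raise ValueError("need observations on both sides of the threshold")

    # First stage: how much more care do people just above the threshold get?
    first_stage = mean(c for c, _ in above) - mean(c for c, _ in below)
    # Reduced form: how much does the later outcome differ across the threshold?
    reduced_form = mean(y for _, y in above) - mean(y for _, y in below)
    if first_stage == 0:
        raise ValueError("no jump in care at the threshold; ratio undefined")

    # The effect of care on the outcome is the ratio of the two jumps.
    return first_stage, reduced_form, reduced_form / first_stage


# Toy, made-up data: (FBS, any DM visit, next-year FBS).
toy = [(105, 0, 10.0), (108, 0, 12.0), (112, 1, 9.0), (115, 0, 11.0)]
jump_care, jump_outcome, effect = wald_fuzzy_rd(toy)
```

With a 10 mg/dl bandwidth around 110, the window covers FBS values from 100 to 119, matching the range described for the "borderline type" signal. The ratio is only meaningful when the first-stage jump is clearly nonzero, which is why the strength of the instrument matters.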

3.2. Dependent variables
We use two types of health outcomes as our second-stage dependent variables ... includes the year and month of the checkup. A health survey, asking respondents about their self-assessed health and health-related behaviors, is conducted as part of a checkup and thus usually takes place in the same month as the checkup.
10 We are extremely grateful to Shiro Tanaka and co-authors for providing the computer code for this project.
11 JMDC claims data have been used by a number of studies, including Iizuka (2012) and Fukushima et al. (2016).
In this study, we are interested in the effects of preventive care on health outcomes and thus we focus on those who are not being treated for DM at the time of a health checkup. We include a checkup in our analysis if it meets the following conditions: i) the patient was not diagnosed with DM during the 6 months before the checkup, ii) we have data for the patient at least 6 months before the checkup, iii) we have data for the patient at least 12 months after the checkup, iv) the patient was 30 to 64 years old at the checkup, and v) the patient had a checkup only once in a given year. Table 1 provides summary statistics for the variables used in the analysis. We have more than 1.7 million observations in our data set. Figure 3 looks at the distribution of FBS values. It shows a smooth distribution of measured FBS values, with no apparent discontinuity at either the FBS=110 mg/dl or FBS=126 mg/dl thresholds. More than 287,000 observations are available around the "borderline type" signal, i.e., measured values which fall between FBS>=100 and FBS<=119. We have fewer observations around the "diabetic type" signal, but we still observe more than 40,000 observations for the same bandwidth.
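The five inclusion conditions can be expressed as a single predicate. The following is a minimal sketch; the record layout and field names are hypothetical and do not reflect the actual structure of the claims or checkup data.

```python
from dataclasses import dataclass


@dataclass
class Checkup:
    # All field names are illustrative assumptions.
    age: int
    months_observed_before: int
    months_observed_after: int
    dm_diagnosis_in_prior_6_months: bool
    checkups_in_year: int


def is_eligible(c: Checkup) -> bool:
    """Apply conditions i) to v) from the text to one checkup record."""
    return (
        not c.dm_diagnosis_in_prior_6_months   # i) no DM diagnosis in prior 6 months
        and c.months_observed_before >= 6      # ii) at least 6 months of prior data
        and c.months_observed_after >= 12      # iii) at least 12 months of follow-up
        and 30 <= c.age <= 64                  # iv) aged 30 to 64 at the checkup
        and c.checkups_in_year == 1            # v) only one checkup in the year
    )
```

Conditions ii) and iii) matter because they guarantee that the pre-checkup diagnosis status and the post-checkup utilization are both fully observed for every record kept.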
An underlying assumption of an RD approach is that covariates do not exhibit a discontinuity at the threshold. To check whether covariates are balanced just before and after the thresholds, we plot the average values of our covariates, i.e., female and age, for each FBS value. As shown in Figure 4, there is no apparent discontinuity at the two thresholds for these variables, indicating that our covariates are reasonably balanced.
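The balance check described above amounts to averaging each covariate within every measured FBS value and looking for a jump at the threshold. A minimal sketch of that binning step follows; the row format and function name are assumptions for illustration only.

```python
from collections import defaultdict


def mean_by_fbs(rows, key):
    """Average the covariate `key` within each integer FBS value.

    rows: iterable of dicts with an integer 'fbs' entry and numeric covariates.
    Returns a dict mapping each FBS value to the covariate mean, in FBS order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["fbs"]] += row[key]
        counts[row["fbs"]] += 1
    return {fbs: sums[fbs] / counts[fbs] for fbs in sorted(counts)}


# Toy, made-up rows on either side of the 110 mg/dl threshold.
toy_rows = [
    {"fbs": 109, "age": 40, "female": 1},
    {"fbs": 109, "age": 50, "female": 0},
    {"fbs": 110, "age": 44, "female": 1},
]
age_means = mean_by_fbs(toy_rows, "age")
```

Plotting these means against FBS gives the kind of figure the text refers to. The same helper could be applied to a 0/1 indicator, in which case each mean is a share.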
In our data, we observe individuals only if they are working at the same company and while the health insurance group provides data to JMDC. To address a potential selection issue, in Figure 5 we plot whether attrition is related to the threshold values, where Attrition equals one if the person disappears from our data in the month after a checkup and zero otherwise.
As shown in Figure 5, there is no apparent discontinuity at the thresholds, indicating that attrition is unrelated to cutoff values.

We first graphically examine how people respond to health signals. In particular, we look at the two DM diagnosis thresholds, i.e., FBS=110 mg/dl and 126 mg/dl, and how people's (i) health behavior, (ii) medical care utilization, and (iii) health outcomes change at the thresholds.

Effects of "borderline type" signal
Figures 6-9 present graphical analysis for the effects of the FBS>=110 signal, above which the person is considered a "borderline type." Figure 6 clearly shows that all types of medical utilization significantly increase at the threshold. Whereas we observe a clear jump at the threshold, its absolute impact seems limited. For example, the probability of visiting a physician for DM increases about 5 percentage points at the threshold. Although this represents a 50% increase, it does not seem to be a large absolute magnitude, given that nearly 90% of people could potentially respond to the signal at the threshold (please see Figure 6). One reason for the low response rate may be that only half of those who exceed the threshold receive a warning signal of "borderline type," as Figure 2 indicates. Moreover, people may discount the clinical importance of the "borderline type" signal even when they receive it.
In contrast to Figure 6, we observe little change in health behavior (Figure 7) and health outcomes (Figures 8 and 9) at the threshold. In Figure 7, the probability of "walking or exercising" appears to somewhat increase at the threshold, but other health behaviors exhibit little change. In Figures 8 and 9, we observe virtually no discontinuities in health outcomes at the threshold, whether measured by intermediate health outcomes (Figure 8) or predicted 5-year risks of mortality and significant DM complications (Figure 9).
... to receive a signal that they are a "diabetic type," it is not surprising that they do not respond at the FBS>=126 mg/dl threshold. Of course, it is a serious concern if, as we suspect, high-risk people are not alerted that they are actually high risk. Such "false reassurance" could offset any health benefits of (possibly repeated) signals at lower thresholds. One implication of these results is that if multiple threshold values exist for a physical measure, it is important that separate signals be considered at each risk level, calibrated to the strength of the evidence and seriousness of the risk.

6. Results from a fuzzy RD regression

6.1. First-stage results
This section reports the results from the first-stage regression of the fuzzy RD analysis, Equation (1). We estimate the model by using window widths of 5 mg/dl and 10 mg/dl. The 10 mg/dl width is the widest possible one in the case of FBS=110 mg/dl, as there is another cutoff at 100 mg/dl as shown in Figure 2. For the dependent variable, medical care utilization, we employ two variables: i) total number of DM visits and ii) DM-related outpatient medical expenditure. To save space, we only report the coefficients for the FBS>=110 mg/dl and FBS>=126 mg/dl threshold dummies.
12 In Figure 13, "risk for stroke" and "risk for non-CV mortality" appear to decrease somewhat at the threshold. However, if we estimate a regression as specified in Equation (1) using these health outcomes as the dependent variables, the coefficients for the FBS>=126 mg/dl threshold dummy are not statistically significant in either case even at the 10 percent significance level (not reported).
The results are reported in Table 2. ... The results for the FBS>=126 mg/dl threshold are in stark contrast. As reported in Panel B of Table 2, only one coefficient for the threshold dummy variable is statistically significant. Consistent with the graphical analysis reported in Figure 10, there is no robust evidence that crossing the "diabetic type" threshold increases DM-related medical care, perhaps because individuals are not receiving a health signal at that threshold.

6.2. Second-stage results
Table 3 reports the results from the second-stage regression. As we discussed in Section 3, we use two types of health outcomes as our dependent variables, namely (i) intermediate health outcomes and (ii) predicted risks of mortality and significant DM complications. For the endogenous variable in Equation (2), we experiment with two variables, i.e., DM_visits (Panel A) and DM_spending (Panel B), both of which are assumed to be sufficient statistics for DM-related medical care. We only report the results for the "borderline type" threshold because in Section 6.1 we did not find that medical care utilization increases at the "diabetic type" threshold. 13 The table shows that none of the coefficients are negative and statistically significant. Thus, there is no empirical evidence that the significant increase in DM-related medical care around the threshold improves health outcomes. While we only look at short-run effects in this section, as we report later in Section 7.2, we also do not find evidence of improvements in long-run health outcomes. Therefore the medical care triggered by the FBS>=110 mg/dl threshold does not appear to be cost effective.
Please note that we have at least 54,000 observations in our regressions even when we use a narrow window width of 5 mg/dl. Moreover, the first-stage F-statistics are usually bigger than 20 except for some cases where we use the quadratic functional form with a narrow window width of 5 mg/dl (not reported), indicating that the excluded instrument is strongly correlated with DM care, as anticipated by Figure 6. Thus the insignificant second-stage results are not due to a weak instrument problem.

7. Additional analysis

7.1. Longer-run effects on health outcomes
So far, we have looked at short-run effects of health signals and found no evidence that additional care triggered by health signals improves health outcomes. However, medical care can have cumulative effects and thus we might observe stronger effects in the long run.
To assess this possibility, we graphically examine the effects on health outcomes three and five years after a checkup, focusing on the "borderline threshold" where we found significant short-run increases in medical care utilization. Figure 14 shows the effects on intermediate health outcomes: we find no apparent discontinuity at the threshold. The results for predicted risks of mortality and significant complications are similar, as shown in Figures A1 and A2 in the Appendix. Note that for these predicted risks of mortality and complications, we are in effect examining 10-year outcomes, since we use risk factors 5 years after the checkup to predict outcomes for the next 5 years. In other words, for a 2009 checkup, we use blood pressure and other risk factors as measured in 2014 to predict probabilities of suffering a stroke, developing CHD, or non-cardiovascular mortality between 2014 and 2019. Thus, even in the long run, there is no evidence that additional care for DM (around the margin of "borderline type") improves health outcomes.

13 Nonetheless, we also estimated the regressions for FBS>=126 mg/dl and found that the first-stage F-stats are less than 5 in all cases. This invalidates the instrumental variable approach, as we expected.

Effects on individuals who did not receive a signal in the previous year
One might expect that some people are not health conscious and they may routinely ignore health warning signals even if they receive them. If we exclude these individuals, the effects of health signals on health outcomes and medical expenditures might be substantially larger. To explore such a possibility, we redo the analysis by focusing on individuals who did not get the "borderline type" signal in the previous year because their FBS values were below the threshold.
In Figure A3, we report the results for medical care utilization. As before, the "borderline type" signal clearly increases medical care utilization.
In fact, as expected, the impacts of the signal are slightly larger than what we reported in Figure 6. For example, as shown in Figure A3, the probability of visiting a doctor for DM increases by approximately 6 percentage points, as opposed to the 5-percentage-point increase found in Figure 6.
In contrast, as reported in Figure A4, the effect of the health signal on health outcomes does not appear to be different. Similar to our previous result reported in Figures 8 and 9, there is no clear evidence that the "borderline type" health signal affects health outcomes. Our regression results confirm this finding (not reported). Thus, although individuals who did not receive the warning last year respond more to the signal as expected, additional medical care utilization still does not seem to improve health outcomes. Again, we find no evidence that preventive DM care around the "borderline threshold" of FBS=110 mg/dl is cost effective.

Alternative health outcome measure
As an alternative to the predicted risks of mortality and significant complications using the JJRE, we also experimented with a risk measure calculated by the WHO risk prediction model (see Appendix IV for how we implemented the risk model). Unlike the JJRE, the WHO risk measure is based on individuals without diagnosed diabetes and thus complements the JJRE measures. As shown in Figure A5, there is no clear evidence that the "borderline type" and "diabetic type" signals affect the WHO risk measure. We also ran the same regression models as before and found that the coefficients for the thresholds are not statistically different from zero (not reported). These results provide additional support for our finding that the health signals have little effect on health outcomes.

Effects of health signals on health behaviors
As discussed in Section 3.1, one concern for our identification approach is that after receiving a health signal, individuals may alter health behavior, which in turn may also affect health outcomes. If so, this makes it difficult to identify the impact of medical care utilization on health outcomes. Although we did not find evidence that medical care utilization affects health outcomes, in this section, we empirically examine the relationship between health signals and health behaviors by estimating Equation (1), using health behavior variables as the dependent variables. Specifically, we create dummy variables for (i) exercise or walk regularly, (ii) smoke, (iii) drink every day, and (iv) eat after dinner, and use them as the dependent variables.
We perform this analysis for the "borderline type" threshold where the signal can potentially work as an instrument for medical care utilization. Table 4 reports the results. To save space, we only report the coefficients for the threshold dummy variables. As shown in the table, only one out of 32 coefficients is statistically significant at the five percent significance level.
These results are in stark contrast to the results for medical care utilization, where we found all coefficients are significant at the one percent significance level. Thus, we have no evidence that health signals affect health behaviors, and this result provides support for our instrumental variable approach. To the extent that preventive visits to physicians might also involve counseling to reduce lifestyle risk factors such as smoking and sedentary lifestyle, this non-response along margins of health-related behavior could also be interpreted as further evidence of the lack of cost-effectiveness of preventive care at this margin.
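To put the count of one significant coefficient out of 32 in perspective, the short calculation below shows how many rejections would be expected by chance alone at the five percent level. It is only a back-of-the-envelope sketch: it assumes the 32 tests are independent, which is unlikely to hold here because the specifications share the same sample and similar outcomes.

```python
n_tests = 32   # number of coefficients in Table 4, as stated in the text
alpha = 0.05   # five percent significance level

# Expected number of rejections if every true effect were zero
expected_false_positives = n_tests * alpha

# Chance of at least one rejection, ASSUMING independent tests (a simplification)
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"{expected_false_positives:.1f}")  # 1.6
print(f"{prob_at_least_one:.2f}")         # 0.81
```

Under these simplifying assumptions, a single significant coefficient is about what one would expect from sampling noise, which is consistent with reading Table 4 as showing no behavioral response.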
The result that people do not alter health behaviors after receiving a health warning is not surprising in the current context. As previously mentioned, when a physical health measure exceeds a DM diagnosis threshold such as FBS>=110 mg/dl, a checkup report usually recommends a visit to a physician, but typically does not recommend or provide programs for lifestyle changes. Moreover, lifestyle changes such as quitting smoking are notoriously challenging. These factors may help to explain why health behaviors change little after surpassing a diagnosis threshold for DM. At the same time, the result is worrisome from the perspective of preventing DM.

Heterogeneous responses to health signals
In Section 5.1, we found that, on average, the "borderline type" signal increases DM-related medical care utilization. This suggests that health signals can potentially help prevent DM by triggering physician visits.
Responses to health signals may differ, however, depending on individuals' characteristics. For example, health-conscious people may respond more to health signals, while those who have a present bias may be less responsive.
Information on heterogeneous responses to health signals could potentially help inform the design of effective prevention programs. The aim of this section is to investigate heterogeneous responses to health signals.
To investigate heterogeneous effects, we add two terms to Equation (1).
The first is an individual characteristic of interest, such as whether the person has a "poor eating habit," and the second is its interaction with the threshold dummy variable, e.g., PoorEat × (FBS>=110). The coefficient on the interaction term captures the heterogeneity and is the main interest of this section.
We look at three individual characteristics. First, we test the hypothesis that health-conscious individuals are more responsive to health signals, using "poor eating habit" as a proxy for being not health-conscious. A dummy variable, PoorEat, equals one if the person reports either eating before sleep or skipping breakfast. 14 Second, we examine whether smokers are less responsive to health signals. Studies in behavioral economics (Laibson 1997, Chaloupka and Warner 2000, Gruber 2001, Volpp et al. 2009, Giné, Karlan, and Zinman 2010) have found that smokers have a present bias and tend to discount the future more than their non-smoking counterparts. These people may respond less to health signals, because they discount the benefits of good health in the future, which may result from today's visit to a physician. Third, similarly, we also examine whether those who drink alcohol respond more to health signals. 15
Table 5 reports the results. We focus on the "borderline threshold" where DM-related medical care clearly increased. We use diagnosis of DM as the dependent variable and only report the coefficients for the FBS threshold dummy variable, individual characteristics, and their interaction term. Also, to save space, we focus on the quadratic specification. 16 As shown in columns (1) and (2), we find that those who have a "poor eating habit" respond less to the health signal by 1.5-1.6 percentage points relative to all others.
This effect is statistically significant at the one percent significance level. We also find that smokers and drinkers are less responsive to the health signal relative to non-smokers and rare drinkers by 1.3 and 1.5-1.8 percentage points, respectively (see columns (3)-(6)). In columns (7) and (8), we show the results that include all three variables at the same time. The coefficients for the interaction terms become somewhat smaller in this model but are still statistically significant in most cases. A simple summation of the three coefficients in column (8) reveals that the impact of the "borderline type" signal is reduced by about half (from 0.065 to 0.031) if the person has all three of these negative health habits.
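The summation described above can be sketched as follows. This is a minimal illustration on simulated data, not a reproduction of Table 5: the individual interaction coefficients (-0.012, -0.010, -0.012) are hypothetical values chosen only so that they sum to the reported change from 0.065 to 0.031, the baseline diagnosis rate of 0.05 is assumed, and the controls for the running variable in Equation (1) are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Simulated indicators (assumption): threshold dummy and three habit dummies
above = rng.integers(0, 2, n)
poor_eat = rng.integers(0, 2, n)
smoke = rng.integers(0, 2, n)
drink = rng.integers(0, 2, n)

# Hypothetical interaction effects; only their sum (-0.034) comes from the text
signal_effect = 0.065 - 0.012 * poor_eat - 0.010 * smoke - 0.012 * drink
p_diagnosis = 0.05 + above * signal_effect
dm_diagnosis = (rng.random(n) < p_diagnosis).astype(float)

# Linear probability model: main effects plus habit-by-threshold interactions
X = np.column_stack([
    np.ones(n), above, poor_eat, smoke, drink,
    above * poor_eat, above * smoke, above * drink,
])
beta, *_ = np.linalg.lstsq(X, dm_diagnosis, rcond=None)

effect_no_habits = beta[1]                     # signal effect with no habits
effect_all_habits = beta[1] + beta[5:8].sum()  # add the three interaction terms

print(f"no habits:  {effect_no_habits:.3f}")
print(f"all habits: {effect_all_habits:.3f}")
```

The response for a person with all three habits is the coefficient on the threshold dummy plus the three interaction coefficients, which is the "simple summation" referred to in the text.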
These results provide suggestive evidence that people respond to health signals differently and that those who smoke, drink, and have unhealthy lifestyles (in terms of eating habits) respond less to health signals. Note that these people are the primary targets of disease prevention, and our results indicate that extra efforts are necessary to induce them to visit a physician.
A caveat of this analysis is that our approach suggests only an association, not causation, because some confounding factors may be correlated with the characteristics of interest. Further analysis would be necessary to identify heterogeneous effects more rigorously.

Conclusions
While the importance of preventive care is hard to refute, it is also true that not all preventive care can improve welfare. Using unique individual-level panel data, we investigated whether people respond to health signals and if so, whether medical care triggered by health signals is worth the cost. We did so in the context of mandatory health checkups in Japan, focusing on preventive medical care for DM.
We find that, first, people respond to health signals and increase their probability of visiting a physician. This result confirms that health signals can potentially help prevent chronic diseases by bringing people to physicians' offices. The result also implies that the thresholds adopted in health checkups or wearable devices should reflect the cost-effectiveness of preventive care. If, on the other hand, employers or device makers can freely determine the thresholds on their own, the result may exacerbate rather than mitigate wasteful over-use of some kinds of care while not effectively promoting use of medical resources that are under-used relative to their cost-effectiveness (Baicker, Mullainathan, and Schwartzstein 2015).
Second, our results also indicate that only a small fraction of people respond to health signals along the margins that we study, and thus that the absolute impact of health signals may be limited. We can think of at least two reasons for this low response. First, although employers notify employees about their physical measures from the health checkup, employers may not specifically communicate that the employee belongs to a higher-risk group such as "borderline type" or "diabetic type." That is, the lack of a clear alert may explain the lack of response. Second, even when an individual receives a signal that he/she is a "borderline type," currently, whether to respond by visiting a physician is entirely up to the individual: neither is there a mandate for the individual in question to visit a physician, nor are employers obligated to follow up with employees about the potential benefits of a visit. In this case, if preventive care is indeed cost-effective, then to make prevention work, a more interventionist approach may be necessary. Further analysis is needed to distinguish these explanations. 17
Third, and most importantly, we do not find evidence that additional medical care triggered by health signals is cost effective. For the "borderline type" threshold, we find substantial increases in DM-related medical care utilization. However, health outcomes did not improve, either for physical measures (risk biomarkers) or for predicted risks of mortality or serious DM complications. Thus, there is no evidence that additional medical care is worth the cost around this threshold, and the current threshold may need to be reexamined. For the higher threshold value that corresponds to "diabetic type," few people respond to this signal and thus we were unable to assess cost effectiveness at that margin.
There are a large number of diagnosis thresholds that could trigger additional preventive care (primary, secondary, and tertiary prevention), and little is known about their cost effectiveness. While we focus on DM in our analysis, our approach can easily be applied to many other health conditions and clinically relevant diagnostic criteria. Such analyses could be useful inputs for establishing appropriate diagnosis thresholds and conveying their significance to patients, leading to more efficient use of medical resources.

17 Results from the National Health Screening Program in Korea suggest that clear information combined with prompting for a follow-up consultation can lead to a higher response rate, with between 16% and 19% of individuals above the 126 threshold in the baseline check-up subsequently receiving a diagnosis of DM (Kim, Lee and Lim 2017).

("drinking everyday" = 0 otherwise.)
"smoking" = 1 if he/she has a habit of smoking.

("smoking" = 0 otherwise.)
"eating after dinner" = 1 if he/she eats a midnight snack 3 days or more per week. ("eating after dinner" = 0 otherwise.)
"poor eating" = 1 if he/she has dinner within 2 hours of sleep or skips breakfast 3 days or more per week.

DM diagnosis (DMDX)
・The rate of a diagnosis of diabetes (or at least rule-out diagnostic testing for diabetes) within 1 year of the index checkup
・"diagnosis" is defined by the following 2 conditions:

Figure A3. Effects of "borderline type" signal on medical care utilization (for those who did not exceed the threshold in the previous year)
Figure A4. Effects of "borderline type" signal on selected health outcomes (for those who did not exceed the threshold in the previous year)
Figure A5. Effects of "borderline type" and "diabetic type" signals on WHO risk

... (2). Only the coefficients for the endogenous explanatory variables are reported. Standard errors, corrected for clustering at the person level, are in parentheses. ***: 1% significance level, **: 5% significance level, *: 10% significance level.
... (1). Only the coefficients for the FBS>=110 mg/dl and 126 mg/dl thresholds are reported. Standard errors, corrected for clustering at the person level, are in parentheses. ***: 1% significance level, **: 5% significance level, *: 10% significance level.
... (1). The dependent variable is "DM diagnosis." Only the coefficients for the FBS threshold dummy variable, individual characteristics, and their interaction term (our main interest) are reported. Standard errors, corrected for clustering at the person level, are in parentheses. ***: 1% significance level, **: 5% significance level, *: 10% significance level.