Nationally representative household survey data for studying the interaction between district-level development and individual-level socioeconomic gradients of cardiovascular disease risk factors in India

In this article, we describe the dataset used in our study entitled “The interaction between district-level development and individual-level socioeconomic gradients of cardiovascular disease risk factors in India: A cross-sectional study of 2.4 million adults”, recently published in Social Science & Medicine, and present supplementary analyses. We used data from three different household surveys in India, which are representative at the district level. Specifically, we analyzed pooled data from the District-Level Household Survey 4 (DLHS-4) and the second update of the Annual Health Survey (AHS), and separately analyzed data from the National Family Health Survey (NFHS-4). The DLHS-4 and AHS sampled adults aged 18 years or older between 2012 and 2014, while the NFHS-4 sampled women aged 15–49 years and - in a subsample of 15% of households - men aged 15–54 years in 2015 and 2016. The measures of individual-level socio-economic status that we used in both datasets were educational attainment and household wealth quintiles. The measures of district-level development, which we calculated from these data, were i) the percentage of participants living in an urban area, ii) female literacy rate, and iii) the district-level median of the continuous household wealth index. An additional measure of district-level development that we used was Gross Domestic Product per capita, which we obtained from the Planning Commission of the Government of India for 2004/2005. Our outcome variables were diabetes, hypertension, obesity, and current smoking. The data were analyzed using both district-level regressions and multilevel modelling.


Data
The provided data are supplementary data of the study entitled "The interaction between district-level development and individual-level socioeconomic gradients of cardiovascular disease risk factors in India: A cross-sectional study of 2.4 million adults", which was recently published in Social Science & Medicine [1].
Tables 1 and 2 report unweighted sample characteristics for the data, stratified by gender. Figs. 1 and 2 display the association of a district's development with the difference in the probability of having hypertension between the most and least educated categories (i.e., having completed secondary school or a tertiary education versus not having completed primary school) (Fig. 1) or between the top two and bottom two household wealth quintiles computed for each district (Fig. 2). We used the following indicators of district-level socio-economic development: median household wealth (Figs. 1a and 2a), GDP per capita (Figs. 1b and 2b), percentage of participants living in an urban area (Figs. 1c and 2c), and female literacy rate (Figs. 1d and 2d). We also show the same analyses for the following CVD risk factors: obesity (Figs. 3a–4d), diabetes (Figs. 5a–6d), and currently smoking (Figs. 7a–8d).
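The two-step logic behind these district-level figures (a within-district gap, followed by an equal-weight ordinary least squares fit across districts) can be sketched in Python. This is a minimal illustration under assumptions, not the authors' analysis code: the function names `district_gap` and `ols_slope` are hypothetical, the within-district gap is unadjusted (the published regressions adjust for sex, age, and urban/rural residency), and only the absolute difference is shown.

```python
from collections import defaultdict


def district_gap(records):
    """Unadjusted difference in outcome prevalence (high minus low SES) per district.

    records: iterable of (district, high_ses: bool, outcome: 0 or 1).
    With a single binary regressor, the coefficient of a linear probability
    model equals this difference in means. Covariate adjustment is omitted.
    """
    # Per district: [n_high, cases_high, n_low, cases_low]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for district, high_ses, outcome in records:
        c = counts[district]
        if high_ses:
            c[0] += 1
            c[1] += outcome
        else:
            c[2] += 1
            c[3] += outcome
    gaps = {}
    for district, (n_high, cases_high, n_low, cases_low) in counts.items():
        if n_high and n_low:  # a gap needs both groups to be present
            gaps[district] = cases_high / n_high - cases_low / n_low
    return gaps


def ols_slope(xs, ys):
    """Slope and intercept of an OLS line, every point having the same weight."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Each district contributes one point (its development indicator and its gap), so a district with few participants carries the same weight in the fitted line as a large one.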

Value of the Data
The data allow researchers and policy makers to examine how individual-level socio-economic gradients of cardiovascular disease risk factors are associated with district-level socio-economic development. Insights gained from these analyses might give an indication as to how individual-level socio-economic gradients of cardiovascular disease risk factors will change in the future as districts continue to develop economically. These data could be used to conduct analyses on socio-economic determinants of cardiovascular disease risk factors in India and merged with data from other countries to conduct analyses at a larger scale.
In Figs. 9a–d, we compare top and bottom household wealth quintiles computed for each district (for district-level primary school completion rate only). In Figs. 10a–d, we examine the association of a district's primary school completion rate with the difference in the probability of a CVD risk factor between the top two and bottom two household wealth quintiles computed nationally. The numbers of districts included in the district-level regressions for each risk factor and SES measure are presented in Table 3. Multilevel linear regressions for the interaction between district-level socio-economic development and participants' educational attainment or household wealth, computed for each district and nationally, are shown for hypertension (Tables 4 and 5), obesity (Tables 6 and 7), diabetes (Tables 8 and 9), and currently smoking (Tables 10 and 11). As before, district-level indicators of socio-economic development were median household wealth, GDP per capita, percentage of participants living in an urban area, and female literacy rate.
In addition, multilevel linear regressions with all our available indicators for district-level development (including primary school completion rate) were fitted for the following outcome variables: high blood pressure (Tables 12 and 13) and high blood glucose (Tables 14 and 15) in the NFHS-4 dataset, diabetes assuming that AHS participants have not fasted (Tables 16 and 17), and currently smoking separately for male (Tables 18 and 19) and female (Tables 20 and 21) survey participants.
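As a reading aid, a two-level linear model with a cross-level interaction of the kind reported in these tables can be written as follows. This is a sketch under assumptions: the symbols, the covariate vector, and the random-intercept structure are illustrative and are not taken from the main publication, which should be consulted for the models that were actually fitted.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{SES}_{ij} + \beta_2\,D_j
       + \beta_3\,\bigl(\mathrm{SES}_{ij} \times D_j\bigr)
       + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij}
```

Here $y_{ij}$ is the binary risk factor for participant $i$ in district $j$, $\mathrm{SES}_{ij}$ is the individual-level measure (educational attainment or household wealth), $D_j$ is the district-level development indicator, $\mathbf{x}_{ij}$ collects individual-level covariates, $u_j$ is an assumed district random intercept, and $\varepsilon_{ij}$ is the residual. Under this notation, $\beta_3$ describes how the individual-level gradient changes with district-level development.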
We conducted two additional analyses to improve our understanding of our findings: i) association of a district's primary school completion rate with the difference in the continuous household wealth index between highest and lowest household wealth quintile (Figs. 11a and b), and ii) logistic and linear regressions of CVD risk factors onto household wealth and district-level fixed effects, conducted in the total sample and a subset of the data (Tables 22–25). Tables 26 and 27 show how the district-level independent variables are correlated.
In the sampling procedure, the health surveys used projections from either the 2001 or the 2011 India Census, while the GDP per capita data were collected in 2004/2005. Because of these time differences, we did not have GDP per capita data for some districts in each survey. We, therefore, excluded districts that were newly created within that time period (2001–2011) [2]. Neighboring districts, which underwent subsequent jurisdictional changes, were also excluded, leaving us with GDP per capita data for 476 of 640 districts in the NFHS-4 dataset and 467 of 561 districts in the DLHS-4/AHS dataset.

Experimental design, materials, and methods
Methods and statistical analyses are described in our main publication entitled "The interaction between district-level development and individual-level socioeconomic gradients of cardiovascular disease risk factors in India: A cross-sectional study of 2.4 million adults". Here, we provide more detail on sampling procedure, anthropometric and biomarker measurements, construction of educational attainment categories, and the computation of household wealth quintiles. Analysis code files and raw data are provided in the Harvard Dataverse (link shown in the specifications table).
Fig. 1a. Hypertension: association of district-level median household wealth with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 595 districts in the NFHS-4 and 516 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
The NFHS-4 covered all 640 districts of India as of the time of the 2011 India census [3] and was conducted between 2015 and 2016.
In the first stage of the stratified two-stage cluster-random sampling design, each district was separated into rural and urban areas and, within each rural or urban stratum, primary sampling units (PSUs) were selected with probability proportional to population size using the 2011 India census as a sampling frame. Rural PSUs were villages and urban PSUs were census enumeration blocks. In the following step, a household listing was carried out in the PSUs whereby large PSUs (defined as having more than 300 households) were divided into segments (each segment with approximately 100–150 households). Lastly, systematic random sampling (i.e., the first household was selected randomly, followed by the sampling of every nth household) was used in each PSU or PSU segment to select 22 households. Eligible women and men included all residents and visitors (who stayed the night prior to the survey) of the selected households. Women eligible for the women's survey were female residents or visitors who stayed the night prior to the survey and were 15–49 years old. The men's questionnaire was conducted in a random subsample of 15% of households. Eligible men were men aged 15–54 years who spent the night prior to the survey in the household or were usual residents. Men are, therefore, underrepresented in this survey and the variables for men that we used in this analysis are not representative at the district level. The socio-demographic data used in this analysis were ascertained by administering questionnaires using Computer Assisted Personal Interviewing (CAPI). Interviews with eligible women were completed with a response rate of 97%, while the response rate for eligible men was 92%. We only included non-pregnant residents (i.e., excluded pregnant women and visitors who stayed the night prior to the survey) in our dataset.
Fig. 1b. Hypertension: association of a district's GDP/capita with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 450 districts in the NFHS-4 and 436 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
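The systematic random sampling step used to select 22 households per PSU or PSU segment in the NFHS-4 (a random start followed by every nth household) can be sketched as follows. This is an illustration under assumptions rather than the survey's field procedure: the function name `systematic_sample` is hypothetical, and a fractional sampling interval is used so that exactly the requested number of households is drawn from a listing of any length.

```python
import random


def systematic_sample(households, n_select, rng=None):
    """Draw n_select units from an ordered listing by systematic random sampling.

    A random start is drawn within the first sampling interval, and further
    units are taken at a fixed (possibly fractional) interval thereafter.
    """
    rng = rng or random.Random()
    n = len(households)
    if n_select >= n:
        return list(households)  # listing too small: take every household
    interval = n / n_select  # greater than 1, so selected positions are distinct
    start = rng.random() * interval
    positions = (int(start + i * interval) for i in range(n_select))
    # min() guards against a floating-point result equal to n.
    return [households[min(p, n - 1)] for p in positions]
```

For example, `systematic_sample(list(range(120)), 22, random.Random(1))` returns 22 distinct households spread evenly across a listing of 120.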
The biomarker questionnaire was administered to all eligible women and men and included measurements of height, weight, blood pressure, and blood glucose. For glucose measurements, capillary blood samples were taken with a finger prick and were analyzed with the FreeStyle Optimum H glucometer. The Omron Blood Pressure Monitor was used to measure blood pressure three times in the same arm in each individual, with a five-minute break in between measurements. Weight was assessed using the Seca 874 scale, and height measurements were conducted with the Seca 213 stadiometer. More information on the methodology of the survey and data collection procedures is available in the national report [4] and the NFHS-4 CAB manual [5].

District-Level Household Survey-4 (DLHS-4) & Annual Health Survey (AHS)
The District-Level Household Survey-4 (DLHS-4) and the second update of the Annual Health Survey (AHS) were carried out simultaneously (between 2012 and 2014) and, when pooled, cover all Indian states except Gujarat and Jammu and Kashmir as well as all Union Territories except for Lakshadweep, and Dadra and Nagar Haveli. Sampling procedure and clinical, anthropometric, and biomarker (CAB) measurements are described elsewhere in detail and summarized below [6].
The DLHS-4 was conducted in 18 states and five Union Territories (comprising 336 districts in total) between 2012 and 2014 [7,8]. In the first stage of the two-stage cluster-random sampling design, PSUs were selected, which were "census villages" (sampled with probability proportional to population size using projections from the 2001 India census) in rural areas and "urban frame survey blocks" (selected through simple random sampling) in urban areas. Systematic random sampling was used in the second step to select the households in each PSU.
Fig. 1c. Hypertension: association of a district's urban population with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 595 districts in the NFHS-4 and 516 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
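Selection of census villages with probability proportional to population size, as described for the rural PSUs of the DLHS-4, can be illustrated with a systematic variant that walks along the cumulative population total. This is a sketch under assumptions: the survey reports do not specify this particular algorithm here, and the function name `pps_systematic` and the example population sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n_select, rng=None):
    """Select unit indices with probability proportional to size.

    Selection points are placed at a fixed interval along the cumulative size
    total, starting from a random offset. A unit larger than the interval can
    be selected more than once.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_select
    start = rng.random() * interval
    chosen = []
    idx = 0
    for i in range(n_select):
        point = start + i * interval
        # Advance to the unit whose cumulative range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        chosen.append(idx)
    return chosen
```

With sizes `[10, 10, 10, 970]` and five selections, the interval is 200, so at least four of the five selection points fall inside the largest unit, which shows how larger villages are favoured.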
The AHS was conducted in nine states, comprising 284 districts, between 2012 and 2013 [7,9]. These states were chosen because they had high percentages of infant and child mortality at the time of the conception of the first AHS. The two-stage cluster-random sampling approach was, again, stratified by rural versus urban areas. The PSUs were villages in rural areas and enumeration blocks in urban areas and both were selected through simple random sampling with probability proportional to population size using projections from the 2001 India census. Systematic random sampling was employed to choose households in each PSU. CAB measurements were conducted 12–18 months after the household questionnaire was conducted. Importantly, because sociodemographic information and CAB data in the AHS were published in the public domain in two separate datasets without a unique identifier that could be used to match participants across these two datasets, we had to resort to "fuzzy matching" to match individuals across these two datasets. Specifically, we merged participants using a composite indicator consisting of state, district, stratum (indicating rural versus urban areas and village size), a household identifier that is unique within each PSU, and a household serial number assigned before and one assigned after data entry. Overall, 59.0% (607,227 out of 1,028,545 participants) of non-pregnant adults in the CAB dataset were successfully merged to their corresponding sociodemographic information. Those whom we could not match had sample characteristics similar to those whom we were able to match; detailed tables of this comparison are shown in the appendix of our first publication with this data [6].
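Merging the two AHS datasets on a composite indicator can be sketched in Python as below. This is a minimal illustration under assumptions, not the authors' matching code: the field names in `KEY_FIELDS` are hypothetical stand-ins for the variables listed above, and the rule of keeping only keys that occur exactly once in each dataset is an assumption made here to avoid ambiguous matches.

```python
from collections import Counter

# Hypothetical field names standing in for the components of the composite indicator.
KEY_FIELDS = ("state", "district", "stratum", "hh_id", "hh_serial_pre", "hh_serial_post")


def composite_key(row):
    """Build the composite key for one record."""
    return tuple(row[field] for field in KEY_FIELDS)


def merge_on_composite_key(cab_rows, socio_rows):
    """Merge CAB records with sociodemographic records on the composite key.

    Only keys that are unique in both datasets are merged; all other CAB
    records are returned as unmatched.
    """
    cab_counts = Counter(composite_key(r) for r in cab_rows)
    socio_counts = Counter(composite_key(r) for r in socio_rows)
    socio_by_key = {composite_key(r): r for r in socio_rows}
    merged, unmatched = [], []
    for row in cab_rows:
        key = composite_key(row)
        if cab_counts[key] == 1 and socio_counts[key] == 1:
            merged.append({**socio_by_key[key], **row})
        else:
            unmatched.append(row)
    return merged, unmatched
```

The share of merged records is then `len(merged) / len(cab_rows)`; with the counts reported above, 607,227 / 1,028,545 is approximately 0.590.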

[Figure labels: DLHS-4/AHS; NFHS-4; Difference between the two most educated categories and the least educated category]
Fig. 1d. Hypertension: association of district-level female literacy with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 595 districts in the NFHS-4 and 516 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
CAB measurements were conducted in all adult non-pregnant household members. Visitors were excluded from our dataset. Trained data collectors quantified blood glucose from a finger prick blood specimen with a handheld glucometer (SD CodeFree), which automatically converted capillary blood glucose readings into a plasma-equivalent value by multiplying with 1.11 [10]. Participants were instructed to fast overnight before blood glucose was measured the following morning. Blood pressure was measured with an electronic blood pressure monitor (Rossmax AW150) in the upper arm when the participant was sitting. Blood pressure measurements were repeated twice with a ten-minute interval between readings. A household questionnaire was used to ascertain the socio-demographic information that was used in our analysis. The respondent was the household head, who answered on behalf of all household members.
A more detailed description of the sampling procedure and data collection procedures is available in the state reports [8,9] and the CAB manual [11].

Measures of socio-economic status (SES)
We used educational attainment and household wealth as individual-level SES measures. Table 28 shows the ordinary least squares regression of household wealth onto educational attainment.
The household wealth quintile of DLHS-4 and AHS respondents was constructed as previously described [12]. Shortly, the household wealth quintiles were created by dividing a continuous a Fig. 2a. Hypertension: association of district-level median household wealth with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 608 districts in the NFHS-4 and 517 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. b Fig. 2b. Hypertension: association of a district's GDP/capita with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed hypertension onto sex, age, and urban/ rural residency separately for each district. The analysis included 462 districts in the NFHS-4 and 437 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). 
The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. c Fig. 2c. Hypertension: association of a district's urban population with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed hypertension onto sex, age, and urban/ rural residency separately for each district. The analysis included 608 districts in the NFHS-4 and 517 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. d Fig. 2d. Hypertension: association district-level female literacy with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed hypertension onto sex, age, and urban/ rural residency separately for each district. The analysis included 608 districts in the NFHS-4 and 517 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). 
The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. a Fig. 3a. Obesity: association of district-level median household wealth with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 531 districts in the NFHS-4 and 443 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. b Fig. 3b. Obesity: association of a district's GDP/capita with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 407 districts in the NFHS-4 and 376 districts in the DLHS-4/AHS. 
The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. c Fig. 3c. Obesity: association of a district's urban population with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 531 districts in the NFHS-4 and 443 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. d Fig. 3d. Obesity: association of district-level female literacy with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. 
The analysis included 531 districts in the NFHS-4 and 443 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. a Fig. 4a. Obesity: association of district-level median household wealth with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 589 districts in the NFHS-4 and 461 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. b Fig. 4b. Obesity: association of a district's GDP/capita with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. 
The analysis included 454 districts in the NFHS-4 and 389 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. c Fig. 4c. Obesity: association of a district's urban population with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 589 districts in the NFHS-4 and 461 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. d Fig. 4d. Obesity: association district-level female literacy with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. 
The analysis included 589 districts in the NFHS-4 and 461 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. a Fig. 5a. Diabetes: association of district-level median household wealth with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 200 districts in the NFHS-4 and 469 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. b Fig. 5b. Diabetes: association of a district's GDP/capita with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. 
These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 155 districts in the NFHS-4 and 393 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. c Fig. 5c. Diabetes: association of a district's urban population with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 200 districts in the NFHS-4 and 469 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale. d Fig. 5d. Diabetes: association of district-level female literacy with the difference between completing at least secondary school and less than primary school. 
The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 200 districts in the NFHS-4 and 469 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 6a. Diabetes: association of district-level median household wealth with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 373 districts in the NFHS-4 and 477 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 6b. Diabetes: association of a district's GDP/capita with the difference between the top two and bottom two household wealth quintiles computed for each district.
The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 282 districts in the NFHS-4 and 401 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 6c. Diabetes: association of a district's urban population with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 373 districts in the NFHS-4 and 477 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 6d. Diabetes: association of district-level female literacy with the difference between the top two and bottom two household wealth quintiles computed for each district.
The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 373 districts in the NFHS-4 and 477 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 7a. Current smoking: association of district-level median household wealth with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed current smoking onto sex, age, and urban/rural residency separately for each district. The analysis included 390 districts in the NFHS-4 and 508 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 7b. Current smoking: association of a district's GDP/capita with the difference between completing at least secondary school and less than primary school.
The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed current smoking onto sex, age, and urban/rural residency separately for each district. The analysis included 303 districts in the NFHS-4 and 429 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 7c. Current smoking: association of a district's urban population with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed current smoking onto sex, age, and urban/rural residency separately for each district. The analysis included 390 districts in the NFHS-4 and 508 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 7d.
Current smoking: association of district-level female literacy with the difference between completing at least secondary school and less than primary school. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing those participants who completed at least secondary school to those who did not complete primary school education in a district. These regressions regressed current smoking onto sex, age, and urban/rural residency separately for each district. The analysis included 390 districts in the NFHS-4 and 508 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 8a. Current smoking: association of district-level median household wealth with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed current smoking (as a binary variable) onto sex, age, and urban/rural residency separately for each district. The analysis included 513 districts in the NFHS-4 and 514 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero.
The y-axis for the relative difference is on the logarithmic scale.
Fig. 8b. Current smoking: association of a district's GDP/capita with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed current smoking (as a binary variable) onto sex, age, and urban/rural residency separately for each district. The analysis included 387 districts in the NFHS-4 and 434 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 8c. Current smoking: association of a district's urban population with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed current smoking (as a binary variable) onto sex, age, and urban/rural residency separately for each district. The analysis included 513 districts in the NFHS-4 and 514 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero.
The y-axis for the relative difference is on the logarithmic scale.
Fig. 8d. Current smoking: association of district-level female literacy with the difference between the top two and bottom two household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed current smoking (as a binary variable) onto sex, age, and urban/rural residency separately for each district. The analysis included 513 districts in the NFHS-4 and 514 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 9a. Hypertension: association of district-level primary school completion rate with the difference between richest and poorest household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 606 districts in the NFHS-4 and 517 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero.
The y-axis for the relative difference is on the logarithmic scale.
Fig. 9b. Obesity: association of district-level primary school completion rate with the difference between richest and poorest household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 528 districts in the NFHS-4 and 413 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 9c. Diabetes: association of district-level primary school completion rate with the difference between richest and poorest household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 142 districts in the NFHS-4 and 408 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 9d. Current smoking: association of district-level primary school completion rate with the difference between richest and poorest household wealth quintiles computed for each district. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the richest to the poorest household wealth quintile in a district. These regressions regressed current smoking (as a binary variable) onto sex, age, and urban/rural residency separately for each district. The analysis included 314 districts in the NFHS-4 and 503 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 10a. Hypertension: association of district-level primary school completion with the difference between the top two and bottom two household wealth quintiles computed nationally. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the top two to the bottom two household wealth quintiles in a district. These regressions regressed hypertension onto sex, age, and urban/rural residency separately for each district. The analysis included 591 districts in the NFHS-4 and 501 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 10b.
Obesity: association of district-level primary school completion with the difference between the top two and bottom two household wealth quintiles computed nationally. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the top two to the bottom two household wealth quintiles in a district. These regressions regressed obesity onto sex, age, and urban/rural residency separately for each district. The analysis included 573 districts in the NFHS-4 and 448 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
Fig. 10c. Diabetes: association of district-level primary school completion with the difference between the top two and bottom two household wealth quintiles computed nationally. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the top two to the bottom two household wealth quintiles in a district. These regressions regressed diabetes onto sex, age, and urban/rural residency separately for each district. The analysis included 368 districts in the NFHS-4 and 466 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
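The two-stage procedure described in the captions of Figs. 5 to 10 (a within-district comparison of high versus low SES, followed by an equal-weight ordinary least squares line across districts) can be sketched as follows. This is a simplified illustration and not the code used in the study: the within-district step here is unadjusted (a difference in prevalence and a crude odds ratio), whereas the study adjusted for sex, age, and urban/rural residency. Fitting the relative difference on the log odds ratio is an assumption suggested by the logarithmic y-axis. All function and variable names are hypothetical.

```python
import math


def district_gap(records):
    """Unadjusted SES gap in one district.

    records: iterable of (high_ses, has_outcome) booleans, restricted to
    participants in the high or low SES category.
    Returns (absolute difference in prevalence, odds ratio).
    Assumes all four cells of the 2x2 table are non-empty.
    """
    a = b = c = d = 0
    for high_ses, has_outcome in records:
        if high_ses and has_outcome:
            a += 1  # high SES, outcome present
        elif high_ses:
            b += 1  # high SES, outcome absent
        elif has_outcome:
            c += 1  # low SES, outcome present
        else:
            d += 1  # low SES, outcome absent
    abs_diff = a / (a + b) - c / (c + d)
    odds_ratio = (a * d) / (b * c)
    return abs_diff, odds_ratio


def ols_line(x, y):
    """Equal-weight OLS of y on x; returns (slope, intercept, slope SE).

    Requires at least three points and some variation in x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se


def fit_across_districts(districts):
    """districts: list of (development_indicator, records) pairs.

    Returns the OLS fit for the absolute gap and for log(odds ratio),
    with every district carrying the same weight.
    """
    xs, abs_gaps, log_ors = [], [], []
    for indicator, records in districts:
        abs_diff, odds_ratio = district_gap(records)
        xs.append(indicator)
        abs_gaps.append(abs_diff)
        log_ors.append(math.log(odds_ratio))
    return ols_line(xs, abs_gaps), ols_line(xs, log_ors)
```

The ratio of the slope to its standard error gives the t-statistic behind the p-value reported in each plot; an adjusted version would replace `district_gap` with a per-district linear probability and logistic regression.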
household wealth index variable into quintiles, either at the district or national level. At the national level, this was done separately for rural and urban areas.
If the urban or rural proportion in a district was ≥5%, the computation of wealth quintiles at the district level was also performed separately for urban and rural areas. The continuous household wealth index was the standardized (to yield a mean of zero and standard deviation of one) first principal component from a principal component analysis (PCA) of binary variables, which indicated household ownership of durable goods and key housing characteristics (each coded as 1 or 0) [13]. The PCA was conducted separately for urban and rural areas.
Fig. 10d. Current smoking: association of district-level primary school completion with the difference between the top two and bottom two household wealth quintiles computed nationally. The points in the plot represent the regression coefficient from a linear probability model (for the absolute difference) and the Odds Ratio from a logistic regression (for the relative difference) comparing the top two to the bottom two household wealth quintiles in a district. These regressions regressed current smoking onto sex, age, and urban/rural residency separately for each district. The analysis included 491 districts in the NFHS-4 and 499 districts in the DLHS-4/AHS. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. The y-axis for the relative difference is on the logarithmic scale.
a Numbers in brackets are the numbers of districts remaining after excluding districts with urban population <5% or >95% and fewer than 50 participants in low or high SES category. Numbers without brackets are the final numbers for analysis (after excluding districts with fewer than 20 cases jointly in the low and high SES category for each risk factor).
iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
d These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
The household wealth quintiles for NFHS-4 respondents were created using the same methodology. A more detailed description of the construction of the wealth indices in the NFHS-4 is provided by the DHS program [14]. The assets (ownership of durable goods) and key housing characteristics that were used to construct the household wealth index in each survey are listed in Table 29.
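The wealth-index construction described above (first principal component of binary asset indicators, standardized to mean zero and standard deviation one, then cut into quintiles) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the survey teams' implementation: the PCA here uses the covariance matrix and plain power iteration, the sign of the component is fixed so that owning more assets gives a higher score, and ties at quintile boundaries are broken by input order. Function names are hypothetical, and the stratification by urban and rural areas is omitted.

```python
import math


def first_pc_scores(rows, iters=500):
    """Scores on the first principal component of binary asset indicators.

    rows: list of equal-length lists of 0/1 values, one list per household.
    Uses power iteration on the sample covariance matrix (an assumption;
    a correlation-matrix PCA would standardize each column first).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # orient the component so more assets means a higher score
        v = [-x for x in v]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


def standardize(values):
    """Rescale to mean zero and (sample) standard deviation one."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def quintiles(values):
    """Assign quintile 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(values) + 1
    return result


# Toy data: ten households, three hypothetical assets.
households = [
    [0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1],
]
wealth_index = standardize(first_pc_scores(households))
wealth_quintile = quintiles(wealth_index)
```

To mirror the text, the same steps would be run separately on the urban and the rural subsample, and the quintile step would be repeated within each district for the district-level quintiles.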
The construction of educational attainment categories is presented in Table 30.
iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable.
d These models included household wealth quintile as level 1 independent variable.
iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
d These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.

Table 12
Results from multilevel linear regressions for the interaction between district-level socio-economic development and participants' education and household wealth: High blood pressure (NFHS-4). a,b

NFHS-4
Absolute difference (% points) P
a All multilevel linear regression models i) had high blood pressure as outcome variable; ii) contained a random intercept for district; iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level primary school completion rate, district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
d These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
intercept for district; iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable.
d These models included household wealth quintile as level 1 independent variable.
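The model structure listed in footnotes a, c, and d of Table 12 can be written out as a single equation. The notation below is an assumption introduced for illustration and is not taken from the study; it also assumes that each model contains one district-level indicator at a time and, for brevity, treats the SES variable as a single binary contrast rather than a set of category indicators.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{SES}_{ij} + \beta_2\,D_j
       + \beta_3\,(\mathrm{SES}_{ij} \times D_j)
       + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_{ij}$ is the binary outcome for participant $i$ in district $j$, $\mathrm{SES}_{ij}$ is educational attainment or household wealth quintile, $D_j$ is the district-level development indicator, $\mathbf{x}_{ij}$ holds five-year age group, sex, and urban/rural residency, and $u_j$ is the random intercept for district. Under this reading, $\beta_3$ is the interaction term: the change in the absolute SES gap (in probability units) per unit change in $D_j$.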

Table 14
Results from multilevel linear regressions for the interaction between district-level socio-economic development and participants' education and household wealth: High blood glucose (NFHS-4). a,b

NFHS-4
Absolute difference (% points) P
a All multilevel linear regression models i) had high blood glucose as outcome variable; ii) contained a random intercept for district; iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level primary school completion rate, district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
d These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable.
d These models included household wealth quintile as level 1 independent variable.

Table 16
Results from multilevel linear regressions for the interaction between district-level socio-economic development and participants' education and household wealth: Diabetes (assuming AHS participants were not fasted). a,b

DLHS-4 AHS
Absolute difference (% points) P
contained a random intercept for district; iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level primary school completion rate, district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
d These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level primary school completion rate, district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c In this analysis only male participants were included.
d These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
e These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
iii) had five-year age group, sex, urban/rural residency as level 1 (the individual level) independent variables; and iv) district-level primary school completion rate, district-level median household wealth, Gross Domestic Product (GDP) per capita, the percentage of participants in a district living in an urban area, and district female literacy rate as level 2 (the district level) independent variable.
b The numbers in square brackets are 95% confidence intervals.
c In this analysis only female participants were included.
d These models included educational attainment as level 1 independent variable and an interaction term between educational attainment and the district-level indicator.
e These models included household wealth quintile as level 1 independent variable and an interaction term between household wealth quintile and the district-level indicator.
Fig. 11a. Association of a district's primary school completion rate with the difference in the continuous household wealth index between highest and lowest household wealth quintile (computed for each district). The asset score was standardized by subtracting the mean and dividing by one standard deviation. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. We excluded districts with fewer than 20 participants in the highest or lowest household wealth quintile.
Fig. 11b. Association of a district's primary school completion rate with the difference in the continuous household wealth index between highest and lowest household wealth quintile (computed for each district) stratified by urban-rural residency. The asset score was standardized by subtracting the mean and dividing by one standard deviation. This analysis was performed separately for urban and rural areas. The grey line through the scatterplots has been fitted using ordinary least squares regression (with each data point in the plot having the same weight). The p-value shows whether the slope of the grey line is significantly different from zero. We excluded districts with fewer than 20 participants in the highest or lowest household wealth quintile.
were regressed onto the district-level indicators displayed in the columns.
b District-level variables (as continuous variables) were centered and scaled by subtracting the mean and dividing by two standard deviations prior to fitting these models.
c The numbers in square brackets are 95% confidence intervals.
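The rescaling of district-level variables mentioned in footnote b (subtracting the mean and dividing by two standard deviations) is shown below as a minimal sketch. The use of the sample standard deviation is an assumption, since the footnote does not say which estimator was used, and the function name is hypothetical.

```python
import statistics


def scale_by_two_sd(values):
    """Center a district-level variable and divide by two standard deviations.

    Uses the sample standard deviation (n - 1 denominator); this choice is
    an assumption. The result has mean zero and standard deviation 0.5.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / (2 * sd) for v in values]


# Hypothetical district female literacy rates (%).
literacy = [42.0, 55.0, 61.0, 70.0, 83.0]
literacy_scaled = scale_by_two_sd(literacy)
```

After this transformation, a regression coefficient describes the change in the outcome associated with a two-standard-deviation increase in the district-level indicator, which makes coefficients for indicators measured in different units (for example, GDP per capita and literacy rate) easier to compare.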