Environmental risk factors for reduced kidney function due to undetermined cause in India

Supplemental Digital Content is available in the text.

working environments, 12,13 (3) low altitude, 14,15 (4) exposure to pesticides via agricultural farming practices, 4,16-18 and (5) exposure to heavy metals and other contaminants via potable water sources. [19][20][21][22] Exposure to extreme ambient temperatures can cause dehydration and kidney volume loss, resulting in mortality from exacerbations of an existing chronic disease. 9,23 Studies investigating the heat hypothesis in relation to CKDu have shown that recurrent heat exposure together with extreme physical exertion and inadequate rehydration can lead to CKD in the absence of common risk factors diabetes, hypertension, or glomerulonephritis. 23,24 Furthermore, toxic agents [25][26][27] in soil and water can spread to wider communities via displacement or absorption into the food chain and can adversely affect susceptible individuals, particularly those with unhealthy lifestyles (i.e., heavy drinkers and/or smokers) and harsh working conditions (long hours conducting high-intensity labor). 18,28 Altitude is hypothesized to have a protective effect against reduced eGFR 10,14 with the hormone Erythropoietin thought to help slow renal disease progression, as it attenuates interstitial fibrosis and reduces apoptotic cell death, which is known to be a major contributing factor to the loss of renal function. 29 It has been postulated that the altitude threshold range for protection against renal damage can be observed between 250 and 1500 m. This estimate; however, this was from a single study, and this association will require further investigation. 14 Although most cases of CKDu have been reported across Central American countries, there is growing evidence that this disease is also present in India, where the prevalence is estimated to be as high as 73% in selected southern rural communities. 6 To date; however, no multifactorial environmental exposure studies have been conducted in India, and therefore it is currently unknown whether these risk factors are indeed relevant to the disease observed in the community, and whether environmental risk factors are observed across all CKDu-endemic regions are similar. Our study, therefore, aims to conduct exploratory environmental analyses in urban and rural areas across Northern and Southern India, to investigate whether the risk factors of temperature, altitude, and vicinity to agricultural land are associated with a low eGFR.

Study settings and participants
We used cross-sectional data from two population-based studies conducted in India: the "Centre for Cardiometabolic Risk Reduction in South Asia" cohort study (CARRS study) 30 and the "Implementing a comprehensive diabetes prevention and management program" study (UDAY study). 31 Both studies collected socioeconomic, anthropometric, and biosample data including household income, body mass index, blood pressure, and serum creatinine measures. Details on study design, participant selection, and variables collected for these studies have been previously described. [30][31][32] The CARRS study is a representative sample of adults ≥18 years of age (n = 12,270) between 2010 and 2011 in two urban sites in North (n = 6906) and South (n = 5364) India. The northern site was located in India's capital city New Delhi which covers a 1483 km 2 area and has a population of 16,787,941. 33 The southern site was in Chennai, the capital of Tamil Nadu state which covers an area of 426 km 2 and has a population of 8,653,521 34 ( Figure 1). We used data from both cross-sectional surveys which comprised 1798 participants from New Delhi and 3193 participants from Chennai.
The UDAY study was conducted on adults ≥30 years in urban and rural sites in Sonipat district, Haryana, North India, and Visakhapatnam district in Andhra Pradesh in South India. In both districts, the program was implemented in a sample of 100,000 participants in each rural and urban subsite with a total population of 400,000 participants. We used data from the first cross-sectional survey conducted among the general population (n= 12,243; Sonipat: n = 6208; Vizag: n = 6035) between July and December 2014. Bio samples were collected from 10,452 participants (Sonipat: n = 5110; Vizag: n = 5342). In Sonipat, the urban site was in Sonipat city (n = 3104), which covers a 388-km 2 area, and has a population of 1,450,000. 33 The rural site was in the Kharkhoda subdistrict (n = 3104) which covers 278 km 2 with a population of 135,844 33 ( Figure 1). The southern sites were located in Andhra Pradesh on the South East coast in the Visakhapatnam district (Vizag) which covers 11,161 km 2 , with a population of 4,290,589. 33 The urban site was in the city of Visakhapatnam (n = 2966), and the rural site was in mandals (which are similar to an administrative area) Makavarapalem and Nathavaram (n = 3069) (Figure 1). 31 The dataset comprised 1540 participants from urban Sonipat, 1640 participants from rural Sonipat, 1228 participants from urban Visakhapatnam, and 1816 participants from rural Visakhapatnam were included in the analysis.
We investigated the associations of the following environmental risk factors: (1) ambient temperature, 23,35 (2) altitude, 10,36 and (3) residential proximity to agricultural land 37 with eGFR as a continuous variable and risk of eGFR <60 as a categorical marker of CKD stage 3 or worse. 1,38 Proximity to agricultural land is increasingly being studied as a proxy to agrochemical exposure due to resource and financial constraints of effectively measuring pesticides in laboratories. Using satellite-derived environmental data, we assigned these environmental exposures to participant residential coordinates which were collected over the CARRS and UDAY study periods. 30,31

Data sources
We used satellite-derived heat index (HI), altitude, and land cover data to estimate participants' exposure.
To estimate the combined effects of temperature and humidity on the human body, we used an HI. [39][40][41][42] The HI is a commonly used exposure proxy for heat stress in environmental health studies [43][44][45] as it provides a "feels like" heat measure and is considered preferable to measuring air temperature alone. 41,42 We used the retrospective ERA-5 Land re-analysis dataset from the European Centre for Medium-Range Weather Forecasts which has a spatial resolution of 9 km. 46 We extracted monthly averaged 2-m height temperature and 2-m height dew point variables between October 2010-November 2011 and July 2014-December 2014 to capture the timeframes across which the CARRS and UDAY studies were conducted, respectively.
First, using the R studio "Weathermetrics" package we calculated the relative humidity using the temperature and dew point data variables using the following equation: where R = relative humidity, T d = Dewpoint, T = Temperature. 47 We then calculated the HI for each grid point in Celsius units using the following equation: . .
-.9 99 10 6 2 2 × − T R where HI = heat index; T = ambient temperature; R = relative humidity. 48 Using ArcGIS software, we modeled a continuous HI surface across the study sites, using the probabilistic interpolation method Kriging. For geostatistical modeling, the structure of spatial variation is estimated through the semivariogram which is a visual depiction of the covariance exhibited between each pair of points in the sampled data and determines the weights to apply to data points when developing predictions. 49 The variogram is expressed as: where h denotes the translation between any two sites s i and s j in a study area. 50 We conducted a sensitivity analysis comparing the Kriging method with other interpolation methods inverse distance weighting and Empirical Bayesian Kriging (EBK) to assess the best overall fit. We used a holdout cross-validation approach by which the dataset is randomly divided into a training and validation set (Table S1, Supplemental Digital Content 1; http://links.lww.com/EE/A153). The model is trained on the training dataset and evaluated on the validation dataset. 51 The overall fit was evaluated on the basis of the root mean squared error (RMSE) and the mean actual error. Sensitivity analyses showed that Kriging was the most accurate interpolation method in comparison to inverse distance weighting and EBK (Tables S2-S5 To create a continuous altitude surface across the study sites, we used a digital elevation model (DEM) from the Shuttle Radar Topography Mission, a global DEM giving coverage of void-filled data at a resolution of 30 m with a vertical accuracy of 20 m. 52 Altitude values were then assigned to the participant residential coordinate. For land cover, we used The European Space Agency Climate Change Initiative programme raster products which have a 300-m resolution 53 and a typology comprising 22 classes 53 including cropland and urban cover. To capture and assign a land cover class in the immediate neighborhood of each participant, we placed a 300-m buffer around each residential coordinate to match the land cover data resolution. We assigned the mode land cover class inside each buffer to capture the most common class for each participant and then grouped classes into "cropland" and "urban" cover classes. All environmental exposure values were assigned to residential coordinates using ArcGIS software version 10.5.1 by ESRI.

Data cleaning and coding
The dataset was pre-restricted for those with missing serum creatinine, age, and sex variables, as were those diagnosed or self-reported with diabetes (fasting plasma glucose ≥126 mg/dl), hypertension (systolic blood pressure ≥140 mm Hg, or diastolic blood pressure ≥90 mm Hg) and proteinuria [albumin/creatinine ratio (ACR) in urine ≥300 mg/g]. Detail on how these variables were measured are described elsewhere. 30,31 We assigned participant household coordinates to the restricted dataset using participant ID numbers. For more parsimonious models, we re-grouped income categories into three groups "unknown," "<30,000RS," and "≥ 30,000RS" which represented the midpoint of the salary categories. Those with missing coordinate data were also excluded.
Kidney function is measured using the eGFR calculated using serum creatinine, age, and sex variables. 38 Normal kidney function is defined by an eGFR >90 ml/min/1.73 m 2 body mass area. 54 For this dataset, eGFR was calculated using the Chronic Kidney Disease-Epidemiology Collaboration equation using serum creatinine, age, and sex data. 38

Statistical analyses
We used linear regression models to estimate the associations between eGFR and socioeconomic, anthropometric, and environmental exposure variables; and logistic regression models to estimate odds ratios (ORs) and corresponding 95% confidence intervals (CI) for risk of having a low eGFR (<60 ml/min/1.73 m 2 ) in relation to the aforementioned variables.
We tested each linear and logistic regression model for multicollinearity using the variance inflation factor (VIF) >10 as the threshold for exclusion. Due to multicollinearity between HI and altitude, we modeled these variables separately in the linear and logistic regression models.
Having ascertained the key risk factors for low eGFR and eGFR <60, we assessed the effects of environmental exposures on eGFR alone, accounting for different geographical locations. To do this, we used Linear mixed models (LMMs) to model associations between eGFR and environmental exposures between different geographical locations, adjusting for age and sex. LMMs are an extension of simple linear models which allow both fixed and random effects and are used when there is nonindependence in the data. The model assumes that observations in the same cluster (or study site) are more correlated than those in another study site, 55 that the average eGFR will differ between sites, and that the effect of environmental exposures on eGFR is stable across the study site. We modeled associations between environmental exposures HI, altitude, and land cover with eGFR. In each model, the fixed effects were eGFR and the satellite-derived environmental exposures listed above, and the random structures were stated with the subcategory "urban/rural area" which represented each study site. We then calculated a fixed-effects model which illustrates how each environmental exposure affects eGFR across the study sites.
All statistical analyses were conducted in R Studio version 3.5.1.
The HI values assigned to participants ranged from 23.95 to 30.31°C ( Figure 3C), and approximately 45% of participants lived within 300 m of "cropland" (n = 4951), 70% of which lived in rural areas (n = 3449) ( Figure 3B). The altitude ranged from 1 to 391 m above sea level ( Figure 3A). See Table 1 for a summary of environmental characteristics per study site.

Risk factors for reduced eGFR and eGFR <60
Using linear regression models, we tested for multicollinearity between altitude and HI, as these are generally inversely correlated. In fully adjusted models we observed multicollinearity between altitude (VIF 25.06) and HI, and north/south latitude with HI (VIF = 12.06). HI can be used as a proxy for latitude, as broadly temperatures in the north of India are generally cooler than in the South. 56 Tables 2 and 3 show linear and logistic regression models including crude estimates, models mutually adjusted for age and sex (model 1), and models fully adjusted for all socioeconomic and environmental risk factor variables (model 2). We observed large decreases in effect estimates between crude and minimally adjusted linear regression models specifically for sex, proximity to cropland, and education. After further stratified analyses, age was the key driver of these changes. In general, male participants were older than females and had a lower eGFR, and a higher proportion of older participants (with a lower eGFR) live in proximity to cropland than those in urban areas. In the case of education, an increasing number of school years were inversely associated with age. These factors likely explain these stepped differences in effect estimates when adjusted for age.
Overall, the coefficients do not vary widely between minimally and fully adjusted models and we report the results from the fully adjusted linear and logistic regression models only (Tables 2 and 3, model 2, respectively).
In linear regression models (Table 2, model 2), age was a key risk factor for reduced eGFR, with a decrease of 9.   Linear mixed models Figure 4 shows the results of the LMM for India which has been modeled with a hierarchy of state and urban/rural area as the random structures, and environmental exposures HI, altitude, and cropland as the fixed effects. The results indicate that the Southern study sites in Tamil Nadu and Andhra Pradesh had the lowest ranking mean eGFR values, and the northern urban sites in Delhi and Haryana had the highest. The mean eGFR values in rural Haryana (northern India) did not deviate from the overall population sample mean. The fixed-effects model (Table 4) shows the effects of the environmental exposures, HI, altitude, and cropland on eGFR in each study site, adjusted for age and sex. Residential proximity to cropland had a small negative association with mean eGFR [−0.89 (−0.44, −0.14)]. A weak protective association was observed with increasing temperatures [0.53 (0.89, 1.14)].

Discussion
Our environmental epidemiological analysis was conducted in states across Northern and Southern India; CKDu is known to be endemic in the latter region. We used data from 11,119 adult participants to assess whether key hypothesized environmental exposures such as HI, altitude, and proximity to land cover type were associated with having an eGFR <60 ml/min/1.73 m 2 in the absence of diabetes, hypertension, or proteinuria. Across all study sites, increasing age and sex (males) were associated with both low eGFR and increased risk of eGFR <60.
The environmental conditions observed across this population (higher temperatures, lower altitude, and much of the population living in and around agricultural land) concur with the environmental characteristics associated with CKDu across the literature. 13,37,57,58 The highest prevalence of CKDu was observed in Andhra Pradesh. This observation also corroborates with findings from other Indian studies 6,59 which found that the   Exchange rate (RS to USD) 0.001 at time of questionnaire; Hypertension = systolic bp ≥140 mm Hg, or diastolic bp ≥90 mm Hg; Diabetes = fasting glucose ≥7 mg/l; Proteinuria = ACR [Albumin Creatinine Ratio] ≥30 mg/mmol. *Effect estimate for altitude modeled separately from heat index due to multicollinearity.
prevalence of CKDu was markedly higher in this region in comparison with other urban and rural sites.
Of the environmental risk factors, vicinity to cropland was associated with both low eGFR and risk of eGFR <60. This finding corroborates with a previous study in El Salvador whereby proximity to specific crop types was significantly associated with suspected CKDu mortality and hospital admissions rates. 37 There are few studies that investigated the spatial distribution of CKDu in relation to proximity to land cover; however, there are some studies and reviews in India which investigated the association between CKDu and agricultural occupation, which suggest that those cultivating rice and cashew crops, as well as those using pesticides are the most affected subgroup of agricultural laborers. 16,59,60 Although the current dataset does not contain detailed employment information or high-resolution crop detail, collecting this data will be key in future studies across India to geographically highlight specific subcommunities which may be at a higher risk and help direct future observational and treatment plans in affected regions.
The LMM results presented slightly different observations from the linear and logistic regression models. The southern Indian study sites in Tamil Nadu and Andhra Pradesh had the lowest ranking mean eGFR values, and the northern urban sites in Delhi and Haryana had the highest. Like the linear and Table 3.
Associations of sociodemographic, anthropometric, and environmental characteristics with eGFR <60 in participants without diabetes, hypertension, and heavy proteinuria in India, n = 11,119. logistic regression models, vicinity to cropland had a negative association with eGFR. Interestingly, a weak positive association was observed with increasing HI and eGFR. This observation does not support the hypothesis of heat stress nephropathy that persistent heat exposure and inadequate rehydration leading to CKD and may therefore count against the current heat hypothesis. 9,12,23 Although exposure to high temperatures is a key hypothesis in the literature, there are some important counter arguments that are important to consider. Herath et al. 61 argued that there is sparse evidence of CKDu in workers exposed to heat in most tropical regions, which was supported by the finding that it is also seen in people who are not exposed to heat stress in these affected regions, and therefore there is inadequate evidence for heat being the initiating or main cause of CKDu. In addition to this, a systematic review of risk factors for CKDu in Central America did not identify heat as a risk factor. 3 Our study has some potential limitations. First, our dataset had only single eGFR measures meaning we may not distinguish acute kidney injury from CKD, potentially resulting in case misclassification and inflated prevalence estimates. Furthermore, the use of cross-sectional data cannot prove causality in relation to environmental exposures; however, it can help to generate causal hypotheses which can be further investigated in future cohort studies in affected regions. Second, there were no occupational variables in our dataset, therefore it was not possible to link the proximity to landcover associations to a specific occupational category in this population. In addition, the use of indirect measures of exposure to pesticides could have resulted in exposure misclassification potentially attenuating estimates.
Finally, unmeasured biological confounders across this Indian sample -such as endemic hepatitis which is linked with nephropathy 62-64 -could be linked to excess CKD prevalence was not measured in these populations which could also affect our estimates. Further sensitivity analyses into the effects of these residual confounders such as probabilistic bias analysis 65 could be investigated in future work; however, this is outside the scope if this study.
Strengths of our study include the use of a large, randomly selected sample population in urban and rural areas across the north and south of India. Second, this is the first study in India that has used satellite-derived imagery to investigate associations between environmental risk factors and CKDu.
The use of satellite measurements has advantages over conventional ground measurements as data can be collected repeatedly and automatically and can provide better coverage than ground monitors. The growth in the use of remote sensing and geographic information systems in public health has facilitated the analyses of multiple environment-disease associations at varying geographical resolutions and continues to be a valuable exposure analysis tool, particularly in developing nations that may be too resource-constrained to conduct individual-level environmental exposure analyses.

Conclusions
The findings from this environmental epidemiological analysis show that the environmental risk factor of residential proximity to cropland (particularly in Southern India) appears to have a negative impact on eGFR. Although we must be cautious in our interpretation of these initial observations, these findings are inconsistent with the current hypothesis that CKDu is a heat-induced disease but are reasonably consistent with some of the other hypotheses such as exposure to pesticides using proximity to cropland as a proxy. The use of satellite-derived data to model environmental exposures of CKDu at the individual level is a useful step in identifying subpopulations at risk of CKDu and could help to direct further environmental investigations in affected regions. In future studies, the collection of detailed employment information and occupational practices will be key in helping to identify further potential associations with environmental risk factors. . Linear mixed model caterpillar plot of eGFR in study sites in India accounting for land cover, heat index, and altitude. Intercept denotes the overall mean eGFR across the study sites; blue circles represent the deviation from the mean eGFR for each zone; black bars represent 95% confidence intervals.