High-resolution model for estimating the economic and policy implications of agricultural soil salinization in California

This work introduces a generalizable approach for estimating the field-scale agricultural yield losses due to soil salinization. When integrated with regional data on crop yields and prices, this model provides high-resolution estimates for revenue losses over large agricultural regions. These methods account for the uncertainty inherent in model inputs derived from satellites, experimental field data, and interpreted model results. We apply this method to estimate the effect of soil salinity on agricultural outputs in California, performing the analysis with both high-resolution (i.e. field scale) and low-resolution (i.e. county-scale) data sources to highlight the importance of spatial resolution in agricultural analysis. We estimate that soil salinity reduced agricultural revenues by $3.7 billion ($1.7–$7.0 billion) in 2014, amounting to 8.0 million tons of lost production relative to soil salinities below the crop-specific thresholds. When using low-resolution data sources, we find that the costs of salinization are underestimated by a factor of three. These results highlight the need for high-resolution data in agro-environmental assessment as well as the challenges associated with their integration.


Introduction
Maintenance and intensification of agricultural practices will be critical to meeting the nutritional demands of the world's growing population. One widely practiced method of intensification, crop irrigation, can also lead to unintentional soil degradation and reduced crop yield when the total dissolved solids (TDS) concentration of the irrigation water is high or a substantial fraction of water is lost to direct evaporation [1,2]. Quantifying the economic and social costs of soil salinization is critical to assessing conditions under which technology or policy intervention is necessary to correct market inefficiencies [3].
Soil salinization, the process by which dissolved solids in the irrigation water accumulate in the root zone as irrigation water evaporates or transpires, is problematic in arid regions, in regions dependent on groundwater, in regions that have adopted water-conserving irrigation practices, and in regions with shallow, impermeable soil layers [4]. In the last case, the applied water forms a perched water table, from which salts can be transported back to the surface through capillary action [5]. Once salinized, agricultural soils become less productive due to the combination of osmotic and ionic stress exerted on the plant [6,7]. Growers may choose to fallow salinized land, leading to land use change [8] and greenhouse gas emissions [9,10], or continue to cultivate less productive salinized land by switching crops and/or adjusting management practices, leading to reduced revenues and intensification of agricultural inputs (e.g. water).
Best management practices for mitigating the effects of soil salinization in high-risk agricultural areas have largely been derived from laboratory and field-scale experiments that elucidate the mechanisms by which soil salinization diminishes crop yield. These studies have probed the relationship between irrigation water quality and soil salinity, have provided yield reduction models that relate expected yield to soil salinity for a specific crop [7,11–13], and have evaluated the efficacy of various remediation strategies. The most prevalent remediation strategy is salinity leaching, or the excess application of irrigation water to drain soils of existing salt content. Field experiments over a range of soil and crop types have been used to identify leaching fractions capable of maintaining soil salinity at moderate levels, but these leaching guidelines may not be implemented due to practical constraints on water availability or environmental discharge [14–17].
In addition to the work seeking to identify best practices for managing saline soils at the farm-scale, a separate body of literature has quantified the regional extent of soil salinization. Combining local inventories of affected land area with expert judgment, the UN Global Assessment of Human-induced Soil Degradation (GLASOD) program estimated that 76 million hectares, an area larger than France, were affected by human-induced salinization in 1991 [1]. More recent data compilation efforts suggest that China, Australia, and Pakistan are all experiencing the negative impacts of agricultural soil salinization [5,18,19]. Beyond land inventories, hydrological models have estimated the rate of salt flux over large areas [20] and informed predictions about future salt accumulation under current management practices.
These regional assessments of soil salinity also serve as the basis for work quantifying the future social and economic impacts of soil salinization. Typically formulated as an optimization problem in which individual agents maximize their profits given constraints and costs of the inputs to production, these studies estimate changes in the economic output of a region relative to a baseline year. Examples include estimated revenue losses due to inefficient management of 14.5% in the Tungabhadra project in western India [21]; additional profit losses of $1.8 to $3.6 billion annually by 2030 over 2008 levels in the Central Valley of California [22]; and annual profit losses of 44%-87% in the Murray-Darling Basin due to the combined effects of climate change and salinity [23]. Each of these approaches quantifies losses relative to a baseline scenario, rather than the total current losses incurred, making it difficult to assess the full value of soil remediation efforts.
These loss estimation approaches typically rely on aggregated data for either crop or soil parameters, while yields are determined by field-scale processes. Unfortunately, using aggregated data can introduce bias into the estimate, as demonstrated in the climate adaptation and downscaling literature [24]. Similarly, the characteristic resolution of data can have large impacts on the final output of the analysis [25,26]. While these effects have seldom been studied with regard to the agricultural optimization models employed to assess soil salinization, there are examples of these effects within agricultural models generally [27].
The present work makes three contributions. First, we develop a novel method for quantifying the absolute yield and revenue losses attributable to soil salinization. Integrating high-resolution satellite data, interpolated ground measurements, and county level yields and prices, we extend grower models traditionally applied to field-scale processes to understand regional-scale trends in the productivity of critical agricultural regions. Salinity and crop data are analyzed at the pixel scale (30 m by 30 m resolution), allowing precise modeling of specific soil conditions and avoiding the use of regional averages for spatially sensitive parameters. Second, we incorporate techniques to remove bias from our satellite-based estimates using coarse resolution validation data as a basis for quantifying classifier accuracy. This step is important in achieving correct results, and is documented in appendices B and D. Finally, we compare the model output to results from an analogous model using coarser resolution data to assess the influence of data resolution on the magnitude of the estimated salinity impacts. We apply these models to estimate 2014 yield and revenue losses from soil salinization in the state of California and to study the effects of spatial aggregation in model estimation.

Methods
We assess the costs of salinization by analyzing high resolution pixel-level data. These costs are quantified by calculating yield and revenue losses relative to a hypothesized non-salinized baseline state. After analyzing the model with high-resolution data sources, we apply the same approach to data aggregated at the regional level. We then assess the advantages of using the computationally intensive disaggregated data in comparison with more aggregated data sources.
2.1. Disaggregated approach for estimating yield and revenue losses from soil salinization
We quantify current losses due to soil salinization in terms of yield (tons) and revenue (dollars). Yield loss at each pixel is calculated using (1), where $Y^L_p$ is the yield lost ($Y^L$) due to salinity at pixel $p$. $P_{c^*,c}$ is the probability that, given that the satellite-based crop classifier indicates that a pixel contains a particular crop $c^*$, the pixel actually contains crop $c$. This term is used to remove bias from the estimate, a procedure discussed further in appendix B. $F_{p,c}$ is the fraction of maximum yield achieved given existing levels of salinity and choice of crop, estimated in (2). $Y^M_c$ is the theoretical maximum yield, or the estimated crop-specific yield in the absence of soil salinity, and is estimated in (3).
Environ. Res. Lett. 12 (2017) 094010
Equation (2) models the crop salt tolerance response, as originally developed in Maas and Hoffman [12], where $S^S_p$ is the soil salinity at a particular pixel and $a_c$ and $b_c$ represent the crop-specific threshold and slope response. The function is piecewise linear, with $F_{p,c}$ equaling 1 until $S^S_p$ reaches $a_c$, then linearly decreasing at rate $b_c$ until reaching 0. In (3) the theoretical maximum yield, $Y^M_c$, is calculated by dividing observed regional data on yields ($Y_{r,c}$) by regionally averaged $F_{p,c}$ values, where $n$ represents the number of pixels in region $r$. This results in a single estimate of average maximum yield $Y^M_c$ for each crop in each region, which captures differences in soil fertility, climate, and technology between regions.
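As a minimal sketch of the response function in (2) and the maximum-yield calculation in (3), the following Python assumes that salinity is expressed in dS/m and that the slope is a fractional yield loss per dS/m. The function names and the example threshold, slope, and yield values are hypothetical and are not taken from table A1.

```python
def relative_yield(salinity, threshold, slope):
    """Piecewise-linear salt tolerance response (fraction of maximum yield).

    Returns 1 up to the threshold, then declines linearly at `slope`
    (fraction per unit salinity) and is floored at 0.
    """
    if salinity <= threshold:
        return 1.0
    return max(0.0, 1.0 - slope * (salinity - threshold))


def theoretical_max_yield(observed_regional_yield, pixel_salinities, threshold, slope):
    """Divide the observed regional yield by the regional mean of F over pixels."""
    fractions = [relative_yield(s, threshold, slope) for s in pixel_salinities]
    mean_fraction = sum(fractions) / len(fractions)
    if mean_fraction == 0.0:
        raise ValueError("mean relative yield is zero; maximum yield is undefined")
    return observed_regional_yield / mean_fraction


# Hypothetical crop: threshold 2 dS/m, slope 0.1 per dS/m (assumed values).
salinities = [1.0, 2.0, 4.0, 7.0]   # pixel salinities in one region
observed = 3.0                       # observed regional yield, tons per acre
y_max = theoretical_max_yield(observed, salinities, threshold=2.0, slope=0.1)
```

Because the observed yield already reflects salinity damage, the estimated maximum yield is never smaller than the observed regional yield.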
Following this method, we estimate total area-wide yield losses $Y^{TL}$ (tons) using (4), where $k$ is a coefficient that converts the intensity of yield from acre$^{-1}$ to pixel$^{-1}$.
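The conversion coefficient can be illustrated with a short calculation. This sketch assumes that $k$ is simply the area of one 30 m by 30 m pixel expressed in acres (using the international acre of 4046.8564224 m²); the per-pixel loss values are hypothetical.

```python
SQUARE_METERS_PER_ACRE = 4046.8564224   # international acre
PIXEL_SIDE_M = 30.0                     # pixel edge length in meters

# Acres covered by one pixel: converts a per-acre intensity to a per-pixel quantity.
k = PIXEL_SIDE_M ** 2 / SQUARE_METERS_PER_ACRE


def total_loss(per_acre_losses):
    """Sum per-acre losses evaluated at each pixel and scale by pixel area."""
    return k * sum(per_acre_losses)


# Hypothetical yield losses (tons per acre) at five pixels.
pixel_losses = [0.0, 0.4, 1.2, 0.0, 0.9]
total_tons = total_loss(pixel_losses)
```

Under this assumption each pixel covers roughly 0.22 acres, so about four and a half pixels make up one acre.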
Similarly, revenue loss $R^L_p$ is obtained by estimating the fraction of theoretical maximum revenue $R^M_{p,c}$ that is realized given the salinity-impacted yields (equations (5) and (6)). The formulation for revenue loss parallels that of yield lost in (1), with the addition of regional prices $p_{r,c}$ to convert maximum yield to maximum revenue. $R^L_p$ is translated into total revenues lost $R^{TL}$ by summing over all pixels in the study and multiplying by the correction factor $k$.
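For reference, one set of expressions consistent with the definitions in this section is given below. This is a reconstruction from the surrounding text rather than the published typesetting; the equation numbering and the explicit upper bound of the declining segment in (2) are assumptions.

```latex
\begin{align}
Y^L_p &= \sum_c P_{c^*,c}\,\bigl(1 - F_{p,c}\bigr)\,Y^M_c \tag{1}\\
F_{p,c} &= \begin{cases}
1, & S^S_p \le a_c\\
1 - b_c\,(S^S_p - a_c), & a_c < S^S_p < a_c + 1/b_c\\
0, & S^S_p \ge a_c + 1/b_c
\end{cases} \tag{2}\\
Y^M_c &= \frac{Y_{r,c}}{\tfrac{1}{n}\sum_{p \in r} F_{p,c}} \tag{3}\\
Y^{TL} &= k \sum_p Y^L_p \tag{4}\\
R^M_{p,c} &= p_{r,c}\,Y^M_c \tag{5}\\
R^L_p &= \sum_c P_{c^*,c}\,\bigl(1 - F_{p,c}\bigr)\,R^M_{p,c} \tag{6}\\
R^{TL} &= k \sum_p R^L_p \notag
\end{align}
```

Written this way, the revenue expressions differ from the yield expressions only by the price factor $p_{r,c}$, which matches the statement that the two formulations are parallel.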
2.2. Aggregated approach for estimating yield and revenue losses from soil salinization
The aggregated approach mimics the disaggregated approach, but substitutes regionally aggregated estimates in place of pixel-level crop acreage estimates and salinity values. First, regional salinity values $S^S_r$ are calculated by averaging pixel-level salinity values across the region. Next, the fraction of maximum yield achieved is estimated (7). Regional yield losses are calculated by estimating the theoretical maximum yield $Y^M_c$ (8) and assessing the impact of salinity $S^S_r$ on yields and revenues (9, 10). Lastly, estimates of total yields lost $Y^{TL}$ and revenues lost $R^{TL}$ are obtained by multiplying the per-acre revenue and yield losses ($Y^L_{r,c}$, $R^L_{r,c}$) by regional crop acreages ($A_{c,r}$) and summing over all crops and regions.
2.3. Case study: California yield and revenue losses from soil salinization
We apply these methods to assess the effects of soil salinization on yields and revenues in the state of California. California is the highest grossing agricultural state in the United States, with 2013 cash receipts of $46.4 billion, or 12% of US agricultural totals [28]. Growers in the arid Central Valley of California are dependent on irrigation to sustain agricultural output and have long been plagued with soil salinization issues. Reduced yields from soil salinity are likely to be exacerbated during periods of drought, when application of leaching water is curtailed. We combine statewide agricultural statistics from the California Department of Water Resources (DWR) with national statistics from the National Agricultural Statistics Service (NASS) to populate the yield reduction model, with county-level data serving as regions. We include the 20 highest-grossing crops in California in the study (table A1). When combined, these crops account for over 95% of the non-livestock agricultural cash receipts in the state [29].
In addition to these twenty crops we account for fallowed land, bringing the total crop categories to 21. Crop data are released as 30 m square pixels, which define the characteristic resolution of the case study.
The data are sourced from a variety of agencies that publish at irregular intervals, requiring the combination of data from 2013 and 2014 in the loss analysis. The two driving data sources, crop patterns and salinity, are both for 2014. Prices and yields are for 2013, but have low year-to-year variation. We thus consider the year of analysis to be 2014. Full detail on data sources is given in appendix A.

Statistical analysis
We perform a statistical analysis to determine the magnitude of the linear correlation between salinity and four parameters: crop marginal value, crop salt tolerance, estimated yield reduction, and estimated revenue losses per acre (table G1). To perform the regression, we use the vector salinity data and aggregate the four parameters up to the same scale using zonal averages. The regression is estimated using a generalized additive model (GAM) to control for latitude and longitude using a thin plate spline regression. See appendix G for additional detail.

Results
This method provides the first quantitative estimate of lost yields and revenues due to soil salinity at a sufficiently high resolution (30 m) for both field and region-level decision making. We demonstrate the value of these methods using the state of California as a case study. [...] crops (r = −0.30). When regressing each of these four parameters of interest on salinity, we observe a statistically significant response for each variable's coefficient (see table G1). We find that salinity is correlated with lower relative yields (b = −5.38), higher revenue losses (b = 364.9), higher crop tolerance (b = 0.034), and lower crop revenues (b = −304.46). Each of these coefficients is significant at the p < 0.001 level.
Salinity values are highest in the Imperial Valley (located in southeast California along the border with Mexico) and the southern Central Valley. Relative yield is driven by two parameters: soil salinity and crop salt sensitivity (table A1). Although we observe that growers compensate for elevated levels of soil salinity by planting salt tolerant crops on salinity-impaired fields (figure 1(a)), relative yield remains lowest where salinity values are highest.
The spatially resolved data from figure 1(b) are replotted in figure 2(a) as a cumulative distribution function (CDF), which relates the fraction of agricultural land in California to the percentage of relative yield. According to the Cropland Data Layer published by USDA, approximately 1.7 million acres of California farmland are fallowed and produce no agricultural output, encoded with a relative yield of zero. Another 1.6 million acres have reduced agricultural yield, reporting salinity in excess of the tolerance threshold of the current crop mix. The existing salinity levels on the final 4.8 million acres of agricultural land are unlikely to affect yield for the current crop mix. Aggregating across all agricultural farmland in California, we estimate that soil salinization reduces crop yields by 8.0 million tons annually.
Reduced agricultural yields result in lost revenue. Revenue losses (figure 1(c)) are highest in the western San Joaquin Valley, with losses as high as $3000 per acre on select fields. While yield losses are higher in the Imperial Valley, revenue losses are less substantial as growers are primarily planting lower revenue crops, such as alfalfa. The CDF of revenue lost per acre is reported in figure 2(a). At the state level, we estimate that soil salinization reduced grower revenues by $3.7 billion, or 7.9% of California agricultural output, in 2014. We find that this value likely ranges between $1.7 billion and $7.0 billion and that nearly all the uncertainty in our estimate can be attributed to uncertainty in the salinity measurements (figure D1). A similar analysis of lost calorie production is presented in appendix F.
Lost yield and lost revenue for individual crops are plotted in figure 2. The other high uncertainty crops (OHUC) category is an amalgamation of the seven crops that could not reliably be identified using remote sensing techniques, as described in appendix D (figure D1). Together, almonds, strawberries, grapes, and alfalfa account for approximately half (49.5%) of the total $3.7 billion in annual revenue loss. Revenue lost is a function of soil salinity, crop sensitivity, crop marginal revenue, and total crop acreage. While these four crops account for a large percentage of total acreage, they also experience higher yield reductions on a per acre basis. Relative yield averages 85% for these crops, compared to a 94% average yield for all other crops.

Results of adopting an aggregated approach to estimating yield and revenue losses from soil salinization in California
A similar approach is applied using regional data. Cropping data come from county-level statistics (see appendix C for detail) and salinity data are aggregated to the county level (figure 1(d)).
There are several prominent differences between results from the aggregate and disaggregate analyses. The first is that all of the estimated lost revenues occur in just 10 counties, compared with the 40 counties with estimated revenue loss using the disaggregated approach. Of the 10 counties with lost revenues, 97% of the losses occur in three counties (Imperial, Kings, and Merced). This contrasts with the wider distribution of losses that are estimated by the disaggregated model. Two effects are driving the differences between the aggregated and disaggregated approaches. The first is that salt-sensitive, high revenue crops are not likely to be grown on soils with salinity levels equal to the county average. Indeed, the spatially resolved satellite data shows a negative relationship between salinity and crop marginal revenue and a positive relationship between salinity and salt tolerance (appendix E). This causes an upward bias in loss estimates.
The second effect is that if elevated salinity is confined to a small geographic area within the county, then the average salinity for the county may be low. While models based on disaggregated data estimate damages in the area with elevated salinity, models based on aggregated data inherently smooth over variability in field-level soil salinity. As a result, these aggregated modeling approaches may miss salinity losses if the county average salinity is below the threshold for the cultivated crops. This effect will cause a downward bias in the final estimate.
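The second effect can be illustrated with a toy calculation. The sketch below assumes a piecewise-linear response with a hypothetical threshold of 2 dS/m and slope of 0.1 per dS/m, applied to ten hypothetical pixels; none of these values come from the study data.

```python
def relative_yield(salinity, threshold, slope):
    """Piecewise-linear response: 1 below the threshold, then linear decline to 0."""
    if salinity <= threshold:
        return 1.0
    return max(0.0, 1.0 - slope * (salinity - threshold))


THRESHOLD, SLOPE = 2.0, 0.1                  # hypothetical crop parameters
pixel_salinity = [1.0] * 8 + [6.0] * 2       # elevated salinity confined to two pixels

# Disaggregated: evaluate the response at every pixel, then average the losses.
pixel_losses = [1.0 - relative_yield(s, THRESHOLD, SLOPE) for s in pixel_salinity]
disaggregated_loss = sum(pixel_losses) / len(pixel_losses)

# Aggregated: average salinity first, then evaluate the response once.
mean_salinity = sum(pixel_salinity) / len(pixel_salinity)
aggregated_loss = 1.0 - relative_yield(mean_salinity, THRESHOLD, SLOPE)

print(f"county mean salinity: {mean_salinity:.1f} dS/m")
print(f"disaggregated loss fraction: {disaggregated_loss:.2f}")
print(f"aggregated loss fraction: {aggregated_loss:.2f}")
```

In this example the county mean sits exactly at the threshold, so the aggregated calculation reports no loss, while the pixel-level calculation reports an average loss of 8%.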
At the state level, the aggregated model estimates annual revenue losses of $1.0 billion, compared with the $3.7 billion estimated in the disaggregated approach. The aggregated estimate lies outside the $1.7–$7.0 billion uncertainty range obtained from the uncertainty analysis in appendix C.
The differences in the two estimates may result from either the different crop data sources (i.e. satellite vs regional surveys) or the spatial resolution of the analysis. To eliminate the effect of the differing crop data sources, we perform a third analysis where we run the aggregated model using pixel-level satellite data that has been aggregated at the county level to estimate regional crop acreage. When we perform this analysis, we find that the aggregated satellite data results in similar cumulative damages ($1.2 billion) to those estimated using regional cropping data ($1.0 billion) and leaves figure 1(d) unchanged. Thus, the lower yield and revenue loss estimates stem directly from performing the analysis at the county, rather than pixel, scale.
Lastly, in appendix H, we analyze the effect of using higher resolution salinity data sourced from satellite measurements, which are available for a subset of California [47,48]. We find even higher damage estimates when using these data, supporting our conclusion that lower resolution data may underestimate revenue loss.

Discussion
In analyzing agricultural systems there has historically been a tradeoff between the scale of the analysis and its resolution. Field measurements are highly accurate, but can be costly to collect at sufficient density over large regions. Regional estimates provide data at broader scales, but are typically limited in their ability to describe a variable's spread and correlation with other variables, factors which are of critical importance in assessing spatially distributed processes such as salinization. Recent improvements in remote sensing, combined with modern data storage and processing, are helping to circumvent the scale/resolution tradeoff. By continuously collecting measurements with both high spatial and temporal resolution, orbital sensors are capable of describing a variable's entire distribution as well as its spatial correlation with other covariates. As the accuracy and availability of remotely sensed data increase, effectively integrating this information with more traditional, regional data sources offers significant promise for improving the accuracy of regional level agricultural policy analysis.
In this study, we present a novel method that integrates high-resolution satellite data, interpolated ground measurements, and county-level yields and prices to estimate the regional effects of soil salinization on agricultural productivity. The estimates are performed at the field-scale, allowing the model to capture the local variation inherent in agricultural systems. The method has broad applicability for testing alternative management practices and policies at the regional level. Moreover, the generalized approach may serve as a template for integrating multi-modal data to assess the economic effects of soil degradation on agriculture at high resolution over regional scales.
While the general approach to estimating the revenue impacts of soil salinization is valid, the proposed methods are limited by their inability to predict crop switching, as well as their continued reliance on regional averages for a number of the inputs. First, the model quantifies losses based on current cropping patterns and does not account for how producers might adjust their production practices given changes in resource availability, soil quality, or other factors. For example, as salinity levels decrease, growers are likely to switch from lower value, salt tolerant species to higher value, salt sensitive crops. The current analysis does not account for the theoretical gain of revenue that growers would accrue due to switching crops, meaning the current model is likely to underestimate the revenue losses from soil salinity. In order to address this limitation, we would need to expand to a spatially resolved, multi-year data set. Applying panel data methods to such data would allow for the estimation of switching costs, and accounting for these costs could provide an avenue for estimating the likely change in crop mix as salinity levels decrease. Second, the model is limited by the available data. The salinity data are sourced from interpolated ground measurements which are smoothed through time and space, increasing the probability that reported values deviate from actual field conditions. Emerging techniques for remotely collecting soil salinity data, which rely on either airborne electromagnetic surveys or salinity estimates from orbital sensors [34–36], will further enhance the resolution of soil salinity estimates and reduce uncertainty in yield reduction analyses.
Lastly, the formulas approximating the relationship between leaching fraction, soil salinity, and crop yield represent average responses of each crop class. They do not account for variation in other soil parameters or management practices that influence yield in specific fields, including irrigation practices, soil organic carbon and micronutrient concentrations, or the use of salt tolerant cultivars. To account for this underlying variability, we perform extensive uncertainty analysis on the yield response function as described in appendix C. Future improvements in the remote sensing of soil quality parameters and crop yield may enable stronger statistical prediction of yield response. If successful, revised yield response functions would be easily incorporated into this analysis framework.
The limitations of the method are critical for interpreting the results of the case study. The estimated yield and revenue losses for California are calculated directly from current cropping patterns and do not account for crop switching. Accounting for crop switching is likely to increase the estimated revenue losses, though the effect on relative yield is less clear. The effect of incorporating local estimates for salinity, yield, and water use in place of regional ones is also uncertain, and depends on underlying correlations between the data. For instance, it is possible that theoretical maximum yield is correlated within region to areas with either lower or higher salinity levels, an effect that would cause our results to be biased upwards or downwards, respectively. Directly measuring these correlations and accounting for their effects is a significant motivation for seeking higher resolution data.
Despite these limitations, multi-modal models for estimating the effects of soil salinization on agricultural productivity offer valuable insight into the magnitude of yield and revenue losses in vulnerable regions. The estimated 2014 losses of 8.0 million tons of yield and $3.7 billion in revenue are a significant fraction of state agricultural outputs of 69 million tons with a combined worth of $46 billion. To put this in perspective, these estimates suggest that the yearly economic damages due to soil salinization are of comparable scale to the yearly damages associated with the California drought in 2014 and 2015 [37,38]. While the uncertainty analysis (appendix C) accounts for multiple years of data, the results presented are for 2014 and may differ under non-drought conditions.
We have shown that salinity has large impacts on Californian agriculture. Leaching, the primary strategy for managing soil salinity, is likely to be further constrained in California as a result of the high selenium content of agricultural drainage discharge [39]. At the same time, drought and reduced snowpack water storage are expected to limit the water supply critical to leaching practices. Alternatives to salinity leaching include land fallowing or the application of lower salinity irrigation water, leading to climate impacts [9,10] or the need for costly water treatment systems [3,40]. In short, salinization is likely to remain a significant societal and technological challenge in arid regions such as California. Successful management will rely on accurate monitoring and assessment coupled with impact analyses that are performed at a spatial scale that captures the underlying mechanisms of yield loss.
Appendix A: Data for case study
A.1. Study area
A map is provided in figure A1 as a spatial reference for various geographic features of interest in California. Table A1 reports the crops assessed in this study. These 20 crops correspond to those crops with highest revenues as outlined in the statewide 2013 crop report [28]. Together, they correspond to over 95% of the revenues generated by the non-livestock agricultural sector.

A.2. NASS county commissioner data
Yields and prices are obtained from the NASS County Agricultural Commissioner's Data [29]. These data are published yearly and report statewide crop yields and prices at county-level resolution. In table A2, the crop names used in this study are paired with their corresponding NASS names and commodity codes. If multiple NASS crops are listed for a single study crop, we calculate the weighted average of the yields and prices.
If no yield or price data were available for a particular crop in a particular county, the values reported for 'sum of others' were used instead. For a single crop (walnuts) no values were available for 'sum of others,' and we substituted state averages. All reported values are for 2013.
Alongside each crop in table A1 we display the threshold and slope parameters for the yield reduction model (2). These parameters are collected from a number of studies carried out in the mid-twentieth century that were first summarized in Maas and Hoffman [12] and subsequently updated and republished. While presented in numerous publications in varying degrees of completeness, we found no discrepancies between the values reported in the different articles and reports [7,11,12,41]. Where possible, we use values in more recent publications. For three crops (walnuts, pistachios, and oranges), no direct threshold and slope parameters were given. Rather, they were categorized into one of several tolerance groupings (sensitive, moderately sensitive, moderately tolerant, and tolerant). For each of these categories, a representative threshold and slope parameter were chosen based on graphical representations reported in Hoffman [41] and other publications.
In order to use the agricultural county commissioner's data, we first match the 20 study crops with NASS commodity codes [29]. The crops corresponded to NASS commodities with a 'one-to-many' relationship. Table A2 reports the mapping used in this study. Once mapped, we develop county and state level datasets containing information on yield, prices, and revenue per acre. These datasets inform key parameters in our study.
A.3. NASS cropland data layer
NASS produces the Cropland Data Layer (CDL) satellite-based crop classifier [42]. The CDL distinguishes between 132 distinct crops with an overall accuracy rating of 84.9%. Table D1 relates crops in this study to CDL object identifiers and provides the producer and user accuracy for each crop. Producer accuracy represents the fraction of ground-truth points of a given crop that are accurately classified in generating the map, that is, the likelihood that a random crop will be correctly rendered. User accuracy represents the likelihood that a given pixel on the map is actually what is found in the field. These ground-truth points are used to remove bias from the estimates, as discussed in appendix B. All reported values are for 2014.
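To make the two accuracy measures concrete, the sketch below computes them from a small confusion matrix. The matrix layout (rows are ground-truth classes, columns are mapped classes), the crop names, and the counts are hypothetical and do not reproduce table D1.

```python
def producer_accuracy(confusion, classes):
    """Fraction of ground-truth points of each class that the map labels correctly."""
    return {
        c: confusion[i][i] / sum(confusion[i])
        for i, c in enumerate(classes)
    }


def user_accuracy(confusion, classes):
    """Fraction of pixels mapped as each class that truly belong to that class."""
    return {
        c: confusion[j][j] / sum(row[j] for row in confusion)
        for j, c in enumerate(classes)
    }


classes = ["almonds", "alfalfa", "fallow"]
# confusion[i][j]: ground-truth points of class i that the map assigns to class j.
confusion = [
    [80, 15, 5],
    [5, 90, 5],
    [15, 5, 80],
]

producer = producer_accuracy(confusion, classes)
user = user_accuracy(confusion, classes)
```

The two measures differ whenever misclassifications are asymmetric: here alfalfa has a producer accuracy of 90% but a lower user accuracy, because points from the other classes are also mapped as alfalfa.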
A.4. Gridded soil survey geographic (gSSURGO) database
SSURGO is a nationwide dataset developed from the National Cooperative Soil Survey (NCSS). NCSS is a collaboration between federal, state, and private institutions, led by the US Department of Agriculture and the Natural Resources Conservation Service, with the goal of disseminating information about the state of soils across the country. The SSURGO map scale is between 1:12 000 and 1:63 360, and SSURGO is the most detailed soil survey product available from the program [43].
The FY2015 gSSURGO database is a December 1, 2014, snapshot of the soil data mart database released in the Environmental Systems Research Institute, Inc. (ESRI) file geodatabase format at the state level. Vector data are released as map units, including the 456 249 map units spanning California that have a median area of 0.12 km² and average area of 0.92 km². Vector data are converted to raster format to improve computational performance.
Electrical conductivity (EC) is measured at the 'component' level, a unit of soil classification smaller than map units. No spatial data are available for components, and so to connect the EC measurements to a specific geographic location each component is first referenced to the map unit in which it is contained. Next, the map unit EC value is calculated by taking the weighted average (weights determined by area) of the component level data, then calculating a second weighted average through the A and B soil horizons. In order to arrive at the EC estimate for the component, SSURGO aggregates many local measurements. While the individual measurements are not released, a reported representative value is accompanied by the top and bottom of the observed range, allowing us to account for uncertainty in the salinity estimate (figure C1).
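The two-step averaging described above can be sketched as follows. The data layout, the field names, and the use of horizon thickness as the weight for the second average are assumptions made for illustration; they are not the gSSURGO schema.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean; raises if the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def map_unit_ec(components, horizon_thickness_cm):
    """EC of a map unit: area-weighted over components, then weighted over horizons.

    components: list of dicts with 'area_fraction' and 'ec' (dS/m per horizon).
    horizon_thickness_cm: dict mapping horizon name to its weight (assumed thickness).
    """
    areas = [c["area_fraction"] for c in components]
    # Step 1: area-weighted average across components, one value per horizon.
    ec_by_horizon = {
        h: weighted_mean([c["ec"][h] for c in components], areas)
        for h in horizon_thickness_cm
    }
    # Step 2: weighted average through the soil horizons.
    horizons = list(horizon_thickness_cm)
    return weighted_mean(
        [ec_by_horizon[h] for h in horizons],
        [horizon_thickness_cm[h] for h in horizons],
    )


# Hypothetical map unit with two components and A and B horizons.
components = [
    {"area_fraction": 0.75, "ec": {"A": 2.0, "B": 4.0}},
    {"area_fraction": 0.25, "ec": {"A": 6.0, "B": 8.0}},
]
ec = map_unit_ec(components, {"A": 25.0, "B": 75.0})
```

The same function could be run on the low and high ends of the reported range to bracket the map unit estimate, again under the assumed data layout.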
The SSURGO dataset is derived from field measurements and is continuously updated to reflect changing soil conditions. The labor intensity of measuring soil quality parameters and spatial extent of the dataset limit the frequency of resurveying, with most measurements occurring in regions with the greatest rate of soil quality change. To account for uncertainty in soil quality estimates, we vary the SSURGO data in a sensitivity analysis in appendix C.

A.5. Computing
The total analysis was performed in 20 h of processing time using a combination of Python 2.7.11, ESRI ArcGIS 10.2.2 accessed through ArcPy, NumPy 1.10.1 [30], and IPython 4.0.1 [31]. All plots were made using Matplotlib 1.5.0 [32]. Regression analyses were performed in R 3.1.1 using mgcv package 1.8.0 [33]. The analysis was run on a desktop computer with an Intel i7 3.4 GHz processor with 16GB of installed RAM.

Appendix B: Bias removal
Both models used in the main paper rely on field-scale estimates of crop type sourced from a satellite-based crop classifier and combine this estimated crop type with soil maps of salinity and region-level information on management practices, crop water-use, yields, and prices. Small differences in classifier accuracy between different crop types can introduce a large bias into the estimate, even if the classifier in aggregate produces highly accurate results [44]. This bias can be systematically removed by using accurate, though sparse, ground measurements to correct widely available, though biased, satellite classification. While several approaches for removing bias have been discussed in the literature, in this study we make use of a direct estimator due to its straightforward implementation and relative efficiency [45].
To remove bias, a confusion matrix is constructed from ground-truth data that indicates both the probability that a pixel is correctly classified and the probability that a pixel is misclassified as each of the other possible categories. From this, an arbitrary value $Z_{p,c}$ calculated using crop-specific information is compiled into a pixel estimate of the same quantity $Z_p$ by using the following transformation:
$$Z_p = \sum_c P_{c^*,c} \, Z_{p,c}$$
where the subscript $c$ indicates crop, the subscript $p$ indexes the pixel, and $P_{c^*,c}$ is the probability that, given the classifier indicates that a pixel contains a particular crop $c^*$, the pixel actually contains crop $c$.
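The direct estimator described above amounts to a probability-weighted sum over the crops a pixel might actually contain. The sketch below illustrates this with a hypothetical two-crop confusion matrix. The counts, the crop values, and the function name are assumptions made for illustration only.

```python
import numpy as np

def debias_pixel_value(z_pc, p_true_given_classified, classified_idx):
    """Pixel estimate Z_p as the sum over crops c of P[c*, c] * Z[p, c].

    z_pc                    : crop-specific values for this pixel, one per crop c
    p_true_given_classified : matrix with rows c* (classified) and columns c (actual)
    classified_idx          : index of the crop c* reported by the classifier
    """
    return float(np.dot(p_true_given_classified[classified_idx], z_pc))

# Hypothetical ground-truth counts: rows = classified crop, columns = actual crop.
counts = np.array([[90.0, 10.0],
                   [20.0, 80.0]])
# Normalize each row so that P[c*, c] = P(actual = c | classified = c*).
P = counts / counts.sum(axis=1, keepdims=True)

# Hypothetical crop-specific value (e.g. revenue loss) if the pixel held each crop.
z_pc = np.array([100.0, 300.0])
z_p = debias_pixel_value(z_pc, P, classified_idx=0)
```

With these numbers, a pixel classified as the first crop receives 0.9 × 100 + 0.1 × 300 rather than the uncorrected value of 100.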

Appendix C: Uncertainty analysis
The key parameter in our study is the total revenue lost due to salinity, calculated at $3.7 billion annually. Several key uncertainties in this calculation should be tested to assess the robustness of our estimate. First, the spatially resolved 30 m pixel salinity data generated by the SSURGO project contain considerable smoothing. A single point estimate of soil electrical conductivity is generated for each 'map unit'. Map units have a median area of 0.12 km², meaning that the median map unit comprises 4000 pixels, each reporting the same salinity value. Since salinity can vary over relatively short distances, this smoothing introduces uncertainty into our estimate. SSURGO reports a low and high value for each parameter alongside the representative value. We assess the uncertainty parametrically by repeating the analysis using 'low', 'medium', and 'high' salinity values. The medium salinity scenario is identical to the analysis in the main paper, while for the low and high analyses we apply the low and high electrical conductivity values, respectively.
Additional uncertainty arises from county-level estimates of prices and yields. While these are likely very accurate for the 2013 crop year, it is possible that this crop year was an anomaly in either yields or prices for some of the crops in the study. To assess the likelihood of anomalies driving our result, we collect ten years of crop data (2004–2013) and calculate the standard error on the trendline [29]. Specifically, for each crop we regress prices and yields on year, where each observation $i$ corresponds to a single year's data. The dependent variables, $p_i$ and $y_i$, are state average prices and yields. The standard error of prices and yields is calculated by taking the root mean squared error of the residuals. As with salinity, we calculate a 'low', 'medium', and 'high' scenario for both prices and yields. The medium scenario is calculated as in the main paper, while the low and high scenarios are calculated by subtracting and adding the de-trended standard error to the 2013 estimates, respectively.
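The de-trended standard error and the resulting scenarios can be sketched as below. The sketch assumes a linear trend in year, which the text implies with 'trendline' but does not state explicitly. The function names and the example series are hypothetical.

```python
import numpy as np

def detrended_standard_error(years, values):
    """Root mean squared error of residuals around a fitted linear time trend."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Assumed functional form: value = intercept + slope * year + residual.
    slope, intercept = np.polyfit(years, values, deg=1)
    residuals = values - (intercept + slope * years)
    return float(np.sqrt(np.mean(residuals ** 2)))

def scenarios(estimate, standard_error):
    """Return the (low, medium, high) values around a single-year estimate."""
    return estimate - standard_error, estimate, estimate + standard_error

# Hypothetical four-year price series, constructed so the residuals are +/-1.
example_years = [2010, 2011, 2012, 2013]
example_prices = [11.0, 11.0, 13.0, 17.0]
se = detrended_standard_error(example_years, example_prices)
low, medium, high = scenarios(example_prices[-1], se)
```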
In addition to rerunning the analysis with each individual parameter set at its low, medium, and high values, we perform a final analysis with all parameters set at their low, medium, and high values. While setting all values simultaneously low or high is likely to be unduly pessimistic or optimistic, it is a useful step in understanding if total uncertainty is driven primarily by a single parameter or by the combination of parameters.
In figure C1, the low, medium, and high scenarios are reported. The year-to-year variation of prices and yields has little impact on the final analysis, as indicated by the relatively similar estimates of revenue loss. Uncertainty in the salinity data, on the other hand, drives a relatively large variation. The low and high scenarios for salinity correspond to estimates of revenue lost of $1.5 and $6.7 billion, making up the majority of the $1.7–$7.0 billion range.

Figure C1. Revenue losses under low, medium, and high scenario cases for each parameter used in this study. For each line, the other variables are held at their best guess, with the exception of the 'All' line, in which each parameter is set at low, medium, and high.
Environ. Res. Lett. 12 (2017) 094010

Appendix D: Performance of crop classifier
NASS conducts an internal assessment of its crop classifier by collecting ground-truth estimates of actual cropping patterns and comparing them to predictions from the crop classifier. The accuracy can be quantified using producer and user accuracy. User accuracy is the likelihood that, given a crop is classified in a pixel as c, it is actually c in the field. Producer accuracy is the likelihood that, given a crop is actually c in the field, it is classified correctly as c in the prediction. Producer and user accuracies for the 2014 CDL for California are reported in table D1. Most crops have relatively high accuracies, though vegetable and fruit crops (e.g. broccoli, lettuce, strawberries) are identified less consistently than the field and tree crops. Celery, in particular, is never selected by the classifier and is thus omitted from this analysis.
All estimates in the study are bias-corrected using information from table D1, as discussed in Methods.
To test the performance of the NASS crop classifier, we compare bias-corrected land estimates from the crop classifier with the land estimates from the NASS county commissioner data (figure D1). If the ratio of these two values is above 1.0, the crop classifier is likely overestimating the representation of a particular crop, while if it is below 1.0 the classifier is likely underestimating its representation. Even after removing bias, seven crops are not estimated within 30% of their true value: broccoli, carrots, celery, corn, lettuce, peaches, and peppers. Comparing with table D1, these crops generally report lower accuracies than the crops that are more accurately identified. We aggregate these crops into a single category, other high uncertainty crops (OHUC), when reporting their values in the main manuscript. Further detail on OHUC crops is available in appendix D. Note that the CDL is developed for crop year 2014 while the NASS data correspond to 2013, meaning that some of the variation noted in figure D1 may be due to temporal mismatch.
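The ratio test described above can be sketched as a simple screening step. The acreage figures and crop labels below are hypothetical placeholders, not values from figure D1, and the function name is an assumption.

```python
def flag_high_uncertainty(classifier_acres, survey_acres, tolerance=0.30):
    """Return {crop: (ratio, flagged)}.

    ratio   : classifier acreage divided by survey acreage
    flagged : True when the ratio is not within +/- tolerance of 1.0
    """
    results = {}
    for crop, survey in survey_acres.items():
        ratio = classifier_acres[crop] / survey
        results[crop] = (ratio, abs(ratio - 1.0) > tolerance)
    return results

# Hypothetical acreages for two unnamed crops.
classifier = {"crop_a": 120_000.0, "crop_b": 45_000.0}
survey = {"crop_a": 110_000.0, "crop_b": 80_000.0}
flags = flag_high_uncertainty(classifier, survey)
ohuc = sorted(crop for crop, (_, flagged) in flags.items() if flagged)
```

Here `crop_b` would be grouped into the OHUC category because its ratio falls below 0.70, while `crop_a` would be reported individually.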

Appendix E: Grower adaptation
We observe that growers are likely modifying their crop choices in accordance with extant salinity values. In figure E1(b), crop salt tolerance is plotted spatially. It is quantified as $Y_{50}$, or the salinity value at which yields are expected to be 50% of maximum levels. Figure E1(b) shows that in the Central Valley, crop salt tolerance is qualitatively higher in the southern and western regions where salinity is elevated, suggesting that salinity may be a factor in crop selection. In the northern and southern extremes of the state, crop salinity tolerance values are also high, driven largely by the large amounts of alfalfa grown in these regions.
We observe that the average soil salinity by crop and salt tolerance levels are positively correlated, while crop salt tolerance levels and marginal revenues are negatively correlated (figure E2). These two facts illustrate the tradeoff experienced by agricultural producers: they can either produce lower value, high tolerance crops with high yields or high value, low tolerance crops with reduced yields.

Appendix F: Estimating calorie losses associated with soil salinization in California
While caloric losses are not of primary interest in the specialty crop dominated Californian agriculture system, we demonstrate here that the methods applied in the main paper can be used to analyze the loss in human nutrition as a result of land degradation.
Caloric losses were determined by multiplying the estimates of lost yield at the pixel level by the energy density in calories per ton for each crop. Energy density estimates were sourced from the FAO report on Food Composition for International Use [46]. Certain crops, such as alfalfa and cotton, are not typically consumed and are therefore assigned a caloric density of 0 calories g⁻¹ even though, in the case of alfalfa, the crop may contribute indirectly to human caloric intake.
We find that the majority of the calories lost are associated with almonds and rice, due to a combination of their high acreage and high caloric density (figure F1). The OHUC category also features prominently, driven primarily by the inclusion of corn. Aggregate losses across all crops total 6.0 million person-years, assuming a 2000 calorie day⁻¹ requirement.
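The conversion from lost yield to person-years can be sketched as follows. The 365-day year and the function names are assumptions, and calories are expressed in the same unit as the 2000 calorie daily requirement in the text.

```python
def caloric_loss(lost_tons, calories_per_ton):
    """Calories lost for one crop: lost yield times energy density."""
    return lost_tons * calories_per_ton

def person_years(lost_calories, daily_requirement=2000.0, days_per_year=365.0):
    """Number of person-years of dietary energy represented by a calorie total."""
    return lost_calories / (daily_requirement * days_per_year)

# Working backwards from the reported aggregate of 6.0 million person-years
# gives the calorie total that figure implies under these assumptions.
calories_per_person_year = 2000.0 * 365.0
implied_calories = 6.0e6 * calories_per_person_year
```

Under these assumptions one person-year is 730 000 calories, so the reported aggregate corresponds to about 4.4 × 10¹² calories.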
We find that different crops are driving aggregate caloric losses than those that are driving aggregate revenue losses. Figure F1(a) makes this difference clear by plotting losses for each crop. Those crops towards the lower right quadrant have high calorie losses but low revenue losses (e.g. rice) while those in the upper left quadrant (e.g. strawberries) have relatively higher revenue losses in comparison with their calorie losses. Almonds, due largely to their large gross acreage, lead in both metrics.

Appendix G: Statistical analysis
We performed a brief statistical analysis to determine the correlation between salinity and four parameters: crop marginal value, crop salt tolerance, estimated yield reduction, and estimated revenue losses per acre (table G1). The goal of the analysis was to assess the relative magnitude of the correlation between salinity and the parameters of interest as well as to test the correlation for significance.
The salinity data used in other parts of this study are stored in raster format to speed computation. They are, however, originally released as a vector file, with 456 249 individual polygons spanning California. In this section we revert to the polygon format so as to avoid artificially inflating our sample size. Our four parameters of interest are each spatially averaged within the polygon to construct the dependent variable. After removing those polygons with no crops, we are left with a sample size of 296 987 data points.
Spatial data often violate the independent and identically distributed (IID) assumption of ordinary least squares (OLS) due to correlation in the error term. While the estimated OLS coefficients remain unbiased, inference testing becomes inappropriate as the standard errors are downward biased. Common approaches for handling this issue (e.g. generalized least squares, spatial lag model, spatial error model, spatial Durbin model) typically require the estimation of the correlation structure codified in a spatial weights matrix $W_{ij}$. This matrix, if stored in a dense format, has size n × n where n is the number of data points. While size constraints can be lessened by imposing cutoffs based on distance and storing data in a sparse format, we found it difficult to construct a weighting matrix with a reasonable structure given our large sample size.
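A quick calculation illustrates why a dense spatial weights matrix is impractical at this sample size. The 8-byte entry size (double precision) and the ten-neighbor cutoff are assumptions chosen for illustration.

```python
def dense_weights_bytes(n, bytes_per_entry=8):
    """Memory needed for a dense n x n weights matrix."""
    return n * n * bytes_per_entry

def sparse_weights_entries(n, neighbors):
    """Stored entries if each observation keeps only a fixed number of neighbors."""
    return n * neighbors

n_polygons = 296_987  # sample size reported in the text
dense_gb = dense_weights_bytes(n_polygons) / 1e9
sparse_entries = sparse_weights_entries(n_polygons, neighbors=10)
```

Under these assumptions the dense matrix needs roughly 706 GB, far more than the 16 GB of RAM on the desktop used for the analysis, whereas a ten-neighbor sparse structure stores about three million entries.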
Instead, we fit a generalized additive model (GAM) in an effort to control for spatial location. GAMs are non-parametric models that are used to estimate the effect of linearly independent covariates on a dependent variable. Since GAMs avoid specifying the parametric form of the regressors, they are able to capture complex nonlinear behavior.
We perform four regressions, one for each of the four parameters of interest, while controlling for location. Latitude and longitude are smoothed together using a thin plate spline regression, the parameters of which are fitted using generalized cross validation. The general form is given in (G.1), where $y_i$ represents a single observation $i$ of one of the four parameters of interest and $SS_i$ is soil salinity.
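A form of (G.1) consistent with this description can be written as follows. The symbols $f$ (the thin plate spline smooth of latitude and longitude) and $\varepsilon_i$ (the error term) are notation assumed here, while $y_i$, $SS_i$, $b_0$, and $b_1$ follow the text.

```latex
y_i = b_0 + b_1 \, SS_i + f(\mathrm{lat}_i, \mathrm{lon}_i) + \varepsilon_i \qquad \text{(G.1)}
```

In this form, $b_0$ is the intercept, $b_1$ is the linear effect of salinity, and the smooth term absorbs variation explained by location alone.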
We find that both the $b_0$ and the $b_1$ parameter in each of the four regressions are highly significant. While we are primarily interested in the magnitude and significance of the effect of salinity ($b_1$), the intercept ($b_0$) is also informative since it can be interpreted as the estimated value of the parameter when there is no salinity present in the soils. Regressing marginal revenue [$ acre⁻¹] on soil salinity [dS m⁻¹] results in a positive intercept of $6245.44 and a negative slope of −$304.46. Marginal revenue is calculated as simply the observed price per ton multiplied by the observed yield per acre, both resolved at the county level. The intercept indicates that crops being grown at locations with zero salinity have an expected value of $6245.44 per acre, and that with each unit increase in salinity, the marginal value decreases by $304.46. This effect is likely due to the observed trend that crops with higher marginal revenues exhibit lower salt tolerance (figure E2(b)).
Salt tolerance is calculated by solving for the salinity value at which crop yields would be reduced by 50%. Regressing salt tolerance [dS m⁻¹] on salinity [dS m⁻¹] results in an intercept of 7.47 and a slope of 0.034. The positive slope indicates that, as salinity increases, farmers are observed planting more salt-tolerant crop species. While the effect is statistically significant, the slope is low in magnitude.
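Solving for the 50% yield salinity can be sketched as below, assuming the yield response in equation (2) of the main paper takes a threshold-slope form, in which relative yield stays at 100% up to a crop-specific threshold and then declines linearly. That functional form, the function names, and the example parameters are assumptions made for illustration, not crop values from the study.

```python
def relative_yield(ec, threshold, slope):
    """Relative yield in percent under an assumed threshold-slope response.

    ec        : soil salinity in dS/m
    threshold : salinity below which yield is unaffected, in dS/m
    slope     : percentage points of yield lost per dS/m above the threshold
    """
    raw = 100.0 - slope * (ec - threshold)
    # Truncate to the physically meaningful range of 0 to 100 percent.
    return max(0.0, min(100.0, raw))

def y50(threshold, slope):
    """Salinity at which relative yield falls to 50% of its maximum."""
    return threshold + 50.0 / slope

# Hypothetical crop: threshold of 2.0 dS/m and a 10% loss per dS/m above it.
ec_50 = y50(threshold=2.0, slope=10.0)
check = relative_yield(ec_50, threshold=2.0, slope=10.0)
```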
Relative yield and revenue losses are both estimated (not observed) parameters that take salinity as a direct input (equations 2 and 6), making it unsurprising that the regressions report statistically significant correlations. The slope in the relative yield [%] equation is negative, indicating that for each unit increase in soil salinity relative yield decreases by 5.38 percentage points. Revenue losses increase with soil salinity, registering a $364.90 increase per unit increase in soil salinity. The intercepts of both regressions indicate a slight misspecification, with zero salinity registering 101.2% yield and −$25.32 revenue losses. This misspecification is likely a result of the truncated nature of salinity response in equation (2), and is relatively small in magnitude.

Appendix H: Estimating damages from satellite salinity measurements
In this analysis, we estimate how salinization damages may differ given higher-resolution satellite salinity data. The salinity data are estimated using Landsat 7 surface reflectance data [47,48]. The size of each satellite salinity pixel is less than 1% of the median size of the SSURGO polygons used in the main analysis, allowing us to further test the effect of resolution on the resulting damage estimates. The satellite salinity dataset only covers a subset of California (see figure H1), so the comparative analysis presented in this appendix is limited to regions in California where both satellite salinity and SSURGO salinity measurements are available.
Figure H1. Geographic range of satellite salinity dataset.
The procedure is to first compute the annual revenue loss predicted by satellite salinity data using the disaggregated approach described in the main paper, and then compare the output with the loss predicted by polygon salinity data in the same area. In addition to using best-guess salinity values, we carried out an uncertainty analysis by constructing a revenue loss range from low and high salinity scenarios for both datasets. For SSURGO polygon data, the high and low salinity ranges are the same as presented in appendix C. For satellite raster data, the high and low ranges are computed by adding and subtracting, respectively, the mean absolute error (2.90 dS m⁻¹) computed in Scudiero et al [48] from the reported values.
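The satellite high and low scenarios can be sketched as a shift of each pixel value by the reported error of 2.90 dS m⁻¹. Clipping the low scenario at zero is an assumption added here so that salinity stays non-negative. The text does not say how negative values were handled, and the function name and pixel values are hypothetical.

```python
import numpy as np

SATELLITE_ERROR = 2.90  # dS/m, error value reported in the text

def satellite_scenarios(ec):
    """Return (low, best_guess, high) salinity arrays for satellite pixels."""
    ec = np.asarray(ec, dtype=float)
    # Assumption: negative salinity is not physical, so floor the low case at zero.
    low = np.clip(ec - SATELLITE_ERROR, 0.0, None)
    high = ec + SATELLITE_ERROR
    return low, ec, high

# Hypothetical pixel values in dS/m.
low, best, high = satellite_scenarios([1.0, 5.0])
```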
Side-by-side maps provide a visual comparison of the salinization cost distribution. According to the results shown in figure H2, satellite revenue loss is distributed evenly, while the SSURGO polygon revenue losses are more clustered. The regularity seen in the SSURGO salinity data can be attributed to the coarser spatial resolution of the polygon salinity data.
We present the revenue losses under the high, best guess, and low scenarios in table H1. Under the best guess salinity scenario, the satellite prediction yields a loss that is almost 50% higher than the cost estimated using the SSURGO polygon dataset. This result can be ascribed to two separate factors. First, the mid-level of salinity is higher in the satellite dataset, resulting in a greater salinization cost when predicted by satellite data. Second, satellite salinity has a finer spatial resolution. In contrast, the SSURGO dataset assigns a single salinity value to each polygon, and the median polygon is more than 100 times the size of a satellite pixel. Consequently, the salinity variations within each polygon are smoothed out, leading to underestimated total revenue losses when predicted by SSURGO polygon data.
The uncertainty range of salinization cost under the high and low scenarios is large for both salinity datasets, which lowers the precision of the best estimate of revenue loss. Looking ahead, improvements in remote sensing technologies that measure soil parameters, including salinity in the agriculturally significant range, as well as crop cover type are likely to reduce the bias in predicting crop yield response. With more accurate soil property data, researchers and policy makers will be able to estimate damages due to soil salinization more precisely.