The implicit loss function for errors in soil information

The loss function expresses the costs to an organization that result from decisions made using erroneous information. In closely constrained circumstances, such as remediation of soil on contaminated land prior to development, it has proved possible to compute loss functions and to use these to guide rational decision making on the amount of resource to spend on sampling to collect soil information. In many circumstances it may not be possible to define loss functions prior to decision making on soil sampling. This may be the case when multiple decisions may be based on the soil information and the costs of errors are hard to predict. We propose the implicit loss function as a tool to aid decision making in these circumstances. Conditional on a logistical model which expresses costs of soil sampling as a function of effort, and statistical information from which the error of estimates can be modelled as a function of effort, the implicit loss function is the loss function which makes a particular decision on effort rational. After defining the implicit loss function we compute it for a number of arbitrary decisions on sampling effort for a hypothetical soil monitoring problem. This is based on a logistical model of sampling cost parameterized from a recent survey of soil in County Donegal, Ireland and on statistical parameters estimated with the aid of a process model for change in soil organic carbon. We show how the implicit loss function might provide a basis for reflection on a particular choice of sampling regime, specifically the simple random sample size, by comparing it with the values attributed to soil properties and functions. In a recent study rules were agreed to deal with uncertainty in soil carbon stocks for purposes of carbon trading by treating a percentile of the estimation distribution as the estimated value. We show that this is equivalent to setting a parameter of the implicit loss function, its asymmetry. We then discuss scope for further research to develop and apply the implicit loss function to help decision making by policy makers and regulators.


Introduction
The collection of soil information, both inventory and monitoring over time, is sponsored by various end-users including land-managers, regulators and policy-makers. In all cases the end-user must accept that there is uncertainty in the information which they obtain. This uncertainty could result in a cost due, for example, to over- or under-application of a fertilizer, a decision to implement unnecessary land remediation or failure to identify decline in soil quality and respond with appropriate policy. The uncertainty of soil information, given some fixed methodology, depends on the effort that can be deployed in field sampling, and so on the cost to the sponsor. The sponsor is therefore faced with the problem of deciding how much effort it is appropriate to invest in soil sampling.
A rational approach to this problem is to choose the level of investment in soil sampling that maximizes the net benefit to the sponsor, i.e., the benefit from the information over and above the cost of obtaining it. Yates (1949) was, perhaps, the first to point this out formally. To do this requires the specification of a loss function. A loss function expresses the costs incurred by a data-user (which may be an individual, a business or society at large) which result from using some estimate, $\tilde{x}$, of a quantity (for example, an estimate of the mean concentration of available phosphorus in the soil of a field) to make a decision (e.g., a fertilizer rate) when the true value of the quantity is $x_t$. The loss is, in general, non-zero when $\tilde{x} \neq x_t$, i.e., when the information is erroneous. In our example the loss is incurred because of under-application of fertilizer and consequent loss of potential profitable yield ($\tilde{x} > x_t$) or wasteful over-fertilization ($\tilde{x} < x_t$), such that the marginal gain in yield does not cover the marginal cost of the input; other costs may be incurred because of the environmental impact of the surplus nutrient. Because overestimation and underestimation incur losses for different reasons the loss function may be asymmetrical. Given a loss function and an error distribution for the information, one may make a decision which minimizes expected loss (e.g., Journel, 1984; Goovaerts, 1997). Some form of loss function, not necessarily a continuous function of the target variable, may be used to plan optimal sampling for decision-making (e.g., Yates, 1949; Ramsey et al., 2002; Boon et al., 2011) or to make decisions as to whether and how to supplement existing soil data by further sampling (e.g., Marchant et al., 2013).
Such rational planning of soil sampling requires that loss functions can be determined. This is plausible in some cases, where the analysis of decisions based on the soil information is relatively simple (e.g., remediate or do not remediate) and where reasonable values can be obtained for costs under different combinations of decision and future scenarios (chose to remediate, but the land was not contaminated; chose not to remediate, but the land was contaminated, etc.). Some of the most sophisticated analyses of decision-making from uncertain soil information have been undertaken in the context of contaminated land where relatively simple decision trees based on single variables can be defined (e.g., Ramsey et al., 2002). Similar analyses have been undertaken for nutrient sampling at field scale by arable growers (Marchant et al., 2012). There is a wider literature on the use of loss functions for planning and control, particularly in manufacture (e.g., Freisleben, 2008; Pan and Chen, 2013), and these methodologies may be useful in environmental management and regulation. We call loss functions that can be developed in this way explicit loss functions.
In many cases, however, this is not a feasible approach. For example, when considering the design of a national-scale soil monitoring system for the UK, Black et al. (2008) asked sponsors (a range of regulators, government departments and public bodies responsible for environmental management) to give acceptable tolerances on estimates of regional and global mean values of soil properties, and changes in these properties. They then computed the costs of achieving these targets under different sampling regimes. Note that the process of defining acceptable tolerances was not straightforward, and was identified as an area for continued attention. Note also that the process was essentially 'open-loop'. There is no consistent method to evaluate whether the final costs are commensurate with the benefits of achieving the original target precision. Effectively it is assumed that the target precision must be achieved regardless of cost. However, if the sponsor decided that the total cost of the resulting scheme was unaffordable then it is not clear how to proceed, other than by assuming that the cost is fixed and reporting the corresponding precision.
It is, perhaps, not surprising that sophisticated decision analysis is possible for soil sampling on possibly-contaminated land, whereas planning of regional or national-scale soil monitoring and inventory remains 'open-loop'. In the former case there is generally a fairly simple binary decision to be supported (remediate or do not), and the costs under different decisions and scenarios (e.g., of remediating a site prior to development, of undertaking remediation after development on discovery that contaminants do exceed regulatory thresholds, etc.) can be reasonably approximated. For example, Ramsey et al. (2002) use approximate remediation costs, legal costs and liabilities in their case studies. In contrast, a soil monitoring scheme at regional or national scale will serve a range of purposes, not all of them foreseeable, and support a range of decisions and actions the consequences of which it is difficult to predict or quantify, let alone cost. One may therefore think it unlikely that policy makers or their advisors would be any more able to specify explicit loss functions for errors in soil information than they can specify acceptable confidence limits for estimates. This could be regarded as an argument against any attempt to use a cost-benefit analysis when considering the design of soil inventory and monitoring, consistent with the criticisms of the ecosystem services valuation approach (Robinson et al., 2013) as voiced, for example, by Matulis (2014). However, Hansjürgens (2004) suggests, without conceding the broader agenda of monetizing the value of ecosystem components, that approaches based on cost-benefit analysis can provide a useful framework for the collection and evaluation of environmental information. That is the basis of our approach. Specifically we develop the concept of the implicit loss function. Consider a case of the 'open-loop' approach to planning of inventory and monitoring where a sponsor states that 'N samples are affordable'. 
The implicit loss function is the loss function implicit in that decision. That is to say it is the particular loss function which would lead to a selection of sample size N to maximize the benefit of sampling over its costs. In short, the implicit loss function, given some decision on how to undertake sampling, is the loss function under which that decision is rational. Our contention is that, by computing and examining implicit loss functions, one may, without entirely closing the planning loop, provide a basis for more rational reflection on sample effort by examining whether the form of the implicit loss function is congruent with the sponsor's expectations and any valuations of the target soil variable.
In this paper we develop the concept of the implicit loss function. While implicit loss functions have been used in financial analysis, we believe that they are a novel technology in the valuation of environmental information. There are three novel developments in this paper. First, we show that, for a specified sampling strategy which determines the precision of the resulting estimate as a function of sample size (e.g., a simple random sample from a variable of standard deviation σ), a given relationship between sample size and the cost of sampling and a specified asymmetry of the loss function, a unique implicit loss function exists for some specified sample size. Second, we point out that the asymmetry of the general linear loss function is implicit in certain criteria agreed in Australia for valuing soil carbon stocks from uncertain estimates. This suggests that the asymmetry of loss functions could be elicited from data users. Third, we use soil sampling records from a part of Ireland with rugged terrain and relatively sparse communications to develop a simple logistical model for sampling which allows us to estimate costs for particular sampling intensities. On the basis of these we present a hypothetical example of the implicit loss function for a case of monitoring change in soil carbon.

Theory
In this section we review the loss function and its use to determine optimal sample size, and develop the explicit expected loss under normal errors with a linear loss function. We then introduce the implicit loss function.

The loss function and optimal sample size
The most general form of the loss function is

$$L(\tilde{x}\,|\,x_t), \qquad (1)$$

which is the loss incurred as a result of a decision made on the assumption that some variable $X$ takes the value $\tilde{x}$ when the true value is $x_t$. We define the loss as the difference between all costs incurred as a result of the decision, between the present and some future time horizon, over and above any costs that would be incurred as a result of making the decision on the assumption that $X = x_t$. It follows that

$$L(\tilde{x}\,|\,x_t) \ge 0, \qquad L(x_t\,|\,x_t) = 0, \qquad (2)$$

so one may think of $L(\tilde{x}\,|\,x_t)$ as the difference between the value of imperfect information $\tilde{x}$ and perfect information $x_t$. However, the perfect information is never worth less than the imperfect information, but is not necessarily worth more. If, for example, $X$ is the concentration of a soil contaminant and remediation is required if and only if the concentration exceeds a regulatory threshold, $x > x_R$, then the loss function in respect of decisions on remediation is zero for all cases where

$$\tilde{x} > x_R \text{ and } x_t > x_R, \quad \text{or} \quad \tilde{x} \le x_R \text{ and } x_t \le x_R. \qquad (3)$$

In some conditions we may treat the loss function as a function only of the error of $\tilde{x}$ as an estimate of $x_t$:

$$L(\tilde{x}\,|\,x_t) = L(\tilde{x} - x_t). \qquad (4)$$

This loss function assumes that the loss is independent of the absolute value of $x_t$, and is commonly used in discussion of estimation error and its implications. See, for example, Journel (1984) and Goovaerts (1997). We use it in this paper, although the ideas developed here could be extended to the more general case of Eq. (1).
Consider a case where we obtain an estimate of $x_t$ by sampling. Sampling and application of an appropriate estimator, invoking some assumptions, gives us a conditional distribution for $x_t$ (conditional on the sample). We assume that the sampling procedure is unbiased. The probability density function (pdf) for the conditional distribution is denoted by $f(x)$ and the cumulative distribution function (cdf) by $F(x)$, where

$$F(x) = \int_{-\infty}^{x} f(u)\,\mathrm{d}u. \qquad (5)$$

If some value $\tilde{x}$ is used as the estimate of $x_t$ for decision making then the expected loss from the decision is

$$\mathrm{E}\left[L(\tilde{x})\right] = \int_{-\infty}^{\infty} L(\tilde{x} - x)\, f(x)\,\mathrm{d}x, \qquad (6)$$

that is to say, the statistical expectation of the loss given the estimate and the conditional distribution of the sample mean. If the loss function is a quadratic function of the error (as assumed by Yates, 1949) then the expected loss is minimized by using the expectation of the conditional distribution of $x_t$ as the estimate, i.e., the sample mean. In this paper we follow Journel (1984) by using a general linear loss function:

$$L(\tilde{x} - x_t) = \begin{cases} \alpha_1\,(\tilde{x} - x_t) & \text{if } \tilde{x} \ge x_t, \\ \alpha_2\,(x_t - \tilde{x}) & \text{if } \tilde{x} < x_t. \end{cases} \qquad (7)$$

The parameters $\alpha_1$ and $\alpha_2$ have positive real values. In this case it can be shown (Journel, 1984) that the expected loss is minimized by setting $\tilde{x}$ to:

$$\tilde{x}_{\min} = F^{-1}(p), \qquad (8)$$

where

$$p = \frac{\alpha_2}{\alpha_1 + \alpha_2}, \qquad (9)$$

and $F^{-1}(p)$ denotes the inverse of the cdf, i.e., the $p$th quantile of the conditional distribution of $x_t$. For a symmetrical loss function $\tilde{x}_{\min}$ is therefore the median of the conditional distribution. This is equal to the mean if the conditional distribution is assumed to be Gaussian, which is justified in the simple random sampling case or under other probability sampling designs where independence of the samples allows the central limit theorem to be invoked for the distribution of the sample mean. However, in a case where $\alpha_2 > \alpha_1$ (i.e., a larger loss is incurred when $\tilde{x}$ underestimates $x_t$ than when it overestimates it by the same amount) the value of $\tilde{x}_{\min}$ is larger than the median. With the linear loss function the expected loss at $\tilde{x}$ is given, following Journel (1984), by

$$\mathrm{E}\left[L(\tilde{x})\right] = \alpha_1 \int_{-\infty}^{\tilde{x}} (\tilde{x} - x)\, f(x)\,\mathrm{d}x + \alpha_2 \int_{\tilde{x}}^{\infty} (x - \tilde{x})\, f(x)\,\mathrm{d}x. \qquad (10)$$

If $\tilde{x}$ is set to $\tilde{x}_{\min}$ (Eq. (8)) then the minimum expected loss is

$$\bar{L}_{\min} = \alpha_1 \int_{-\infty}^{\tilde{x}_{\min}} (\tilde{x}_{\min} - x)\, f(x)\,\mathrm{d}x + \alpha_2 \int_{\tilde{x}_{\min}}^{\infty} (x - \tilde{x}_{\min})\, f(x)\,\mathrm{d}x. \qquad (11)$$

If we obtain an estimate of $x_t$ by simple random sampling across a domain of interest with a sample size of $n$, and the variable, $X$, has variance $\sigma^2$, then the conditional distribution of $x_t$ is a Gaussian distribution (from the central limit theorem) with mean $x_t$ (from the design-unbiasedness of simple random sampling) and variance $\sigma^2/n$ (from the independence of the observations in simple random sampling). We write the pdf of this distribution as $f_G(x\,|\,x_t, \sigma^2/n)$.
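The quantile result of Eqs. (8) and (9) is easy to check numerically. The sketch below (Python, standard library only; all function names are ours and purely illustrative) integrates the linear loss against a Gaussian conditional distribution with the midpoint rule, and compares the minimum with the closed form $(\alpha_1+\alpha_2)\,s\,\phi(z_p)$, where $s$ is the standard deviation of the conditional distribution, $\phi$ the standard normal density and $z_p = \Phi^{-1}(p)$. That closed form is our addition: it follows from integrating the minimum expected loss by parts for a normal density, when $\tilde{x}$ is the $p$-quantile.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for phi and its quantile function


def linear_loss(error: float, alpha1: float, alpha2: float) -> float:
    """General linear loss: slope alpha1 when the estimate exceeds the true
    value (error > 0), slope alpha2 when it falls short (error < 0)."""
    return alpha1 * error if error >= 0 else -alpha2 * error


def expected_loss(x_est: float, dist: NormalDist, alpha1: float, alpha2: float,
                  n_steps: int = 4000, width: float = 8.0) -> float:
    """Expected loss of acting on x_est when x_t ~ dist, by the midpoint rule
    over mean +/- width standard deviations."""
    lo = dist.mean - width * dist.stdev
    h = 2.0 * width * dist.stdev / n_steps
    total = 0.0
    for i in range(n_steps):
        x = lo + (i + 0.5) * h
        total += linear_loss(x_est - x, alpha1, alpha2) * dist.pdf(x)
    return total * h


def optimal_estimate(dist: NormalDist, alpha1: float, alpha2: float) -> float:
    """The p-quantile of the conditional distribution, p = alpha2 / (alpha1 + alpha2)."""
    return dist.inv_cdf(alpha2 / (alpha1 + alpha2))


def min_expected_loss_gaussian(dist: NormalDist, alpha1: float, alpha2: float) -> float:
    """Closed form for a Gaussian distribution: (alpha1 + alpha2) * s * phi(z_p)."""
    z_p = STD_NORMAL.inv_cdf(alpha2 / (alpha1 + alpha2))
    return (alpha1 + alpha2) * dist.stdev * STD_NORMAL.pdf(z_p)


d = NormalDist(mu=0.0, sigma=1.0)
a1, a2 = 1.0, 3.0                       # underestimation three times as costly
x_opt = optimal_estimate(d, a1, a2)     # the 75th percentile of d
print(x_opt, expected_loss(x_opt, d, a1, a2), min_expected_loss_gaussian(d, a1, a2))
# Shifting the estimate away from the quantile in either direction raises the loss.
print(expected_loss(x_opt - 0.2, d, a1, a2), expected_loss(x_opt + 0.2, d, a1, a2))
```

With $\alpha_2 = 3\alpha_1$ the optimal estimate is the 75th percentile, above the median, as the text describes for $\alpha_2 > \alpha_1$.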
Because we are considering a loss function which depends only on the estimation error and not on the absolute value of $x_t$ we can, without loss of generality, set $x_t$ to zero and write the minimum expected loss as a function of sample size $n$:

$$\bar{L}_{\min}(n) = \int_{-\infty}^{\infty} L(\tilde{x}_{G,\min} - x)\, f_G(x\,|\,0, \sigma^2/n)\,\mathrm{d}x \qquad (12)$$

$$\bar{L}_{\min}(n) = \alpha_1 \int_{-\infty}^{\tilde{x}_{G,\min}} (\tilde{x}_{G,\min} - x)\, f_G(x\,|\,0, \sigma^2/n)\,\mathrm{d}x + \alpha_2 \int_{\tilde{x}_{G,\min}}^{\infty} (x - \tilde{x}_{G,\min})\, f_G(x\,|\,0, \sigma^2/n)\,\mathrm{d}x, \qquad (13)$$

where $\tilde{x}_{G,\min}$ is obtained with the inverse cdf for $f_G(x\,|\,0, \sigma^2/n)$:

$$\tilde{x}_{G,\min} = F_G^{-1}\!\left(\frac{\alpha_2}{\alpha_1 + \alpha_2}\;\middle|\;0, \sigma^2/n\right). \qquad (14)$$

Fig. 1 shows a simple example of minimum expected loss as a function of sample size for a hypothetical case. The target variable is soil pH estimated to select the liming rate on a farm assuming a known buffering capacity. We assume that the standard deviation of pH across the farm is 2 pH units and that the slope of the linear loss function for a particular farm has $\alpha_1$ = £10 000 per unit error in pH and $\alpha_2 = \alpha_1/3$. That is to say we assume that the loss due to a unit overestimation of pH and consequent under-liming and loss of potential profitable yield is three times the loss due to an equivalent underestimation of pH leading to over-liming. Note that increasing sample size reduces the minimum expected loss, but with diminishing returns as the variance of the conditional distribution is proportional to $n^{-1}$. Assuming that the loss function gives loss in the same units as we may measure the costs of obtaining data for $n$ samples, a sample size can be chosen at which the marginal cost of an additional sample is equal to the reduction in expected loss that the sample achieves.

Fig. 1. Expected loss (£) as a function of sample size for an asymmetrical loss function for error in determination of soil pH, with $\alpha_1$ = £10 000 per pH unit error and $\alpha_2 = \alpha_1/3$, for a decision based on a simple random sample, with a standard deviation of soil pH of 2 units.
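For a Gaussian conditional distribution the integrals in the minimum expected loss reduce to $(\alpha_1+\alpha_2)(\sigma/\sqrt{n})\,\phi(z_p)$, where $\phi$ is the standard normal density and $z_p = \Phi^{-1}(\alpha_2/(\alpha_1+\alpha_2))$. The sketch below uses this to trace the shape of the curve in Fig. 1 and to find the sample size beyond which a further sample no longer pays for itself. The function names are ours, and the constant marginal cost of £20 per sample is a hypothetical value: the pH example in the text gives no sampling costs.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def min_expected_loss(n: int, sigma: float, alpha1: float, alpha2: float) -> float:
    """Minimum expected linear loss for the mean of a simple random sample of
    size n: (alpha1 + alpha2) * (sigma / sqrt(n)) * phi(z_p)."""
    z_p = STD_NORMAL.inv_cdf(alpha2 / (alpha1 + alpha2))
    return (alpha1 + alpha2) * sigma / sqrt(n) * STD_NORMAL.pdf(z_p)


def rational_sample_size(sigma: float, alpha1: float, alpha2: float,
                         marginal_cost: float, n_max: int = 100_000) -> int:
    """Sample size at which the next sample would no longer pay for itself:
    the reduction in expected loss it brings falls below its marginal cost."""
    n = 1
    while n < n_max:
        gain = (min_expected_loss(n, sigma, alpha1, alpha2)
                - min_expected_loss(n + 1, sigma, alpha1, alpha2))
        if gain < marginal_cost:
            return n
        n += 1
    return n_max


# pH example from the text: sigma = 2 pH units, alpha1 = £10 000, alpha2 = alpha1 / 3.
SIGMA_PH, ALPHA1, ALPHA2 = 2.0, 10_000.0, 10_000.0 / 3
for n in (1, 10, 50, 100):
    print(n, round(min_expected_loss(n, SIGMA_PH, ALPHA1, ALPHA2)))
# Hypothetical constant marginal cost of £20 per additional sample.
print(rational_sample_size(SIGMA_PH, ALPHA1, ALPHA2, marginal_cost=20.0))
```

Because the loss falls as $n^{-1/2}$, the gain from each extra sample falls roughly as $n^{-3/2}$, which is the diminishing return noted above.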
In this paper we limit our discussion to simple random sampling, but one could extend the approach to more complex cases. For example, if an exhaustively-measured covariate such as a remote sensor image were available for the region, and this were correlated with the target variable, then one could consider the variance of the regression estimator for x t (Brus, 2008). At a given sample size this variance would be smaller than for simple random sampling so the expected loss would be less.

The implicit loss function
As stated in Section 1, an implicit loss function is a loss function which makes some specific sample size a rational choice, given the marginal costs of sampling and the conditional distribution of $x_t$ given the sample size. If the specified sample size is denoted by $n$, and the variance of the variable $X$ is $\sigma^2$, then the implicit loss function has parameters $\alpha_1$ and $\alpha_2$ such that

$$\bar{L}_{\min}(n-1) - \bar{L}_{\min}(n) = C(n) - C(n-1), \qquad (15)$$

where $C(n)$ is a function which returns the costs of a sample of $n$ observations, assumed to be a real positive value for any positive $n$. Because the loss function is defined by two parameters there is not a unique solution $\{\alpha_1, \alpha_2\}$ to Eq. (15). However, if the asymmetry ratio is fixed, $\alpha_1/\alpha_2 = a$, then $\alpha_1 = a\,\alpha_2$ and $p = \alpha_2/(\alpha_1 + \alpha_2) = 1/(1+a)$, and so, once the factor $\alpha_2$ is taken outside, the integral on the right-hand side of Eq. (13) depends only on $a$, $n$ and $\sigma^2$. For notational simplicity we denote this value by $I(n, a, \sigma^2)$; then

$$\bar{L}_{\min}(n) = \alpha_2\, I(n, a, \sigma^2), \qquad (16)$$

which, with $a$, $n$ and $\sigma^2$ all fixed, is linearly proportional to $\alpha_2$. Because $C(n) - C(n-1)$ is a positive constant for fixed $n$, on the assumption that an additional sample point inevitably incurs some additional cost, and because $I(n-1, a, \sigma^2) - I(n, a, \sigma^2)$ is positive (the expected loss falls as $n$ increases), it follows from Eq. (16) that, with specified $a$, $n$ and $\sigma^2$, there exists a unique value of $\alpha_2$ which provides a solution of Eq. (15). A numerical solution is necessary because the equation includes integrals of the normal density function.
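Because Eq. (16) is linear in $\alpha_2$, solving Eq. (15) for a fixed asymmetry ratio amounts to a division: $\alpha_2 = [C(n) - C(n-1)]/[I(n-1,a,\sigma^2) - I(n,a,\sigma^2)]$. For Gaussian errors the integrals can be written through the standard normal density and quantile function, $I(n,a,\sigma^2) = (1+a)(\sigma/\sqrt{n})\,\phi(z_p)$ with $z_p = \Phi^{-1}(1/(1+a))$, although the quantile function is itself evaluated numerically. The following is a minimal sketch under those assumptions, not the authors' implementation; the cost function passed in is a hypothetical stand-in.

```python
from math import sqrt
from statistics import NormalDist
from typing import Callable

STD_NORMAL = NormalDist()


def loss_factor(n: int, a: float, sigma: float) -> float:
    """I(n, a, sigma^2): minimum expected loss per unit alpha2 when
    alpha1 = a * alpha2, for a Gaussian error with s.d. sigma / sqrt(n)."""
    z_p = STD_NORMAL.inv_cdf(1.0 / (1.0 + a))
    return (1.0 + a) * sigma / sqrt(n) * STD_NORMAL.pdf(z_p)


def implicit_alpha2(n: int, a: float, sigma: float,
                    cost: Callable[[int], float]) -> float:
    """alpha2 solving alpha2 * [I(n-1) - I(n)] = C(n) - C(n-1)."""
    if n < 2:
        raise ValueError("n must be at least 2 to define a marginal sample")
    marginal_cost = cost(n) - cost(n - 1)
    gain_per_unit_alpha2 = loss_factor(n - 1, a, sigma) - loss_factor(n, a, sigma)
    return marginal_cost / gain_per_unit_alpha2


# Round trip with the pH example (sigma = 2, a = alpha1 / alpha2 = 3) and a
# hypothetical cost of £20 per sample. With alpha2 = 10 000 / 3 the gain from
# an extra sample drops to about £20 near n = 36, so the implicit alpha2 for
# n = 36 should come out close to 10 000 / 3.
alpha2 = implicit_alpha2(36, 3.0, 2.0, lambda n: 20.0 * n)
print(round(alpha2), round(3.0 * alpha2))
```

The round trip only approximately recovers the starting loss function because sample size is discrete.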
In order to find a unique solution the asymmetry ratio $a$ must be specified. In general one would expect loss functions to be asymmetric because the consequences of over- and under-estimation are generally different in kind and magnitude. Underestimation of soil carbon content may result in certain social costs from loss of production and unnecessary payment of incentives to land managers, whereas overestimation may result in insufficient investment in soil protection and incentives to improve soil management with long-term consequences for a range of soil functions.
An example of the selection of the loss asymmetry (albeit implicit) is provided by the Australian Government in their adoption of a practice for trading soil carbon stocks of uncertain magnitude (Department of the Environment, 2014). The practice is to use the 40th percentile of the sample distribution of soil carbon stock as the estimated value for trading purposes. This was selected both as an incentive for efficient sampling of stocks, and in explicit recognition that errors of over- and under-estimation have different consequences. With a linear loss function (Eq. (7)), the use of the 40th percentile as the effective estimate of the carbon stock implies (Eq. (9)) that

$$\frac{\alpha_2}{\alpha_1 + \alpha_2} = 0.4, \quad \text{so} \quad a = \frac{\alpha_1}{\alpha_2} = \frac{1 - 0.4}{0.4} = 1.5.$$

The asymmetry ratio is larger than 1.0 because the loss from trading carbon stocks overestimated by some amount is regarded as greater than that from trading similarly underestimated stocks. The selection of the percentile was not based explicitly on loss functions, but implies a slight preference for the interests of subsequent owners of the carbon stocks and for the environment, given the benefits of carbon sequestration, over those of the landowner.
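Inverting Eq. (9) gives the asymmetry ratio implied by any percentile rule, $a = (1-p)/p$, and conversely $p = 1/(1+a)$. A small sketch of the two conversions (the function names are illustrative, not from any library); for example, the reciprocal ratio $2/3$ corresponds to using the 60th percentile:

```python
def asymmetry_from_percentile(p: float) -> float:
    """Asymmetry ratio a = alpha1 / alpha2 implied by using the p-quantile as the
    estimate under a linear loss (inverting p = alpha2 / (alpha1 + alpha2))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (1.0 - p) / p


def percentile_from_asymmetry(a: float) -> float:
    """Quantile level p = 1 / (1 + a) that minimizes expected linear loss."""
    if a <= 0.0:
        raise ValueError("a must be positive")
    return 1.0 / (1.0 + a)


print(asymmetry_from_percentile(0.40))        # Australian rule: approximately 1.5
print(percentile_from_asymmetry(1.0 / 1.5))   # reciprocal ratio: approximately 0.6
```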
The fact that data users in Australia were able to agree on a percentile of the sampling distribution to use as an estimator of soil carbon stocks is encouraging because it suggests that an elicitation procedure for the asymmetry ratio might be based on a consideration of percentiles of the sample distribution to treat as effective estimates. This is beyond the scope of the present paper. In general we propose that the implicit loss function is estimated for a range of asymmetry ratios which can be presented to the sponsor for consideration.
The idea of an implicit loss function is not entirely novel. It has been used in finance, for example to model how auditors make decisions about the collection of evidence (Scott, 1975). Estimation of the implicit loss function has been proposed by Elliott et al. (2005) as a method to elucidate the basis on which experts make financial forecasts. If one thinks of the expert's forecasting procedure as a tacit model estimation, then a loss function is effectively minimized, much as the quadratic loss function of a standard statistical estimation algorithm such as ordinary least squares. The recovery of an implicit loss function may explain apparent biases of forecasts in terms of asymmetry of the function. This procedure has been used to examine how members of the Federal Open Market Committee (FOMC) of the US Federal Reserve weight under- and over-prediction of economic variables such as inflation, growth rate and unemployment in terms of possible impacts through effects on the FOMC's decisions (Pierdzioch et al., 2013). We are not aware of a previous extension of this concept to our sampling problem.

The case study
We now illustrate the implicit loss function with an example. We consider a hypothetical region of 10 000 km². We are interested in determining change in the regional mean stock of soil organic carbon over a period of time. To obtain the implicit loss function requires that we can approximate the variance of the sample mean of the target variable as a function of sample size, and the marginal cost of the nth soil sample. We discuss how this was done below. In brief, we follow Lark (2009) in using the soil carbon model of Nye and Greenland (1960) to compute distributions of soil carbon stocks and their changes under a change in land use based on a sample from a distribution of model parameters for the lowland tropics. We use detailed information on sampling rate from a recent soil geochemical survey in Ireland (Knights and Scanlon, 2013) as a basis for the logistical component of the cost model. To make the logistical model as consistent as possible with a soil carbon model for the lowland tropics we extracted information on sampling rate in County Donegal in northwest Ireland, where relatively sparse communications and rugged terrain made field work most challenging.

Information on variability
To compute the implicit loss function we require information on the variance of the target variable. For purposes of this study we used a simple single-pool model of soil carbon, and sampled from a distribution of model parameters extracted from the literature for the lowland tropics, to obtain means and variances for soil carbon stock (t ha⁻¹) at two time points in a particular scenario. Lark (2009) describes the procedure in detail. The scenario we considered was forest land cleared for agriculture, with the initial or baseline survey undertaken 25 years after conversion and the resampling after a further 10 years. We added to the variance of the simulated data an analytical variance on the assumption that the coefficient of variation of analytical error is 5% (Landon, 1984). On this basis the mean carbon stocks 25 and 35 years post-clearance were 104 and 82 t ha⁻¹ with standard deviations of 53 and 46 t ha⁻¹ respectively. These are comparable with reported results for similar conditions in tropical and subtropical South America (Assad et al., 2013). If $n$ samples are collected independently and at random on each date, and the standard deviations of carbon stock on the two dates are $\sigma_1$ and $\sigma_2$, then the standard error of the estimated mean change in carbon stock is

$$\sigma_{\Delta}(n) = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{n}}. \qquad (17)$$
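Plugging the simulated standard deviations (53 and 46 t ha⁻¹) into this expression gives a feel for the precision at the sample sizes discussed later. A minimal sketch (the function name is ours):

```python
from math import sqrt


def se_mean_change(n: int, sd_baseline: float, sd_resample: float) -> float:
    """Standard error of the difference between two independent simple random
    sample means, each computed from n sites."""
    return sqrt((sd_baseline ** 2 + sd_resample ** 2) / n)


for n in (500, 2000):
    print(n, round(se_mean_change(n, 53.0, 46.0), 2))  # 3.14 and 1.57 t/ha
```

Quadrupling the sample size halves the standard error, as expected for independent samples.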

The cost model
We developed a cost model on the basis of an analysis of the rate of soil sampling during the recently-completed Tellus Border survey (Knights and Scanlon, 2013) in six counties of Ireland (Donegal, Sligo, Leitrim, Cavan, Monaghan and Louth). This sampling was undertaken at an average rate of 0.25 samples km⁻² by teams each of two workers. Analysis of the daily records of GPS locations allowed us to estimate the mean rate at which the teams sampled sites per county. For purposes of this paper we use the sampling rate for part of County Donegal, which was seven sites per team-day, excluding local duplicate sampling. The rate of progress of sample teams across terrain in this part of Ireland was relatively slow. This can be attributed to the marked relief and complexity of the terrain which, over most of the land area of the county, is used for extensive grazing. The pronounced regional strike from north-east to south-west (Whittow, 1974) is reflected in the topography and restricted road access across the region. Rather than a uniform and isotropic road network allowing good access across the region, major roads follow the orientation of the regional strike, with relatively short branching access roads.
In this paper we consider a simple random sampling strategy. The empirical sampling rate from Donegal is from systematic sampling, because sample teams aimed to visit sample sites at the centre of 2-km square grid cells (Knights and Scanlon, 2013; Knights, 2013). One may expect such systematic sampling to be somewhat slower than an equivalent random sample because of the absence of short trips between points closer than the mean grid spacing. To adjust the Donegal sample rate to a simple random sampling equivalent we considered a notional 6 × 6-km region encompassing nine sample points set out according to the Tellus Border survey design. The shortest closed route around all these points is 18.83 km, assuming that the landscape can be traversed in a straight line. We then considered 1000 realizations of a simple random sample of 9 points in a 6 × 6-km region, computing the shortest route around each sample using the solve_TSP procedure from the TSP package for the R platform (Hahsler and Hornik, 2014; R core team, 2013). The mean distance travelled between points in a simple random sample was 16.66 km. The times per sample point in the systematic and simple random sampling regimes are therefore in the ratio 18.83/16.66 = 1.13. On the assumption that travelling speed is the same for random and for systematic sampling and that total time to undertake sampling is dominated by travelling time between points, the rescaled number of sample points per day under a simple random sampling scheme in Donegal is approximated as 7 × 1.13 = 7.9.
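The route-length comparison can be reproduced approximately with the sketch below. It is an assumption-laden stand-in for the R procedure described above: instead of `solve_TSP` from the TSP package it computes exact closed tours with the Held-Karp dynamic programme (feasible for nine points) in plain Python, and the seed and function names are our own. For the 3 × 3 systematic design the optimal tour is eight 2-km legs plus one diagonal, $16 + 2\sqrt{2} \approx 18.83$ km, matching the figure in the text. Because `solve_TSP` may rely on heuristic tour construction unless an exact solver is selected, the exact mean for random samples may come out somewhat shorter than 16.66 km.

```python
import itertools
import math
import random


def shortest_tour(points):
    """Length of the shortest closed tour through all points, by the Held-Karp
    dynamic programme (exact; practical for about a dozen points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # best[(mask, j)]: shortest path from point 0 through the points in mask
    # (a bitmask over points 1..n-1), ending at point j, which is in mask.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for combo in itertools.combinations(range(1, n), size):
            mask = sum(1 << j for j in combo)
            for j in combo:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(best[(prev, k)] + dist[k][j]
                                      for k in combo if k != j)
    full = (1 << n) - 2  # every point except point 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


def grid_points(side_km=6.0, cell_km=2.0):
    """Centres of square grid cells: a systematic design like Tellus Border's."""
    k = round(side_km / cell_km)
    return [((i + 0.5) * cell_km, (j + 0.5) * cell_km)
            for i in range(k) for j in range(k)]


def mean_random_tour(n_points=9, side_km=6.0, n_reps=1000, seed=2014):
    """Mean shortest closed-tour length over simple random samples in a square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        pts = [(rng.uniform(0.0, side_km), rng.uniform(0.0, side_km))
               for _ in range(n_points)]
        total += shortest_tour(pts)
    return total / n_reps


systematic = shortest_tour(grid_points())   # 16 + 2*sqrt(2), about 18.83 km
random_mean = mean_random_tour()
ratio = systematic / random_mean
print(f"systematic {systematic:.2f} km, random {random_mean:.2f} km, "
      f"ratio {ratio:.2f}, rescaled rate {7 * ratio:.1f} sites per day")
```

The last line applies the same rescaling as the text (7 sites per team-day multiplied by the route-length ratio).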
Following Beardwood et al. (1958), we assume that the distance travelled, $d_n$, to visit $n$ independently and randomly selected locations in a fixed area $A$ scales with $n$ according to

$$d_n \propto \sqrt{nA}. \qquad (18)$$

On the assumption that the speed of travel is constant, and that total sampling time is dominated by travel between sites, we assume that sampling time is linearly proportional to distance travelled. Since $n = rA$, Eq. (18) gives $d_n \propto A\sqrt{r}$, and so the time to sample a unit area at density $r$, $t_r$ (days km⁻²), scales with sample density ($r$ samples per km²) as

$$t_r \propto \sqrt{r}. \qquad (19)$$

The time per unit area in the Tellus Border survey in County Donegal at density 0.25 points km⁻² was 0.25/7.9 = 31.6 × 10⁻³ days km⁻². The corresponding time per km² to sample at density $r$ samples per km², where $r$ is of similar order to the density of the Tellus Border survey, is therefore assumed to be

$$t_r = 31.6 \times 10^{-3} \sqrt{\frac{r}{0.25}} \ \text{days km}^{-2}. \qquad (20)$$

We assume that this scaling relationship holds over sample densities such that the sample rate per day is between about 2 and 11 (the range of sample rates in Donegal was 1 to 15).
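The square-root scaling of field time with sample density can be sketched as three small functions (constants from the text; function names hypothetical). Inverting the implied daily sampling rate shows which densities fall inside the assumed range of 2 to 11 samples per team-day, and hence which sample sizes the scaling is meant to cover in a 10 000 km² region.

```python
from math import sqrt

T_REF = 31.6e-3   # team-days per km^2 at the reference density (Donegal, rescaled)
R_REF = 0.25      # reference density, samples per km^2 (Tellus Border)


def time_per_km2(r: float) -> float:
    """t_r: team-days per km^2 at density r samples per km^2, assuming travel
    distance scales with the square root of the number of sites."""
    return T_REF * sqrt(r / R_REF)


def samples_per_day(r: float) -> float:
    """Implied number of sites a team samples per day at density r."""
    return r / time_per_km2(r)


def density_for_rate(rate: float) -> float:
    """Inverse of samples_per_day: the density at which a team samples `rate` sites a day."""
    return R_REF * (rate * T_REF / R_REF) ** 2


area_km2 = 10_000.0
low, high = density_for_rate(2.0), density_for_rate(11.0)
print(low, high)                                       # densities, samples per km^2
print(round(low * area_km2), round(high * area_km2))   # roughly 160 and 4800 sites
```

Sample sizes of 500 and 2000 in a 10 000 km² region (densities of 0.05 and 0.2 km⁻²) lie inside this range.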
On this basis one may compute the costs of sampling an area of $A$ km² with $n = rA$ points as

$$c_n = \Omega + v\,n + \beta\,A\,t_r, \qquad (21)$$

where $\Omega$ is the fixed cost, $v$ is the unit analytical cost, $\beta$ is the field work cost of a team-day and $t_r$ is the time per km² at density $r = n/A$ given above. For purposes of this study we assumed that $v$ = £20, based on preparation and analytical costs quoted in early 2014 by a UK-based company. We assumed that a team-day cost is $\beta$ = £270, based on salary costs of technical staff and a two-person team. These figures are for illustrative purposes to develop the concept of the implicit loss function. Further refinement would be possible, for example to allow for economies of scale on analytical costs. In the case study we consider a monitoring programme with two time points and so we double the costs to visit sample points twice and compute two analyses at additional expense. Once again, further refinement would be possible to compute the costs at the start and end of the survey on a common net present value. Fig. 2 shows variable costs ($c_n - \Omega$) and marginal costs for a single sampling campaign in a region of 10 000 km² with different sample sizes, using Eq. (21) with the constants given above, and assuming a baseline and resampling campaign. Note that the marginal cost of an extra sample decreases with sample size.
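The cost model can be sketched in the same way. This illustration assumes the constants given in the text ($v$ = £20, $\beta$ = £270, $A$ = 10 000 km², 31.6 × 10⁻³ team-days km⁻² at 0.25 samples km⁻²), doubles the variable cost for a baseline and a resampling campaign, and leaves out the fixed cost $\Omega$, which does not affect marginal costs. All names are hypothetical.

```python
from math import sqrt

T_REF, R_REF = 31.6e-3, 0.25   # team-days per km^2 at 0.25 samples per km^2
V_ANALYSIS = 20.0              # v, £ per sample prepared and analysed
BETA_TEAM_DAY = 270.0          # beta, £ per team-day
AREA_KM2 = 10_000.0            # A, km^2


def variable_cost(n: int, area: float = AREA_KM2, v: float = V_ANALYSIS,
                  beta: float = BETA_TEAM_DAY, campaigns: int = 2) -> float:
    """c_n - Omega: analysis plus field-team cost of n sites per campaign,
    multiplied by the number of campaigns (baseline and resampling)."""
    r = n / area
    team_days = area * T_REF * sqrt(r / R_REF)
    return campaigns * (v * n + beta * team_days)


def marginal_cost(n: int, **kwargs) -> float:
    """Cost of adding the n-th site: c_n - c_(n-1)."""
    return variable_cost(n, **kwargs) - variable_cost(n - 1, **kwargs)


for n in (250, 500, 1000, 2000):
    print(n, round(variable_cost(n)), round(marginal_cost(n), 1))
```

Under these assumptions the marginal cost approaches $2v$ = £40 from above as $n$ grows, because the field-work share of each extra site shrinks as $n^{-1/2}$; this is the decline in marginal cost noted above.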

Implicit loss functions
Given a sample size n the standard error of the estimate of the change in soil carbon stock from two independent random samples was computed from Eq. (17). The cost of sampling at this intensity (over fixed costs) was computed from Eq. (21). For some sample size and specified value of the asymmetry ratio, a, we found the value of α 1 that satisfies Eq. (16) using the optim procedure in the R platform (R core team, 2013).
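Putting the pieces together, the procedure just described can be sketched in Python under stated assumptions: the Gaussian form $I(n,a,\sigma^2) = (1+a)(\sigma/\sqrt{n})\,\phi(z_p)$ with $z_p = \Phi^{-1}(1/(1+a))$, the standard error of the mean change from the two simulated standard deviations, and the cost model with the constants given earlier, doubled for two campaigns. A direct division replaces the `optim` search, because the condition is linear in $\alpha_2$. The printed slopes can be compared with those reported for Fig. 3 with 500 sample points; exact agreement is not expected, since details of the original implementation may differ. Names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
SD_BASELINE, SD_RESAMPLE = 53.0, 46.0   # t/ha, simulated stocks 25 and 35 years post-clearance
T_REF, R_REF = 31.6e-3, 0.25            # team-days per km^2 at 0.25 samples per km^2
V, BETA, AREA = 20.0, 270.0, 10_000.0   # £ per analysis, £ per team-day, km^2


def cost(n: int) -> float:
    """Variable cost of a baseline and a resampling campaign with n sites each."""
    team_days = AREA * T_REF * sqrt((n / AREA) / R_REF)
    return 2.0 * (V * n + BETA * team_days)


def loss_factor(n: int, a: float) -> float:
    """I(n, a, sigma^2) for the estimated mean change in carbon stock (t/ha)."""
    sd_change = sqrt(SD_BASELINE ** 2 + SD_RESAMPLE ** 2)   # so SE = sd_change / sqrt(n)
    z_p = STD_NORMAL.inv_cdf(1.0 / (1.0 + a))
    return (1.0 + a) * sd_change / sqrt(n) * STD_NORMAL.pdf(z_p)


def implicit_alphas(n: int, a: float) -> tuple:
    """(alpha1, alpha2), in £ per t/ha, under which n sites is the rational choice."""
    alpha2 = (cost(n) - cost(n - 1)) / (loss_factor(n - 1, a) - loss_factor(n, a))
    return a * alpha2, alpha2


for a in (1.0, 2.0 / 3.0, 0.4, 0.2):
    alpha1, alpha2 = implicit_alphas(500, a)
    print(f"a = {a:.3f}: alpha1 = £{alpha1:,.0f}, alpha2 = £{alpha2:,.0f} per t/ha")
```

In a 10 000 km² region, £ per t ha⁻¹ is equivalent to £ per Mt of error in the regional stock change.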
In this case study we consider an environment where we expect ongoing reductions in soil carbon stocks because, even with no change in the mean inputs of carbon to the soil, it is likely still to be approaching a new steady-state soil carbon stock under new land use. The policy maker wishes to know the mean rate of this change across the 10 000-km² region to formulate policy in respect of the role of soil in the carbon budget and likely implications for soil functions including agricultural production, the modulation of surface water flows and stability of soil against erosion by water or wind.
The variable that we consider is the loss of soil carbon stock, and so positive errors mean that this loss is overestimated. We considered an asymmetry ratio of 1, and alternatives smaller than one. By excluding asymmetry ratios larger than 1 we make an assumption that underestimation of the loss of soil carbon never incurs smaller costs than overestimation. This seems reasonable, since underestimation may result in complacency about soil quality, the amount of carbon that remains sequestered in soil and the success of existing policy on land use and soil protection with implications for future food security, water resource management etc. However, overestimation of the loss may result in undue regulatory burdens on producers, excessive expenditure if it is decided to offset loss of soil carbon from agricultural land and possible distortions in land use which may have implications for food prices.
Some comparisons may be drawn between the asymmetry of the loss function in this case and that for soil carbon trading, referred to in Section 2.2 above, where there is a small preference for underestimation of stock. The carbon trading case is simpler in that we are considering only the value of the soil carbon in a particular market. However, at least in principle, this integrates some of the factors of interest here: specifically the value of soil carbon as an offset for carbon emissions, and also the asymmetry of interests between landowners selling the carbon, and the subsequent owners and environmental beneficiaries of the offset. The asymmetry ratio for estimation of tradable carbon stocks was 1.5; in the case of estimates of loss of soil carbon the equivalent ratio is the reciprocal of this, 2/3, because the equivalent preference is for an overestimate of loss of stock. We therefore considered this asymmetry ratio for our case study, along with two rather smaller ratios, 0.4 and 0.2, which imply stronger preferences for overestimation. For completeness we also present the symmetrical implicit loss function. Fig. 3 shows the implicit loss functions with the four asymmetry ratios for the case where 500 sample points are proposed for each of the baseline and resampling surveys. The values of the $\alpha_2$ parameter for $a$ = 1, 2/3, 0.4 and 0.2 are approximately £50 000, £60 000, £80 000 and £124 000 per t ha⁻¹ of soil carbon. A region of 10 000 km² covers 10⁶ ha, so an error of 1 t ha⁻¹ corresponds to 1 Mt of carbon, and these units are therefore equivalent to £ per Mt error in the estimated loss of total soil carbon stock. Fig. 4 shows the slope of the implicit loss function ($\alpha_1$ and $\alpha_2$) for the same asymmetries and different proposed sample sizes.

Interpretation of the implicit loss functions
How might a policy-maker interpret the implicit loss function? First, recall that we propose the implicit loss function for situations where a loss function is not straightforward to specify. This is because of the complexity of the policy decisions informed by soil information, the relevance of this information to different sectors, uncertainty about future costs of interventions and uncertainty about the efficacy of policy options or specific interventions which might be based on the information. It is this complexity that makes it impossible to make the sampling decision a closed-loop process in the sense of Section 1. The point of the implicit loss function is to exhibit the assumptions that are implicit in a particular decision on sampling so that they can be open to general scrutiny. Consider a hypothetical example. In our 10 000 km² region, soil scientists have proposed a sample size of 2000 for the two-phase soil monitoring procedure (somewhat sparser than the Tellus Border sample density). However, based on initial budgetary considerations, the officials who make the decisions on resources propose reducing this to 500. If the policy-maker takes an asymmetry a = 0.6̇, on the grounds that this loss function implies a mild preference for environmental considerations, then the value of α₂ for the implicit loss function, expressed as loss per unit error in the estimated reduction of total soil carbon stock for the region, is £60 000 Mt⁻¹ for a sample size of 500 and £300 000 Mt⁻¹ for a sample size of 2000.
To support a decision on sampling, one must decide which of these two loss functions is more plausible. For the reasons already given this process cannot be formal, but it can be systematic. First, one may identify the possible consequences of an error. For example, the possible consequences of underestimating the loss of soil carbon stock include:
1. Failure to prioritize soil protection with respect to competing policy areas.
2. Failure to implement appropriate soil protection measures and, in consequence:
   • failure to improve food production (cf. the examples presented by Lal (2004), who quotes yield gains for cereal or legume crops of between 1 and 40 kg ha⁻¹ from increases in soil organic carbon of 1 t ha⁻¹);
   • failure to sustain the soil's capacity to modulate water flows by accepting infiltration and allowing groundwater recharge.
3. Failure to account correctly for the role of soil in the regional greenhouse gas budget.
A similar set of consequences of overestimating the loss of soil carbon can be identified, for example:
1. Imposition of excessive regulation on producers, with consequences for sustainability, employment and food security.
2. Distortions in policy priorities with respect to other areas.
3. Overestimation of the soil's contribution to the regional greenhouse gas budget, with financial costs if this is offset by carbon trading or other mechanisms.
Reflection on these lists may allow one to refine any earlier choice of asymmetry ratio, and the implicit balance of preferences between these considerations. One may then consider any numerical information pertinent to these factors. For example, one might assume that the value of carbon in trading schemes reflects the social costs of emissions. The cost of 1 Mt of carbon may then be approximated as £18 m, based on a price of €6 per tonne of CO₂ for European Union Allowances (EUA) (prices and exchange rates as of August 2014). Some severe qualifications are required here. First, it is not at all clear that EUA prices at present reflect social costs; rather, they reflect particular policy objectives, and various institutional factors affect them (Lutz et al., 2013). Second, the EUA price can only be indicative, because at present carbon stocks associated with land use and land use change are not tradable in the scheme.
Accepting these qualifications, one may take as a starting point the observation that an underestimation by 1 Mt of the soil carbon lost from a region is an underestimation by £18 m of the costs imposed by soil carbon loss. However, we may not assume that this error, through its effects on future policy decisions, translates directly into a social cost of the same size. First, one must note that future carbon losses, other factors remaining equal, will be smaller as soil carbon stocks approach a steady state. Second, one must ask how successful any policy or mitigation measures would be in reducing these losses even if they were based on error-free information. Nonetheless, it may be argued that very severe discounting for future uncertainty, and for uncertainty about the consequences of policy decisions, would be required to reduce the market-based value of 1 Mt of soil carbon (£18 m) below the value of α₂ for the implicit loss function for a sample size of 2000 (£300 000 Mt⁻¹), a reduction by a factor of about 60.
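The arithmetic behind the £18 m figure and the discounting argument can be checked with a short calculation. We assume that the EUA price of €6 is per tonne of CO₂, so that 1 t of carbon corresponds to 44/12 ≈ 3.67 t of CO₂. We also take an exchange rate of about £0.80 per euro for August 2014; this rate is our approximation, not a figure from the text. The α₂ values are those quoted for a = 0.6̇ and sample sizes of 500 and 2000. The function names are hypothetical.

```python
C_TO_CO2 = 44.0 / 12.0  # tonnes of CO2 per tonne of carbon (ratio of molar masses)


def carbon_value_gbp(mt_carbon, eur_per_t_co2=6.0, gbp_per_eur=0.80):
    """Market value (GBP) of mt_carbon Mt of carbon priced as CO2 allowances.

    eur_per_t_co2: allowance price per tonne of CO2 (EUR 6, August 2014).
    gbp_per_eur: assumed approximate August 2014 exchange rate.
    """
    return mt_carbon * 1e6 * C_TO_CO2 * eur_per_t_co2 * gbp_per_eur


def required_discount_factor(alpha2_gbp_per_mt, value_gbp_per_mt):
    """Fraction of the market value of 1 Mt C below which it falls under alpha2."""
    return alpha2_gbp_per_mt / value_gbp_per_mt


value = carbon_value_gbp(1.0)  # about GBP 17.6 m, close to the GBP 18 m quoted
print(f"value of 1 Mt C: GBP {value / 1e6:.1f} m")
for n, alpha2 in ((500, 60_000.0), (2000, 300_000.0)):
    f = required_discount_factor(alpha2, value)
    print(f"n = {n}: combined discount factor must fall below {f:.4f} (about 1 in {1 / f:.0f})")
```

The combined discount for time, for the approach to a steady state and for policy efficacy would have to be smaller than roughly 1 in 300 for a sample size of 500, and roughly 1 in 60 for a sample size of 2000, before the market value of the carbon falls below α₂.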

Discussion
We have defined the implicit loss function for errors in soil information and shown that a unique implicit loss function exists for any specified sample size given a logistical model of sampling costs, and information on the variability of the target property. With an example we have shown how the implicit loss function allows us to exhibit the assumptions implicit in any decision on sample size in an 'open loop' context where the costs and benefits of environmental information cannot be simply and directly compared. In the context of our example we have made some tentative suggestions about how the implicit loss function could be used to reflect on such sampling decisions, without claiming that this 'closes the loop'. The implicit loss function is a novel concept in the valuation of environmental information, and we suggest that it merits further investigation.
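The defining condition of the implicit loss function can be illustrated numerically. A sample size n is rational if it minimizes the total expected cost C(n) + α₂k(a)σ(n). Here C is the logistical cost model, σ the standard error of the estimate, and k(a) the minimum expected loss per unit α₂ and per unit σ. Setting the derivative to zero gives the implicit slope α₂ = C′(n)/(−k(a)σ′(n)), with α₁ = aα₂. The sketch assumes, purely for illustration:

- a linear cost model with hypothetical fixed and per-site costs;
- σ(n) = s/√n, as for a simple random sample;
- normally distributed errors;
- the general linear loss, with the estimate at its optimal percentile.

None of these functions or parameter values is taken from the study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def k_factor(a):
    """Minimum expected loss per unit alpha2 and unit sigma.

    Assumes a general linear loss (slope alpha2 for underestimation, a * alpha2
    for overestimation), normal errors and the estimate at percentile 1 / (1 + a).
    """
    z = STD_NORMAL.inv_cdf(1.0 / (1.0 + a))
    return (1.0 + a) * STD_NORMAL.pdf(z)


def sampling_cost(n, fixed=50_000.0, per_site=400.0):
    """Hypothetical logistical model: total cost (GBP) of sampling n sites."""
    return fixed + per_site * n


def std_error(n, s=2.0):
    """Hypothetical standard error (t/ha) of the estimated mean change, SRS of size n."""
    return s / math.sqrt(n)


def implicit_alpha2(n, a, h=1e-3):
    """alpha2 that makes n a stationary point of sampling cost + expected loss.

    Solves C'(n) = -alpha2 * k(a) * sigma'(n), with both derivatives obtained by
    central finite differences and n treated as continuous.
    """
    dcost = (sampling_cost(n + h) - sampling_cost(n - h)) / (2.0 * h)
    dsigma = (std_error(n + h) - std_error(n - h)) / (2.0 * h)
    return dcost / (-k_factor(a) * dsigma)


a = 2.0 / 3.0
for n in (500, 2000):
    alpha2 = implicit_alpha2(n, a)
    print(f"n = {n}: alpha2 = {alpha2:,.0f}, alpha1 = {a * alpha2:,.0f}")
```

Because cost is linear in n and σ ∝ n^(−1/2) in this sketch, the implicit α₂ grows as n^(3/2), so quadrupling n multiplies it by 8. The fivefold increase from £60 000 to £300 000 Mt⁻¹ between sample sizes of 500 and 2000 reported above comes from the study's own logistical and statistical models, which this sketch does not attempt to reproduce. Since σ is convex in n and cost is linear, the stationary point is a minimum.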
The amount of field effort to be deployed to address a question in environmental science, management or policy is often a fraught matter between field scientists and their sponsors. Yates (1952), discussing the investment of resources in research for agricultural development wrote: 'With the present drive for economy there is serious danger that even such facilities as are available for experimental work of this kind will be curtailed or not used to full advantage. It is therefore important to stress that such curtailment will result in much more substantial and immediate losses through failure to determine the best practices.' We recognize that we write from one side of this fence, but offer the implicit loss function method as a tool to improve communication across that fence. Environmental scientists increasingly recognize the importance of effective communication of environmental information to decision makers in the presence of uncertainty, and we suggest that the implicit loss function is potentially a contribution to this task.
There is scope for further development of this work. It would be informative to undertake experiments with groups with policy or regulatory responsibilities to examine implicit loss functions for notional tasks in the commissioning of soil inventory or monitoring. The objective would be to assess whether and how the implicit loss function helps the decision-making process. One approach would be to present the experimental subjects with a series of implicit loss functions corresponding to different sample sizes, without initially disclosing the sample sizes or total sampling costs, and to elicit a view as to which function best represents the socio-economic and environmental costs of error in environmental information. One could then reveal the sample effort and cost corresponding to the selected loss function, and to others close to it, and ask the group to decide on sampling effort. These results could then be compared with decisions made purely from the costs of sampling. A useful extension of this work would be to show how the approach could be used to partition a fixed total resource between two or more competing projects.
Another way to develop this approach would be to work with a policy or regulatory group in a post-hoc analysis of past projects regarded as more or less successful. One might ask managers, for example, to assign such projects to groups characterized in terms such as:
1. 'A "Rolls-Royce" study: it was useful but we suspect that the effort was excessive when we look at the true costs.'
2. 'This provided useful information; we would pay for it again in comparable circumstances.'
3. 'This was a waste of time and resources. There was too much uncertainty in the final results, which were therefore hard to interpret.'
One would then undertake a comparable elicitation of plausible implicit loss functions for each project, and test the hypothesis that the results are congruent with the classification: the selected loss function for projects in class 1 would have smaller slopes than the implicit loss function for the actual project sample size; in class 2 the selected loss function would more or less match the implicit loss function for the actual sample size; and in class 3 the selected loss function would be steeper than the implicit loss function.
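The congruence hypothesis can be expressed as a simple rule. It compares the slope α₂ elicited for a project with the implicit α₂ for the project's actual sample size. For a fixed asymmetry, α₁ = aα₂ scales with α₂, so comparing α₂ suffices. The sketch is hypothetical throughout: the class names, the function `congruent` and the tolerance `rel_tol` used to decide when two slopes "more or less match" are our own choices, not part of the procedure described here.

```python
from enum import Enum


class ProjectClass(Enum):
    """Hypothetical labels for the three post-hoc project classes."""
    ROLLS_ROYCE = 1  # useful, but effort judged excessive for the true costs
    GOOD_VALUE = 2   # would pay for it again in comparable circumstances
    WASTE = 3        # results too uncertain to be useful


def congruent(project_class, selected_alpha2, implicit_alpha2, rel_tol=0.25):
    """Return True if an elicited slope agrees with the project's class.

    selected_alpha2: slope of the loss function the group judged most plausible.
    implicit_alpha2: slope of the implicit loss function for the actual sample size.
    rel_tol: hypothetical tolerance defining a 'more or less' match (class 2).
    """
    ratio = selected_alpha2 / implicit_alpha2
    if project_class is ProjectClass.ROLLS_ROYCE:
        return ratio < 1.0 - rel_tol  # selected loss function is clearly shallower
    if project_class is ProjectClass.GOOD_VALUE:
        return abs(ratio - 1.0) <= rel_tol  # roughly the implicit loss function
    return ratio > 1.0 + rel_tol  # WASTE: selected loss function is clearly steeper


# Hypothetical elicitation results: (class, selected alpha2, implicit alpha2)
projects = [
    (ProjectClass.ROLLS_ROYCE, 100_000.0, 300_000.0),
    (ProjectClass.GOOD_VALUE, 320_000.0, 300_000.0),
    (ProjectClass.WASTE, 150_000.0, 60_000.0),
]
share = sum(congruent(*p) for p in projects) / len(projects)
print(f"share of congruent projects: {share:.2f}")
```

The three bands are contiguous and do not overlap, so every ratio of slopes is predicted by exactly one class. A formal test of the hypothesis over many projects would still need a statistical treatment of the share of congruent cases, which this sketch does not attempt.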
There is scope for further work on using elicitation to obtain the asymmetry of loss functions. It is interesting that the asymmetry of the general linear loss function is implicit in the percentile-based approach used in the Australian carbon trading scheme; this suggests that percentiles may provide a basis for such an elicitation. This could be useful for development of the implicit loss function, but could also facilitate the development of explicit loss functions for rational sample planning.
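The link between percentiles and asymmetry can be made explicit under an assumption we add here. Take the general linear loss with slope α₂ for underestimation (positive error, true minus estimate) and α₁ = aα₂ for overestimation. Minimizing expected loss then selects the p-quantile of the estimation distribution with p = 1/(1 + a), so an elicited percentile p implies a = (1 − p)/p. On this reading the stock asymmetry of 1.5 corresponds to reporting the 40th percentile, and its reciprocal, 0.6̇, to the 60th percentile of the estimated loss. The function names are hypothetical.

```python
from statistics import NormalDist


def percentile_from_asymmetry(a):
    """Optimal percentile of the estimation distribution for a = alpha1 / alpha2.

    Assumes slope alpha2 for underestimation (error = true - estimate > 0)
    and slope alpha1 for overestimation.
    """
    if a <= 0:
        raise ValueError("asymmetry ratio must be positive")
    return 1.0 / (1.0 + a)


def asymmetry_from_percentile(p):
    """Inverse mapping: asymmetry implied by reporting the p-quantile."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (1.0 - p) / p


def percentile_estimate(mean, sd, a):
    """Reported estimate when errors are normal: the optimal quantile of N(mean, sd^2)."""
    return NormalDist(mean, sd).inv_cdf(percentile_from_asymmetry(a))


for a in (1.5, 1.0, 2.0 / 3.0, 0.4, 0.2):
    print(f"a = {a:.3f} -> report the {100 * percentile_from_asymmetry(a):.0f}th percentile")
```

Eliciting a percentile is arguably easier than eliciting a ratio of slopes directly, which is one reason the inverse mapping may be useful in practice.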
Further work is also needed to include economies of scale and Net Present Values in the implicit loss function, and to make the logistical model more flexible. Beckett (1981) presents a review and evaluation of logistical models in the context of agricultural extension and soil survey. There may be scope to develop this model and to calibrate it with records from the Tellus Border survey and similar sampling exercises. While attempts have been made in the past to compute costs for notional soil sampling schemes of different intensity (Black et al., 2008), we are not aware of previous systematic attempts to calibrate such models from GPS records of the movement of sampling teams. Given the widespread use of GPS in field work, and the scope to download daily records, the collation of such information from sampling schemes with different designs in different conditions could be informative and useful for planning.

Conclusions
We defined the implicit loss function and exemplified it, using a process model to compute statistical parameters for a soil monitoring problem and records from a survey in Ireland to provide a logistical model. The implicit loss function is offered as a method to aid decision making on soil sampling problems where the costs of errors in soil information are not sufficiently clear cut to support a classical value of information analysis. This will often be the case in soil sampling and monitoring at regional and national scale. In such circumstances the selection of a level of investment in sampling may not be based on the information required but rather on arbitrary constraints. The implicit loss function allows one to exhibit the implicit assumptions in making a decision to invest a certain amount of resource in soil sampling, and we propose that this could help in reflection on this decision and on comparisons between levels of investment in different projects.