Rejoinder to letter to the editors

We thank Bouma (2020) and Ros et al. (2020) for their thoughtful comments on our paper. Both letters express concern that our conclusion, “Soil-based, field-specific fertilizer recommendations are a pipe-dream”, is harsh, and both doubt whether our analysis supports such a conclusion. Bouma is concerned that our paper consigns a hundred years of research to the dustbin. The key issues they raise are as follows: 1) Both letters briefly describe how critical thresholds, below which crops respond to the addition of a specific nutrient in a specific setting, are typically identified, i.e. thresholds are calibrated to crop types and regional conditions. 2) The authors of both letters strongly doubt that a regional, model-based approach combined with farmers' experience will result in more reliable and sustainable recommendations than traditional recommendation systems. 3) Bouma highlights that we ignored differences in soil types and considers this a weakness of the study. 4) Ros et al. are of the opinion that using the error arising from analyses conducted by different laboratories, as reported in Table 1 of our paper, overestimates the error, and that the actual error of each of the best laboratories is much smaller. 5) Further, Ros et al. suggest that spectral assessments of soil nutrients are a promising new development to address within-field variability and hence produce more reliable recommendations. Before we respond to each of these comments, we consider it important to explain some background to our paper. While soil analysis is a minor cost for many farmers, it represents a major investment for smallholder farmers. All our experimental work in smallholder environments highlights very large variability in soil fertility and crop productivity within and between fields.
This variability is poorly explained by soil characteristics, suggesting that the value of soil analysis, whether from a laboratory or from field sensors, will be limited. This was one of the main stimuli for us to write the paper. Our analysis of the impact of error propagation from sampling and laboratory analysis was based on QUEFTS, a model originally developed for tropical soils (Janssen et al., 1990) that has also been used in other climate zones, e.g. China (Jiang et al., 2017; Liu et al., 2006). Our approach provides a quantitative error analysis. The poor accuracy of soil tests in, e.g., the Dutch fertilizer recommendation system is known in qualitative terms: “The fertilizer recommendations in The Dutch recommendation system appear to be accurate, yet this is a false accuracy as the variability between trials is large and a differentiation up to 20 kg K2O per hectare is certainly not substantiated by the evidence provided” (Ehlert et al., 1998). Further, our analysis assumes that soil analysis is used for fertilizer recommendations for a single crop, not for a rotation or for a soil. Fertilizer recommendation systems can focus either on fertilizing the soil or on fertilizing the crop. Fertilizing the soil aims to build and maintain soil P and K pools that are large enough to feed the crop. Fertilizing the crop focuses on crop growth and yield responses, with the building of soil pools a welcome side-effect rather than the primary objective. In many smallholder systems, soil nutrient pools are strongly depleted and will not approach sufficiency levels in the near future. Building plant-available soil P pools to satisfactory levels, in particular, requires large investments over many years, which is beyond the reach of many smallholders, and is nigh-on impossible in strongly weathered tropical soils with a very large P sorption capacity.
In such systems, strategies should aim to fertilize the crop rather than the soil, combining a corrective application with banded or placed P applications (Sanchez, 2019). One of the key findings of our work is that applying balanced fertilizers containing N, P and K in ratios of about 1:0.41:0.67 strongly reduces the influence of the size of soil nutrient pools on yield responses to applied fertilizers and reduces between-field variability. This reduction in between-field variability is also observed in on-farm experiments: Njoroge et al. (2019) found that variation between fields fertilized with NPK was about 50% less than between fields that received NP, NK or PK.
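The balanced ratio translates directly into nutrient rates once an N rate is chosen. The following is a minimal sketch, assuming the 1:0.41:0.67 ratio is expressed on an elemental (N:P:K) basis, and using the standard element-to-oxide conversion factors that apply when rates must be read against fertilizer labels stated as P2O5 and K2O. The 100 kg N per hectare rate and the helper names are illustrative only.

```python
# Elemental N:P:K ratio quoted in the text (about 1:0.41:0.67).
# Assumption: the ratio is on an elemental, not oxide, basis.
NPK_RATIO = (1.0, 0.41, 0.67)

# Standard element-to-oxide factors: P2O5 = P x 2.2914, K2O = K x 1.2046.
P_TO_P2O5 = 2.2914
K_TO_K2O = 1.2046

def balanced_rates(n_rate):
    """Hypothetical helper: elemental (N, P, K) rates in kg/ha for an N rate."""
    return tuple(round(n_rate * r, 1) for r in NPK_RATIO)

def to_oxide_basis(n, p, k):
    """Hypothetical helper: express elemental P and K as P2O5 and K2O."""
    return n, round(p * P_TO_P2O5, 1), round(k * K_TO_K2O, 1)

n, p, k = balanced_rates(100)
print((n, p, k))                # (100.0, 41.0, 67.0)
print(to_oxide_basis(n, p, k))  # (100.0, 93.9, 80.7)
```

Keeping the ratio fixed and scaling only the N rate is what makes the recommendation independent of a field-specific soil test.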

1. Our analysis highlights that a single analysis of a pooled soil sample yields results with a large uncertainty. Most commonly used recommendation systems base fertilizer recommendations on a single soil analysis, so the uncertainty from laboratory analysis also propagates into the recommendations made. We agree with Ros et al. that calibrating relative measures to system-specific conditions is a key component of any recommendation system. For example, the 10 mg kg⁻¹ threshold for Olsen P that we used may vary between soil types and crops, due to differences in mycorrhizal symbiosis, the extent of rooting systems and/or the length of cropping seasons. But calibration does not address the uncertainty in assessing whether a particular field is at or below the threshold, nor the uncertainty in the amount of nutrients a soil can supply. Therefore, field-specific recommendations in these recommendation systems, too, are probably acceptable on average but not accurate for any individual field.
2. The influence of error is present in all recommendation systems. QUEFTS allowed us to quantify the influence of uncertainty on the prediction of soil N, P and K supply, and therefore on site-specific N, P and K fertilizer recommendations. In most traditional fertilizer recommendation systems the prediction error is unknown: the recommendation is based on predictions of soil nutrient supply, nutrient recovery from fertilizers and expected crop demand. All three components vary from year to year, which makes it difficult to evaluate the accuracy of field-specific recommendations. In one of very few analyses of errors in fertilizer recommendations based on soil tests, Fryer et al. (2019b) concluded that "…Mehlich-3 extractable P and K in Arkansas accurately predicted the correct crop response to fertilization at 38-50% of the site-years for P and 60-78% of the site-years for K".
The most common error was a false positive: accuracy was lowest at site-years where P, K, or P and K were recommended, while the fewest errors occurred where no P or K fertilizer was recommended. The results of Fryer et al. (2019b) suggest that Mehlich-3 extractable K from oven-dried soil can predict with reasonable accuracy where K fertilizer is needed to maximize agronomic yield, but is a poor predictor of the optimal fertilizer-K rate. Response predictions for flooded rice were even poorer, being accurate in only 40% of cases for soil-test P and K (Fryer et al., 2019a). These findings strongly support our conclusion that site-specific tailoring of fertilizer recommendations is, at least with current methods, a pipe-dream, and indicate that uncertainty is also large for calibrated soil tests, in particular for fields with low concentrations of available soil P and K, as often observed in smallholder systems. Ros et al. argue that fertilizer recommendations should be based on empirical relationships, e.g. relating a soil test result to uptake of a particular nutrient under the assumption that no other nutrient is limiting. The QUEFTS model has a strong empirical basis and includes descriptive components and parameters that vary and must therefore be calibrated to specific conditions, just like any empirical relationship. Recommendations based on a regionally calibrated QUEFTS are thus no different from recommendations based on empirical relationships derived from experimentation. By using this model, we assumed that the relationship between fertilizer supply and yield response is perfectly known and depends only on the combination of fertilizer applications, the recovery of applied N, P and K, and the ratio of N, P and K in the soil.
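The distinction Fryer et al. (2019b) draw between overall accuracy and where the errors concentrate can be made explicit with a 2 x 2 table of soil-test predictions against observed yield responses. The following is a minimal sketch; the class name and the site-year counts are hypothetical, chosen only so that false positives dominate as in the pattern described, and do not reproduce Fryer et al.'s data.

```python
from dataclasses import dataclass

@dataclass
class ResponseTable:
    """Hypothetical 2 x 2 table of soil-test predictions vs observed responses.
    tp: fertilizer recommended and the crop responded
    fp: fertilizer recommended but no response (false positive)
    fn: no fertilizer recommended but the crop responded (false negative)
    tn: no fertilizer recommended and no response
    """
    tp: int
    fp: int
    fn: int
    tn: int

    def accuracy(self):
        """Share of all site-years where the prediction was correct."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def accuracy_when_recommended(self):
        """Share of 'apply fertilizer' recommendations that were correct."""
        return self.tp / (self.tp + self.fp)

    def accuracy_when_not_recommended(self):
        """Share of 'no fertilizer' recommendations that were correct."""
        return self.tn / (self.tn + self.fn)

# Hypothetical site-year counts, for illustration only.
table = ResponseTable(tp=12, fp=18, fn=4, tn=26)
print(f"overall accuracy:              {table.accuracy():.2f}")
print(f"accuracy when recommended:     {table.accuracy_when_recommended():.2f}")
print(f"accuracy when not recommended: {table.accuracy_when_not_recommended():.2f}")
```

With these illustrative counts, "no fertilizer" recommendations are correct far more often than "apply fertilizer" recommendations, even though overall accuracy looks moderate.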
We have shown that a site-specific recommendation based on a single soil sample makes no practical sense and is not accurate enough to reliably differentiate between fields. There is no evidence that our suggested approach will be more reliable or sustainable than an approach based on one soil sample, but it is certainly a lot cheaper and easier to implement at scale. Our analysis shows that applying NPK in balanced ratios results in more reliable yield responses to N than using site-specific ratios of N, P and K. This suggests that N uptake and yield responses to N will improve, and that environmental risks associated with N will be strongly reduced, including nitrate leaching to ground- and surface waters and excessive greenhouse gas emissions due to denitrification.
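Why a single soil sample cannot reliably separate fields near a threshold can be illustrated with a short calculation. The following is a minimal sketch, assuming that one Olsen P measurement is normally distributed around the field's true value with a combined sampling and laboratory standard deviation; the SD of 2.5 mg/kg and the helper name are hypothetical, chosen for illustration rather than taken from Table 1 of our paper.

```python
import math

def prob_classified_deficient(true_value, threshold, sd):
    """Hypothetical helper: probability that one measurement, assumed normally
    distributed around the field's true value with standard deviation `sd`,
    falls below `threshold` (i.e. the field is classified as deficient)."""
    z = (threshold - true_value) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF at z

THRESHOLD = 10.0  # Olsen P threshold used in our paper, mg/kg
SD = 2.5          # hypothetical sampling + laboratory SD, mg/kg

for true_p in (6.0, 8.0, 10.0, 12.0, 14.0):
    prob = prob_classified_deficient(true_p, THRESHOLD, SD)
    print(f"true Olsen P {true_p:4.1f} mg/kg -> P(deficient) = {prob:.2f}")
```

Whatever SD is assumed, a field whose true value lies exactly at the threshold is classified as deficient half the time; a larger SD widens the band of true values for which the classification is little better than a coin toss.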
However, our approach assumes that farmers are able to distinguish fields with a poor soil nutrient status, characterised by low yields, from fields with a good nutrient status, characterised by relatively high yields. Ros et al. made the point that a correct selection of representative field trials is important, as other factors may override the impact of soil nutrient availability. Firstly, detailed calibration of soil tests with experimental data has never been done in many countries, and representative, context-specific field trials will not become available in the near future. Secondly, this is a largely theoretical point, as variability between fields on individual farms is far larger and more important than variability among regions. These differences between fields are due to a combination of differences in soil nutrient pools and crop management, where soil nutrient pools strongly reflect differences in historical management. Given this very large on-farm variability, it is nigh-on impossible to select an appropriate representative field trial without accounting for field management factors. It is therefore no surprise that site-specific nutrient management tools such as Nutrient Expert (Pampolino et al., 2012; Pasuquin et al., 2014) rely more strongly on management information than on information derived from soil sampling.
3. We agree with Bouma that soil types differ strongly in a wide range of properties that affect yields, and that the influence of soil type on water-limited yields is strong. However, under good agronomic practices, which include placement of basal fertilizers and top-dressing with N in split applications, differences between soil types in agronomic efficiency (AE) are minor when moderate amounts of fertilizer are applied and target yields are well below the yield potential of a given field.
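For reference, the agronomic efficiency of a nutrient X is commonly defined as the extra yield obtained per unit of X applied, relative to a plot that did not receive X (a control or X-omission plot):

```latex
\mathrm{AE}_X = \frac{Y_{+X} - Y_{-X}}{F_X}
```

where Y_{+X} is the yield with X applied (kg ha⁻¹), Y_{-X} the yield without X (kg ha⁻¹) and F_X the amount of X applied (kg ha⁻¹), so that AE is expressed in kg of yield per kg of nutrient.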
The differences in AE between fields in a region are therefore not strongly influenced by soil type, and soil tests for a particular nutrient are not related to responses to that nutrient (Maman et al., 2018). Further, Njoroge et al. (2019) observed that past management was more important than soil texture in explaining differences in yield among fields. These somewhat surprising results can be understood by considering that yields in many smallholder systems are generally limited by N and P, and occasionally by K. Under good agronomic management, the N input is adapted to crop demand. AE is therefore largely determined by the supply of P and K in the soil: high AEs for N are only possible when P and K are abundantly available. Differences in P recovery among soils are strongly reduced by placing P fertilizer close to the crop roots. Placed P fertilizer eliminates the strong pH influence on P availability and circumvents strong soil P retention by sesquioxides by creating local pockets of P-saturated soil where P is available for plant uptake (van der Eijk et al., 2006). Soils with strong P retention create a smaller volume of P-saturated soil, but with higher concentrations, and "apparently these opposing tendencies caused that the crop response to incorporated P was not affected by the soil's P retention capacity" (van der Eijk et al., 2006). However, differences in uptake from soil P and K stocks remain when P is applied in moderate amounts, as plant demand is not fully met.
4. We acknowledge that within-laboratory variability is smaller than between-laboratory variability. We tested a situation in which a sample was sent to any certified laboratory participating in the ring test of the Wageningen Evaluation Programs for Analytical Laboratories (WEPAL, www.wepal.nl). In our view, this is the most honest test, as the internal quality standards of laboratories are unknown to users.
A user therefore cannot tell whether a particular laboratory is better than others: no comparable certificate or objective quality standard is available. Many laboratories have a national or ISO certification that indicates whether proper procedures are followed, e.g. for sample preparation, yet such certification does not evaluate the accuracy or reproducibility of results (Hartmann and Suvannang, 2018). That requires proficiency tests or ring tests. Outcomes of the WEPAL ring test are anonymous and known only to the laboratory itself. Further, high repeatability of analytical procedures alone is not sufficient: the bias of a specific laboratory, and its differences from the laboratory used to determine the critical thresholds, also matter. In our analysis, we assumed that the threshold value is known precisely, whereas this value also has a range due to variability in the underlying field experiments, soil sampling procedures and laboratory errors.
5. Spectral analysis or proximal soil sensing (Molin and Tavares, 2019) provides ample opportunities to reduce the costs of analytical procedures and can also be used to measure repeatedly in the field, substantially reducing the field sampling error. Lower costs for analyses make it possible to change the standard sampling procedure, in which a single sub-sample of a pooled sample is analysed, to one in which multiple samples from each field are taken and analysed. Further, proximal sensing is most promising for reducing field sampling errors. We increasingly see promises that proximal soil sensors can scan the soil and provide a fertilizer recommendation (van Beek, 2019). While this may be a good marketing strategy for consultants and companies selling advisory services, these promises are not supported by the peer-reviewed literature (Holmes et al., 2019; Pätzold et al., 2020).
The spectral reflectance or transmission signature is specific to organic matter content, clay mineral content, metal-OH bends, and first, second or third overtones of OH, SO4 and CO3 groups and of CO2 and H2O bonds in molecules (Stenberg et al., 2010). The good accuracy with which soil spectra predict soil texture, pH, organic matter and N contents is therefore readily understood. But it is often overlooked that the spectral reflectance or transmission signal is not specific to soil P or K content: estimates rely on autocorrelations, e.g. of P and K with organic matter and clay content, amongst others (Molin and Tavares, 2019). Organic material in plants or soil, but also in animal manures, has a rather narrow range of macronutrient contents with rather stable N:P:K ratios. However, when fertilizer is applied, these ratios are no longer in tune with soil nutrients and the autocorrelations are broken. Predictability of Olsen P and exchangeable K contents of soils using spectral methods is poor, with a large prediction error (Towett et al., 2015). These two soil chemical parameters are the most important for fine-tuning fertilizer recommendations, as N fertilizer is nearly always needed. Other techniques that are specific to P and K atoms, such as X-ray fluorescence or laser-induced breakdown spectroscopy (Lu et al., 2013), are not yet sensitive enough or are limited to bench-top applications (Molin and Tavares, 2019), and they measure only total, not available, concentrations. Further, soil N supply varies strongly with seasonal conditions and cannot be predicted from soil analysis, even in regions with a long history of synthetic fertilizer use. So despite the promise of proximal soil sensing and its ability to provide cheap in situ soil measurements, in our opinion the currently available in situ sensors will not improve P and K fertilizer recommendations.
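The loss of predictive power when fertilizer breaks these autocorrelations can be illustrated with a toy simulation. It is not a spectral calibration: it assumes, hypothetically, that the "spectrum" sees only organic matter (OM), that available P in unfertilized soils tracks OM linearly, and that fertilizer adds a variable amount of available P unrelated to OM. All numbers and helper names are invented for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

rng = random.Random(42)

# Hypothetical unfertilized soils: available P (mg/kg) tracks OM (g/kg) closely,
# the kind of autocorrelation a proxy-based P calibration exploits.
om = [rng.uniform(5.0, 40.0) for _ in range(200)]
p_unfert = [0.4 * x + rng.gauss(0.0, 1.0) for x in om]

# Calibrate a proxy model that predicts P from OM only.
a, b = fit_line(om, p_unfert)
pred = [a + b * x for x in om]

# Same soils after fertilization: extra available P unrelated to OM.
p_fert = [p + rng.uniform(0.0, 15.0) for p in p_unfert]

print(f"RMSE unfertilized: {rmse(pred, p_unfert):.1f} mg/kg")
print(f"RMSE fertilized:   {rmse(pred, p_fert):.1f} mg/kg")
```

The proxy model fits the unfertilized soils well, but its error grows several-fold once fertilizer has decoupled P from OM, which is why validation sets need to include fertilized fields.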
Our findings highlight the need to acknowledge and report the influence of error in laboratory procedures for the analysis of nutrient contents, especially at the nutrient-limiting concentrations that frequently occur in smallholder environments. Rigorous ring-testing procedures with transparent outcomes, such as those developed by WEPAL, are key, in line with current Global Soil Laboratory Network initiatives (Hartmann and Suvannang, 2018). Further, strict protocols need to be developed for the calibration and testing of novel scanners and sensors used in the field. In our opinion, tests must include fields with and without N, P and K fertilizers in various ratios to test the validity of predictions, especially where strong autocorrelations between soil characteristics and nutrient concentrations can be expected. We urge laboratories to report the accuracy of soil nutrient content predictions on independent sites, including predictions of plant nutrient uptake for unfertilized fields and nutrient omission plots.
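Reporting within- and between-laboratory error requires separating the two from ring-test data. The following is a minimal sketch, assuming a balanced design (every laboratory reports the same number of replicate results for one sample) and the one-way variance decomposition described in ISO 5725-2; WEPAL's own evaluation procedures may differ, and the helper name and duplicate Olsen P results are hypothetical.

```python
import statistics

def precision_from_ring_test(results_by_lab):
    """Hypothetical helper: repeatability (within-lab) and reproducibility
    (within- plus between-lab) standard deviations for one sample.

    One-way variance decomposition for a balanced design:
      s_r^2 = pooled within-laboratory variance
      s_L^2 = variance of laboratory means - s_r^2 / n   (floored at 0)
      s_R^2 = s_L^2 + s_r^2
    """
    if len(results_by_lab) < 2:
        raise ValueError("need results from at least two laboratories")
    n = len(results_by_lab[0])
    if n < 2 or any(len(r) != n for r in results_by_lab):
        raise ValueError("need a balanced design with >= 2 replicates per lab")
    s_r2 = statistics.fmean(statistics.variance(r) for r in results_by_lab)
    lab_means = [statistics.fmean(r) for r in results_by_lab]
    s_L2 = max(0.0, statistics.variance(lab_means) - s_r2 / n)
    return s_r2 ** 0.5, (s_L2 + s_r2) ** 0.5

# Hypothetical duplicate Olsen P results (mg/kg) from four laboratories.
ring = [[9.1, 9.4], [11.8, 12.1], [8.2, 8.0], [10.6, 10.9]]
s_r, s_R = precision_from_ring_test(ring)
print(f"repeatability s_r = {s_r:.2f} mg/kg, reproducibility s_R = {s_R:.2f} mg/kg")
```

In this invented example the between-laboratory component dominates, so the reproducibility SD is several times the repeatability SD, which mirrors the distinction discussed under point 4.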