Accuracy of regional-to-global soil maps for on-farm decision making: Are soil maps “good enough”?

. A major obstacle to selecting the most appropriate crops and closing the yield gap in many areas of the world is a lack of site-specific soil information. Accurate information on soil properties is critical for identifying soil limitations and the management practices needed to improve crop yields. However, acquiring accurate soil information is often difficult due to 15 the high spatial and temporal variability of soil properties at fine scales and the cost and inaccessibility of laboratory-based soil analyses. With recent advancements in predictive soil mapping, there is a growing expectation that soil map predictions can provide much of the information needed to inform soil management. Yet, it is unclear how accurate current soil map predictions are at scales relevant to management. The main objective of this study was to address this issue by evaluating the site-specific accuracy of regional-to-global soil maps, using Ghana as a test case. Four web-based soil maps of Ghana were 20 evaluated using a dataset of 6,514 soil profile descriptions collected on smallholder farms using the LandPKS mobile application. Results from this study revealed that publicly available soil maps in Ghana lack the needed accuracy (i.e., correct identification of soil limitations) to reliably inform soil management decisions at the 1-2 ha scale common to smallholders. Standard measures of map accuracy for soil texture class and rock fragment class showed that all soil maps had similar performance in estimating the correct property class, with overall accuracies ranging from 9-35% for soil texture classes and 25 26-59% for soil rock fragment classes. Furthermore,

Ideally, smallholder farmers would be able to characterize the variability of their soils using laboratory-based physical and chemical analyses. In reality, high cost, limited access, and slow turnaround times have prevented most farmers from obtaining and using detailed soil laboratory information, while limited crop-and soil-specific knowledge have constrained the use of this information. Soil maps have been widely viewed as at least a potential solution to this information gap, resulting in continued efforts to improve the spatial resolution and accuracy of soil map information (Brevik et al., 2015). 70 While recent advancements in soil mapping allow for the prediction of soil information at management-relevant scales, the utility of those predictions depends on how accurately they portray fine-scale (i.e., 1-2 ha) soil variability. Failure to accurately characterize soil variability at the farm or field scale can severely limit the reliability of land suitability assessments (i.e., fitness for a specific land utilization type, e.g., low-input, rain-fed wheat), and thus the ability to identify soil limitations and/or the conditions suitable for sustainable agricultural intensification. 75 Soil maps characterize spatial variability using either conventional or predictive soil mapping techniques. Conventional soil maps partition a landscape into finite circumscribed regions (i.e., soil map units), where boundaries are sharp lines delineating clear differences in soil types (Heuvelink and Webster, 2001). Conventional soil maps use empirical, expert-based models to delineate the location and extent of soil types. These empirical models are often based on local geomorphology and vegetation patterns and validated by direct observation. The typical ranges of soil properties encountered for each soil type are established 80 based on representative soil profiles and expert knowledge. In contrast, predictive soil maps characterize soil properties and classes (i.e., class probabilities) as continuous modelled values at a fixed grid cell resolution across a mapping area. Predictive soil maps are created from numerical or statistical models based on relationship among environmental variables and soil properties or classes. These models often use legacy soil profile data and remotely sensed environmental covariates (e.g., slope, normalized difference vegetation index [NDVI]) that approximate soil forming factors (e.g., topography, climate, geology, 85 vegetation). Predictive soil maps are driven by the modelling of spatial data and therefore limited by both the point data available for training/validation and the covariate data used for model development.
For many smallholder farmers, obtaining actionable soil information from soil maps is an attractive option. Multiple sources of soil map information raise several important questions for end-users, including: How accurate are soil maps at my farm?
Which soil map product is the most accurate? Can I use soil map information to inform my soil management decisions? 90 Answers to these questions require an understanding of site-specific soil accuracy as it relates to both the relative accuracy of soil map information (i.e., compared to soil profile measurements) and the levels of soil map accuracy required for different land management applications (i.e., functional assessment).
This study evaluated the site-specific accuracy of four publicly available web-based soil maps of Ghana (Harmonized World Soil Database, World Inventory of Soil Property Estimates, SoilGrids250m, and iSDAsoil) using a dataset of 6,514 soil profile 95 descriptions collected on smallholder farms using the LandPKS mobile application ('app') (Herrick et al., 2013). We evaluated three static soil properties (reflecting the long-term potential of the soil): soil texture class (USDA), rock fragment volume class, and soil depth (i.e., depth to bedrock). These properties directly affect agricultural production and can be used to inform farmer decisions on a variety of management practices such as irrigation frequency and erosion control. They also determine how susceptible soils are to declines in fertility and how responsive they are likely to be to different types and amounts of 100 fertilizer and organic amendments such as compost and manure. We used standard measures of classification accuracy to assess the relative accuracy of each soil map. To help further contextualize these soil property differences, we used a modified version of the Global Agro-Ecological Zone (GAEZ) soil suitability modelling framework to derive soil suitability ratings for each soil data source (Fischer et al., 2008). Using a low-input, rain-fed, maize production scenario, we evaluated the functional accuracy of map-based soil property estimates relative to site-based measurements. The main objective of this study was to 105 improve our understanding of differences in soil map products, the relation of these products to field observations, and the functional accuracy of soil map data for informing soil management recommendations.

Study Area
The study was conducted in Ghana, West Africa, within the Northern, Upper West, and Upper East regions in northern Ghana 110 and the Western and Ashanti regions in southern Ghana (Fig. 1). The study area spans four agro-ecological zones, the Guinea Savannah and Sudan Savannah in the north and the Deciduous Forest and Wet Evergreen Rainforest in the south. The northern agro-ecological zones have a unimodal rainfall pattern with a mean annual rainfall of 1,100 mm, resulting in a single growing season from July to September. Agro-ecological zones in the south have a bimodal rainfall pattern and receive between 1500and 2200-mm rainfall per annum, resulting in a major and minor cropping season. Soils in northern Ghana are predominantly 115 Plinthisols and Planosols with smaller areas of Lixisols and Luvisols (Adjei-Gyapong and Asiamah, 2002;Awadzi and Asiamah, 2002). Soils in southern Ghana are predominantly Ferralsols and Acrisols, with smaller areas of Lixisols, Alisols, and Nitisols. Except for Luvisols in the north and Nitisols in the south, most soil types in Ghana have moderate-to-severe limitations for crop production, including low fertility (Acrisols, Alisols, Ferralsols, Lixisols), aluminum toxicity (Acrisols, Alisols, Planosols), shallow rooting depth (Plinthosols), high erosion risk (Alisols), and susceptibility to drought (Acrisols, 120 Alisols, Ferrasols, Plinthosols).

Soil map acquisition and processing
Four soil mapping products were evaluated in this study: two conventional soil maps ( (i.e., soil map unit components). A common method for dealing with this spatial uncertainty is to assign any location within a SMU to its dominant soil component. In our comparisons of soil property values, we used the property values associated with the dominant SMU component from the HWSD and WISE maps. In Ghana, HWSD is derived from the FAO-UNESCO Digital Soil Map of the World (DSMW) which has a map scale of 1:5,000,000 (translates to a spatial resolution of ~2.5 km). HWSD soil property data is derived using soil profile data from the WISE soil profile database and pedotransfer rules, producing two 135 aggregated soil depth intervals (0-30 and 30-100 cm) (Nachtergaele et al., 2009). The WISE soil map is a recent improvement upon HWSD, where an expanded WISE soil profile database and new pedotransfer rules were used to derive soil profile data at seven standardized depth intervals (0-20, 20-40, 40-60, 60-80, 80-100, 100-150, 150-200 cm). HWSD and WISE have identical spatial data (map scale: 1:5,000,000; spatial resolution: ~2.5 km) but differ in their soil property data (2 vs 7 depths for HWSD and WISE, respectively) (Batjes, 2016). Predictive soil mapping products (e.g., SoilGrids, iSDA) offer an 140 alternative to conventional soil maps by providing predictions of soil properties and classes at specific locations. SoilGrids is a global predictive soil map that predicts soil properties at a 250 m spatial resolution at six standard depths (0-5, 5-15, 15-30, 30-60, 60-100, and 100-200 cm) (de Sousa et al., 2020). iSDA is a predictive soil map of Africa that predicts soil properties at a 30 m spatial resolution at two standard depths (0-20, 20-50 cm) (Hengl et al., 2021). Soil map data was obtained from online repositories. Soil map predictions for sand, silt, and clay percentage, rock fragment volume, and depth to bedrock were 145 extracted from each map at all 6,514 sampling locations. For SoilGrids, depth to bedrock values were extracted from SoilGrids version 1.0 since no new map predictions were available for version 2.0. Among the soil data sources, the maximum prediction depth was shallowest for iSDA at 50 cm (Table 1).
To facilitate comparison between the different soil data sources, we segmented each soil profile into 1 cm slices and then aggregated the slices (depth-weighted average) using a standard set of depth intervals (i.e., 0-10, 10-20, 20-50 cm). The 150 maximum soil depth for each data source was set at 50 cm to ensure all data sources had soil property values at each depth interval in our comparison (Fig. 2). The segmenting algorithm was implemented using the 'aqp' package for R (Beaudette et al., 2013). For each reaggregated soil depth interval, we calculated soil texture class based on USDA texture classes and rock fragment volume class based on the LandPKS rock fragment class intervals (i.e., 0-1, 1-15, 15-35, 35-60, and >60%).

Field data collection 160
Soil profiles were sampled as part of two different monitoring and evaluation (M&E) surveys of smallholder farmers in Ghana: USAID's Feed the Future (FTF) project (Northern Ghana) and a World Bank funded research project, Map to the Future (M2F) (Southern Ghana). The FTF project used a cross-sectional multi-stage cluster sampling design, using probability proportional to size sampling to select smallholder farms (USAID, 2013). At each selected farm a single representative site (i.e, visually assessed to represent the average biophysical condition) was selected at each farm for soil sampling (farms/soil profiles=6,289). 165 The M2F project used a conditioned Latin hypercube sampling design (cLHS) to select a subset of smallholder farmers participating in an agricultural advisory pilot project (FarmGrow: Daniel et al., 2020). Baseline agronomic information (e.g., agricultural practices, soil condition, annual yield) was used to stratify the cLHS subsampling. At each selected farm in the M2F project, three soil profiles were sampled from each farm field, with sampling locations chosen by the farmer to reflect within-field soil variability (farms=75, soil profiles=225). 170  et al., 2018; https://landpotential.org/knowledge-hub/). This involved sampling of soils by either hand auger (northern Ghana) or from soil pits (southern Ghana) at 5 standard depth intervals (i.e., 0-1, 1-10, 10-20, 20-50, 50-70 cm). Soil samples were passed through a 2 mm sieve and analysed for soil texture (USDA textural classification) using the hand texturing method (Schoeneberger et al., 2012) and rock fragment volume (i.e., volume percent of rock fragments >2mm) class (i.e., 0-1%, 1-15%, 15-35%, 35-60%, >60%) using visual estimates (USDA-NRCS, 2020). Depth to bedrock was also recorded if 180 encountered within the 70 cm sampling depth.
While most soil map predictions are derived from laboratory-based property measurements (e.g., HWSD, WISE, SoilGrids), several recent studies have shown field-estimated soil property values can produce relatively accurate estimates when compared to laboratory measurements (Salley et al., 2018;Vos et al., 2016). For example, Salley et al. (2018) reported that professional soil scientists and field technicians correctly estimated laboratory-determined texture classes for 66% and 41% of 185 samples, respectively. And when a 'correct' prediction also included adjacent textural classes, accuracies increased to 91% and 78% for professionals and field technicians, respectively (Salley et al., 2018). The compatibility of these different measurement methodologies was recently demonstrated with the iSDA soil maps which used both laboratory and field-based measurements to predict soil texture and rock fragment volume (Hengl et al., 2020).
All data recorded in the LandPKS app were synchronized to a cloud-based data storage system. Soil profile data were 190 downloaded from the LandPKS data portal (https://landpotential.org/data-portal/, accessed Nov 6, 2020). Quality control filtering was performed on LandPKS data to remove incomplete sites. This included removing sites with missing soil property data and sites that were not sampled at all 5 depth intervals.

Soil evaluation datasets
In developing our soil map evaluation procedure, we identified two issues that needed to be addressed. First, the LandPKS soil 195 data from the FTF project (6,289 sites) were used as part of the iSDA model calibration/training dataset, and therefore could not be used for independent evaluation of the iSDA map predictions (Hengl et al., 2020). The second issue was with the spatial support of the soil map accuracy assessment. Since most of the smallholder farms evaluated in this study (i.e., FTF project in northern Ghana) were sampled at only one location, most sites could only be evaluated at point-support (i.e., individual site value vs. predicted map value). The M2F study sites, although a considerably smaller dataset (n=225) and concentrated in 200 southern Ghana, were not used in the iSDA model and each farm contained three soil profile observations allowing for both an independent accuracy assessment of iSDA predictions and for accuracy assessments at both point and field-support.
Consequently, we evaluated three different datasets to account for these issues:

Soil map accuracy assessment at field-support
To calculate accuracy measures at field-support we need to compare the average of site values within a field to the average of all predicted map values within a field. For the study sites from the M2F research project in southern Ghana, the exact boundaries of each field were not available. Consequently, we approximated the area of each field by creating a convex hull around each set of sampling points (n=3) within a field and then applied a 10 m buffer around the perimeter of each delineated 210 area. Using the approach described by Bishop et al. (2015), each buffered area was then discretised into a 10 m grid, with the center of each grid converted to a point and used for extracting soil map predictions. A 10 m grid was chosen to ensure representative sampling of the soil maps across all grid resolutions. The values extracted at each point where then averaged, giving an approximate area weighted average for each sub-field delineation. The average measured field values were obtained by averaging values from the three soil profiles within each field. 215

Soil map evaluation methods
We evaluated the relative and functional accuracy of the soil maps using two different methods, (1) matching of soil property classes (relative accuracy), and (2) matching of crop-specific GAEZ soil suitability ratings (functional accuracy).

Soil property class match
The soil property class match approach applies an exact matching criterion where the measured soil property class at each site 220 and soil depth is compared to the predicted property class in each soil map. Because this approach requires an exact match it can result in a high rate of misclassification among similar soils and therefore provides a conservative measure of map accuracy. We addressed this with a second measure that also considers all adjacent property classes to be correct. For this method, we evaluated map accuracy for soil texture and rock fragment volume classes.
Map performance was evaluated using overall map accuracy, adjacent-overall accuracy, producer's accuracy, user's accuracy, 225 and balanced error rate. Overall map accuracy (OA) is the proportion of all observation points at which the map predicts the correct soil property class (i.e., soil texture class, rock fragment volume class). Adjacent-overall accuracy (OA-adj) includes all property classes adjacent to the correct class as a 'correct' prediction. Producer's accuracy (PA) and user's accuracy (UA) are calculated separately for each class. PA is the probability that a ground reference test sample is classified correctly in the the FNR and FPR, BER can account for problems of class imbalance, where models that overpredict the dominant class will receive a higher BER.

Global Agro-Ecological Zone soil suitability
Assessing map accuracy based on the soil property class match rate fails to account for when the predicted class is functionally similar to the measured class. In other words, sometimes misclassification of a soil property simply does not matter much for 240 management. For example, a sand texture misclassified as a loamy sand would be functionally more similar than a sand texture misclassified as a sandy clay. To account for these relative differences, we evaluated the functional similarity between data sources using a simplified version of the GAEZ soil suitability modelling framework. The GAEZ framework, developed by the Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA), uses soil data and detailed agronomic knowledge to quantify land productivity and crop-specific agronomic 245 potential (Geze Toth, Bartosz Kozlowski, Sylvia Prieler, 2012). GAEZ soil suitability calculations follow a two-step approach, where (1) cropping system-specific responses (i.e., unique combination of crop type, management level, and water supply) to individual soil properties are combined into soil quality ratings, and (2) individual soil quality ratings are combined to calculate management-specific soil suitability ratings. The soil suitability ratings serve as a functional metric for comparing differences in the soil property predictions between the different soil maps. The GAEZ soil quality framework uses multiple soil properties 250 to calculate each of the soil quality indices, including: soil nutrient availability (SQN ) = ƒ(soil texture, organic carbon, pH, and total exchangeable bases); soil rooting conditions (SQR) = ƒ(soil depth, soil phases); and soil workability (SQW) =ƒ(soil depth, texture, rock fragments, soil phases, vertic soil properties).
Our modified GAEZ framework used a low-input, rain-fed, maize production scenario to translate soil property information at each site into crop-specific soil suitability ratings for each soil data source. We calculated simplified soil quality indices 255 using soil texture class, rock fragment class, and soil depth as input properties. We used maize as our modelled crop due to its widespread cultivation throughout Ghana ( Fig. 1) and our selection of input soil properties was limited by those properties common to all data sources. Soil property values at each site were used to calculate three different soil quality indices (SQs): soil nutrient availability (SQN), soil rooting conditions (SQR), and soil workability (SQW). Each soil quality index has its own unique set of soil properties ratings based on their relative influence. SQs: SQN = soil texture, organic carbon, pH, and total 260 exchangeable bases; SQR = soil depth, soil phases; SQW = soil depth, texture, rock fragments, soil phases, vertic soil properties.
Since soil texture, rock fragments, and soil depth were the only properties was the only common property among all datasets and therefor used to calculate SQN The natural availability of soil nutrients is critical for crop productivity in low-input farming systems. Soil texture class was used as an indicator of rock-derived nutrient availability, with finer textured soils (e.g., clay) typically having higher nutrient 265 availability than coarse textured soils (e.g., sand). The rock-derived nutrients include phosphorus, micro-nutrients, and base cations, and many of these nutrients are associated with specific mineralogy and tend to be less concentrated in sandy soils (Sollins et al., 1988). In contrast, nitrogen is substantially influenced by nitrogen fixation and soil organic matter content. Soil nutrient availability was calculated as: where STR is the soil texture class rating.
Soil rooting condition assesses the effective soil depth and volume for crop roots by accounting for the effects of soil depth, soil texture, and rock fragments volume. The soil rooting condition index was calculated as: where SDR is the soil depth rating, STR is the soil texture rating, and RFR is the rock fragment rating. 275 Soil workability or ease of tillage is affected by both physical hindrances to cultivation (e.g., bedrock, rock fragments) and limitations imposed by soil texture. Soil workability was calculated as: where X is the soil property rating (i.e., SDR, STR, RFR), jo denotes the soil property with the lowest rating such that: SRjo ≤ SRj, j=1:3. 280 The three soil quality indices were combined to calculate the soil suitability rating (SR):

Soil Property Distributions 285
While the spatial distribution of surface soil texture classes differed among the four soil maps, they all displayed a general trend of coarser soil textures in the north and finer soil texture in the south (Fig. 3). Furthermore, all four maps showed similarities in the relative distribution of certain soil texture classes, with sandy loam, loam, sandy clay loam, and clay loam dominant within most maps ( Fig. 4a-d). The WISE soil map predicted the highest diversity of soil texture classes (i.e., classes ≥1% map area) with six texture classes, followed by HWSD and SoilGrids with five texture classes, and iSDA with four texture 290 classes (Fig. 4). LandPKS field-based measurements spanned the widest range of textures, with a total of 11 classes (Fig. 4e). This is not surprising given the natural variability of soil texture at fine spatial scales. For WISE, HWSD, and SoilGrids, both the diversity of soil property classes and their relative distributions across the 6,514 study sites (Fig.4f-h), which comprised a total of 19,542 soil layers, were similar to the soil property maps (Fig. 4a-c). In contrast, ISDA soil property distributions were markedly different at the study sites ( Fig. 4i), exhibiting a higher diversity of property classes relative to their depth-wise areal 295 distributions across Ghana (Fig. 4d), likely a result of the influence of the FTF sites on iSDA model predictions. This contrasted with the M2F dataset (independent of iSDA model predictions) where the distribution of iSDA texture predictions was more closely aligned with the depth-wise areal distributions across Ghana. It should be noted that the soil depth intervals for the depth-wise areal distributions differ from the depth intervals used for the study site distributions, where each unique set of depths for each map were converted to LandPKS standard depth intervals. This resulted in maps with deep surface depth 300 intervals (i.e., HWSD: 0-30 cm, WISE and iSDA: 0-20cm) being assigned these surface property values for the first two LandPKS soil intervals (i.e., 0-10, 10-20 cm), effectively adding greater weight to the surface texture in most maps relative to the unweighted distributions based on the original map depth intervals presented in Figures 4 and 5.
LandPKS sites are predominantly coarse textured soils with 71% of soil layers classified as sandy loam or coarser. In contrast, SoilGrids predicted only 8% and HWSD 21% of soil layers as sandy loam or coarser. WISE was more similar with 51% of 305 soil layers, and the predictions from iSDA the most similar with 77% of soil layers classified as sandy loam or coarser. Figure   5 shows the distribution of soil rock fragment classes for the LandPKS sites and corresponding soil map values. LandPKS sites showed a range of soil rock fragment classes in the FTF-M2F dataset, with an almost equal distribution among the first four classes. WISE and SoilGrids had high percentages in several of the higher rock fragment classes, which more closely aligned with LandPKS values, while HWSD predicted low rock fragments across the majority of sites (97% in the 1-15% 310 class). For the M2F sites, LandPKS values were spread across multiple rock fragment classes (Fig. 5j), in contrast to HWSD, WISE and ISDA which predicted almost all sites in the 1-15% class (Fig. 5).

Soil Property Class Match
Overall accuracy of soil property maps was low, ranging from 9% to 35% for soil texture class and 26% to 59% for soil rock fragment class across the three test datasets (Table 2). Overall accuracies were slightly higher in southern Ghana (M2F dataset) for both texture and rock fragments, although these higher accuracies are due to the overprediction of the dominant classes. 330 For example, in the M2F dataset, WISE, SoilGrids and iSDA soil maps predicted 99-100% of the sites in the 1-15% rock fragment class which was the most dominant measured class (i.e, 59% of sites; Fig. 5j). This resulted in higher model accuracy but low sensitivity for all other classes (Figs. 6,7). The balanced error rate was high across all maps, ranging from 75% to 95% for soil texture class and 79% to 83% for soil rock fragment class.
For the M2F dataset there was little-to-no difference between the soil map accuracies calculated at point-support (M2F-PS) 335 relative to accuracies calculated at field-support (M2F-FS) ( Table 2). The average farm size across the 75 farms was 2.4 ha (SD ± 2.0) and our delineation procedure captured, on average, 48% of a field's area, with a range of 8% to 100%. On average these delineated areas intersected 1.8 SoilGrids pixels (range: 1-4 pixels) and 8.6 iSDA pixels (range: 2-24). Due to the large size of HWSD and WISE map unit polygons in Ghana, all farms fell within a single map unit and thus were attributed with the dominant component property value for both the point-support and field-support cases. 340 When we expanded our measure of prediction accuracy to include adjacent classes (i.e., OA-adj), classification accuracy increased to 39-85% for soil texture and 73-93% for rock fragment volume (Table 2). Individual soil texture class and rock fragment class producer accuracies for the FTF-M2F-PS and M2F-PS datasets are show in Figures 6 and 7, respectively.
Although 51% of LandPKS site-based texture measurements were either sand or loamy sand, none of the web-based soil maps predicted these classes at any of the sampling sites and at only <1% across all of Ghana (Fig. 4). Soil texture classes with 345 higher prediction accuracies included sandy clay loam, sandy loam, loam, and clay loam, which corresponded to the most common texture classes predicted among the four maps (Figs. 4,6,7). A similar trend occurred for rock fragment volume class (Figs. 4,6,7).

Agro-Ecological Zone Soil Suitability 365
The distribution of maize soil suitability ratings and classes calculated using LandPKS site-base data and soil map data are shown in Figure 8. Soil suitability ratings were noticeably different between LandPKS and the soil maps, with LandPKS values being substantially lower than those of the soil maps. Furthermore, the range of suitability ratings for LandPKS was significantly wider than that of the soil maps for both datasets (Fig. 8a,b). While iSDA appears to capture these lower suitability ratings (Fig. 8a), when we look at the independent test dataset (M2F) we see that iSDA fails to detect these lower suitability 370 soils (Fig. 8b). When suitability ratings are translated to suitability classes, these differences are further emphasized, with sampling sites classified in the top two suitability classes for the soil maps, whereas sampling sites for LandPKS were more evenly distributed across the suitability classes (Fig. 8c,d). While both the predictive and conventional soil maps were classified in either the 'No constraint' or 'Slight constraint' classes, 65% of sampling sites (17% of M2F sites) were classified as having moderate to very severe soil constraints based on LandPKS site-specific data (Fig. 8c,d). OA for the GAEZ suitability classes 375 were similar for all four soil maps, at just 15-18% for the complete dataset and 27-61% for the M2F datasets (Table 3). High PA for the 'No constraint' suitability class and low-to-zero percent accuracies for the other suitability classes further show the over-prediction of the 'No constraint' suitability class among the four maps (Table 3). A PA of zero for a suitability class means that the maps did not correctly predict any of the field observations of that class. UAs that fail to return a value indicate that the map failed to predict any values of that class (Table 3). 380 Analysis of the individual soil quality indices revealed that most sites were limited by their availability of soil nutrients, with 50% of LandPKS sites having a moderate to very severe constraint (Fig. 9a). Far fewer sites were constrained by rooting conditions or workability, with only 15% and 2% of LandPKS sites having a moderate to very severe constraint for rooting conditions (Fig. 9b) and workability (Fig. 9c), respectively. Nutrient availability was a main source of limitations identified for HWSD but not for either WISE or SoilGrids. HWSD and WISE also had limitations identified for soil rooting conditions 385 in a small subset of sites. No limitations were identified for workability by any of the soil maps (Fig. 9c).

Evaluation of soil map accuracy
The utility of a soil map depends on its intended use and the level of accuracy required for that use. When applied at a regional scale, current soil maps have been used effectively to inform agronomic and environmental policies. However, less is known about the accuracy of soil maps at the farm/field scale and whether soil map data at this scale is sufficiently accurate to inform site-specific land management. Differences in soil properties between sites, like pH or texture, can result in highly different 405 management requirements. For example, many essential plant nutrients become increasingly unavailable in soils at low pH (e.g., <4.5), making any efforts to fertilize acidic soils ineffective. Acidic soils require the application of amendments (e.g., lime) to raise the soil pH before any inherent nutrient deficiencies can be addressed. Accurately identifying these site-specific soil differences is critical for addressing the soil limitations that currently inhibit crop yields. Results from this study revealed that publicly available web-based soil maps of Ghana lack the needed accuracy to reliably inform soil management decisions 410 on smallholder farms (i.e., 1-2 ha). Standard measures of map accuracy for the class-based soil properties (i.e., texture class, rock fragment class) showed that all the soil maps were equally inaccurate in estimating the correct property class, predicting the wrong texture class 6-9 times out of 10 and the wrong rock fragment class 4-7 times out of 10. A similar study in Namibia evaluated the accuracy of surface soil texture estimates from seven soil maps (including HWSD, WISE and SoilGrids) (Buenemann et al., 2021). This study found that soil maps in Namibia predicted the correct topsoil texture class in only 13% 415 to 42% of test sites, indicating that none of the maps were sufficiently accurate for most site-specific management applications.
Another study in Rwanda evaluated the accuracy of SOC and pH predictions from AfsoilGrids250 maps . Söderström et al. (2017) found that the AfsoilGrids250 soil map predictions in Rwanda were poorly correlated to an independent validation dataset, with coefficients of determination of 0.05 and 0.11 for SOC and pH, respectively.
Accuracy assessments based on exact matching of soil classes, however, can underrepresent the functional accuracy of soil 420 properties. For example, two soils with the same clay content (15%) but slightly different sand contents (51 vs 53% sand) would fall into two different soil texture classes (sandy loam and loam, respectively) due to their proximity to the texture class boundary (Fig. 10a). If we predicted both soils to be sandy loam, our accuracy would only be 50% even though both soils may function like a sandy loam. Accounting for class adjacency in the overall accuracy evaluation accounts for these 'near misses', providing a less restrictive assessment of map accuracy. However, although class-adjacent accuracies in this study were higher 425 than overall accuracies for texture across all soil maps, they only increased to 39-49% for the FTF-M2F dataset, indicating that 50-60% of map-based soil texture estimates were considerably different (i.e., greater than one texture class difference) than site-based estimates. A similar result was found in Namibia, where topsoil texture predictions were often more than one textural class away from the site-based classes (Buenemann et al., 2021). In the smaller M2F dataset, class-adjacent soil texture accuracies were considerably higher (74-90%) at both point-and field-support, most likely due to the higher proportion of 430 finer textured soils which were more accurately predicted by the soil maps in these areas. (Table 2, Fig. 1).

435
While comparing relative differences in soil property values can provide insight into the accuracy of map-based estimates, it is often difficult to interpret the functional implications of those differences. Modelling frameworks like GAEZ provide a way to translate soil property differences into crop-specific soil quality indices and soil suitability ratings that can compare soil functional differences. Additionally, the GAEZ framework provides a more holistic means of comparing soils since each soil suitability rating is calculated based on all soil properties and across all soil depths at a site, providing a single functional 440 measure at each location. This contrasts with standard measures of map accuracy that are based on a single soil property at only one soil depth.
In our application of the GAEZ framework (low-input, rain-fed maize) the soil property rating criteria were based on two different levels of soil property generalization. The first based on the groupings of soil property values into soil property classes (Figs. 6,7) and second, the broad grouping of soil classes into soil suitability ratings (e.g., Fig. 10). This means that in many 445 cases soil property differences resulting from either measurement or prediction error will be minimized due to these large within-group property ranges. This is not the case, however, within certain regions of the soil property space. For example, in the GAEZ system, soil texture poses no limitations to nutrient availability for all texture classes except the three classes with the highest sand content (sandy loam, loamy sand, sand) which pose increasing limitations with increasing sand content (Fig.   10). Thus, to accurately assess this limitation one must accurately differentiate a sand, loamy sand, or sandy loam from any 450 texture finer than a sandy loam. On the other end of the soil textural triangle, the clay texture class negatively impacts the soil quality ratings for rooting condition and workability, while all other texture classes do not pose any limitations. Despite these relatively narrow soil property ranges for identifying crop constraints, 65% of sites were classified as having moderate to very severe soil constraints, while the soil maps all failed to predict these high constraint classes at any of the study sites. The low functional accuracy of soil maps in Ghana based on our modified GAEZ framework was due to several contributing factors: 455 (1) constraints on maize soil suitability were largely confined to coarse textured soils (i.e., sand, loamy sand, sandy loam), (2) 71% of site-based texture estimates were in coarse texture classes, and (3) soil maps had low prediction accuracy for the coarse texture classes. In areas dominated by medium-to-fine textured soils, functional accuracies would likely be much higher due to the wide range of texture classes (e.g., sandy clay loam vs silty clay) that are rated functionally similar (Fig. 10a).

Sources of field sampling error
When evaluating the accuracy of soil map predictions, two sources of error can occur from the field sampling protocol: the first originating from the sampling design and second from the sampling methodology. Several recent studies have shown that unbiased assessments of soil map accuracy require independent test datasets that have been generated using probability-based sampling designs (Brus et al., 2011;Piikki et al., 2021). The datasets used in this study were purposive, focusing on smallholder 465 farmers in Ghana, and spatially clustered in northern Ghana, with only a small subset of farms in southern Ghana. However, within this specific land use type, probability-based sampling methods were used to select sites from a larger population of smallholder farms. Therefore this assessment should only be viewed in the context of utilizing soil maps to help inform smallholder farmers, and not their utility for informing other land use types (e,g., forestry, grazing lands) which may provide more accurate soil predictions. 470 Field sampling error can also occur due to differences in sampling methodology. This study used field-estimated soil property values as reference data for evaluating the accuracy of soil map predictions. Field estimation of soil texture using simple dichotomous keys has been shown to produce relatively accurate estimates when compared to laboratory measurements (Salley et al., 2018;Vos et al., 2016). Furthermore, some soils due to their minerology or chemical make-up are not well-suited for laboratory particle-size analysis and field-based estimates may be considered more reliable than lab data (Landon, 1988). This 475 is true for highly weathered oxide-rich tropical soils where traditional laboratory techniques often underestimate clay content due to the soil's resistance to dispersion (Silva et al., 2015). However, difficulties with lab-based clay estimation extend beyond oxide-rich tropical soils, as was shown in the 6 th FSCC interlaboratory comparison which found that clay content was one of the most difficult properties to consistently measure, with a coefficient of variation (CV) of 32% among 50 participating laboratories (Cools and Vos, 2010). A recent interlaboratory comparison among three soil laboratories in Ghana revealed 480 significant variation in soil texture measurements, with a CV of 47% for clay based on 10 soils with contrasting textures (2020, unpublished data).
Numerous studies evaluating the accuracy of field-based soil texture estimates have found that very coarse (e.g., sand) and very fine (e.g., clay) soil texture classes are estimated with higher accuracies relative to medium texture classes (e.g., loam).
For example, field-based texture class prediction accuracies averaged 80% (±7% std.) for sand and 56% (±15% std.) for clay 485 among 8 different studies (Akamigbo, 1984;Foss et al., 1975;Levine et al., 1989;Minasny et al., 2007;Post et al., 1986;Rawls and Pachepsky, 2002;Salley et al., 2018), compared to 34% (±11% std.) for loam among the same studies. Field-based estimates of soil texture in this study were made using the Thien (1979) flow diagram which begins with a ball test to determine if a soil classifies as a sand. The higher accuracies for sand reported in previous studies are likely due to the simplicity of this initial test and its low likelihood of resulting in errors of omission or commission. 490 When evaluating soil data uncertainty, it is important to determine the minimum level of measurement precision needed to inform a particular outcome. While soil particle size mass fractions (high measurement precision) are often required for soil modelling, soil texture classes (low measurement precision) are generally sufficient for on-farm soil management. Thus, using soil texture classes lowers our level of measurement precision which in turn minimizes the different sources of uncertainty (e.g., lab, field). Functional soil assessments often require even lower levels of measurement precision as demonstrated by the 495 GAEZ framework where the soil suitability ratings and classes have relatively low measurement precision (Fig. 10), which in turn further minimized the inherent uncertainty associated with both lab and field-based measurements. Furthermore, in our evaluation of functional accuracy using the modified GAEZ framework, functional differences for soil texture only occur for textural classes either high in sand (sand, loamy sand, sandy loam) (SQN, Fig. 10) or high in clay (clay) (SQW). The generally higher accuracy of hand texture estimates in these regions of soil texture space decreases the probability of sampling error in 500 our functional accuracy evaluation.

Sources of map error: spatial uncertainty
Soil maps are created at defined spatial scales, producing map information (e.g., field data, assigned classes, spatial delineations, interpretations) that is constrained by the patterns and characteristics of those scales (Soil Survey Division Staff, 505 2017). Both conventional and predictive soil maps account for spatial uncertainty in different ways. For conventional soil maps, the mapping scale determines the size and purity of soil map units, where small map scales (e.g., 1:5,000,000) contain large map units comprised of multiple soil components, while large map scales (e.g., 1:12,000) contain smaller map units that are often comprised of a single soil component. The small map scale of HWSD and WISE (1:5,000,000) resulted in individual map unit polygons ranging in area from 61 km 2 to 17,947 km 2 . Across these vast areas each map unit is only attributed with a 510 few soil components whose spatial delineation within each polygon is unspecified. The most common approach to deal with this spatial uncertainty is to attribute each polygon to its dominant soil component, as was done in this study. Depending on the number of soil components in a map unit and their areal extents, it is possible for the dominant component to comprise only a small percentage (e.g., 20%) of a large map unit area. Given the large spatial extent of the map unit polygons in the study area and our generalization of map units based on dominant component, it is not surprising that these map products 515 resulted in low site-specific accuracies.
Predictive soil maps are faced with a different set of challenges relating to spatial uncertainty. Since predictive soil maps use raster based environmental data as their predictive covariates, the spatial resolution of covariate data determines the spatial scale of the resulting soil map, which imparts an implied level of precision to the end-user. The accuracy of predictive soil maps, however, depends on the characteristics of both the soil point data and covariate data used to build the models. An 520 important characteristic of the soil point data is how well it represents the variability of covariate data across the entire inference space. Predictive soil maps often use existing field data from soil surveys that were conducted at different spatial scales to train and validate their models, and therefore may not adequately represent the full covariate information space. This can result in cases where the global model accuracy is high but local model accuracies are low, because certain geographic regions within the prediction area are poorly represented in covariate space. This was seen in the case of iSDA where global prediction 525 accuracies (Concordance Correlation Coefficient) for sand, silt, and clay ranged from 0.78-0.85 (Hengl et al., 2021), yet texture class prediction accuracies in Ghana were low (OA: 0.28-0.32). Similarly, SoilGrids global prediction accuracies (model efficiency coefficient) for sand, silt, and clay ranged from 0.40-0.70 (Poggio et al., 2021), while SoilGrids texture class prediction accuracies in Ghana were low (OA: 0.08-0.35). While these map products were not able to provide sufficiently accurate soil property predictions for site-specific management within the study area, high global prediction accuracies for 530 many of the modelled soil properties indicates that these maps have higher accuracies across larger spatial/variance scales.
In predictive soil modelling, model error is composed of two components: bias, which relates to model accuracy; and variance, which relates to model precision or uncertainty. SoilGrids and iSDA maps both employ ensemble modelling approaches to calculate spatial predictions of model uncertainty (SoilGrids: Quantile Random Forest; iSDA: Ensemble Bootstrapping).
Ensemble models are effective at increasing model accuracy and are often implemented using some form of model averaging 535 (Polikar, 2012). Assuming the different soil models produce different errors at each location, averaging the model outputs generally reduces the error (model bias) by averaging out the error components. A downside of model averaging is that it has a smoothing (variance-reducing) effect which can remove valid information from the outer ranges of the soil property distribution. There are many cases where the ability to predict these 'extreme' values is crucial, for example, at the smallholder farm scale where the risk or cost associated with incorrectly identifying soil constraints can be high for cash-constrained 540 farmers. Depending on the soil management scenario, there are financial costs associated with both false negative results (i.e., failure to detect constraint -Type 2 error; e.g., failure to lime a very strongly acidic soil before applying fertilizer) and false positive results (i.e., false detection of constraint -Type 1 error; e.g., applying lime to a neutral soil). While the mean or median predicted soil values may not indicate the presence of soil constraints, spatial predictions of model uncertainty can be used to determine where constraints have a predicted probability of occurrence. Future research is needed to evaluate 545 information on soil map uncertainty and how this information can be effectively communicated and incorporated into smallholder agronomic decision making.
The increasing availability of higher spatial resolution environmental covariates has led to expanded efforts to produce finer spatial resolution soil predictions. However, the relationship between each soil property or class and the covariate data can be scale-dependent, meaning that the spatial scale (i.e., grid resolution and spatial extent) at which a covariate is calculated can 550 affect the strength of its relationship to the modelled property. Thus, higher spatial resolution covariates do not always translate to more accurate fine-scale model predictions, and in some cases model accuracy may decrease due to the scale-dependency of the predictor-covariate relationships. Several studies have demonstrated these spatial scaling effects for terrain attributes, where the highest model accuracies did not correspond to the terrain attributes calculated at the finest spatial scales (Kim and Zheng, 2011;Maynard and Johnson, 2014;Roecker and Thompson, 2010). 555 Growing recognition of the need for site-specific soil data has prompted efforts to produce finer spatial resolution soil data from existing soil maps. For example, disaggregation techniques are being used to delineate the location of soil map unit components within conventional soil maps (Häring et al., 2012;Nauman and Thompson, 2014;Vincent et al., 2018) and predictive soil maps are using higher spatial resolution covariates to make higher spatial resolution predictions (e.g., iSDA).
However, it is important to recognize that the primary data (e.g., map unit polygons, point data), metadata, and inherent 560 decisions made at the original soil mapping scale remain determinate, where those original biases persist across scales. These initial biases can be compensated for through targeted sampling to expand and refine the model inference space (Soil Survey Division Staff, 2017). For example, Stumpf et al. (2017) used model uncertainty to guide additional sampling efforts for model refinement, while other studies have used additional sampling to refine regional-to-continental scale soil map predictions within a localized area for farm-scale applications Söderström et al., 2017). 565

Sources of map error: temporal uncertainty
Current soil maps provide models of soil spatial variation that ignore temporal changes in soil properties. Conventional soil maps are created over years-to-decades and populated with soil data that represents the modal concept of soil types. Predictive soil maps use soil profile datasets collected over multiple decades which are correlated to environmental covariates that often represent either a single point in time or some aggregate value calculated from a fixed time interval. If significant soil 570 degradation occurs sometime after the soil profile data was collected, the covariate data at that site may no longer correspond to the original soil property values. This can weaken or introduce confusion into the modelled relationship between soil property data and environmental covariates, which in turn can negatively impact the accuracy of model predictions (Owusu et al., 2020). Our evaluation of soil map accuracy in this study focused on static soil properties, which in theory should provide more accurate map estimates relative to measures of soil health (e.g., soil nutrients) that change in response to land use and 575 management over short time scales (i.e., years-to-decades). However, in high erosional or depositional environments these static properties can also change over relatively short time scales, which may have contributed to the low map accuracy for these static soil properties in this study. Furthermore, these results suggests that efforts to map more dynamic properties using either conventional or predictive mapping approaches would likely produce estimates with even greater uncertainty.

Implications for site-specific soil management 580
To close existing yield gaps smallholder farmers must identify the factors that constrain productivity. Many yield limiting factors are directly or indirectly soil-related, including nutrient deficiencies, susceptibility to drought, soil compaction, waterlogging, high erosion risk, etc. Obtaining accurate site-specific soil data is a first step towards uncovering soil-based limitations and implementing management practices that can mitigate these production constraints. Current web-based soil maps of Ghana fail to meet the accuracy requirements for site-specific farm management or even for farm-level land use 585 planning. Field-based texture assessments like the ones used in this study, coupled with ongoing advancements in soil mapping and on-site verification technologies like proximal sensors (Piikki et al., 2016;Viscarra Rossel et al., 2011) and smartphonebased decision support tools (Herrick et al., 2013;O'Geen et al., 2017), can help constrain the uncertainty associated with sitebased soil map predictions.
There is a need for improved technologies that can assist farmers in identifying their soil characteristics, and matching those 590 characteristics to appropriate inputs and technologies that can enhance the long-term production capacity of their soils (Berkhout et al., 2015). A useful conceptual model employed by conventional soil maps is the grouping of soils into soil types based on both field-described morphology data and laboratory analysis. Soil types convey information on the general range of soil behavior a land manager can expect in response to management actions and disturbance effects. Through identifying the soil type at a location, smallholder farmers can gain a better understanding of potential soil limitations and the most appropriate 595 management strategies for improving soil health and crop yields. The concept for each soil type is based on a set of reference soils which define the representative soil property distributions for each soil type. Soil types that have been intensively managed over long periods of time, however, can deviate significantly from these representative property ranges. Thus, in addition to understanding the soil type, information on a site's management history and current resource allocation are needed to better assess general soil health and possible soil-related limitations. 600 To make soil information actionable for smallholder farmers, soil information needs to be contextualized for their intended land use. For example, a maize farmer needs to know how their soil texture, rock fragment content, and soil depth will affect crop growth. Based on their soil type and management history, farmers also need to know what type of fertilizer to apply, in what amount, and when to apply it for optimal crop uptake. This study demonstrated how downscaling the GAEZ soil suitability framework provides a way to interpret site-specific soil information for crop-specific soil management. While this 605 study applied a modified version of GAEZ based on static soil properties, this approach could also incorporate dynamic soil properties that influence soil nutrient availability as well as other soil quality indices.

Conclusions
Many agronomic constraints are directly or indirectly soil-related, and therefore accurate site-specific soil information is needed to address these limitations. Technological advancements are facilitating the creation of soil maps at high spatial 610 resolutions which impart an implied level of precision. The accuracy, and thus utility, of these maps for applications at large spatial scales is often unknown. This study evaluated both the relative and functional accuracy of four publicly available webbased soil maps of Ghana and found that in most cases these map products are not accurate enough to inform site-specific soil management. We found that overall accuracies for soil texture and rock fragments predictions ranged from 9-35% and 26-59%, respectively. When accounting for class adjacency, overall accuracies increased, ranging from 39-85% and 73-93% for 615 soil texture and rock fragments, respectively. Traditional measures of map accuracy, however, can be misleading since small differences in soil property values, while technically different, may be functionally similar. To account for this, we used a modified version of the GAEZ soil suitability framework to evaluate the functional accuracy of the soil map predictions. This functional assessment confirmed the results from the standard accuracy assessment, with overall accuracies for soil suitability classes ranging from 15-61%. Results from this study highlight the variable site-specific accuracy of current soil map 620 information and the potential implications for on-farm decision making. The urgent need for reliable soil information, that is, information with a specified accuracy and precision for a targeted objective (e.g., attainable crop yield), has become increasing clear and many areas of research are being advanced to address this global challenge. Among these is the continued improvement of soil maps, particularly through the advancement of predictive soil mapping technologies, including improved predictive algorithms, expanded soil training/testing datasets, and advancements in the quantification of model uncertainty. 625 For example, both predictive soil maps evaluated in this study (SoilGrids, iSDA) provide uncertainty maps and future work is needed to utilize this information for large-scale site-specific analysis. There is also a need for the training of more soil scientists to expand the characterization and sampling of soil landscapes with high information uncertainty resulting from a lack of ground-truthed samples and/or poorly understood soil landscape relationships. Soil scientists can also assist in the training of non-soil specialists (e.g., field enumerators, citizen scientists), who, with the help of on-site verification 630 technologies like smartphone-based decision support tools (e.g., LandPKS) and proximal sensors (e.g., VisNIR), can collect high quality field-based soil data. This information can then be used both directly to inform site-specific decision making (e.g., smallholder fertilizer application rates), as well as to improve soil map predictions, as demonstrated by the iSDA soil map which used LandPKS field data to generate model predictions. All these advancements can help constrain the uncertainty associated with site-based soil map predictions and help provide access to accurate soil property information so urgently needed 635 by smallholder farmers to improve soil health and enhance the long-term production capacity of their soils.