Digital soil mapping of lithium in Australia

. With a higher demand for lithium (Li), a better understanding of its concentration and spatial distribution is important to delineate potential anomalous areas. This study uses a digital soil mapping framework to combine data from recent geochemical surveys and environmental covariates to predict and map Li content across the 7.6 million km2 area of 15 Australia. Soil samples were collected by the National Geochemical Survey of Australia at a total of 1315 sites, with both top (0–10 cm depth) and bottom (on average 60–80 cm depth) catchment outlet sediments sampled. We developed 50 bootstrap models using a Cubist regression tree algorithm for both depths. The spatial prediction models were validated on an independent Northern Australia Geochemical Survey dataset, showing a good prediction with a root mean square error of 3.82 mg kg-1 (which is 50.9 % of the inter-quartile range) for the top depth. The model for the bottom depth has yet to be 20 validated. The variables of importance for the models indicated that the first three Landsat 30+ Barest Earth bands (blue, green, red) and gamma radiometric dose have a strong impact on Li prediction. The bootstrapped models were then used to generate digital soil Li prediction maps for both depths, which could select and delineate areas with anomalously high Li concentrations in the regolith. The map shows high Li concentration around existing mines and other potentially anomalous Li areas. The same mapping principles can potentially be applied to other elements. The Li geochemical data for calibration


Introduction
Minerals have become essential commodities in modern human society.Many minerals are fundamental to technological and industrial advancement, particularly those utilised in renewable energy systems, electric vehicles, consumer electronics and telecommunications (Kabata-Pendias, 2010).These minerals can be considered critical, in the sense that they are of high importance and have a high risk of supply disruption.Methods for quantifying mineral criticality are discussed in detail in Graedel et al. (2012).
Lithium (Li) is an important chemical element as the world transitions towards a lower-carbon economy.It has been listed as one of the critical elements by various countries, including Australia, Canada, the European Union, Japan, the Republic of Korea and the United States (Mudd et al., 2018;D. Huston, Geoscience Australia, pers. comm. March 2022).Australia is endowed with significant resources of many of the critical elements and the critical minerals hosting them, including Li.
Currently, Australia's ranking for economic resource of Lithium was the second, but it ranked the first for its production (Senior, 2022), with potential of additional discoveries.According to recent survey (Senior, 2022), Australia produced 40 kilotons (kt) of Li (in terms of spodumene, Li2O.Al2O3.4SiO2,concentrates; assuming 6% of Li 2 O in spodumene concentrates) in 2020, or 49% of the global production; a significant increase from 21.3 kt of Li in 2017 (Champion, 2019).
The two primary sources for Li are brine stores and mineral deposits, where Li is hosted mainly spodumene (LiAlSi 2 O 6 ).A 2013 investigation by Geoscience Australia found that the potential of Li-rich salt lakes in Australia was relatively low in comparison to those, for instance, in the Americas (Jaireth et al., 2013;Mernagh et al., 2015;Mernagh et al., 2016).Most of the Li in Australia exists as mineral deposits (Champion, 2019).Despite Australia's current position as the world's leading supplier of Li, it has limited prospects for immediate expansion as the potential for spodumene similar deposits in Australia has not yet been fully investigated (Mudd et al., 2018).This study aims to contribute to filling this knowledge gap by providing the first digital map of Li content of Australian soils.
Lithium is found in trace amounts in all soil types, primarily in the clay fraction, with slightly smaller concentrations in the organic soil fraction (Kabata-Pendias, 2010).Possible means by which Li is bound to clay was explained elsewhere (Starkey, 1982).A typical background concentration of Li in the soil ranges from 7 -200 mg kg -1 (Schrauzer, 2002).In New Zealand, a study of Li concentration in soil reported a range between 0.08 -92 mg kg -1 (Robinson et al., 2018).While in southwestern Siberia, Gopp et al. (2018)  from 0.24 -0.68 mg kg -1 .The amount of soil-available Li is usually relatively low, about 3 -5% of the total Li content in the surface layers (Gopp et al., 2018;Anderson et al., 1988).De Caritat and Reimann (2012)  Higher concentrations of Li are often found in the deeper layers of soil profiles (Merian and Clarkson, 1991) because, typically, Li enters the soil column through the weathering of sedimentary minerals in the underlying saprolite and bedrock (Aral and Vecchio-Sadus, 2008).Because clay minerals predominantly drive the mineralisation and dissolution of Li, the clay mineral fraction will play a significant role in determining the Li concentration.The Li content of soil is controlled more by the soil formation conditions than by the composition of the parent materials (Kabata-Pendias, 2010).
Mineral exploration aims to find ore deposits for mining purposes.Therefore, delineating target areas for mineral exploration through a series of mapping activities is a crucial initial stage leading to discovery (Carranza, 2011).Mineral prospectivity mapping is a method to quantify the probability of mineralisation in a selected area for mineral exploration purposes.This prioritisation allows for the exploration of smaller, higher-potential areas for detailed prospecting to minimise exploration costs, e.g., the number of drillholes.
Two common paradigms for creating mineral prospectivity maps are knowledge-driven and data-driven models (Carranza, 2011).Knowledge-driven models do not require any data on mineral deposits, but rely on expert knowledge of spatial associations between mineral deposits and geological features, field experience and conceptual models to develop evidential maps that enables the discovery of mineral deposit (Carranza, 2008).Meanwhile, data-driven models utilise existing knowledge on the location of mineral occurrences, various survey data and spatial statistical methods to represent the likelihood of mineral occurrence within prospective areas (Carranza, 2008).With the development of machine learning and technology (computer hardware, software and geographic information system (GIS) technology), there have been growing applications of mineral prospectivity mapping in the recent decades (Carranza, 2011;Zuo, 2020).
Several studies have demonstrated the use of remote sensing to explore various deposit types, such as gold deposits (Cŕosta et al., 2003), copper deposits (Pour and Hashim, 2015) and iron ores (Ducart et al., 2016).Recently, the application of remote sensing for Li deposits has also emerged.Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) images were used to map Li content in the Vale do Jequitinhonha region of Brazil (Perrotta et al., 2005).The factors are measured or approximated from various data types, including point observations, maps (polygons), existing data, and remote sensing data and derivatives thereof (e.g., gradients, buffer distances, etc.); these can be numerical or categorical data types.
In this study, we attempt to model Li distribution in the surface and subsurface soils of Australia by invoking the NGSA soil geochemistry dataset and various environmental covariates commonly used in DSM related to soil formation in Australia.In detail, the objectives of this study are thus to: (i) evaluate the use of digital soil mapping framework to predict Li concentration in Australian soils, and (ii) delineate anomalous areas potentially attractive for Li exploration and discuss their interpretations.

Li measurement
This study used two soil datasets, referred to as the calibration and validation datasets.The calibration dataset was used to build the spatial prediction model and the validation dataset was used to test the prediction quality of the calibrated model.
The calibration dataset data were generated as part of the NGSA project (www.ga.gov.au/ngsa), a collaborative project between Geoscience Australia and the States/NT between 2007 -2011, which aimed to document the soil geochemical concentration levels and patterns across Australia.Details on the project, analysis, sampling methods and the measurement of other parameters can be found in De Caritat and Cooper (2011a) and De Caritat and Cooper (2016).
The NGSA collected samples at 1315 sites (including field duplicates) at or near the outlet of large catchments with a total area coverage of 6.17 million km 2 and an average sampling density of 1 site for every 5200 km 2 (De Caritat and Cooper, 2011a).The target sampling medium was floodplain sediments away from river channels, though in various places in Australia, an aeolian input can be important; thus, the medium was called 'catchment outlet sediment' rather than floodplain sediment.These geomorphological entities are typically vegetated and biologically active (plants, worms, ants, etc.), thereby making the collected materials true soils (e.g., Sssa, 2022), albeit soils developed on transported alluvium.Due to limitations in access, samples from some parts of South Australia and Western Australia could not be obtained.
Samples were collected from two depths, namely 'top outlet sediment' (TOS) from 0 -10 cm depth, and 'bottom outlet sediment' (BOS) from, on average, 60-80 cm depth.All of the soil samples were air-dried, homogenised and dry sieved to <2 mm and <75 µm prior to various analyses for 60+ elements (see De Caritat et al. (2009) and De Caritat et al. (2010), for a full description of the NGSA sample preparation and analytical methods, respectively).A detailed quality assessment of the NGSA data is given in De Caritat and Cooper (2011b); for Li after aqua regia digestion, analytical precision (repeat analysis of certified reference material Till-1) of 12% and overall precision (based on field duplicates) of 39% were reported, whilst accuracy could not be determined for lack of certified aqua regia Li data for Till-1.In this contribution, we use Li concentration after aqua regia digestion data for the NGSA <2 mm TOS and BOS samples analysed by inductively coupled plasma mass-spectrometry (ICP-MS) in a commercial laboratory.Any Li measurements that fell below the detection limit (0.1 mg kg -1 ) were replaced with half the detection limit (0.05 mg kg -1 ).The distribution of sampling sites and the concentration levels of Li are shown in Figure 1.
As an independent validation dataset, we used the geochemical dataset from the Northern Australia Geochemical Survey (NAGS) project (Main et al., 2019).This dataset contains 773 observations located in the Tennant Creek -Mt Isa region in the Northern Territory and Queensland, with an approximate sampling density of one sample every 500 km 2 and collected in 2017.The distribution of these samples is also shown in Figure 1.These samples were collected, prepared and analysed following the NGSA protocols (De Caritat and Cooper, 2011a), albeit at a higher sampling density.However, only TOS samples were collected in NAGS.Furthermore, these samples were collected at different times and /or laboratories.To address the analytical variation that could potentially arise, levelling method were utilized using the standards Certified Reference Materials (Main and Champion, 2022).In short, a correction factor based on the CRM measurements from the two datasets is calculated and applied as multiplier to relevel the data.

Environmental covariates
A total of 19 environmental covariates (Table 1) characterising factors of climate, parent material, soil, and topography, that contributes to soil formation were considered in this study.
The first factor is climate.Water and temperature affect the rate of mineral weathering and thus soil formation.Hence, we included precipitation, evaporation and temperature data (Harwood, 2019), along with the topographic wetness index (TWI) data (Gallant and Austin, 2012a), informing the relative wetness within a landscape.In short, the TWI was derived from contributing area product, which was computed from Hydrologically enforced Digital Elevation Model, and from the percent slope product, which was computed from the Smoothed Digital Elevation Model (Gallant and Austin, 2012a).
The second factor is parent material (i.e.degree of weathering and mineralogical composition), including gamma-ray radiometric and total magnetic intensity.Gamma-ray radiometric surveys provide estimates for the concentrations of gamma-ray-emitting radioelements K, U and Th at/near the soil surface.The gamma-ray radiometric data was measured from airborne surveys throughout most of Australia (Poudjom Djomani et al. (2019).In this study we used a complete gamma-ray survey grid where gaps in the airborne coverage were filled in using covariate machine learning (Wilford and Kroll, 2020).Gamma-ray radiometric data have been found to be a useful covariate in identifying surface processes such as sediment transport and weathering (Wilford, 2012;Wilford et al., 1997) and detecting radioactive minerals deposits and occurrences (Alhumimidi et al., 2021;Wilford et al., 2009;Dickson et al., 1996;Dickson and Scott, 1997).Total magnetic intensity (TMI), which measures variations in the Earth's magnetic field intensity caused by the contrasting content of  The third factor is the soil itself, particularly, the relevant physical soil properties.Previous studies, e.g. by Kabata-Pendias (1995) and Robinson et al. (2018), highlighted the high correlation between Li and clay content of soil, soil texture was used as a covariate.The soil texture spatial information (sand and clay contents) was derived from Malone and Searle (2021), which contained updated information on soil texture map across Australia derived using a digital soil mapping approach.The sand and clay fractions were developed by integrating field morphological (n = 180,498) and laboratory measurements soil fractions (n = 17,367) from the Soil and Landscape Grid of Australia (SLGA).The SLGA is based on a comprehensive dataset of soil attributes across Australia including the NGSA dataset.These sand and clay content of Malone and Searle (2021) were for specific depth intervals of 0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, 60-100 cm, and 100-200 cm.They were converted to the depths corresponding to the NGSA Li measurement (0-10 cm and 60-80 cm) using the mass-preserving spline function, described in Bishop et al. (1999) and modified by Malone et al. (2009).Soil reflectance in the visible, nearinfrared (NIR), and short-wave-infrared (SWIR) spectra captured by remote sensing images provides information on soil composition.However, the unprocessed images consist of a mixture of soil, bedrock, vegetation and clouds.By removing the influence of vegetation, Roberts et al. (2019) were able to document the 'barest' state of soil, so critical in mapping the characteristics of soil and rock.This was done by combining Landsat 5, 7, and 8 observations of the past 30 years to remove the contamination by vegetation, cloud cover, shadows, detector saturation and pixel saturation.The model used to develop Barest Earth was validated using the NGSA spectral archive (Lau et al., 2016).
Finally, topography is represented by elevation and slope.These factors also play an important role, as they affect how water is added to and/or lost from soil.The elevation was derived from the smoothed Digital Elevation Model (DEM-S) which was obtained from the 1 arc-second resolution Shuttle Radar Topography Mission (SRTM) data acquired by NASA in February 2000 (Gallant, 2011).The slope covariate was also calculated from DEM-S using the finite difference method (Wilson and Gallant, 2000).The different spacing in the E-W and N-S directions due to the geographic projection of the data was accounted for by using the actual spacing in metres of the grid points calculated from the latitude.
All covariates were reprojected to EPSG:3577 (GDA94 datum; Australian Albers equal area projection) and resampled to a common spatial resolution of 3 km prior to any analysis.All the environmental covariates used are shown in Table 1.

Modelling
Here we used a machine learning model Cubist to relate soil observations to the environmental covariates.Cubist is a treebased regression algorithm based on the M5 theory (Quinlan, 1993).This algorithm creates partitions of data with similar spectral characteristics and creates one or more rules for each partition.If the partition rules are satisfied, then the linear regression of that partition is used to create the prediction (Eq.1).Each rule can be defined as: The Cubist model has two tuning parameters: committees (number of sequential models included in the ensemble) and neighbours (number of training instances that are used to adjust the model-based prediction).A full combination of committees (5,10,20,30,40,50) and neighbours (0, 1, 5, 9) were tested to tune the Cubist model.To obtain the best estimates of optimum parameters, a 10-fold cross-validation approach was utilised.Based on the optimum parameters, 50 bootstrap models ('sampling with replacement') were trained.
The performances of the prediction models were then evaluated on both internal evaluation and on the independent validation dataset.An internal evaluation of the model was conducted using "out of bag" samples, which were not used during the development of the bootstrap models.The NAGS dataset was used to evaluate the performance on the independent dataset (top depth only).The following metrics were used: adjusted coefficient of determination (R 2 adj ), Lin's concordance correlation coefficient (LCCC), root mean square error (RMSE), bias, and ratio of performance to interquartile distance (RPIQ).R 2 adj is a measure of the linear association between observed and predicted values; LCCC measures the agreement between the observed and predicted values in relation to the 1:1 line; RMSE is a measure of the differences between the observed and predicted values; bias is the measure of the difference between the mean of the observed and the mean of the predicted values; and RPIQ is a measure of performance that takes into account the distribution of the values, and can be calculated as a fraction of the interquartile range of the observed values (Q 3 -Q 1 ) and the RMSE (RPIQ = (Q 3 -Q-1 )/RMSE) (Bellon-Maurel et al., 2010).Variable importance analysis was also conducted to evaluate the contributions of each covariate in the Li prediction.The relative variable importance is measured as the percentage of times the environmental covariate is either used as a condition or a rule within the Cubist model.These bootstrap models were then used to generate output maps with the same extent and resolution.The final map output was derived based on the mean prediction of the bootstrap models; similarly, the standard deviation was obtained based on the standard deviation of the prediction from the bootstrap models.

Data processing and statistical computing
All the data analytics, modelling, and mapping procedures in this study were conducted in R statistical open-source software (R Core Team, 2021).Besides the base R functionality, the R packages used in this study included "Cubist" (Kuhn and Quinlan, 2021) for fitting cubist models; "caret" (Kuhn, 2021) for tuning the hyperparameter of the Cubist model; and

Descriptive analysis
The distribution of Li concentrations (NGSA dataset, De Caritat and Cooper, 2011a) was positively skewed (Figure 2) with concentrations ranging from 0.1 -67.4 and 0.1 -56 mg kg -1 , for TOS and BOS respectively.Only limited observations above 20 mg kg -1 of Li concentrations were found in this study for both TOS (n = 76) and BOS (n = 95).The median concentration of TOS (5.7 mg kg -1 ) was slightly lower than that of BOS (7.0 mg kg -1 ).These concentrations were lower than those observed in Négrel et al. (2019) for mean Li concentration in European soil at 11.3 mg kg -1 , and across the background concentrations of Li in the world (7 -200 mg kg -1 ), according to Schrauzer ( 2002).The Li concentration at TOS was strongly correlated with BOS (r = 0.75, p <0.0001).
[Figure 2] Based on the data collected by the NGSA project, the highest concentration of Li for both TOS and BOS was found in northernmost Queensland (Cape York Peninsula), as shown in Figure 1.Other regions that have significant quantities of Li were located in the Goldfields-Esperance region in Western Australia, which has been recognised as one of the most resource-rich areas on the planet (Champion, 2019), and the region around the Victoria-New South Wales border (Figure 1).Some of the findings correlate well with the existing mine sites in Australia (red triangles in Figure 1).Yilgarn Craton, and Pilgangoora in the Pilbara Craton (Champion, 2019).In July 2019, Strategic Metals Australia (SMA) found a new Li exploration target near Cairns, in the Georgetown province of north Queensland (Gluyas, 2019).However, this discovery has not been updated in the data collected by Geoscience Australia because considerable work such as drilling, modelling, resource calculation and feasibility studies are needed to bring the discovery to the feasibility stage.

Correlation between Li with other measured soil properties
Despite other studies (Robinson et al., 2018;Kashin, 2019) reporting strong correlations between Li and Mg, and other elements elsewhere, including Al, B, Fe, K, Mn and Zn, the NGSA data only show strong correlations between Li and Al (Pearson's correlation coefficient r = 0.74), Ga (r = 0.69), Cs (r = 0.68), and Rb (r = 0.66) for TOS, and slightly lower correlation for BOS: Al ( r= 0.69), Ga (r = 0.64), Cs (r = 0.62), Rb (r = 0.61).Correlations between Li with K and Mg were only moderate for both TOS (r = 0.48 and 0.43) and BOS (r = 0.46 and 0.33).Similarly, Foregs ( 2006) also observed good correlations (r > 0.4) of Li with Al, Ga and Rb within the floodplain sediment samples.
The Li concentration in soil was negatively correlated with measured sand content from the NGSA dataset (r = -0.55),and positively correlated with clay content (r = 0.44).This is consistent with the findings of Kabata-Pendias ( 2010) and Robinson et al. (2018), who noted the tendency of clay minerals to concentrate Li.It has been suggested that Li may be located internally within clay minerals, mainly kaolinite, illites, smectites including hectorite, palygorskite and sepiolites, in ditrigonal cavities via isomorphous substitution, rather than on exchange sites (Anderson et al., 1988;Starkey, 1982).

Correlation with environmental covariates
Overall, the correlation of Li concentration with the environmental covariates was relatively low (Figure 3).The correlation with sand and clay content derived from digital soil maps was lower in comparison to the measured (NGSA) values discussed above, with r = -0.28 and 0.25, respectively, for TOS; and r = -0.23 and 0.22, respectively, for BOS.
For gamma-ray radiometric data, total dose and K content had correlations with Li of r = 0.10 to 0.14.These positive correlations are expected as the associations of Li deposits and felsic rocks (high in both total dose and K) due to the observed incompatibility in mineral structures (Benson et al., 2017).Precipitation has positive correlation (r = 0.12), while both temperature and elevation had negative correlations (r = -0.12)with Li content.Topographic variables such as slope had negligible correlation with Li content (r = 0.05).
For BOS, similar observations on the correlations between Li content and environmental covariates were found, except for the following differences.Landsat bands 3, 5 and 6 had stronger negative correlations (r = -0.14 to -0.16) compared to other bands.Potassium (r = 0.10) and dose (r = 0.09) had higher correlation with Li compared to the Th/K ratio (r = -0.02).Both temperature (r = -0.11)and elevation (r = -0.08)had negative correlations with Li content, while slope (r = 0.05) and precipitation (r = 0.08) had low positive correlations.

Model evaluation
The final Cubist model is tuned with committees of 20 and neighbours of 9, which resulted in the lowest value of RMSE compared to the other combinations of hyperparameters, indicating an optimised Cubist model.  1 for abbreviations.

Internal evaluation
Validation statistics based on internal evaluation using the out-of-bag data for the Li predictions are presented in Table 2.
Higher accuracy was observed in TOS (R 2 adj = 0.20; LCCC = 0.36) compared to BOS (R 2 adj = 0.12; LCCC = 0.29).There was a slightly lower RMSE on the prediction for TOS (RMSE = 6.29 mg kg -1 ) compared to BOS (RMSE = 7.28 mg kg -1 ).This is expected as most of the environmental covariates reflected soil surface conditions.To the best of our knowledge, the machine learning models developed in most mineral exploration studies were assessed based on classification accuracy, (i.e.

TOS BOS
presence or absence of specific minerals in sample) instead of regression accuracy (Jooshaki et al., 2021).In addition, remote sensing studies on mapping Li minerals are rarely validated (e.g.Cardoso-Fernandes et al. ( 2019)).Hence, no comparison can be made with other studies.

Independent validation dataset
The predictive model performance was also evaluated using an independent dataset (NAGS, TOS only) that was not part of the calibration dataset.Upon relevelling, the median Li concentration of this validation dataset (3.46 mg kg -1 ) was lower than that observed in the calibration dataset (5.7 mg kg -1 ), with a range of values between 0.1 to 22.5 mg kg -1 .A comparison of the subset of NGSA within the extent of the NAGS dataset also showed similar result, with slightly higher concentrations observed within the local NGSA dataset, which ranges between 0.1 and 28.7 mg kg -1 and has a median of 4.1 mg kg -1 .
However, the samples from the two datasets were deemed to have similar distribution with the two-sample Kolmogorov-Smirnov test (D = 0.18, p-value = 0.012).
We reported the performance of model validation the same way the model evaluation was conducted (Table 2; Figure 4).
The model validation resulted in higher accuracy (R 2 = 0.44; LCCC = 0.59).The RMSE was also slightly lower than those observed in the TOS model evaluation, most likely due to lower observation values within the NAGS validation dataset.The model overestimated the concentration with a mean error of 1.46 mg kg -1 .

Variable importance analysis
From the Cubist model, we can infer the relative importance of the covariates by calculating the percentage of times a covariate is being used in the model.The variables used by Cubist model can be further split in terms of "importance in the conditions" and "frequency of predictor usage in models".
For Li prediction in TOS, the variables TWI and Landsat bands 2, 3 and 4 are of higher importance in the conditions than other variables (Figure 5).This implies that the model separates out prediction values based on its spectral response of vegetation and Fe-bearing minerals related to Landsat bands 2 to 4 and the wetness index.However, within the regression models, the top five variables most frequently used in the regression were the Landsat band 2, band 6, band 1, band 3 and gamma-ray radiometric total dose.The first three Landsat bands (red, green, and blue) and band 6 (SWIR2) have been commonly used to predict soil properties and delineate geological boundaries, as well as discriminate and differentiate vegetation zones (Khorram et al., 2012).While the gamma radiometric dose discriminated the various soil types and their mineral makeup.The next set of covariates was annual precipitation and clay and sand content which bound the Li in the soil, indicating they have lower importance as predictors.As indicated in the correlation analysis, slope was not significant.
For the BOS model, Th/K had the highest importance in the conditions of the model (Figure 5), separating high and low values, but it does not affect the regression.Landsat bands 2 and 6, and temperature range also affect model conditions.
Overall, parameters that influenced the BOS regression model were similar to those for TOS, i.e. the top-five are Landsat bands 2, 1, 6, and 3, and gamma radiometric dose.In the BOS model, however, there was a higher importance of the clay content compared to the TOS model.Again, temperature and slope were of low importance.

Li prediction maps
The Cubist model led to the generation of spatial predictions of Li concentration in fluvial sediment-derived soils across Australia at two depths (Figure 6).So far, there are only five known Li mines in Australia (mostly in Western Australia), all of which are located within areas that were predicted by the model developed here to have a higher concentration of soil Li, especially for the BOS model (>~8 mg kg -1 ) (Figure 7).
[Figure 6] [Figure 7]    The highest predicted values on the Li digital soil maps are 28 mg kg -1 and 22 mg kg -1 in TOS and BOS, respectively.Although higher Li concentration was expected to be observed in the deeper layer, the model used in this study was not able to support such predictions yet.This is most likely because the covariates used within the model represent observations from TOS instead of BOS.The variance of covariates within BOS was not obtained, and hence yielding lower accuracy predictions.

Study limitations
While we have successfully modelled soil Li distribution in Australia and validated it using an independent sample dataset, we recognise that there are limitations to this study's approach.
(1) The NGSA data used apply to catchment outlet sediment representing the local accumulation of mainly detrital minerals.Therefore, strictly speaking, the predictions developed herein apply only to similar alluvial soils.(2) Despite the large amount and spread of data, the NGSA does not cover the whole of Australia.Notably, there is a data gap in parts of Western Australia and South Australia.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
reported median Li concentrations (after aqua regia digestion) of 12 and 5.7 mg kg -1 in European agricultural topsoils and Australian surface sediments, respectively, both in the coarse (< 2 mm) fraction.Subsequently, Reimann and De Caritat (2017, Fig.2SM) published the first continental map of Li in Australian soils, based on National Geochemical Survey of Australia (NGSA) data, showing that regions of high and low concentrations are found across all states.
formation.The DSM framework is derived from the conceptual model developed by Mcbratney et al. (2003) in which a certain soil attribute results from the interaction of soil-forming factors.These factors are modified from Jenny (1941) and include soil (s), climate (c), organisms (o), relief (r), parent material (p), age/time (a) and spatial position (n), or scorpan.

Figure 1 .
Figure 1.Distribution of sampling sites from the National Geochemical Survey of Australia (NGSA, black circles) for both depths: top outlet sediment (TOS) 0-10 cm, top; and bottom outlet sediment (BOS) ~60-80 cm, bottom.Distribution of sampling sites from the Northern Australia Geochemical Survey (NAGS, blue plus signs) for TOS only, top.All data refer to the coarse fractions (<2mm).Aqua regia-soluble Li concentrations (mg kg -1 ) are categorised in five quantile classes.Regions discussed in the text are highlighted in various shades of green.Projection: Australian Albers Equal Area (EPSG:3577).Data sources: De Caritat and Cooper (2011a), Hughes (2020).

Figure 4 .
Figure 4. Goodness-of-fit plot showing observed vs predicted Li concentrations based on the independent validation dataset (NAGS, TOS only).Red dashed line is the 1:1 line.

Figure 5 .
Figure 5. Variable importance of covariates in terms of importance as conditions (red dotted lines) and frequency of predictor usage (grey lines) by the Cubist algorithm for both NGSA depths: top outlet sediment (TOS) 0-10 cm, left; and bottom outlet sediment (BOS) ~60-80 cm, right.Covariates are sorted in order of decreasing frequency of usage.

Figure 7 .
Figure 7. Distribution of Li mines in the digital soil map of Li in Australia for bottom outlet sediment (BOS) ~60-80 cm depth.In Australia, the largest producer of spodumene is the Greenbushes Li operation, approximately 250 km south-southeast of Perth.In the most recent public report, the company reported combined measured and indicated resources of 118.4 million tons (Mt) of 2.4% Li 2 O containing proved and probable reserves of 61.5 Mt grading 2.8% Li 2 O(Champion, 2019).Other locations explored for Li include Mount Cattlin and Mount Marion in the Goldfields-Esperance region, and Pilgangoora of 370

Figure 8 .
Figure 8. Boxplots of lithium concentration across various soil orders based on the Australian Soil Classification (ASC) system.Red dashed line represented the median values of Lithium across both TOS and BOS depth.
(3) The environmental covariates used in the study were selected based on our understanding of relevant soil-forming processes.(4) There is also limited information on how the covariates vary with depth, except for the soil texture (sand and clay content) data.The inclusion of more environmental covariates related to depth and geological information may improve the predictive capability of these machine learning models.The final product was only validated in one area within Australia (Tennant Creek -Mt Isa region in the Northern Territory and Queensland).Despite our predictions of elevated soil Li in parts of Queensland, New South Wales and Victoria, ground-truthing is required to confirm them and further work is necessary to determine the origin of the contained Li.
https://doi.org/10.5194/essd-2022-418Preprint.Discussion started: 13 January 2023 c Author(s) 2023.CC BY 4.0 License. 4 Conclusions Spatial prediction models have been increasingly utilised to help minimise risk and thus cost of mineral exploration.In this study, digital soil mapping for Li concentrations at two different depths (TOS: 0-10 cm, BOS: ~60-80 cm) based on the Cubist model was carried out across Australia using the National Geochemical Survey of Australia data and publicly available environmental covariates.Geology and mineralogy are of high importance in predicting soil Li anomalies, as demonstrated by the reliance of the model on the Landsat and gamma-ray radiometric covariates.Despite most mineral exploration for Li being conducted in Western Australia, other regions (such as Queensland, New South Wales and Victoria) have elevated predicted Li concentration and could become potential areas of interest with anomalous Li concentration.The model accuracy tested on the independent Northern Australia Geochemical Survey (TOS only) was reasonable compared to the calibration model performance.Overall, the model performance was on the low side and inclusion of the results into a prospectivity framework needs to consider the model uncertainties.This approach provides an estimate of the environmental background concentration of Li which is reflecting a range of processes including source rock geochemistry from which the sediments were derived, weathering (including pedogenic) and geomorphic processes.The work provides a framework to better understand the process (as revealed through the covariate relationships) controlling Li concentration at the surface and the modelling effectively delineates regions with locally higher Li source potential.Future work should include other relevant environmental covariates, which could further improve model performance, ground-truthing of anomalous regions, and investigation of ultimate Li sources.As more survey data are collected, the use of more complex models can also be explored including the use of Li concentrations in bedrock materials.CRediT authorship contribution statement WN: conceptualisation, data curation, analysis, writing -original draft, review and editing.BM: conceptualisation, methodology, writing.AM: conceptualisation, methodology, writing.PdeC: conceptualisation, data curation, writing.JW: conceptualisation, data curation, writing.