Compositional Characterization of Glassy Volcanic Material From VSWIR and MIR Spectra Using Partial Least Squares Regression Models

The glass phase in volcanic rocks presents a challenge to obtaining compositional data from visible and short‐wave‐infrared (VSWIR) and mid‐infrared (MIR) spectral data of remote surfaces due to its amorphous structure and variable composition. Nonetheless, glass is a common phase in volcanic materials because it forms via the rapid quench of magma and can constitute up to the entirety of a volcanic deposit. Use of partial least squares regression (PLS) to predict glass contents creates models that are insensitive to viewing geometry and sample conditions such as grain size and spectrally inactive compositional variables, enhancing the ability to detect glasses with remote sensing. PLS models are used here to predict crystallinity and oxide composition of samples from VSWIR and MIR spectral data using training spectra from natural volcanic rocks and geologically relevant synthetic samples. Three spectral resolutions of VSWIR and MIR spectra (1, 10, and 100 nm/band, and 1.9, 19, and 190 cm−1/band, respectively) were tested to assess the effects of collection configuration on different spectrometers. PLS models trained on 1 nm and 1.9 cm−1 data sets have the lowest uncertainties of glass modal abundance for VSWIR and MIR, respectively. MIR models predicting sample wt. % SiO2 and FeO, and VSWIR models of wt. % FeO provide accurate estimates (e.g., RMSE‐P of 3.4 wt. % FeO) at all spectral resolutions. Results are based on training data sets skewed to mafic compositions, which affects model accuracies.


Introduction
Glass is a common and important phase in volcanic rocks, both in effusive and explosive materials. Formed via the rapid quench of melt during eruption, volcanic glass provides physical and compositional information on the magma storage region, providing key insights into planetary interiors and eruption processes (Anderson, 1973; Hauri, 1996; Saal et al., 2008). In terrestrial mixed-source areas (e.g., sedimentary basins), the presence and abundance of a glass phase can be used to discriminate between volcanic deposits (Carter et al., 1995; Cassidy et al., 2014; Lowe, 2011; Shane, 2000). Glass readily alters to palagonite and smectite clays (e.g., Jakobsson, 1978; Stroncik & Schmincke, 2002), so the preservation of volcanic glasses provides constraints on surface alteration history (Gonçalves et al., 1990; Jakobsson, 1978).
The composition of a glass is not constrained by steric considerations, unlike the composition of minerals. Glass composition can vary widely, by several weight percent, within a single volcanic center over its eruptive history (Eichelberger et al., 2006; Pallister et al., 1996) or within a single eruption. The bulk chemistry of erupted material provides information on petrogenesis of source magmas, as well as their transport, storage, and eruptive dynamics (Gansecki et al., 2019; Soderberg & Wolff, 2023; Stock et al., 2020; Stolper & Walker, 1980). Differences in crystallinity may be used to distinguish compositionally identical lava flows (Crisp et al., 1990) and can provide constraints on pre-eruptive magma storage conditions and eruptive behavior (Cashman & Sparks, 2013; Geist et al., 2021). Differences between glass and bulk composition also provide information on magma recharge and storage, informing hazard mitigation efforts during active eruptions (Gansecki et al., 2019). More broadly, the Fe content of glass in basalts is sensitive to source melt hydration and subsequent plagioclase crystallization (Grove & Brown, 2018). Therefore, it is desirable to develop methods to accurately measure SiO2, Na2O, K2O, and FeO of glasses from spectral data.
Spectral measurements can differentiate glasses and crystalline material of the same composition, particularly in laboratory settings (e.g., Fu et al., 2017; Minitti et al., 2002). However, glasses as a phase lack unique, diagnostic spectral features due to their amorphous nature (Dalby & King, 2006; Minitti et al., 2002). Accurate remote characterization of volcanic samples is further complicated by the variable phase assemblages of volcanic deposits. At low modal abundances, the variable composition of volcanic glass makes it difficult to identify spectrally (Adams & McCord, 1971; Cannon et al., 2017; Horgan et al., 2014; McCanta et al., 2015).
This study uses machine learning to explore the use of VSWIR and MIR spectra for identification and characterization of volcanic material remotely and in the laboratory. Partial least squares regression (PLS) models, which have been successfully used to obtain oxide abundances from geologic materials (e.g., Dyar, Carmosino, Tucker, et al., 2012; L. Li, 2006), are here applied to VSWIR and MIR spectra to predict the modal abundance of glass and sample composition at varying spectral resolutions. As part of its focus on methodology for recognizing glass in volcanic samples via remote sensing, this study compares three spectral resolutions to investigate how differences in spectral resolution affect quantification of compositional parameters from VSWIR and MIR spectral data.

Background
Volcanic rocks are primarily classified using a total alkali versus silica diagram (Le Bas et al., 1986). Silica concentration of the melt phase exerts primary control on magma evolution, eruptive behavior, and mineral phase assemblage of the resultant rock (Bottinga & Weill, 1972; Giordano et al., 2004). Low-SiO2 basaltic magmas have viscosities that are orders of magnitude lower than high-SiO2 rhyolitic magmas (10^2 P and 10^8 P, respectively; Bottinga & Weill, 1972) and contain significantly lower amounts of dissolved volatile species (e.g., H2O, CO2, SO2).
In addition to SiO2, other elements can be used to further discriminate among magmas and can affect physical properties. Alkalis are network modifiers that modulate silicate structures and resultant magma properties, as well as modulating mineral assemblages (Mysen et al., 1982; Neuville, 2006). Iron is abundant in silicate melts, affecting physical properties such as viscosity and volatile abundance. Fe3+ acts as a network-forming cation, while Fe2+ is a network-modifying cation (Mysen et al., 1980; Stabile et al., 2021; Stanley et al., 2011). More broadly, the Fe content of glass in basalts is sensitive to source melt hydration and subsequent plagioclase crystallization (Grove & Brown, 2018).

Glass Spectral Studies and Spectral Detection
The VSWIR region records electron transitions in transition metals and the overtones of vibrational bands (Gaffey et al., 1993; Hunt, 1979). Spectral features from inter-element electron transitions between nearby Fe ions occur below 1 μm (Hunt, 1977). Crystal field transitions of Fe2+ produce spectral features around 0.9 μm, with the exact shape and location depending on the surrounding crystal lattice. Overtones of fundamental -OH and H2O vibrations occur at 1.4, 1.9, and 2.2 μm in spectra from hydrous (and sometimes, anhydrous) minerals and glasses, and from surface adsorption (Clark et al., 1990; Hunt, 1979). The MIR region contains the reststrahlen bands produced by fundamental vibrations and stretching of silicate lattices (Dalby & King, 2006; Lyon, 1965; Ramsey & Christensen, 1998). The Christiansen feature (CF) appears at wavelengths slightly shorter than the fundamental silicate vibration band, where the index of refraction, n, of the surface and surrounding medium are approximately equal, and the extinction coefficient, k, of the surface approaches zero (Salisbury & Walter, 1989). The shapes and positions of the reststrahlen bands and CF are primarily controlled by variations in the Si-O bonds; the position of both spectral features shifts to higher wavenumbers (shorter wavelengths) with decreasing silicate polymerization (Lyon, 1965; Salisbury & Walter, 1989).
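Because VSWIR features are quoted in wavelength and MIR features are usually quoted in wavenumber, a unit conversion is often needed when comparing the two regions. The helper below is a minimal illustrative sketch; the function names are hypothetical and are not taken from the software used in this study.

```python
import numpy as np

def wavenumber_to_wavelength_um(wavenumber_cm):
    """Convert wavenumber (cm^-1) to wavelength (micrometers): lambda = 1e4 / nu."""
    return 1.0e4 / np.asarray(wavenumber_cm, dtype=float)

def wavelength_um_to_wavenumber(wavelength_um):
    """Convert wavelength (micrometers) to wavenumber (cm^-1): nu = 1e4 / lambda."""
    return 1.0e4 / np.asarray(wavelength_um, dtype=float)

# Illustrative MIR limits expressed in micrometers.
mir_limits_um = wavenumber_to_wavelength_um([400.0, 1600.0])
# The 0.9 micrometer Fe2+ region expressed in wavenumbers.
fe_band_cm = wavelength_um_to_wavenumber(0.9)
```

A higher wavenumber always corresponds to a shorter wavelength, so a shift described in one unit reverses direction when expressed in the other.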
Linear deconvolutions provide estimates of mineral and glass abundances using variations on least-squares-fit regressions that fit library spectra to an unknown spectrum (Ramsey & Christensen, 1998; Rogers & Aharonson, 2008). While these techniques are powerful, successful application of linear deconvolution requires broad endmember spectral libraries (Hapke, 1981; Ramsey & Christensen, 1998; Rogers & Aharonson, 2008; Shkuratov et al., 1999). This can limit compositional characterization to endmember constituents and may not provide quantitative assessments of unknown compositions. Linear spectral mixing is expected for large grain sizes relative to the wavelength of light in the MIR, and not at all at VSWIR wavelengths. Conversion of spectra into single-scattering albedo prior to application of the least squares regression may mitigate nonlinear mixing (e.g., Cannon et al., 2017; Y. Liu et al., 2016; Zastrow & Glotch, 2021), but does not provide a total solution. Models to calculate single-scattering albedos are sensitive to particle size, as optically small particles (on the order of, or smaller than, the wavelength of light being considered) are poorly modeled using spectra from larger grain sizes (Johnson et al., 1992). The inclusion of fine particle sizes as endmember spectra in linear deconvolutions can somewhat overcome this effect (e.g., Y. Liu et al., 2016; Ramsey & Christensen, 1998; Williams & Ramsey, 2022). Spectral mixing is also affected by sample porosity. Surface reflections occur in pore spaces, so increased porosity in fine particle samples can result in photons interacting with multiple particles before exiting the surface, resulting in nonlinear mixing (Mustard & Hays, 1997; Ramsey & Fink, 1999; Salisbury & Eastes, 1985).
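The idea behind linear deconvolution can be sketched with an ordinary least-squares fit of two library spectra to a mixed spectrum. This is only an illustration under stated assumptions: the endmember spectra are synthetic Gaussian shapes invented for the example, mixing is perfectly linear and noise-free, and the fit is unconstrained, whereas published deconvolution algorithms typically add constraints such as non-negative abundances.

```python
import numpy as np

def gaussian_band(x, center, width):
    """Unit-height Gaussian used to fake an absorption feature."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Hypothetical emissivity spectra for two endmembers (not real library data).
wavenumber = np.linspace(400.0, 1600.0, 200)
glass = 1.0 - 0.30 * gaussian_band(wavenumber, 1050.0, 120.0)
crystal = (1.0
           - 0.25 * gaussian_band(wavenumber, 950.0, 40.0)
           - 0.15 * gaussian_band(wavenumber, 1150.0, 30.0))

# Library matrix: one column per endmember.
library = np.column_stack([glass, crystal])

# Build an "unknown" spectrum as a linear (areal) mixture.
true_fractions = np.array([0.7, 0.3])
unknown = library @ true_fractions

# Least-squares fit of the library to the unknown spectrum.
fractions, *_ = np.linalg.lstsq(library, unknown, rcond=None)
```

With linear mixing and a complete library the fit recovers the input fractions; a missing endmember or nonlinear mixing breaks that guarantee, which is the limitation discussed above.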
Identification of glass based upon band parameters in the VSWIR has been previously studied. Horgan et al. (2014) used mineral mixture-based models to identify several band parameters indicative of the presence of basaltic glass in orbiter-based spectral observations. Rader et al. (2022) used the average reflectance between 0.5 and 1.0 μm to estimate the modal abundance of glass in basaltic lavas. McCanta et al. (2015) used a band parameter developed for andesitic compositions, in conjunction with other techniques (point counting, magnetic susceptibility), to identify cryptotephra units, or non-visible layers of tephra, in ocean drill cores. All of these studies provide suitable methods of glass or tephra detection within their constrained compositional applications.
Machine learning is a potentially powerful tool for quantifying sample phase abundances directly from their spectra, based on successes with other spectroscopic methods where it has been used to characterize sample compositions from spectral observations in reference libraries. One of the oldest regression techniques is partial least squares regression (PLS), which has been successfully used to quantify chemical composition and mineral abundances for geologic materials from LIBS, Raman, and MIR data (Breitenfeld et al., 2018, 2021, 2022; Clegg et al., 2009; Dyar, Carmosino, Breves, et al., 2012; Gasda et al., 2021; Hecker et al., 2012; Pan et al., 2015). PLS has also been used on VSWIR spectra of lunar soil replicates and Venus-analog spectra to successfully quantify Fe abundances (Dyar et al., 2020; L. Li, 2006). PLS has been of limited usefulness for quantifying phase assemblages using VSWIR spectra (L. Li, 2006; S. Li et al., 2012; B. Liu et al., 2014), but has been used more successfully for this purpose in the MIR (Breitenfeld et al., 2021; Hecker et al., 2012; Pan et al., 2015).

Spectral Band Number, Placement, and Resolution
Any method for recognizing the contribution of glass to remote-sensed spectra must be robust enough to be useful for data acquired using multiple types of instruments. Instruments may collect over the same spectral range, but have varying numbers of spectral bands, band placement, and spectral resolutions (the range of wavelengths to which a band is sensitive). Spectral bands vary in shape between instruments, and can be defined using either the center wavelength or the FWHM of the spectral response function. Most laboratory VSWIR and MIR spectrometers are hyperspectral and collect thousands of spectral bands at a high spectral resolution. VSWIR laboratory spectrometers like the ASD FieldSpec and the Spectral Evolution OreXpress collect 2,152 bands at 1 nm/band resolution. The Reflectance Experiment Laboratory (RELAB) collects VSWIR spectra at 10 nm/band (Pieters, 1983). MIR laboratory spectrometers typically collect with spectral resolution of 1.9 cm−1/band or 4 cm−1/band. Satellite-borne or orbital spectrometers, multispectral imagers, and radiometers have coarser spectral resolution, and sometimes unevenly distributed bands, compared to laboratory spectrometers. The Landsat 8 and 9 Earth-observing satellites collect nine bands between 0.43 and 1.3 μm and two bands in the IR (Knight & Kvaran, 2014; Masek et al., 2020; Reuter et al., 2015). ASTER was designed with nine bands between 0.52 and 2.43 μm and five bands between 8.125 and 11.65 μm, with spectral resolutions varying between 10 nm/band and 700 nm/band (Yamaguchi et al., 1998), though the SWIR subsystem (1.6-2.4 μm) has been offline since 2008. Orbiting Mars, CRISM covers 0.362-3.920 μm with 543 or 72 channels depending on the observational mode (Murchie et al., 2007), and TES covers 6-50 μm with a spectral resolution of 5 or 10 cm−1/channel depending on the observation mode (Christensen et al., 1992). Mars rover-based spectrometers have variable spectral resolution depending on their design. MASTCAM-Z on the Perseverance rover has 11 channels covering 0.44-1.02 μm (Bell et al., 2021). Mini-TES, on the Spirit and Opportunity rovers, covers 5-29.5 μm at 9.99 cm−1 resolution (Silverman et al., 2006).
The placement and number of spectral bands control whether or not a spectral feature is sampled (e.g., the 2.2 μm feature is resolved at 1 nm resolution, but not in the 10 and 100 nm resolution spectra in Figures 1d and 1e). Low spectral resolution may affect the ability of a spectrum to accurately represent spectral shapes, making spectral resolution a limiting factor in extracting compositional information from multispectral planetary data if it is coarser than the scale of the features (Crisp et al., 1990). However, PLS is not necessarily restricted by reduced spectral resolution or band scarcity. For example, Breitenfeld et al. (2021) used laboratory MIR spectra resampled from ∼1.93 to ∼8.66 cm−1/band, the spectral sampling of the OSIRIS-REx Thermal Emission Spectrometer (OTES), to model the surface mineralogy of asteroid Bennu. Dyar et al. (2020) showed that PLS models can be used to differentiate mafic and felsic compositions with only six spectral bands from 0.85 to 1.18 μm. The number and placement of spectral bands vary by necessity with resolution, so that the lower resolution spectra have fewer bands spaced farther apart. Here, rather than replicating specific instruments, the distance between bands is the same as the spectral resolution.

Data Sets
Machine learning methods such as PLS require a training data set composed of both the input variables (spectra) and predicted variables (known compositional parameters of standards, here wt.% SiO2, Na2O + K2O, FeO, and % glass), from which the models are built. Increasing the size of the training data set may improve model performance (Dyar & Ytsma, 2021), and accurate machine learning models are typically built from hundreds of unique spectra (e.g., Breitenfeld et al., 2021; Ytsma et al., 2020). Robust spectral and compositional data sets are time-consuming and costly to build, so this study combines new spectra with those from published works to amass sufficient training data (Table 1).
Natural volcanic samples were utilized to ensure the models were trained and tested on realistic phase assemblages and compositions. Igneous compositions broadly evolve along well-defined trends, though samples' complex phase assemblages may be difficult to reproduce with artificial mixtures. Altered volcanic material was excluded from the data sets, though some of the data sources include altered samples (e.g., Farrand et al., 2018). Data from synthesized pure glasses provide compositionally diverse endmembers to expand upon natural high- and pure-glass samples, which tend to be high-silica. Utilized synthetic glasses were either made from scratch using synthetic chemical compounds (Cannon et al., 2017; Minitti & Hamilton, 2010; Minitti et al., 2002) or fused from natural materials (Carli et al., 2016; Pisello et al., 2022). Sample characterization methods varied between data sources but were all of high analytical accuracy.
VSWIR data (Figure 1) were measured as bidirectional reflectance from 0.35 to 2.5 μm at varied phase angles. MIR spectral data (Figure 2) were collected as emissivity measurements from 400 to 1,600 cm−1 in purged or dry air at temperatures of around 80°C. While available in the wider literature, MIR data collected as hemispherical reflectance are not used in this work. Similarly, emissivity measurements collected under a vacuum or high temperature (>150°C) are excluded due to the effects of vacuum and temperature on the position of the Christiansen feature (Donaldson Hanna et al., 2012; Henderson & Jakosky, 1997; Logan & Hunt, 1970).
Earth and Space Science 10.1029/2023EA003439

Spectral Preprocessing
Spectral data were resampled using in-house software (Carey et al., 2017). VSWIR spectral resolutions tested included 1 nm/band (2,146 bands), 10 nm (216 bands), and 100 nm (22 bands; Figure 1), and MIR spectral resolutions tested were 1.9 cm−1 (632 bands), 19 cm−1 (64 bands), and 190 cm−1 (7 bands; Figure 2). Spectral resolutions were not chosen to match specific spectral instruments, but rather to be even order-of-magnitude reductions in spectral resolution; the finest resolution was based on the data's initial spectral resolutions. Fe features in the VSWIR appear from ∼0.5 to 1 μm, with the width of the feature dependent on the minerals hosting Fe, while -OH absorbances are narrower, ≲100 nm. In the MIR, the reststrahlen band can cover 800-1,200 cm−1, with the CF appearing at ∼1,200 cm−1 and ≈100 cm−1 wide. The spectral resolutions therefore range from over-sampling spectral features (1 nm and 1.9 cm−1) to <5 spectral bands defining any given feature. In the VSWIR, narrower features <100 nm wide may not be visible in the 100 nm resolution spectra.
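The loss of narrow features at coarse resolution can be illustrated with a simple resampling sketch. The example assumes box-car averaging of adjacent bands on a uniform 1 nm grid; the in-house resampling software may use a different scheme, so the band counts here will not match those reported above. The spectrum and the `resample_by_averaging` helper are hypothetical.

```python
import numpy as np

def resample_by_averaging(band_centers, values, factor):
    """Average every `factor` adjacent bands (box-car), dropping any leftover bands."""
    n_out = band_centers.size // factor
    usable = n_out * factor
    centers = band_centers[:usable].reshape(n_out, factor).mean(axis=1)
    averaged = values[:usable].reshape(n_out, factor).mean(axis=1)
    return centers, averaged

# Synthetic 1 nm spectrum: flat continuum of 0.5 with a narrow feature at 2,200 nm.
wavelength_nm = np.arange(350.0, 2500.0, 1.0)
reflectance = 0.5 - 0.15 * np.exp(-0.5 * ((wavelength_nm - 2200.0) / 10.0) ** 2)

wl_10, refl_10 = resample_by_averaging(wavelength_nm, reflectance, 10)
wl_100, refl_100 = resample_by_averaging(wavelength_nm, reflectance, 100)

# Apparent band depth relative to the continuum at each resolution.
depth_1 = 0.5 - reflectance.min()
depth_10 = 0.5 - refl_10.min()
depth_100 = 0.5 - refl_100.min()
```

In this toy case the feature keeps most of its depth at 10 nm but is strongly diluted at 100 nm, because a single coarse band averages the feature with the surrounding continuum.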
For each compositional parameter, the intensities of all spectra were normalized to either the maximum intensity or to the intensity at a specific spectral band (band normalization) using MATLAB. In the former method, the intensity at each band of a spectrum was divided by that spectrum's highest reflectance or emissivity value, while in the latter, each spectrum was scaled so that all spectra had equal reflectance or emissivity at a specific spectral band. The best spectral band for band normalization was identified by testing multiple band normalizations for models of glass abundance, bulk wt.% SiO2 and wt.% FeO, and glass wt.% SiO2 and wt.% FeO. VSWIR and MIR data sets from Leight, McCanta, et al. (2023) were used as the training data set, with no withheld test data set. RMSE-CV was used to evaluate model performance. VSWIR spectra were normalized to 2.17 μm (Figure 1b), except the 100 nm/band VSWIR spectra, which did not have a band at 2.17 μm and were normalized to the intensity at 2.15 μm. MIR spectra were normalized to the emissivity at 1,100 cm−1 (Figure 2b).
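The two normalizations can be sketched in a few lines. This is an illustrative Python version rather than the MATLAB code used in the study; the function names, the random test spectra, and the 100 nm band grid starting at 0.35 μm are assumptions made for the example, and band normalization is interpreted here as dividing each spectrum by its value at the band nearest the target.

```python
import numpy as np

def normalize_to_max(spectra):
    """Divide each spectrum (row) by its own maximum value."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.max(axis=1, keepdims=True)

def normalize_to_band(spectra, band_centers, target):
    """Divide each spectrum by its value at the band closest to `target`."""
    spectra = np.asarray(spectra, dtype=float)
    band_index = int(np.argmin(np.abs(np.asarray(band_centers) - target)))
    return spectra / spectra[:, [band_index]], band_index

# Assumed 100 nm/band grid from 0.35 to 2.45 micrometers.
band_centers_um = np.arange(0.35, 2.5, 0.1)

# Four made-up reflectance spectra, one per row.
rng = np.random.default_rng(0)
spectra = rng.uniform(0.05, 0.6, size=(4, band_centers_um.size))

max_normalized = normalize_to_max(spectra)
band_normalized, used_index = normalize_to_band(spectra, band_centers_um, target=2.17)
```

On this assumed grid there is no band at 2.17 μm, so the nearest-band rule falls back to the band centered at 2.15 μm.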

Partial Least Squares Regression
PLS regression is a multivariate regression technique used to predict response variables from observational data. A variant of least-squares regression, PLS explains covariance in both the predictor variables, p, and predicted elements (Stone & Brooks, 1990). For this study, p is the number of spectral bands in the collected spectra and the predicted elements are the known sample abundance of bulk wt.% SiO2, Na2O + K2O, FeO, glass wt.% SiO2, Na2O + K2O, FeO, or modal % glass; each predicted element is modeled independently. This gives an input matrix X with the dimensions of the number of spectra and the number of spectral bands (p).
A least squares regression model takes the form:

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$$

where $X_j$ are the input variables taking the vector form $X^T = (X_1, X_2, \ldots, X_p)$, and $\beta_j$ are the coefficients taking the vector form $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ used to predict $f(X)$, or the compositional parameter, with p being the length of the input vector $X^T$ (Hastie et al., 2009). Here, each component $X_j$ is a spectral band. Spectra and the collected compositional data provide a training set used to estimate the coefficients, $\beta$. PLS models place the largest coefficient values on components (in this case, spectral bands) that have high variance and high correlation with the predicted values (Hastie et al., 2009; Stone & Brooks, 1990).
PLS is a shrunken regression technique, which means that the algorithm assumes the input variables (p) are correlated and can be reduced (Hastie et al., 2009). The PLS algorithm reduces p to a smaller number of terms by creating a series of new, orthogonal (uncorrelated), hybrid variables, also referred to as components, that are linear combinations of the original bands (Hastie et al., 2009; Wold et al., 2001). The ideal number of components, n, for a data set or prediction model must be determined by testing and is characteristic of the data set or prediction model in question (Wold et al., 2001). Larger numbers of components (n > 10) expose the model to overfitting to training data (Dyar, Carmosino, Breves, et al., 2012; Dyar, Carmosino, Tucker, et al., 2012), and small numbers of components correspond to the most generalizable models. This study limits n ≤ 10.
PLS models were built and tested using a suite of Python programs built to train and test multivariate regressions on spectral data sets for each compositional variable (Ytsma, 2022). To ensure that the models were tested on unseen data, a random subset of approximately 20% of the total data set was held out as a test data set (Dyar & Ytsma, 2021). Which samples were used as the test set varied for each predicted compositional parameter. If a sample had multiple spectra, they were all assigned to the same data set; this resulted in unevenly sized data sets but was necessary to ensure that the models were not essentially tested on the training data. Table 2 lists the number of samples and spectra in the training and test sets for each model.
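The sample-level hold-out described above can be sketched as follows. The `split_by_sample` helper and the sample labels are hypothetical and only illustrate the principle that every spectrum of a given sample lands on the same side of the split.

```python
import numpy as np

def split_by_sample(sample_ids, test_fraction=0.2, seed=0):
    """Return boolean train/test masks that keep each sample's spectra together."""
    rng = np.random.default_rng(seed)
    unique_samples = np.unique(sample_ids)
    rng.shuffle(unique_samples)
    n_test = max(1, int(round(test_fraction * unique_samples.size)))
    test_mask = np.isin(sample_ids, unique_samples[:n_test])
    return ~test_mask, test_mask

# One label per spectrum; samples A, C, and H have repeat spectra.
sample_ids = np.array(["A", "A", "A", "B", "C", "C", "D",
                       "E", "F", "G", "H", "H", "I", "J"])

train_mask, test_mask = split_by_sample(sample_ids)
train_samples = set(sample_ids[train_mask])
test_samples = set(sample_ids[test_mask])
```

Because whole samples are assigned, the fraction of spectra in the test set can differ from 20% when samples have unequal numbers of spectra.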
The remaining samples, ≈80% of the total data set, were used as training data to calibrate the model (Table 2). The number of components, n, was optimized using five-fold cross-validation (CV) (Dyar & Ytsma, 2021). The five folds were built within the PLS algorithm by randomly selecting spectra from the training data set, keeping spectra from the same standard in the same fold. To perform CV, a fold was iteratively held out from the training data and a regression performed on the remaining training data. Spectra in the withheld fold were then predicted by that regression, and the root mean squared error (RMSE) between predicted and actual values of the withheld fold was calculated. Once every fold had been tested, the RMSE values of each fold were averaged, giving the RMSE-CV of the n-component model.

Results
To maximize the predictive capabilities of models and their applicability to different geologic environments, data used here encompass a large compositional range (Table 1 and Table S1 in Supporting Information S1). Not all available spectral data had all of the desired compositional information (glass abundance, bulk wt.% SiO2, Na2O + K2O, FeO, and glass wt.% SiO2, Na2O + K2O, FeO; Table S1 in Supporting Information S1), but spectral data were included in as many data sets as possible (Table 2), resulting in varyingly sized training and test sets.
Insufficient MIR spectra of samples with known glass compositions were available, so only bulk compositions and modal abundance of glass could be considered for those data. Several published VSWIR data sets were sampled at >1 nm/band, and so were only included in the 10 and 100 nm spectral resolution data sets. Figure 3 compares the compositions encompassed in the utilized data sets for the four modeled parameters.
PLS models predicted the compositional parameters from VSWIR and MIR spectra at each spectral resolution. Compositional variables were chosen to address first-order scientific questions in identifying volcanic deposits: the modal abundance of glass, the bulk wt.% SiO2, Na2O + K2O, FeO, and glass phase wt.% SiO2, Na2O + K2O, FeO. Model uncertainties given as RMSE-P were used to quantify accuracies and validate model performance (Table 3). R2 correlations in Table 3 were used to assess how far predictions deviate from the ideal 1:1 line with true values (Figures 4 and 5) and are useful for validating model performance.
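The two summary statistics can be computed as in the sketch below. The numbers are invented for illustration, and R2 is taken here to be the squared Pearson correlation between predicted and true values, which is an assumption; the source does not state which definition was used.

```python
import numpy as np

def rmse(true_values, predicted_values):
    """Root mean squared error between true and predicted values."""
    true_values = np.asarray(true_values, dtype=float)
    predicted_values = np.asarray(predicted_values, dtype=float)
    return float(np.sqrt(np.mean((predicted_values - true_values) ** 2)))

def r_squared(true_values, predicted_values):
    """Squared Pearson correlation between true and predicted values."""
    r = np.corrcoef(true_values, predicted_values)[0, 1]
    return float(r ** 2)

# Made-up test-set glass abundances (%) and model predictions.
true_glass = np.array([10.0, 50.0, 90.0])
predicted_glass = np.array([20.0, 50.0, 80.0])

rmse_p = rmse(true_glass, predicted_glass)
r2 = r_squared(true_glass, predicted_glass)
```

In this toy case the predictions are perfectly correlated with the true values but compressed toward the middle, so the correlation-based R2 is 1 while the RMSE is about 8%; the two statistics capture different aspects of agreement with the 1:1 line.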
Both normalization to maximum intensity and a specific spectral band were applied to the spectral data before building PLS models. For most predicted parameters, normalization minimally affected model accuracy. The VSWIR models performed best with the band normalization, while the MIR models were best modeled with the max normalization, with two exceptions. The 10 and 100 nm VSWIR glass abundance models performed significantly better using the maximum normalization, rather than the band normalization (Table 4).
Model efficacy varied with changing spectral resolution. The VSWIR 1 nm models have the largest RMSE-P (worst accuracy) and smallest R2 values for each parameter (Table 3); the one exception is the glass abundance models, where the 1 nm model has a lower RMSE-P and higher R2 (RMSE-P = 12%, R2 = 0.83) than the 10 and 100 nm models (RMSE-P = 15% and 15%, R2 = 0.79 and 0.77, respectively). RMSE-P does not vary between the VSWIR 10 and 100 nm models of glass abundance (RMSE-P = 15%) and bulk wt.% FeO (RMSE-P = 3.4 wt.%), with minimal (∼0.02) change in R2. The MIR glass abundance and bulk wt.% Na2O + K2O models have the same RMSE-P for all resolutions (RMSE-P = 12% and RMSE-P = 0.9 wt.%, respectively). For all of the modeled parameters, MIR model R2 values change only slightly (e.g., the glass abundance R2 are 0.77, 0.77, and 0.75 for the 1.9, 19, and 190 cm−1 models, respectively) between spectral resolutions, except for the bulk wt.% FeO models. There are also several parameters where the model with the lowest RMSE-P or highest R2 is the 19 cm−1 or 10 nm spectral resolution: the MIR bulk wt.% SiO2, VSWIR glass wt.% FeO, bulk and glass wt.% Na2O + K2O models. The lack of a coherent pattern of model performance and spectral resolution indicates that spectral resolution is not strongly restricting PLS as a predictive method.
With RMSE-P of 12%-15%, PLS models can differentiate well between low and high glass abundance at all resolutions of both VSWIR and MIR data. The 10 and 100 nm VSWIR models predict a wide spread of values for the pure-glass samples (Figure 4); all the MIR models have a wider spread of predicted values at low glass abundances (Figure 5). Therefore, these models are capable of discriminating between a sample with 20% modal glass and one with 90% glass. The 1.9 and 19 cm−1 MIR and 10 and 100 nm VSWIR bulk wt.% FeO models have different uncertainties (MIR RMSE-P of 2.4-2.5 wt.%, VSWIR RMSE-P of 3.4 wt.%), though all of the models have similar R2 near 0.55, indicating that both regions provide the same predictive capability for bulk wt.% FeO and are able to distinguish low-Fe samples from high-Fe samples. The MIR models are most accurate below ∼10 wt.% FeO, underpredicting higher FeO samples (Figure 5). Both MIR and VSWIR wt.% Na2O + K2O models have poor R2 (R2 = 0.02-0.43), indicating poor model accuracy. The MIR models generally over-predict wt.% Na2O + K2O (Figure 5), and the VSWIR models have unsystematically bad predictions for this parameter (Figure 4). The MIR wt.% SiO2 models have small enough RMSE-P (2.4-3.9 wt.%) to distinguish between low (40-50 wt.%) and high silica (>65 wt.%; Figure 5); the VSWIR models have poor R2 (0.27-0.38), indicating they cannot separate mafic and felsic samples (Figure 4).

Discussion
The most accurate models are those with spectral features that are known to be directly affected by the predicted parameter, such as the VSWIR wt.% FeO models and the MIR wt.% SiO2 models. Because both the CF and reststrahlen bands are produced by the SiO4 tetrahedra in minerals, their position is indicative of silicate polymerization. Network-modifying cations such as Fe, Na, and K affect the silicate structure, and therefore change the CF and reststrahlen bands. Conversely, none of the predicted compositional parameters in the VSWIR region has a specific feature except Fe. Spectral albedo in the VSWIR generally increases with increasing SiO2 content, though this may not be apparent after normalization. The low R2 of the MIR and VSWIR wt.% Na2O + K2O models shows the importance of available spectral features in model performance. In the MIR, the models are more successful at predicting alkali contents because the steric configurations of SiO4 tetrahedra are modified by Na+ and K+. Na+ and K+ do not directly influence any spectral features in the VSWIR region, and so the models
The PLS models have uncertainties similar to previously published statistical methods of predicting bulk oxide content or mineral abundance from VSWIR or MIR data. For example, the spectral parameter predicting bulk wt.% SiO2 from Hook et al. (2005) has a RMSE = 5.5-8.4 wt.% SiO2 (as calculated from their Tables 4 and 5). Predictions of bulk wt.% FeO via spectral parameters by Lucey et al. (1995) and Wilcox et al. (2005) have errors of 2.16% and 1.5 wt.% FeO, respectively. The VSWIR glass abundance parameter from Rader et al. (2022) has an error of 8.4%. These spectral parameters are intended for very limited compositional or phase ranges (i.e., basalts); the PLS models apply to a much larger compositional or phase range than any of these examples. All of these examples note that their predictions are sensitive to grain size. The training data sets for the PLS models include a wide range of grain sizes, and so the models are not strongly influenced by sample grain size (Figure 6).
Predictions of phase abundance via linear deconvolution of MIR data by Feely and Christensen (1999) and Wyatt et al. (2001) had errors of 3%-11% and 2.5%-12%, respectively. The MIR glass abundance PLS models have 12% uncertainty, and the VSWIR models have 15% uncertainty, which are respectively equal to and larger than the error in Feely and Christensen (1999) and Wyatt et al. (2001). However, most works utilizing linear deconvolution methods do not assess the model prediction error, but assume an error of 5%-10%. The PLS model methodology provides a more rigorous quantitative assessment of both model precision and accuracy by utilizing a large known test data set. Linear deconvolution methods are limited by the size of the reference library and the encompassed compositional range (e.g., Ramsey & Christensen, 1998; Rogers & Aharonson, 2008). Crucially, predictions of glass abundance via linear deconvolution methods are influenced by the included glasses' compositions (e.g., Cannon et al., 2017; Wyatt et al., 2001), whereas PLS models decouple predictions of phase abundance and composition.

Relative Linearity of VSWIR and MIR Spectra and Potential Effects on Prediction Accuracies
In both the VSWIR and MIR regions, spectral features from individual phases overlap and combine to produce the observed spectra. How spectral features combine is controlled by k, the imaginary index of refraction, or extinction coefficient, which describes how light attenuates as it propagates through a material. In the MIR, where k is large, light does not propagate far into the material, and thus spectral features are produced predominantly by surface scattering from a single phase (Hapke, 1981; Salisbury & Wald, 1992). At fine particle sizes (<70 μm), light may not fully attenuate within a single particle, resulting in transparency features and non-negligible volume scattering (Hunt, 1977; Moersch & Christensen, 1995; Mustard & Hays, 1997). In the VSWIR, k is orders of magnitude smaller than at MIR wavelengths. Light can propagate into the surface and may interact with more than one phase, resulting in nonlinear volume scattering of light (Mustard & Hays, 1997; Mustard & Pieters, 1989). Surface scattering causes linear spectral mixing, with phases contributing to the observed spectrum in proportion with their areal abundance, while volume scattering produces nonlinear spectral mixing (Moersch & Christensen, 1995; Mustard & Hays, 1997). Thus, it might be expected that MIR predictions would be more straightforward.
Previous applications of PLS to VSWIR data largely focused on quantifying the phase assemblage of lunar soils, producing models with limited predictive capabilities (L. Li, 2006; S. Li et al., 2012; B. Liu et al., 2014). These results, when contrasted to successful PLS models for MIR data, have led to speculation that spectral mixing may influence the success of PLS modeling, limiting the method to spectral regions with linear mixing (S. Li et al., 2012; Pan et al., 2015). In contrast, other studies (Breitenfeld et al., 2021; Pan et al., 2015) show that use of a range of particle sizes in training data, including some fine enough to have nonlinear mixing in the MIR, produces MIR models that are insensitive to particle size, suggesting that PLS can overcome effects of nonlinear mixing.
Results from this study further support this conclusion. MIR data sets contain fine-grained samples that display features arising from volume scattering (Leight et al., 2022), but the models nonetheless perform well and do not display a sensitivity to particle size (Figure 6). The VSWIR models predicting wt.% FeO and glass abundance, which solely use data in a nonlinear mixing wavelength regime, are only slightly less accurate than the respective MIR models. These VSWIR results indicate that nonlinear spectral mixing does not wholly inhibit the predictive capabilities of PLS models, though further work is needed to fully investigate this conclusion.

Influence of Size and Distribution of Data Sets
Large, compositionally diverse data sets are particularly important when predicting parameters without obviously correlative spectral features. Previous work by Dyar and Ytsma (2021) established that increased training data set size improves PLS model performance for laser-induced breakdown spectra. The current study provides a comparable test for VSWIR spectra. Here, the increase in data set size between the 1 and 10 nm resolution VSWIR bulk data sets (due to the availability of more data in the literature), and the accompanying compositional breadth, results in improved model accuracy for the 10 nm-resolution bulk and glass wt.% FeO models (Table 3, Figure 4). The VSWIR bulk wt.% SiO2 and Na2O + K2O models exhibit marginal improvement with additional training data but low accuracy overall due to the lack of relevant spectral features in this wavelength range, as discussed above.
Previous successful applications of PLS to quantify phase assemblages used MIR data of mineral mixtures (e.g., Breitenfeld et al., 2021; Hecker et al., 2012; Pan et al., 2015). In the VSWIR, minerals have spectral features arising from their crystal structures to which the PLS models can be tuned to determine abundance. Spectral features in glasses are less consistent and can vary with composition and quench rate (Minitti & Hamilton, 2010; Pisello et al., 2022). Published VSWIR PLS models for glass abundance use small data sets (number of samples <80) and result in limited predictive accuracy over a narrow compositional range (L. Li, 2006; S. Li et al., 2012; B. Liu et al., 2014). In this study, the accuracy of models that predict glass abundance likely reflects the large sample size and compositional diversity of the training sets, which encompass the entire range of potential glass abundances, from near zero to 100%, with no trend between glass composition and abundance (Figure 3).
The range of compositions encompassed in a training set, and the spread of data within that compositional range, affect model accuracy. It is difficult, however, to achieve even coverage of oxide compositions using natural samples; intermediate compositions of ∼57-67 wt.% SiO2 are surprisingly uncommon in natural volcanic rocks (Bennefoi et al., 1995; Charlier et al., 2011; Geist et al., 1995). Minor gaps like the ones in our training sets may be compensated for either by inclusion of samples on either side of the compositional gap or by strategic use of synthetic samples. The bias towards mafic compositions in the data used here results from the emphases of previous studies that focused on predominantly mafic samples analogous to the Moon and Mars. High-alkaline compositions are similarly underrepresented in all data sets, especially in the MIR.
The distribution of the training data inherently limits the calibration to unknowns within that same range, and model accuracy will most likely degrade for unknown compositions outside that range. Many of the compositional data sets include sparse data at compositional extremes (Figure 3) that may limit model accuracy toward endmember compositions (e.g., the MIR wt.% FeO model predictions for samples with >10% FeO; Figure 5). The narrower range of the MIR data oxide compositions relative to the VSWIR data sets may contribute to the better accuracy of MIR models relative to those in the VSWIR.

Effects of Spectral Preprocessing
Spectral preprocessing is an important step when utilizing machine learning. It is particularly important for building the data sets utilized in this work because spectral data collection protocols are variable among data sources. Normalizing spectral data can minimize differences in instrumentation or experimental configuration because these variations change the magnitude of measured reflectance or emission (e.g., Sklute et al., 2015). Normalization may also reduce spectral differences between different size fractions of a single sample, thus influencing model predictions (e.g., L. Li, 2006; Pan et al., 2015). As shown in Table 4, model accuracy is indeed sensitive to the normalization method used. Most VSWIR models performed better when spectra were normalized to 2.17 μm, except for the 10 and 100 nm glass abundance models, which performed best when the spectra were normalized to maximum intensity (Table 4). The MIR models all performed best when the spectra were max normalized, though the difference between normalizations is significantly smaller than for the VSWIR models. Because only two normalization methods were tested in each spectral region, it is quite possible that different normalizations (e.g., a band normalization at 0.50 μm) would work better when predicting other variables.
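The two normalization schemes compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing code used in the study: the function names and the four-band reflectance values are hypothetical, and band normalization is assumed to mean division by the value at the band nearest the target wavelength.

```python
def normalize_to_band(wavelengths, values, target):
    """Divide a spectrum by its value at the band closest to `target`."""
    idx = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target))
    reference = values[idx]
    if reference == 0:
        raise ValueError("cannot normalize to a zero-valued band")
    return [v / reference for v in values]


def normalize_to_max(values):
    """Divide a spectrum by its maximum value."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / peak for v in values]


# Hypothetical % reflectance at four VSWIR bands (wavelengths in micrometers).
wavelengths_um = [0.50, 1.00, 2.17, 2.40]
reflectance = [10.0, 20.0, 40.0, 50.0]
# The same sample measured with half the overall signal level.
dimmer = [0.5 * v for v in reflectance]

band_norm = normalize_to_band(wavelengths_um, reflectance, 2.17)
band_norm_dimmer = normalize_to_band(wavelengths_um, dimmer, 2.17)
max_norm = normalize_to_max(reflectance)
```

In this toy case `band_norm` and `band_norm_dimmer` are identical, which is the sense in which normalization removes purely multiplicative differences in measured magnitude between instruments or configurations.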

Spectral Resolution
Spectral resolutions tested here were chosen to reflect differences in spectral resolution between laboratory spectrometers and multispectral instruments such as ASTER, which has nine VSWIR bands and five MIR bands. The 1.9 cm−1, 1 nm, and 10 nm resolutions used in this study are directly comparable to the resolutions of available data. The relative accuracies of the band-sparse models indicate that hyperspectral data with the high resolution of laboratory spectra are not needed for PLS models to accurately quantify sample composition. The difference in model efficacy between the 19 cm−1 and 190 cm−1 models, and between the 10 and 100 nm models, is minimal despite the order of magnitude reduction in spectral bands (Figures 4 and 5, Table 3). The MIR models are less affected by differences in spectral resolution than the VSWIR models. This is probably due to the sensitivity of the available spectral bands to the predicted parameters, but is likely also influenced by the relative width of the spectral bands compared to the coarse spectral resolutions of 190 cm−1 and 100 nm.
Earth and Space Science 10.1029/2023EA003439 LEIGHT ET AL.
Band-sparse spectra may sample narrow spectral features at only one or two points. Narrow spectral features, such as -OH absorptions at 1.4, 1.9, and 2.2 μm, are oversampled at the 1 nm and 1.9 cm−1 resolutions but, at only tens of nm wide, may affect only one or two spectral bands at a 10 nm or 19 cm−1 resolution and would not be visible at the 100 nm or 190 cm−1 resolution, though the features will still affect the observed spectrum (Figures 1 and 2). Conversely, Fe absorptions in the VSWIR can affect reflectance values between 0.5 and 1.0 μm, depending on the crystal lattice; at 100 nm resolution, five bands sample this feature. The reststrahlen band in the MIR is similarly wide, and affects three bands at 190 cm−1 resolution. The position of spectral bands will also influence the ability of PLS to make accurate predictions, as the location of spectral bands affects how spectral band shapes are reproduced in band-sparse spectra. The emissivity maximum caused by the Christiansen feature (CF), for example, falls between the centers of two bands in the 190 cm−1 data set, affecting the emissivity values of both. Unlike the 190 cm−1 and 100 nm data sets, most multispectral instruments do not have evenly spaced spectral bands. Earth-orbiting instruments frequently lack bands near 1.4 and 1.9 μm due to atmospheric water absorptions (e.g., Knight & Kvaran, 2014; Yamaguchi et al., 1998). Both the spacing and resolution will affect the observed spectrum and how well PLS can use the data for predictions.
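The effect of coarse sampling on a narrow feature can be illustrated with a simple boxcar average. This is a sketch only: the study's actual resampling procedure is not restated in this section and may differ, and the function name `boxcar_resample` and the reflectance values are hypothetical.

```python
def boxcar_resample(values, factor):
    """Average consecutive groups of `factor` samples (boxcar binning).

    Trailing samples that do not fill a complete group are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_out = len(values) // factor
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_out)
    ]


# Hypothetical reflectance sampled every 10 nm: a flat continuum at 0.50
# with a narrow absorption, two samples (about 20 nm) wide, dipping to 0.30.
fine = [0.50] * 20
fine[4] = 0.30
fine[5] = 0.30

# Ten 10 nm samples are averaged into each 100 nm band.
coarse = boxcar_resample(fine, 10)

fine_depth = max(fine) - min(fine)
coarse_depth = max(coarse) - min(coarse)
```

In this toy spectrum the absorption no longer appears as a distinct feature after binning, but it still lowers the value of the band that contains it, consistent with the statement that unresolved features continue to affect the observed spectrum.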
Resampling of the 1.9 cm−1 MIR data sets and 10 nm VSWIR data sets through ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer; Yamaguchi et al., 1998) spectral bands provides a test of the importance of band placement on PLS model efficacy. The nine VSWIR bands range in width from ∼8 to 100 nm and the five MIR bands have widths of 350 and 700 nm; spectral bands are placed unevenly over the sampled wavelength range (points in Figure 7; Yamaguchi et al., 1998). Resampled spectra (Figure 7) have varying resolution and reduced spectral range compared to the other data sets used here (0.56-2.4 μm in the VSWIR, and 8.26-11.24 μm, or 890-1,208 cm−1, in the MIR). PLS models were made following the methods in Section 3, and the results are reported in Table 5. VSWIR ASTER-band PLS models generally have greater error than the 1, 10, and 100 nm resolution VSWIR models. However, several of the 1 nm models perform equally to or worse than the ASTER-band models (e.g., for bulk wt.% FeO, the 1 nm model has RMSE-P = 2.2 and R2 = 0.02, while the ASTER-band VSWIR model has RMSE-P = 3.4 and R2 = 0.55). This difference in performance is attributed to the larger size of the ASTER resolution data set. The ASTER-band MIR models have worse RMSE-P and R2 than the 1.9, 19, and 190 cm−1 models (e.g., RMSE-P of 4.2 wt.% SiO2 for the 190 cm−1 model and 5.4 wt.% SiO2 for the ASTER-band model; Tables 4 and 5) except for the bulk Na2O + K2O models. However, this difference is negligible, as the RMSE-P of the 1.9 cm−1 and ASTER-band models are 0.9 and 0.8 wt.% Na2O + K2O, respectively. The reduction in spectral bands and spectral range likely factors into the larger model uncertainties for the ASTER-band models compared to the 100 nm and 190 cm−1 models; ASTER-band MIR spectra do not sample the minima caused by the Si-O bend at 500-700 cm−1 (King et al., 2004) nor fully sample the reststrahlen band past 1,200 cm−1, while in the VSWIR, only three bands sample the Fe-absorption region. However, the similar RMSE-P and R2 of the VSWIR and MIR ASTER-band models to the 100 nm and 190 cm−1 models indicate that spectral band spacing and spectral resolution are less influential on model efficacy than the relevance of spectral features present to the modeled variable.

Prediction Accuracies for Glass Composition
Because glass composition is used to distinguish between volcanic deposits (Brown et al., 1992; Carter et al., 1995; Lowe, 2011; Westgate & Gorton, 1981), it is important for evaluating eruption dynamics and correlating tephra units across large regions. Quantifying the composition of an individual phase within a sample has previously been intractable with MIR or VSWIR data, though multivariate techniques show promise. Hecker et al. (2012) modeled plagioclase composition in granitic rocks using PLS, but the large uncertainties in the feldspar composition training data affected their model accuracies. Utilizing sample characterization methods with high analytical accuracy is key to training accurate PLS models, as the models cannot be more accurate than the training data set.
The absence of spectral features dependent on Si4+, Na+, or K+ in the VSWIR inhibits accurate prediction of glass wt.% SiO2 or wt.% Na2O + K2O (Figure 4; Table 3). Models predicting wt.% FeO in glass, however, are slightly more accurate than those predicting bulk wt.% FeO (Table 3). Model uncertainties do not permit discrimination of minor differences between bulk and glass composition, as is observed in natural samples (e.g., Leight et al., 2022) and of interest for volcanological study. However, larger data sets are seen to increase prediction accuracy, suggesting this degree of accuracy may be possible with a larger training set.
Unfortunately, only 28 samples in the MIR data set have known glass compositions, most from a single data source. These data are insufficient for training PLS models. However, the accuracy of the 190 cm−1 MIR models suggests that glass composition could be quantified at 190 cm−1 spectral resolution should sufficient training data be amassed.

Conclusions
Identification of volcanic deposits relies on identification of a glass phase. Both the MIR and VSWIR spectral regions can be used to identify the presence of glass at all three of the tested spectral resolutions using PLS models. These PLS models provide a method of quantifying glass abundance that is independent of other parameters, generalizable to a wide range of bulk compositions, and largely independent of particle size. Identifying and quantifying glass abundance in multiphase samples using other methods, such as band parameters or least-squares regressions, is difficult and sensitive to composition, particle size, and glass abundance (e.g., Cannon et al., 2017; Horgan et al., 2014; Williams & Ramsey, 2022). The glass abundance models here do have a higher uncertainty than the 5%-10% modal error typically assumed for linear deconvolutions based on work by Ramsey and Christensen (1998) and Feely and Christensen (1999). However, because it is derived from a known test data set, the uncertainty of our PLS models provides a significantly more quantitative assessment of both model precision and accuracy than this assumed error. PLS models presented here predict composition and glass abundance of volcanic samples from VSWIR and MIR spectral data with quantified accuracies over a wide range of composition, grain size, and spectral resolution. MIR models have better prediction accuracies than VSWIR models when predicting the same compositional parameters due to the presence of relevant spectral features in each region that are indicative of the predicted variable. The efficacy of the VSWIR glass wt.% FeO abundance models indicates that parameters directly related to available spectral features, such as specific mineral compositions, can be modeled using PLS provided a sufficiently large training data set is available.
The spectral resolution of the data does affect model efficacy, but sparse resolution does not preclude accurate PLS model predictions. Models using 100 nm and 190 cm−1 resolution provide useful predictions of glass abundance and bulk wt.% FeO with accuracies of 15% and 12% modal glass abundance and 3.4 and 3.1 wt.% FeO, respectively. The quantity and compositional range of training data are as influential on PLS model accuracy as the number of spectral bands. VSWIR and MIR PLS models are always influenced by the available training data when data sets are small or compositionally restricted, as is the case here. Additional spectral libraries of natural rock samples with quantified phase assemblages, phase compositions, and bulk compositions will undoubtedly improve model performance and likely produce quantitative predictions.

Figure 1. VSWIR spectral data. The 10 nm resolution spectra are shown in (a) with no normalization, (b) with normalization to % reflectance at 2,170 nm, and (c) with normalization to the maximum % reflectance. Bulk wt.% FeO test data set spectra are shown colored by bulk wt.% FeO; all other data are shown in gray. The spectral resolutions of (d) 1 nm/band, (e) 10 nm/band, and (f) 100 nm/band are shown using max-normalized spectra of samples CFSW (yellow line), Ves (light blue line), and MSH (dark blue line) from Leight, McCanta, et al. (2023).
Note. Table S1 in Supporting Information S1 gives each spectrum by name and specifies available compositional information for each sample. a Data set obtained via repository download. b Limited compositional information. c Data set produced by authors. d Data set obtained via direct request.

Figure 2. MIR spectral data. The 1.9 cm−1 resolution spectra are shown in (a) with no normalization, (b) with normalization to % emission at 1,100 cm−1, and (c) with normalization to the maximum % emission. Bulk wt.% SiO2 test data set spectra are shown colored by bulk wt.% SiO2; all other data are shown in gray. The different spectral resolutions of (d) 1.9 cm−1/band, (e) 19 cm−1/band, and (f) 190 cm−1/band are shown using spectra of samples CFSW (purple line), SS4 (blue line), and MSH (yellow line) from Leight, McCanta, et al. (2023).
The n-component model with the lowest RMSE-CV was chosen as the final model, and the calibration RMSE (RMSE-C) was calculated using the entire training data set. Both the RMSE-CV and RMSE-C test model performance on data included in model training. Model prediction accuracy (RMSE-P) was determined by testing that final model on the held-out test data set. The R2 correlation assesses the variance of the test predictions relative to the ideal 1:1 line of prediction. Here we report the R2 metric adjusted for the size of the data set.
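The summary statistics described here can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: R2 is computed against the 1:1 line as one minus the ratio of residual to total sum of squares, the adjustment for data set size is assumed to be the standard adjusted-R2 formula, and all function names and numerical values below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the predicted variable."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination relative to the ideal 1:1 line."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Standard adjusted R2 (assumed form; the paper's formula may differ)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def pick_n_components(rmse_cv_by_n):
    """Return the number of components with the lowest cross-validation RMSE."""
    return min(rmse_cv_by_n, key=rmse_cv_by_n.get)


# Hypothetical cross-validation results: {number of components: RMSE-CV}.
best_n = pick_n_components({1: 5.2, 2: 4.1, 3: 4.4})

# Hypothetical held-out test values (e.g., modal % glass) and predictions.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

rmse_p = rmse(measured, predicted)
r2 = r_squared(measured, predicted)
r2_adj = adjusted_r_squared(r2, n_samples=len(measured), n_predictors=1)
```

Applying `rmse` to cross-validation folds, to the full training set, and to the held-out test set would give RMSE-CV, RMSE-C, and RMSE-P, respectively; only the last is computed on data the model has not seen.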

Figure 3. Distributions of training and test data sets by the number of spectra in each data set. The 10-nm sampled and 100-nm sampled VSWIR data sets are the same. Training set is shown in purple and the test set is shown in yellow.

Figure 4. VSWIR model predictions. Triangles are the 1 nm resolution model predictions, circles are the 10 nm resolution model predictions, and squares are the 100 nm resolution model predictions. The blue reference line shows a 1:1 relationship.

Figure 5. MIR model predictions. Triangles are the 1.9 cm−1 resolution model predictions, circles are the 19 cm−1 resolution model predictions, and squares are the 190 cm−1 resolution model predictions. The blue reference line shows a 1:1 relationship.

Figure 7. MIR and VSWIR spectral data resampled through ASTER bandpasses, and PLS model results. MIR spectra are max normalized, and the test data set is shown colored by bulk wt.% SiO2. VSWIR spectra are normalized to 2.17 μm, and the test data set is shown colored by bulk wt.% FeO. The black dots in the MIR and VSWIR spectral plots show the centers of the ASTER bandpasses. MIR model results are shown as purple triangles; VSWIR model predictions are shown as blue circles.

Table 2
PLS Model Data Sets Used in This Study
The total number of spectra and samples included in each data set are given. + K2O, FeO, or glass wt.% SiO2, Na2O + K2O, FeO data (Table

Table 3
Model Summary Statistics
Models are grouped by spectral resolution. RMSE-CV, RMSE-C, and RMSE-P are given in the unit of the predicted variable, for example, wt.%. R2 is unitless.

Table 4
Effect of Normalization Method on VSWIR and MIR Glass Abundance and Bulk wt.% FeO Model Summary Statistics

Table 5
ASTER-Resolution VSWIR and MIR Model Summary Statistics
Note. RMSE-CV, RMSE-C, and RMSE-P are given in the unit of the predicted variable, for example, wt.%. R2 is unitless.