Simplified model of spectral absorption by non-algal particles and dissolved organic materials in aquatic environments

Absorption by non-algal particles (NAP, $a_d$) and colored dissolved organic matter (CDOM, $a_g$) is frequently modeled by exponential functions of wavelength, either separately or as a sum. We present a new representation of NAP-plus-CDOM absorption $a_{dg}$ based on the stretched exponential function $a_{dg}(\lambda) = A \exp\{-[s(\lambda - \lambda_o)]^{\beta}\}$, whose parameter β can be considered a measure of optical heterogeneity. A double exponential representation of $a_{dg}$ can be fit extremely well by a stretched exponential for all plausible parameter combinations, even though the stretched exponential has one fewer free parameter. Fitting two published compilations of in situ $a_{dg}$ data, one at low spectral resolution (n = 5, λ = 412-555 nm) and one at high spectral resolution (n = 201, λ = 300-700 nm), we find that the stretched exponential outperforms the single exponential, the double exponential, and a power law. We therefore conclude that the stretched exponential is the preferred model for $a_{dg}$ in circumstances where NAP and CDOM cannot be separated, such as in remote sensing inversions.


Introduction
The total absorption spectrum $a(\lambda)$ (in units of m$^{-1}$, with λ denoting wavelength in units of nm) in aquatic environments is frequently decomposed as the sum of four components (e.g. [1]):

$$a(\lambda) = a_w(\lambda) + a_\phi(\lambda) + a_d(\lambda) + a_g(\lambda) \tag{1}$$

where $a_w(\lambda)$ is the absorption by water, $a_\phi(\lambda)$ is the absorption by phytoplankton pigments, $a_d(\lambda)$ is the absorption by non-algal particles (NAP), and $a_g(\lambda)$ is the absorption by colored dissolved organic materials (CDOM). Here we are interested in the last two components: absorption by NAP and CDOM. Both $a_d(\lambda)$ and $a_g(\lambda)$ are routinely approximated by exponential functions for the purpose of radiative transfer computations [2]; by extension, their sum $a_{dg}(\lambda) = a_d(\lambda) + a_g(\lambda)$ is routinely approximated by a double exponential function (the sum of two exponentials). In applications where it is not possible to separate the two signals, such as semianalytical inversions of ocean-color remote sensing [3][4][5][6], they are typically lumped into a single exponential function because of the similar spectral shapes of $a_d(\lambda)$ and $a_g(\lambda)$. (Other models are also used; cf. Ref. [7], their Table 3.) The first approach treats NAP and CDOM as distinct and independently varying, while the second treats NAP-plus-CDOM as a single pool. Operationally, CDOM is defined as material that passes through a given filter (most often with pore size 0.2 μm) and NAP as material that does not (other than soluble pigments, which are separated into $a_\phi$), so that together they form a continuum of material in terms of size [8]. This suggests that it would be useful to model the absorption by NAP and CDOM as the absorption of a continuum of heterogeneous material.
The objective of this paper is to propose a model for $a_{dg}(\lambda)$ that considers NAP and CDOM as a continuum, and to demonstrate that this model is a superior one for $a_{dg}(\lambda)$ and may therefore be useful in remote sensing signal inversion. We propose modeling $a_{dg}(\lambda)$ with the stretched exponential (a.k.a. Kohlrausch function [9]):

$$a_{dg}(\lambda) = A \exp\{-[s(\lambda - \lambda_o)]^{\beta}\} \tag{3}$$

The stretched exponential function can be considered the sum of many simple exponential functions with different exponents, where the wider the distribution of exponents, the smaller the value of β [10]. Modeling $a_{dg}(\lambda)$ by a stretched exponential function therefore considers NAP and CDOM as a collection of many sub-pools of material (e.g. many size classes), each of which has an exponential absorption spectrum with an exponent $s_p$ that may differ between sub-pools. The stretched exponential reduces to a single exponential in the limit β = 1, where the exponents being summed over are identical. Accordingly, β is limited to the range [0, 1]: β = 1 is the limiting case where the distribution of $s_p$ values is a delta function, and β = 0 is the limiting case of an infinitely wide distribution of $s_p$ values. The parameter β then corresponds to the optical heterogeneity of NAP-plus-CDOM, where smaller values of β indicate more variance between the spectral shapes of absorption by the sub-pools. This implies that β may contain useful biogeochemical information (cf. Ref.
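As a minimal sketch of the functional form stated in the abstract, the following Python snippet evaluates the stretched exponential and checks that β = 1 recovers the single exponential. The function names, the wavelength grid, and the parameter values (A = 1, s = 0.015 nm$^{-1}$, β = 0.8) are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def stretched_exponential(lam, A, s, beta, lam0=300.0):
    """A * exp(-[s * (lam - lam0)]**beta); only defined for lam >= lam0."""
    x = s * (np.asarray(lam, dtype=float) - lam0)
    return A * np.exp(-np.power(x, beta))


def single_exponential(lam, A, s, lam0=300.0):
    """A * exp(-s * (lam - lam0))."""
    return A * np.exp(-s * (np.asarray(lam, dtype=float) - lam0))


lam = np.arange(300.0, 701.0, 1.0)  # 300-700 nm at 1 nm spacing

# With beta = 1 the two expressions coincide.
same = np.allclose(stretched_exponential(lam, 1.0, 0.015, 1.0),
                   single_exponential(lam, 1.0, 0.015))

# With beta < 1 (same A and s) the curve drops faster close to lam0,
# where s * (lam - lam0) < 1, and more slowly far from it.
stretched = stretched_exponential(lam, 1.0, 0.015, 0.8)
```

Because the argument of the power must be non-negative, the sketch assumes every wavelength is at or above `lam0`.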

Stretched exponential fit to double exponential
We first demonstrate that for any double exponential curve with plausible values for the parameters $(A_d, s_d, A_g, s_g)$, there exists a stretched exponential curve that closely fits that double exponential. Using the stretched exponential as a model for $a_{dg}(\lambda)$ therefore entails no loss of flexibility in modeling absorption curves, even though it has one fewer parameter.
We fit a stretched exponential to a suite of artificial $a_{dg}(\lambda)$ data derived from double exponential curves:

$$a_{i,j,k}(\lambda) = A_n \left\{ \exp[-s_i(\lambda - \lambda_o)] + R_j \exp[-s_k(\lambda - \lambda_o)] \right\} \tag{4}$$

where $s_i$ and $s_k$ correspond to the exponents for NAP and CDOM respectively, $R_j$ corresponds to the ratio of the amplitudes $A_g/A_d$, and $A_n$ is a normalization constant set so that the maximum of $a_{i,j,k}(\lambda)$ is equal to 1 m$^{-1}$. We vary $s_i$ from 0.01 to 0.026 nm$^{-1}$ and $s_k$ from 0.008 to 0.018 nm$^{-1}$, each in increments of 0.001 nm$^{-1}$ (corresponding to the ranges reported in Ref. [12]), and vary $R_j = A_g(\lambda_o)/A_d(\lambda_o)$ from 1/16 to 16 by factors of two (i.e. ranging from $a_d \gg a_g$ to $a_d \ll a_g$). We generate artificial $a_{i,j,k}(\lambda)$ data for each combination of $(s_i, R_j, s_k)$ at 1 nm resolution from 300-700 nm, totaling 17 × 11 × 9 = 1,683 curves.
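The parameter sweep described above can be sketched in Python as follows. This is an illustrative reconstruction from the stated ranges, not the authors' MATLAB code; the variable names and the array layout are assumptions.

```python
import itertools
import numpy as np

lam0 = 300.0
lam = np.arange(300.0, 701.0, 1.0)      # 1 nm resolution, 300-700 nm

s_i = np.linspace(0.010, 0.026, 17)     # 17 exponents, step 0.001 nm^-1
s_k = np.linspace(0.008, 0.018, 11)     # 11 exponents, step 0.001 nm^-1
R_j = 2.0 ** np.arange(-4, 5)           # 1/16, 1/8, ..., 8, 16 (9 ratios)


def double_exponential_curve(si, Rj, sk):
    """Sum of two exponentials, rescaled so that its maximum equals 1."""
    raw = np.exp(-si * (lam - lam0)) + Rj * np.exp(-sk * (lam - lam0))
    return raw / raw.max()              # plays the role of A_n


# One row per (s_i, R_j, s_k) combination.
curves = np.array([
    double_exponential_curve(si, Rj, sk)
    for si, Rj, sk in itertools.product(s_i, R_j, s_k)
])
```

Each row of `curves` is one artificial spectrum on the 401-point wavelength grid, so the array has 17 × 9 × 11 = 1,683 rows.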
We use two metrics to assess the quality of fit of the stretched exponential function to the double exponential function: the coefficient of determination ($r^2$) and the root-mean-square error (RMSE; the square root of the sum of squared residuals divided by the degrees of freedom of the regression). The fits (as well as those in Section 3) were performed in MATLAB (R2014b; all code used for all analyses herein will be made available at http://cael.space should this manuscript be accepted) using nonlinear least squares, with the constraint β ∈ [0, 1] according to the definition of the stretched exponential function. We set $\max[a_{i,j,k}(\lambda)] = 1$ in all cases so that the RMSEs are comparable across different parameter combinations. This normalization is immaterial to the fitting procedure, as the stretched exponential can be rescaled by any factor C by multiplying its amplitude by C. In Equation 4, we use $\lambda_o$ = 300 nm, but this value is also immaterial for the same reason; for a single or double exponential, the choice of $\lambda_o$ is equivalent to a rescaling of the amplitude. However, the stretched exponential is not well-defined for wavelengths λ < $\lambda_o$. For simplicity, throughout the manuscript we have therefore chosen $\lambda_o$ equal to the lowest wavelength present in any given analysis, e.g. $\lambda_o$ = 412 nm when analyzing the data from Ref. [13], and $\lambda_o$ = 300 nm when analyzing the data from Ref. [14].

Figure 1 shows an example of a stretched exponential fit to a double exponential whose exponents are the means reported in Ref. [12], $s_d$ = 0.0123 nm$^{-1}$ and $s_g$ = 0.0176 nm$^{-1}$. The stretched exponential fit (A = 1.005 m$^{-1}$, s = 0.0165 nm$^{-1}$, β = 0.974) is extremely good, with $r^2$ > 0.9999 and RMSE = $8.2 \times 10^{-4}$; all deviations are within 0.005 m$^{-1}$, the largest occurring at the endpoint λ = 300 nm. Note that 0.005 m$^{-1}$ is typically within the measurement uncertainty of the instruments used to measure absorption, such as the WETLabs ac-s [15]. Table 1 shows summary statistics from fitting the 1,683 $a_{i,j,k}(\lambda)$ curves. In each case, the fit was similar to that in Fig. 1, e.g. $r^2$ ≥ 0.9985, and the largest residual was always at the endpoint λ = 300 nm. This demonstrates that for any plausible combination of parameters, a double exponential can be fit extremely well by a stretched exponential, even though the stretched exponential has one fewer parameter. By extension, any $a_{dg}(\lambda)$ data that can be well described by a double exponential should also be well described by a stretched exponential.
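The fitting step and the two metrics can be sketched as below. The original fits were done in MATLAB; this Python version uses `scipy.optimize.curve_fit` as a stand-in and is not the authors' code. The amplitude ratio R = 1, the initial guess `p0`, and the unbounded upper limits on A and s are assumptions made for this sketch, so the fitted numbers need not match those quoted for Fig. 1.

```python
import numpy as np
from scipy.optimize import curve_fit

lam0 = 300.0
lam = np.arange(300.0, 701.0, 1.0)

# Double exponential built from the mean exponents quoted in the text.
# R = 1 is an assumed amplitude ratio (the text does not state the one used).
s_d, s_g, R = 0.0123, 0.0176, 1.0
target = np.exp(-s_d * (lam - lam0)) + R * np.exp(-s_g * (lam - lam0))
target /= target.max()                  # normalize the peak to 1


def stretched_exponential(lam, A, s, beta):
    return A * np.exp(-np.power(s * (lam - lam0), beta))


p0 = [1.0, 0.015, 0.9]                            # assumed starting point
bounds = ([0.0, 0.0, 0.0], [np.inf, np.inf, 1.0])  # enforces 0 <= beta <= 1
popt, _ = curve_fit(stretched_exponential, lam, target, p0=p0, bounds=bounds)

residuals = target - stretched_exponential(lam, *popt)
ss_res = np.sum(residuals**2)
dof = lam.size - len(popt)              # data points minus free parameters
rmse = np.sqrt(ss_res / dof)
r2 = 1.0 - ss_res / np.sum((target - target.mean())**2)
```

Dividing by the degrees of freedom rather than by the number of points is what lets the RMSE penalize extra fitting parameters, as the definition above describes.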

Fits to in situ data
The above statistical exercise illustrates the capacity of the stretched exponential to fit a double exponential, but this means little if it does not translate into improvements in fitting $a_{dg}(\lambda)$ data. We next demonstrate that the stretched exponential is indeed a superior model for fitting $a_{dg}(\lambda)$ data, at both low and high spectral resolution. We fit a single, stretched, and double exponential to two compilations of in situ $a_{dg}$ data. We also fit a power-law function of wavelength, which has been shown to be a better model for multispectral CDOM absorption data [7]. For each $a_{dg}(\lambda)$ curve, we fit all four functions, and compare the stretched exponential to the other three by their RMSE and by their $r^2$. Note that the RMSE is arguably the better metric by which to compare these models because it incorporates a regression's degrees of freedom, i.e. it suitably penalizes models for having additional fitting parameters. For each data compilation, we say one function outperformed another if it most often had the lower RMSE (or higher $r^2$).
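The "outperformed" rule stated above amounts to a simple tally over spectra, sketched here in Python. The function name is hypothetical, the RMSE values are made up purely for illustration, and counting ties for neither side is an assumption of this sketch.

```python
import numpy as np


def outperforms(rmse_a, rmse_b):
    """Return (wins, losses, verdict) for model A against model B.

    A wins a spectrum when its RMSE is strictly lower; ties count for
    neither side. A outperforms B when it wins more often than it loses.
    """
    rmse_a = np.asarray(rmse_a, dtype=float)
    rmse_b = np.asarray(rmse_b, dtype=float)
    wins = int(np.sum(rmse_a < rmse_b))
    losses = int(np.sum(rmse_a > rmse_b))
    return wins, losses, wins > losses


# Made-up RMSE values for five spectra (illustration only, not real data).
rmse_stretched = [0.0010, 0.0021, 0.0008, 0.0015, 0.0030]
rmse_single = [0.0014, 0.0020, 0.0012, 0.0015, 0.0041]
wins, losses, verdict = outperforms(rmse_stretched, rmse_single)
```

The same tally works for $r^2$ if the comparison is flipped so that the higher value wins.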

[Fig. 1, caption fragment: (b) Residual difference (solid line) between the two curves in Fig. 1(a).]
The first data compilation was developed to inter-compare different remote-sensing inversions [13]. We generate n = 656 curves of $a_{dg}(\lambda)$ by adding the measured $a_d(\lambda)$ and $a_g(\lambda)$ data in this compilation. As these data are measured at five wavelengths (λ = 412, 443, 490, 510, 555 nm), they correspond to the low-resolution limit, where the degrees of freedom are very small: the number of free parameters in the regression is close to the number of wavelengths, and the RMSE will therefore harshly penalize the addition of free parameters.
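To make the penalty concrete, the following Python sketch computes the degrees of freedom for each model at 5 and 201 wavelengths, using the RMSE definition given earlier (sum of squared residuals divided by the degrees of freedom). The parameter counts for the three exponential models follow from their definitions; treating the power law as a two-parameter function (amplitude and exponent) is an assumption, since its form is not spelled out here.

```python
import numpy as np

# Free parameters per model; the power-law count is an assumed value.
n_params = {"single": 2, "stretched": 3, "double": 4, "power law": 2}


def rmse(residuals, k):
    """sqrt(sum of squared residuals / (n - k)) for a k-parameter model."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sqrt(np.sum(residuals**2) / (residuals.size - k))


for n in (5, 201):
    for name, k in n_params.items():
        dof = n - k
        # Factor by which the RMSE exceeds the plain root-mean-square residual.
        inflation = np.sqrt(n / dof)
        print(f"n={n:3d}  {name:10s} dof={dof:3d}  inflation={inflation:.3f}")
```

With five wavelengths the double exponential leaves a single degree of freedom, whereas with 201 wavelengths the correction is close to one for every model.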
The second data compilation is from the field program BIOSOPE [14]. We generate n = 260 curves of $a_{dg}(\lambda)$ by adding the measured $a_d(\lambda)$ and $a_g(\lambda)$ data in this compilation. As these data are measured every 2 nm from 300-700 nm, i.e. at 201 wavelengths, they correspond to the high-resolution limit, where the degrees of freedom are large, and the RMSE and $r^2$ are therefore likely to behave similarly.
We found that the stretched exponential outperformed the single exponential, double exponential, and power law, in terms of both RMSE and $r^2$, for both datasets. Table 2 shows the number of cases for each data compilation where the stretched exponential had a better (lower) RMSE and a better (higher) $r^2$. In the low-resolution dataset, the regression found β = 1 for the stretched exponential in 128 cases, reducing it to the single exponential and thereby making their $r^2$ and RMSE the same (hence the stretched exponential having a higher $r^2$ than the single exponential in only 528 cases).

Table 2. Summary statistics for exponential fits to in situ data. First column corresponds to data compilation; 'High-resolution' data are those compiled in Ref. [14], and 'Low-resolution' data are those compiled in Ref. [13]. Second column corresponds to the function to which the stretched exponential is being compared: single exponential, double exponential, or power law. Third column reports the number of cases where the stretched exponential had a higher $r^2$. Fourth column reports the number of cases where the stretched exponential had a lower RMSE.

Discussion
The results presented herein were insensitive to the spectral range considered; we repeated the analysis in Section 2 for the ranges 300-750 nm, 400-700 nm, and 400-750 nm, which yielded no substantive difference. While no ocean color satellite radiometer to date measures below 400 nm, future ones will (NASA's Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission is planned to have UV bands extending to at least 350 nm), so different spectral ranges are of interest. Interestingly, the regression estimated different values for β in the two data compilations; see Fig. 2. For the high-resolution data compilation, the β estimates were in the range β ∈ [0.56, 0.96]; for the low-resolution data compilation, a majority of β estimates were > 0.96. These data compilations differ not only in their spectral resolution and range, but also in the water types they sample: the high-resolution data are taken from clear ocean waters [14], whereas the low-resolution data are taken from a range of water types. To examine whether spectral resolution and range can explain this difference in the estimated β values, we subsampled the high-resolution data at the same wavelengths as the low-resolution data and repeated the regressions on the subsampled data. The subsampling did not appreciably affect the β estimates, resulting in an average difference of 0.02 and a maximum difference of 0.07 (n.b. the stretched exponential still outperformed the other models on the subsampled data). This suggests that these differences in β may result from optical differences between water samples, corroborating the possibility mentioned in the Introduction that β may contain useful biogeochemical information (ideally when comparing data with good and similar spectral resolution and range).
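One way the subsampling step could be carried out is sketched below in Python. How the authors subsampled is not stated; because 443 nm does not fall on a 2 nm grid starting at 300 nm, this sketch uses linear interpolation via `numpy.interp`, which is an assumption. The spectrum `a_high` is a synthetic stand-in, not measured data.

```python
import numpy as np

lam_high = np.arange(300.0, 701.0, 2.0)                  # 201 wavelengths
lam_low = np.array([412.0, 443.0, 490.0, 510.0, 555.0])  # 5 wavelengths


def subsample(a_high, lam_high=lam_high, lam_low=lam_low):
    """Linearly interpolate a high-resolution spectrum onto lam_low."""
    return np.interp(lam_low, lam_high, a_high)


# Synthetic stand-in for one measured spectrum (illustrative values only).
a_high = np.exp(-np.power(0.015 * (lam_high - 300.0), 0.8))
a_low = subsample(a_high)
```

Following the convention described earlier, a refit of the subsampled spectrum would take $\lambda_o$ = 412 nm, the lowest wavelength present.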
β and s are also anti-correlated throughout all analyses described in this paper, suggesting that a reduced-parameter expression with β as a function of s, or vice versa, could be employed; further research is required to determine the global distributions of β and s, as well as the relationship between these two parameters.

Summary
We found, using published in situ data compilations of CDOM and NAP absorption spectra, that the stretched exponential function (Equation 3) performed better as an analytical approximation of NAP-plus-CDOM absorption than a single or double exponential or a power law at both high and low spectral resolution. We also found that, for any plausible parameter combination, a double exponential representation of $a_{dg}$ can be very well fit by a stretched exponential. These results favor the stretched exponential model for parameterizing NAP-plus-CDOM absorption, especially in applications where their signals cannot be separated.
We expect that the strongest potential application for this new model is its incorporation into semi-analytical inversions, such as the model of Ref. [6]. We expect it to provide improved inversions, particularly when incorporating data in the UV. We hope our findings here might encourage other researchers to test this model against their own and other community data, to further evaluate its utility in other applications and study regions.