Assessment of a photon recollision probability based forest reflectance model in European boreal and temperate forests

We report a new version and an empirical evaluation of a forest reflectance model based on photon recollision probability ( p ). For the first time, a p -based approach to modeling forest reflectance was tested in a wide range of differently structured forests from different biomes. To parameterize the model, we measured forest canopy structure and spectral characteristics for 50 forest plots in four study sites spanning from boreal to temperate biomes in Europe (48 ◦ – 62 ◦ N). We compared modeled forest reflectance spectra against airborne hyperspectral data at wavelengths of 450 – 2200 nm. Large overestimation occurred, especially in the near-infrared region, when the model was parameterized considering only leaves or needles as plant elements and assuming a Lambertian canopy. The model root mean square error (RMSE) was on average 80%, 80%, 54% for coniferous, broadleaved, and mixed forests, respectively. We suggest a new parameterization that takes into account the nadir to hemispherical reflectance ratio of the canopy and contribution of woody elements to the forest reflec- tance. We evaluated the new parameterization based on inversion of the model, which resulted in average RMSE of 20%, 15%, and 11% for coniferous, broadleaved, and mixed forests. The model requires only few structural parameters and the spectra of foliage, woody elements, and forest floor as input. It can be used in interpretation of multi- and hyperspectral remote sensing data, as well as in land surface and climate modeling. In general, our results also indicate that even though the foliage spectra are not dramatically different between coniferous and broadleaved forests, they can still explain a large part of reflectance differences between these forest types in the near-infrared, where sensitivity of the reflectance of dense forests to changes in the scattering properties of the foliage is high.


Introduction
Multi-and hyperspectral remote sensing provide a means of spatially and temporally continuous monitoring of forest extent, change, health, phenology, energy exchange, photosynthesis, and biodiversity (Ryu et al., 2019;Schneider et al., 2017;Wulder et al., 2019). Accurate and cost-efficient monitoring is ever more important as, e.g., climate change and population growth induce various pressures to forest ecosystems. At the same time, the amount and diversity of available spectral remote sensing data is rapidly increasing (Rast and Painter, 2019). A persisting problem, however, is the lack of empirical data for constructing models which are needed to interpret the remote sensing signals.
Physically-based forest reflectance models allow to estimate forest variables from spectral observations while relying on a limited amount of empirical data (e.g., Rautiainen, 2005;Schraik et al., 2019). Applicability of physically-based models can be limited by the large number of model parameters, and consequently, ambiguity of the model inversion (Baret and Buis, 2008). Ideally, the models should be simple, yet realistic enough to be practically useful in interpreting remote sensing signals and extracting the forest parameters of interest (Woodcock et al., 1994(Woodcock et al., , 1997. Modeling the complex multiple scattering behavior is needed for realistic characterization of forest reflectance, especially in the near-infrared part of the spectrum where multiple scattering has a substantial role. The spectral invariant and photon recollision probability theories offer a useful framework for designing simple and realistic forest reflectance models which also account for multiple scattering. The spectral invariants theory states that canopy reflectance, transmittance, and absorption can be approximated based on optical properties of the foliage elements and spectrally invariant parameters (Knyazikhin et al., 1998). Smolander and Stenberg (2005) interpreted one of these spectrally invariant parameters as photon recollision probability (p), i.e., 'the probability that a photon, being scattered by the canopy, will interact with the canopy again'. This interpretation resulted in a family of canopy reflectance models, referred to as PARAS (see review by Stenberg et al., 2016). These models simulate the forest reflectance (or transmittance, albedo, or absorption) based on the spectral scattering or absorption properties of the forest canopy and the background (forest floor), and the canopy gap fractions that determine the relative contributions of the canopy and the forest floor (Rautiainen and Stenberg, 2005). The modeling of scattering or absorption by the canopy is based on plant element (usually leaf) albedo and p that summarizes the effect of canopy structure on its radiation regime. The benefits of PARAS models are simplicity, i.e., only a small number of parameters are needed, and the fact that the input parameters can be derived from field and laboratory measurements (e.g., Rautiainen and Stenberg, 2005;Hadi and Rautiainen, 2018;Hovi et al., 2017;Majasalmi et al., 2014).
Empirical evaluation of forest reflectance models is crucial to quantify model errors and thus provide a measure of reliability of the model simulations. The evaluation is however difficult, because of the laborious work required for measuring the input and validation data. Furthermore, the measurement uncertainties need to be accounted for in the interpretation of the results. Inter-comparison of radiative transfer models (e.g. Widlowski et al., 2015) is one solution proposed to identify model uncertainties. However, showing the model discrepancies does not necessarily reveal systematic uncertainties due to, e.g., incorrect parameterization. Only empirical evaluation, i.e., evaluation of modeling results against measurements, can reveal the model performance in real forests.
Previous studies have compared empirical reflectance, transmittance, absorption, or albedo data from forest canopies to those simulated with models based on photon recollision probability or the related spectral invariants theory (Disney et al., 2005;Hadi et al., 2017;Hadi and Rautiainen, 2018;Hovi et al., 2017;Huang et al., 2007;Kuusinen et al., 2014;Majasalmi et al., 2014;Manninen and Stenberg, 2009;Panferov et al., 2001;Rautiainen and Stenberg, 2005;Shabanov et al., 2003;Stenberg et al., 2008;Wang et al., 2003). Comparisons at the level of canopy components, such as shoots and small trees, have been reported by  and Rautiainen et al. (2012). Model performance has also been assessed based on model inversion and quantifying prediction accuracy of the inverted model parameters (e.g., Lukeš et al., 2011;Schraik et al., 2019). Based on the above literature, empirical evaluations have had only limited geographical extent. Because of different quantities simulated (e.g., albedo, reflectance) and versatile data sources used in model parameterization, it is difficult to directly compare results between studies. In addition, compromises and generalizations in the model parameterization have often been made because of limited measurements available, and the studies have often been restricted to certain spectral regions or few broad spectral bands. The above studies have contributed to an understanding of how p-based approaches perform in forest reflectance simulations. However, rigorous validations of the models themselves are still in their infancy. More extensive efforts are needed to quantify predictive performance and uncertainty of these models, and potentially also to point out how the models could be improved, thus making them more readily useful for practical applications in remote sensing.
In this paper, we report the performance of a photon recollision probability based forest reflectance model in multiple biomes for the first time. The study is based on an exceptionally detailed set of field measurements for 50 forest plots spanning from boreal to temperate regions in Europe (48-62 • N) coupled with airborne hyperspectral data that cover the spectral region between 450 and 2200 nm. The aims are i) to evaluate the model performance when using its default parameterization, and ii) to suggest ways of improving the parameterization to yield more realistic forest reflectance simulations. To assess the model performance in detail, we remove the forest floor contribution from the airborne spectra, which allows to evaluate the model component that simulates canopy scattering. We then report the model performance in simulating reflectance of both forest canopy and the entire forest (including forest floor) when using the default parameterization. Finally, we perform an inversion of the model, to demonstrate how to improve its parameterization.

Theory
We evaluated a model belonging to the PARAS model family (see review by Stenberg et al., 2016), the first model of which was published in Rautiainen and Stenberg (2005). In this study, we present a new, extended version of the PARAS model that can model the dependence of forest reflectance on view direction while taking into account multiple scattering between the canopy and the forest floor. Here, we define forest floor as all material (living, dead, and inorganic) up to height of 1.5 m above the ground. The multiple scattering between canopy and forest floor has previously been accounted for only by PARAS versions that simulated canopy absorption (Majasalmi et al., 2014) or forest albedo (Manninen and Stenberg, 2009;Stenberg et al., 2013). In addition, our approach differs from the previous studies in that we use empirical models to determine the fraction of incoming radiation that is scattered downwards by the canopy. This is done in the current study, in order to reduce uncertainties in model evaluation, but it should be noted that the use of empirical models for downward scattering is not an inherent requirement of our model. This will be explained in more detail later in Section 2. Similarly to Stenberg et al. (2013), our model assumes Lambertian forest floor and vertical symmetry of the canopy. The latter means that the directional scattering properties of the canopy are independent of whether the canopy is illuminated from above or from below.
Reflectance of forest (or other vegetation) at any optical wavelength can be expressed as a combination of solutions to the 'black soil' and 'soil' problems (Knyazikhin et al., 1998;Wang et al., 2003; see also review by Wang et al., 2018). The black soil problem describes a situation in which the canopy is illuminated from above, and the forest floor is optically black. The soil problem describes a situation in which the forest floor is assumed to act as an ideal Lambertian source that illuminates the canopy from below. The hemispherical-directional reflectance factor of the forest in view direction Ω, R(↓ sky ,Ω), can be written as where R BS (↓ sky ,Ω) is the canopy hemispherical-directional reflectance factor in direction Ω in the black soil case, T BS (↓ sky ,↓) is the canopy bihemispherical transmittance, R G is forest floor reflectance, R S (↑,↓) is the canopy downward bihemispherical reflectance in the soil case, and T S (↑,Ω) is the hemispherical-directional transmittance factor of the canopy in direction Ω in the soil case (i.e., signal observed in direction Ω, relative to a signal that would be observed from an ideal Lambertian radiation source without the forest canopy obstructing the view). For explanation of symbols and abbreviations, see also Table 1. The symbols in brackets after each R or T term denote direction of illumination and direction of view, respectively: arrow means hemisphere, Ω direction of the remote sensing sensor. The arrow with subscript 'sky' refers to hemispherical incoming radiation that in clear-sky conditions is composed of direct solar beam and a small diffuse component, as opposed to hemispherical fluxes scattered by the canopy and the forest floor, which are assumed Lambertian in the model. Note that we omit the wavelength sign in our equations here and elsewhere in the paper to simplify notation. The term is the ratio of below canopy to top-of-canopy downward hemispherical radiation flux. Eq. 1 indicates that forest reflectance is composed of canopy reflectance in the black soil case (first term on the right-hand side), and forest floor contribution (second term on the right-hand side). The calculation of forest floor contribution requires the ratio of below canopy to top-of-canopy downward hemispherical radiation flux (Eq. 2). It is modeled as a geometric series, with coefficient T BS (↓ sky ,↓), and common ratio R G R S (↑,↓), that represents multiple scattering between the forest floor and the canopy. The derivation of the geometric series is illustrated in Fig. 1. The terms in Eqs. 1-2 can be expressed as functions of canopy interception and scattering as where i 0 and i Ω are canopy interception of incoming radiation and canopy interception of radiation in the direction of view, respectively, i D is canopy interception of diffuse radiation, and ω C (#,#) are illumination and viewing geometry dependent canopy scattering coefficients. Canopy interception of incoming radiation is defined as where D is the ratio of diffuse to total incoming radiation and i S is canopy interception in the direction of the sun. The canopy scattering coefficients quantify the fraction of intercepted radiation that is scattered into a hemisphere (in case of hemispherical scattering, ↓ or ↑) or ratio of scattering observed in given direction to that observed from an ideal Lambertian surface that has the same interception and is observed in the same view direction (in the case of directional scattering, Ω). Canopy spherical scattering coefficient, often referred to as canopy albedo, quantifies the ratio of total scattered radiation (in all directions) to that intercepted by the canopy. Assuming that the photon recollision probability (p) is independent of i) direction of illumination and ii) scattering order, i.e., it does not depend on how many times the photon has already collided with the canopy, canopy albedo (denoted as ω C without direction signs in brackets) can be written as (e.g., Smolander and Stenberg, 2005) where ω E is the plant element albedo. The value of p depends on canopy (Stenberg, 2007), where L is true and L eff is effective plant area index, and β C is canopy clumping coefficient. If there are several hierarchical levels in the canopy, the albedo of an element at one hierarchical level can be calculated, using the p within that element and the albedo of the lower-level element (Stenberg et al., 2016). For example, the albedo of a coniferous shoot can be calculated with Eq. 7, by setting ω E equal to needle albedo and p equal to photon recollision probability within a shoot Rautiainen et al., 2012). Modeling scattering towards the remote sensing sensor requires taking into account the directionality of scattering. This can be achieved by replacing the total escape probability (1-p) in Eq. 7 with directional escape probability, ρ(Ω). A complication arises from that the directional  Yang et al., 2017). This is because photons hitting the canopy from the above are likely to hit the upper parts of the canopy and therefore more likely to escape towards the upper than lower hemisphere. As photons scatter within the canopy several times, they become evenly distributed in the canopy and directional escape probability approaches its limiting value, ρ lim (Ω). Taking the dependence of directional escape probability on scattering order into account, Mõttus and Stenberg (2008) introduced the semi-empirical wavelength-dependent parameter Q that quantifies the ratio of radiation scattered to the upper hemisphere to the total scattered radiation. Using the Q parameter, scattering into the upper hemisphere can be modeled as where ω C is the canopy albedo (Eq. 7). In reality, the canopy scattering is rarely Lambertian, and therefore modeling hemispherical scattering without considering the exact direction of view is not necessarily sufficient. It can be shown (Supplementary material Section 1) that if the ratio of limiting escape probabilities (directional vs. hemispherical) is From Eq. 9 it is seen that Q Ω can also be interpreted as the ratio of directional to hemispherical scattering, i.e., Q Ω = ω C (↓ sky ,Ω)/(Qω C ).
Because in our experiment the forest reflectance data were acquired in a near-nadir observation geometry, we refer to Q Ω as 'nadir to hemispherical scattering ratio' when discussing our experiment and results.
In the case of homogeneous canopy with Poisson-distributed leaves Q Ω equals i Ω /i D (Yang et al., 2017). This indicates that the ratio of directional to hemispherical scattering could potentially be approximated as the ratio of canopy interception in direction of view to canopy diffuse interception.
In order to avoid uncertainties in modeling of Q, in this study, we obtained the downward hemispherical scattering coefficient directly from empirical measurements. Noting that Qω C = ω C -(1-Q)ω C = ω Cω C (↓ sky ,↓), our final model for the directional scattering coefficient where ω C (↓ sky ,↓) is canopy downward hemispherical scattering coefficient for radiation coming from above (taken from empirical measurements in our study). We note that an alternative approach would be to use the semiempirical Q parameter from Mõttus and Stenberg (2008), i. e., Eq. 9 instead of Eq. 10. Using Eq. 9 and setting Q Ω = 1, and setting i Ω = i D (i.e., assuming a hemispherical view), would produce the same results as the model presented in Stenberg et al. (2013). Further, setting T(↓ sky ,↓) = 1-i 0 and T S (↑,Ω) = 1-i Ω , i.e., ignoring multiple scattering between canopy and forest floor, would result in the original PARAS model presented in Rautiainen and Stenberg (2005). In a nutshell, our primary model of interest is Eq. 1, and particularly its component R BS (↓,Ω) that models canopy scattering based on p-theory (Eqs. 3 and 10). The model requires as inputs spectral parameters forest floor reflectance (R G ), plant element albedo (ω E ), and canopy downward hemispherical scattering coefficient, ω C (↓ sky ,↓), and structural Overview of radiation fluxes that compose the forest hemispherical-directional reflectance factor, R(↓ sky ,Ω), and ratio of below canopy to top-of-canopy downward hemispherical radiation flux, T(↓ sky ,↓). The radiation fluxes form two geometric series, and the sums of these series (Eqs. 1 and 2) give R(↓ sky ,Ω) and T(↓ sky ,↓), respectively. For explanation of symbols, see Section 2 and Table 1. Solid color of an arrow indicates a downward flux, and a striped pattern an upward flux. The colors of the arrows represent coupled fluxes: the downward flux at the bottom of the forest canopy generates an upward flux when the radiation is reflected by the forest floor. The relative decrease of widths of the arrows from left to right represents the diminishing amount of radiation from first to higher scattering orders, but the absolute widths are compromised for sake of clarity. In reality, the upward reflected flux is always smaller than the downward flux, due to absorption by the forest floor. The scattering between forest floor and canopy continues infinitely and attenuates at each interaction. parameters canopy interception (i 0 , i D , i Ω ), photon recollision probability (p), and directional to hemispherical reflectance ratio (Q Ω ). The structural parameters are spectrally invariant, except that i 0 is wavelength-dependent because the diffuse fraction of incoming solar radiation decreases as a function of wavelength. In implementing the calculation of forest floor contribution (second term on the right hand side of Eq. 1), we assume that the forest canopy is vertically symmetric, and therefore the canopy downward hemispherical scattering coefficient for radiation coming from below is ω C (↑,↓) = 1/Q Ω × ω C (↓ sky ,Ω).
We also assume that the upward scattering of radiation that comes from below follows a Lambertian distribution, and therefore ω C (↑,Ω) = ω C (↓ sky ,↓).
Airborne hyperspectral data were acquired over all sites in summer 2019, and the field measurements were conducted close to the acquisition, or in similar phenological conditions a year later (Table 2). During the field measurements, we established a total of 50 forest plots, located within 2 km from the site center coordinates. The plots were selected to include both coniferous and broadleaved forests with different tree species and a wide range of plant area index and canopy cover values, because these parameters were assumed to influence the forest spectra. Coordinates of all plots were measured with a hand-held GPS (Garmin GPSmap 62stc) and later refined to submeter accuracy through co-registering terrestrial laser scanning with airborne laser Fig. 2. Location of the study sites with example photographs showing the studied forests. The study plots in Hyytiälä were in forests composed mainly of coniferous pine (a) and spruce (b), and broadleaved birch (c). The plots in Järvselja were in forests composed mainly of coniferous pine (d) and spruce (not shown), and various broadleaved species (e-f). The plots in Bílý Kříž were in spruce-dominated forests of varying age and density (g-h), and those in Lanžhot were in forests composed of various broadleaved species (i-j). scanning measurements. Forest inventory measurements (Supplementary material Section 2.1) were conducted in all plots, to obtain a general description of the forest in the plots, and to compute tree species fractions needed in parameterization of the forest reflectance model. Tables 3-4 summarize the forest variables and tree species composition in the plots.

Airborne hyperspectral data
Airborne hyperspectral measurements in all study sites were carried out using the same hyperspectral pushbroom sensors CASI-1500 and SASI-600 by Itres Ltd., Canada, mounted on a Cessna C208B aircraft. Both sensors had been spectrally and radiometrically calibrated before the flight season in March 2019. CASI-1500 sampled visible (VIS) to near-infrared (NIR) wavelengths from 382 to 1052 nm, and SASI-600 sampled NIR and shortwave-infrared (SWIR) from 958 to 2443 nm.
Sampling interval and spectral resolution were 15 nm. The flying altitude was approximately 1 km, resulting in pixel sizes of 0.5 m (CASI) and 1.25 m (SASI) on the ground. Both sensors had a field-of-view of 20 • from nadir. The data were thus acquired in near-nadir observation geometry. The flying azimuth direction was approximately the same as sun azimuth, which minimized spectral differences between plots that could occur due to large anisotropy of forest in the solar principal plane (Deering et al., 1999;Kuusk et al., 2014). Sun zenith angle during the acquisitions varied from 37 • to 60 • ( Table 2). The overlap of flight lines was 60-80%. Simultaneous airborne laser scanning data were acquired with a Riegl LMS Q-780 laser scanner and used in the plot positioning as well as in orthorectification of the hyperspectral data.
The raw digital number data collected by the hyperspectral sensors were first radiometrically corrected using the RadCor software (v11) provided by Itres Ltd. The data were then geo-orthorectified using IMU/ GNSS data and a canopy surface model (pixel size 0.5 m) computed from the laser scanning data. The atmospheric correction was performed using ATCOR-4 software bundle v7.2.0 (Richter and Schläpfer, 2018), which utilizes a database of atmospheric look-up tables calculated by means of the MODTRAN5 radiative transfer code. In the atmospheric correction, signals measured by the sensors were corrected for path radiance and adjacency radiance. For each site, inflight radiometric (vicarious) calibration was performed, using one bright ground target with known reflectance. The resulting data are at-surface (top-of-canopy) hemispherical-directional reflectance factors, i.e., correspond to R (↓ sky ,Ω) in our forest reflectance model. No topographic correction was applied.
The hyperspectral data were manually checked in order to exclude visible clouds or cloud shadows from the analysis. Clouds were present particularly over the Hyytiälä site, where the conditions varied from Table 2 Acquisition dates, times of day (range), and sun zenith angles (range) during the acquisitions of airborne and field data in the four study sites.   Mean (and range) of tree species percentage (from basal area) in the study plots. Species that belong to the same genus, and were difficult to recognize reliably in the field, were grouped together. Bold font marks species that were sampled for leaf or needle spectra. sunny to partly cloudy during the acquisition. Occasional clouds were also present in the Bílý Kříž site. The conditions during the Järvselja and Lanžhot acquisitions were cloud-free. If there were several cloud and shadow-free flight lines available for a plot, only the line acquired nearest to nadir was used. Finally, pixel values were extracted from a rectangle of 30 × 30 m around each plot center and averaged to result in a single forest reflectance spectrum per plot.

Hemispherical photography
We obtained effective plant area index and canopy interception values for all plots using hemispherical photographs taken in diffuse illumination conditions. A total of 21 photographs per plot were taken (Fig. 3). The height of the camera from the ground was 1.5 m in plots with mean tree diameter at breast height (DBH) equal or larger than 10 cm, and 1 m in plots with DBH less than 10 cm. We used a Nikon D5000 digital camera with a Sigma 4.5 mm 1:2.8 DC HSM Circular Fisheye lens that was geometrically calibrated to enable correction for radial lens distortion in the data processing. The photographs were thresholded into binary images by applying the algorithm of Nobis and Hunziker (2005) to the blue channel. Canopy gap fraction was computed in five concentric zenith rings with median zenith angles of 10.7 • , 23.7 • , 38.1 • , 52.8 • , and 66.6 • , and averaged over all photographs in a plot to yield plot-and zenith angle-specific mean gap fractions, P gap (θ i ). All azimuths were averaged, i.e., gap fraction was assumed independent of azimuth in our study. Effective plant area index (L eff ) and diffuse interception (i D ) of the forest canopy were calculated from the P gap (θ i ) values, using the same equations as used by the LAI-2200 instrument (LI-COR, 2012). See Supplementary material Section 2.2 for equations used in the calculations, as well as detailed description of image acquisition settings.
Linear interpolation between the five zenith rings was performed to obtain the canopy gap fractions corresponding to the view and sun zenith angles during the airborne data acquisition, i.e., P gap (θ Ω ) and P gap (θ S ). Interception in the direction of view and the sun were then calculated as i Ω = 1-P gap (θ Ω ) and i S = 1-P gap (θ S ), and interception of incoming radiation (i 0 ) was obtained by weighting diffuse interception (i D ) and interception in the sun direction (i S ) with the ratio of diffuse to total incoming radiation (D) as described in Section 2. The wavelengthdependent values of D were obtained from the diffuse and total irradiance spectra at ground-level produced by the ATCOR-4 software (Richter and Schläpfer, 2018).

Forest floor spectra
Reflectance spectra of the forest floor were measured at wavelengths of 350-2500 nm in each plot. The measurements (15 per plot) were taken at approximately 0.8 m intervals along an 11 m long east-west oriented transect, located 1 m south from the plot center (Fig. 3). We used an Analytical Spectral Devices (ASD) FieldSpec4 spectrometer (ser. nr. 18456) with a field-of-view of 25 • . The measurements were taken at nadir, from approximately 1.3 m height. White reference was measured at the beginning and end of the transect, and at every third measurement spot in between. The white reference was a 25 × 25 cm Spectralon panel with 99% nominal reflectance. The dark current was measured at the beginning and end of the transect. To minimize variation due to sunflecks and shadows, we performed the measurements always in diffuse illumination conditions, i.e., close to dusk or dawn, or on cloudy days. The measured quantity is thus hemispherical-conical reflectance factor. Integration time was adjusted to the prevailing light conditions, using the automatic optimization by the instrument. The measured raw radiation signals (digital numbers) were processed into forest floor reflectance (R G ) as where s G is the signal from forest floor and s ref is the signal from the white reference (dark current subtracted from both before calculations), and R ref is the reflectance of the white reference panel. The value of s ref for each measurement was obtained by linearly interpolating in time between the white reference measurements. The R G spectra were averaged to result in a single spectrum per plot.

Measurements of below canopy downward radiation flux
Ratio of below canopy to top-of-canopy downward hemispherical radiation flux, T(↓ sky ,↓), was measured at wavelengths of 350-2500 nm for a subset of 22 plots (4-8 plots per site). The measurement protocol and data processing have been described in detail in . Total of 49 measurements per plot were performed in a 5 × 5 m grid (Fig. 3). The measurements were performed in clear-sky conditions, using cosine receptors attached to two field spectrometers. One instrument (ASD FieldSpec4 ser. nr. 18641 or FieldSpec3 ser. nr. 16089) was continuously measuring at 15 s intervals in an open area nearby the study plots, and the other (ASD FieldSpec4 ser. nr. 18456) was used for performing measurements in the plots. The integration time was optimized before the measurements, and intercalibration of the instruments was performed before and after the measurements on each measurement day. The measurement height was 1.5 m from ground and the cosine receptor was manually leveled during each measurement. Sun zenith angle during the measurements varied from 40 • to 60 • ( Table 2). The measured spectra were averaged to result in a single spectrum per plot.

Foliage spectra
Foliage, i.e., leaf and needle, reflectance and transmittance spectra at wavelengths of 350-2500 nm were measured for all major tree species in the study sites (Table 4). Samples of both top-of-canopy (sun-exposed) and bottom-of-canopy (shaded) foliage were taken. For coniferous Fig. 3. Sampling layout of field measurements conducted in the study plots.
A. Hovi et al. Remote Sensing of Environment 269 (2022) 112804 8 needles, we sampled two age cohorts: current year (c 0 ) and one-year-old (c 1 ) needles. Only healthy foliage was measured. The sampling protocols differed slightly between Lanžhot and the other sites because the leaf spectral measurements in Lanžhot were conducted as part of another field campaign. For the other sites, we sampled three trees of each major species, and took three samples of each foliage class per tree ('sunexposed c 0 ' and 'shaded c 0 ' for all species, and additionally 'sun-exposed c 1 ' and 'shaded c 1 ' for conifers). In Lanžhot, we sampled one to four trees with one sample of each foliage class per tree ('sun-exposed c 0 ' and 'shaded c 0 '). For minority broadleaved species in Järvselja (littleleaf linden, goat willow, other willows, European ash, common hazel), one tree with three sun-exposed c 0 leaves per species was sampled. Reflectance and transmittance spectra of the samples were measured with ASD RTS-3ZC integrating spheres, attached to an ASD spectrometer (Field-Spec3 ser. nr. 16089, or FieldSpec4 ser. nr. 18456 or 18641). The measurement protocol and data processing are described in detail in the Supplementary material Section 2.3. Foliage albedo was calculated as the sum of reflectance and transmittance. Finally, a representative foliage albedo spectrum for each tree species in each study plot was calculated, as described in the Supplementary material Section 2.4.

Woody element spectra
Our primary source for woody element spectra was Juola et al. (2021), who collected stem bark reflectance spectra for boreal and temperate tree species in Finland and Estonia in summer 2020. The spectra were measured at 1.3 m height in diffuse illumination conditions, using a Specim IQ hyperspectral camera covering wavelengths 400-1000 nm. For wavelengths above 1000 nm, we used data from other published sources: pine, spruce, and birch stem bark spectra were taken from Lang et al. (2002), and aspen stem bark spectrum from Spencer and Rock (1999) was used for all other broadleaved species. We used only spectra measured from the lower part of the stem, in order to obtain best correspondence with the measurements of Juola et al. (2021). Because continuity of the spectra was required in the forest reflectance modeling, the spectra above 1000 nm were scaled with a linear scaling factor that was determined so that the reflectance at 1000 nm matched with the measurements of Juola et al. (2021) at the same wavelength. The stem bark reflectance spectra were used as albedo spectra of woody elements in the forest reflectance model, i.e., transmittance of woody elements was assumed to equal zero. Finally, representative woody element albedo spectrum for each tree species in each study plot was calculated, as described in the Supplementary material Section 2.4.

Filtering of spectra
High spectral resolution data (forest floor reflectance, ratio of below canopy to top-of-canopy downward hemispherical radiation flux, foliage and woody element albedo spectra) were filtered with the Savitzky-Golay filter (Savitzky and Golay, 1964), and resampled by weighting with Gaussian functions that had mean and full-width-at-halfmaximum corresponding to the band centers and spectral resolution of the airborne forest reflectance data, respectively. From all spectral data, we excluded noisy regions caused by atmospheric absorption or low signal-to-noise ratio of the instruments. A total of 69 spectral bands remained for the analyses. These bands covered wavelength ranges of 453-881 nm, 1018-1093 nm, 1183-1288 nm, 1543-1723 nm, and 2053-2203 nm.

Removal of forest floor contribution from forest reflectance spectra
We designed an algorithm for removal of forest floor contribution from the airborne forest reflectance spectra. The algorithm was derived from the PARAS forest reflectance model (Eqs. 1-6). The calculation started by estimating the canopy downward hemispherical scattering coefficients, ω C (↓ sky ,↓), from the measured spectral ratio of below canopy to top-of-canopy downward hemispherical radiation flux (Step 1 of the algorithm). The values of ω C (↓ sky ,↓) were then used together with field-measured forest floor reflectance and canopy interception calculated from hemispherical photographs to remove the contribution of forest floor from the airborne reflectance measurements (Step 2 of the algorithm). This resulted in estimates of the forest canopy directional scattering coefficients, ω C (↓ sky ,Ω). Mathematical formulation and a graphical illustration of the algorithm are given in the Supplementary material Section 2.5. The algorithm does not require a model for canopy scattering, except for initial estimates of ω C (↓ sky ,↓) at two wavelengths (blue and red). Independence from the canopy scattering model was important because our purpose was to evaluate the p-based model for canopy scattering. The two steps of the algorithm should be iterated until the solution stabilizes. We used five iterations. In the fifth iteration, the value of ω C (↓ sky ,Ω) changed less than 0.1%. As an intermediate result, the algorithm also produced estimates of forest hemisphericaldirectional reflectance factors without forest floor contribution, R BS (↓ sky ,Ω), which we used for illustrating the magnitude of forest floor contribution to airborne forest reflectance in different types of forests.

Modeling canopy scattering
Canopy directional scattering coefficient values obtained through the forest floor contribution removal were compared with those simulated using our model (Eq. 10). We tested two model parameterizations, which we call 'default' and 'fitted' models. Differences between them will be explained later in this section.
In both models, we considered the shoot as the basic green plant element. Shoot albedo (ω S ) was calculated with Eq. 7, setting ω E equal to foliage albedo and p equal to within-shoot photon recollision probability (p S ). The value of p S for each tree species was obtained as p S = 1-β S , where β S is the shoot clumping coefficient . For conifers, we used a β S value of 0.6 that represents a typical value for Scots pine and Norway spruce (Thérézien et al., 2007). For broadleaved species, value of 1 was used, i.e., we assumed no clumping at shoot level, and shoot albedo of broadleaved trees equaled leaf albedo. The canopy-level photon recollision probability (p in Eq. 10) for each plot was calculated from diffuse interception (i D ), canopy clumping coefficient (β C ) and effective plant area index (L eff ) as p = 1-i D β C /L eff (Stenberg, 2007). The value of β C (mean of 0.95 and standard deviation of 0.04 for our study plots) was obtained from hemispherical photographs following Ryu et al. (2010) and its calculation is explained in the Supplementary material Section 2.2. The canopy downward hemispherical scattering coefficient, ω C (↓ sky ,↓), was obtained from the measured ratio of below canopy to top-of-canopy downward hemispherical radiation flux, using the algorithm for forest floor contribution removal. In the default model, we assumed a Lambertian canopy and did not account for woody elements, i.e., nadir to hemispherical scattering ratio (Q Ω ) was set to 1, and shoot albedo was used as plant element albedo (ω E ). Our formulation, i.e., using shoot albedo as ω E and calculating p at higher than shoot level, is equal to using leaf or needle albedo as ω E and taking into account shoot-level clumping in the calculation of canopylevel p, as done in previous studies (e.g., Rautiainen and Stenberg, 2005;Hovi et al., 2017;Hadi and Rautiainen, 2018;Schraik et al., 2019). Plot-specific shoot albedo spectra were obtained by weighting species-specific shoot albedo values with the field-measured tree species fractions in a plot.
In the fitted model, we allowed Q Ω to deviate from 1, and took into account woody elements in the estimation of ω E . The latter was achieved by calculating the plot-specific plant element albedo (ω E ) as weighted average of species-specific woody element (ω W,sp ) and shoot (ω S,sp ) albedos as where f sp are tree species fractions in a plot, and f W,sp (constrained to 0 ≤ f W,sp ≤ 1) are species-specific woody element fractions from effective light-capturing plant area. The model had unknown parameters Q Ω and f W,sp , which were estimated by inverting the model. The inversion was performed by nonlinear least squares estimation, using function optimize.least_squares() and its default method (Branch et al., 1999) in SciPy Python package v1.1.0 (Virtanen et al., 2020). We minimized the sum of plot-and band-specific residuals (e ij ), calculated as where i refers to plot and j to spectral band, and ω C,ij (↓ sky ,Ω) is the reference value of the canopy directional scattering coefficient, estimated using the algorithm for forest floor contribution removal. For the purpose of the inversion, we divided the tree species in three groups: pine, spruce, and broadleaved. The value of f W,sp was constrained to be equal for all species belonging to the same group, in order to avoid overfitting. To take into account the uncertainty in ω C (↓ sky ,Ω), inverse values of the 95% confidence intervals of ω C (↓ sky ,Ω) (see Section 3.4.5) were used as weights for the model residuals.

Modeling forest reflectance spectra
The forest hemispherical-directional reflectance factors for each plot were modeled using Eq. 1, together with the default and fitted parameterizations of Eq. 10 as described above. Canopy interception values (i 0 , i D , i Ω ) were taken from hemispherical photographs, and forest floor reflectance spectra (R G ) from the field measurements. As explained in Section 2, the upward scattering coefficient of the canopy for radiation coming from below, i.e., ω C (↑,Ω), was assumed equal to ω C (↓ sky ,↓). The downward hemispherical scattering coefficient for radiation coming from below, i.e., ω C (↑,↓), was calculated as ω C (↑,↓) = 1/Q Ω × ω C (↓ sky ,Ω).

Sensitivity analysis
Sensitivity analysis was performed to quantify uncertainty in the results of forest floor contribution removal (Section 3.4.2) and forest reflectance modeling (Sections 3.4.3 and 3.4.4). We generated 1000 sets of results and computed 95% confidence intervals for the reference value of canopy directional scattering coefficient, ω C (↓ sky ,Ω), and for the modeled ω C (↓ sky ,Ω) and forest reflectance, R(↓ sky ,Ω). In each of the 1000 sets, we randomly selected the values for the input parameters from within their estimated uncertainty ranges. See Supplementary material Section 2.6 for the list of input parameters and their estimated uncertainties.

Model evaluation
The model simulations were evaluated, using wavelength-specific root mean square error (RMSE) and mean estimation error (MEE) as where ŷ i is the modeled and y i is the reference value for plot i, and n is the total number of plots. RMSE and MEE relative to the reference [%] were obtained by multiplying RMSE and MEE by term 100%/y, where y is the mean of the reference values. Wavelength-specific Pearson correlation coefficient (r) was also used to evaluate the agreement between the modeled and reference.

Variation in forest reflectance spectra and structure
We present the results separately for coniferous (more than 85% conifers, n = 23), broadleaved (more than 85% broadleaved trees, n = 21), and mixed forest plots (n = 6), because coniferous and broadleaved forests showed distinct spectral characteristics. Overall, airborne forest reflectance spectra varied considerably between plots (Fig. 4). Broadleaved forests showed systematically higher values of NIR reflectance compared to the coniferous forests (Fig. 4a,b). Effective plant area index (L eff ), and consequently also canopy interception varied within each species group. Overall, the L eff values ranged between 0.4 and 6.3 among our study plots (Table 3). The dependence of canopy interception on zenith angle was different for forests with different L eff . Sparse forests (low L eff ) had low values of interception at nadir compared to the more oblique angles, whereas in dense forests the interception in all zenith angles tended to be similar (Fig. 5).
In coniferous forests, the average airborne forest reflectance, R (↓ sky ,Ω), decreased as a function of L eff in all wavelengths (Fig. 6 left column), with the mean (and standard deviation) of r between L eff and R (↓ sky ,Ω) being − 0.71 (0.09). The densest coniferous plot in Bílý Kříž (L eff = 4.7) was an outlier, as its reflectance resembled that of broadleaved forests. This could be because the few broadleaved beech trees in that plot, with their wide crowns, probably had a larger contribution to the forest reflectance than indicated by their fraction from the basal area. Topographic effects are also possible because the plot was on a southern slope (~10 • ). In broadleaved forests, the correlation between L eff and R (↓ sky ,Ω) was weak, and in the mixed forests it resembled that observed in the coniferous plots (Fig. 6 middle and right columns). The mean (and standard deviation) of r between L eff and R(↓ sky ,Ω) was 0.00 (0.22) in the broadleaved, and − 0.73 (0.17) in the mixed forests.

Forest floor contribution to forest reflectance spectra
The forest floor contribution in each study plot and wavelength was calculated as (R(↓ sky ,Ω)-R BS (↓ sky ,Ω))/R(↓ sky ,Ω) × 100%. The average forest floor contribution (i.e., averaged over all study plots) depended on wavelength, and thus varied within the range of 18-28% in coniferous, 13-19% in broadleaved, and 25-37% in mixed forests in the VIS region. In NIR and SWIR, the average forest floor contribution was larger: 26-35% in coniferous, 15-26% in broadleaved, and 32-43% in mixed forests. The forest reflectance without forest floor contribution, R BS (↓ sky ,Ω), approached zero as L eff decreased (not shown). This is expected because a small amount of intercepting canopy material means that most of the incoming radiation would be absorbed by the (black) soil. Inaccuracies in the removal of forest floor contribution from the reflectance spectra were observed, and they resulted in negative R BS (↓ sky ,Ω) in some of the plots with sparse canopies (low L eff ). Consequently, also the canopy directional scattering coefficients, ω C (↓ sky ,Ω), exhibited large variation, which was seen also in their uncertainty estimates (Fig. 6). Overall, ω C (↓ sky ,Ω) values were very similar to R(↓ sky ,Ω) in dense forests (high L eff ) and tended to be somewhat lower than R (↓ sky ,Ω) in sparse forests (low L eff ). Due to high uncertainty, we excluded plots with L eff lower than 1, and used the remaining plots (n = 45) in the modeling of canopy scattering (Section 4.3). In this subset, the average forest floor contribution was smaller than in the entire dataset, varying within the range of 11-20% in coniferous, 13-19% in broadleaved, and 6-10% in mixed forests in the VIS region. In NIR and SWIR, the respective values were 18-27% (coniferous), 15-26% (broadleaved), and 8-14% (mixed forests).

Modeled canopy scattering
The default model tended to overestimate canopy scattering considerably, especially in the NIR region (Fig. 7). The maximum overestimation was larger in coniferous (MEE ranging from − 24% to 240%) than in broadleaved (MEE from 15% to 151%) and mixed forests (MEE from − 21% to 137%) (Fig. 8). In coniferous forests, the overestimation in the NIR region clearly exceeded the uncertainty estimates (Fig. 7). In broadleaved forests, the uncertainty estimates in NIR overlapped in some (but not all) of the plots. The correlation between reference and modeled directional scattering coefficients was on average weak, but varied between wavelengths and species groups, with mean r of 0.12, 0.09, and 0.94 for coniferous, broadleaved, and mixed forests, respectively (Fig. 8).
Contrary to the default model, the canopy directional scattering coefficients from the fitted model were close to the reference values (Fig. 7). The absolute values of the MEE for the fitted model were notably lower than for the default model: the MEE of the fitted model Fig. 4. Forest hemispherical-directional reflectance factors, R(↓ sky ,Ω), from the airborne measurements in the coniferous (more than 85% conifers), broadleaved (more than 85% broadleaved trees), and mixed forest plots. varied from − 51% to 27% for coniferous, from − 31% to 34% for broadleaved, and from − 33% to 22% for mixed forests (Fig. 8). Furthermore, the performance of the fitted model was not strongly dependent on wavelength (Fig. 8). Plots that had low (or high) values of directional scattering coefficients throughout the spectrum tended to have so also in the modeled data (Fig. 7). The site-and wavelengthspecific differences in the data were also correctly reproduced by the model. As an example, broadleaved canopies in Järvselja tended to have strong scattering in NIR and weak scattering in red, and the same behavior was observed also in the modeled data (middle column of Fig. 7). The largest relative differences between modeled and reference values of canopy scattering were observed in the VIS region in the coniferous plots in Bílý Kříž (Fig. 7). The somewhat large errors in Bílý Kříž also influenced the RMSE and MEE of the coniferous forests, which went up to 54% and down to − 51% in the VIS region, respectively (Fig. 8). The correlation between reference and modeled directional scattering coefficients was consistently high for the fitted model, with mean r of 0.88, 0.88, and 0.96 for coniferous, broadleaved, and mixed forests, respectively (Fig. 8).
The plant element albedo obtained through model fitting (as a linear combination of shoot and woody element albedos) was lower than pure shoot albedo in green and NIR wavelengths (Fig. 9). In the red and blue wavelengths, the plant element albedo obtained through model fitting was higher than shoot albedo, and in the SWIR the difference between Fig. 6. Measured forest hemispherical-directional reflectance factors, R(↓ sky ,Ω) (square symbols), and canopy directional scattering coefficients estimated from them, ω C (↓ sky ,Ω) (x symbols), as a function of effective plant area index (L eff ) at four wavelengths (rows) in coniferous, broadleaved, and mixed forest plots (columns). The vertical lines illustrate 95% confidence intervals for ω C (↓ sky ,Ω). the two depended on species but was relatively small. The fraction of woody elements obtained in model fitting was 0.32 for pine, 0.30 for spruce, and 0.12 for broadleaved tree species. The mean (and range) of nadir to hemispherical scattering ratio of the canopy (Q Ω ) was 0.55 (0.32-0.92) in coniferous, 0.58 (0.36-0.84) in broadleaved, and 0.71 (0.65-0.75) in mixed forests. The value of Q Ω depended on L eff (Fig. 10 top row), with sparse canopies (low L eff ) having on average low Q Ω values. In other words, compared to sparse canopies, dense canopies tended to have scattering properties closer to Lambertian. The dependence of Q Ω on L eff was particularly evident in broadleaved plots. In conifers, pine and spruce plots deviated from each other, with pine showing higher Q Ω values than spruce. Note that because the model inversion was not performed for plots with L eff lower than 1, we set the Q Ω value for those plots equal to average Q Ω of all remaining plots (0.58) when modeling the forest reflectance (Section 4.4). For comparison, theoretical estimates of Q Ω (i Ω /i D ), and canopy-level photon recollision probability (p), are shown in the middle and bottom rows of Fig. 10, and will be interpreted in Section 5.2.

Modeled forest reflectance spectra
The results regarding forest reflectance, R(↓ sky ,Ω), were similar to the Fig. 7. Modeled and reference (i.e., estimated from measurements) canopy directional scattering coefficients, ω C (↓ sky ,Ω), as function of effective plant area index (L eff ) at four wavelengths (rows) in coniferous, broadleaved, and mixed forest plots (columns). The diamond symbols represent the default model (i.e., assuming a Lambertian canopy that is composed of foliage only), square symbols the fitted model (accounting for nadir to hemispherical scattering ratio and contribution of woody elements), and the x symbols the reference. The vertical lines illustrate 95% confidence intervals for the default model, and for the reference ω C (↓ sky ,Ω). results of canopy directional scattering coefficients shown above. Once again, the default model resulted in notable overestimation in the NIR region (Fig. 11). The MEE values ranged from − 22% to 188% (coniferous), from 11% to 117% (broadleaved), and from − 17% to 124% (mixed forests) (Fig. 12). The fitted model showed good agreement with the measured forest reflectance (Fig. 11). The MEE values ranged from − 45% to 20% (coniferous), from − 28% to 27% (broadleaved), and from − 30% to 19% (mixed forests) (Fig. 12). The mean r between measured and modeled forest reflectance were 0.37 (coniferous), 0.10 (broadleaved), and 0.88 (mixed forests) for the default model, and 0.91 (coniferous), 0.77 (broadleaved), and 0.97 (mixed forests) for the fitted model (Fig. 12). To support quantitative comparisons with other studies, the MEE, RMSE, and r values for selected spectral bands have been tabulated in Supplementary material Tables S2-S5.

Evaluation of the model
For the first time, a photon recollision probability (p) based approach to modeling forest reflectance was tested in a wide range of differently structured forests from different biomes. The measurements were detailed, comprising, e.g., measurement of forest floor reflectance spectrum in each study plot individually. Our spectral data (450-2200 nm) covered almost all solar wavelengths commonly used in optical remote sensing. The main findings are related to the parameterization of the p-based part of the model that estimates canopy scattering. Large overestimation occurred, especially in the NIR region, when the model was parameterized considering only leaves or needles as plant elements and assuming a Lambertian canopy. This is the way that the previous version of PARAS canopy reflectance model has often been parameterized (e.g., Hadi and Rautiainen, 2018;Hadi et al., 2017;Rautiainen and Fig. 8. Top and middle rows: Root mean square error (RMSE) and mean estimation error (MEE) of the modeled canopy directional scattering coefficient, ω C (↓ sky ,Ω), relative to the mean of reference ω C (↓ sky ,Ω). Bottom row: Pearson correlation coefficient (r) between modeled and reference ω C (↓ sky ,Ω). Left column shows results for the default model (i.e., assuming a Lambertian canopy that is composed of foliage only) and right column for the fitted model (accounting for nadir to hemispherical scattering ratio and contribution of woody elements). Mean (and range) of the evaluation metrics are listed in each sub-figure. Schraik et al., 2019), not because it is believed to be perfectly correct, but because measurements of woody element spectra have not been readily available, and because the model itself has not provided a formula for modeling the scattering directionality. Our results now indicate that with careful attention to the parameterization, the model can correctly simulate the species-and wavelength specific relations of both canopy scattering and entire forest reflectance to plant area index.
Quantitatively, the results of our fitted model are in line with results obtained in a laboratory with small single trees, where relative RMSE up to 57% was obtained for the modeled tree reflectance . Similarly to our results, an evaluation of an earlier PARAS model version found that the largest overestimation occurred in the NIR region (Hadi and Rautiainen, 2018). The overestimation was however not as large as we observed here. In the referred study, the model was parameterized using a separate parameter (Q) for calculating the ratio of radiation scattered to the upper hemisphere to the total scattered radiation . This, together with the fact that multiple scattering between forest floor and the canopy was ignored in the previous model version, meant that a certain fraction of radiation scattered by the canopy was always directed towards the forest floor and did not contribute to the forest reflectance. We note that when running the earlier model version and using the Q parameter from Mõttus and Stenberg (2008), we obtained NIR overestimation of similar magnitude as in Hadi and Rautiainen (2018). We also note that the somewhat larger discrepancies between modeled and reference in the VIS region in the Bílý Kříž site compared to the other sites (Figs. 7, 11) could have been caused by slight inaccuracies in the airborne data, because a comparison with Landsat 8 OLI data indicated some deviation of the airborne data in Bílý Kříž from the rest of the sites in the VIS region (Supplementary material Fig. S2). We discuss the other differences and similarities between modeled and reference data, as well as the role of model parameters, in the next section.

New parameterization of the model
The inclusion of woody elements helped to reduce the model's large overestimation of canopy scattering in the NIR region. To explain this, we look into Eq. 7. In forest canopies, the values of photon recollision probability (p) are usually relatively high (0.53-0.85, Fig. 10) compared to those observed for single trees (e.g., approximately 0.1-0.6, see Fig. 10 in . Thus, multiple scattering occurs and contributes strongly to the canopy scattering in forests. The effect of multiple scattering on canopy reflectance is the largest when the plant element albedo is high. Using Eq. 7, it can be calculated that when the element albedo is 0.9, a reduction of 0.1 units in the element albedo results in reductions of 21% to 40% in canopy scattering, assuming p is 0.6 and 0.9, respectively. In other words, the canopy scattering is very sensitive to changes in element albedo, when the canopy is dense and the element albedo is high. Inclusion of woody elements in the model reduced the modeled canopy scattering in NIR, because the NIR albedo of woody elements is clearly lower than that of foliage or shoots (Fig. 9).  9. Plant element albedo spectra of pine, spruce, and broadleaved species. Foliage albedo (ω L ) was converted to shoot albedo (ω S ), using photon recollision probability within a shoot (for broadleaved species ω S equals ω L ). The default model used the shoot albedos as such. In the fitted model, the plant element albedo (ω E ) was obtained by least squares fitting, as a linear combination of woody element (ω W ) and shoot albedo. The f W (95% confidence intervals in brackets) is the fraction of woody elements from effective plant area that produced the ω E values. Top row shows the entire spectrum, and bottom row an enlarged view of the visible region. The colored regions show 95% confidence intervals for the ω L , ω W , and ω E spectra.
At the same time, the woody elements increased the average plant element albedo in the red and blue wavelength regions, thus making the model performance more consistent across wavelength regions.
Our study supports earlier findings that it is important to consider the woody elements in forest reflectance simulations (Kuusinen et al., 2021;Malenovský et al., 2008;Verrelst et al., 2010). A meta-analysis of studies conducted for different broadleaved and coniferous tree species reported woody element percentage from total plant area to vary between 5 and 35% (Gower et al., 1999). Corresponding values for all our study species are not readily available, but in a limited case study, Stenberg et al. (2003) reported values of 13% and 26% for Scots pine forests on poor and fertile soils, respectively. Our woody element percentage (32%, 30%, and 12% from effective plant area, i.e., 22%, 21%, and 12% from total plant area for pine, spruce, and broadleaved tree species, respectively) are therefore within a physically meaningful range. The values should, however, be interpreted as approximations. Measurement and sampling uncertainties in foliage spectra (Fig. 9) can also partly explain the NIR overestimation, as seen in Fig. 7 and Fig. 11. Other parameters (p and Q Ω ) are possible explanations as well, but, as will be discussed next, changing their values would not reduce the discrepancy in model performance between NIR and the other wavelengths.
Previous studies have emphasized the importance of p, and thus canopy clumping, in controlling the reflectance properties of forest canopies (e.g., Rautiainen and Stenberg, 2005;Hovi et al., 2017). We estimated p from field measurements and accounted for clumping at shoot level (through using the shoots as a basic plant element) and higher levels (through applying a canopy clumping coefficient in estimating true plant area index, and thus canopy-level p). As discussed by Ryu et al. (2010), our correction for higher than shoot level clumping is likely to represent a minimum, and the values of p can in reality be somewhat higher than what we estimated. Fine-tuning the p estimates through model inversion was not possible, because in our model p is correlated with Q Ω (i.e., both affect the directional scattering coefficient of the forest canopy in a similar manner) and thus both variables are difficult to estimate reliably at the same time. Confidence intervals in Fig. 10 indicate that an increase in p could increase the estimates of Q Ω . Fig. 10. Nadir to hemispherical scattering ratio of the forest canopy (Q Ω ) obtained through model fitting (top row), theoretical estimate of Q Ω (middle row), and photon recollision probability (p) calculated from hemispherical photographs (bottom row) as function of effective plant area index (L eff ) in coniferous, broadleaved, and mixed forest plots (columns). The vertical lines illustrate 95% confidence intervals.
However, we also argue that the effect of increasing the p value would be the highest in wavelength regions where plant element albedo is low (e. g., in the VIS region), and thus would result in undesired effect: reduction of simulated forest reflectance in the VIS region but not as much in the NIR region. This is because when plant element albedo is low, radiation attenuates rapidly after first interaction with the canopy and, given unchanged plant element albedo, the canopy scattering coefficient is almost directly determined by the escape probability (1 -p) for photons that scatter from the canopy for the first time. The other extreme is a theoretical situation where the plant element albedo is one. In that case, the canopy scattering coefficient does not depend on p at all because, independent of the p value, eventually the photons will escape the canopy. When plant element albedo increases (e.g., when moving from VIS to the NIR region), we approach this theoretical situation, and the sensitivity of canopy scattering to p decreases.
Together with woody elements, the nadir to hemispherical scattering ratio of the canopy (Q Ω ) was important in reducing the overall level of simulated forest reflectance. We obtained Q Ω values from 0.32 to 0.92 in our inversion, and also the confidence intervals were heavily weighted towards values below one. This indicates that in the majority of studied canopies, there is less scattering to nadir than to other hemispherical directions on average. The Q Ω values are impossible to evaluate based on Fig. 11. Modeled and reference (measured) hemispherical-directional reflectance factors, R(↓ sky ,Ω), of the forest as function of effective plant area index (L eff ) at four wavelengths (rows) in coniferous, broadleaved, and mixed forest plots (columns). The diamond symbols represent the default model (i.e., assuming a Lambertian canopy that is composed of foliage only), square symbols the fitted model (accounting for nadir to hemispherical scattering ratio and contribution of woody elements), and the x symbols the reference. The vertical lines illustrate 95% confidence intervals for the default model. our airborne data, because our data only cover close-to-nadir view angles (±20 • ). A previous study reported nadir and hemispherical reflectance values based on above-canopy multi-angular measurements at red, NIR, and SWIR wavelengths in boreal coniferous and broadleaved forests (Deering et al., 1999). Based on those results, it can be calculated that Q Ω values ranged between 0.42 and 1.19. Those values cannot be directly compared to our results because our Q Ω is defined for the canopy only (forest floor excluded), but they give some evidence that the obtained range of Q Ω is correct. Interestingly, we observed Q Ω to increase as a function of L eff . In preliminary ray tracing simulations at red wavelength (simulated forests composed of ellipsoid crowns filled with randomly distributed and oriented, circular leaves), we noticed a similar increasing trend as a function of L eff , and the Q Ω values ranged between 0.39 and 0.82. This suggests that the values obtained through our inversion are physically meaningful.
Based on our data and earlier publications (e.g., Hadi et al., 2017;Hovi et al., 2017), the average gap fraction of forest canopies tends to be the largest in nadir and approaches zero when the canopy is viewed from oblique angles. The contrast in gap fractions measured at nadir and oblique angles is the largest when the tree cover is low (Gerard and North, 1997). Thus, compared to dense canopies, sparse canopies have less plant area visible to the sensor in nadir (Fig. 5), which means that less radiation is likely to be scattered to nadir than to oblique directions. An interesting question is whether a sufficiently accurate, yet simple analytical formula for Q Ω could be included in the model. The good qualitative agreement of Q Ω with the theoretical estimate i Ω /i D (Fig. 10) suggests that such a formula could possibly utilize multi-angular canopy interception values measured from, e.g., hemispherical photos or terrestrial laser scanning. Quantitatively, there remained some discrepancy, i.e., inverted values of Q Ω were on average 29% lower than i Ω /i D , which can be due to both uncertainty of our inversion as well as deviation of real canopies from the assumptions behind the theoretical estimate. Other factors contributing to the angular distribution of reflected radiation, and thus probably also the nadir to hemispherical Top and middle rows: Root mean square error (RMSE) and mean estimation error (MEE) of the modeled forest hemispherical-directional reflectance factor, R(↓ sky ,Ω), relative to the mean of reference R(↓ sky ,Ω). Bottom row: Pearson correlation coefficient (r) between modeled and reference R(↓ sky ,Ω). Left column shows results for the default model (i.e., assuming a Lambertian canopy that is composed of foliage only) and right column for the fitted model (accounting for nadir to hemispherical scattering ratio and contribution of woody elements). Mean (and range) of the evaluation metrics are listed in each sub- figure. scattering ratio, are crown shape (Rautiainen et al., 2004), crown height distribution (Gerard and North, 1997), and sun zenith angle (Deering et al., 1999;Gerard and North, 1997), which all influence shadow fraction and its dependence on view-direction.
In this study, our focus was on near-nadir measurements, but the model could be applied also to simulations in other view directions, as long as Q Ω could be adequately estimated. We note, however, that predicting Q Ω near the hotspot region can be difficult, because close to the hotspot the gap fraction in direction of view (and thus escape probability) becomes highly correlated with gap fraction in direction of the sun (Kuusk, 1991;Yang et al., 2017). This correlation depends, in addition to other canopy structural factors, also on leaf size and orientation. Therefore, simple parameterizations of Q Ω based on macroscale canopy structure such as discussed in the previous paragraph might not be sufficient for hotspot simulations.

Implications
What are then the broader implications of our study, beyond the technical details and parameters of the p-based model? First, costefficient and computationally simple methods, such as the physicallybased model tested here, are becoming more and more important, due to the increase in the variety of satellite data that is available. New hyperspectral satellite missions (e.g., EnMAP), in particular, increase the need for testing the models in a wide range of wavelengths. By using hyperspectral data, we showed that, if parameterized accurately, the model based on photon recollision probability performs well throughout the spectrum. In other words, the geometry (structure) of the canopy, together with visibility of the forest floor, determine the overall 'brightness' of the forest throughout the spectrum. Thus, few spectrally invariant parameters and the plant element albedo can be enough to describe the reflectance characteristics of the forest.
The concept of spectral invariants has been used in interpretation of global multispectral satellite datasets: mapping of leaf area index and fraction of absorbed photosynthetically active radiation using MODIS (Knyazikhin et al., 1998;Myneni et al., 2002) and VIIRS (Yan et al., 2018) data, and mapping of leaf area index and its sunlit portion using DSCOVR EPIC (Yang et al., 2017) data. Retrieval of leaf area index from Landsat multispectral satellite data has also been demonstrated (Ganguly et al., 2012). Models belonging to the PARAS family, in particular, have been used in the inversion of forest leaf area index and its seasonal variation from medium resolution (Landsat, Sentinel-2) multispectral satellite data (Schraik et al., 2019;Varvia et al., 2018). The relative simplicity of these models makes them particularly suitable when using computationally intensive (inversion) techniques (e.g., Schraik et al., 2019). For example, the PARAS model was used for sensitivity analyses quantifying the driving factors of forest reflectance (Hadi et al., 2017;. We hope that the results presented here will further help to facilitate the use of these models in practice, including (but not limited to) the above mentioned applications.
Our study also revealed new aspects about the role of leaf (and needle) spectra in forest reflectance and land surface modeling. Based on our results, it appears that using a leaf or needle spectrum as input is not sufficient for modeling a forest reflectance spectrum accurately. Rather, one should be able to determine, perhaps with the help of measurements or spectral libraries, the effective plant element albedos that would be used as model input. Probably, if the effective plant element albedos are species-or biome-specific, this kind of model calibration would not be needed for each and every dataset or study area. This would make the model widely applicable. Overall, we would like to emphasize the importance of empirical measurements in evaluation and development of forest reflectance models. The finding that the default model did not match with the empirical measurements motivated us to find the explanation and to provide a solution that improved the model performance. We also made an interesting discovery about the spectral properties of forests: we observed that even though the plant element albedos are not dramatically different between coniferous and broadleaved forests, they can still explain a large part of differences between these forest types in the NIR reflectance. This is because, due to multiple scattering, the relation between plant element albedo and forest canopy scattering is nonlinear. Thus, the forest reflectance, especially in dense forests, is very sensitive to variation in plant element albedo in the NIR wavelength region where the plant element albedo is high.

Conclusions
We conducted an extensive (in terms of geographical coverage and forest structural variation) empirical evaluation of a photon recollision probability based forest reflectance model. Based on our results, we conclude that using foliage albedo as the plant element albedo and assuming a Lambertian canopy can lead to large errors when simulating the forest reflectance in a typical near-nadir observation geometry of satellite sensors. On the other hand, the model simulates forest spectra and their dependence on effective plant area index of the forest canopy correctly, if effective plant element albedo (which combines foliage and woody elements) and directional scattering properties of the canopy are used as inputs. Future studies should develop methods for approximating the directional scattering properties (i.e., forest BRDF), and determine effective plant element albedos for different tree species and biomes based on measurements of spatial and temporal variation of both woody element and foliage spectra.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.