Comparison of two-dimensional multitemporal Sentinel-2 data with three-dimensional remote sensing data sources for forest inventory parameter estimation over a boreal forest

National Forest Inventories (NFI) are key data and tools to better understand the role of forests in the global carbon budget. Traditionally inventories have been carried out as field work, which makes them laborious and expensive. In recent years, the development of various remote sensing techniques to improve the cost-efficiency of the NFIs has accelerated. The goal of this study is to determine the usability of open and free multitemporal multispectral satellite images from the European Space Agency's Sentinel-2 satellite constellation and to compare their usability in forest inventories against airborne laserscanning (ALS) and three-dimensional data obtained with high-resolution optical satellite images from WorldView-2 and Synthetic Aperture Radar (SAR) stereo data from TerraSAR-X. Ground reference consisted of field data collected over 74 boreal forest plots in Southern Finland in 2014 and 2016. Features utilizing both singleand multiple-date information were designed and tested for Sentinel-2 data. Due to high cloud cover, only four Sentinel-2 images were available for the multitemporal feature analysis of all reference plots within the monitoring window. Random Forest technique was used to find the best descriptive feature sets to model five forest inventory parameters (mean height, mean diameter at breast height, basal area, volume, above-ground biomass) from all input remote sensing data. The results confirmed that the higher spatial resolution input data correlated with more accurate forest inventory parameter predictions, which is in line with other results presented in literature. The addition of temporal information to the Sentinel-2 results showed limited variation in prediction accuracy between the single and multidate cases ranging from 0.45 to 1.5 percentage points, whereof mean height, basal area and aboveground biomass are lower for single date with relative RMSEs of 14.07%, 20.66% and 24.71% respectively. Diameter at breast height and volume are lower for multi date feature combination with relative RMSEs of 18.38% and 27.21%. The results emphasize the importance of obtaining more evenly distributed data acquisitions over the growing season to fully exploit the potential of temporal features.


Introduction
Gaining a better understanding of the effect of forests in the carbon cycle and in climate change requires accurate information of forest resources in short time intervals. One example of how this information can be monitored and reported is for the Kyoto protocol, which requires a report on the state of the nation's forests for an estimation of carbon storage (UNFCCC, 1997).
The most common way of assessing forest structure, health and productivity is by maintaining a national forest inventory, like those kept e.g. in Finland (Tomppo, 1996) and Canada (Gillis et al., 2005). Common forest inventory variables include, but are not limited to, diameter at breast height (DBH), tree heights and basal area and above ground biomass. Forest inventories mostly rely on statistical estimation methods, meaning a good number of field plots are collected as reference data and then remote sensing data together with estimation models are used to create estimates for larger areas (Scott and Jeffrey, 2002). For example, in Finland, the National Forest Inventory is currently carried out in five-year cycles utilizing field measurements, high resolution satellite imagery and digital maps (Katila and Tomppo, 2001;Tomppo et al., 2008), but more accurate and more frequent remote sensing data acquisitions would be preferred by the forest industry. As of today, the costs of accurate acquisition of these data are too high to fulfill this need.
The means by which the field and remote sensing data are acquired can range from manual in-situ field data collection to more automated static, mobile, airborne or even spaceborne data acquisition. It has been shown that the use of remote sensing data in addition to the field data can increase the value of national forest inventories (e.g., McRoberts and Tomppo, 2007;White et al., 2013). On the ground level, field work covers only small areas and is time-consuming and expensive, but it is also more accurate than remotely sensed estimation methods. Satellite data covers wide areas, even continents, but the accuracy of freely available satellite data is much lower than in-situ data and commercial satellite data. Since the early days of the Landsat mission (https:// landsat.usgs.gov/), a variety of scientific research regarding the use of reflectance information and various vegetation and other indices (2D features in the present study) has been conducted in the field of forest mapping (e.g., Chen and Cihlar, 1996;Wulder, 1998;Zhu and Liu, 2015). 3D remote sensing techniques, especially airborne laser scanning (ALS, which will be described below), have shown huge potential in the estimation of forest inventory parameters (Yu et al., 2004;Hyyppä et al., 2008). In terms of cost and coverage, ALS positions between the field inventories and satellite remote sensing in forest applications. Similarly accurate results have also been acquired by using digital photogrammetry; however, a digital terrain model (DTM) from for example ALS data is always required. Also, 3D features acquired from optical stereo satellite images and satellite SAR data have also provided good results for estimating forest inventory variables (Yu et al., 2015;Karila et al., 2015;Karjalainen et al., 2012).
Laser scanning, or LiDAR (Light Detection and Ranging), is one technique of directly deriving 3D coordinates of an object. In laser scanning a pulsed laser beam illuminates a target and the reflected pulses are measured by the sensor. Through the time for the emitted pulse to return, together with a scan angle and Global Navigation Satellite System (GNSS) and Inertial Measurement Unit (IMU), the determination of 3D coordinates is enabled. These measurements can be obtained from terrestrial, mobile, aerial or spaceborne platforms. For Airborne Laser Scanning (ALS) acquisition from a platform moving at about 500 meters altitude, one of the most accuracy influencing factors is the positioning of the platform, which is provided through continuous monitoring of position and orientation of the sensor through GNSS and IMU. These measurements permit the extraction of a georeferenced point cloud. ALS has been shown to have great potential for forest mapping purposes through the ability of the laser pulses to penetrate the canopy and thus not only depicting the canopy but also the internal structure of the forest, where other techniques, such as digital photogrammetry or radargrammetry can only represent the outer structure . In that way, forest parameters can be accurately estimated by utilizing ALS data (Hyyppä and Inkinen, 1999;Hyyppä et al., 2008;White et al., 2016;Yu et al., 2011). Nowadays, ALS is used in operational forest management and producing very good results, but due to the low altitude it can only cover a relatively small area in a short time, thus making frequent monitoring expensive.
A Synthetic Aperture Radar (SAR) is an imaging radar capable of producing high-resolution 3D data over land areas. SAR uses microwaves, typically with wavelengths ranging from centimeter level up to tens of centimeters. These wavelengths penetrate clouds, making SAR an advantageous sensor technology for Earth Observation from space in cloudy areas. SAR transmits pulses of microwave radiation and receives echoes from the target area, for which the strength of the backscattering pulse is recorded as well as the phase of the signal. In addition, the range between sensor and target pixel can be calculated from the time delay. The strength of backscatter (intensity) contains information about forest biomass, but is known to have limited capacity in estimation due to the saturation of signal in high biomass volumes. The point of saturation highly depends on the radar wavelength used (Le Toan et al., 1992). In addition, the phase signal enables advanced techniques of SAR polarimetry and interferometry, which may also be used in forest mapping (Papathanassiou and Cloude, 2001). However, currently there are no satellite systems fully capable of Polarimetric SAR Interferometry techniques. Recently, the elevation models extracted from SAR data have shown their potential in forest biomass estimation (Yu et al., 2015;Persson et al., 2017). SAR interferometry and SAR stereo-radargrammetry are two techniques that can be used to derive 3D information .
Optical satellite data has been used for estimation of forest inventory parameters since the 1980s (Tomppo and Katila, 1991). The Landsat archive has provided free and open 2D information suitable for forest analysis since 1972 (Williams et al., 2006). Bolton et al. (2018) and Zhu and Liu (2015) have studied the use of one Landsat image as well as multiple Landsat images for forest disturbance and recovery analysis, as well as classification and estimation of aboveground biomass. The main drawbacks for precision forestry applications of the Landsat archive are its striping error in Landsat 7 (Markham et al., 2004), the rather low resolution of 30 m and a revisit cycle of 16 days.
Since 2015, the Sentinel-2 (S-2) satellite constellation aims to tackle the above mentioned drawbacks of Landsat imagery. The Sentinel-2 Multispectral Instrument (MSI) is a satellite constellation run by the European Space Agency (ESA), consisting of two satellites (A and B, B in orbit since 2017) orbiting earth at a height of 786 km in a sun-synchronous orbit with a revisit cycle as low as 3-6 days in some regions of the world (Drusch et al., 2012). As part of the Copernicus program of the European commission, Sentinel-2 will be a part of six different satellite platforms observing earth once the program concludes. It provides 13 bands in different wavelength intervals in 10 m, 20 m or 60 m spatial resolution. Four of the bands cover the red edge area of the electromagnetic spectrum, i.e., those wavelengths between red and near infrared light where the reflection of vegetation changes rapidly (Lillesand et al., 2014). In addition, it provides two bands within the shortwave infrared (SWIR) area (1610 and 2190 nm). Compared to the Landsat missions, Sentinel-2 does not carry thermal sensors. Sentinel-2 data has been shown to be suitable for different kinds of studies, such as tropical coral reef mapping and coral bleaching detection (Hedley et al., 2012), the detection of built-up areas (Pesaresi et al., 2016), glacier monitoring (Paul et al., 2016) and water body mapping (Du et al., 2016). As of today, the use of Sentinel-2 data for estimating forest inventory parameters has been explored in a limited number of studies, of which most focused on the estimation of biophysical variables (Majasalmi and Rautiainen, 2016;Puliti et al., 2018;Korhonen et al., 2017) or focusing on e.g., the mangrove forests on the Philippines (Castillo et al., 2017) or a mediterranean forest ecosystem in Greece (Chrysafis et al., 2017). These ecosystems differ greatly from finnish boreal forest.
Higher resolution images from airborne or spaceborne platforms can also be utilized to derive accurate 3D positions. Photogrammetry is a technique allowing the reconstruction of position, orientation, shape and size of objects from images (Kraus et al., 2007). Therefore, images have to be taken either by a approximately horizontal looking camera and different angles towards the target (close range photogrammetry) or for airborne photogrammetry, by a vertically looking camera with successive images with a minimum overlap of 60%. Features are then identified in each image and matched to features in other images. Based on this, 3D coordinates of the features can be extracted.
Even though the extraction of 3D features, especially from ALS, have revolutionized forest inventories and are used in operational forest management, present-day medium resolution satellite imagery has a clear role in forest inventories: (1) satellite data cover wide areas, (2) data acquisition is frequent and regular, (3) free and open satellite data sources exist. However, a better understanding of the potential and accuracy of different 2D and 3D techniques is still required. Therefore, it is very intriguing to compare the forest inventory variable estimation accuracies of the current state-of-the-art 3D techniques to 2D techniques. There are a number of publications comparing single-time satellite images to other means of remote sensing data acquisition (e.g., Chrysafis et al., 2017;Chubey et al., 2006;Gunlu et al., 2014;Hyyppä et al., 2000;Mäkelä and Pekkarinen, 2004), as well as combining multispectral satellite imagery with other sensors for forest parameter estimation (e.g., Badreldin and Sanchez-Azofeifa, 2015;Carreiras et al., 2017;Castillo et al., 2017;Puliti et al., 2018;Tanaka et al., 2015;Fernández-Landa et al., 2018). Lefsky et al. (2001) compared multitemporal 2D optical satellite data to 2D (Airborne Visible InfraRed Imaging Spectrometer) and 3D (Scanning Lidar Imager of Canopies by Echo Recovery) data from airborne platforms, but were limited in the spatial resolution of the optical satellite data. A study on the use of 3D data extraction from aerial images for improving the Finnish National Forest Inventory was done by Tuominen et al. (2017) who concluded that 3D material derived from ALS and aerial images outperformed the 2D data derived from satellite and aerial imagery.
Therefore, we investigate in this study the potential of multitemporal Sentinel-2 imagery for forest inventory parameter estimation in a homogeneous boreal forest in Southern Finland and compare this technique to several 3D techniques. The aim is to determine, to what extent free available optical satellite data can be used for forest inventory parameter estimation. We will also assess to what degree Sentinel-2 data is suitable for determining areas of interest to be mapped more accurately with higher resolution methods, such as ALS.

Materials and methods
The study area, field data and different remote sensing datasets used in this study as well as the feature derivation and accuracy assessment are described in the following section.

Study area
The area selected for this study is situated in Southern Finland 120 km north of Helsinki in the Kanta-Häme region, close to Evo. The 4 × 6 km 2 area is part of the Southern Boreal Forest Zone and has its centerpoint at lat. 61.202°, lon. 25.123°. The area contains mainly Norway Spruce (picea abies) and Scots Pine (pinus sylvestris) and the average stand size of the 2000ha managed forest has been observed to be 1ha. With elevations ranging from 125 m to 185 m above sea level, the topography of the study area is relatively flat.

Field data
The field data was acquired in summer 2014. 119 sample plots were chosen from an systematic grid (32 m × 32 m cell size) over the whole study area. The sample plots were chosen to represent the different forest conditions and comprising height and density variations within the study area. The study site was revisited in 2016 to determine if the plots had undergone changes. Consequently 91 plots that showed no changes between 2014 and 2016 were utilized for this study. The number of plots had to be reduced to 74 thereafter due to cloud cover in parts of the study area during one satellite image acquisition. Descriptive statistics of the 74 plots used in this study can be found in Table 1.
For each plot, all trees with a diameter at breast height (DBH) of more than 5 cm were measured and their mean served as value for DBH of the plot. Tree heights were determined with the use of an electronic hypsometer and tree species were recorded. The standard Finnish allometric models (Laasasenaho, 1982;Repola, 2009) were used to calculate tree volumes and biomass, taking tree species data, DBH and mean height as input. The inventory parameters used in this study were then derived by summing or averaging the tree data. More details on the field data acquisition and information on the accuracy of field data can be found in Luoma et al. (2017).

Remote sensing data
The remote sensing datasets used in this study have separate preprocessing steps that will be presented in the following subsections.

Airborne laser scanning
ALS data was acquired on the first of May 2016, using an Optech Titan multispectral system. The spectral channels of the system comprise two channels in the infrared spectrum at 1550 nm (channel 1) and 1064 nm (channel 2), and one operating in the green spectrum at 532 nm (channel 3). For this study, only the point cloud data from channel 2 was used, to keep the results comparable to conventional single-channel ALS data and previous studies. The three channels look in different directions with channel 2 pointing towards the nadir. The system operated at a pulse rate of 3 × 250 kHz from an altitude of 500 m above sea level. The average pulse density per channel was about 17 pulses per m 2 , with a footprint size of 17.5 cm in diameter for channel 2 with a beam divergence of 0.35 mrad. The point cloud was geo-referenced to the local ETRS-TM35FIN coordinate system. Calibration and matching between acquisition strips were performed by the data provider.

TerraSAR-X-stereo
SAR-stereo radargrammetry is represented in this study by the German Aerospace Centers TerraSAR-X (TSX) data. The constellation was launched in June 2007 and operates in the X-band microwave region with a central wavelength of 3.1 cm and a spatial resolution of 1 m. Compared to longer wavelengths of, e.g. C-and P-band, X-band radar only minimally penetrates into the canopy. For the present study, six TSX high-resolution model images were acquired with different imaging geometry over the test site, providing a total of six stereo pairs (each ascending/descending image with every ascending/descending image: A1 + A2, A1 + A3, A2 + A3, D1 + D2, D1 + D3, D2 + D3) for stereo-radargrammetric processing. The data shown in Table 2 was  chosen based on availability of the stereo configuration and with the shortest possible time between acquisitions. The acquisition dates were required to be around June to be comparable to the other datasets. Stereo-radargrammetric processing was carried out using the Socet Set software (version 5.6, BAE Systems). The output being a point cloud of 3D elevation data, based on a combination of all six stereo models. The point cloud is then used to calculate forest vertical structure-related 3D features with the help of an external DTM, which was a triangulated irregular network (TIN) based on ALS ground points. A detailed process description can be found in Karjalainen et al. (2012).

WorldView-2
For 3D photogrammetric data extraction we used very high spatial resolution along track stereo pairs from WorldView-2 (WV-2) satellite. This commercial multispectral satellite is operated and monitored by DigitalGlobe (Maxar Technologies, Colorado, United States) and was launched in October 2009, with a ground sampling distance of 0.5 m for the panchromatic images. For this study, a stereo-pair was acquired on September 13th, 2016 around noon local time with nadir viewing geometry and nearly cloudfree. This dataset was chosen because the date of acquisition is close to that of the other datasets.
The acquisition of 3D features from the WV-2 stereo pair was similar to the process used for the TSX stereo-radargrammetric data. For WV-2 however, Ground Control Points (GCP) were needed to refine the geolocation accuracy. In the present study, 10 3D GCPs were extracted from orthophotos and DEMs from the National Land Survey of Finland with a pixel size of 0.5 m. The residual after the adjustment were 0.49 m (Easting), 0.75 m (Northing), and 0.15 m (Elevation). The accuracy of the WV-2 stereo-pair was also visually checked against the reference orthophoto in order to verify the geolocation accuracy. The output of the process is a point cloud, which is then used to estimate 3D features for the forest inventory parameters. A more detailed description of the process can be found in Yu et al. (2015).

Sentinel-2
The Sentinel-2 satellite constellation has consisted of two satellites, Sentinel 2A and 2B since fall 2017, but as this study is focusing on the year 2016, only data from Sentinel 2A was available. For data provision and processing, we used the EODC platform ('Earth Observation Data Center' (EODC Gmbh, Vienna, Austria)), which mirrors all Sentinel-2 data as L1C (top of atmosphere) products and allows on-site processing, thus omitting data transfers, which was the main reason for choosing this platform. Our region of interest in Southern Finland is completely covered by tile 35VLH of the Sentinel-2 tile system with center point lat. 60.81378684°lon. 24.33207008°and a size of 110 km × 110 km (see Fig. 1). In this study, we used all available images from 2016 with less than 30% cloud cover in the area of investigation. This resulted in three nearly cloud-free images and one with ∼30% cloud cover in the area of interest (see Fig. 1), which resulted in a reduced number of usable sampling plots (see Table 3). The four images used were acquired on April 10th, May 10th, August 11th and August 31st, 2016; a short description can be found in Table 3. Cloud cover indicates the percentage based on the cloud mask calculated in the third party provided sen2cor atmospheric correction process (ESA, 2018a).
The processing of Sentinel-2 tiles was done using the Finnish Geospatial Research Institute (FGI)-developed process, which is written in Python and utilizes different open source tools: first, the images had to be processed to L2A (bottom of atmosphere) product with sen2cor tool. Next, one cloud mask and different index images per date were generated using the SNAP toolbox python interface, known as snappy (ESA, 2018b)(cf. Fig. 2). The indices used in this study were chosen based on previous performance in other studies and are shown in Table 4. The index raster images, as well as all available bands were  upsampled to 10 m resolution using the nearest neighbor method.

Feature derivation
The process of feature derivation in this study can be split into two parts. The first part is the feature derivation process from three-dimensional data acquired by ALS, Worldview-2 and TerraSAR-X. The 3D feature derivation process is fundamentally different from the feature derivation process for Sentinel-2 data that have no third dimension.

Feature derivation from 3D data
All point clouds from ALS, WV-2 and TSX were normalized by removing the elevation of the ground from every point height value to derive the height above ground for each point. A digital terrain model (DTM) was derived from ALS data by classifying all points into nonground or ground points using the commercial software TerraScan (Terrasolid Oy., Helsinki, Finland). The ALS derived DTM was used for normalization of all 3D point clouds. Due to the characteristics of their data acquisition, WV-2 and TSX point clouds seldom penetrated the vegetation to the ground, and therefore no DTM can be extracted from these data.
Feature derivation was split into two parts. First, a number of height distribution features, i.e., maximum height, mean height calculated as the arithmetic mean of heights, mode of heights, standard deviation of the heights, coefficient of variation, penetration rate as a ratio of points below 2 m (i.e., ground points) to total number of points, volume as the multiplication of (1-penetration rate) and mean height, and percentiles from 10% to 90% with 10% increments of the canopy height distribution. Second, density-related features were calculated from all aboveground points (i.e., points above a threshold height of 2 m). Therefore each point cloud was split into 10 equal height intervals from bottom to top of the canopy. The features were then calculated as a share of points in each height-interval to the total number of points.

Feature derivation from 2D data
After computing the index images as described above, the further processing of the S-2 dataset was done using Python packages rasterstats (pythonhosted.org/rasterstats/) and pandas (pandas.pydata.org). A total of 25 features per image were computed at plot level for seven statistics each (mean, median, minimum, maximum, range, standard deviation, 10th and 90th percentile). Additionally, temporal features (band ratios and features using temporal differences/changes between two dates) between all dates were calculated. Altogether 1100 features were calculated for the 74 plots for each acquisition date. To assess the predictive performance of multi temporality for forest inventory parameter estimation, different combinations of the Sentinel-2 data have been used: (I) Features calculated for each single date (II) Combined features of all four single dates (single-date combination) (III) Multitemporal features calculated using differences and ratios between two or more dates (multiple-date combination) (IV) All features (II and III combined).

Forest inventory parameter estimation
The forest inventory parameters were estimated with the Random Forest technique (Breiman, 2001) using features obtained with four different remote sensing techniques and field measurements. The Random Forest algorithm is a non-parametric supervised learning algorithm. It is used for both classification and regression tasks with the latter being the task in this study. In the Random Forest algorithm, the prediction is obtained by fitting a number of decision tree classifiers on various sub-samples of the dataset and averaging is used for improving the predictive accuracy and controlling overfitting of the model. We chose the Random Forest algorithm because it does not make an assumption about the data distribution and works well without variable selection procedure. Another advantage of the algorithm is that it  provides the option to measure the relative importance of each feature on the prediction. Thus, the least important features can be omitted from the prediction to prevent overfitting and to improve the accuracy when feature space is large. For this study, the Random Forest technique was used firstly to select the 10 most important features from each data set and then forest attributes were estimated based on the selected features and field data. The algorithm was implemented in MATLAB (The MathWorks, Inc., Massachusetts, United States) with the following parameter settings: the number of classification trees to grow was set to 200, the number of variables (features) to select at random for each decision split was set to three and the minimum number of observations per tree leaf was set to three.

Accuracy assessment
All remote sensing data sources were evaluated based on their performance for predicting the forest inventory parameters. Thus the correlation coefficient (R) as the relationship between predicted and reference values and the coefficient of determination (R 2 ) have been calculated. In addition, bias and root mean square error (RMSE) have been determined as follows: n -number of plots, X -estimated value from remote sensing data for plot i and X -observed value for plot i from field observation. Also, the relative bias and RMSE were calculated in regard to the sampled mean of bias and RMSE, respectively.

Feature importance
Identification of ten features with the most predictive power was carried out for Sentinel-2 data and can be found in Appendix, Table A.1. For the 3D datasets, the identification of the most important features can be found in Yu et al. (2015). Table A1 shows that the minimum of short wave infrared (SWIR) bands 11 (1610 nm) and 12 (2190 nm), as well as near infrared (NIR) bands 5 (705 nm), 7 (783 nm) and 8A (865 nm) show higher predictive performance than other features for all parameters. Among the index and ratio features, the slope-like feature between May and April for band 11 and the TCW show the best predictive capability for all parameters. In general, the multitemporal features make up 2 to 3 of the best predicting features for basal area, volume and aboveground biomass. For the mean diameter at breast height and mean height, the multitemporal features make up 5 and 4, respectively out of ten of the best predicting features. It can be noticed that features involving the near-infrared bands are often the most optimal for diameter at breast height and total aboveground biomass. For basal area and volume, the best features often include the shortwave infrared bands.
We also tested the different NIR bands of Sentinel-2 for determining NDVI. In general, the NDVI was the best predicting feature for all parameters for the image of August 31st, 2016, where the NDVI calculated with bands 5 or 7 show the best predictive capability. The highest predictive capability of NDVI is at the end of August (for basal area, volume and also aboveground biomass also at the beginning of August) which shows that there is a connection between tree health/ greenness and the different forest inventory parameters. Vegetation is densest in August in Finland, thus having most cover. Overall, the NDVI calculated with bands 5 or 7 is among the 10 best features for all parameters. Only aboveground biomass seems to be more related to the NDVI calculated from band 8A. Hence, we recommend paying attention to Sentinel-2's different NIR bands when calculating spectral indices including the near infrared wavelength interval, as they do provide more specific information on the vegetation's red edge area than band 8 with the broader wavelength coverage around 842 nm.

Accuracy of forest inventory parameter estimation
The main part of this study was to compare the accuracy for the forest inventory attributes mean height, diameter at breast height, basal area, volume and total above ground biomass from the different datasets derived from ALS, WV-2, TSX and single date and multitemporal S-2. A summary of the statistics for forest inventory parameter estimation from each dataset is shown in Table 5. It can be seen that features derived from ALS data offer the most accurate estimates for all forest inventory parameters studied.
The order of least accurate to most accurate based on R and RMSE for all forest inventory attributes is TSX, S-2, WV-2, ALS. ALS shows the highest correlation coefficient (R) between 0.84 and 0.97 for all parameters, while for WV-2, those values range from 0.81 to 0.93, for the combination of all data of S-2 the values range from 0.63 to 0.80 and for TSX they range from 0.42 to 0.74. According to high correlation coefficients, the mean height, mean diameter at breast height and the volume were estimated best for all datasets. The lowest error was reported for mean height and mean diameter at breast height for all datasets. The biases for all data sources and all estimates ranging between −2.58 and 0.69 indicates a good calibration of systematic errors performed by the model. The results of this study compare well to other studies in this field. When compared to results by Yu et al. (2015), the results regarding ALS, WV-2, and TSX stereo are well in line with earlier results at the same test site in Evo in 2014. For example, RMSE% for biomass estimations are within one percentage point in 2014 and 2016. In 2014, TanDEM-X bistatic INSAR data was also used, and the estimation accuracies were better than TSX stereo but slightly worse than ALS and WV-2. In the present study, we were not able to include TanDEM-X data in the comparison because there was no TanDEM-X data available in summer 2016 over the Evo test site.
The main difference from the 2014 study is the addition of Sentinel-2 data into the comparison. Our results show that Sentinel-2 data is close to TSX stereo regarding the accuracies of the estimations of the forest inventory attributes. For example, biomass estimation results were 25.0% and 27.5%, for TSX stereo and S-2 respectively. However, S-2 RMSE% are roughly 10 percentage points higher compared with ALS data and WV-2. This result is in accordance with Hyyppä et al. (2000), where 3D features were found to be superior compared to 2D Fig. 3. Forest inventory parameter difference maps derived from ALS minus Sentinel-2 all single date features over the whole study area. White areas correspond to no data. spectral features. Tuominen et al. (2017) obtained an RMSE of 33.78% for height, 37.50% for DBH and 45.34% with dual date Landsat 8 images from July 23rd, 2013 and July 3rd, 2014 patched together. Our results show a better RMSE percentage for all three parameters. However, there are a number of differences between the study of Tuominen et al. (2017) and our study. Their study was conducted in central Finland on a study site of approximately 5800 km 2 on 2469 defined plots. Furthermore the Landsat 8 data utilized was acquired in 2014/2015, and provides 30 m spatial resolution red, green, blue and near infrared bands and a 15 m panchromatic band compared to 10 m spatial resolution red, green, blue and near-infrared bands of Sentinel-2.
Compared to the spatial resolution of the Sentinel-2 data, the plot size chosen in this study was relatively small. The 32 m × 32 m plots generally had a full or partial overlap of 10 to 17 Sentinel pixels (10 m × 10 m). It is estimated that increasing the plot size would also increase the performance of Sentinel-2 data for forest inventory parameter estimation. However, for our research, the used plot size was due to the Finnish National Forest Inventory field reference. Fig. 3 shows difference maps for all derived forest inventory parameters from the best performing dataset (ALS) minus Sentinel-2 (combination of all single date features). The maps were calculated using the model of the random forest algorithm and two rasters of 32 × 32 meter pixel size over the whole study area. One raster derived from Sentinel-2 features, the other based on ALS features. After calculating the forest inventory parameters for both rasters, water areas were extracted based on the (ESRI) shapefile of water areas of the National Land Survey's (NLS) topological database with a buffer of one pixel (32 m) around the water areas.
Negative values (dark orange) in the figures show that Sentinel-2 is overestimating the parameter in question, positive values (dark turquoise) show an underestimation by Sentinel-2. It can be seen, that Sentinel-2 features are mainly overestimating the parameter values, but there are a few condensed areas where Sentinel-2 features are underestimating. These areas, where Sentinel-2 features are underestimating are partly sparse forest with high trees, suggesting a higher influence of the lower vegetation on the reflectance than in denser forest. Fig. 4 gives an impression of the estimation accuracy in the full value range of the observations. It can be seen that the forest inventory parameters with low values are not covered well by any of the sensors used in this study. Sentinel-2 shows no prediction power for, e.g., low biomass (< 100 m 3 /ha) or low tree heights (< 20 m). The overestimation for low biomass and underestimation of high biomass may result from the limited number of sample plots used in this study. Also the small number of pixels covering one sample plot and together with the number of border pixels in the case of Sentinel-2 data may be a reason for the limited prediction power.

Multitemporal Sentinel-2 features
Three different combinations of Sentinel-2 data were assessed for predictive performance. The estimations in Table 5 show the results when all data from each individual date and their combinations were used together.Two other groupings were also tested, these being (i) all single-date data combined (single-date combination) and (ii) all multitemporal features combined (multi-date combination). Fig. 5 shows the RMSE as percentages for the different combinations and each single date. In general, the difference between the combinations is only between 0 and 6 percentage points. The combination of all Sentinel-2 data gave the best performance for all forest inventory parameters. The data from April 10, 2016 (leaf-off with ground still frozen) shows the highest percentage for RMSE, while data from August 11, 2016 (late summer) shows an equally low or lower percentage for all parameters except for the diameter at breast height, as when all features were used. This shows that in general August 11, 2016 data could be chosen instead of the multitemporal data. But after comparing the results for the other dates with the multitemporal data, it is clear that this is not the case for the other dates.
It can be estimated that the use of more dates of the Sentinel-2 data would increase the predictive capability of the model, given that more cloud free images are available per growth season. Having more cloudfree dates would also allow extraction of additional, more timedependent features, from the time series, to improve the estimation and to detect expected changes in growth patterns.

Conclusions
The primary objective of this research was to determine the performance of multitemporal features from multispectral Sentinel-2 satellite imagery for forest inventory parameter estimation over a boreal forest. The study was conducted over a small forested area in southern Finland and the results were compared to those of single date multispectral satellite data, stereo-SAR, high-resolution 3D satellite data, and ALS. One big advantage of the multispectral satellite imagery compared to all other methods used in this study, is that the data is openly available, has global coverage, and regular revisit frequency. All 3D data require a DTM for processing, preferably, from ALS if available. In general, our results for the three-dimensional data were well in line with previous studies.
We demonstrated that the use of here selected multitemporal features did not increase the performance for predicting forest inventory parameters compared to the use of single-date features. The results indicate that even though the multitemporal multispectral data can show a general trend in estimation of forest inventory parameters, it cannot account for the missing third dimension, which the estimations of many forest inventory parameters rely on. Furthermore, the images used in this study did not cover the full extent of the growing season due to high cloud coverage in 2016.
If the question is to find interesting areas, where a more detailed survey is necessary, the free multispectral satellite data can give a good impression of where to look. Compared to the closest method in estimation accuracy for the forest inventory parameters, stereo SAR, the multispectral Sentinel-2 satellite imagery has the advantage of being free of charge and regularly updated.
One limitation of our research was the availability of cloud-free data for the timeframe of interest. Having more data available would make the use of time series features possible, which might improve the use of multitemporal multispectral satellite images, as was also mentioned in Bolton et al. (2018). Filling the gaps in a multispectral time series with SAR data would be one way to mitigate data availability and performance issues. Another limitation was the number of plots used in the study, which was limited by the Finnish National Forest Inventory field reference.
Open satellite data is valuable in providing large-scale routine monitoring over large forest areas and detecting changes in them. This allows for more accurate resources, such as ALS, to be focused on areas with important changes. Also the various combination possibilities of the data sources used in this study may improve the estimation of forest inventory parameters.

Author contribution
Samantha Wittke processed the Sentinel-2 data, carried out research and wrote the first version of the manuscript. Xiaowei Yu processed the ALS data and carried out the Random Forest parameter estimation. Mika Karjalainen processed WorldView-2 image data and TSX radargrammetry data. Juha Hyyppä and Eetu Puttonen provided the scientific guidance. All co-authors assisted in writing and improving the manuscript. All authors participated in the design of the scientific study.

Conflict of interest
The authors declare no conflicts of interest.