Draft Precision of exogenous post-stratification in small area estimation based on a continuous national forest inventory

National forest inventories (NFIs) are designed to provide accurate information on forest resources at the national and regional levels, but there is also a demand for such information at smaller spatial scales. Auxiliary data such as satellite imagery have been used to facilitate small-area estimation. The commonly used method, k-nearest neighbour (k-NN), provides a model-based estimator for small areas, but a design-unbiased estimator for the mean square error is not available. Post-stratification (PS) is an alternative approach to using auxiliary information that allows for design-based variance estimation. In a case study using real inventory data of the Finnish NFI, we applied this method at the municipality level to explore the lower limit of the area for which the key forest parameters, forest area and growing stock volumes, can be estimated with sufficient precision. For PS, we employed exogenous forest resources maps based on the previous NFI round. In the municipalities of the two study provinces, the relative standard errors of total volume estimates ranged from 2.3% to 26.9%. They were smaller than 10% for municipalities with an area of 390 km² or larger, corresponding to approximately 100 or more sample plots on forestland. We also demonstrated the usefulness of design-unbiased variance estimation in showing discrepancies between design-based PS and model-based k-NN estimates.

In many NFIs, continuous field inventories and remotely sensed data acquired on a regular basis provide external information; data from a previous inventory can be used in fitting the model (Pulkkinen et al. 2018, Haakana et al. 2019). For example, in Finland, forest resources maps based on NFI sample plot data, satellite images and numerical map data, extending over the whole country, are produced by the multi-source NFI (MS-NFI) method every two years (Tomppo et al. 2008, Mäkisara et al. 2016, 2019). The NFI field inventory, in turn, is a continuous inventory in which one round is completed every five years.

For the smallest areas containing too few sample plots for reasonably precise estimation by any design-based method, a model-based approach is the only option. Even in those cases, design-unbiased estimators can be useful. For example, we can compute both the k-NN estimates and PS estimates with design-based confidence intervals (CIs) for the same areas of interest. The properties of the areas where the k-NN estimate is outside the CI of the PS estimator could then be examined more closely to identify potential problems in k-NN estimation. To our knowledge, no such analysis has been reported.

The primary objective of our study was to evaluate the precision of post-stratification based on an external model for small area estimation in the NFI context. Using the operational NFI in Finland as an example, we estimated the precision of PS estimates for growing stock volume and volumes by tree species, as well as the area of productive forest land and poorly productive forest land at the municipality level. The approach was direct domain estimation, that is, the estimates were based only on the sample plots within the small area (Rao 2003, p. 1). An [...]

[...] Table 2). The sampling design of NFI11 was systematic cluster sampling, but the designs were slightly different in the Kainuu and Pirkanmaa regions (Fig. 2). The total number of sample plots was 5,467 in Kainuu and 4,614 in Pirkanmaa. For the sample plots, both forest stand-level and tree-level characteristics were measured. Trees belonging to a sample plot were selected by restricted angle count sampling. In North Finland, including Kainuu, the basal area factor (relascope factor) was 1.5 and the maximum radius of the plot was 12.45 m. In Pirkanmaa (South Finland), a basal area factor of 2.0 and a maximum radius of 12.52 m were applied. Every seventh tally tree was measured as a sample tree in more detail.
Sample tree volumes were estimated using volume models (Laasasenaho 1982), and volumes for tally trees were estimated [...] Table 2). To cover cloudy areas in the Landsat images, IRS and Spot 4/5 images from 2005 and 2006 were used (Tomppo et al. 2008). The pixel size of the satellite images was 20 m × 20 m, and the mean volume of the growing stock (m³/ha) for each image pixel on forestry land was estimated using the k-NN method (Tomppo et al. 2008). For the estimation, a forestry land mask excluding other land use classes, such as agricultural land, roads, built-up areas and waterbodies, was derived from the digital map data from the National Land Survey of Finland. For post-stratification, the MS-NFI [...]

MS-NFI volume maps are produced every two years, and the MS-NFI-2007 volume map was selected for PS because it was based on the sample plot data collected before the NFI11 field data used in the estimation.

Four volume strata plus three strata for other land uses were formed. The boundaries of the volume strata were determined separately for the two study areas using the method of Dalenius and Hodges (1959). In this approach, boundaries $y_i$, $i = 0, \ldots, 4$, are determined so that $y_0$ is the minimum and $y_4$ the maximum of the predicted volumes, and the values $\int_{y_0}^{y_i} \sqrt{f(y)}\,dy$, $i = 1, \ldots, 3$, divide the interval $[0, \int_{y_0}^{y_4} \sqrt{f(y)}\,dy]$ into four equal sub-intervals, where $f$ is the probability density of the predictions (Dalenius and Hodges 1959, Cochran 1977, p. 127). In Kainuu, the stratum boundaries, 34 m³/ha, 73 m³/ha and 122 m³/ha, were based on the volume map of North Finland, excluding the three northernmost municipalities, covered largely by open fells (Fig. 1).
North Finland comprises the three northernmost provinces (Fig. 1). In Pirkanmaa, post-[...] (McRoberts et al. 2002, Nilsson et al. 2005, McRoberts 2010, Magnussen et al. 2015). Three strata for other land use classes (agricultural land, built-up area and waterbodies) were used because the forestry land mask was not reliable enough (Katila et al. 2000). Hence, all NFI sample plots assessed as forest land in the field were included in the estimation, though some of them were classified as other land use according to the map data. Cochran (1977, pp. 132-134) recommended four to six strata, provided that the sample size is reasonably large (around 20) in every stratum. Westfall et al. (2011) suggested that the smallest within-stratum sample size should be at least 10 to obtain an approximately unbiased variance estimator. The selection of four volume strata also supported having enough sample plots per stratum. If there was an empty stratum, as in one of the municipalities in Pirkanmaa (no sample plots in waterbodies), two strata of other land use (agriculture and waterbodies) were combined before estimation.
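The cum √f rule of Dalenius and Hodges used for the volume strata can be illustrated with a short sketch. It assumes that the map predictions are available as a plain array and approximates the density f with a histogram; the function name, the number of bins and the synthetic input are hypothetical and are not taken from the study.

```python
import numpy as np


def cum_sqrt_f_boundaries(values, n_strata=4, n_bins=200):
    """Approximate inner stratum boundaries with the cum-sqrt(f) rule.

    The density f is approximated by histogram counts (an assumption of
    this sketch); the cumulative sum of sqrt(count) is cut into
    n_strata equal parts.
    """
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    cum = np.cumsum(np.sqrt(counts))
    # Targets at 1/n, 2/n, ... of the total cumulative sqrt(f).
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    # First bin whose cumulative value reaches each target.
    idx = np.searchsorted(cum, targets)
    # Use the upper edge of that bin as the boundary.
    return edges[idx + 1]


# Hypothetical, synthetic pixel-level volume predictions (m3/ha), not study data.
rng = np.random.default_rng(0)
predicted = rng.gamma(shape=2.0, scale=50.0, size=10_000)
print("inner stratum boundaries:", np.round(cum_sqrt_f_boundaries(predicted), 1))
```

The function returns only the inner boundaries y_1, ..., y_3; y_0 and y_4 are the minimum and maximum of the predictions. A finer histogram gives boundaries closer to the continuous rule at the cost of noisier bin counts.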

Estimation of forest characteristics and sampling variances
For forest areas and total volumes, post-stratification estimators (Cochran 1977, Haakana et al. 2019) were used. Forest areas and volumes and their variances were first estimated for each stratum in each municipality and then aggregated to the municipalities. Because the municipality areas from the map data differed slightly from the official area statistics provided by the National Land Survey of Finland, the sum of stratum areas within a municipality was calibrated to the official municipality area: the strata on land to the official land area, and the stratum of waterbodies separately to the area of waterbodies within the municipality in question.
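The two steps described here, area calibration and aggregation over strata, can be sketched as follows. This is a minimal illustration of the textbook post-stratified total (stratum area times stratum mean, summed over strata), not the exact estimator of the study, whose formulae are given in Appendix A. The function names, stratum labels and all numbers are hypothetical.

```python
from collections import defaultdict


def calibrate_stratum_areas(map_areas, water_strata, official_land, official_water):
    """Scale map-based stratum areas so that land strata sum to the official
    land area and water strata sum to the official water area."""
    land_total = sum(a for h, a in map_areas.items() if h not in water_strata)
    water_total = sum(a for h, a in map_areas.items() if h in water_strata)
    calibrated = {}
    for h, a in map_areas.items():
        if h in water_strata:
            official, total = official_water, water_total
        else:
            official, total = official_land, land_total
        calibrated[h] = a * official / total if total > 0 else 0.0
    return calibrated


def post_stratified_total(plots, stratum_areas):
    """Sum over strata of (stratum area) x (mean plot value in the stratum).

    plots: iterable of (stratum, value per hectare); for a forest area
    estimate the value is 1 for forest land plots and 0 otherwise.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for h, y in plots:
        sums[h] += y
        counts[h] += 1
    empty = [h for h, a in stratum_areas.items() if a > 0 and counts[h] == 0]
    if empty:
        # Empty strata have to be merged with another stratum beforehand.
        raise ValueError(f"no sample plots in strata: {empty}")
    return sum(a * sums[h] / counts[h] for h, a in stratum_areas.items() if a > 0)


# Hypothetical municipality: areas in ha, plot volumes in m3/ha.
areas = calibrate_stratum_areas(
    {"v1": 100.0, "v2": 300.0, "water": 50.0}, {"water"}, 440.0, 60.0
)
plots = [("v1", 50.0), ("v1", 70.0), ("v2", 120.0), ("v2", 80.0), ("water", 0.0)]
print(areas, post_stratified_total(plots, areas))
```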

Stratum-specific sampling variances were estimated by using local quadratic forms of cluster-level residuals (Matérn 1960, p. 110, Tomppo et al. 2011). In the variance estimation, the correlation between the estimates from different strata was taken into account because sample plots within one cluster could belong to different strata (Appendix A). Sampling uncertainty was quantified through relative standard error (SE): [...] where [...] is a post-[...]

[...] (Lempäälä) with a large portion of land use other than forestry. Total volume could be estimated with a relative SE smaller than 10% for municipalities with an area of 390 km² or larger (Fig. 4).
The estimates of pine, spruce and birch volumes were less precise, but SEs were mostly less than 15% in municipalities larger than 390 km² (Fig. 5).
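To make the precision thresholds concrete, the following sketch computes a relative SE and checks it against a target. It assumes the usual definition, 100 × SE / estimate, with the SE taken as the square root of the estimated sampling variance; the function names and the input values are hypothetical.

```python
import math


def relative_se_percent(estimate, variance):
    """Relative standard error in percent: 100 * sqrt(variance) / estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return 100.0 * math.sqrt(variance) / estimate


def meets_precision_target(estimate, variance, target_percent=10.0):
    """True if the relative SE is strictly below the target (10% by default)."""
    return relative_se_percent(estimate, variance) < target_percent


# Hypothetical estimate and variance, not values from the study.
print(relative_se_percent(200.0, 100.0), meets_precision_target(200.0, 100.0))
```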

Appendix A. Post-stratified estimators and their sampling variance
For simplicity, we present the detailed formulae for the post-stratified estimators and their variances for the area of forest land. The estimators are presented for one arbitrary municipality; the numbers and the sums are taken over all sample plots within that municipality, and the stratum areas are likewise the areas within it. The volume estimators were analogous to the area estimators, as explained at the end of this appendix. The motivation behind the basic formulae is discussed by Tomppo et al. (2011, sec. 3.5), for example.
Let [...] denote the number of those sample plots of cluster [...] that belong to stratum [...], and let [...] where [...] is a cluster-level residual, [...] is a group of clusters close to each other, and the weights were determined so that each local quadratic form is an unbiased estimator of the variance of residuals (Matérn 1960, p. 110).
Groups contained either four temporary clusters forming a square, or five clusters comprising a similar square of four temporary clusters and one permanent cluster in the center of the square (cf. Fig. 2). In groups of four clusters, the weights were [...] for the SW and NE corner clusters and [...] for the other two. In groups of five clusters, the permanent cluster in the center received weight [...] and all temporary ones in the corners weight [...]. Since [...]

Fig. 9. Estimated pine volumes on forest land, derived with post-stratification (PS) and MS-NFI by the municipalities in Pirkanmaa province, and confidence intervals (2 × standard error) for PS estimates.
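The comparison shown in Fig. 9 amounts to checking whether a model-based estimate falls inside the design-based interval of the PS estimate. A minimal sketch of that check is given below, assuming an interval of PS estimate ± 2 × SE as in the figure caption; the function names, the record layout and the municipality values are hypothetical.

```python
def ps_confidence_interval(ps_estimate, ps_se, multiplier=2.0):
    """Interval PS estimate +/- multiplier * SE (2 x SE as in Fig. 9)."""
    half_width = multiplier * ps_se
    return ps_estimate - half_width, ps_estimate + half_width


def flag_discrepancies(records, multiplier=2.0):
    """Return names of areas whose model-based estimate lies outside the PS
    interval; records are (name, ps_estimate, ps_se, model_estimate)."""
    flagged = []
    for name, ps_estimate, ps_se, model_estimate in records:
        lower, upper = ps_confidence_interval(ps_estimate, ps_se, multiplier)
        if not lower <= model_estimate <= upper:
            flagged.append(name)
    return flagged


# Hypothetical municipalities and values, not results from the study.
records = [
    ("A", 100.0, 5.0, 111.0),
    ("B", 100.0, 5.0, 109.0),
    ("C", 100.0, 5.0, 90.0),
]
print(flag_discrepancies(records))
```

Flagged areas are candidates for a closer look at the model-based estimate; an estimate inside the interval does not by itself show that the model-based estimate is unbiased.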