High-resolution hybrid MODIS-Landsat estimation of post-monsoon 1 agricultural burned area in northwestern India 2

Department of Earth and Environmental Sciences, Columbia University, New York, 5 USA 6 Department of Earth and Planetary Sciences, Harvard University, Cambridge, USA 7 Department of Ecology, Evolution, and Environmental Biology, Columbia University, 8 New York, USA 9 The Earth Institute, Columbia University, New York, USA 10 School for Environment and Sustainability, University of Michigan, Ann Arbor, USA 11 Department of Mathematics, University of Petroleum and Energy Studies, Dehradun, 12 Uttarakhand, India 13 Environmental Defense Fund, Washington DC, USA 14


Agricultural residue burning in northwestern India 52
India is embracing agricultural mechanisation to increase crop productivity and 53 decrease labour costs in order to feed its rapidly growing population (Mehta et al. 54 2014). Agriculture in India is currently 40-45% mechanised, below that of the United 55 States, Russia, Western Europe, China and Brazil (57-95%) (Bai 2014; Mehta et al. 56 2014). India's population is expected to grow from 1.3 billion in 2015 to 1.7 billion by 57 2050 (UN 2015). This population surge demands sustainable increases in crop 58 productivity, intensity and yield, which in turn affects the rise of agricultural 59 mechanisation. Traditionally, farmers collect crop residue to feed livestock. However, 60 as India mechanises, farmers are using combine harvesters, which leave behind 61 scattered crop residues that are labour intensive to remove manually (Vadrevu et al. 62 2011; Kumar et al. 2015). Consequently, 80-90% of crop residue left behind by 63 combine harvesters is burned in field, which can severely degrade regional air quality 64 seasonally ( to May), wheat residue is burned to prepare fields for sowing the monsoon crop. In 82 general, carbonaceous particles can be transported hundreds of kilometres in the 83 atmosphere (Sharma et al. 2010; Kaskaoutis et al. 2014). Besides air quality degradation 84 and public health impacts, crop residue burning reduces soil quality by depleting 85 organic matter, major nutrients, and microbial biomass (PRSC 2015). This inhibits the 86 productivity of the next cropping season. However, previous work using satellite fire 87 detections and HYSPLIT atmospheric back trajectories suggests that pre-monsoon 88 wheat residue burning is of less concern to the Delhi National Capital Region's air 89 quality than post-monsoon rice residue burning due to different atmospheric transport 90 patterns, higher ventilation from high boundary layer conditions, and less overall fire 91 intensity (Liu et al. 2018

Burned area estimation 204
Previous studies on high-resolution agricultural burned area estimation in northwestern 205 India are generally constrained to 1-2 years of study (e.g. NBR-based thresholds, based on 1-km MODIS active fire detections for selecting 220 burned and unburned training pixels, and is validated with Landsat-derived burned area 221 maps (Giglio et al. 2009). Here we use MCD64A1 as a training dataset due to the lack 222 of extensive ground data and remotely-sensed fire datasets at higher spatial resolution 223 for the duration of the study period and extent of the study region. In addition, we find 224 that Landsat images are too low in spatial resolution for visual interpretation, or to 225 definitively separate bare soil and burned fields and therefore obtain burned and 226 unburned training samples. While the Google Earth collection of DigitalGlobe and 227 CNES/Airbus imagery at sub-meter to meter resolution are viable for visual 228 interpretation, publicly available historical images are limited, often acquired outside 229 the post-monsoon period. Consequently, ModL2T adapts the MCD64A1 algorithm for 230 use with Landsat imagery in GEE. We improve on "baseline" MCD64A1 burned area 231 estimation with a Landsat-driven small fire boostsimilar to the GFEDv4s approach of 232 using active fires to boost MCD64A1that increases the spatial resolution (500 m to 30 233 m) but decreases the temporal resolution (daily to bimonthly) of MCD64A1. 234 We use the near infrared and shortwave infrared SR bands from MODIS/Terra 236 (MOD09A1) and Landsat 5 (TM), 7 (ETM+), and 8 (OLI/TIRS) SR products to 237 estimate NBR (Tables S1, S2). We use MODIS/Terra daily surface reflectance rather 238 than that of Aqua, because the local daytime overpass time of the MODIS/Terra (10.30 239 am)that of the MODIS/Aqua is 1.30 pmis comparable with that of Landsat (10.00 240 am ± 15 minutes). MOD09A1 is a gridded Level-3, validated stage 2 product that 241 selects the best quality pixel over every 8-day period based on several criteria: cloud 242 cover, observation coverage, low-view angle and aerosol loading (Vermote et al. 2008 where T is the NBRmax or NBRmin threshold, Q( ) is the quantile function at percentile 280 of the probability density function, f, of the distribution of NBRmin or NBRmax at burned 281 (X) and unburned (Y) agricultural areas. This approach attempts to balance omission and 282 commission errors. Tmax ranges from 0.635 to 0.706, and Tmin ranges from -0.057 to -283 0.014. Figure 4 shows an example of derived resolution. Sensor-specific differences in spectral band wavelengths and the lack of 296 Landsat availability can also introduce bias (Table S2, Figure S1). Thus, before deriving 297 burned area from Landsat imagery, we correct for bias in Landsat NBR composites by 298 adding the yearly regionally-averaged differences in MODIS and resampled Landsat 299 NBR to Landsat NBR for all Landsat platforms. The compensation for Landsat NBRmax 300 ranges from 0.012 to 0.114, and that for NBRmin ranges from -0.073 to 0.012. In this 301 step, we also combine the MODIS-derived burned area with BAMCD64A1 to minimize 302 omission error generated by differences in the MCD64A1 and ModL2T algorithms. 303 [FIGURE 4] 304 Next, to merge the separately derived MODIS and Landsat classified burned 305 area, we 'carve' out moderate-resolution MODIS burned pixels with high-resolution 306 Landsat burned pixels ( Figure S1). That is, we are more confident in Landsat to 307 distinguish between burned and unburned fields, whereas MODIS more severely 308 homogenizes large aggregates of individual landholdings due to its coarser spatial 309 resolution. However, due to Landsat's coarse temporal resolution, we are not confident 310 in Landsat to accurately capture the highest NBRmax and lowest NBRmin when its usable 311 data availability is temporally-sparse and/or biased. for composites of best quality Landsat pixels from different acquisition dates and 318 sensor-specific differences in spectral band wavelengths (Table S2). While 0.1 is an 319 arbitrary selection, a large departure of Landsat from MODIS NBR indicates that pixels 320 of available Landsat scenes are generally cloudy and/or do not capture scenes near peak 321 monsoon growing season (NBRmax) and/or in the post-burning (NBRmin) period when 322 the burn scar is still visible. Furthermore, it may be the case that there are some Landsat 323 observations in the two-month windows for the pre-fire and post-fire collections, but the 324 acquisition dates of 'best quality' Landsat pixels may not be close to that for MODIS 325 pixels. In the last step, we apply an agricultural mask based on GlobeLand30 land 326 cover. The final ModL2T-derived burned area (BAModL2T) is an estimate of the total 327 post-monsoon agricultural burned area at the Landsat 30-m resolution. 328 We also assign confidence scores to BAModL2T on a pixel-by-pixel basis by 329 designating different categorical values to burned area derived from MCD64A1, 330 Landsat-only ModL2T, and MODIS (MOD09A1)-only ModL2T. We are most 331 confident in MCD64A1 and least confident in MODIS-only ModL2T, so we assign 332 BAMCD64A1 a value of 3, Landsat-only BAModL2T a value of 2, and MODIS-only 333 BAModL2T a value of 1. Adding these burned area layers together yields a confidence 334 scale from 1 (low) to 6 (high) ( Table S4). 335

MCD64A1-based geographical accuracy assessment 336
We use MCD64A1 as the reference dataset in a geographic accuracy assessment of the 337 two-tailed threshold burned area classification algorithm. Here, we compare MCD64A1 338 with MODIS (MOD09A1)-only BAModL2T in order to evaluate the burned area 339 classification algorithms on a pixel-by-pixel basis at the MODIS 500-m resolution. We 340 estimate Cohen's kappa coefficient (κ), which evaluates the agreement between the 341 reference and test classification after random chance is removed (Cohen 1960 (Giglio 2015). This assessment is based on the fraction of 370 VIIRS active fires co-located within the classified burned area; a higher fraction 371 indicates a lower omission error. BAModL2T and BAMCD64A1 are first resampled to 1 km 372 to account for off-nadir MODIS and VIIRS pixel area. 373 values represent hazy conditions and generally poor air quality. We use Level-2 AOD 389 product from MODIS/Terra, operationally available at 3 km and 10 km pixel resolution, 390 to assess detrended correlation with BAModL2T (Table S1) for differentiating between dark and bright land areas. In this study, we use the best-402 quality retrievals of the combined DT/DB AOD data (for only quality flag = 3 403 retrievals). Additionally, the 3 km AOD retrievals are also used to analyse spatial 404 distribution of aerosol loading at a higher resolution and study relationship with burned 405 area. The 3 km AOD data are based on DT retrievals, limited to vegetated pixels, which 406 cover the majority of Punjab and Haryana. The uncertainty of the 3 km AOD retrieval is 407 reported as ±(0.05 + 0.15τ) (Munchak et al. 2013), where represents AOD. 408

Landholdings and combine harvesters 409
We consider ancillary data in landholding size and combine harvester use to assess 410 trends in farm fragmentation and mechanisation. The Agricultural Census division of 411

Methods of crop residue burning 428
In a field visit, Kumar

Comparison to MCD641 burned area estimates 456
The strength of agreement (Cohen's κ) between BAMCD64A1 and MODIS-only BAModL2T 457 is consistent and ranges from 0.4-0.53 (moderate) (Landis and Koch 1977). Overall 458 accuracy ranges from 82-89%. ModL2T averages 66 ± 31% higher post-monsoon 459 burned area than MCD64A1 in Punjab and Haryana from 2003-2016 ( Figure 6, Table  460 S3). We estimate 49-72% of BAModL2T with good confidence (score ≥ 3) ( Figure S2) We validate BAModL2T with independent household survey results from 2016. 477 We compare post-monsoon village-level survey crop residue burning rates, normalised 478 by landholding size, with BAModL2T expressed as a fraction of cropland area. The 479 village-level fraction of surveyed households that burn crop residue is moderately 480 correlated with fractional BAModL2T (r = 0.62, p < 0.01) (Figure 8(a)). In contrast, 481 BAMCD64A1 achieves a weaker correlation of r = 0.54 (p < 0.01) and tends to cluster at 482 fractions burned of 0 or 1, likely due to its moderate spatial resolution (Figure 8(b)). 483 BAMCD64A1 and BAModL2T explain 28% and 37% of variability (adjusted R 2 ) in survey 484 burn rates, respectively, indicating that BAModL2T is better able to capture variability in 485 the 'ground truth' burn rates. 486

Trends in landholding size and combine harvesters 535
The median landholding size in Haryana (1-2 ha) is smaller than that of Punjab (2-3 ha); 536 only ~0.5% of landholdings in Haryana and ~1% in Punjab are over 20 ha (Figure 9). 537 After some consolidation of small landholdings from 1995-96 to 2000-01, landholdings 538 were increasingly fragmented from 2000-01 to 2010-11. Landholdings smaller than 7. Landsat imagery has been primarily limited by: (1) its low temporal resolution (16 days) 568 and (2) storage and computing power. To minimise these limitations, we implement a 569 hybrid MODIS-Landsat approach in Google Earth Engine, a cloud-computing platform 570 with petabyte-scale storage, to rapidly process large collections of MODIS and Landsat 571 imagery and expand the spatio-temporal range of study. 572 Here we aim to improve BAMCD64A1 by extrapolating from the MCD64A1 573 training data, which we assume to be valid, and adding Landsat SR as an input. We 574 caution that use of MCD64A1 as a training dataset should be amended with availability 575 of ground data or fine-resolution multispectral imagery. Given this limitation, ModL2T 576 improves on MCD64A1 from the spatial resolution rather than the algorithm 577 perspective. The higher average contribution of Landsat-only BAModL2T (33%) over 578 MODIS-only BAModL2T (6%) to overall BAModL2T confirms that additional burned area 579 from ModL2T relative to MCD64A1 is primarily driven by integration of Landsat 580 imagery rather differences in the ModL2T and MCD64A1 algorithms. 581 In comparison to MCD64A1, the ModL2T algorithm estimates on average 66 ± 582 31% higher burned area in Haryana and Punjab during post-monsoon, from 2003-2016. 583 We validate the BAModL2T with survey data from 2016. The higher correlation (r = 0.62, 584 p < 0.01) between village-level fractions of households that burn crop residue, 585 normalised by landholding area, and BAModL2T, compared to BAMCD64A1 (r = 0.54, p < 586 0.01), of total village cropland area suggests that the ModL2T algorithm can estimate 587 burned area with increased accuracy. According to this validation, both ModL2T and 588 MCD64A1 tend to underestimate burned area in northern Punjab villages and 589 overestimate that in northeastern Haryana villages . The homogenous definition of the  590  time range for pre-fire and post-fire collections for the ModL2T algorithm may have  591 restricted burned scar detection. For example, the northern Punjab districts of 592 Kapurthala and Jalandhar tend to burn earlier than other districts. Thus, more spatially 593 dynamic temporal specifications of the pre-fire and post-fire image collections and 594 detailed knowledge of the cropping patterns may decrease omission errors. 595 In additional assessments, we find that BAModL2T improves on BAMCD64A1 in 596 terms of omission error, comparison with previous estimates of burned area, and 597 relationship with satellite AOD. First, we find that BAModL2T captures 95-100% of 598 VIIRS active fires within its extent, while BAMCD64A1 is only co-located with 69-76% of 599 VIIRS active fires. Second, BAModL2T improves on BAMCD64A1 in terms of mean 600 absolute AOD can be influenced by other local and regional post-monsoon pollution sources, 627 such as urban and industrial emissions and Diwali festival fireworks (Cusworth et al. 628 2018). While the % valid pixels used for estimating mean regional AOD is relatively 629 consistent across years (38 ± 3%), Cusworth et al. (2018) found that the MODIS cloud 630 algorithm confuses thick haze with clouds, implying underestimation of AOD for days 631 with severe haze, as in November 2016. 632 633   BAMCD64A1, which the GFEDv4s fire emissions inventory relies on, is derived from  634 MODIS, a moderate-resolution satellite (500 m). In India, however, the average 635 landholding tends to be comparatively small and fragmented (Misri 1999 a result of the demand for  729  higher crop productivity and agricultural mechanisation, crop residue burning rates may  730  accelerate unless alternative, more sustainable methods become viable and cost-time  731 effective. can improve accuracy in burned area estimation and fire emissions inventories for more 738 recent years of study (e.g. Wang et al. 2017). For example, the emissions factor for 739 partial burning may be higher than whole field burning, but its burn scar is sub-740 landholding size and its emissions footprint is therefore difficult to estimate even at 741

Limitations of burned area algorithms in northwestern India
Landsat resolution. Fine-resolution sensors can be used to distinguish the spatial 742 patterns of the burning practices to better inform fire emissions inventories retroactively 743 and proactively. Additionally, the coupling of cloud computing and geospatial datasets 744 in GEE makes near-real time analysis possible for policy and management decisions 745 (Gorelick et al. 2017 to train and validate algorithms. 754

Conclusion 755
The two-fold problem of satellite spatial and temporal limitations poses a difficult 756 challenge for estimating burned area from agricultural fires. In particular, the small 757 landholdings in the region and the short duration of agricultural fires require both high 758 spatial and temporal satellite resolution. MODIS burned area product MCD64A1 is 759 limited by moderate spatial resolution (500 m), and the GFEDv4s small fires boost to 760 MCD64A1 further limits the spatial resolution (0.25°). In this study, we develop a 761 hybrid approach (ModL2T) that leverages the temporal resolution of MODIS (daily, 762 500 m) and spatial resolution of Landsat (every 16 days, 30 m) in a two-step NBR-763 based classification. Additionally, we use the Google Earth Engine platform to rapidly 764 run the ModL2T algorithm using all available MODIS and Landsat images within the 765 defined pre-fire and post-fire time periods to classify post-monsoon (October to 766 November) burned area. The ModL2T algorithm estimates 66 ± 31% higher post-767 monsoon burned area than MCD64A1 in Punjab and Haryana from 2003-2016. In 768 future work, the high-resolution BAModL2T (30 m) dataset, which moderately well agrees 769 (r = 0.62) with independent household survey results, can be used to build an emissions 770 inventory for post-monsoon agricultural fires in Punjab and Haryana and re-evaluate -771 and likely previously underestimatedregional public health effects. Lastly, the 772 methods described in this study may be useful in other regions with high concentrations 773 of small fires and in improving global fire emissions inventories currently based on 774 moderate-resolution satellite products. 775

Acknowledgements 776
We acknowledge the Columbia University Department of Earth and Environmental 777 Sciences Young Investigator Award and Earth Institute Research Assistantship program 778 for support for this work, as well as the Columbia University President's Global 779 Innovation Fund. This work was also supported by a National Science Foundation 780 Graduate The study area is bounded by a red box. 1026 Punjab and Haryana than MCD64A1. The curved arrows denote the relative increase in 1054 burned area mapped by ModL2T compared to MCD64A1. 1055 +73% +110% +112% +73% +107% +54% +70% +33% +35% +92% +73% +32% +31% +33% ModL2T