Discriminating the Severity of Basal Stem Rot Disease in Oil Palm (Elaeis guineensis Jacq.) Plantation Using Sentinel-2

Oil palm remains the most prominent plantation sector in Indonesia. Monitoring large-scale plantations requires specific datasets and approaches. The extent of plantations has long been observed using remotely sensed data. Given the complex issues in oil palm plantations, attention has also turned to the utility of earth-observing satellites for estimating foliar nutrients and identifying the impact of plant diseases. The latter is an emerging research interest, especially basal stem rot (BSR) caused by Ganoderma boninense. In this research, we propose the use of freely available multispectral Sentinel-2 data, taking advantage of its frequent observation period. Preliminary discrimination of BSR impacts was examined using a popular machine learning approach, i.e. random forests. Ground reference data were collected in the Cikasungka plantation, Bogor, Indonesia. Using a tuned random forest model, we obtained an overall accuracy of about 82% for five distinct targets, i.e. four severity levels and normal trees. This research suggests that a tuned random forest model could be invaluable for constructing a proper machine learning model that is adaptive to data feeds.


Introduction
Oil palm (Elaeis guineensis Jacq.) plantations produce crude palm oil (CPO), which is often used as cooking oil, industrial oil, and fuel. The oil palm tree originated in West Africa and became popular after the industrial revolution, which led to a high demand for palm oil. Oil palm is the most prominent plantation sector in Southeast Asia, especially in Malaysia and Indonesia. In the coming years, the largest investment in the plantation sector is estimated to remain in oil palm.
In Indonesia, oil palm plantations are generally managed by state-owned or private corporations, as well as by small-scale farmers. Managing large plantations is a challenge in terms of monitoring schemes; therefore, remote sensing technology is an efficient and effective means of collecting plantation-related data. The plantation sector has increasingly utilized remote sensing technology; details are presented elsewhere, for instance [1][2][3]. This also applies to oil palm plantations.
Investigations using numerous multispectral sensors with varying spatial resolutions have been carried out in the tropics. Within the context of oil palm plantations, this research includes estimating stand age [4], detecting oil palm plantation areas and their dynamics [2,5], estimating yield [6], identifying nutritional deficiencies [7], and monitoring plant health [8]. This suggests that research foci have become more extensive over time; hence, it is necessary for earth observers to further assess the utility of such sensors.
Various studies have presented potential uses of imaging technologies and techniques for detecting disease in plants [9][10][11]. Pathogens have a detrimental impact on stomatal conductance, photosynthetic pigments, and photosynthetic rates in plants. In addition, noxious compounds in pathogens could damage plants in a way that modifies the reflectance of their leaves [12,13]. Another study using QuickBird imagery to identify diseases and their spatial patterns showed that very high resolution images could be used to estimate the location and extent of oil palm infection [14].
A survey of the literature indicates that one of the prominent problems in oil palm plantations is the impact of basal stem rot (BSR) disease, which is caused by the fungus Ganoderma boninense. This disease has been a concern, causing significant tree mortality in several Indonesian plantations. G. boninense degrades lignin into carbon dioxide, water, and cellulose. BSR can easily be identified in the field. Common visual symptoms include unopened spear leaves, yellowing and necrotic leaves, and small canopy coverage. At an advanced stage, fungal fruiting bodies and fallen stems can be observed. In current practice, the identification of symptoms is based on manual observation, which is expensive and time-consuming.
For this reason, an alternative approach to detecting BSR disease should be developed, based on visual symptoms that could later be translated into the severity of BSR in a spatial unit. Discrimination of BSR severity can be carried out using machine learning models, including Random Forest (RF).
Studies have shown that RF algorithms can produce better classification results [15][16][17]. RF requires little preprocessing because it is insensitive to varying data units and can handle unbalanced data [18]. In a study using very high resolution QuickBird imagery for detecting BSR disease, RF was the best at predicting, classifying, and mapping the disease in comparison to Support Vector Machine (SVM) and classification and regression tree (CART) [19]. Nonetheless, research employing multispectral images mostly relies on images that are not freely available, and some of these have a long revisit time, which limits practical implementation. Freely available remote sensing datasets, including Sentinel-2, are therefore better suited for frequent observation. Unfortunately, the sensor has yet to be fully exploited for BSR detection and mapping.
For this reason, research on the identification of BSR disease and the investigation of its severity levels using Sentinel-2 is required to assist the monitoring of oil palm plantations. This research therefore aimed (1) to map the distribution of affected oil palm through a machine learning approach and (2) to investigate the role of vegetation indices available from Sentinel-2 data in discriminating the severity levels.

Data Collection and Analysis
The general processing steps are shown in Figure 2. They comprise field observation, pre-processing of Sentinel-2 data, extraction of pixel values at the ground sample locations, RF classification in R software, and production of distribution maps of BSR disease severity.
First, the coordinates of sample trees, both healthy and affected, were recorded using a handheld navigation system. Severity was labelled in situ, based on four different symptoms (Table 1) [20]. In total, 85 samples were taken during field surveys.
In this research, we used Sentinel-2A multispectral imagery acquired on 23 February 2021. The image was downloaded from the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home). It was preprocessed using the Sentinel Application Platform (SNAP) by applying subset and resample procedures. Preprocessing was straightforward since level 2A Sentinel-2 data are bottom-of-atmosphere reflectance. In order to investigate the statistics, pixel values were extracted from the imagery. The extracted pixels were determined by a circular polygon (buffer) around each field sample, created using QGIS 3.16 software. This research used a buffer of 7.5 m diameter around the sample coordinates, considering planting space. To ease the analysis, only bands with spatial resolutions of 10 m and 20 m were considered (Table 2). With this setting, band 1 (coastal aerosol), band 9 (water vapor), and band 10 (SWIR-Cirrus), which have 60 m spatial resolution, were excluded. All selected bands were resampled to 10 m for further analysis.
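The band selection and resampling described above can be illustrated with a minimal Python sketch. It is not the SNAP workflow used in the study: the band-to-resolution table reflects the native Sentinel-2 MSI resolutions, while the helper names `select_bands` and `upsample_nearest` are hypothetical, and nearest-neighbour replication is an assumption, since the text does not state which resampling method was applied.

```python
# Native spatial resolution (metres) of the Sentinel-2 MSI bands.
S2_BAND_RESOLUTION_M = {
    "B1": 60, "B2": 10, "B3": 10, "B4": 10, "B5": 20, "B6": 20, "B7": 20,
    "B8": 10, "B8A": 20, "B9": 60, "B10": 60, "B11": 20, "B12": 20,
}


def select_bands(resolutions, max_resolution_m=20):
    """Keep bands whose native pixel size is at most max_resolution_m."""
    return [band for band, res in resolutions.items() if res <= max_resolution_m]


def upsample_nearest(grid, factor):
    """Replicate every pixel factor x factor times (nearest-neighbour upsampling).

    ASSUMPTION: the resampling method is not given in the text; this is only
    the simplest way to bring a 20 m band onto a 10 m grid (factor = 2).
    """
    out = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


selected = select_bands(S2_BAND_RESOLUTION_M)
print(selected)       # the 10 m and 20 m bands retained for modeling
print(len(selected))  # number of retained bands

# A toy 2 x 2 patch of a 20 m band brought onto a 10 m grid.
patch_20m = [[1, 2], [3, 4]]
patch_10m = upsample_nearest(patch_20m, factor=20 // 10)
print(patch_10m)
```

Dropping the three 60 m bands leaves ten bands, which agrees with the ten predictor variables mentioned in the modeling description.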
Vegetation indices were retrieved to understand their role in discriminating the severity levels of BSR. The values are generated by combining data from multiple spectral bands into a single image. In this research, the evaluated vegetation indices included the Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Green Blue Normalized Difference Vegetation Index (GBNDVI), Atmospherically Resistant Vegetation Index (ARVI), Soil Adjusted Vegetation Index (SAVI), and Simple Ratio (SR). Details are given in Table 3.
In this study, image analysis and prediction employed RStudio software. Pixel values obtained from 10 bands of Sentinel-2 imagery were used as variables in modeling. Modeling using machine learning involves data ingestion and partitioning, statistical modeling, prediction, and accuracy measurement. All samples were split randomly, taking into account the amount of data contained in each severity class. The data were separated into training and testing sets using a ratio of 75% for training and 25% for testing [27]. Eighty-five polygons were rasterized into 153 pixels; hence, the data were split into 114 and 39 pixels for training and testing, respectively (Table 4). This research employed the random forest (RF) learning algorithm for classification [28]. Specifically, the randomForest R package was chosen as the RF classifier, allowing the number of trees (ntree) parameter to be tuned. A confusion matrix was used for evaluating models. The last step was producing the distribution of trees infected with G. boninense, by severity, based on the best RF model.
Table 5 shows the confusion matrix, which represents the accuracy of a model using the default setting. Overall accuracy was about 82%. Table 5 indicates that the severe class (level 4) tended to have higher accuracy, while the remaining classes varied, ranging from 70% to 86%. The distinctly higher accuracy of level 4 was due to the absence of tree canopy at this level (Figure 3).
This suggested that distinctive reflectance values were clearly observed and that the RF model was able to classify the target well. A study based on the Jeffries-Matusita distance indicated an outcome comparable to this research, especially between healthy trees and light severity trees [29]. Oil palm canopy was found to be correlated with some spectral bands. Leaf pigment and canopy reflectance indicate that chlorophyll content serves as a good indicator of photosynthesis and plant health conditions [30].
Figure 3. BSR symptoms in oil palm based on severity levels: (a) healthy, (b) light, (c) medium, (d) critical, (e) very critical/death
A machine learning model may not be optimal under all conditions. Tuning the parameters of machine learning models is therefore suggested, in order to explore suitable parameter values and understand their potential utilization [2,16,17]. While the ntree parameter is generally pre-determined as 500, this setting may not be optimal. Although the acquired data were fairly imbalanced, this research indicated that the RF model could adapt, similar to the finding of previous research [31]. The model was applied to the whole Sentinel-2 image using the raster package. Figure 4 shows the distribution of reflectance values based on the classification of BSR disease severity. In general, these two oil palm blocks have been partially impacted by Ganoderma.
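The data partitioning and accuracy assessment described in this section can be sketched in Python as follows. This is a minimal illustration, not the R workflow used in the study. It reads "considering the amount of data contained in each severity class" as a stratified 75/25 split, which is an assumption. The class sizes and the confusion matrix are hypothetical placeholders and are not the values of Tables 4 and 5, so the resulting train/test counts depend on rounding and need not match 114 and 39. The function names are likewise illustrative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Split sample indices into train/test sets, sampling within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


def overall_accuracy(confusion):
    """Overall accuracy = correctly classified samples / all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# HYPOTHETICAL class sizes summing to 153 pixels (the real ones are in Table 4).
class_sizes = {"healthy": 40, "light": 32, "medium": 28,
               "critical": 28, "very critical": 25}
labels = [name for name, count in class_sizes.items() for _ in range(count)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))

# HYPOTHETICAL 5 x 5 confusion matrix over 39 test pixels (rows = reference,
# columns = prediction); it is NOT Table 5, only an example with 32 pixels on
# the diagonal.
hypothetical_confusion = [
    [7, 1, 0, 0, 0],
    [1, 6, 1, 0, 0],
    [0, 1, 6, 1, 0],
    [0, 0, 1, 6, 0],
    [0, 0, 0, 1, 7],
]
print(round(overall_accuracy(hypothetical_confusion), 3))
```

As a point of arithmetic, 32 correct predictions out of 39 test pixels gives 32/39 ≈ 0.82, which is consistent with the reported overall accuracy of about 82%, although the actual count is not stated in the text.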

Sentinel-2 Vegetation Indices
Visible and NIR bands of Sentinel-2 were entered into band math formulas (Table 3) to evaluate their role in discriminating the severity levels. Theoretically, near-infrared radiation is strongly reflected by healthy vegetation; hence, these bands are suitable for distinguishing healthy from unhealthy plants. Figure 5 depicts the behaviour of vegetation indices according to severity levels.
NDVI makes use of the physical phenomenon of light reflected from leaves, scaling the responses into a range of -1 to 1. In general, high NDVI values were found in healthy stands (Figure 5). Lightly impacted trees were found to be indistinguishable from healthy trees with this measure. This might be due to the existence of green leaves in both levels 1 and 2. In devastated trees, chlorophyll was limited; hence, NDVI values were reduced. Similar patterns were found for all vegetation indices except GNDVI. Due to the limited datasets and the extent of the GNDVI assessment, the reason is yet to be fully understood. It was suspected that this was due to the use of the green band in GNDVI instead of the red band. This warrants future research exploring the deviation from the pattern. In general, this research found that the contributions of vegetation indices were limited. With the possible diminishing effect in RF [32], adding vegetation indices to a set of spectral bands should be done with care.
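The band math behind the six indices can be sketched in Python as follows. Table 3 is not reproduced here, so the formulas below are the commonly cited definitions and may differ in detail from those used in the study. In particular, the SAVI soil factor `L = 0.5` and the ARVI weighting `gamma = 1` are assumed typical values. The inputs correspond to Sentinel-2 bands B2 (blue), B3 (green), B4 (red), and B8 (NIR), and the example reflectances are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index, bounded to [-1, 1]."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: the green band replaces the red band."""
    return (nir - green) / (nir + green)


def gbndvi(nir, green, blue):
    """Green Blue NDVI: the sum of green and blue replaces the red band."""
    return (nir - (green + blue)) / (nir + (green + blue))


def arvi(nir, red, blue, gamma=1.0):
    """Atmospherically Resistant VI; gamma = 1 is an assumed typical value."""
    red_blue = red - gamma * (blue - red)
    return (nir - red_blue) / (nir + red_blue)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted VI; soil_factor L = 0.5 is an assumed typical value."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def simple_ratio(nir, red):
    """Simple Ratio of NIR to red reflectance."""
    return nir / red


def vegetation_indices(blue, green, red, nir):
    """Compute all six indices for one pixel of surface reflectance."""
    return {
        "NDVI": ndvi(nir, red),
        "GNDVI": gndvi(nir, green),
        "GBNDVI": gbndvi(nir, green, blue),
        "ARVI": arvi(nir, red, blue),
        "SAVI": savi(nir, red),
        "SR": simple_ratio(nir, red),
    }


# HYPOTHETICAL reflectances for a densely vegetated pixel (B2, B3, B4, B8).
pixel = vegetation_indices(blue=0.05, green=0.08, red=0.10, nir=0.50)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

Because GNDVI and GBNDVI do not use the red band at all, they can respond differently from the red-based indices when red reflectance changes, which is in line with the suspicion raised above about the deviating GNDVI pattern.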

Conclusion
This research showed that applying an RF model to Sentinel-2 data yielded an overall accuracy of about 82%. This indicates that RF can be invaluable in machine learning modeling for distinguishing disturbance in oil palm plantations. As a readily available dataset, Sentinel-2 is invaluable for supporting better management practices in oil palm plantations. Additional datasets such as vegetation indices contributed little to the discrimination, although they helped to distinguish oil palm trees severely impacted by Ganoderma.