CANOPY NITROGEN ESTIMATION ON COTTON PLANT USING SATELLITE IMAGERY

Optimizing nitrogen (N) management is becoming a key challenge in enhancing crop yield while protecting the environment. Analysis of canopy N content in crop plants provides insights for fertilization management, so that actions can be taken to optimize N fertilizer usage. Traditionally, lab chemical processing is used to measure a crop plant’s nutrient content. However, collecting leaf samples from the field is labour intensive, and increasing the sampling frequency would be costly. Thus, this approach may not be optimal for large plantations. Remote sensing applications in agriculture have been widely studied. This study aims to evaluate the potential of using Sentinel-2 imagery to predict canopy N content as a wide-scale alternative to traditional methods. A cotton plantation of about 50 square km in the state of Mato Grosso, Brazil, was used as the case study. About 180 samples across the cotton plantation were collected between March and April 2022, and the N contents of the crop plants were measured using lab chemical processes. Sentinel-2 images within 15 days of the sampling dates were retrieved from ESA’s Copernicus Open Access Hub. This study proposes a Random Forest (RF) regression algorithm for the generation of an N prediction model. A total of 52 vegetation indices (VIs), such as the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), were extracted as features for model training. The RF model allows easy measurement of the relative importance of each feature with respect to the prediction, which supports achieving good performance. Validation was done using mean absolute error (MAE) and mean absolute percentage error (MAPE) to evaluate the prediction accuracy against the ground truth; these were 3.418 g/kg and 9.29%, respectively. Finally, this study analyses the performance of the canopy N prediction model and assesses its ability to serve as an alternative to traditional lab chemical sampling processes.


INTRODUCTION
Increasing global demand for agricultural products has resulted in more intensive farming that drains nutrients from the soil (Purwanto & Alam, 2019). Application of fertilizer replaces the nutrients for crop uptake, which would otherwise be insufficient. However, over-fertilisation of crop land occurs worldwide and may cause environmental issues due to surface run-off into the natural environment (Ritchie, 2021; Sishodia et al., 2020). Furthermore, application of excess fertiliser adds cost. There is therefore a need to identify the optimal amount of fertiliser that minimises cost while maintaining yield. This optimal amount depends on the plant status, which in turn requires an accurate estimate of the plant nutrient status.
Nitrogen is an important element in the plant, and is present in chlorophyll, amino acids and hence protein, nucleic acids, plant tissue, etc. (Buchholz, 2022). Traditionally, the N level is obtained via lab chemical processing; however, this process takes time and increases operational costs (Farella et al., 2022). Furthermore, the sampling results only apply to that set of leaf samples and do not represent the entire crop field. There is a need to consider alternative methods of estimating leaf nutrients that are cheaper and faster.
Satellite-based remote sensing provides an advantage in wide-area monitoring and has been used for many different applications in agriculture, such as land use and crop classification, soil health and moisture, and vegetation health (Sishodia et al., 2020). Satellites such as the European Space Agency's Sentinel-2 provide global coverage over land once every 5 days and have a wide imaging swath of 290 km (Sentinel-2). The Sentinel-2 L2A product has 12 bands ranging from visible to short-wave infrared, at spatial resolutions ranging from 10 m to 60 m. It is provided for public access and is atmospherically corrected, hence it is a surface reflectance product. Band combinations, such as band ratios, normalised band differences, or more complex formulae, can be derived for different purposes. Many indices have been derived for agricultural purposes, such as the Normalised Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI). These vegetation indices (VIs) have their own advantages and limitations. For example, NDVI is useful for broadly determining the vigour of a vegetated area, but it suffers from saturation: the NDVI value loses sensitivity beyond a certain vegetation density (Pettorelli et al., 2005).
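As an illustration of how such band combinations are formed, the following is a minimal sketch of NDVI and EVI using their standard formulations. It assumes surface reflectance already scaled to the 0 to 1 range and the usual Sentinel-2 band assignment (B8 as near-infrared, B4 as red, B2 as blue); the function names and the sample reflectance values are hypothetical and not taken from this study.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else float("nan")


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    denom = nir + c1 * red - c2 * blue + canopy_l
    return gain * (nir - red) / denom if denom != 0 else float("nan")


# Hypothetical reflectance values for a densely vegetated pixel (not study data).
pixel = {"B8": 0.45, "B4": 0.05, "B2": 0.03}

ndvi_value = ndvi(pixel["B8"], pixel["B4"])
evi_value = evi(pixel["B8"], pixel["B4"], pixel["B2"])
print(f"NDVI = {ndvi_value:.3f}, EVI = {evi_value:.3f}")
```

Both indices contrast near-infrared and red reflectance; EVI additionally uses the blue band and a canopy background term, which is one reason it tends to saturate less than NDVI over dense canopies.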
Machine learning can be used to estimate nitrogen levels in crop leaves from satellite imagery by collecting data, extracting relevant features, training a machine learning model, and validating the model. Marang et al. (2021) proposed a hybrid of random forest regression, DBSCAN, and PCA to predict the N level of cotton crops using hyperspectral UAV and Sentinel imagery. Huang et al. (2015) estimated rice nitrogen status based on satellite imagery, using R-squared to quantify the relationship between vegetation indices and rice N status. Moreover, Tan et al. (2020) proposed partial least squares with remote sensing imagery to predict protein content.
Random forest (RF) regression is one of the common machine learning methods that have been used to estimate foliar nitrogen levels (Abdel-Rahman et al., 2012; Soltanikazemi et al., 2022). RF is an ensemble method that uses a decision tree as its base estimator. It allows easy measurement of the relative importance of each feature with respect to the prediction, which supports achieving good performance. In this study, we used an RF regression model to estimate foliar nitrogen concentrations for cotton crops in Brazil. We aim to validate the performance of the RF regression model on selected indices based on Sentinel-2 and to determine the limitations of the model.

STUDY AREA AND DATA COLLECTION
In this paper, field experiments were conducted from February to April 2022 over a cotton plantation of about 50 square km in the state of Mato Grosso, Brazil, as shown in Figure 1. Leaf samples were collected during the vegetative and flowering stages of the cotton plants, between 70 and 120 days after emergence. The sampling process began by identifying a sampling coordinate (latitude, longitude) within the plantation. Leaf samples were then collected randomly within a 10 m radius of the sampling coordinate. This process was repeated across different sampling coordinates and dates. Afterwards, the samples were handled by a professional vendor, who measured the nitrogen content using chemical laboratory equipment following a standardized procedure. The measured leaf nitrogen concentration of the leaf samples was provided in g/kg. In this study, 180 samples were collected and used to generate and validate our model. The statistics of the sampled nitrogen concentrations are shown in Table 1.
We collected cloud-free Sentinel-2 L2A data that fall within 15 days of the sample date for each sampling coordinate. L2A products were used for our processing because the atmospheric effects have been removed. We then up-sampled all bands to 10 m spatial resolution before extracting the 3x3 context pixels centred on the sampling coordinates. Vegetation indices (VIs) were computed to characterise properties such as the percentage of vegetation cover, chlorophyll content, leaf area, and so on (As-Syakur et al. 2012; Brecht 2018; Broge and Leblanc 2001; Chen 1996; Duong et al. 2017; El-Shikha et al. 2008; Frampton et al. 2013; Gitelson et al. 2002; Hiphen-plant 2022; Huang et al. 2012; Main et al. 2011; Metternicht 2003; NDRE index 2023; Pro.arcgis; Rasul et al. 2018; Sentinel Hub; Vincini et al. 2008; Waqar et al. 2012; Xu 2006; Zhao and Chen, 2005). A total of 52 VIs were computed as handcrafted features, as described in the Appendix. The 52 VIs and 12 bands were then used as the feature inputs to the RF model.
N estimators: [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
Table 2. Settings for Hyperparameter Values as Input to the Random Search CV Algorithm
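The extraction of the 3x3 context pixels described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the band is held as a plain 2D list already resampled to 10 m, the sampling coordinate has already been converted to a (row, column) pixel index, and the window is summarised by its mean. The source does not state how the nine pixels are aggregated, so the averaging step and all function names here are hypothetical.

```python
def extract_window(band, row, col, size=3):
    """Return the size x size block of pixels centred on (row, col)."""
    half = size // 2
    n_rows, n_cols = len(band), len(band[0])
    inside = half <= row < n_rows - half and half <= col < n_cols - half
    if not inside:
        raise ValueError("window falls outside the raster")
    return [
        band[r][col - half: col + half + 1]
        for r in range(row - half, row + half + 1)
    ]


def window_mean(window):
    """Average all pixels in the window (an assumed aggregation choice)."""
    values = [value for line in window for value in line]
    return sum(values) / len(values)


# Hypothetical 5x5 single-band raster; each pixel value is row * 5 + col.
band = [[r * 5 + c for c in range(5)] for r in range(5)]

window = extract_window(band, row=2, col=2)
feature = window_mean(window)
print(window, feature)
```

Using a small neighbourhood rather than a single pixel makes the feature less sensitive to geolocation error, which is consistent with leaf samples being gathered within a 10 m radius of the coordinate.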

PERFORMANCE EVALUATION
Figure 3 shows the histogram of the dataset, where we observe that most N values ranged between 33 and 47 g/kg; we therefore treated values beyond this range as extreme values. In the experiment, we split the dataset randomly into 80% training and 20% testing with respect to the observed value. We observe that the model achieves better N prediction performance where the values fall within the middle range highlighted above. The poorer performance for the extreme values could be attributed to a lack of training data in that range, as training with an imbalanced dataset results in higher errors. In addition, it is believed that the model loses sensitivity at the extreme ends of the distribution and that the dataset still lacks features with high correlation to N. Increasing the variance of the N distribution and the size of the dataset could be a future remedy for this overfitting issue. Moreover, hyperspectral (HS) images could be used to increase the number of features available for training, since they are more sensitive than Sentinel-2 images, which are multispectral.
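The MAE and MAPE metrics used for validation, together with the comparison between the middle range (33 to 47 g/kg) and the extreme values discussed above, can be sketched as follows. This is an illustrative sketch only: the observed and predicted values are hypothetical and are not the study's data, and the function names are assumptions.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs (here g/kg)."""
    if not y_true:
        raise ValueError("no samples")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if not y_true:
        raise ValueError("no samples")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def split_by_range(y_true, y_pred, low=33.0, high=47.0):
    """Separate (observed, predicted) pairs into middle-range and extreme."""
    middle, extreme = [], []
    for t, p in zip(y_true, y_pred):
        (middle if low <= t <= high else extreme).append((t, p))
    return middle, extreme


# Hypothetical observed and predicted N values in g/kg (not study data).
observed = [30.0, 40.0, 50.0, 40.0]
predicted = [35.0, 41.0, 44.0, 38.0]

middle, extreme = split_by_range(observed, predicted)
middle_mae = mae([t for t, _ in middle], [p for _, p in middle])
extreme_mae = mae([t for t, _ in extreme], [p for _, p in extreme])

print(f"MAE = {mae(observed, predicted):.2f} g/kg")
print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"middle-range MAE = {middle_mae:.2f}, extreme MAE = {extreme_mae:.2f}")
```

Reporting the error separately for the two groups makes it visible when the overall MAE is dominated by a few poorly predicted extreme samples.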

CONCLUSION
This study proposed the use of Sentinel-2 imagery and a machine learning method, specifically an RF model, to predict the amount of N present in the canopy of cotton crops, with the hope of replacing the traditional method and the end goal of saving cost and time. The RF model was chosen because it can quantify how important each feature is to the prediction, which supports achieving high accuracy. The 15 most important features for N prediction were Band 11, CSI, Band 1, IRECI, Band 5, MNDWI, BRBA, CCCI_ALT, Band 12, Band 9, NDVI3, MCARI_ALT, BUI, Band 2, and Band 3. The validation metric used was MAE, which was 3.418 g/kg, corresponding to a MAPE of 9.29% on the testing dataset. While the MAE is reasonably low, more could be done, since it was mostly driven by inaccurate predictions of extreme N values. Possible improvements include using hyperspectral imagery instead of Sentinel-2 imagery, and using different VIs selected according to the feature importance given by different methods. Even though VIs may rank differently in importance under different methods, stacking them with these differences in mind could help prevent overfitting and improve accuracy when monitoring agricultural fields. In addition, augmenting the dataset with more data, especially in the extreme range, would help to resolve the imbalanced dataset issue.

Figure 2
Figure 2 shows the overall flowchart of the proposed method, including pixel extraction, feature extraction, and the RF regression model. First, Sentinel-2 images and sample coordinates were used to retrieve the sampled foliar nitrogen values and their respective pixels. The 52 VIs were then computed. Next, the training dataset was generated from the 12 bands and 52 VIs. The best hyperparameters of the RF model were chosen with reference to the training dataset, with the aid of the Random Search CV optimization algorithm. Table 2 shows the hyperparameters that were fed into the optimization algorithm. The dataset was then passed to the model for training.
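The training step of this workflow could be sketched with scikit-learn as shown below. This is a hedged illustration and not the authors' implementation: the data are synthetic stand-ins with the same shape as the study dataset (180 samples, 12 bands plus 52 VIs), the n_estimators list is shortened from the values in Table 2 to keep the run fast, and the max_depth candidates, the number of search iterations, the fold count, and the scoring choice are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 180 samples x 64 features (12 bands + 52 VIs).
X = rng.random((180, 64))
# Fake N values in g/kg that depend on feature 0 plus noise (illustrative only).
y = 40.0 + 10.0 * (X[:, 0] - 0.5) + rng.normal(0.0, 1.0, 180)

# 80% training, 20% testing, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

param_distributions = {
    # Table 2 lists 200 to 2000 in steps of 200; shortened here for speed.
    "n_estimators": [200, 400, 600],
    # Assumed candidates; the other rows of Table 2 are not available.
    "max_depth": [None, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=2,                           # assumed number of sampled settings
    cv=3,                               # assumed number of folds
    scoring="neg_mean_absolute_error",  # assumed selection criterion
    random_state=0,
)
search.fit(X_train, y_train)

model = search.best_estimator_
test_mae = mean_absolute_error(y_test, model.predict(X_test))

# Rank features by impurity-based importance and keep the top 15.
importances = model.feature_importances_
top_features = np.argsort(importances)[::-1][:15]

print("best hyperparameters:", search.best_params_)
print("test MAE (g/kg):", round(test_mae, 3))
print("top feature indices:", top_features.tolist())
```

The search is fitted on the training split only, so the test split remains untouched until the final MAE is computed.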

Figure 5. Prediction plot: (a) using training data, (b) using testing data

Table 1. Statistics of nitrogen samples in g/kg

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-1-2023. 39th International Symposium on Remote Sensing of Environment (ISRSE-39) "From Human Needs to SDGs", 24-28 April 2023, Antalya, Türkiye