Estimating Sugarcane Maturity Using High Spatial Resolution Remote Sensing Images

Sugarcane suffers from the increased frequency and severity of droughts and floods, which negatively affect growing conditions. Climate change has affected cultivation, and the growth dynamics have changed over the years. The identification of the development stages of sugarcane is necessary to reduce its vulnerability. Traditional methods are inefficient at detecting those changes, especially when estimating sugarcane maturity, a critical step in sugarcane production. Hence, the study aimed to develop a cost- and time-effective method to estimate sugarcane maturity using high spatial resolution remote sensing data. Images were acquired using a drone. Field samples were collected and measured in the laboratory for brix and pol values. The Normalized Difference Water Index, the Green Normalized Difference Vegetation Index and the green band were chosen for further analysis because they had the highest correlation with the field samples. Random forest (RF), Support Vector Machine (SVM), and multi-linear regression models were used to predict sugarcane maturity using the brix and pol variables. The best performance was obtained from the RF model. Hence, the maturity index of the study area was calculated based on the RF model results. It was found that the field plot had not yet reached maturity for harvesting. The developed cost- and time-effective method allows temporal crop monitoring and optimizes the harvest time.


Introduction
Sugarcane is a tall perennial grass in the genus Saccharum, used mainly but not solely for sugar production [1]. The plants are usually 2-6 m tall with stout, jointed, fibrous stalks rich in sucrose, which accumulates in the stalk internodes (the space between the joints of the stalk) [2]. Costa Rica is one of the countries that cultivate sugarcane. Costa Rica's sugar agro-industry contributes 0.33% to the national Gross Domestic Product (GDP) and 3.83% to the agricultural GDP, generating more than USD 200 million in revenue annually [3]. The industry is estimated to create 25,000 direct jobs and more than 100,000 indirect jobs, representing 9.3% of agricultural and 1.3% of national employment [3].
Sugarcane crops have a long growth cycle. The temperature and moisture levels should be sufficient for optimal production as the crop is highly weather-dependent. However, over the years, sugarcane has suffered from the increased frequency and severity of droughts and floods, negatively affecting growing conditions [4]. The province of Guanacaste, the primary sugarcane production area in Costa Rica, is one of the most vulnerable provinces in the country in terms of drought and adverse effects of climate change because of its location on the Central American dry corridor, a route associated with a rainfall distribution problem [5]. Climate change has affected the Guanacaste province's sugarcane cultivation and industrial production. Hence, considering the adverse effects of climate change on sugarcane cultivation and, thus, the socioeconomic status of Costa Rica, it is necessary to identify the most vulnerable points in the production cycle and develop strategies to reduce the vulnerability. The identification of vulnerable points in the production cycle will promote its competitiveness and facilitate its adaptation to minimize the negative impacts [6]. However, the development stages of sugarcane must be identified to understand the most vulnerable points in sugarcane production.
There are four main phenological development stages of sugarcane: (i) Germination and establishment: this initial stage is where sugarcane seeds or cuttings start to germinate, and young shoots begin to establish. Roots begin to develop and anchor the plant, and the first shoots break through the soil [7]. (ii) Tillering and canopy development: during this phase, the plant produces additional shoots (known as tillers). These tillers expand to form a canopy, which helps in photosynthesis and plant development [8]. (iii) Grand growth and elongation: the phase of rapid growth where the sugarcane stalks grow tall and the internodes elongate. The plant accumulates biomass and stores energy in the form of sucrose in the stalks [7]. (iv) Maturity and ripening: in the final stage, the sugarcane stalks mature, and the sugar content increases until the cane is ready for harvest. The leaves start to dry, and the stalks change color as they ripen [8]. Typically, each stage takes up to three months, completing a cycle of one year. Usually, in Costa Rica, there are four to eight cycles per plot before replanting, which takes place once yield and profits diminish [9].
One of the most critical points in sugarcane production is the crop's maturation process or the final phase (fourth phenological development phase) [4]. During the final phase, climate variability strongly affects sugarcane growth, causing damage such as reduced sucrose accumulation in the stalks and lower juice quality [4]. With the recent climate variations and environmental changes, agricultural practices have already been changed to overcome this challenge, and sugarcane cultivators have started to use ripening agents. Ripening agents ensure the plant's correct sucrose accumulation and juice quality at the time of harvesting [10]. However, these agents should be used in the right phenological stage of the plant. As previously stated, the crop's maturation process is one of the most critical points in sugarcane production [4] and thus is the benchmark for applying ripening agents. Therefore, estimating maturity is crucial for optimizing harvest timing to maximize sucrose content and minimize yield losses in the sugar industry [11]. According to Singh et al. [12], proper maturity assessment guides farmers in determining the ideal harvesting time, impacting sugar quality, yield, and profitability. The traditional method to identify the maturity stage is to sample the cane, analyze the sucrose concentration of its juice in the laboratory and determine its degree of maturation [6]. This is a time-consuming and costly method due to the density of the plants and the extent of the plantations. Hence, reduced sampling strategies are implemented to overcome financial burdens. Still, those reduced samples do not represent the field variations and thus leave places where the maturity is not accounted for [13]. The complexity and variation of sugarcane and its vital place in the economy drive the development of a time- and cost-effective method for measuring maturity while growing [4].
Simões et al. [13] found that spectral indices such as Simple Ratio (SR), Normalized Difference Vegetation Index (NDVI), and Soil-Adjusted Vegetation Index (SAVI) were highly correlated with agronomic parameters (Leaf Area Index (LAI), Number of Stalks per Meter, Yield, and Total Biomass). The study confirmed that those relations are effective in crop monitoring and yield estimation. Similar results were also found by Lofton et al. [14]. The method specified in Zhao et al. [8] could predict sucrose content in sugarcane with 90% accuracy, significantly higher than the accuracy of the conventional method (around 70%). Cruz-Sanabria et al. [15] showed that their method could identify phenological stages with an accuracy of 92.45% using the indices NDVI, GNDVI, SAVI, Perpendicular Vegetation Index (PVI) and LAI. The study concluded that the proposed method accurately and effectively identifies sugarcane's phenological stages. Yeasin et al. [16] conducted a similar study using machine learning models in combination with multispectral images and reported 88% accuracy in predicting the phenological stage of sugarcane. Another study conducted by Gunnula et al. [17] showed that ground measurements and remote sensing data can effectively assess sugarcane's growth and maturity. NDVI and Enhanced Vegetation Index (EVI) were good indicators of sugarcane growth, while LAI was a good indicator of sugarcane maturity [18]. However, all these studies used space-borne multispectral images, and their spatial resolutions were limited to 10 m to 30 m at best. Hence, there is a demand for efficient methods that use multispectral images of very high spatial resolution due to the significant findings of multiple studies [4,18-20] and the spatial variation of sugarcane fields.
This project aimed to develop a method to estimate sugarcane maturity using high spatial resolution remote sensing data as an alternative to traditional methods. The specific objectives were to (a) identify the optimal remote sensing band combination or indices using the correlation between field samples and remote sensing data; (b) utilize three regression models (two machine learning algorithms and a multiple linear model) to estimate sugarcane maturity; and (c) compare these regression models, using performance metrics, to select the most suitable model and map the spatial distribution of sugarcane maturity over the field.

Study Area and Data
The research was conducted at "Agrícola El Cántaro" farm, located at Bebedero, Guanacaste province, Costa Rica (Figure 1). The extent of the sample field is about 14 ha. Its central coordinates are 10°21′11.87″ N, 85°8′16.56″ W. This field is characterized by a loam soil composition with a particle distribution of 43% sand, 23% clay and 34% silt, obtained from data captured on the field. A surface irrigation method with a surface slope of 1‰ is employed there. From hourly data captured on the field between the last harvest and the remote sensing data capturing date, the cumulative rainfall was 553 mm, and the average temperature was approximately 28 °C.
The primary purpose of this field is to produce sugar, and the variety cultivated is CP 72-2086. This variety was developed by the United States Sugar Corporation Research Farm in Canal Point (CP), Florida [21]. The first two numbers after CP represent the year the first clonal crop resulting from a cross-pollination was planted; the numbers after the hyphen represent the accession number of that cultivar in the year it was named [21]. CP 72-2086 is adapted to South Florida weather and its tropical climate; it prefers sandy organic soils, it yields a medium to high sugar content and tonnage, and its canopy characteristics are high arch and slow canopy closure with medium tillering and stalk size. Usually, the harvest season is from January to mid-March. The variety is suited to mechanical harvest, has a fair to good stubbing ability, and has the advantages of late flowering and a sugar content that increases through the harvest season [21].
This field was in its fourth-year production cycle, with a mature age of 255 days at the moment of the field sampling. The planting pattern is single-spaced with 1.7 m spacing between furrows. Consistent cultural practices, irrigation, drainage, and mechanization were applied uniformly across the plot.
Five representative plots with an extent of 1 m × 1.7 m were selected within the study area. Sugarcane plants inside each plot (Figure 1) were cut and weighed to extract fiber, brix, pol, moisture, purity, and sugar content measurements. These data were extracted by the Quality Assurance Laboratory at Taboga Sugar Mill using the methodology proposed by the Council of Sugarcane, Sugar and Ethanol Producers of the State of Sao Paulo [22]. Each sampling location was surveyed with a Real Time Kinematics (RTK) global navigation satellite system [23] for higher positional accuracy. Accessibility was one of the criteria when selecting sample plots. As sugarcane matures, the plot becomes denser and denser, making it almost impossible to reach some areas. The second criterion was the spatial distribution of plots within the study area. Although we considered a random distribution of plots, the appearance of the crop was also checked. Young plants were not ready to harvest in some areas; hence, attention was given to areas with mature plants.
Multispectral images were captured with a Micasense RedEdge-P camera (AgEagle Aerial Systems Inc., Wichita, KS, USA) [24], mounted on a DJI enterprise Matrice-300 drone (DJI, Shenzhen, China) [25]. These images have five multispectral bands and one panchromatic band. The spectral resolution of each multispectral band is available in Table 1. The panchromatic sensor available in this camera was not used for the analysis. The spatial resolution of 5 cm was achieved by flying the drone 80 m above the ground level.

Method
Figure 2 shows the overall method. The drone images were processed using Pix4Dmapper software (Version 4.9, Pix4D S.A., Prilly, Switzerland) to obtain the radiometrically corrected orthomosaic using the calibration panel and the sensor-specific parameters provided by the manufacturer [26,27]. Then, as shown in Table 2, different vegetation indices were calculated. Different indices were selected based on their spectral response to sugarcane at various phenological stages. For example, when plants mature, they appear greener; thus, indices derived from green and near-infrared bands provide distinct results from those of other spectral bands. Also, when sugarcane ripens, it becomes brownish, and indices derived from the red band are more suitable. After investigating successful studies that used different vegetation indices, a potential, best-fitted list was compiled (Table 2). NDVI indicates vegetation health, more specifically, the level of leaf chlorophyll, and thus is valuable for vegetation health and yield analysis [8]. The Green Normalized Difference Vegetation Index (GNDVI) is an index similar to NDVI but uses the green band instead of the red band [13]; it helps to estimate a plant's chlorophyll content. The Normalized Difference Water Index (NDWI) illustrates moisture and helps to determine the water stress in plants [8]. This is an important aspect to consider regarding sugarcane: when there is a lack of water or water stress, the crop starts accumulating sucrose [28]. The Simple Ratio (SR) and Ratio Vegetation Index (RVI) were used by Simões et al. [13] in their research to estimate sugarcane yield and biomass successfully. INSEY1 and INSEY2 are based on NDVI but adjusted with the Cumulative Growing Degree Days (CGDD) and by Days from Harvest (DFH), respectively (Table 2). This was successfully used by Lofton et al. [14] to estimate sugarcane yield potential.
Then, the sugarcane coverage was isolated in the study area by classifying the orthomosaic. Four general classes were identified within the study area: sugarcane, shadows, soil, and other vegetation. For each class, over two hundred training polygons were created to ensure higher accuracy in the classification results. After that, the Maximum Likelihood classification algorithm was applied. The classification algorithm was selected based on the characteristics of the orthomosaic and the variety planted. Once the classification was completed, the sugarcane coverage within each field sample was extracted.
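As an illustration of how the band-ratio indices listed in Table 2 could be computed for a single pixel, the following is a minimal sketch and not the processing chain used in the study. The function names are hypothetical, the inputs are assumed to be reflectance values for the near-infrared, red and green bands, and NDVI is written in its standard form (NIR − R) / (NIR + R). NDWI is left out because its band combination is not stated in the text.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None


def vegetation_indices(nir, red, green):
    """Compute the band-ratio indices for one pixel (hypothetical helper)."""
    return {
        "NDVI": normalized_difference(nir, red),     # standard NDVI definition
        "GNDVI": normalized_difference(nir, green),  # green band replaces red
        "SR": nir / red if red != 0 else None,       # Simple Ratio
        "RVI": red / nir if nir != 0 else None,      # Ratio Vegetation Index
    }


def insey(ndvi, divisor):
    """In-season estimate of yield: NDVI divided by CGDD or by a day count."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return ndvi / divisor


pixel = vegetation_indices(nir=0.5, red=0.1, green=0.2)
```

In practice the same arithmetic would be applied to whole raster bands rather than to one pixel at a time.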

Table 2. Vegetation indices.
Name | Equation | Reference
Normalized Difference Vegetation Index (NDVI) | (NIR − R) / (NIR + R) |
Green Normalized Difference Vegetation Index (GNDVI) | (NIR − G) / (NIR + G) | [30]
Normalized Difference Water Index (NDWI) | |
Simple Ratio (SR) | NIR / R | [31]
Ratio Vegetation Index (RVI) | R / NIR | [14]
INSEY 1 | NDVI / CGDD | [15]
INSEY 2 | NDVI / DFP | [15]
Cumulative Growing Degree Days (CGDD) | ∑ |

Random points were generated within each sample plot, and spectral values for each band and vegetation index (VI) were extracted. The correlations between spectral band values, vegetation indices, and laboratory-analyzed data were calculated using the Pearson correlation coefficient, which provides a statistical measure of the strength of linear association between these variables. A correlation coefficient with an absolute value greater than 0.6 is generally considered to indicate a strong relationship [33]. Hence, all variables whose correlation exceeded ±0.6 were selected for further analysis. After that, regression models were created to predict brix and pol using two machine learning algorithms, Random Forest [34] and Support Vector Machine [35], and a Mixed Linear Model (MLM).
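The screening step described above can be sketched as follows. This is an illustrative example under assumptions: the function names and the small data set are hypothetical, and only the 0.6 threshold on the absolute Pearson coefficient comes from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def select_predictors(features, target, threshold=0.6):
    """Keep features whose |r| with the target exceeds the threshold."""
    selected = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Hypothetical values, for illustration only.
brix = [1.0, 2.0, 3.0, 4.0]
features = {
    "index_a": [2.0, 4.0, 6.0, 8.0],    # perfectly positively correlated
    "index_b": [8.0, 6.0, 4.0, 2.0],    # perfectly negatively correlated
    "index_c": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with brix
}
kept = select_predictors(features, brix)
```

Using the absolute value keeps strongly negative predictors, which matters here because two of the three selected variables correlate negatively with brix and pol.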
The RF algorithm is an ensemble learning method that uses multiple decision trees to classify data, employing a voting scenario to assign labels to unlabeled samples. Each tree depends on the values of a random vector sampled independently and with the same distribution for all the trees in the forest, thus reducing variance without increasing bias [34,36]. The SVM is a binary linear classifier that can be extended to manage non-linearly separable data using the kernel trick and soft margin methods. This algorithm implements the idea that input vectors are non-linearly mapped to a very high-dimension feature space, and a linear decision surface is constructed [35,36]. The algorithm demonstrates its ability to handle large datasets with many features and spectral bands with higher classification accuracy [35,36]. The MLM, as explained by McLean et al. [37], is an equation that offers the base for a methodology that provides flexibility in fitting models with various fixed and random elements with the possible assumption of correlation among random effects; this means that two or more explanatory variables could be used to explain the outcome of a desired variable.
Another variable, purity, was used to predict the maturity of sugarcane. In this study, purity is called the Maturity Index (MI) and is defined in Equation (1), as Barbosa Júnior et al. [38] proposed.
MI (%) = (Pol / Brix) × 100 (1)
According to Araya et al. [39], brix is the concentration of solutes in grams per 100 g of solution at a specific temperature; this is the sucrose in the sugarcane juice with impurities, measured in brix degrees (°Brix). On the other hand, pol is defined as the concentration of sucrose in grams per 100 g of solution at a specific temperature; this is the sucrose in the sugarcane juice without impurities, measured in pol degrees (°Pol) [39]. Therefore, with Equation (1), dividing the sucrose as a solution without impurities (pol) by the sucrose as a solution with impurities (brix) gives the purity of the sugarcane juice (maturity) as a percentage. The industry-accepted standard for harvesting sugarcane is MI > 85% and sucrose content pol > 16% [40]. The model performances were evaluated using the coefficient of determination (R²), the Root Mean Squared Error (RMSE), the range of the data and the model deviation.
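The purity ratio and the harvest standard described above can be expressed in a few lines. This is a minimal sketch with hypothetical function names; the thresholds (MI > 85% and pol > 16) are the ones quoted in the text, and the sample values are invented for illustration.

```python
def maturity_index(pol, brix):
    """Purity of the juice as a percentage: pol divided by brix, times 100."""
    if brix <= 0:
        raise ValueError("brix must be positive")
    return 100.0 * pol / brix


def ready_for_harvest(pol, brix, mi_min=85.0, pol_min=16.0):
    """Apply the harvest standard quoted in the text (MI > 85% and pol > 16)."""
    return maturity_index(pol, brix) > mi_min and pol > pol_min


# Illustrative values only, not measurements from the study.
mi_example = maturity_index(pol=13.0, brix=16.25)
```

Because both conditions must hold, a sample with high purity but low pol is still treated as not ready for harvest.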
The overall accuracy was calculated using Equation (2). The accuracy expresses the error obtained relative to the data range; thus, the error is normalized. This helps to interpret results compared to previous studies and maintain the consistency between existing studies and their evaluation methods.
The level of maturity at the time of photography can also be checked with the spectral reflectance of sugarcane. It may represent the canopy chlorophyll and moisture levels. Hence, five distinct zones (in terms of appearance, from greener (healthy) to brownish (dry)) within the study area were selected, each with an area of 17 m × 17 m (Figure 3d). Within each area, random points were created using sugarcane pixels only. Then, the spectral values for each band were extracted, and spectral profiles were created.
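One plausible reading of this range-normalized accuracy is sketched below. It is an assumption, not the study's Equation (2): the deviation is taken as the RMSE expressed as a percentage of the observed data range, and the accuracy as 100 minus that deviation. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def range_normalized_accuracy(observed, predicted):
    """Return (accuracy %, deviation %) with RMSE scaled by the data range.

    Assumed form: deviation = 100 * RMSE / (max - min), accuracy = 100 - deviation.
    """
    data_range = max(observed) - min(observed)
    if data_range == 0:
        raise ValueError("observed values must not all be equal")
    deviation = 100.0 * rmse(observed, predicted) / data_range
    return 100.0 - deviation, deviation


# Illustrative values only.
accuracy, deviation = range_normalized_accuracy([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
```

Under this reading, accuracy and deviation always sum to 100%, which matches the way the two figures are reported together in the results.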

Results
The sugarcane area was isolated with a high degree of accuracy, and the total extent of the sugarcane was 10.23 ha. Figure 4 shows the correlation analysis results. The green band, NDWI and GNDVI were the variables best correlated with brix values, with correlation coefficients (r) of −0.69, −0.75 and 0.68, respectively. For pol, the coefficients were −0.66, −0.73 and 0.65, respectively. Hence, the green band, NDWI and GNDVI were used to predict brix and pol and, thus, the maturity index. The rest of the indices were discarded since they did not meet the threshold (±0.6) established in the method.
The predicted maturity index values (Figure 5a) were grouped into four intervals: 65-70% (denoted as D_MI), 70-75% (C_MI), 75-80% (B_MI) and >80% (A_MI). For the A_MI category, SVM predicted an area of 6.12 ha, RF 3.21 ha and MLM 2.28 ha. B_MI covered 3.95 ha, 6.92 ha and 7.91 ha for SVM, RF and MLM, respectively. The RF algorithm did not detect the C_MI category, and the areas predicted by the other models were minimal. D_MI areas were zero or very small for all algorithms (Figure 5a).
For pol prediction, the results were categorized into four intervals: 8-10 °Pol (denoted as D_P), 10-12 °Pol (C_P), 12-14 °Pol (B_P) and all values > 14 °Pol (A_P) (Figure 5b). SVM predicted 0.63 m² for A_P, no area was detected by RF, and MLM resulted in 3.73 m². B_P obtained the highest value from SVM (7.65 ha) and the lowest from RF (5.94 ha) (Figure 5b). For the C_P category, the SVM prediction was 2.56 ha, RF was 4.30 ha, and MLM predicted 4.00 ha. The predicted values for D_P were 0.0160 ha for SVM, 0.0013 ha for RF and 0.0025 ha for MLM.
The MLM equations for brix and pol are shown in Equations (3) and (4). They were used to map the spatial distribution of brix and pol.
Brix = 12.80412 − 5.27616 × NDWI + 2.1931 × GNDVI + 0.00002903 × GREEN (3)
Pol = 8.93896 − 6.75543 × NDWI + 2.69357 × GNDVI + 0.00003741 × GREEN (4)
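Equations (3) and (4) are simple linear combinations and can be evaluated per pixel as in the sketch below. The coefficients are copied from the equations; the function names are hypothetical, and the scale of the GREEN input (for example raw digital numbers versus reflectance) is not stated in the text, so the input values shown are assumptions for illustration only.

```python
# Coefficients as printed in Equations (3) and (4):
# (intercept, NDWI, GNDVI, GREEN)
BRIX_COEF = (12.80412, -5.27616, 2.1931, 0.00002903)
POL_COEF = (8.93896, -6.75543, 2.69357, 0.00003741)


def linear_prediction(coef, ndwi, gndvi, green):
    """Evaluate intercept + b1*NDWI + b2*GNDVI + b3*GREEN."""
    intercept, b_ndwi, b_gndvi, b_green = coef
    return intercept + b_ndwi * ndwi + b_gndvi * gndvi + b_green * green


def predict_brix(ndwi, gndvi, green):
    """Brix from Equation (3)."""
    return linear_prediction(BRIX_COEF, ndwi, gndvi, green)


def predict_pol(ndwi, gndvi, green):
    """Pol from Equation (4)."""
    return linear_prediction(POL_COEF, ndwi, gndvi, green)


# Hypothetical pixel values, chosen only to exercise the functions.
brix_px = predict_brix(ndwi=-0.5, gndvi=0.5, green=10000.0)
pol_px = predict_pol(ndwi=-0.5, gndvi=0.5, green=10000.0)
```

The negative NDWI coefficients and positive GNDVI coefficients mirror the signs of the correlations reported for these variables.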
A classification system was established to assess the sugarcane maturity (Figure 6), where "A" represents MI > 85% AND °Pol > 16, "B" represents MI > 83% AND °Pol > 13, "C" represents MI > 83% OR °Pol > 13, and "D" represents neither condition. This is the standard practice of any quality assurance laboratory in Costa Rica and the general practice among farmers [35]. No area was predicted for "A" by any of the methods. Regarding category B, the MLM estimated the largest area (7.3 ha), followed by the RF algorithm (6.7 ha). For category C, SVM, RF, and MLM predicted areas of 4.2 ha, 3.4 ha and 4.2 ha, respectively. The lowest values were predicted for category D by all three models.
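The A to D grading rule can be written as an ordered set of checks, as in the following sketch. The function name is hypothetical; the thresholds are those given for the classification system, and the categories are tested from the strictest to the loosest so that each sample receives the highest grade it qualifies for.

```python
def maturity_class(mi, pol):
    """Grade a sample from its maturity index (%) and pol (degrees Pol)."""
    if mi > 85 and pol > 16:
        return "A"  # meets the harvest standard
    meets_mi = mi > 83
    meets_pol = pol > 13
    if meets_mi and meets_pol:
        return "B"  # both relaxed thresholds met
    if meets_mi or meets_pol:
        return "C"  # only one relaxed threshold met
    return "D"      # neither condition met


# Illustrative inputs only.
grades = [maturity_class(86, 17), maturity_class(84, 14),
          maturity_class(80, 14), maturity_class(80, 12)]
```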
When predicting the brix values, the highest accuracy was obtained from the RF model (82.68%) (Table 3). The SVM model demonstrated a precision of 21.89% and an accuracy of 66.64% with a deviation of 33.36%. The MLM results showed the lowest accuracy and precision.
For the pol prediction results (Table 3), the highest precision and accuracy were obtained from the RF. Both SVM and MLM models showed lower accuracies. Their deviations were approximately similar. The RF model also achieved the highest accuracy for the maturity index (Table 3). The SVM and MLM showed nearly identical results (approximately 65% accuracy). Hence, the best performance was achieved using the RF algorithm.
As shown in Figure 7, the spectral signatures of the zones were distinct from each other, and so were their sugarcane parameters. This is explained by the different MI and pol values of the zones (Table 4). For instance, Zone 5 exhibited greater values for all bands than other zones, and its MI is the highest among them. Approximately similar values were observed for Zones 1 to 4, where an increase in the MI and °Pol resulted in decreased spectral values. Figure 3a-c shows the spatial distribution of predicted maturity conditions over the study area using RF, MLM and SVM. SVM predictions are visually different from the other two methods; the map shows more category B areas than the others. However, the prediction from the RF model is similar to the field photos or reality; for instance, closer to the edges of the study area, there were many category B sugarcane plants.

Discussion
This study developed a novel method to determine the sugarcane brix and pol values and, thus, evaluate the maturity of the sugarcane.This can be applied at the operational level to estimate the sugarcane maturity and may reduce the time and cost associated with existing ground sampling-based methods.The robust results generated from this study also confirmed the data acquisition plan, including sensors for remotely sensed data acquisition, data acquisition parameters (shutter speed, flying height and overlaps, etc.) and data processing workflow for higher accuracy.
The green band, NDWI and GNDVI exhibited the strongest correlations with brix and pol values (Figure 4). The green band and NDWI showed a high negative correlation with these values because canopy chlorophyll decreases as sugarcane matures and turns brownish. Typically, the visible green band maximizes the reflectance of water surfaces. Hence, NDWI and GNDVI are sensitive to the moisture level, and when crops reach maturity, the moisture level changes [8]. Chea et al. [41] demonstrated that combining VIs and visible spectral bands related to the chemical and physical changes inside the plant resulted in greater accuracy in predicting brix, pol and, subsequently, the maturity index. They suggested the same spectral band and indices as in this study (green band, NDWI and GNDVI).
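As a minimal sketch of how the two indices discussed above could be computed per pixel, the Python snippet below assumes the common green/NIR formulations of NDWI and GNDVI. The definitions actually used are those listed in Table 2, which is not reproduced here, so the formulas, the function names `ndwi` and `gndvi`, and the reflectance values are all assumptions for illustration.

```python
def ndwi(green, nir):
    """Green/NIR NDWI (assumed formulation): (G - NIR) / (G + NIR)."""
    denom = green + nir
    return (green - nir) / denom if denom else float("nan")


def gndvi(green, nir):
    """GNDVI (assumed formulation): (NIR - G) / (NIR + G)."""
    denom = green + nir
    return (nir - green) / denom if denom else float("nan")


# Illustrative reflectances (hypothetical values, not from the study).
green, nir = 0.08, 0.40
print(round(ndwi(green, nir), 3))   # -0.667
print(round(gndvi(green, nir), 3))  # 0.667
```

Under these assumed formulations the two indices are exact negatives of each other; if Table 2 defines NDWI with a different band pair, the `ndwi` function would need to be adjusted accordingly.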
Additionally, Lebourgeois et al. [42] used similar spectral combinations and showed that they allowed for a significant quantitative characterization of the nitrogen status of sugarcane, which affects the physicochemical status of the plant. Moreover, considering the relationship of canopy nitrogen, greenness and biomass with the maturity index and pol, Chaves, Lofton et al., Villegas and Larrahondo [15,28,43] and Erdle et al. [44] chose spectral indices that incorporate the NIR and green bands. Further, de Oliveira et al. [38] also supported the efficacy of combining these bands and indices.
When considering the overall prediction results, most areas were within the 12-14 pol interval and the 75-80% MI range. These areas can be categorized as either "B" or "C", each covering about 4-6 ha. This means the study area has not yet reached the harvesting level ("A"). Although the SVM model predicted larger areas with MI > 80%, there were no or only very small areas with pol > 16 that could be classified as "A" according to the standards given by the quality assurance laboratory and by Barbosa Júnior et al. [19]. In contrast, studies from [43], Chaves [28] and Araya et al. [39] used different criteria for maturity: they suggest at least pol = 13 and MI = 80% for category "B".
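The maturity index referred to above is the percentage of pol per brix. The following Python sketch illustrates that calculation and the share of samples (or pixels) exceeding the 80% MI level mentioned in the text. The function names `maturity_index` and `share_above` and the (pol, brix) pairs are hypothetical and serve only as an illustration; they are not the study's data or code.

```python
def maturity_index(pol, brix):
    """Return MI (%) as the percentage of pol per brix; brix must be positive."""
    if brix <= 0:
        raise ValueError("brix must be positive")
    return 100.0 * pol / brix


def share_above(mi_values, threshold=80.0):
    """Fraction of samples (or pixels) whose MI exceeds the threshold."""
    if not mi_values:
        return 0.0
    return sum(mi > threshold for mi in mi_values) / len(mi_values)


# Hypothetical (pol, brix) pairs, for illustration only.
samples = [(12.5, 16.2), (13.0, 17.0), (13.8, 17.5), (16.4, 19.5)]
mi = [maturity_index(p, b) for p, b in samples]
print([round(v, 1) for v in mi])
print(share_above(mi))
```

Multiplying the resulting share by the mapped area would give the extent of each MI class, in the same spirit as the per-category areas reported for the three models.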
The best prediction performance was obtained from the RF model, followed by SVM and then MLM, for brix, pol and MI alike. Notably, RF precision ranged between 77.31% and 77.57%, while SVM precision ranged from 21.02% to 21.89%, and MLM precision was even lower. This study first assessed the correlation between the VIs and the pol and brix values. Then, the VIs and spectral bands with the highest correlation coefficients were selected as predictor variables for the regression models. Hence, variable importance was not used to select variables within the RF model, although parameter optimization (and feature importance) was performed. This might allow the RF model to perform well while increasing its predictive power and computing speed. When predicting brix values, Chea et al. [36] reported a precision of 45% for SVM and RF; however, they employed different sets of VIs, such as CI red edge, inverted reflectance green and the Plant Pigment Ratio (PPR). This suggests that this study obtained better results for brix prediction by combining the selected VIs and the green band.
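The correlation-based selection step described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the study's code: the helper names `pearson` and `top_predictors`, the choice of `k`, and all sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sx * sy)


def top_predictors(features, target, k=3):
    """Rank features by |r| against the target and keep the k strongest."""
    scores = {name: pearson(vals, target) for name, vals in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return [(name, scores[name]) for name in ranked[:k]]


# Hypothetical per-sample values, for illustration only.
brix = [15.0, 16.0, 17.0, 18.0, 19.0]
features = {
    "green": [0.10, 0.09, 0.085, 0.07, 0.06],
    "NDWI": [-0.50, -0.55, -0.58, -0.66, -0.70],
    "red": [0.05, 0.07, 0.04, 0.06, 0.05],
}
for name, r in top_predictors(features, brix, k=2):
    print(name, round(r, 2))
```

Ranking by the absolute value of the coefficient keeps strongly negative predictors, which matters here because the green band and NDWI were reported as negatively correlated with brix and pol.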
In comparison with Zhao et al. [8] and Cruz-Sanabria et al. [16], the results obtained for pol (Figure 5b) were lower, since those studies obtained precisions of 90.4% and 92.45%, respectively. They used a more extensive data set to test the proposed methodology. As a general rule of thumb, the precision that can be obtained from a traditional method is about 70% [8,16]. Hence, even with a small number of samples, the RF model of this study showed a precision 7.31 percentage points higher than the traditional method for pol estimation. Therefore, the developed method can be used in the field (or in production), as it provides relatively accurate and efficient results (in time and cost) compared to conventional methods. A similar result was obtained by Barbosa Júnior et al. [19], who reported a precision of 77% for brix. However, their MI results were much higher, about 90%, compared to the 77.44% of our study. For the MLM, Gunnula et al. [18] obtained about 60% higher accuracy than this study. This is mainly due to the limited number of samples this study used. The considerably better performance of RF compared with SVM is consistent with Sheykhmousa et al. [36]. They concluded that the MLM suffers from a small number of samples, which causes underfitting and compounds the difficulty these models have with non-linear patterns. Finally, compared with existing studies, this study yielded greater precision and accuracy for predictions despite using a small number of samples. When selecting samples, the study kept conditions such as variety, geographic zone, soil texture, and agricultural and cultural practices uniform, whereas the other studies varied one or more of them.
The spectral signatures for Zones 1 to 4 exhibited similar results, with slight changes in pol and MI values. They are visually almost identical. A lower spectral value in each band indicates that the plants are approaching maturity, reflecting the crop's natural change in chlorophyll and moisture levels as it matures. The dry matter of the crop started to increase, which was related to sucrose accumulation [43]. Also, as shown in Figure 3, the lower the MI and pol values, the farther from the center of the plot they are located, a trend also found by Som-ard et al. [4], since the center zones start to dry out sooner than the edges of the plot. Zone 5 was an outlier to both trends mentioned here, owing to its proximity to drainage and water availability, which allowed these plants to develop high-density, vigorous leaves. This agrees with the results of Som-ard et al. [45], who found that a high yield can occur on the edge of the field since it is more photosynthetically efficient.
One of the most prominent potential benefits of the developed method is the optimization of harvest timing. The search for the optimal maturity point serves as a control parameter within the factory, thus maximizing sugar yield and, consequently, profitability for producers. The developed method significantly reduces the costs associated with traditional sampling. Conventional methods require intensive and repeated field labor, resulting in high operational costs. In contrast, this remote sensing method can quickly cover large cultivation areas, minimizing the need for field personnel and reducing logistical expenses. Data collected from drones can be analyzed consistently across different regions and scales, facilitating comprehensive production management at a macro level. Moreover, the method allows for temporal monitoring, enabling the observation of trends for future predictions and strengthening the method with more data series over time. However, there are limitations to consider. The implementation of the method can be affected by adverse weather conditions, the initial cost of implementing these technologies, and the need for technical skills to interpret the data. Future research on estimating the maturity of sugarcane using remote sensing should focus on overcoming these limitations and improving the precision and accessibility of these technologies. Synergy and collaboration among researchers, technologists, and producers will be vital in developing solutions tailored to the specific needs of different regions and growing conditions.
Agriculture is moving into an era of combined operations, where the adjustment of a single practice can change the yield of the crop; for this reason, agricultural management can influence the behavior of the crop. Normally, a sugarcane field follows a five-year cycle before it is renewed. The value of sugarcane decreases with each cycle, which affects the results of the method across the years. Conducting longitudinal studies to validate the method over multiple growing seasons (years) will help to refine the models and confirm their robustness. Increasing the number of samples will further improve the effectiveness of the method and thereby its applicability and reliability at the production level.

Conclusions
The study developed a method to estimate sugarcane maturity at the farm level using high spatial resolution remote sensing images and a machine learning algorithm. Many studies have used satellite images to assess the vegetation traits of sugarcane around the world. However, the spatial resolution of the most widely available and free satellite images limits the ability to extract fine details about sugarcane fields. Hence, a reliable method that is actionable at the farm level using high spatial resolution images is in demand. Therefore, the research aimed to develop a method to estimate sugarcane maturity using images acquired from drones as an alternative to traditional methods. The images were captured using a MicaSense RedEdge-P camera, and field samples of sugarcane were collected and analyzed in the laboratory. Different vegetation indices (nine indices that can be generated from the MicaSense RedEdge-P sensor) were calculated, and their correlation was tested against laboratory-generated "pol" and "brix" values. The highest correlation was obtained for the Normalized Difference Water Index (NDWI), the Green Normalized Difference Vegetation Index (GNDVI) and the green band. The green band, GNDVI and NDWI are sensitive to the canopy chlorophyll available in sugarcane and, thus, showed a high negative correlation with the variables ("pol" and "brix") as the crop matures. Therefore, the green band, GNDVI and NDWI were used to predict the spatial distribution of pol and brix values over the study area using a multiple linear regression model (MLM) and two machine learning models (Random Forest (RF) and Support Vector Machine (SVM)). The performance metrics for the brix model demonstrated that the RF had greater precision and accuracy than the rest, with 77.57% and 82.68%, respectively. The second best was SVM, with a precision of 21.89% and an accuracy of 66.64%, followed by MLM, with a precision of 12.61% and an accuracy of 65.74%. The RF models also showed the best results for the pol models and the maturity index. Therefore, the RF demonstrates better accuracy than SVM in cases of increasing dimensionality and small training data samples. MLM suffers from the small number of samples, which causes the model to underfit. This indicates the potential of machine learning models, mainly RF, to handle complex agricultural datasets and provide reliable predictions. The Maturity Index (MI), or purity of sugarcane, was estimated as the percentage of pol per brix. The spatial distribution of MI over the study area helps to determine the areas where the level of maturity required for harvesting has or has not been reached. The method offers a practical tool for farmers to optimize harvest timing, which can enhance sugar yield and profit by providing a cost-effective and scalable alternative to traditional methods. Also, this technique enables farmers to make data-driven decisions, improving efficiency, reducing the time it takes to collect samples and allowing the crop to be harvested at the right time. Additionally, it promotes sustainable agricultural practices by enabling more precise resource management, thus minimizing environmental impact. Therefore, based on the results of this study, we can conclude that (a) the optimal band/indices combination for estimating sugarcane maturity is NDWI, GNDVI and green; (b) according to the results of the regression models, the plot is not ready for harvesting since it did not achieve the minimum values for pol and the maturity index; and (c) according to the performance metrics and the conditions of the study, the most suitable regression model for estimating sugarcane maturity is the random forest model. Future research should explore the scalability of this method across different geographical regions and weather conditions. Small variations in agricultural practices, weather or geographic regions could strongly affect the research outcome. Integrating this approach with other precision agriculture technologies, such as soil moisture sensors, irrigation systems, and different management practices, could demonstrate the effectiveness of the method across various scenarios.

Funding: This research was partially funded by the Mitacs-CALAREO Globalink Research Award, grant number IT36425.

Figure 1. The study area map (upper left with the inset map of Costa Rica) and field plots 1-5 (right and bottom).

Figure 2. The flow chart of the complete workflow.

Figure 3. Maturity index maps obtained from: (a) Multiple Linear Regression Model; (b) Support Vector Machine algorithm; (c) Random Forest model; and (d) Locations of zones 1-5 for spectral analysis (red squares). The values 0, 1 and 2 represent maturity conditions D, C and B, respectively.

Figure 4. Pearson correlation matrix for each variable, spectral band, and index.

Figure 5. The sugarcane extent related to pol and MI: (a) areas predicted for pol using different prediction models; (b) the percentage of the area of the maturity index from different prediction models.

Figure 6. Areas for different maturity conditions (A-D) from different prediction models (multiple linear regression model (black), random forest (gray), support vector machine (white with dot pattern)).

Table 2. Vegetation indices used to estimate sugarcane maturity.

Table 4. Maturity Index (MI) and pol results for Zones 1 to 5.