Article

Prediction of Solid Soluble Content of Green Plum Based on Improved CatBoost

1
College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
2
Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
*
Author to whom correspondence should be addressed.
Agriculture 2023, 13(6), 1122; https://doi.org/10.3390/agriculture13061122
Submission received: 18 April 2023 / Revised: 9 May 2023 / Accepted: 24 May 2023 / Published: 25 May 2023

Abstract

Most green plums need to be processed before consumption, and because manual harvesting and sorting depend on personal subjective judgment, standardized processing is difficult to achieve. This paper takes the soluble solid content (SSC) of green plums as the research object. Visible near-infrared (VIS-NIR) and shortwave near-infrared (SW-NIR) full-spectrum information of green plums was collected, and the spectral data were corrected and pre-processed. A random forest algorithm based on induced random selection (IRS-RF) was proposed to screen four sets of characteristic wavebands, and a Bayesian optimization CatBoost model (BO-CatBoost) was constructed to predict the SSC value of green plums. The experimental results showed that multiplicative scatter correction (MSC) pre-processing outperformed Savitzky–Golay (S–G) smoothing, that SSC prediction by the partial least squares regression (PLSR) model was better on the VIS-NIR waveband than on the SW-NIR waveband, and that MSC + IRS-RF outperformed the corresponding combinations with the correlation coefficient method (CCM), successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and random forest (RF). IRS-RF also selected the lowest-dimensional feature wavebands: only 53 bands for VIS-NIR and only 100 for SW-NIR. The model proposed in this paper, based on MSC + IRS-RF + BO-CatBoost, was superior to PLSR, XGBoost, and CatBoost in predicting SSC, with R2P of 0.957, which was 3.1% higher than the traditional PLSR.

1. Introduction

Most green plums need to be processed before consumption. Because the composition of individual green plums differs, the processed products differ as well [1,2]. Green plums with high acidity and low sugar content are usually used to make green plum essence, while those with high sugar content and low acidity are usually used to make green plum wine, etc. The component content of green plums varies with maturity level [3].
When green plums are picked and sorted manually, the main basis is skin color and picking time, and the composition content is judged and classified from manual experience. However, due to personal subjective factors, it is difficult to achieve standardized processing. Traditional methods for determining the acidity and sugar content of green plums are destructive and inefficient [4,5]. Therefore, research on new non-destructive methods for detecting the composition of green plums is of great significance for improving the processing efficiency of green plums.
SSC is one of the important reference indicators for measuring the maturity, internal quality, and edible processing characteristics of fruits. Experimental results have shown that fruit maturity increases as the SSC index increases. Therefore, many experts and scholars have conducted non-destructive testing research on the maturity of fruits such as apples, pears, grapes, strawberries, and watermelons based on SSC prediction.
Currently, spectroscopic techniques based on spectral features, such as near-infrared spectroscopy and hyperspectral imaging, have become the main means of non-destructive detection [6,7,8,9,10]. Ma T et al. used VIS-NIR spectroscopy to predict SSC in apples, with a coefficient of determination (R2) of 0.97 and a root mean square error (RMSE) of 0.20% [11]. Yu X et al. combined hyperspectral imaging with a deep learning method consisting of stacked autoencoders (SAE) and fully connected neural networks (FNN) to predict SSC in postharvest Korla fragrant pears, with an R2 of 0.92 and an RMSE of 0.22% [12]. Hyperspectral data decomposed by continuous wavelet transform were also used to predict SSC in netted melons; the resulting random forest regression model had a correlation coefficient of 0.72 and an RMSE of 0.98% [13].
The aforementioned studies relied on full-spectrum information to predict the internal component contents of fruit and achieved high prediction accuracy. However, if hyperspectral equipment is used to build a green plum sorting production line, problems remain, such as long acquisition time, high cost, and difficulty in practical promotion and application [14,15]. When traditional multispectral technology is used to select specific waveband groups and the internal component contents are predicted with PLSR, support vector regression (SVR), and other traditional machine learning models, the prediction accuracy cannot meet actual sorting requirements. Liu C et al. established a model to predict dicyandiamide (DCD) in milk powder based on multispectral technology using partial least squares (PLS), least squares support vector machine (LS-SVM), and backpropagation neural network (BPNN), with an R2 of 0.873 [16]. Younas et al. combined multispectral imaging with chemometrics to relate multispectral images to water content in mushrooms using PLSR, BPNN, and LS-SVM models, with the highest R2 reaching 0.86 [17]. Chakravartula et al. used the 1940 nm, 1500 nm, and 2050 nm wavebands to build a principal component analysis-partial least squares regression (PCA-PLSR) model to detect water content, protein, and other indicators in bread, with R2 reaching 0.88 at the highest [18]. To date, no research has been reported on predicting the component contents of green plums using multispectral technology.
To address the long acquisition time, high cost, and difficulty of practical application of non-destructive testing technology for green plums, this study takes SSC prediction of green plums as the research object. VIS-NIR and SW-NIR full-spectrum information was collected from green plum samples, and the spectral data were corrected and pre-processed. An IRS-RF algorithm was proposed to screen the characteristic bands, and a BO-CatBoost model was constructed to predict the SSC value of different green plum samples, so that the prediction accuracy meets actual sorting requirements. The selected feature band groups provide a theoretical basis for future research on green plum sorting based on multispectral technology. The technical route is shown in Figure 1.

2. Materials and Methods

2.1. Spectrum Data Collection

2.1.1. Sample Sources

In May 2021, a total of 276 Zhao Shui green plum samples were purchased and screened from Yunnan Province, China, for the purpose of predicting SSC value of green plums. The samples were divided into a training set and a test set in a ratio of about 4:1, with 221 samples as the training set and 55 samples as the test set.
The samples were stored in a laboratory refrigerator at a constant temperature of 4 °C. For each experiment, samples were randomly selected and left at ambient temperature in advance; spectral data collection and SSC determination for each green plum were performed once the samples had reached room temperature.

2.1.2. Hardware Composition of Hyperspectral Acquisition Device

A non-destructive hyperspectral imaging system for predicting SSC of green plum samples was set up (Figure 2), comprised of a push-broom VIS-NIR hyperspectral camera (GaiaField-V10E-AZ4, Jiangsu Dualix Spectral Image Technology Co., Ltd., Wuxi, China), a push-broom SW-NIR hyperspectral camera (GaiaField-N17E-HR, Jiangsu Dualix Spectral Image Technology Co., Ltd., China), two self-made dome light source systems, an uninterrupted power supply (UPS) (C3K, Shante, Hangzhou, China), a transmission desk, a dark chamber, and a computer (T570, Lenovo, Beijing, China). Each camera was equipped with the same dome light source system, which included 12 halogen lamps (Halogen 12V, Philips, Suzhou, China), with UPS providing a stable power supply. The whole system was surrounded by the dark chamber to prevent external light interference.
During hyperspectral image acquisition, the green plum samples were carried at a uniform speed beneath the hyperspectral camera by the conveyor belt, and the conveyor belt speed was set to match the acquisition speed of the camera. The hyperspectral acquisition system is shown in Figure 2.
Before the experiment, the hyperspectral imaging system was turned on and warmed up for 30 min. The optimal parameters for spectral acquisition, namely exposure time and conveyor belt speed, were determined by pre-test. After each green plum sample was scanned, the obtained hyperspectral data were calibrated against black and white reference boards. The calibrated image (A0) was calculated using Formula (1):
$$A_0 = \frac{A - A_D}{A_W - A_D} \times 100\% \tag{1}$$
where A0 represents the green plum spectral reflectance data after black and white calibration, A is the original green plum spectral data to be corrected, AD represents the spectral reflectance data collected with the lens cover on, and AW represents the spectral data of the standard 99% reflectance plate.
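As an illustration of Formula (1), the following minimal sketch applies the black-and-white calibration band by band with NumPy. The function name `calibrate_reflectance`, the toy intensity values, and the guard against a zero denominator are assumptions made for this example; they are not taken from the acquisition software used in the study.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, eps=1e-12):
    """Formula (1): A0 = (A - AD) / (AW - AD) * 100 (percent reflectance)."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Assumption of this sketch: bands where the white and dark references
    # coincide are marked NaN instead of dividing by zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom * 100.0

# Toy values for three bands of a single pixel (hypothetical numbers).
raw = np.array([120.0, 300.0, 510.0])     # A: uncorrected sample signal
dark = np.array([20.0, 20.0, 10.0])       # AD: lens cover on
white = np.array([220.0, 420.0, 1010.0])  # AW: 99% reflectance plate
reflectance = calibrate_reflectance(raw, dark, white)
print(reflectance)
```

The same function works unchanged on a full hyperspectral cube, because NumPy broadcasts the subtraction and division over every pixel and band.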
After the spectral data were collected, the SSC of each green plum was measured immediately with a PAL-1 hand-held refractometer, which has a measurement range of 0.0–53.0% Brix and an accuracy of ±0.2% Brix. The sample tank was cleaned before each measurement, and the green plum sample was then squeezed into juice. After the juice had settled, the supernatant was dropped into the sample tank and the SSC value was recorded.
The SSC values of 276 green plum samples are shown in Table 1.

2.2. Preprocessing of Spectral Data

All the tested software and hardware configurations as well as compilation environments are shown in Table 2.
The hyperspectral imaging system was used to collect images of the green plum samples. The VIS-NIR spectral range was 400–1000 nm with 120 data channels, and the SW-NIR spectral range was 900–1700 nm with 350 data channels. To reduce the error caused by spectral reflection from a single posture, spectral images of each green plum sample were extracted in three different postures and the spectra from the three positions were averaged, as shown in Figure 3.
ENVI 5.3 was used to determine the region of interest (ROI) in each image and to extract the average spectrum of the green plum sample within the ROI as the original spectral data. Figure 4 shows the original spectral reflectance curves of all green plum samples. Different colored lines represent the spectral characteristic curves of different green plum samples, with a total of 276 lines.

2.3. Improved Random Forest Feature Extraction Based on Induced Random Selection

An IRS-RF algorithm was proposed in this paper, which measured the importance of all features, clustered them according to their weight values using the K-means clustering algorithm and, finally, selected the feature subspace from each class to construct the decision tree based on the partition ratio. This algorithm could reduce the probability of highly correlated features being selected at the same time, decrease the uncertainty of node splitting and the correlation between decision trees, increase the diversity of decision trees, and achieve a smaller generalization error. The steps were as follows:
(1)
Measure the importance of features
Calculate the correlation coefficient r between spectral features according to Equation (2), and use it as the feature importance weight.
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \times \sum (Y - \bar{Y})^2}} \tag{2}$$
(2)
K-means clustering for feature classification
Randomly select k data points as the initial clustering centers for k clusters, and assign each data point to its closest cluster to form the initial distribution of k clusters. For each cluster, recalculate the cluster center, and iterate until the cluster centers remain unchanged. With correlation as the importance weight, the features are thereby divided into k feature regions with varying degrees of importance.
(3)
Proportional sampling
Features are selected from the feature areas in proportion to build each decision tree. The number of randomly selected features in the i-th feature area is calculated according to Equation (3), and N1, N2, ..., Nk features are randomly selected from the k feature areas to form the feature subspace of the tree. Selecting features proportionally from the different feature areas, i.e., performing induced random selection, makes the feature subspace more representative.
$$N_i = \mathrm{mtry} \times \frac{m_i}{m} \tag{3}$$
Here, Ni represents the number of features extracted from the i-th feature area, m represents the total number of features, mi represents the number of features in the i-th feature area, and mtry represents the number of feature variables used in each decision tree.
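The three steps above can be sketched for a single tree as follows. This is an illustrative reading, not the authors' implementation: it assumes that the importance weight is the absolute Pearson correlation between each band and the SSC values, uses a plain one-dimensional k-means written in NumPy, rounds Ni to the nearest integer, and draws at least one feature from every cluster. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pearson_weights(X, y):
    """Step 1: |r| between each feature column of X and y, as in Equation (2)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

def kmeans_1d(values, k, rng, n_iter=100):
    """Step 2: k-means on the scalar weights; returns one cluster label per feature."""
    centers = rng.choice(values, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # cluster centers no longer change
        centers = new_centers
    return labels

def induced_random_subspace(labels, mtry, rng):
    """Step 3: draw N_i = mtry * m_i / m features from each cluster, Equation (3)."""
    m = labels.size
    chosen = []
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        n_i = int(round(mtry * members.size / m))
        # Sketch-level choice: keep at least one feature per cluster.
        n_i = min(max(n_i, 1), members.size)
        chosen.extend(rng.choice(members, size=n_i, replace=False))
    return np.sort(np.array(chosen))

# Synthetic stand-in for spectra (60 samples, 40 bands) and SSC values.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)

weights = pearson_weights(X, y)
labels = kmeans_1d(weights, k=3, rng=rng)
subspace = induced_random_subspace(labels, mtry=12, rng=rng)
print(subspace)
```

Because of rounding, the subspace size can differ slightly from mtry; a full random forest would repeat the last step once per tree so that each tree sees a different induced subspace.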

3. BO-CatBoost Model Based on Bayesian Optimization Algorithm

With the advancement in computing power, models are becoming increasingly complex. To ensure that the model does not fall into a local minimum and to avoid excessive computation, a Bayesian optimization algorithm can be used. The core components of Bayesian optimization are a probabilistic surrogate (proxy) model and an acquisition function [19,20,21]. The proxy model is a flexible statistical model used to approximate the target function, which is difficult to calculate, and different kernel functions are used to increase its nonlinear expressive ability. The acquisition function balances the exploitation of regions with a high predicted mean against the exploration of regions with high uncertainty to select suitable hyperparameter sample points.
The Bayesian optimization algorithm, using the classical Gaussian process as a proxy model, was introduced in this paper, and an improved BO-CatBoost algorithm was built to keep the model stable and easy to use while gradually improving the performance of the algorithm. The BO-CatBoost algorithm flowchart is shown in Figure 5. The main steps were as follows:
(1)
Initialize the Bayesian optimization algorithm point set and the maximum number of iterations N;
(2)
Based on the current set of points, build the Gaussian process proxy function;
(3)
Based on the proxy function, maximize the acquisition function to obtain the next evaluation point;
(4)
Evaluate the objective function at the evaluation point xt to obtain f(xt), and add it to the evaluation point set;
(5)
Check the termination condition: if the number of iterations reaches the preset maximum, stop searching; otherwise, return to step (2) for further search;
(6)
After iteration, obtain the optimal BO-CatBoost parameters, and use them to train the model on the training data;
(7)
Finally, test the model with the test set, output the evaluation result.
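Steps (1) to (5) can be illustrated with a compact Gaussian-process loop written in NumPy only. Everything specific in this sketch is an assumption: the one-dimensional `objective` is a hypothetical stand-in for the validation error of a CatBoost model as a function of a single scaled hyperparameter, the surrogate uses a fixed RBF kernel, and expected improvement is used as the acquisition function, which the paper does not specify. In the actual workflow the objective would train and validate a CatBoost regressor for each candidate parameter set.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(x_obs.size)
    K_s = rbf_kernel(x_obs, x_new)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected improvement for a minimization problem."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def objective(x):
    """Hypothetical validation error for one hyperparameter scaled to [0, 1]."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(15.0 * x)

def bayes_opt(n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)              # candidate evaluation points
    x_obs = rng.uniform(0.0, 1.0, size=n_init)     # step (1): initial point set
    y_obs = objective(x_obs)
    for _ in range(n_iter):                        # step (5): fixed iteration budget
        y_mean = y_obs.mean()
        mu, sigma = gp_posterior(x_obs, y_obs - y_mean, grid)       # step (2)
        ei = expected_improvement(mu + y_mean, sigma, y_obs.min())  # step (3)
        x_next = grid[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)           # step (4): evaluate and store
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best]

best_x, best_y = bayes_opt()
print(best_x, best_y)
```

Steps (6) and (7) would then refit the model with the best parameters found and score it on the held-out test set.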

4. Results and Discussion

4.1. Model Training and Result Analysis

The coefficient of determination (R2), mean absolute error (MAE), and RMSE were selected as the model performance evaluation indicators. The smaller the MAE and RMSE values and the larger the R2 value, the better the model performance and prediction effect. The evaluation indicators of the training set are denoted R2C, MAEC, and RMSEC, and those of the test set R2P, MAEP, and RMSEP. PLSR models were built to predict the SSC of green plums, and the results under different pre-processing and feature extraction methods were compared.
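For reference, the three indicators can be computed as in the sketch below, which assumes the standard definitions of R2 (one minus the ratio of residual to total sum of squares), MAE, and RMSE; the paper does not print these formulas, and the sample values are invented for the example.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAE and RMSE under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(residual)),
        "RMSE": np.sqrt(np.mean(residual ** 2)),
    }

# Invented measured and predicted SSC values for five samples.
y_true = [8.0, 9.0, 10.0, 11.0, 12.0]
y_pred = [8.5, 9.0, 9.5, 11.0, 12.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Applying the function to training-set predictions gives R2C, MAEC, and RMSEC, and applying it to test-set predictions gives R2P, MAEP, and RMSEP.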
MSC and S–G were used to pre-process the spectral data of VIS-NIR and SW-NIR, to remove noise and invalid information. The pre-processed SW-NIR and VIS-NIR spectral band data are shown in Figure 6. Different colored lines represent the spectral characteristic curves of different green plum samples, with a total of 276 lines.
The absorption of the spectrum mainly reflects information of hydrogen-containing groups such as C-H, O-H, and N-H in organic substances, while SSC contains important information of the O-H group. As shown in Figure 6, MSC could effectively eliminate spectral differences caused by different scattering levels, whereas S–G pre-processing removed these differences much less effectively. After MSC pre-processing, a noticeable absorption peak produced by the O-H bond stretching vibration was present at 730 nm in the VIS-NIR spectrum of green plums; the subsequent decrease to 930 nm may have been influenced by the quadruple frequency stretching vibration of C-H. In the SW-NIR spectrum, the decline over 1140–1220 nm may be caused by the first frequency absorption of the N-H group. The differences between the VIS-NIR and SW-NIR spectra were significant, which may have been caused by stronger interference from moisture in the SW-NIR range. In summary, MSC pre-processing could enhance the correlation between the spectral data and SSC.
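To make the effect of MSC concrete, the sketch below implements the usual textbook form of multiplicative scatter correction: each spectrum is regressed on the mean spectrum, and the fitted offset and slope are then removed. This is an assumed formulation, since the paper does not list its MSC equations, and the synthetic spectra and the name `msc` are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    # By default the mean spectrum of the set serves as the reference.
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Least-squares fit: spectrum ~ intercept + slope * ref
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Three synthetic spectra that differ only by an additive offset and a gain,
# which is the kind of scatter-induced difference MSC is meant to remove.
wavelengths = np.linspace(400.0, 1000.0, 120)
base = 0.4 + 0.2 * np.sin(wavelengths / 90.0)
offsets = np.array([0.00, 0.05, -0.03])
gains = np.array([1.0, 1.2, 0.8])
spectra = offsets[:, None] + gains[:, None] * base

corrected = msc(spectra)
print(np.ptp(corrected, axis=0).max())  # spread across samples after correction
```

In this idealized case all corrected spectra collapse onto the mean spectrum; on real spectra the remaining differences are the chemical information of interest.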

4.1.1. Comparison of SSC Prediction Results with Different Pre-Processing Methods and Feature Extraction Combination Algorithms

Based on different pre-processing methods and feature extraction algorithm combinations, the prediction performance results of PLSR models based on VIS-NIR and SW-NIR spectra are shown in Table 3.
As can be seen from the data in Table 3, for the VIS-NIR range, the R2P values of MSC combined with the five feature extraction algorithms were 0.901, 0.920, 0.814, 0.914, and 0.928, respectively, against 0.902, 0.889, 0.723, 0.747, and 0.905 for S–G; MSC was higher in four of the five cases and nearly equal in the remaining one (0.901 vs. 0.902). For the SW-NIR range, the R2P values of MSC combined with the five feature extraction algorithms were 0.827, 0.851, 0.785, 0.805, and 0.911, respectively, against 0.715, 0.855, 0.781, 0.631, and 0.892 for S–G; MSC was again higher in four of the five cases, the exception being 0.851 vs. 0.855. Overall, the R2P obtained with MSC pre-processing was better than that obtained with S–G. In conclusion, MSC was selected as the pre-processing method for predicting SSC of green plums.
The improved IRS-RF algorithm was used to extract four sets of feature band groups in this paper and was compared with the CCM, SPA, CARS, and RF algorithms. The effects of different feature extraction algorithms on the SSC test set are shown in Table 4.
As can be seen from the data in Table 4, in terms of prediction accuracy: for the VIS-NIR spectral band, the RMSEP and MAEP of MSC + IRS-RF (our algorithm) were the lowest, only 0.359 and 0.261, respectively, with the highest R2P of 0.928. For the SW-NIR spectral band, the RMSEP and MAEP of MSC + IRS-RF were the lowest, only 0.488 and 0.266, respectively, with the highest R2P of 0.911. The prediction performance of SSC in the VIS-NIR spectral band was obviously better than that in the SW-NIR spectral band. In terms of spectral dimension, for the VIS-NIR spectral band, the number of selected spectral bands by IRS-RF was only 53. For the SW-NIR spectral band, the number of selected spectral bands by IRS-RF was only 100. The IRS-RF algorithm measured the importance of all features, reducing the probability of features with high correlation being selected simultaneously. Therefore, the number of selected spectral bands by MSC + IRS-RF was the smallest.
In conclusion, MSC + IRS-RF was preliminarily selected for pre-processing and feature band selection on the VIS-NIR spectral band. Figure 7 compares the prediction results of SSC values in different spectral bands of VIS-NIR and SW-NIR based on the MSC + IRS-RF algorithm.

4.1.2. Comparison of SSC Prediction Results of Different Machine Learning Regression Models

Based on the research foundation in 4.1.1, the MSC + IRS-RF model was used for preprocessing and feature wavelength selection of the VIS-NIR spectral bands. In this paper, we proposed a BO-CatBoost (our algorithm) model based on Bayesian optimization algorithm with a learning rate of 0.1 and 500 iterations. Table 5 compares the prediction results of our algorithm with different regression models including conventional PLSR, XGBoost, and CatBoost for the SSC of green plums.
The experimental results showed that the R2P values of the PLSR, XGBoost, and CatBoost regression models were 0.928, 0.927, and 0.942, respectively. The R2P of BO-CatBoost (our algorithm) was the highest, reaching 0.957, with the lowest RMSEP and MAEP of 0.252 and 0.189, respectively. This was an improvement of 3.1% over the traditional PLSR model. The four selected feature wavelength groups also had the lowest total dimension, only 31, compared with 53, 43, and 39 for PLSR, XGBoost, and CatBoost, respectively. The wavelength ranges were 452–498 nm, 538–566 nm, 756–782 nm, and 982–1032 nm. Figure 8 and Figure 9 compare the prediction results of different regression models for the SSC of green plums and the selected feature wavelength groups for SSC prediction, respectively.
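The 3.1% figure is consistent with a relative gain in R2P over PLSR, computed from the two values quoted above (this reading is an inference from the numbers, since the text does not say whether the improvement is relative or absolute):

$$\frac{0.957 - 0.928}{0.928} = \frac{0.029}{0.928} \approx 0.031 = 3.1\%$$

The corresponding absolute difference in R2P is 0.029.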

5. Conclusions

SSC prediction of green plums was taken as the research object in this paper, and the VIS-NIR and SW-NIR full-band spectra information of green plums was collected. The spectral data were calibrated and pre-processed. An improved IRS-RF algorithm was proposed to screen four groups of characteristic bands. A BO-CatBoost model based on Bayesian optimization algorithm was constructed to study SSC prediction of green plums. The main conclusions include:
(1)
MSC + IRS-RF was used to preprocess the VIS-NIR spectral band and select the characteristic wavelength. The BO-CatBoost model based on Bayesian optimization algorithm outperformed PLSR, XGBoost, and CatBoost regression models in SSC prediction, with R2P of 0.957, which was 3.1% higher than the traditional PLSR.
(2)
With the MSC + IRS-RF + BO-CatBoost model proposed in this article, the four selected feature band groups had the lowest total dimension for SSC prediction, only 31, compared with 53, 43, and 39 for PLSR, XGBoost, and CatBoost, respectively. The selected band ranges were 452–498 nm, 538–566 nm, 756–782 nm, and 982–1032 nm.
Sandra [22] used PLSR to establish a prediction model for the nectarine maturity index (RPI) and internal quality index (IQI), and the results showed that the determination coefficient R2 was greater than 0.87. Yang [23] used the CARS-PLSR model to predict the SSC of multi-variety tomatoes; the Rp values were 0.85 for Tianci-595, 0.87 for Xianke-No. 8, and 0.87 for Yuanwei-No. 1. Zhang [24] used partial least squares (PLS) and least squares support vector machines (LS-SVM) to build prediction models for SSC in tomatoes. The best performance was obtained using the PLS model with the optimal wavelengths selected by CARS in the range of 900–1400 nm, with an Rp of 0.820. The prediction results of the optimization model proposed in this article were therefore better than those of the above studies.
Through the above research, the model prediction accuracy met the actual sorting requirements, and the selected feature wavelength groups provide a theoretical foundation for later research on green plum sorting based on multispectral technology. Subsequently, based on the selected feature wavelength groups, a multispectral acquisition system will be established and further optimized to reduce the cost of green plum sorting, ensure sorting accuracy, and fill the research gap.

Author Contributions

Conceptualization, X.Z. and Y.L.; methodology, X.Z., C.Z. and Z.Z.; software, C.Z. and Q.S.; validation, Q.S. and Z.Z.; resources, X.Z., Y.Y. and Z.Z.; writing—original draft preparation, X.Z.; writing—review and editing, Y.L., C.Z. and Y.Y.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jiangsu Agricultural Science and Technology Innovation Fund Project (Funding number: CX (18)3071, Funder: Jiangsu provincial department of science and technology). LIU YING. Research on key technologies of intelligent sorting for green plum.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The experiment is not yet completed, and so, the data are not public.

Acknowledgments

The authors would like to extend their sincere gratitude for the technical support from the Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xu, L.; Wang, S.; Tian, A.; Liu, T.; Benjakul, S.; Xiao, G.; Ying, X.; Zhang, Y.; Ma, L. Characteristic volatile compounds, fatty acids and minor bioactive components in oils from green plum seed by HS-GC-IMS, GC-MS and HPLC. Food Chem. X 2023, 17, 100530.
  2. Zhu, Y.; Ju, R.; Ma, F.; Qian, J.; Yan, J.; Li, S.; Li, Z. Moisture variation analysis of the green plum during the drying process based on low-field nuclear magnetic resonance. J. Food Sci. 2021, 86, 5137–5147.
  3. Shen, L.; Wang, H.; Liu, Y.; Liu, Y.; Zhang, X.; Fei, Y. Prediction of Soluble Solids Content in Green Plum by Using a Sparse Autoencoder. Appl. Sci. 2020, 10, 3769.
  4. Saridaş, M.A.; Kafkas, E.; Zarifikhosroshahi, M.; Bozhaydar, O.; Kargi, S.P. Quality traits of green plums (Prunus cerasifera Ehrh.) at different maturity stages. Turk. J. Agric. For. 2016, 40, 655–663.
  5. Luo, W.; Tappi, S.; Wang, C.; Yu, Y.; Zhu, S.; Rocculi, P. Study and optimization of high hydrostatic pressure (HHP) to improve mass transfer and quality characteristics of candied green plums (Prunus mume). J. Food Process. Preserv. 2018, 42, e13769.
  6. Caporaso, N.; Whitworth, M.B.; Fisk, I.D. Protein content prediction in single wheat kernels using hyperspectral imaging. Food Chem. 2018, 240, 32–42.
  7. Liu, Z.; He, Y.; Cen, H.; Lu, R. Deep feature representation with stacked sparse auto-encoder and convolutional neural network for hyperspectral imaging-based detection of cucumber defects. Trans. ASABE 2018, 61, 425–436.
  8. Fan, Y.; Zhang, C.; Liu, Z.; Qiu, Z.; He, Y. Cost-sensitive stacked sparse auto-encoder models to detect striped stem borer infestation on rice based on hyperspectral imaging. Knowl. Based Syst. 2019, 168, 49–58.
  9. Satorres Martinez, S.; Martinez Gila, D.; Beyaz, A.; Gomez Ortega, J.; Gamez Garcia, J. A computer vision approach based on endocarp features for the identification of olive cultivars. Comput. Electron. Agric. 2018, 154, 341–346.
  10. Xie, C.; Zhu, H.Y.; Fei, Y.Q. Deep coordinate attention network for single image super-resolution. IET Image Process. 2022, 16, 273–284.
  11. Ma, T.; Xia, Y.; Inagaki, T.; Tsuchikawa, S. Rapid and nondestructive evaluation of soluble solids content (SSC) and firmness in apple using Vis-NIR spatially resolved spectroscopy. Postharvest Biol. Technol. 2020, 173, 111417.
  12. Yu, X.; Lu, H.; Wu, D. Development of deep learning method for predicting firmness and soluble solid content of postharvest Korla fragrant pear using Vis-NIR hyperspectral reflectance imaging. Postharvest Biol. Technol. 2018, 141, 39–49.
  13. Zhang, C.; Shi, Y.; Wei, Z.; Wang, R.; Li, T.; Wang, Y.; Zhao, X.; Gu, X. Hyperspectral estimation of the soluble solid content of intact netted melons decomposed by continuous wavelet transform. Front. Phys. 2022, 10, 1034982.
  14. Saha, D.; Senthilkumar, T.; Sharma, S.; Singh, C.B.; Manickavasagan, A. Application of near-infrared hyperspectral imaging coupled with chemometrics for rapid and non-destructive prediction of protein content in single chickpea seed. J. Food Compos. Anal. 2023, 115, 104938.
  15. Zhang, M.; Li, W.; Du, Q. Diverse Region-Based CNN for Hyperspectral Image Classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
  16. Liu, C.; Liu, W.; Yang, J.; Chen, Y.; Zheng, L. Non-destructive detection of dicyandiamide in infant formula powder using multi-spectral imaging coupled with chemometrics. J. Sci. Food Agric. 2017, 97, 2094–2099.
  17. Younas, S.; Liu, C.; Qu, H.; Mao, Y.; Liu, W.; Wei, L.; Yan, L.; Zheng, L. Multispectral imaging for predicting the water status in mushroom during hot-air dehydration. J. Food Sci. 2020, 85, 903–909.
  18. Chakravartula, S.S.N.; Cevoli, C.; Balestra, F.; Fabbri, A.; Dalla Rosa, M. Evaluation of drying of edible coating on bread using NIR spectroscopy. J. Food Eng. 2019, 240, 29–37.
  19. Beskopylny, A.N.; Stel’makh, S.A.; Shcherban, E.M.; Mailyan, L.R.; Meskhi, B.; Razveeva, I.; Chernil’nik, A.; Beskopylny, N. Concrete strength prediction using machine learning methods CatBoost, k-Nearest Neighbors, Support Vector Regression. Appl. Sci. 2022, 12, 10864.
  20. Ogar, V.; Hussain, S.; Gamage, K. Transmission line fault classification of multi-dataset using CatBoost classifier. Signals 2022, 3, 468–482.
  21. Guadagno, C.R.; Millar, D.; Lai, R.; Mackay, D.S.; Pleban, J.R.; McClung, C.R.; Weinig, C.; Wang, D.R.; Ewers, B.E. Use of transcriptomic data to inform biophysical models via Bayesian networks. Ecol. Model. 2020, 429, 109086.
  22. Sandra, M.; Jose, M.A.; José, B.; Sergio, C.; Pau, T.; Nuria, A. Ripeness monitoring of two cultivars of nectarine using VIS-NIR hyperspectral reflectance imaging. J. Food Eng. 2017, 214, 29–39.
  23. Yang, Y.; Huang, W.; Zhao, C.; Tian, X.; Fan, S.; Wang, Q.; Li, J. Online soluble solids content (SSC) assessment of multi-variety tomatoes using Vis/NIRS diffuse transmission. Infrared Phys. Technol. 2022, 125, 104312.
  24. Zhang, D.; Yang, Y.; Chen, G.; Tian, X.; Wang, Z.; Fan, S.; Xin, Z. Nondestructive evaluation of soluble solids content in tomato with different stage by using Vis/NIR technology and multivariate algorithms. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 248, 119139.
Figure 1. Technical route of subject research.
Figure 2. Actual photo of hyperspectral imaging acquisition system.
Figure 3. Spectral images of three different positions of the same green plum: (a) left posture; (b) right posture; (c) top posture.
Figure 4. Original spectral reflectance curves of green plum samples: (a) VIS-NIR; (b) SW-NIR.
Figure 5. BO-CatBoost algorithm flowchart.
Figure 6. Comparison of the effects of different preprocessing methods for spectral data. (a) VIS-NIR spectral data preprocessed by MSC. (b) VIS-NIR spectral data preprocessed by S–G. (c) SW-NIR spectral data preprocessed by MSC. (d) SW-NIR spectral data preprocessed by S–G.
Figure 7. Prediction of SSC results of different spectral bands based on MSC + IRS-RF: (a) VIS-NIR spectral band; (b) SW-NIR spectral band.
Figure 8. SSC prediction results of different regression models. (a) PLSR. (b) XGBoost. (c) CatBoost. (d) BO-CatBoost.
Figure 9. Different model selection for predicting SSC feature bands.
Table 1. SSC values of green plum samples.

| Sample Set | Number of Samples | Minimum Value | Maximum Value | Average Value |
|---|---|---|---|---|
| Training set | 221 | 5.8 | 13.1 | 9.7263 |
| Test set | 55 | 5.8 | 12.4 | 9.7033 |
Table 2. Software and hardware environment configuration.

| Name | Parameters |
|---|---|
| System | Windows 10 x64 |
| CPU | Intel i9 [email protected] GHz |
| GPU | Nvidia GeForce RTX 2080 Ti (11 GB) |
| Environment configuration | PyCharm + PyTorch 1.7.1 + Python 3.7.7; CUDA 10.2 + cuDNN 7.6.5 + tensorboardX 2.1 |
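Table 3. Influence of different preprocessing and feature extraction algorithms on SSC prediction.

| Bands | Preprocessing + Characteristic Wavelength Combination Algorithm | RMSEC | MAEC | R2C | RMSEP | MAEP | R2P |
|---|---|---|---|---|---|---|---|
| VIS-NIR Spectral Band | MSC + CCM | 0.401 | 0.323 | 0.909 | 0.420 | 0.339 | 0.901 |
| VIS-NIR Spectral Band | MSC + SPA | 0.369 | 0.305 | 0.925 | 0.378 | 0.316 | 0.920 |
| VIS-NIR Spectral Band | MSC + CARS | 0.605 | 0.479 | 0.827 | 0.614 | 0.490 | 0.814 |
| VIS-NIR Spectral Band | MSC + RF | 0.418 | 0.321 | 0.918 | 0.428 | 0.327 | 0.914 |
| VIS-NIR Spectral Band | MSC + our algorithm | 0.341 | 0.255 | 0.933 | 0.359 | 0.261 | 0.928 |
| VIS-NIR Spectral Band | S–G + CCM | 0.376 | 0.276 | 0.912 | 0.380 | 0.288 | 0.902 |
| VIS-NIR Spectral Band | S–G + SPA | 0.391 | 0.303 | 0.894 | 0.405 | 0.319 | 0.889 |
| VIS-NIR Spectral Band | S–G + CARS | 0.614 | 0.465 | 0.738 | 0.639 | 0.475 | 0.723 |
| VIS-NIR Spectral Band | S–G + RF | 0.607 | 0.447 | 0.759 | 0.611 | 0.458 | 0.747 |
| VIS-NIR Spectral Band | S–G + our algorithm | 0.456 | 0.258 | 0.916 | 0.467 | 0.268 | 0.905 |
| SW-NIR Spectral Band | MSC + CCM | 0.562 | 0.451 | 0.839 | 0.572 | 0.455 | 0.827 |
| SW-NIR Spectral Band | MSC + SPA | 0.531 | 0.408 | 0.871 | 0.543 | 0.423 | 0.851 |
| SW-NIR Spectral Band | MSC + CARS | 0.752 | 0.599 | 0.798 | 0.786 | 0.615 | 0.785 |
| SW-NIR Spectral Band | MSC + RF | 0.661 | 0.402 | 0.822 | 0.670 | 0.410 | 0.805 |
| SW-NIR Spectral Band | MSC + our algorithm | 0.473 | 0.257 | 0.925 | 0.488 | 0.266 | 0.911 |
| SW-NIR Spectral Band | S–G + CCM | 0.637 | 0.461 | 0.732 | 0.649 | 0.480 | 0.715 |
| SW-NIR Spectral Band | S–G + SPA | 0.457 | 0.321 | 0.873 | 0.462 | 0.334 | 0.855 |
| SW-NIR Spectral Band | S–G + CARS | 0.552 | 0.359 | 0.790 | 0.569 | 0.371 | 0.781 |
| SW-NIR Spectral Band | S–G + RF | 0.718 | 0.525 | 0.652 | 0.738 | 0.544 | 0.631 |
| SW-NIR Spectral Band | S–G + our algorithm | 0.490 | 0.301 | 0.903 | 0.495 | 0.304 | 0.892 |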
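The error metrics in Table 3 are reported without formulas in this part of the article. As a minimal sketch, assuming the standard definitions of root mean square error, mean absolute error, and coefficient of determination (with the suffixes C and P taken to denote the calibration and prediction sets), they could be computed as follows. The function name `regression_metrics` and the sample values are illustrative only; they are not the authors' code or data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired measured/predicted values.

    Assumes the common definition R^2 = 1 - SS_res / SS_tot; some
    spectroscopy papers use the squared correlation coefficient instead.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Made-up SSC values, used only to exercise the function.
measured = [9.0, 10.0, 11.0, 12.0]
predicted = [9.5, 9.5, 11.5, 11.5]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Applying the same function to the training-set predictions would give the C-suffixed columns, and applying it to the test-set predictions would give the P-suffixed columns, under the assumptions stated above.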
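Table 4. Influence of different feature extraction algorithms on SSC prediction.

| Algorithms | Bands | Band Group 1 (nm) | Band Group 2 (nm) | Band Group 3 (nm) | Band Group 4 (nm) | Dimension | RMSEP | MAEP | R2P |
|---|---|---|---|---|---|---|---|---|---|
| CCM | VIS-NIR Spectral Band | 430–488 | 636–666 | 702–925 | 968–1038 | 79 | 0.420 | 0.339 | 0.901 |
| SPA | VIS-NIR Spectral Band | 440–524 | 596–646 | 707–846 | 957–1038 | 74 | 0.378 | 0.316 | 0.920 |
| CARS | VIS-NIR Spectral Band | 450–542 | 596–646 | 707–867 | 957–1038 | 80 | 0.614 | 0.490 | 0.814 |
| RF | VIS-NIR Spectral Band | 430–483 | 631–656 | 707–925 | 973–1038 | 75 | 0.428 | 0.327 | 0.914 |
| Ours | VIS-NIR Spectral Band | 430–498 | 552–608 | 756–798 | 952–1038 | 53 | 0.359 | 0.261 | 0.928 |
| CCM | SW-NIR Spectral Band | 1167–1239 | 1253–1358 | 1489–1573 | 1601–1610 | 159 | 0.572 | 0.455 | 0.827 |
| SPA | SW-NIR Spectral Band | 1066–1150 | 1165–1333 | 1367–1467 | 1518–1568 | 236 | 0.543 | 0.423 | 0.851 |
| CARS | SW-NIR Spectral Band | 1068–1149 | 1159–1353 | 1363–1476 | 1576–1610 | 248 | 0.786 | 0.615 | 0.785 |
| RF | SW-NIR Spectral Band | 1024–1150 | 1164–1350 | 1507–1568 | 1591–1610 | 231 | 0.670 | 0.410 | 0.805 |
| Ours | SW-NIR Spectral Band | 1182–1226 | 1281–1345 | 1526–1573 | 1595–1610 | 100 | 0.488 | 0.266 | 0.911 |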
Table 5. Influence of different regression models on SSC prediction.

| Regression Model | Band Group 1 (nm) | Band Group 2 (nm) | Band Group 3 (nm) | Band Group 4 (nm) | Dimension | RMSEP | MAEP | R2P |
|---|---|---|---|---|---|---|---|---|
| PLSR | 430–498 | 552–608 | 756–798 | 952–1038 | 53 | 0.359 | 0.261 | 0.928 |
| XGBoost | 432–498 | 556–592 | 756–798 | 966–1028 | 43 | 0.403 | 0.191 | 0.927 |
| CatBoost | 432–498 | 562–592 | 742–786 | 972–1018 | 39 | 0.365 | 0.231 | 0.942 |
| Ours | 452–498 | 538–566 | 756–782 | 982–1032 | 31 | 0.252 | 0.189 | 0.957 |
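The abstract states that the proposed model's R2P of 0.957 is 3.1% higher than that of PLSR. A short check of that arithmetic, assuming the figure is a relative gain over the baseline R2P and that "Ours" in Table 5 corresponds to BO-CatBoost, might look like this. The helper name `relative_gain_percent` is hypothetical.

```python
# R2P values copied from Table 5; "Ours" is labeled BO-CatBoost here,
# following the model name used in the abstract (an assumption).
r2p = {
    "PLSR": 0.928,
    "XGBoost": 0.927,
    "CatBoost": 0.942,
    "BO-CatBoost": 0.957,
}

def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# Gain of the proposed model over each of the other three models.
gains = {
    name: relative_gain_percent(r2p["BO-CatBoost"], value)
    for name, value in r2p.items()
    if name != "BO-CatBoost"
}

for name, gain in gains.items():
    print(f"BO-CatBoost vs {name}: {gain:.1f}% higher R2P")
```

Under this reading, the gain over PLSR rounds to 3.1%, which agrees with the value quoted in the abstract.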
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhang, X.; Zhou, C.; Sun, Q.; Liu, Y.; Yang, Y.; Zhuang, Z. Prediction of Solid Soluble Content of Green Plum Based on Improved CatBoost. Agriculture 2023, 13, 1122. https://doi.org/10.3390/agriculture13061122

