Self-adaptive models for predicting soluble solid content of blueberries with biological variability by using near-infrared spectroscopy and chemometrics

https://doi.org/10.1016/j.postharvbio.2020.111286

Highlights

  • A self-adaptive model for SSC prediction was proposed.

  • Five correcting methods were used to eliminate biological variability.

  • The proposed method can select the optimal model adaptively.

  • The self-selection strategy ensured the reliability of the selected model.

Abstract

Biological variability is a natural characteristic of agricultural products. Non-destructive determination of the soluble solid content (SSC) of fruit and vegetables by spectral detection remains a challenge. The difficulty is that abundant biological variation, such as differences in cultivar, geographic origin and harvest season, causes spectral variation. In this paper, a self-adaptive model was established by combining three elements: five correcting methods for eliminating biological variability, a self-selection strategy, and model search technology. Unlike other models, the resulting model can therefore adapt automatically to changes in diverse biological variation. Furthermore, 100 selection cycles were run, each using a random algorithm to divide the samples into calibration and prediction sets, to ensure the reliability of the results. For the same batch of blueberry samples, the five correcting models showed different prediction performances. All five achieved satisfactory prediction accuracy compared with the individual-variation model and the hybrid-variation model. The results of the self-adaptive model were consistent in each case considered: multiple variations, cultivar variation only, and season variation only. The best models in the three cases (multiple variation, cultivars only and seasons only) were all based on the preprocessing method, which was selected 70, 57 and 47 times, respectively. The results indicated that biological variability affected SSC prediction and that correcting models could improve prediction accuracy. For the blueberry samples, the most suitable model according to the adaptive results was the preprocessing-based model. Within the study conditions, the self-adaptive model can select the most reliable model with the best prediction performance for different variations.
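The cyclic self-selection summarized above can be illustrated with a short Python sketch. This is not the authors' implementation. It repeats a random calibration/prediction split many times and counts how often each candidate model wins.

Several details are assumptions made only for illustration:

- the two hypothetical candidates (fit_predict_mean, fit_predict_ols), which stand in for the five correcting models;
- the 75/25 split ratio;
- the lowest prediction-set RMSE as the selection criterion;
- the synthetic data.

```python
import numpy as np


def fit_predict_mean(X_cal, y_cal, X_pred):
    """Hypothetical baseline candidate: predict the calibration mean for every sample."""
    return np.full(len(X_pred), y_cal.mean())


def fit_predict_ols(X_cal, y_cal, X_pred):
    """Hypothetical candidate: least-squares linear model with an intercept."""
    A_cal = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A_cal, y_cal, rcond=None)
    return np.column_stack([np.ones(len(X_pred)), X_pred]) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def self_select(X, y, candidates, n_cycles=100, cal_fraction=0.75, seed=0):
    """Repeat a random calibration/prediction split n_cycles times and count how
    often each candidate gives the lowest prediction-set RMSE (assumed criterion)."""
    rng = np.random.default_rng(seed)
    n_cal = int(round(cal_fraction * len(y)))
    wins = {name: 0 for name in candidates}
    for _ in range(n_cycles):
        order = rng.permutation(len(y))
        cal, pred = order[:n_cal], order[n_cal:]
        scores = {name: rmse(y[pred], model(X[cal], y[cal], X[pred]))
                  for name, model in candidates.items()}
        wins[min(scores, key=scores.get)] += 1
    return wins


# Synthetic stand-in data: 120 samples, 4 features, linear SSC-like target.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ np.array([0.8, -0.5, 0.3, 0.0]) + 12.0 + rng.normal(scale=0.1, size=120)
wins = self_select(X, y, {"mean": fit_predict_mean, "ols": fit_predict_ols})
best = max(wins, key=wins.get)  # candidate selected in the most cycles
```

The win counts play the role of the selection frequencies reported in the abstract (for example, 70 of the cycles for the preprocessing-based model under multiple variation). Any callable with the same (X_cal, y_cal, X_pred) signature can be added as a further candidate.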

Introduction

Blueberries are known as “the queen of fruits” and “the king of berries” and have a high economic value worldwide (Wang et al., 2018; Hu et al., 2016a; Li et al., 2019b). They contain valuable antioxidants and flavonoids that can enhance immunity, delay nerve senescence and help prevent cardiovascular disease (Neto, 2007; Kalt and Dufour, 1997; Jiang et al., 2019). Soluble solids content (SSC) is a critical internal quality attribute for assessing fruit ripeness and harvest time. It is also an important indicator for fruit storage and consumer preference (Zhang et al., 2017; Fan et al., 2015; Li and Colin, 2014). Therefore, a rapid, nondestructive method for accurately measuring SSC would be useful.

Near-infrared (NIR) spectroscopy has been widely accepted for rapid detection of internal qualities in fruits such as apple, pear and melon (Liu and Ying, 2005; Xiaobo et al., 2007; Nicolaï et al., 2008; McGlone and Kawano, 1998; An et al., 2020). Combined with computer technology and chemometrics, NIR spectroscopy is becoming an efficient and fast analytical technique for evaluating the internal qualities of fruit and vegetables (Roggo et al., 2007).

In NIR spectroscopy, a sample is irradiated with a continuous beam of NIR radiation and the reflected or transmitted radiation is measured. As the radiation penetrates the sample, its spectral properties are changed by wavelength-dependent scattering and absorption processes. This change depends on the chemical composition of the sample and on the light-scattering characteristics associated with its microstructure (Nicolai et al., 2007). Hydrogen-containing groups (C-H, N-H and O-H) in organic molecules absorb NIR radiation at different wavelengths and intensities (Li et al., 2016). SSC and other chemical information can therefore be inferred from the absorption.

NIR spectra can be measured in reflectance, transmittance or interactance mode. To obtain more relevant light levels, Fu et al. (2006) improved the spectral acquisition method for detecting pear SSC and obtained NIR spectra conveniently and accurately by diffuse reflectance measurement. In addition, Wang et al. (2019) used a Monte Carlo outlier detection method to remove abnormal samples and optimize the sample set, which further improved the reliability of the calibration model. Zhang et al. (2017) combined NIR spectroscopy with a wavelength selection algorithm to screen the most effective wavelengths. This eliminated interfering information in apple SSC determination and simplified the model while maintaining prediction accuracy.
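The general idea behind Monte Carlo outlier detection can be sketched as follows. The procedure works in three steps:

1. Draw many random calibration subsets and fit a model on each.
2. Record the prediction error of each sample whenever it is left out.
3. Flag samples whose errors have an unusually large mean or spread.

This is a minimal sketch under stated assumptions, not the exact procedure of Wang et al. (2019). Ordinary least squares stands in for the PLS model normally used, and the function name mc_outlier_scores, the run count and the synthetic data with one planted outlier are hypothetical.

```python
import numpy as np


def mc_outlier_scores(X, y, n_runs=500, cal_fraction=0.8, seed=0):
    """For each sample, collect prediction errors from the Monte Carlo runs in
    which it was left out of calibration; return per-sample mean and std."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_cal = int(round(cal_fraction * n))
    errors = [[] for _ in range(n)]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    for _ in range(n_runs):
        order = rng.permutation(n)
        cal, val = order[:n_cal], order[n_cal:]
        # Least squares used here as a stand-in for a PLS calibration model.
        coef, *_ = np.linalg.lstsq(A[cal], y[cal], rcond=None)
        for i, err in zip(val, y[val] - A[val] @ coef):
            errors[i].append(err)
    mean_err = np.array([np.mean(e) if e else np.nan for e in errors])
    std_err = np.array([np.std(e) if e else np.nan for e in errors])
    return mean_err, std_err


# Synthetic example with one deliberately corrupted reference value (index 7).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 11.0 + rng.normal(scale=0.05, size=50)
y[7] += 3.0
mean_err, std_err = mc_outlier_scores(X, y)
suspect = int(np.argmax(np.abs(mean_err)))  # sample with the largest mean error
```

In practice, samples would typically be flagged against a threshold on both mean and standard deviation of the errors rather than by taking only the maximum.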
Combined with chemometric models such as partial least squares regression (PLSR), principal component regression (PCR) and multiple linear regression (MLR), NIR spectroscopy can be used to predict SSC (Zhan et al., 2017). However, many studies have shown that differences in cultivar, season, origin and other biological characteristics affect the interaction between light and matter. These differences reduce the predictive performance of the model (Wold et al., 2001; Hu et al., 2016a,b; Fan et al., 2019). Consequently, the accuracy of fruit SSC prediction depends on the accuracy of the spectra, the selection of optimal wavelengths, and the stability and reliability of the model. It also depends strongly on biological variability and sample richness.
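PLSR, the first of the chemometric models listed above, can be written in a few lines of NumPy. The sketch below implements single-response PLS (PLS1) with the textbook NIPALS algorithm. It is an illustration, not the model configuration used in this study.

Assumptions:

- The function names pls1_fit and pls1_predict are hypothetical.
- The synthetic data stand in for spectra and SSC values.
- The number of latent variables would normally be chosen by cross-validation; here it is fixed at 3.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (PLS1) with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)      # weight vector
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        c = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - c * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in original X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


# Synthetic stand-in data: 80 samples x 6 variables, first 60 for calibration.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = X @ np.array([0.6, -0.4, 0.2, 0.0, 0.3, -0.1]) + 12.0 + rng.normal(scale=0.05, size=80)
model = pls1_fit(X[:60], y[:60], n_components=3)
y_hat = pls1_predict(model, X[60:])
rmsep = float(np.sqrt(np.mean((y[60:] - y_hat) ** 2)))  # root mean square error of prediction
```

With as many components as (linearly independent) variables, PLS1 reduces to ordinary least squares. With fewer components, it regularizes the fit, which is what makes it suitable for highly collinear NIR spectra.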

The diversity of biological characteristics strongly influences SSC measurement. Zhang et al. (2017) pointed out that cellular structure can cause physical and chemical heterogeneity, and that this structure is itself influenced by biological variability. Differences in internal quality also affect the propagation of incident light (Magwaza et al., 2012; Xia et al., 2019). Biological variability therefore affects the robustness of SSC prediction models. To ensure the accuracy and universality of a calibration model, biological differences should be treated as an important factor in modeling.

Five modeling strategies have been proposed to eliminate the effects of biodiversity: preprocessing-based, compensation-based, equivalent-based, classification-based and calibration transfer-based methods.

- Preprocessing-based: Zhang et al. (2017) analyzed the performance of various pretreatment methods and their combinations in removing uninformative biological variability. Models developed on preprocessed data can reduce the effects of noise and unrelated physical factors.
- Compensation-based and equivalent-based: Tian et al. (2018) built two types of models to reduce the influence of the spectral measurement position, i.e. the surface features of the fruit. The compensation-based prediction model took the spectral data from all locations as input simultaneously. The equivalent-based model took the average of all locations at each wavelength as input.
- Classification-based: Lyu et al. (2015) and Bai et al. (2019) used cultivar discrimination methods to build classification models that reduce sensitivity to cultivar. These methods included linear discriminant analysis (LDA), support vector machines (SVM), fingerprint features and deep learning.
- Calibration transfer-based: Fan et al. (2019) established a slope and bias correction model to eliminate the influence of seasonal diversity. This calibration transfer method operates on the predicted results rather than in the spectral space.

Each of these models addresses a single source of biological difference, specifically the effect of cultivar, season or origin. None is universal across all biological variations, and each must be updated or rebuilt when new samples are collected. Existing solutions therefore cannot adapt automatically to changing variation to improve detection accuracy. Hence, it is important to develop a new model that can adapt to changes in diverse biological variation.
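Three of the correction ideas above can be illustrated with small NumPy helpers. These are hedged sketches, not the cited authors' code.

- Preprocessing: standard normal variate (SNV) is shown as one common pretreatment. The cited work compared several.
- Equivalent-based input: spectra from several positions are averaged. Compensation-based input: they are concatenated. Concatenation is only one assumed reading of "all locations as input simultaneously".
- Calibration transfer: a slope and bias fitted by least squares maps predictions to reference values on a few samples from the new season.

All function names are hypothetical.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def equivalent_input(spectra_by_position):
    """Average the spectra from all positions on each fruit.
    Input shape: (n_fruit, n_positions, n_wavelengths)."""
    return spectra_by_position.mean(axis=1)


def compensation_input(spectra_by_position):
    """Assumed variant: concatenate the spectra from all positions per fruit."""
    return spectra_by_position.reshape(spectra_by_position.shape[0], -1)


def fit_slope_bias(y_pred_transfer, y_ref_transfer):
    """Least-squares slope and bias mapping model predictions to reference SSC."""
    slope, bias = np.polyfit(y_pred_transfer, y_ref_transfer, deg=1)
    return slope, bias


def apply_slope_bias(y_pred, slope, bias):
    return bias + slope * y_pred


# SNV removes a purely multiplicative difference: both rows become [-1, 0, 1].
scaled = snv(np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))

# Toy cube: 2 fruit x 3 positions x 4 wavelengths.
cube = np.arange(24, dtype=float).reshape(2, 3, 4)
averaged, concatenated = equivalent_input(cube), compensation_input(cube)

# Toy transfer set in which predictions are off by a known linear distortion.
pred_new = np.array([10.0, 11.0, 12.0, 13.0])
slope, bias = fit_slope_bias(pred_new, 0.9 * pred_new + 1.5)
corrected = apply_slope_bias(np.array([14.0]), slope, bias)
```

Slope and bias correction leaves the spectral model untouched and only rescales its outputs. This is why it needs reference values for a few samples from the new season but no new spectra-based calibration.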

Given this biological variability, it is critical to self-select the modeling method with the highest accuracy for each situation. This study proposes a new strategy: a self-adaptive model that adapts to changes in biodiversity. It combines a self-selection strategy, NIR spectroscopy, chemometric analysis, five correcting methods and model search technology, with the aim of improving prediction performance for unknown samples and the universality of the calibration model. The specific objectives were to:

(1) analyze the spectral characteristics of, and differences among, blueberries from different seasons and several cultivars;
(2) establish the five correcting models described above and assess the predictive performance of each;
(3) develop the self-adaptive model and use the self-selection algorithm to select the modeling strategy with the highest accuracy for SSC detection under different situations and requirements;
(4) compare and evaluate the performance of the self-adaptive selection model based on the results of the cyclic selection.
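Objective (1) starts from a per-group summary of the spectra. This is the same kind of envelope (lowest to highest absorbance at each wavelength) and average spectrum used to present the cultivar and season groups in Fig. 2.

A minimal sketch follows, assuming the spectra are held in a samples-by-wavelengths NumPy array with one group label per sample. The function name and toy values are hypothetical.

```python
import numpy as np


def group_spectral_summary(spectra, labels):
    """Per-group minimum, maximum and mean absorbance at every wavelength.
    spectra: (n_samples, n_wavelengths); labels: one group label per sample."""
    labels = np.asarray(labels)
    summary = {}
    for group in np.unique(labels):
        block = spectra[labels == group]
        summary[group.item()] = {
            "min": block.min(axis=0),    # lower edge of the group's envelope
            "max": block.max(axis=0),    # upper edge of the group's envelope
            "mean": block.mean(axis=0),  # average spectrum of the group
        }
    return summary


# Toy example: three spectra at two wavelengths, labelled by cultivar.
spectra = np.array([[0.2, 0.5], [0.4, 0.7], [0.3, 0.9]])
labels = ["Bluecrop", "Bluecrop", "Duke"]
s = group_spectral_summary(spectra, labels)
```

The same function works with season labels (for example 2014 and 2015), so cultivar and season differences can be summarized in the same way.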


Blueberry samples

A total of 684 blueberries from three cultivars were collected from Qingdao, China: ‘Bluecrop’ (324 samples), ‘Duke’ (180 samples) and ‘M2’ (180 samples). In terms of seasonal variation, 540 samples were harvested in May 2015 (180 of each cultivar) and 144 ‘Bluecrop’ samples were harvested in May 2014. Fruit of uniform size and free of blemishes were delivered to the laboratory immediately after harvest and stored at 4 ℃ before the experiments. The samples were removed from

Spectra features

The FT-NIR spectra of the 684 blueberries from three cultivars in the 1000–2400 nm region are shown in Fig. 2. Each colored band represents the full spectral range of the corresponding cultivar or season, that is, from the lowest to the highest absorbance of the samples at each wavelength. The thick solid line within each band is the average spectrum of that cultivar. The average spectral curves of fruit with different biological variations clearly follow similar trends,

Conclusions

Using NIR spectroscopy and chemometrics, this paper studied the effect of biodiversity on blueberry SSC prediction, with the aim of eliminating the effects of biological variability and abiotic information. The specific individual-variation model or hybrid-variation model achieved satisfactory prediction accuracy on its own samples. When applied to blueberry SSC prediction for other cultivars or seasons, however, its accuracy was greatly reduced and the results were unsatisfactory, indicating that there were

CRediT authorship contribution statement

Wei Zheng: Methodology, Writing - original draft, Writing - review & editing. Yuhao Bai: Conceptualization, Investigation. Hui Luo: Methodology, Writing - original draft. Yuhua Li: Data curation, Supervision. Xi Yang: Supervision, Validation. Baohua Zhang: Project administration, Conceptualization, Funding acquisition, Resources, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (project No. 31901415), the Natural Science Foundation of Jiangsu Province (Grant no. BK20180515), and the Jiangsu Training Programs of Innovation and Entrepreneurship for Undergraduates (project No. 201910307155Y). The authors also wish to give special thanks to Professor Menghan Hu and Professor Shizhuang Weng for their help in data processing and analysis.

