Grapevine variety identification using “Big Data” collected with miniaturized spectrometer combined with support vector machines and convolutional neural networks

doi:10.1016/j.compag.2019.104855

Computers and Electronics in Agriculture

Volume 163, August 2019, 104855

https://doi.org/10.1016/j.compag.2019.104855

Highlights

• Data gathered for an unprecedented number of varieties: 64.
• Use of an unprecedented number of samples: 35,833.
• First use of convolutional neural networks in grapevine variety identification.
• Test AUROC for Touriga Nacional identification was 0.7922.
• Test AUROC for Touriga Franca identification was 0.9803.
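The highlights report test performance as AUROC, the area under the receiver operating characteristic curve. As a minimal sketch (not the authors' evaluation code), AUROC can be computed from classifier scores with the Mann–Whitney rank formulation: it is the probability that a randomly chosen spectrum of the target variety receives a higher score than a randomly chosen spectrum of another variety, with ties counting one half. A value of 0.5 corresponds to chance and 1.0 to perfect separation. The function name `auroc` and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties.

    scores: classifier outputs (higher = more likely to be the target variety)
    labels: 1 for the target variety, 0 for every other variety
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores occupying sorted positions i..j-1.
        j = i
        while j < n and scores[order[j]] == scores[order[i]]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            ranks[order[k]] = avg_rank
        i = j
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: 5 of the 6 positive/negative pairs are ranked correctly.
print(auroc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))
```

Because AUROC compares pairs of positive and negative spectra, it does not depend on the ratio between target and non-target spectra, which is useful when one variety is compared against 63 others.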

Abstract

Several previously reported experiments suggest that combining spectroscopy with machine learning allows grapevine varieties to be identified; however, up to now, the maximum number of varieties separated was twenty, and the total number of sample spectra used did not exceed a few hundred. The present work aims to answer the question: is it possible to separate one variety from an enlarged group of other varieties when the number of samples is also significantly increased? To this end, a total of 35,833 spectra were collected from the leaves of 626 plants belonging to 64 varieties. This is a non-trivial step beyond previous works because it increases the variability of the spectra, which raises the risk that a significant percentage of spectra from different varieties are identical and cannot be separated. At the same time, the study examined whether a miniaturized, easy-to-use spectrometer could deliver data of sufficient quality to allow variety separation even when the data are collected in the field, non-destructively, and under uncontrolled solar lighting. These data were used to build support vector machines and convolutional neural networks that separate Touriga Nacional from 63 other varieties (including Touriga Franca), or Touriga Franca from 63 other varieties (including Touriga Nacional), and the classification efficiencies are analysed.
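Each classifier is binary (one-vs.-all): spectra of the target variety are positives, and spectra of all 63 other varieties, including the other Touriga, are negatives. A minimal sketch of this relabelling, assuming each spectrum is tagged with its variety name; the function name and the tags are hypothetical.

```python
def one_vs_all_labels(varieties, target):
    """Return 1 for spectra of the target variety and 0 for every other variety.

    The other Touriga variety is deliberately kept, so it ends up among the
    negatives, as in the classifiers described above.
    """
    return [1 if v == target else 0 for v in varieties]

# Hypothetical per-spectrum variety tags.
tags = ["Touriga Nacional", "Touriga Franca", "Variety_A", "Touriga Nacional", "Variety_B"]

tn_labels = one_vs_all_labels(tags, "Touriga Nacional")  # TN vs. all (TF is negative)
tf_labels = one_vs_all_labels(tags, "Touriga Franca")    # TF vs. all (TN is negative)
print(tn_labels)  # [1, 0, 0, 1, 0]
print(tf_labels)  # [0, 1, 0, 0, 0]
```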

Introduction

Variety identification is an important topic in viticulture because the wine quality potential is variety dependent, and also because consumers know and seek wines of certain varieties, which affects the price of grapes. It is therefore important to have methods that ensure the trueness-to-type of plants that come out of nurseries. Conventionally, variety identification is done using ampelography and ampelometry (Tomic et al., 2013), in which an expert analyses and measures tens of grapevine features. However, the large number of features to analyse and the similarities between varieties make this a hard and laborious process that cannot be applied to hundreds of plants in a short time. In addition, training a good ampelographer can take years. Even though ampelography and ampelometry are widely accepted and reliable methods, there have been famous cases where producers thought that they were producing a certain variety but were in fact producing another (Tassie, 2010). This can be costly for these producers because the grape variety influences the commercial value of the grapes. More recently, DNA-based methods (Tomic et al., 2013) have been developed which, despite being highly reliable, are still slow and expensive, preventing their extensive use. With the objective of creating simpler methods, spectroscopic methods have been combined with machine learning in the last few years, with promising results (Gutiérrez et al., 2015a, Gutiérrez et al., 2016, Gutiérrez et al., 2015b, Cao et al., 2010, Arana et al., 2005, Diago et al., 2013, Yang et al., 2012). Spectroscopy and machine learning have also been applied to grapevine clone identification (Fernandes et al., 2014), but this topic is beyond the scope of the present article.

Spectroscopy measures how electromagnetic radiation interacts with matter, i.e., how much of this radiation is absorbed, depending on its wavelength. The ratio of the amount of light reflected by a material to the amount of light incident on it is called reflectance, and its plot versus wavelength is called a reflectance spectrum. Machine learning algorithms are needed to process spectroscopic data because of the large amount of information these data contain; the algorithms learn from the spectral data to distinguish between different varieties. The current state of the art in variety identification using spectroscopy and machine learning is described in Fernandes et al. (2018). Gutiérrez et al. (2015a), who separated 20 varieties, and Cao et al. (2010), who used 197 samples for a single variety and 439 samples in total, set the state of the art in the number of varieties separated and the number of samples employed, respectively. The present work boosts these values by distinguishing samples of a certain variety from those of 63 other varieties; each variety to be separated has more than 3000 samples, and the total number of samples available for all varieties is 35,833. This means a three-fold increase in the number of varieties, a 17-fold increase in the number of samples in a variety and an 80-fold increase in the total number of samples employed. This leads to a more realistic and harder problem than those in previously reported works, because the dataset is more likely to contain identical spectra from different varieties. This increase in the number of varieties and samples is also an important step towards a robust grapevine variety identification system that can be commercialized. The present work reports the construction of machine learning classifiers capable of separating Touriga Franca (TF_var) or Touriga Nacional (TN_var) from the remaining varieties.
When separating TF_var, TN_var was added to the set of remaining varieties, and vice versa. The classifiers used were Support Vector Machines (SVM) and Convolutional Neural Networks (CNN), and their results are compared. To the best of the authors’ knowledge, this is the first time that CNN have been employed in grapevine variety identification, although they have been used once in rice variety identification (Qiu et al., 2018). The classifiers built were of the one-vs.-all type, meaning that they are binary and indicate whether or not a spectrum belongs to a certain variety. The classifiers were tested with data gathered on a different day from the training and validation data, to minimize the influence on spectra separation of environmental or biological parameters specific to a given day. TF_var and TN_var were chosen as the main varieties to be separated because of their importance in Portugal for the production of the world-famous Port wine. In fact, TF_var and TN_var each account for 7% of the total grapevine area planted in Portugal (Ranking de castas, 2017), making them the two most planted Portuguese autochthonous varieties in the country. Portugal, which has one of the largest pools of autochthonous grapevine varieties in the world, 239 (Cunha et al., 2016), is actively working towards their preservation and dissemination.
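Reflectance, as defined in the Introduction, is the ratio of reflected to incident light at each wavelength. With portable spectrometers it is commonly estimated from raw detector counts using a dark measurement and a white reference panel. The source does not describe the authors' calibration procedure, so the sketch below is only an assumed, generic version of that calculation; the function name and all numbers are hypothetical.

```python
def reflectance(sample, white, dark):
    """Estimate reflectance per wavelength from raw spectrometer counts.

    sample: counts measured on the leaf
    white:  counts measured on a white reference panel (assumed ~100% reflective)
    dark:   counts measured with no light reaching the detector (sensor offset)
    """
    if not (len(sample) == len(white) == len(dark)):
        raise ValueError("all three spectra must have the same number of bands")
    out = []
    for s, w, d in zip(sample, white, dark):
        denom = w - d
        if denom <= 0:
            raise ValueError("white reference must exceed dark counts")
        # Remove the dark offset, then divide the reflected by the incident signal.
        out.append((s - d) / denom)
    return out

# Hypothetical raw counts for three wavelength bands.
leaf = [300.0, 1100.0, 3300.0]
white = [2100.0, 4100.0, 4100.0]
dark = [100.0, 100.0, 100.0]
print(reflectance(leaf, white, dark))  # → [0.1, 0.25, 0.8]
```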

Section snippets

Samples

The spectroscopic measurements of leaves were made on the 25th, 26th, 27th and 28th of July 2017, in Dois Portos, Portugal (39°02′34.03″N 9°10′57.41″W), in the Portuguese ampelographic collection planted in 1988 at INIAV - Instituto Nacional de Investigação Agrária e Veterinária (www.iniav.pt). There was no precipitation on these days. The measurements were made in the field, non-destructively (i.e. no part of the grapevine was removed for measurement), and without touching the grapevines.
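The Introduction states that test data were gathered on a different day from the training and validation data. A minimal sketch of such a date-based split, assuming each spectrum carries its acquisition date. Which of the four field days was held out for testing is not stated in this section, so the choice of 28 July below is purely illustrative, as are the record identifiers.

```python
from datetime import date

def split_by_day(records, test_day):
    """Split (spectrum_id, acquisition_date) records into train/validation and test.

    Every spectrum from test_day goes to the test set, so no day contributes to
    both sets and day-specific conditions cannot leak from training into testing.
    """
    train_val = [r for r in records if r[1] != test_day]
    test = [r for r in records if r[1] == test_day]
    return train_val, test

# Hypothetical records spread over the four field days.
records = [
    ("s1", date(2017, 7, 25)),
    ("s2", date(2017, 7, 26)),
    ("s3", date(2017, 7, 27)),
    ("s4", date(2017, 7, 28)),
    ("s5", date(2017, 7, 28)),
]
train_val, test = split_by_day(records, test_day=date(2017, 7, 28))
print([r[0] for r in train_val])  # ['s1', 's2', 's3']
print([r[0] for r in test])       # ['s4', 's5']
```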

Results

This section contains the results of the attempts to create two classifiers able to separate Touriga Nacional or Touriga Franca from the remaining varieties. In the case of the classifier for Touriga Nacional, Touriga Franca was included in the remaining varieties and vice-versa. Support vector machines (SVM) and convolutional neural networks (CNN) were both tested for each classifier.
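The CNN architecture is not described in this section. To illustrate what a convolutional layer does with a reflectance spectrum, the sketch below runs a single forward pass of a hypothetical, minimal 1D CNN: one filter slides along the wavelength axis, a ReLU keeps positive responses, global max pooling summarises them, and a logistic output gives a score for the target variety. The weights are made-up illustrative values rather than trained parameters, and training is omitted.

```python
import math

def conv1d_valid(signal, kernel, bias):
    """1D cross-correlation in 'valid' mode, the operation CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tiny_cnn_score(spectrum, kernel, conv_bias, out_weight, out_bias):
    """Forward pass: convolution -> ReLU -> global max pooling -> logistic output."""
    features = relu(conv1d_valid(spectrum, kernel, conv_bias))
    pooled = max(features)
    return sigmoid(out_weight * pooled + out_bias)

# Hypothetical 8-band reflectance spectrum with a steep rise (like the red edge of leaves).
spectrum = [0.05, 0.06, 0.08, 0.10, 0.40, 0.45, 0.47, 0.48]
edge_kernel = [-1.0, 0.0, 1.0]  # responds to reflectance increasing with wavelength
score = tiny_cnn_score(spectrum, edge_kernel, conv_bias=0.0, out_weight=10.0, out_bias=-2.0)
print(round(score, 4))  # → 0.8176
```

In a real CNN many such filters are learned from the data, which lets the network discover spectral shapes that discriminate varieties instead of relying on hand-chosen features.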

Discussion

In the present work, the analysed samples were leaves, with the spectra collected non-destructively in the field. Gutiérrez et al. (2015a, 2015b, 2016) also collected leaf spectra non-destructively in three different works; however, unlike in those works, contact with the sample was unnecessary in the present work, which allowed faster sample collection. This was rather

Relevance of the developed method

Up to now, the available methods for variety identification, ampelography and DNA-based analysis, have not been effective in terms of cost or measurement time. Ampelography requires long training of the experts before it can be safely applied, and a plant cannot be analysed in a few seconds because various plant traits must be examined. DNA analysis can be made neither in the field nor in a few minutes; it can only be applied by highly trained personnel in well equipped

Conclusions

The present work has shown that it is possible to separate spectra of leaves of the grapevine varieties Touriga Nacional (TN_var) or Touriga Franca (TF_var) from spectra of 62 other varieties plus TF_var or TN_var, respectively, when more than 35,000 spectra are used, even though the efficiency of this separation can differ considerably depending on the variety. The work has also shown that it is possible to collect these large amounts of data in a relatively short time, namely

Acknowledgements

The authors thank Mr. António Manuel Fernandes for his help with all the logistics related to the experiments. Armando Fernandes acknowledges a postdoctoral grant (SFRH/BPD/108060/2015) from Fundação para a Ciência e a Tecnologia. The authors acknowledge financial support through projects: National Funds by FCT - Portuguese Foundation for Science and Technology, under the project UID/AGR/04033/2019; project INTERACT – “Integrative Research in Environment, Agro-Chains and Technology”,

References (19)

F. Cao et al. (2010). Soluble solids content and pH prediction and varieties discrimination of grapes based on visible–near infrared spectroscopy. Comput. Electron. Agric.
M.P. Diago et al. (2013). Identification of grapevine varieties using leaf spectroscopy and partial least squares. Comput. Electron. Agric.
A. Fernandes et al. (2018). Assessment of grapevine variety discrimination using stem hyperspectral data and AdaBoost of random weight neural networks. Appl. Soft Comput.
V. Gomes et al. (2017). Characterization of neural network generalization in the determination of pH and anthocyanin content of wine grape in new vintages and varieties. Food Chem.
Å. Rinnan et al. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem.
I. Arana et al. (2005). Maturity, variety and origin determination in white grapes (Vitis vinifera L.) using near infrared reflectance technology. J. Infrared Spectrosc.
R. Butt (2010). Introduction to Numerical Analysis using MATLAB.
C.-C. Chang et al. (2011). LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol.
J. Cunha et al. (2016). Characterisation of the Portuguese grapevine germplasm with 48 single-nucleotide polymorphisms. Aust. J. Grape Wine Res.

There are more references available in the full text version of this article.

