Original papers
Grapevine variety identification using “Big Data” collected with miniaturized spectrometer combined with support vector machines and convolutional neural networks
Introduction
Variety identification is an important topic in viticulture because wine quality potential depends on the variety, and because consumers know and seek out wines of certain varieties, which affects the price of grapes. It is therefore important to have methods that ensure the trueness-to-type of plants leaving nurseries. Conventionally, variety identification is done using ampelography and ampelometry (Tomic et al., 2013), in which an expert analyses and measures tens of grapevine features. However, the large number of features to analyse and the similarities between varieties make this a hard and laborious process that cannot be applied to hundreds of plants in a short period of time. In addition, training a good ampelographer can take years. Even though ampelography and ampelometry are widely accepted and reliable methods, there have been famous cases in which producers thought they were growing a certain variety but were in fact growing another (Tassie, 2010). Such errors can be costly for producers because the variety of the grapes influences their commercial value. More recently, DNA-based methods (Tomic et al., 2013) have been developed that, in spite of being highly reliable, are still slow and expensive, which prevents their extensive use. With the objective of creating simpler methods, spectroscopic methods have in the last few years been combined with machine learning methods, with promising results (Gutiérrez et al., 2015a, 2015b, 2016; Cao et al., 2010; Arana et al., 2005; Diago et al., 2013; Yang et al., 2012). Spectroscopy and machine learning have also been applied to grapevine clone identification, but this topic is beyond the scope of the present article (Fernandes et al., 2014).
Spectroscopy measures how electromagnetic radiation interacts with matter, i.e., how this radiation is absorbed or not depending on its wavelength. The ratio between the amount of light coming from a material and the amount of light incident on it is called reflectance, and its plot versus wavelength is called a reflectance spectrum. Machine learning algorithms are needed to process spectroscopic data because of the large amount of information these data contain. The algorithms learn from the spectral data to distinguish between different varieties. The current state of the art in variety identification using spectroscopy and machine learning is described in Fernandes et al. (2018). Gutiérrez et al. (2015a), who separated 20 varieties, and Cao et al. (2010), who used 197 samples for a single variety and 439 samples in total, set the state of the art in the number of separated varieties and in the number of samples employed, respectively. The present work raises these values by distinguishing samples of a certain variety from those of 63 other varieties; each variety to be separated has more than 3000 samples, and the total number of samples available for all varieties is 35833. This means a three-fold increase in the number of varieties, a 17-fold increase in the number of samples in a variety and an 80-fold increase in the total number of samples employed. This leads to a more realistic and harder problem than those in previously reported works, because there is an increased probability that the dataset contains equal spectra from different varieties. The increase in the number of varieties and samples is also an important step towards the creation of a robust grapevine variety identification system that can be commercialized. The present work reports the construction of machine learning classifiers capable of separating Touriga Franca (TFvar) or Touriga Nacional (TNvar) from the remaining varieties.
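As a quick check of the scale-up figures quoted above, the following Python sketch recomputes the ratios from the numbers given in the text. The variable names are illustrative, and the per-variety figure uses 3000 only as a lower bound, since the text states "more than 3000" rather than an exact count.

```python
# Figures quoted in the text for previous work.
PREV_VARIETIES = 20            # Gutiérrez et al. (2015a)
PREV_SAMPLES_ONE_VARIETY = 197 # Cao et al. (2010)
PREV_SAMPLES_TOTAL = 439       # Cao et al. (2010)

# Figures quoted in the text for the present work.
VARIETIES = 1 + 63             # one target variety plus 63 others
SAMPLES_TOTAL = 35833
SAMPLES_TARGET_LOWER_BOUND = 3000  # assumption: "more than 3000" used as a lower bound


def fold(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


variety_fold = fold(VARIETIES, PREV_VARIETIES)
total_fold = fold(SAMPLES_TOTAL, PREV_SAMPLES_TOTAL)
per_variety_fold_min = fold(SAMPLES_TARGET_LOWER_BOUND, PREV_SAMPLES_ONE_VARIETY)

# Sample count per variety that an exact 17-fold increase would imply.
implied_samples_for_17_fold = 17 * PREV_SAMPLES_ONE_VARIETY

print(f"varieties: {variety_fold:.1f}-fold")
print(f"total samples: {total_fold:.1f}-fold")
print(f"samples per variety: at least {per_variety_fold_min:.1f}-fold")
print(f"17-fold would imply about {implied_samples_for_17_fold} samples")
```

The variety and total-sample ratios round to the three-fold and 80-fold figures in the text. A 17-fold increase per variety corresponds to roughly 3349 samples, which is consistent with "more than 3000".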
When separating TFvar, TNvar was added to the set of remaining varieties, and vice versa. The classifiers used were Support Vector Machines (SVM) and Convolutional Neural Networks (CNN), and their results are compared. To the best of the authors’ knowledge, this is the first time that CNNs have been employed in grapevine variety identification, even though they have been used once in rice variety identification (Qiu et al., 2018). The classifiers built were of the one-vs.-all type, meaning that they are binary and indicate whether or not a spectrum belongs to a certain variety. The classifiers were tested with data gathered on a different day from the training and validation data, to minimize the influence on the spectra separation of environmental or biological parameters specific to a certain day. TFvar and TNvar were chosen as the main varieties to be separated because of their importance in Portugal for the production of the world-famous Port wine. In fact, TFvar and TNvar each account for 7% of the total grapevine area planted in Portugal (Ranking de castas, 2017), making them the two most planted Portuguese autochthonous varieties in the country. Portugal, which has one of the largest pools of autochthonous grapevine varieties in the world, 239 (Cunha et al., 2016), is actively working towards their preservation and dissemination.
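The one-vs.-all labelling and the day-based hold-out described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the placeholder variety codes `OTHER_A` and `OTHER_B`, and the choice of day 28 as the test day are hypothetical and are not taken from the paper.

```python
def one_vs_all_labels(varieties, target):
    """Return 1 for spectra of the target variety and 0 for all other varieties."""
    return [1 if v == target else 0 for v in varieties]


def split_by_day(samples, test_day):
    """Hold out every sample acquired on `test_day`; keep the rest for training/validation."""
    train_val = [s for s in samples if s["day"] != test_day]
    test = [s for s in samples if s["day"] == test_day]
    return train_val, test


# Hypothetical toy records: variety code and acquisition day (July 2017).
samples = [
    {"variety": "TF", "day": 25},
    {"variety": "TN", "day": 25},
    {"variety": "OTHER_A", "day": 26},
    {"variety": "TF", "day": 27},
    {"variety": "TN", "day": 28},
    {"variety": "OTHER_B", "day": 28},
]

variety_codes = [s["variety"] for s in samples]

# Each classifier gets its own binary labels; the other Touriga variety
# falls into the "remaining varieties" class (label 0).
tf_labels = one_vs_all_labels(variety_codes, "TF")
tn_labels = one_vs_all_labels(variety_codes, "TN")

# Assumption: day 28 is used as the held-out test day in this toy example.
train_val, test = split_by_day(samples, test_day=28)

print(tf_labels)  # [1, 0, 0, 1, 0, 0]
print(tn_labels)  # [0, 1, 0, 0, 1, 0]
print(len(train_val), len(test))  # 4 2
```

Splitting by acquisition day rather than at random keeps spectra from the test day entirely out of training and validation, which is the property the text relies on to limit day-specific effects.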
Section snippets
Samples
The spectroscopic measurements of leaves were made on 25, 26, 27 and 28 July 2017 in Dois Portos, Portugal (39°02′34.03″N 9°10′57.41″W), in the Portuguese ampelographic collection planted in 1988 at INIAV - Instituto Nacional de Investigação Agrária e Veterinária (www.iniav.pt). There was no precipitation during these days. The measurements were made in the field, non-destructively, i.e. no part of the grapevine was removed for measurement, and without touching the grapevines.
Results
This section contains the results of the attempts to create two classifiers able to separate Touriga Nacional or Touriga Franca from the remaining varieties. In the case of the classifier for Touriga Nacional, Touriga Franca was included in the remaining varieties and vice-versa. Support vector machines (SVM) and convolutional neural networks (CNN) were both tested for each classifier.
Discussion
In the present work, the analysed samples were leaves, with the spectra collected non-destructively in the field. Gutiérrez et al. (2015a, 2015b, 2016), in three different works, also collected leaf spectra non-destructively; however, in the present work contact with the sample was unnecessary, contrary to those works, which allowed faster sample collection. This was rather
Relevance of the developed method
Up to now, the available methods for variety identification, ampelography and DNA-based analysis, have not been effective in terms of cost or measurement time. Ampelography requires long training of the experts before it can be safely applied, and the analysis of each plant cannot be done in a few seconds because it requires analysing various plant traits. DNA analysis cannot be done in the field, nor in a few minutes; it can only be applied by highly trained personnel in well equipped
Conclusions
The present work has shown that it is possible to separate spectra of leaves from the grapevine varieties Touriga Nacional (TNvar) or Touriga Franca (TFvar) from spectra of 62 other varieties plus TFvar or TNvar, respectively, when more than 35,000 spectra are used, even though the efficiency of this separation can be rather different depending on the varieties used. The work has also shown that it is possible to collect these large amounts of data in a relatively small amount of time, namely
Acknowledgements
The authors thank Mr. António Manuel Fernandes for his help in all the logistics related to the experiments. Armando Fernandes acknowledges a post doctoral grant with number SFRH/BPD/108060/2015 from Fundação para a Ciência e a Tecnologia. The authors acknowledge financial support through projects: National Funds by FCT - Portuguese Foundation for Science and Technology, under the project UID/AGR/04033/2019; project INTERACT – “Integrative Research in Environment, Agro-Chains and Technology”,
References (19)
- et al. Soluble solids content and pH prediction and varieties discrimination of grapes based on visible–near infrared spectroscopy. Comput. Electron. Agric. (2010)
- et al. Identification of grapevine varieties using leaf spectroscopy and partial least squares. Comput. Electron. Agric. (2013)
- et al. Assessment of grapevine variety discrimination using stem hyperspectral data and AdaBoost of random weight neural networks. Appl. Soft Comput. (2018)
- et al. Characterization of neural network generalization in the determination of pH and anthocyanin content of wine grape in new vintages and varieties. Food Chem. (2017)
- et al. Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem. (2009)
- et al. Maturity, variety and origin determination in white grapes (Vitis vinifera L.) using near infrared reflectance technology. J. Infrared Spectrosc. (2005)
- Introduction to Numerical Analysis using MATLAB (2010)
- et al. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (2011)
- et al. Characterisation of the Portuguese grapevine germplasm with 48 single-nucleotide polymorphisms. Aust. J. Grape Wine Res. (2016)