Original papers
Grapevine variety identification using “Big Data” collected with miniaturized spectrometer combined with support vector machines and convolutional neural networks

https://doi.org/10.1016/j.compag.2019.104855

Highlights

  • Data gathered for an unprecedented number of varieties, 64.

  • Use of an unprecedented number of samples, 35,833.

  • First use of Convolutional Neural Networks in grapevine variety identification.

  • Test AUROC for Touriga Nacional identification was 0.7922.

  • Test AUROC for Touriga Franca identification was 0.9803.

Abstract

Several previously reported experiments suggest that combining spectroscopy with machine learning allows grapevine varieties to be identified. However, up to now, the maximum number of varieties separated was twenty and the total number of sample spectra used did not exceed a few hundred. The aim of the present work is to answer the question: is it possible to separate one variety from an enlarged group of other varieties when the number of samples is also significantly increased? With this in mind, a total of 35,833 spectra from leaves of 626 plants from 64 varieties were gathered for the study. This is a non-trivial step beyond previous works because it increases the variability of the spectra, which raises the risk that a significant percentage of spectra from different varieties are equal and cannot be separated. Simultaneously, the work studied whether a miniaturized and easy-to-use spectrometer could deliver data of sufficient quality to allow variety separation even when the data are collected in the field, non-destructively, and under uncontrolled solar lighting. The data were used to build support vector machines and convolutional neural networks for separating Touriga Nacional from 63 other varieties (including Touriga Franca) or Touriga Franca from 63 other varieties (including Touriga Nacional), and the classification efficiencies are analysed.

Introduction

Variety identification is an important topic in viticulture because the quality potential of a wine depends on the variety, and because consumers know and want wines of certain varieties, which affects the price of grapes. It is therefore important to have methods that ensure trueness-to-type of the plants that come out of nurseries. Conventionally, variety identification is done using ampelography and ampelometry (Tomic et al., 2013), where an expert analyses and measures tens of grapevine features. However, the large number of features to analyse and the similarities between varieties make this a hard and laborious process that cannot be applied to hundreds of plants in a short time. In addition, training a good ampelographer can take years. Even though ampelography and ampelometry are widely accepted and reliable methods, there have been famous cases where producers thought that they were producing a certain variety but were in fact producing another (Tassie, 2010). This can be costly for these producers because the grape variety influences the commercial value of the grapes. More recently, DNA-based methods (Tomic et al., 2013) have been developed that, in spite of being highly reliable, are still slow and expensive, which prevents their extensive use. With the objective of creating simpler methods, spectroscopic methods have been combined with machine learning methods in the last few years, with promising results (Gutiérrez et al., 2015a, Gutiérrez et al., 2016, Gutiérrez et al., 2015b, Cao et al., 2010, Arana et al., 2005, Diago et al., 2013, Yang et al., 2012). Spectroscopy and machine learning have also been applied to grapevine clone identification, but this topic is beyond the scope of the present article (Fernandes et al., 2014).

Spectroscopy measures how electromagnetic radiation interacts with matter, i.e., how much of this radiation is absorbed depending on its wavelength. The ratio of the amount of light coming from a material to the amount of light incident on it is called reflectance, and its plot versus wavelength is called a reflectance spectrum. Machine learning algorithms are needed to process spectroscopic data because of the large amount of information that the data contain. These algorithms learn from the spectral data to distinguish between different varieties. The current state of the art in variety identification using spectroscopy and machine learning is described in Fernandes et al. (2018). Gutiérrez et al. (2015a), who separated 20 varieties, and Cao et al. (2010), who used 197 samples for a single variety and 439 samples in total, set the state of the art in the number of separated varieties and of samples employed. The present work raises these values by distinguishing samples of a certain variety from those of 63 other varieties; each variety to separate has more than 3000 samples and the total number of samples available for all varieties is 35,833. This means a three-fold increase in the number of varieties, a 17-fold increase in the number of samples in a variety and an 80-fold increase in the total number of samples employed. This leads to a more realistic and harder problem than those in previously reported works, because there is an increased probability that the dataset contains equal spectra from different varieties. This increase in the number of varieties and samples is also an important step towards the creation of a robust grapevine variety identification system that can be commercialized. The present work reports the construction of machine learning classifiers capable of separating Touriga Franca (TFvar) or Touriga Nacional (TNvar) from the remaining varieties.
When separating TFvar, TNvar was added to the set of remaining varieties and vice-versa. The classifiers used were Support Vector Machines (SVM) and Convolutional Neural Networks (CNN), and their results are compared. To the best of the authors’ knowledge, this is the first time that CNN have been employed in grapevine variety identification, even though they have been used once in rice variety identification (Qiu et al., 2018). The classifiers built were of the one-vs.-all type, meaning that they are binary and indicate whether or not a spectrum belongs to a certain variety. The classifiers were tested with data gathered on a different day from the training and validation data, to minimize the influence on the spectra separation of environmental or biological parameters specific to a certain day. TFvar and TNvar were chosen as the main varieties to be separated because of their importance in Portugal in the production of the world-famous Port wine. In fact, TFvar and TNvar each account for 7% (Ranking de castas, 2017) of the total grapevine area planted in Portugal, making them the two most planted Portuguese autochthonous varieties in the country. Portugal, which has one of the largest pools of autochthonous grapevine varieties in the world, 239 (Cunha et al., 2016), is actively working towards their preservation and dissemination.
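
The one-vs.-all labelling and the separation of test data by measurement day can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`Spectrum`), the function names, the toy reflectance values, the placeholder variety name and the choice of day 28 as the held-out day are all assumptions made for illustration, since the text available here does not specify them.

```python
from collections import namedtuple

# Hypothetical record layout; the source does not describe its data format.
Spectrum = namedtuple("Spectrum", ["variety", "day", "reflectance"])


def one_vs_all_labels(spectra, target):
    """Return 1 for spectra of the target variety and 0 for every other variety."""
    return [1 if s.variety == target else 0 for s in spectra]


def split_by_day(spectra, test_day):
    """Hold out all spectra measured on test_day; the rest is for training/validation."""
    train_val = [s for s in spectra if s.day != test_day]
    test = [s for s in spectra if s.day == test_day]
    return train_val, test


# Toy data invented for illustration only (not measurements from the study).
spectra = [
    Spectrum("Touriga Nacional", 25, [0.10, 0.42, 0.55]),
    Spectrum("Touriga Franca", 25, [0.11, 0.40, 0.57]),
    Spectrum("Placeholder variety", 26, [0.09, 0.45, 0.52]),
    Spectrum("Touriga Nacional", 28, [0.12, 0.41, 0.56]),
    Spectrum("Touriga Franca", 28, [0.10, 0.39, 0.58]),
]

# Assumption: day 28 is used as the held-out test day.
train_val, test = split_by_day(spectra, test_day=28)

# For the Touriga Nacional classifier, Touriga Franca falls in the "other" class.
train_labels = one_vs_all_labels(train_val, "Touriga Nacional")
test_labels = one_vs_all_labels(test, "Touriga Nacional")
```

Splitting by day rather than at random keeps any day-specific effect (lighting, plant state) from appearing in both the training and the test data, which is the motivation stated above.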

Samples

The spectroscopic measurements of leaves were made on 25, 26, 27 and 28 July 2017 in Dois Portos, Portugal (39°02′34.03″N 9°10′57.41″W), in the Portuguese ampelographic collection planted in 1988 at INIAV - Instituto Nacional de Investigação Agrária e Veterinária (www.iniav.pt). There was no precipitation on these days. The measurements were made in the field, non-destructively, i.e. no part of the grapevine was removed for measurement, and without touching the grapevines.

Results

This section contains the results of the attempts to create two classifiers able to separate Touriga Nacional or Touriga Franca from the remaining varieties. In the case of the classifier for Touriga Nacional, Touriga Franca was included in the remaining varieties and vice-versa. Support vector machines (SVM) and convolutional neural networks (CNN) were both tested for each classifier.
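
The highlights report classifier quality as test AUROC (area under the receiver operating characteristic curve). As a reminder of what this figure measures, the sketch below computes AUROC for a binary one-vs.-all classifier from its output scores, using the standard rank-sum (Mann-Whitney U) identity. The function name and the toy labels and scores are assumptions for illustration; the text available here does not say how the authors computed the metric.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 integers (1 = target variety).
    scores: classifier outputs, higher meaning "more likely the target variety".
    Tied scores receive the average of the ranks they span.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    # Sort sample indices by score, then assign 1-based ranks with tie averaging.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example (invented numbers): 3 of the 4 positive/negative pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

A value of 0.5 corresponds to scores that carry no information about the class, and 1.0 to perfect separation, which gives a scale for reading the two test AUROC values quoted in the highlights.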

Discussion

In the present work, the analysed samples were leaves, with the spectra being collected non-destructively in the field. Gutiérrez et al., 2015a, Gutiérrez et al., 2016, Gutiérrez et al., 2015b, in three different works, have also collected leaf spectra non-destructively; however, in the present work, contact with the sample was unnecessary, contrary to those works, which allows faster sample collection. This was rather

Relevance of the developed method

Up to now, the available methods for variety identification, ampelography and DNA-based analysis, are not effective in terms of cost or measurement time. Ampelography requires long training of the experts before it can be safely applied, and the analysis of each plant cannot be done in a few seconds because it requires analysing various plant traits. DNA analysis cannot be done in the field, nor in a few minutes; it can only be applied by highly trained personnel in well equipped

Conclusions

The present work has shown that it is possible to separate spectra of leaves from the grapevine varieties Touriga Nacional (TNvar) or Touriga Franca (TFvar) from spectra of 62 other varieties plus TFvar or TNvar, respectively, when more than 35,000 spectra are used, even though the efficiency of this separation can be rather different depending on the varieties used. The work has also shown that it is possible to collect these large amounts of data in a relatively small amount of time, namely

Acknowledgements

The authors thank Mr. António Manuel Fernandes for his help in all the logistics related to the experiments. Armando Fernandes acknowledges a post doctoral grant with number SFRH/BPD/108060/2015 from Fundação para a Ciência e a Tecnologia. The authors acknowledge financial support through projects: National Funds by FCT - Portuguese Foundation for Science and Technology, under the project UID/AGR/04033/2019; project INTERACT – “Integrative Research in Environment, Agro-Chains and Technology”,
