Introduction

Potato (Solanum tuberosum L.) is currently one of the world's leading staple crops, ranking fourth behind rice, wheat and maize (de Haan and Rodriguez 2016). In 2021, global potato production exceeded 376 million tonnes, grown on approximately 18 million hectares. Production has fluctuated over the past 20 years. The largest producers are in Asia and Europe, which together account for over 80% of the world's potato output. Harvested area has followed a downward trend, with a 6.7% reduction in the area under cultivation from 2001 to 2021 (FAO 2023). In addition, climate change is predicted to reduce potato yields (Adavi et al. 2018; Egerer et al. 2023). Some researchers have pointed out that rising temperatures increase evaporation rates, leading to less efficient water use by crops (Haverkort and Verhagen 2008), which in turn reduces irrigation potential and decreases potato yields. Strategies such as delaying planting dates, using early-maturing potato varieties or avoiding overlap between growth initiation and temperature peaks could mitigate the adverse effects of climate change on tuber growth (Adavi et al. 2018).

Despite its global importance, potato makes up only a small share of the diet in developed countries. Daily adult consumption there is estimated at 50–150 g (Burgos et al. 2020), supplied mainly by the fast food and snack industries (de Haan and Rodriguez 2016). In developing countries, by contrast, potato is considered a staple food, with daily intake reaching up to 800 g (Burgos et al. 2020). The fundamental role of this crop in the global food system, and its importance for enhancing food security and alleviating poverty, are widely recognized.

Potatoes are a versatile food, rich in carbohydrates and low in fat. Freshly harvested potatoes usually contain 80% water and 20% dry matter, of which 60–80% is starch. The main sugars are sucrose, fructose and glucose. Starch and sugars can be considered the two most important compounds for assessing potato quality: starch influences the texture of cooked products, while sugars directly affect the colour of fried foods (Stark et al. 2020). Among the amino acids present in potatoes, asparagine is of particular importance to the processing industry. Together with reducing sugars (glucose and fructose), it can trigger the Maillard reaction at the high temperatures reached during frying, leading to acrylamide formation (Friedman 2003). Acrylamide is an organic compound classified as a potent neurotoxin and a probable human carcinogen (Group 2A), so its control in the industry is crucially important (International Agency for Research on Cancer 1986, 1994).

The nutritional composition of potatoes can be affected by pre-harvest conditions such as climate, cultural practices, maturity at harvest, and biotic and abiotic stresses. Likewise, post-harvest conditions such as processing, storage and transport can affect the potato structure (Burgos et al. 2020).

Instrumental and laboratory analytical techniques are used to determine the main potato compounds and to detect diseases. High Performance Liquid Chromatography (HPLC) and Gas Chromatography (GC) coupled with mass spectrometry are the most widely used techniques to determine sugar content (Chen et al. 2010), acrylamide (Fernandes and Soares 2007; Gökmen et al. 2005; Šimko and Kolarič 2020), phenolic compounds (Barba et al. 2008; Kvasnička et al. 2008), carotenoids (Burgos et al. 2009), vitamins (Han et al. 2004; Juhász et al. 2014) and glycoalkaloids (Deußer et al. 2012; Lachman et al. 2013; Turakainen et al. 2004). However, these methods have some drawbacks. HPLC analytical columns are expensive and have a short operating life. Likewise, solvents are expensive and their disposal may pose a contamination problem (Nie and Nie 2019). GC is limited to volatile samples and often requires mass spectrometry for peak identification (Feng et al. 2019). Regarding potato diseases, Polymerase Chain Reaction (PCR) and Enzyme-Linked Immunosorbent Assay (ELISA) techniques have been used to identify potato virus Y (PVY) (MacKenzie et al. 2015), potato mop-top virus (Arif et al. 2014), geminivirus (Jeevalatha et al. 2013) and Phytophthora infestans (Niepold and Schöber-Butin 1995). These techniques are costly and time-consuming, and they require proper sample handling and qualified personnel (Patel et al. 2023).

In a context where climate change makes crop production highly uncertain, tools that enable rapid monitoring of foods throughout the entire agri-food chain are of considerable interest. Studies suggest that the future of potato quality control lies in adopting non-destructive, cost-effective and user-friendly techniques for real-time monitoring of quality parameters (Jarén et al. 2016). Hyperspectral imaging (HSI) is a reliable analytical tool for assessing the quality attributes of roots and tubers. It rapidly provides information on external and internal defects, quality grades, and physical and chemical characteristics without laborious sample preparation (Su and Sun 2019).

The aim of this work was to evaluate the existing literature on applications of HSI for potato quality control, with particular attention to the analysis of the most important potato compounds. We searched the principal scientific databases and screened all documents retrieved by the search strategy. A systematized review protocol was then applied following the main steps of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) (Moher et al. 2015). In addition, eligible studies were evaluated with our own quality assessment scale. Finally, conclusions were drawn on the current role of HSI in potato quality control. A secondary objective was to identify gaps in this literature and to suggest directions for future research in less explored areas.

Study Design

Study Protocol

The systematized review protocol was developed by adapting the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (Moher et al. 2015).

Search Strategy

On June 2, 2023, two databases were searched: Web of Science and Scopus. These databases were chosen because they index the main journals on image analysis applied to foods. Three researchers, Carlos Miguel Peraza-Alemán (CMPA), Ainara López-Maestresalas (ALM) and Silvia Arazuri (SA), agreed to conduct a comprehensive search covering all studies published up to that date on HSI applications in potato quality control.

The following terms were used for the search in Web of Science: ((potato* OR “Solanum tuberosum”) AND (“hyperspectral imaging” OR “hyperspectral image” OR “hyperspectral imagery” OR “imaging spectroscopy” OR “HSI*”)), while for Scopus they were: ((potato* OR {Solanum tuberosum}) AND ({hyperspectral imaging} OR {hyperspectral image} OR {hyperspectral imagery} OR {imaging spectroscopy} OR {HSI*})).

Study Inclusion and Exclusion Criteria

Inclusion and exclusion criteria were agreed upon by the three researchers. The inclusion criteria were: (1) studies on HSI applications for predicting potato compounds (e.g. starch, dry matter, sugars, amino acids), texture parameters and crop parameters (e.g. chlorophyll, biomass, abiotic stresses), for identifying damage and biotic stresses (diseases), or for classifying samples by origin, variety, crop cycle and colour, through machine learning or deep learning techniques; (2) journal articles; (3) published in English or Spanish; and (4) including at least one numerical dataset suitable for meta-analysis. The exclusion criteria were: (1) studies in which potatoes were analysed together with other tubers or foods; (2) conference proceedings, annual meetings, books, book chapters, theses, reviews, systematic reviews and systematized reviews; and (3) articles in which the study matrix was potato but the scope of application was far removed from the prediction of variables of interest.

Study Selection

The selection process consisted of three stages carried out with the Rayyan tool (Ouzzani et al. 2016). First, duplicate documents retrieved from both databases were removed. Second, the titles and abstracts of all remaining documents were checked against the inclusion and exclusion criteria, and those not meeting the inclusion criteria were discarded. Finally, documents not excluded in the second stage were read in full to make a final decision on their inclusion in the systematized review. All stages were conducted independently by the three researchers, and discrepancies were resolved in a consensus meeting.

Quality Assessment of Studies

The quality of the included studies was independently evaluated by three researchers (CMPA, ALM, SA), and discrepancies were discussed and resolved in a subsequent meeting. Given the absence of established quality scales in the food technology field, an in-house quality scale covering the key aspects of our review was developed. This scale, available in Supplementary Material (Scale S1), consisted of four criteria: (1) sample-related, (2) data collection methodology, (3) data analysis and (4) results, covered by nine questions in total. A value of 1 was assigned to a positive response (a. Yes); 0 was assigned both to a negative response (b. No) and when the information was unclear in the study (c. Unclear); and no value (neither 0 nor 1) was assigned when the question was not applicable to the study (d. Not applicable). Studies were classified as "high risk" if the percentage of positive responses (a. Yes) over the total number of applicable questions was below 44%, as "moderate risk" if it was between 45 and 74%, and as "low risk" if it exceeded 75%.
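The scoring rule above can be expressed as a short function. This is a minimal sketch, not the authors' own tool, and the answer labels are hypothetical. Because the stated thresholds leave scores between 44% and 45% (possible with nine applicable questions: 4/9 = 44.4%) and exactly 75% unassigned, the sketch assumes that scores below 45% are high risk and scores of 75% or more are low risk.

```python
# Minimal sketch of the quality-scale scoring rule (hypothetical answer labels).
YES, NO, UNCLEAR, NOT_APPLICABLE = "yes", "no", "unclear", "not applicable"

def risk_category(answers):
    """Return (category, percentage of Yes answers) for one study."""
    # Not applicable questions are excluded from the denominator.
    applicable = [a for a in answers if a != NOT_APPLICABLE]
    if not applicable:
        raise ValueError("no applicable questions")
    # Yes scores 1; No and Unclear both score 0.
    pct = 100 * sum(a == YES for a in applicable) / len(applicable)
    # Assumed boundaries: < 45% high, 45% to < 75% moderate, >= 75% low.
    if pct < 45:
        return "high risk", pct
    if pct < 75:
        return "moderate risk", pct
    return "low risk", pct

# 7 Yes, 1 Unclear, 1 Not applicable: 7 of 8 applicable questions (87.5%).
print(risk_category([YES] * 7 + [UNCLEAR, NOT_APPLICABLE]))
```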

Data Collection and Extraction

Extracted data were recorded in two tables: one for regression studies and another for classification studies. Data were extracted from all articles, whether categorized as “low risk”, “moderate risk” or “high risk”. The tables included: (1) reference (authors and publication year), (2) number of potato varieties, (3) sample size (either tubers or leaves), (4) variables to predict, (5) prediction methods and (6) evaluation metrics. For regression models, the evaluation metrics were the coefficient of determination (R2), root mean square error (RMSE) and residual predictive deviation (RPD) for calibration (cal), cross-validation (cv) and external validation (val). For classification models, sensitivity, specificity, accuracy and precision values for cal, cv and val were included.
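The regression metrics listed above can be computed as follows. This is a minimal sketch: definitions differ slightly between studies (here, RPD uses the sample standard deviation of the reference values, which is an assumption), so values recomputed this way may not match reported ones exactly. The data are hypothetical.

```python
import math
import statistics

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for one data split, e.g. cal, cv or val."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mean_y = statistics.fmean(y_true)
    sst = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / n)
    # RPD = SD of reference values / RMSE; sample SD (n - 1) is an assumption here.
    rpd = statistics.stdev(y_true) / rmse
    return r2, rmse, rpd

# Hypothetical reference and predicted moisture contents (%).
reference = [10, 12, 14, 16, 18]
predicted = [11, 12, 13, 16, 19]
r2, rmse, rpd = regression_metrics(reference, predicted)
```

The same function would be applied separately to the calibration, cross-validation and external validation sets.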

Analysis of the Evidence of the Studies Included in the Systematized Review

The studies included in the systematized review were analysed qualitatively to summarize the main advances in hyperspectral imaging technology for potato quality control. The compounds most frequently reported in the studies were described, and relevant conclusions on the development of this technology were drawn.

Results and Discussion

A total of 451 studies were identified across both databases. Figure 1 shows the yearly distribution of the documents retrieved in our search, which indicates that this topic is relatively recent. In fact, the earliest article in which the term HSI referred to hyperspectral imaging in relation to potato dates back to 2006, while the first study included in the systematized review was published in 2008. Since 2012, the number of publications has grown exponentially. The small number of publications in 2023 reflects the fact that this research only covers data up to June 2023.

Fig. 1 Total number of documents (n = 451) retrieved by the search strategy, by year

Duplicate references (n = 139) were identified and removed using the Rayyan tool (Ouzzani et al. 2016). After a thorough evaluation of titles and abstracts, 63 articles were selected for full-text eligibility assessment. Finally, 52 articles met the inclusion criteria and were incorporated into the present review. During the full-text eligibility assessment, the agreement between researchers (CMPA, ALM, SA) was 97.5%; after a consensus meeting, 100% agreement was reached. The flowchart in Fig. 2 summarises the selection process.
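Because the flowchart itself is not reproduced in this text, the number of records at each stage can be recovered by simple subtraction. The intermediate figures below are derived from the reported counts rather than reported separately by the review.

```python
# Counts reported in the text; the intermediate totals are derived from them.
identified = 451
duplicates = 139
full_text_assessed = 63
included = 52

screened = identified - duplicates                   # records screened on title/abstract
excluded_screening = screened - full_text_assessed   # excluded at title/abstract stage
excluded_full_text = full_text_assessed - included   # excluded after full-text reading

print(screened, excluded_screening, excluded_full_text)  # 312 249 11
```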

Fig. 2 Flowchart of the steps followed to include the studies

Risk Assessment of Studies

An in-house scale, described in Supplementary Material (Scale S1), was used to evaluate the quality of the included studies. Overall, study quality was high: forty-nine articles (94.2%) were categorised as "low risk", three articles (5.8%) as "moderate risk", and none as "high risk". A more detailed analysis of each study is provided in Supplementary Material (Table S2). The three researchers reached 100% agreement in assigning a category to each article, although in some cases their scores differed on individual assessment criteria.

Data Extraction: Study Characteristics

Data were extracted as described in the “Data Collection and Extraction” section and organised into two tables: one for regression models (Supplementary Material, Table S3) and one for classification models (Supplementary Material, Table S4). At this stage, the objective was to extract the most significant results from the models presented by the authors.

Overview of the Findings

A similar number of studies based on regression (n = 28) and classification (n = 26) methods was found; since these counts add up to more than the 52 included articles, some studies reported both types of model. The main advances of this technology in potatoes, according to the studies analysed, are discussed below.

Major Components

Moisture and Dry Matter Content

Water content is a critical indicator of potato freshness, which makes its rapid detection essential for quality control and classification (Xiao et al. 2020). Despite its importance in industrial dehydration processes, moisture is often measured manually and inefficiently (Su and Sun 2016b). However, several studies have demonstrated the potential of HSI to predict it.

Amjad et al. (2018) reported an inverse relationship between moisture content and relative reflectance in potatoes: higher moisture content corresponds to lower reflectance, owing to increased light absorption at specific wavelengths, primarily attributable to the larger number of O–H bonds. Overall, the authors obtained good predictions of moisture content at different drying temperatures (50 °C, 60 °C and 70 °C) and slice thicknesses (5, 7 and 9 mm). They also reported that moisture content remained uniformly distributed within the potato slices. In a similar study evaluating potato slices of 5, 7 and 9 mm thickness at a constant drying temperature of 50 °C, researchers obtained accurate prediction models for moisture content (R2val of 0.99 and RMSEval of 0.11%). This suggests that hyperspectral images can capture the chemical and physico-chemical changes occurring in potato slices during drying (Moscetti et al. 2018).
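The inverse relationship between moisture and reflectance is often handled by converting relative reflectance R into pseudo-absorbance, log10(1/R), which increases as absorption (for example by O–H bonds) increases. Whether Amjad et al. (2018) applied this transformation is not stated here; the sketch below, with hypothetical reflectance values, only illustrates the conversion.

```python
import math

def pseudo_absorbance(reflectance):
    """Convert relative reflectance values (0 < R <= 1) to log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

# Hypothetical relative reflectance of a wetter and a drier slice at one water band.
a_wet, a_dry = pseudo_absorbance([0.20, 0.40])
print(round(a_wet, 3), round(a_dry, 3))  # 0.699 0.398: the wetter slice absorbs more
```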

However, a recent study encountered difficulties in determining moisture content in unpeeled potatoes (Muruganantham et al. 2023). In addition, its limited sample size (n = 47) raises concerns about the practical applicability of the model. The authors considered an RPD > 1.4 to be excellent, whereas most articles regard only RPD values above 3 as indicative of excellent predictions (Saeys et al. 2005).
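The RPD thresholds discussed above can be related to R2. The short derivation below is a sketch that assumes RMSE and the standard deviation (SD) of the reference values are computed on the same n samples, with SD using the denominator n; with other conventions (for example an n − 1 denominator) the relation holds only approximately.

```latex
\begin{aligned}
\mathrm{RMSE}^2 &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2, \qquad
\mathrm{SD}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2, \qquad
\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}} \\
R^2 &= 1-\frac{\sum_{i}(y_i-\hat{y}_i)^2}{\sum_{i}(y_i-\bar{y})^2}
     = 1-\frac{\mathrm{RMSE}^2}{\mathrm{SD}^2}
     = 1-\frac{1}{\mathrm{RPD}^2}
\quad\Longleftrightarrow\quad
\mathrm{RPD} = \frac{1}{\sqrt{1-R^2}} \\
\mathrm{RPD} = 1.4 &\;\Rightarrow\; R^2 = 1-\tfrac{1}{1.96}\approx 0.49, \qquad
\mathrm{RPD} = 3 \;\Rightarrow\; R^2 = 1-\tfrac{1}{9}\approx 0.89
\end{aligned}
```

Under this relation, an RPD of 1.4 corresponds to a model that explains only about half of the variance in the reference values, which is why the stricter threshold of 3 is usually preferred.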

In another work, potato slices were classified according to their moisture content. The Partial Least Squares Discriminant Analysis (PLSDA) model achieved sensitivity and specificity values of 1 in both calibration and cross-validation for all moisture levels studied (Su and Sun 2016b). Other researchers used two algorithms, the Successive Projections Algorithm (SPA) and Competitive Adaptive Reweighted Sampling (CARS), to extract characteristic wavelengths for estimating water content in potato slices, and built regression models with the Least Squares Support Vector Machine (LSSVM). The SPA-LSSVM model gave good predictions, with R2val = 0.791 and RPD = 2.018 (Xiao et al. 2020).
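The core of SPA can be sketched in a few lines. Starting from one wavelength, each step projects the remaining wavelength columns onto the subspace orthogonal to the last selected column and keeps the column with the largest remaining norm, that is, the one least collinear with those already chosen. This is a simplified, hypothetical implementation: full SPA usually repeats the procedure for every starting wavelength and chooses the final subset by validation error, and it is not the code used by Xiao et al. (2020).

```python
def spa_select(X, n_select, start):
    """Successive Projections Algorithm (projection step only).

    X is a list of rows (samples) by columns (wavelengths); start is the
    index of the first selected wavelength. Returns selected column indices.
    """
    n_cols = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_cols)]  # column vectors
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0:
            break  # nothing left that is not collinear with earlier choices
        best, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Project column j onto the orthogonal complement of the last selected column.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = sum(v * v for v in cols[j]) ** 0.5
            if norm > best_norm:
                best, best_norm = j, norm
        if best is None:
            break
        selected.append(best)
    return selected

X = [
    [1, 2, 1, 0],
    [2, 4, 0, 1],
    [3, 6, 1, 0],
]  # hypothetical: column 1 is exactly twice column 0
print(spa_select(X, 3, start=0))  # [0, 2, 3]: the redundant column 1 is skipped
```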

More recently, water content in potatoes was estimated with a Partial Least Squares Regression (PLSR) model combined with the CARS algorithm in the short-wave infrared (SWIR) region, achieving R2val = 0.9313 and RPD = 2.7453. The study also demonstrated the capability of CARS to reduce the dimensionality and multicollinearity of spectral data (Cui et al. 2022). Zou et al. (2022) used hyperspectral images to predict water content in potato tubers, applying several preprocessing methods to remove noise from the original data. The best prediction was obtained with the Extreme Gradient Boosting (XGBoost) model, with R2cv of 0.8448 and RMSEcv of 0.0544%.
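CARS combines Monte Carlo sampling of calibration samples, PLS regression coefficients used as wavelength weights, adaptive reweighted sampling and an exponentially decreasing function (EDF) that forces the number of retained wavelengths down at each run; the subset with the lowest cross-validation error is finally selected. The sketch below covers only the EDF schedule, under the usual formulation in which all p wavelengths are kept at the first run and two at the last of N runs. The function name and parameters are illustrative.

```python
import math

def cars_retention_schedule(n_wavelengths, n_runs):
    """Number of wavelengths retained at each CARS run by the EDF.

    r_i = a * exp(-k * i), with a = (p/2)**(1/(N-1)) and k = ln(p/2)/(N-1),
    so that r_1 = 1 (all p wavelengths kept) and r_N = 2/p (two kept).
    """
    p, N = n_wavelengths, n_runs
    if p < 2 or N < 2:
        raise ValueError("need at least 2 wavelengths and 2 runs")
    k = math.log(p / 2) / (N - 1)
    a = (p / 2) ** (1 / (N - 1))
    return [round(a * math.exp(-k * i) * p) for i in range(1, N + 1)]

schedule = cars_retention_schedule(n_wavelengths=256, n_runs=50)
print(schedule[0], schedule[-1])  # 256 2
```

In a full CARS run, each entry of this schedule caps how many of the highest-weighted wavelengths survive that iteration.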

Dry matter (DM) content is an important characteristic for industrial potato applications because it is positively correlated with both the yield of processed products and the texture of chips and French fries (Genet 1992). DM content can also influence the final colour of chips (Jadhav and Kadam 1998). Kjær et al. (2016) obtained a coefficient of determination of 0.66 for DM prediction using HSI, and the same authors improved their results in a later study, with R2val = 0.726 (Kjær et al. 2017). Wang et al. (2022) predicted and classified DM in intact potatoes using different optical sensing systems (transmittance spectra, interactance spectra and hyperspectral imaging). Measurements at the equatorial position performed well, and the best DM predictions were obtained in transmittance mode with a CARS-SVM method. Based on these studies, there appears to be room for improvement in potato DM prediction using HSI.

Starch Content

Starch is the major carbohydrate in potatoes. Its interactions with sugars and non-starch polysaccharides affect the sensory quality and shelf life of potato products (Liu et al. 2009). Despite the relevance of starch in potato, this review found only a limited number of studies on this subject. One of them evaluated several scanning methods for whole-tuber assessment, two of which were based on HSI. The best HSI scanning method for predicting starch achieved R2cal of 0.69 and RMSEcal of 1.6% (Kjær et al. 2016). However, this study did not report cross-validation or external validation, and the sample size was relatively small (n = 60). Another study by Wang et al. (2021a) on two potato varieties (‘Zihuabi’ and ‘Atlantic’) demonstrated the effectiveness of HSI in the VIS/NIR range for mapping the distribution of starch content (g/kg). The MSC-CARS-PLSR model (MSC: Multiplicative Scatter Correction) outperformed the full-spectrum PLSR model. In a similar study involving other varieties (‘Kexin No.1’ and ‘Holland No.15’), the authors emphasized the influence of the sampling site on prediction accuracy. Hyperspectral data were collected from three sampling sites (top, umbilicus and middle regions), and the umbilicus region gave the best performance with a CARS-SVR model (SVR: Support Vector Regression) (Wang et al. 2021b). Wang and Wang (2022) attempted to improve starch prediction by fusing spectral and textural data rather than using either data type alone. While low-level fusion did not improve predictions, mid-level fusion of 10 important wavelengths selected by CARS and 7 textural variables achieved an RPD of 2.05.
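MSC, used in the MSC-CARS-PLSR model above, corrects each spectrum x by fitting it to a reference spectrum (usually the mean spectrum) as x ≈ a + b·reference and returning (x − a)/b, which removes additive and multiplicative scatter effects. The following is a minimal pure-Python sketch with hypothetical spectra, not the implementation used by Wang et al. (2021a).

```python
import statistics

def msc(spectra, reference=None):
    """Multiplicative scatter correction of equal-length spectra."""
    if reference is None:
        # The mean spectrum of the set is the usual reference.
        reference = [statistics.fmean(col) for col in zip(*spectra)]
    ref_mean = statistics.fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = statistics.fmean(x)
        # Least-squares fit of x = a + b * reference.
        b = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x)) / ref_ss
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected

reference = [1.0, 2.0, 3.0, 4.0]
spectra = [
    [0.5 + 2.0 * r for r in reference],   # offset 0.5, scaling 2.0
    [-0.2 + 0.5 * r for r in reference],  # offset -0.2, scaling 0.5
]
corrected = msc(spectra, reference)  # both rows map back to about [1, 2, 3, 4]
```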

Soluble Solids and Sugar Contents

Soluble solids in potatoes consist mainly of soluble sugars. One study obtained poor results in predicting sucrose, glucose and fructose using HSI (Kjær et al. 2016). Rady et al. (2014) obtained similar outcomes for soluble solids, glucose and sucrose. They tested two potato cultivars with three acquisition modes (interactance, transmittance and hyperspectral imaging), and VIS/NIR interactance gave the best correlations for these compounds. Nevertheless, Rady et al. (2015) built a robust model for glucose prediction using PLSR, achieving R2val = 0.97 and RPD = 3.7.

Fusing data from spectroscopic (interactance and reflectance) and HSI systems improved classification and regression models for glucose and sucrose compared with any single technique. Surprisingly, unlike the studies mentioned above, the authors found that HSI outperformed the interactance and reflectance systems (Rady et al. 2021).
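The difference between data fusion levels mentioned in this section can be illustrated schematically. In low-level fusion, all variables from each source are concatenated sample by sample; in mid-level fusion, features are first extracted from each source (for instance, selected wavelengths and textural variables, as in Wang and Wang 2022) and only those features are concatenated. The sketch autoscales each block first, which is a common but assumed choice, and uses hypothetical data and feature indices.

```python
import statistics

def autoscale(block):
    """Mean-centre each column (variable) of a block and scale it to unit SD."""
    stats = [(statistics.fmean(c), statistics.stdev(c)) for c in zip(*block)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in block]

def low_level_fusion(*blocks):
    """Concatenate every (autoscaled) variable of every source, sample by sample."""
    scaled = [autoscale(b) for b in blocks]
    return [sum(rows, []) for rows in zip(*scaled)]

def mid_level_fusion(*blocks_with_features):
    """Keep only the selected variables of each source, then concatenate them."""
    reduced = [[[row[j] for j in idx] for row in autoscale(b)]
               for b, idx in blocks_with_features]
    return [sum(rows, []) for rows in zip(*reduced)]

# Hypothetical data: 3 samples, 4 wavelengths and 2 textural variables.
spectra = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.5, 0.6], [0.3, 0.5, 0.6, 0.9]]
texture = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0]]

low = low_level_fusion(spectra, texture)                   # 3 x 6 matrix
mid = mid_level_fusion((spectra, [1, 3]), (texture, [0]))  # 3 x 3 matrix
```

The fused matrix would then be passed to a single regression or classification model, exactly as a single-source matrix would be.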

Minor Components

As the results of this review indicate, the prediction of minor components in potato tubers using HSI has received limited attention. Minor components of potatoes include phenolics, enzymes and minerals, which occur at low concentrations in the tubers (Fernández-Ahumada et al. 2006; Reeve et al. 1969). Kjær et al. (2016) attempted to predict several amino acids in potatoes (asparagine, aspartate, glutamate, tyrosine, valine, glutamine and tryptophan). Except for asparagine (R2cal = 0.7), the models for the amino acids performed poorly. Likewise, Kjær et al. (2017) obtained unsatisfactory results when predicting total glycoalkaloid concentration (TGA) with a reflection-based setup. In another study, fluorescence HSI was used to predict the solanine content of potatoes. The bud eye was identified as the best region of interest for predicting solanine, and the best model achieved a coefficient of determination of 0.9143 and a root mean square error of 0.0296 mg/100 g for the prediction set (Lu et al. 2019).

Diseases

Hyperspectral imaging appears to be a promising technique for detecting potato diseases in tubers, in leaves and in field trials. Its main application has been classifying tubers and/or leaves as healthy or diseased.

Root-knot nematodes (Meloidogyne spp.) are considered an aggressive group of plant-parasitic nematodes in potato production because of their impact on tuber quality and yield (Žibrat et al. 2021). Two studies successfully applied hyperspectral imaging to identify root-knot nematode infestation in potato tubers. Both Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) models achieved 100% success rates in classifying healthy and infected tubers (Lapajne et al. 2022; Žibrat et al. 2021).

Late blight, caused by the oomycete Phytophthora infestans, was detected using UAV (Unmanned Aerial Vehicle)-based HSI data. An end-to-end deep learning model (CropdocNet) was developed to extract the hierarchical spectral-spatial structure of late blight disease, achieving average accuracies of 98.09% and 95.75% on the training and independent test datasets, respectively (Shi et al. 2022). Song et al. (2020) developed a low-cost HSI camera with performance similar to that of a high-end pushbroom system, achieving 88% accuracy in classifying healthy and affected potato leaves. In a field study, models covering healthy leaves and five progressive disease stages were trained under laboratory conditions and then applied under real field conditions; the authors highlighted the difficulties of using laboratory data to train field disease detection models (Appeltans et al. 2022). In another study, a deep learning model successfully classified asymptomatic late blight in the biotrophic phase in potato leaves. The authors observed changes in leaf reflectance on the third day after infection and, using only 6 wavelengths, classified the leaves at a stage when no symptoms were yet visible (Qi et al. 2023).

In a recent study, a novel architecture, Atrous-CNN (CNN: Convolutional Neural Network), was developed to classify different potato leaf diseases (anthracnose, leaf blight and early blight). This algorithm combined information from 1D-CNN, 2D-CNN and 3D-CNN, increasing accuracy while reducing hardware consumption (Gao et al. 2023).
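"Atrous" refers to dilated convolution, in which kernel taps are spaced apart so that a filter covers a wider spectral or spatial range without additional weights. The function below shows one dilated 1D filter applied to a hypothetical spectrum; it illustrates the operation only and is not the Atrous-CNN architecture of Gao et al. (2023).

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D convolution (cross-correlation, as in CNN layers) with a dilated kernel.

    With dilation d, consecutive kernel taps are applied d samples apart, so a
    3-tap kernel spans 1 + 2 * d input bands without extra weights.
    """
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]

spectrum = [0, 1, 2, 3, 4, 5, 6, 7]        # hypothetical, linearly increasing
kernel = [1, 0, -1]                          # simple difference filter
print(dilated_conv1d(spectrum, kernel, 1))   # [-2, -2, -2, -2, -2, -2]
print(dilated_conv1d(spectrum, kernel, 2))   # [-4, -4, -4, -4]
```

Doubling the dilation doubles the distance between compared bands while keeping the same three weights, which is the property that lets atrous layers enlarge their receptive field cheaply.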

For the detection of potato virus Y, a Fully Convolutional Network (FCN) was used to classify healthy and virus-infected plants under real conditions, and it was validated on different potato cultivars. Precision and recall exceeded 0.78 and 0.88, respectively, demonstrating that this method could be implemented in the field (Polder et al. 2019).

Zebra Chip disease was identified in a large sample of potatoes (n = 3352) in the VIS and NIR spectral range (550–1700 nm). The PLSDA model achieved 92% accuracy without preprocessing and 89% after waveband optimisation by variable importance in projection (VIP) (Garhwal et al. 2020). Although waveband selection reduced accuracy, it is an important step in transferring models to industry, because the large amount of information in a full spectrum currently makes practical use of this technique unfeasible.
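VIP scores summarise how much each waveband contributes to a PLS model across its latent components. For p variables, VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where w_a are the X-weights and SS_a is the y sum of squares explained by component a. Wavebands with VIP > 1 are commonly retained, although the cut-off used by Garhwal et al. (2020) is not stated here. The weights and sums of squares below are hypothetical stand-ins for values taken from a fitted PLS model.

```python
import math

def vip_scores(weights, ss_explained):
    """VIP for each variable from PLS X-weights and per-component explained SS of y.

    weights[a][j] is the weight of variable j in component a;
    ss_explained[a] is the y sum of squares explained by component a.
    """
    p = len(weights[0])
    total_ss = sum(ss_explained)
    vip = []
    for j in range(p):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_explained):
            norm_sq = sum(w * w for w in w_a)  # normalise each weight vector
            acc += ss_a * w_a[j] ** 2 / norm_sq
        vip.append(math.sqrt(p * acc / total_ss))
    return vip

weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]  # hypothetical, 2 components x 3 bands
ss_explained = [3.0, 1.0]                     # hypothetical explained SS per component
vip = vip_scores(weights, ss_explained)
selected = [j for j, v in enumerate(vip) if v > 1]  # bands kept by the VIP > 1 rule
```

By construction the squared VIP scores sum to p, so their average is 1; this is what makes the VIP > 1 rule a natural "above-average importance" threshold.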

Early blight, caused by Alternaria solani, was detected in a field study using a multispectral camera. While recall (0.83) was sufficient for in-field application, precision (0.21) was considered too low for accurate disease detection (Van De Vijver et al. 2020).
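The gap between recall and precision is easiest to see from confusion-matrix counts. The counts below are hypothetical, chosen only to reproduce a recall of 0.83 and a precision of about 0.21. They show how a detector can reach high specificity and accuracy when diseased plants are rare while still raising almost four false alarms for every true detection.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common classification metrics from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}

# Hypothetical field survey of 5000 plants, 100 of them diseased.
m = binary_metrics(tp=83, fp=312, fn=17, tn=4588)
print({k: round(v, 2) for k, v in m.items()})
```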

Defects

Bruise Detection

Bruising is the most common mechanical damage along the postharvest chain and remains a primary cause of postharvest loss. Bruise detection has traditionally relied on manual inspection, which is time-consuming and error-prone, particularly at early stages (Du et al. 2020). HSI can provide the necessary information through representative wavelengths for bruise detection (Che et al. 2018).

López-Maestresalas et al. (2016) identified early bruises in potatoes within 5 h of bruising with a PLSDA model in the SWIR region. Another study achieved 100% accuracy in classifying potatoes with different levels of bruising using dimensionality reduction techniques (Ye et al. 2018). Ji et al. (2019b) proposed combining hyperspectral imaging with the discrete wavelet transform for bruise identification and, using a Fisher Linear Discriminant and AdaBoost classifier (AdaBoost-FLD), attained an accuracy of 99.82%. In another study, different potato defects, including green skin, germination, dry rot, wormholes and damage, were classified through hyperspectral imaging and a Multi-Class Support Vector Machine (MSVM), with pixel-based defect classification exceeding 90% accuracy (Ji et al. 2019a).
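The discrete wavelet transform decomposes a spectrum into coarse approximation coefficients and fine detail coefficients, which can serve as compact features for a classifier. The sketch below uses the Haar wavelet for simplicity; the wavelet family and decomposition level used by Ji et al. (2019b) are not stated here, and the spectrum is hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def multilevel_haar(signal, levels):
    """Repeatedly decompose the approximation; returns it plus details per level."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details

spectrum = [4, 6, 10, 12, 8, 6, 5, 5]  # hypothetical 8-band reflectance profile
approx, details = multilevel_haar(spectrum, levels=2)
```

Because the transform is orthonormal, the coefficients preserve the energy of the original spectrum, so the coarse approximation can summarise its overall shape in fewer values.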

Sprouting Detection

Knowledge of the sprouting stage is essential for managing storage conditions to ensure postharvest quality. A study developed an HSI method to identify by-sprouting and pre-sprouting eyes. While the LSSVM classifier was inadequate for predicting sprouting stages, the Successive Projections Algorithm, Sine Fit Algorithm and Fisher Discriminant Analysis (SPA-SFA-FDA) classifier obtained an overall classification accuracy of 97.6% on the prediction set (Gao et al. 2018). In a related study, similar results were obtained for both whole tubers and potato slices when classifying high or low sprouting activity. Merging data from different optical systems did not improve models for whole tubers, whereas a slight improvement in classification accuracy was observed for sliced samples (Rady et al. 2020).

Primordial leaf count is a parameter that provides insight into a tuber's capacity to produce sprouts. Rady et al. (2014) obtained better predictions of primordial leaf count on potato slices (RPD = 2.92) than on whole tubers (RPD = 1.14). Moreover, data fusion between VIS/NIR spectroscopy and hyperspectral imaging was investigated to improve the prediction of sprouting activity based on primordial leaf count. Data fusion models increased RPD values by 35.6% for the ‘FL1879’ cultivar and 136.7% for the ‘R. Norkotah’ cultivar compared with a single technique (Rady et al. 2019).

Colour

Artificial vision is emerging as a promising technique for pixel-level colour analysis (Wu and Sun 2013). In potatoes, it offers the possibility of monitoring browning development (Moscetti et al. 2018) and controlling drying processes.

Amjad et al. (2018) demonstrated the feasibility of determining chromaticity in potato slices using hyperspectral imaging. Optimal models were developed with a wavelength selection method (Monte Carlo Uninformative Variable Elimination). Models for CIELAB a* performed consistently, whereas those for CIELAB b* were more variable. In another study, favourable results were achieved in predicting the colour coordinates hue (h) and L*/b* during hot-air drying; the thickness of the potato slices influenced the drying kinetics and, consequently, the construction of the prediction models (Moscetti et al. 2018).

Xiao et al. (2020) predicted colour parameters (L*, a*, b*, browning index (BI) and L*/b*) in fresh-cut potatoes. For all parameters, R2val and RPD values exceeded 0.84 and 2.1, respectively. The full spectral range was used to visualise the spatial distribution of the colour parameters.
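The derived colour parameters mentioned above can be computed from CIELAB coordinates. The sketch uses the hue angle h = atan2(b*, a*) and a browning index formulation widely used in the fresh-cut literature, BI = 100(x − 0.31)/0.17 with x = (a* + 1.75L*)/(5.645L* + a* − 3.012b*). Whether the cited studies used exactly this BI definition is an assumption, and the input values are hypothetical.

```python
import math

def colour_indices(L, a, b):
    """Hue angle, chroma, browning index and L*/b* from CIELAB coordinates."""
    hue = math.degrees(math.atan2(b, a)) % 360   # hue angle h in degrees
    chroma = math.hypot(a, b)                    # chroma C*
    x = (a + 1.75 * L) / (5.645 * L + a - 3.012 * b)
    browning_index = 100 * (x - 0.31) / 0.17
    return {"h": hue, "C": chroma, "BI": browning_index, "L/b": L / b}

# Hypothetical readings for a light and a browner potato slice.
light = colour_indices(L=70.0, a=2.0, b=20.0)
brown = colour_indices(L=60.0, a=6.0, b=24.0)
```

In an HSI workflow, these indices would be computed per pixel from predicted L*, a* and b* values to map browning across the slice.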

Chlorophyll

Chlorophyll content in leaves indicates the plant's nutritional status (Ali et al. 2012). Kjær et al. (2017) predicted chlorophyll content in peeled potatoes, in which different chlorophyll levels were induced by exposing tubers of four cultivars to different light treatments (red, blue, red/blue, UV-A, UV-B or UV-C). The spectral response used for chlorophyll prediction differed among cultivars. Although the global model across all varieties achieved R2val = 0.92, better models were obtained when each cultivar was modelled individually. This study highlights the challenge of obtaining robust models across potato varieties; further research is therefore required on predicting variables across different cultivars.

Two in-field studies predicted chlorophyll content in potato leaves using UAV technology, with models developed for four growth stages (budding, tuber formation, tuber growth and starch accumulation). Overall, good models were obtained for all stages (Li et al. 2020, 2021a). However, Li et al. (2021a) reported only modelling accuracies in their conclusions, even though verification accuracies were considerably lower.

Other Characteristics

Having analysed the main potato attributes, this section covers other noteworthy applications.

The earliest application of hyperspectral imaging identified in this review was by Al-Mallahi et al. (2008), who aimed to distinguish between potato tubers and soil clods under wet or dry conditions. Correct classification rates of 90% under wet conditions and 94.7% under dry conditions were achieved. Tunny et al. (2023) undertook a similar study on detecting foreign materials in fresh-cut vegetables, including potatoes. They reported that model performance differed among the hyperspectral imaging techniques tested (visible near-infrared (VNIR), short-wave infrared (SWIR) and fluorescence imaging), with SWIR being the most effective, at 99% accuracy.

Hyperspectral imaging has also been used to determine the volatility of tuber compositions (VTC) and tuber cooking degree (TCD) during low-temperature baking. Among the models, a first derivative and mean centering algorithm combined with a three-layer back-propagation artificial neural network (FMCIA-TBPANN) showed the strongest correlation. The authors also generated maps to visualize the spatial distribution and gradation of VTC and TCD (Su and Sun 2016a). A similar application successfully discriminated between raw and cooked pixels in hyperspectral images of potatoes subjected to different cooking times; the authors stressed the importance of sample size when developing models for industrial applications (Nguyen Do Trong et al. 2011).
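The preprocessing named in FMCIA can be sketched as a first derivative followed by mean centering: the derivative removes constant baseline offsets between spectra, and mean centering subtracts the average spectrum so that models focus on differences between samples. The simple finite difference below is an assumption; the derivative method actually used by Su and Sun (2016a), for example a Savitzky-Golay filter, is not stated here. The spectra are hypothetical.

```python
import statistics

def first_derivative(spectrum, step=1.0):
    """Finite-difference first derivative (one value shorter than the input)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

def mean_center(spectra):
    """Subtract the column-wise mean spectrum from each spectrum."""
    means = [statistics.fmean(col) for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]

def derivative_then_center(spectra, step=1.0):
    return mean_center([first_derivative(s, step) for s in spectra])

spectra = [
    [1.0, 2.0, 4.0, 7.0],
    [2.0, 3.0, 5.0, 8.0],  # same shape as the first, shifted by a constant baseline
    [0.0, 2.0, 3.0, 7.0],
]
processed = derivative_then_center(spectra)
```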

Recently, hyperspectral imaging was shown to detect Escherichia coli on the surface of potato slices. The back-propagation neural network (BP-NN) model (R2val = 0.976, RMSEval = 0.065 log CFU g−1) outperformed the PLS model (R2val = 0.891, RMSEval = 0.141 log CFU g−1) (Li et al. 2021b).
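R2val and RMSEval summarise how closely predictions on an independent validation set match reference values. The minimal sketch below computes both metrics from hypothetical log CFU g−1 values, not data from Li et al. (2021b). It assumes R2 is defined as the coefficient of determination, 1 − SSres/SStot. Other definitions, such as the squared correlation coefficient, also appear in chemometrics papers.

```python
import math

def r2_rmse(y_true, y_pred):
    """Coefficient of determination (1 - SS_res/SS_tot) and root-mean-square error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Hypothetical validation set: reference vs predicted E. coli counts (log CFU/g)
reference = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
r2, rmse = r2_rmse(reference, predicted)
print(f"R2val = {r2:.3f}, RMSEval = {rmse:.3f} log CFU/g")
```

RMSE is expressed in the units of the reference variable. Values are therefore only comparable between models predicting the same quantity on the same scale.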

The influence of abiotic factors on potato has also been studied using HSI. For example, Duarte-Carvajalino et al. (2021) focused on detecting different levels of water stress (none, light, moderate and severe) in potato crops. They achieved 100% accuracy at two phenological stages (tuber differentiation and maximum tuberization).

Logan et al. (2021) used HSI to discriminate between fresh and non-fresh potatoes. A convolutional neural network (CNN) trained with stochastic gradient descent achieved 98% accuracy. Similarly, Bai et al. (2020) used HSI to classify fresh-cut potato slices treated with different concentrations of sulphur dioxide. The authors obtained 95% classification accuracy using the full spectrum. Models based on selected wavelengths yielded lower accuracy, but it was still sufficient for developing a multispectral system.
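A multispectral system becomes feasible once the full spectrum is reduced to a few informative bands. The sketch below ranks bands by the absolute Pearson correlation between reflectance and a target variable. This simple filter method was chosen for illustration and is not the wavelength selection procedure used by Bai et al. (2020). The data, wavelengths and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_k_bands(spectra, target, wavelengths, k):
    """Return the k wavelengths whose reflectance correlates most strongly (|r|) with target."""
    scores = []
    for j, wl in enumerate(wavelengths):
        band = [s[j] for s in spectra]
        scores.append((abs(pearson(band, target)), wl))
    scores.sort(reverse=True)  # highest |r| first
    return [wl for _, wl in scores[:k]]

# Hypothetical data: four slices, four bands (nm), treatment concentration as target
wavelengths = [450, 550, 650, 750]
spectra = [[0.30, 0.40, 0.50, 0.20],
           [0.32, 0.38, 0.45, 0.22],
           [0.29, 0.41, 0.40, 0.21],
           [0.31, 0.39, 0.35, 0.23]]
target = [0.0, 10.0, 20.0, 30.0]
selected = top_k_bands(spectra, target, wavelengths, k=2)
print("selected bands (nm):", selected)
```

Univariate filters ignore interactions between bands. Interval or wrapper methods such as iPLS can therefore select different, and often better, subsets.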

In another study, the maximum quantum yield of primary photochemistry (Fv/Fm) in potato leaves was predicted. The bior3.3-RF-PLS model gave the best results, owing to its ability to minimize redundant information and multicollinearity among the variables in hyperspectral imaging data (Zhao et al. 2021).

Other researchers have used drones to analyse potato characteristics at the crop level. For example, Abdelbaki et al. (2021) estimated the leaf area index (LAI), fractional vegetation cover (fCover) and canopy chlorophyll content (CCC) of potato crops over different growing seasons. The random forest model incorporating exposure time (RFexp) gave the most robust performance, effectively mitigating illumination variability and cloud shadows.

Liu et al. (2021) studied potato nutrient status, including petiole nitrate, whole-leaf and total nitrogen, across four varieties and two growing seasons, covering both the VIS and NIR regions (400–2350 nm). The authors concluded that models applicable to any potato crop require data gathered at each growth stage. Similarly, another study predicted whole-leaf total N concentration and petiole nitrate–N concentration. PLSR performance decreased as spectral bandwidth increased, and HSI outperformed all the multispectral cameras tested. No significant differences were observed among the three brands of multispectral camera. Spectral bands in the visible range (400–700 nm) correlated strongly with N concentration in potatoes (Zhou et al. 2022).

UAV hyperspectral images were employed to predict above-ground biomass (AGB) as an indicator of potato crop growth. Gaussian process regression (GPR) outperformed SVM and RF. Several variables, such as original canopy spectra, first-derivative spectra, vegetation indices and crop height, were evaluated for AGB prediction. Combining spectral information with crop height produced superior models (Liu et al. 2022b). Similar studies have shown that AGB estimation accuracy improved during the early growth stages but declined as the crop progressed from the sprouting stage to starch accumulation. The best results were achieved using PLSR, with R2val = 0.74, RMSEval = 125.48 kg/hm2 (Liu et al. 2022c) and R2val = 0.78, RMSEval = 131.91 kg/hm2 (Liu et al. 2022a).

In a recent study, hyperspectral imaging was used to discriminate ten potato varieties according to their industrial suitability for cooking or for frying as chips. Two classification approaches were employed: mean spectra and pixel-wise. Mean-spectrum classification gave better results than pixel-wise classification. However, applying a variable selection method (iPLS) improved the pixel-based PLSDA models (López-Maestresalas et al. 2022).
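The two approaches differ in what is classified. The mean-spectrum approach classifies one averaged spectrum per tuber. The pixel-wise approach classifies every pixel, and the pixel predictions can then be aggregated, for example by majority vote. The sketch below uses a nearest-centroid rule as a simple stand-in for PLSDA, so it is not the model of López-Maestresalas et al. (2022). All spectra, centroids and class labels are hypothetical.

```python
from collections import Counter

def mean_spectrum(pixels):
    """Average the pixel spectra of one tuber into a single spectrum."""
    n = len(pixels)
    return [sum(p[j] for p in pixels) / n for j in range(len(pixels[0]))]

def nearest_centroid(spectrum, centroids):
    """Assign the label whose centroid has the smallest squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(spectrum, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical class centroids (e.g. learned from training tubers), two bands each
centroids = {"cooking": [0.30, 0.50], "frying": [0.40, 0.60]}

# Pixels from one hypothetical tuber, including one atypical pixel
tuber_pixels = [[0.31, 0.51], [0.29, 0.49], [0.41, 0.62], [0.30, 0.52]]

# Mean-spectrum approach: one prediction per tuber
mean_label = nearest_centroid(mean_spectrum(tuber_pixels), centroids)

# Pixel-wise approach: one prediction per pixel, then a majority vote per tuber
pixel_labels = [nearest_centroid(p, centroids) for p in tuber_pixels]
vote_label = Counter(pixel_labels).most_common(1)[0][0]
print(mean_label, pixel_labels, vote_label)
```

Averaging smooths out noisy pixels, which helps explain why mean-spectrum models often score higher. Pixel-wise predictions, however, retain spatial information that can be mapped back onto the image.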

Strengths and Limitations

In this systematized review, both a qualitative and a quantitative assessment of all selected articles were conducted. A specific quality scale was developed to better evaluate the articles published in this field. The search was carried out in the two main databases in the field of food technology (Web of Science and Scopus), which allowed us to cover a wide range of articles. For each article, three researchers independently extracted and analysed the data to minimize the risk of bias.

Our findings revealed that linear methods outperformed non-linear methods in correlating potato compounds with spectral information. Among regression models, over two-thirds of the articles obtained their best results with PLSR (Fig. 3a). Classification models, in contrast, showed a broader distribution, with PLSDA and SVM providing the most robust models in 50% of the studies analysed (Fig. 3b). The evidence collected in this review suggests that, in many cases, the complexity of non-linear methods is not necessary to obtain robust models. However, other recent reviews have reported non-linear methods as the most suitable for predicting potato variables (Gupta et al. 2023; Su and Xue 2021).

Fig. 3

Proportion of machine and deep learning methods used in the potato field: a regression and b classification methods

Sixty variables were identified through the search strategy, highlighting the wide range of potato characteristics examined. Figure 4 shows the principal characteristics evaluated: major components (23%), minor components (9%), diseases (14%) and defects (12%). The remaining characteristics were colour (9%), chlorophyll (4%) and other characteristics (28%). Major components have received the most attention, and moisture/water content was the most frequent variable, cited in seven studies (Fig. 4a). Conversely, minor components have received less attention, with only one mention each (Fig. 4b). We believe this could be attributed to the difficulty of building robust prediction models for compounds present at low concentrations in potatoes. Their minor impact on final product quality also makes them less attractive for research. Asparagine, however, deserves separate consideration because of its importance to the industry during frying, and it would therefore be worth exploring further using hyperspectral imaging. Regarding diseases, the predominant variable was late blight, with over a third of the studies dedicated to it (Fig. 4c). Similarly, Gupta et al. (2023) identified late blight and early blight as the diseases most studied using machine learning techniques. As for defects, sprouting was the most frequent variable, with four mentions (Fig. 4d). These findings indicate that hyperspectral imaging remains a highly promising approach for predicting compounds, defects and diseases in potatoes.

Fig. 4

Principal characteristics analysed in potatoes using HSI. a Major components; b minor components; c diseases; d defects

In this review, comparability between studies was further hampered by the very small number of studies addressing any specific variable. This limited the possibility of a meta-analysis of studies based on regression and classification. Furthermore, during the assessment of the literature, we observed that some authors reported different statistical parameters for the same variable, which prevents a robust meta-analysis.

Future Research

As mentioned in the previous section, predicting minor components remains a challenge, as robust models have not yet been developed. Further research on glycoalkaloids and asparagine would be worthwhile, as they are directly related to the quality of fresh and processed tubers, respectively. In their review, Rady and Guyer (2015) emphasised the importance of further studying sugars. Predicting sugar content quickly and reliably would make it possible to decide whether each tuber is suitable for frying, but this remains unresolved to date. Although there have been attempts to predict acrylamide using spectroscopic techniques (Adedipe et al. 2016; Aykas et al. 2022; Ayvaz and Rodriguez-Saona 2015; Pedreschi et al. 2010; Segtnan et al. 2006), we did not find any articles using HSI. Hyperspectral imaging has high potential for estimating acrylamide, as it can provide new and relevant information on its spatial distribution in French fries or chips. Regarding diseases, drones equipped with multispectral cameras are the most suitable option for online monitoring of potato crops, although the robustness of the resulting models could still be improved.

Most of the studies focused on the VIS/NIR range up to 1000 nm, so further research in the NIR region from 1000 to 2500 nm could lead to more robust models. In several articles, it was difficult to discern whether the reported model results corresponded to cross-validation or to external validation. These aspects should be reported more clearly so that the findings can be fully understood.
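Cross-validation results (e.g. RMSECV) and external validation results (e.g. RMSEP) are often reported under similar labels. The minimal sketch below shows how each is obtained. It fits a one-variable least-squares line as a simple stand-in for PLSR. It uses a contiguous, unshuffled k-fold split, and the calibration and external data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmsecv(x, y, k):
    """k-fold cross-validation within the calibration set: each fold is predicted
    by a model fitted on the remaining folds only."""
    n = len(x)
    preds = [0.0] * n
    for f in range(k):
        test_idx = set(range(f * n // k, (f + 1) * n // k))  # contiguous fold, no shuffling
        xtr = [x[i] for i in range(n) if i not in test_idx]
        ytr = [y[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(xtr, ytr)
        for i in test_idx:
            preds[i] = a + b * x[i]
    return rmse(y, preds)

# Hypothetical calibration set (one spectral feature vs dry matter %)
x_cal = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
y_cal = [18.0, 19.1, 20.2, 20.8, 22.1, 23.0]
# Hypothetical external set, e.g. tubers from another season or cultivar
x_ext = [0.15, 0.35, 0.55]
y_ext = [19.0, 21.0, 22.0]

a, b = fit_line(x_cal, y_cal)  # final model fitted on the whole calibration set
print("RMSECV:", round(rmsecv(x_cal, y_cal, k=3), 3))
print("RMSEP :", round(rmse(y_ext, [a + b * xi for xi in x_ext]), 3))
```

Only the external figure tests the model on samples never used during calibration. Stating which of the two is reported therefore matters when comparing studies.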

Hyperspectral imaging is currently ill-suited to real-time modelling applications because of the large amount of multidimensional and often irrelevant information it generates. The goal is to develop more practical methodologies with lower computational demands and fewer execution steps, enabling rapid, real-time determinations. Finally, to ensure the industrial adoption of these techniques, it is crucial to improve the generalizability and robustness of the models developed. This can be achieved by increasing sample sizes, incorporating various potato cultivars or conducting multiple campaigns.

Conclusions

This systematized review offers a synthesis of the latest advances in the application of hyperspectral imaging to potato crops recorded in the Web of Science and Scopus databases. Fifty-two articles were identified and subjected to a quality assessment using a purpose-built scale. Data were extracted from their best models, and the relevant results were discussed. The potential of hyperspectral imaging in potato crop analysis remains largely untapped. Further studies are required to predict chemical compounds, given their importance to the potato industry. These include glycoalkaloids, asparagine, sugars and starch in raw tubers, and acrylamide in French fries and chips. In contrast to other studies, we found that linear methods are more suitable than non-linear methods for predicting compounds, diseases and defects. The use of hyperspectral imaging systems for online monitoring in the potato industry poses a challenge because of the huge amount of information that needs to be processed. The evidence presented in this paper suggests that real-world applications of HSI will be through multispectral systems. Finally, the meta-analysis was constrained by the limited number of articles that examined both the same variables and the same parameters. It is therefore recommended that future systematized reviews in this area explore the feasibility of a meta-analysis. This review provides an overview of the current state of the art and can serve as a valuable source of information for researchers working in this field.