Evaluation of Nutritional Values of Edible Algal Species Using a Shortwave Infrared Hyperspectral Imaging and Machine Learning Technique

In recent years, demand for algae has grown in Western countries because of their richness in nutrients and bioactive compounds and their use as ingredients in foods, cosmetics, nutraceuticals, fertilizers, biofuels, etc. Evaluating the qualitative characteristics of algae involves assessing their physicochemical and nutritional components to determine their suitability for specific end uses, but this assessment is generally performed with traditional chemical analyses that are destructive, expensive, and time-consuming, and that require sample preparation. The hyperspectral imaging (HSI) technique has been successfully applied in food quality assessment and control and has the potential to overcome the limitations of traditional biochemical methods. In this study, the nutritional profile (proteins, lipids, and fibers) of seventeen edible macro- and microalgae species widely grown throughout the world was investigated using traditional methods. Moreover, a shortwave infrared (SWIR) hyperspectral imaging device and artificial neural network (ANN) algorithms were used to develop multi-species models for proteins, lipids, and fibers. The predictive power of the models was characterized by different metrics, which showed very high predictive performances for all nutritional parameters (for example, R² = 0.9952, 0.9767, and 0.9828 for proteins, lipids, and fibers, respectively). Our results demonstrated the ability of SWIR hyperspectral imaging coupled with ANN algorithms to quantify biomolecules in algal species in a fast and sustainable way.


Introduction
Algae are a broad group of photosynthetic organisms of different dimensions, shapes, and colors, and with different filament complexities, from simple to branched [1]. They are widely spread across all of the world's biogeographic areas, having a robust ability to adapt to different environmental conditions (temperature, light, nutrient concentration, hydrodynamism, etc.) [2]. Algae are classified as red (Rhodophyta), brown (Phaeophyceae), green (Chlorophyta), and blue-green (Cyanophyta) algae depending on the nature of their pigments. The first three groups are often called seaweeds or macroalgae. They are macroscopic marine algae, whose length can reach tens of meters [3]. Currently, around 7533 red, 2133 brown, and 8191 green species are known in nature [4]. Microalgae are microscopic organisms naturally found in marine environments or fresh water. Although it is estimated that there are many microalgal species, only approximately 44,000 have been studied so far [5]. Among these, only a small number, such as Auxenochlorella vulgaris and Limnospira platensis, are commercially relevant [5].
According to the FAO, in 2021 the production of algae was 36 million tonnes (wet weight), mainly from aquaculture and marine aquaculture [6]. The top producers were China, Indonesia, the Republic of Korea, and the Philippines (with shares of 60%, 25%,

Foods 2024, 13, 2277 3 of 17

(ANN), one of the most common machine learning techniques, have gained attention due to their ability to reliably and practically predict food quality traits [27]. Compared to linear algorithms, such as partial least squares (PLS) regression, ANN can iteratively learn, identify, and model complex and often nonlinear relationships between the dependent and independent variables as a function of the provided patterns, without requiring prior knowledge of the relationships between variables [27,28].
In light of these considerations, our study aimed to evaluate the nutritional profile (proteins, lipids, and fibers) of seventeen macro- and microalgae species widely consumed around the world; to evaluate, for the first time, the predictive performance of a shortwave infrared (SWIR) hyperspectral imaging device (935-1720 nm) by developing multi-species calibration models using ANN algorithms for three nutritional parameters (proteins, lipids, and fibers); to analyze prediction accuracy with different metrics; and to highlight the most predictive input spectral regions for each model. To the best of our knowledge, no studies have investigated the potential of shortwave infrared hyperspectral imaging devices to develop predictive models for the assessment of the nutritional parameters of algae.

Algae Material
Forty-one macro- and microalgae samples of different origins were considered in this study, as described in Table 1. The samples belonged to seventeen species widely consumed around the world, namely Limnospira platensis, Auxenochlorella pyrenidosa, Chlorella vulgaris, Chondrus crispus, Eisenia bicyclis, Himanthalia elongata, Laminaria digitata, Laminaria longissima, Laminaria ochroleuca, Palmaria palmata, Porphyra umbilicalis, Pyropia yezoensis, Sargassum fusiforme, Ulva lactuca, Ulva lactuca var. spiralis, Ulva pertusa, and Undaria pinnatifida. The samples were grown in different environments and were purchased from Italian markets over three years. The samples were ground with a Bühler MLI 203 sifter (Milan, Italy) and sieved to obtain a fine and homogeneous flour with particle sizes from 400 to 500 µm. Four aliquots were randomly taken from each sample for the analyses.

Proximate Composition
Determination of the proximate composition of the algae was performed in triplicate and the data were expressed as g 100 g⁻¹ on a dry weight basis (dw). The proteins and lipids were determined by the ICC standard methods 105/2 and 136, respectively [29]. Protein content was estimated using the conversion factor 5.0 instead of 6.25. In fact, recent studies highlighted that a conversion factor of 6.25 overestimates the algae protein content, and they identified the value 5.0 as the most suitable nitrogen-to-protein conversion factor [30][31][32][33]. Total dietary fiber content was measured according to Lee et al. [34].
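The effect of the conversion factor can be illustrated with a minimal sketch. It assumes that protein is obtained by multiplying a measured total nitrogen content by the factor; the function name and the nitrogen value are hypothetical and are not taken from the study.

```python
def protein_from_nitrogen(nitrogen_g_per_100g, factor=5.0):
    """Estimate protein (g per 100 g dw) from total nitrogen (g per 100 g dw).

    The default factor of 5.0 is the algae-specific value used in the text;
    6.25 is the generic factor it replaces.
    """
    if nitrogen_g_per_100g < 0:
        raise ValueError("nitrogen content cannot be negative")
    return nitrogen_g_per_100g * factor


# Hypothetical nitrogen content, chosen only for illustration.
nitrogen = 8.0
protein_algae = protein_from_nitrogen(nitrogen)          # factor 5.0
protein_generic = protein_from_nitrogen(nitrogen, 6.25)  # generic factor

# Relative overestimation introduced by the generic factor, in percent.
overestimate_pct = 100.0 * (protein_generic - protein_algae) / protein_algae
```

Because both estimates scale linearly with nitrogen, the relative overestimation from using 6.25 instead of 5.0 is the same for every sample.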

Hyperspectral Imaging
Shortwave infrared (SWIR) hyperspectral images of the algae samples (5 g of material in a Petri dish) were acquired in reflectance mode using a SisuCHEMA Hyperspectral Chemical Imaging Analyser (SPECIM, Spectral Imaging Ltd., Oulu, Finland) system, as described by Amoriello et al. [24]. The system (Figure 1) consists of a scanner table having a maximum scanning rate of 60 mm/s and a spatial resolution of 600 µm, with an integrated SPECIM diffusive line illumination unit, and a monochrome InGaAs image sensor detector (Specim FX17, Spectral Imaging Ltd., Oulu, Finland) with a spectral range of 935-1720 nm, a spectral resolution of 8 nm, and a spatial resolution of 640 pixels. The images were acquired and converted to spectral reflectance with Lumo-Scanner software (version 2022, Lumo-Scanner, Specim, Spectral Imaging Ltd., Oulu, Finland). The exposure time of the hyperspectral camera was set to 4.70 ms, the frame rate to 15.20 Hz, the positioning speed of the platform to 20.00 mm s⁻¹, and the scanning speed to 5.84 mm s⁻¹. The reflectance of the acquired hyperspectral images was calibrated using the white and dark reference images, according to the following equation:

$$R = \frac{R_{raw} - R_{B}}{R_{W} - R_{B}}$$

where R = the corrected reflectance, R_raw = the original reflectance, R_B = the black reference, and R_W = the white reference.
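As a minimal sketch of this calibration step, the snippet below applies the usual white/dark reference correction, R = (R_raw − R_B) / (R_W − R_B), band by band to a single spectrum. The function name and the numeric values are hypothetical; in the study this conversion was carried out by the acquisition software.

```python
def calibrate_reflectance(raw, dark, white):
    """Return (raw - dark) / (white - dark) for each spectral band."""
    if not (len(raw) == len(dark) == len(white)):
        raise ValueError("raw, dark and white must have the same number of bands")
    corrected = []
    for r, b, w in zip(raw, dark, white):
        span = w - b
        if span == 0:
            raise ZeroDivisionError("white and dark references coincide for a band")
        corrected.append((r - b) / span)
    return corrected


# Hypothetical three-band signals, for illustration only.
raw = [0.30, 0.55, 0.80]
dark = [0.10, 0.05, 0.20]
white = [0.90, 1.05, 1.00]
corrected = calibrate_reflectance(raw, dark, white)
```

A signal equal to the dark reference maps to 0 and a signal equal to the white reference maps to 1, so the corrected values are comparable across bands and acquisitions.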
The HSI images were processed using Evince software (version 2.7.12, Prediktera AB, Umeå, Sweden). A principal component analysis (PCA) algorithm was used for the image segmentation and to remove the background, as described by Amoriello et al. [24]. Then, the reflectance spectra were smoothed with a baseline correction and a first-order Savitzky-Golay filter for noise reduction. Light scattering was minimized using standard normal variate (SNV) correction. The mean spectrum of each sample was calculated as the average of the spectra of all of its pixels over the whole hyperspectral image. All mean spectra were transformed by a first-derivative treatment with a central difference approach to highlight species differences.
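Part of this preprocessing chain can be sketched in plain Python. The sketch covers only SNV correction, pixel averaging, and a central-difference first derivative; background removal and Savitzky-Golay smoothing are omitted. The function names and the toy data are hypothetical, and the one-sided differences at the two ends of the spectrum are an assumption, since the text does not say how the edges were handled.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(v - m) / s for v in spectrum]


def mean_spectrum(pixel_spectra):
    """Average the spectra of all pixels of a sample, band by band."""
    return [mean(band) for band in zip(*pixel_spectra)]


def first_derivative(spectrum, wavelengths):
    """Central differences at interior bands, one-sided differences at the ends."""
    n = len(spectrum)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / (wavelengths[hi] - wavelengths[lo]))
    return deriv


# Toy data: two pixels, four bands (values are illustrative only).
wavelengths = [935.0, 940.0, 945.0, 950.0]
pixels = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5, 0.7]]

snv_pixels = [snv(p) for p in pixels]
sample_spectrum = mean_spectrum(snv_pixels)
deriv = first_derivative(sample_spectrum, wavelengths)
```

In the toy data the two pixels differ only by an additive offset, so SNV maps them onto the same curve; this is the kind of baseline difference that the scatter correction is meant to remove.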

Artificial Neural Networks
Three nutritional parameters (protein, lipid, and total fiber) were predicted using an artificial neural network (ANN). The feed-forward architecture of the ANN, i.e., a multilayer perceptron (MLP), combined with the Levenberg-Marquardt learning algorithm, was used to develop nonlinear models for the nutritional variables. The SWIR spectra of each algal sample represented the independent variables, and the dataset was randomly split into a training set (70% of the data), a testing set (15% of the data), and a validation set (15% of the data). The ANN architecture (Figure 2) was composed of three main layers: an input layer, which contains 164 spectral data, an output layer, i.e., the three nutritional parameters, and a hidden layer, as described by Amoriello et al. [27].
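The random 70/15/15 partition can be sketched as follows. This is only an illustration: the function name, the seed, and the number of spectra are hypothetical, and the study performed the split inside its statistical software.

```python
import random


def split_indices(n_samples, train=0.70, test=0.15, seed=0):
    """Shuffle sample indices and cut them into training, testing and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train * n_samples)
    n_test = round(test * n_samples)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    valid_idx = indices[n_train + n_test:]  # remainder, about 15%
    return train_idx, test_idx, valid_idx


# Hypothetical number of spectra, for illustration only.
train_idx, test_idx, valid_idx = split_indices(100)
```

Assigning the remainder to the validation set guarantees that every spectrum lands in exactly one subset, even when the percentages do not divide the sample count evenly.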
Four activation functions (identity, logistic, hyperbolic tangent, and exponential) applied in the hidden or output layers, and topologies with different numbers of neurons in the hidden layer (from 1 to 25), were tested to identify the best topology for each model. The training process of the network was run 100,000 times with random initial values of weights and biases. The prediction performances were assessed using different metrics: the coefficient of correlation between observed and predicted values (r), the coefficient of determination (R²), the mean absolute error (MAE), the root mean squared error (RMSE), and the relative standard error (RSE), as described by Amoriello et al. [27]. The models and sensitivity analyses were developed using TIBCO® Statistica statistical package software (version 13.5, TIBCO Software Inc., Palo Alto, CA, USA).
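Four of these metrics can be sketched directly from observed and predicted values. The sketch assumes textbook definitions (Pearson's r, R² as 1 − SS_res/SS_tot, MAE, and RMSE); the exact formulas used in the study are given in [27] and may differ, and RSE is left out because its definition is not reproduced here. The function name and the data are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return r, R2, MAE and RMSE under common textbook definitions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "r": cov / math.sqrt(ss_tot * ss_pred),  # Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,             # assumed definition
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical observed and predicted contents (g per 100 g dw).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

Under these definitions R² is not necessarily equal to r squared, which is worth keeping in mind when comparing the two values reported for a model.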

Statistical Analysis
Differences between all of the nutritional variables were determined using the Kruskal-Wallis non-parametric test and Dunn's post hoc test at a significance level of 5%, using PAST statistical software (version 4.17).

Exploratory Analysis
Table 2 summarizes the nutrient composition in terms of protein, lipid, and total fiber contents of the different algae species. Protein content differed significantly between samples, with the highest mean values for Limnospira platensis (62.1 ± 3.2 g 100 g⁻¹ dw), formerly known as Spirulina, for Auxenochlorella pyrenidosa (59.0 ± 0.2 g 100 g⁻¹ dw), and for Chlorella vulgaris (57.9 ± 0.2 g 100 g⁻¹ dw). Conversely, the lowest values were recorded for brown algae, particularly for Laminaria ochroleuca (2.2 ± 0.2 g 100 g⁻¹ dw, Sample 23), in accordance with the results of Salido et al. [1] and Penalver et al. [13], which suggested a protein content near or below 15 g 100 g⁻¹ dw for brown algae. The Food and Agriculture Organization of the United Nations (FAO) and the World Health Organization (WHO) recommend the consumption of Spirulina and Chlorella microalgae in the diet due to their high protein content, up to 70% protein per unit of dry weight [14]. Furthermore, these algae contain essential amino acids, making them suitable for human nutrition [5]. For these reasons, these microalgae are considered a desirable and sustainable ingredient for protein supplements, to be consumed especially in vegetarian or vegan diets. However, seaweeds can also be considered a good source of protein due to their overall protein level and their amino acid composition [34]. For example, in our study, red algae also showed a good average protein content, from 15.4 ± 1.4 g 100 g⁻¹ dw for Chondrus crispus to 26.6 ± 3.4 g 100 g⁻¹ dw for Porphyra umbilicalis, as previously reported by Salido et al. [1] and Sultana et al. [35]. However, the differences in the chemical composition of algal species can be due to geographical location and environmental factors, such as seasonality, year, salinity, water temperature, and light irradiation, which could influence the nutrient supply, including the nitrogen availability [1,36].
The total lipid content of the algal species was quite low, ranging from 0.6 ± 0.1 g 100 g⁻¹ dw for Pyropia yezoensis to 10.1 ± 0.1 g 100 g⁻¹ dw for Auxenochlorella pyrenidosa. Generally, seaweeds contain limited lipid quantities (around 1-5%), whereas microalgae exhibit higher values (10-12%) [4,14,37,38]. Our results are consistent with these ranges. Slight discrepancies can be due to some physical factors, such as sunlight intensity, temperature, nutrient limitation, pH, and oxidative stress, which can affect lipid biosynthesis and composition, as described by Morales et al. [38] and Breuer et al. [39]. The algal lipid fraction is mainly composed of neutral lipids, such as fatty acids (especially omega-3 polyunsaturated fatty acids), triglycerides, and sterols, and of complex lipids, such as glycolipids and phospholipids [40][41][42]. For this reason, algae are recognized for their health benefits and used in functional foods and nutraceuticals.
Some edible algae are an important source of fiber, especially those belonging to the Phaeophyceae (brown algae) and Rhodophyta (red algae). Eisenia bicyclis showed the highest fiber content (66.9 ± 0.3, 66.6 ± 0.3, and 63.2 ± 0.3 g 100 g⁻¹ dw for Samples 12, 13, and 11, respectively), followed by Sargassum fusiforme (61.4 ± 0.3 g 100 g⁻¹ dw), whilst Limnospira platensis showed the lowest (2.5 ± 0.2 g 100 g⁻¹ dw, Sample 9). Similar fiber ranges for algae were reported by other authors, except for the Limnospira platensis values [4,14]. Algae are an excellent source of fibers, particularly soluble fibers (50-85% dw), such as alginates, fucoidans, carrageenans, and exopolysaccharides, contrary to the typical composition of fibers in terrestrial plants [13,43]. Due to their high fiber content, algae can contribute to a more balanced diet, enhancing daily fiber intake.

Spectral Characteristics
The SWIR first-derivative spectra (935-1720 nm) are depicted in Figure 3 and contain information on different functional groups of the algae samples. The mean SWIR raw spectra of each algae sample are represented in Supplementary Figure S1. In general, the SWIR regions were mainly characterized by second-overtone spectral regions, which could be associated with the aliphatic chain (C-Hn), hydroxyl group (O-H), and aminic group (N-H) characteristic of complex carbohydrates (cellulose, lignin, etc.), lipids, water, and proteins [44].
Functional groups such as C-H, O-H, and N-H are typical of the molecules of the biochemical substances of the algae. Qualitative and quantitative NIR analyses are often based on these [45]. The reflectance spectra showed similar profiles, characterized by six notable peaks of different magnitudes among all of the algae samples. The broad spectral region between 1100 and 1300 nm can be mainly attributed to the C-H and C-H2 stretching vibration [46]. The prominent peak at around 1400 nm was assigned to O-H and N-H stretching of the first and second overtone. The signals in the wavelength region from 1600 nm to 1700 nm were caused by C-H and C-H2 vibrations [47].
In general, differences in peak intensities in the SWIR spectra between the different samples, especially in the region between 1350 and 1450 nm, could be mainly related to the contents of compounds such as proteins, lipids, carbohydrates, and water, typical of the various algal species and phyla. At the same time, the geographical origin, the growth conditions, and the nutritional input could have influenced the formation and content of the chemical components and could have caused significant variability in the samples, as shown by the spectra [33,48,49].

Differences between letters (a-s) in the same column indicate significant differences (p < 0.05).
Among the samples belonging to the Chlorophyta phylum (Figure 3A), Samples 2 and 4 (Chlorella vulgaris and Ulva lactuca) showed different spectral profiles in the regions between 1100 and 1200 nm, 1350 and 1450 nm, and 1650 and 1700 nm, and prominent peaks at around 950 nm, 1200 nm, and 1500 nm. Conversely, Cyanophyta samples showed similar characteristic peaks, albeit with different signal intensities (Figure 3B).
The spectra of the Phaeophyceae samples (Figure 3C,D) were characterized by high variability in profile and intensity. Between 900 and 1300 nm, the spectral signals for Samples 17 and 18 (Laminaria digitata from Northwest France and Laminaria digitata from the North Atlantic) were the lowest, whereas those of Samples 24, 25, and 30 (Sargassum fusiforme, Undaria pinnatifida from Japan, and Undaria pinnatifida from the Atlantic, respectively) were the highest. Regarding the Rhodophyta phylum samples, Sample 37 (Palmaria palmata from the Atlantic) showed a spectral profile very different from those of other red algae, especially in the regions between 1100 and 1200 nm, 1350 and 1450 nm, and 1650 and 1700 nm. Furthermore, prominent peaks at 950 nm, 1200 nm, and 1500 nm can be observed (Figure 3E).
These results demonstrate that the SWIR device can capture the intraspecies spectral differences, probably derived from the different growth conditions of the algae, and interspecies differences, in relation to the quantity of macroconstituents indicated by the band intensity at well-defined wavelengths.

ANN Model Prediction
The ANN models were developed using first-derivative transformed spectral data. The ANN activation functions for the best topology generated for each output variable and the modeling performance in terms of the coefficient of correlation (r), the coefficient of determination (R²), the mean absolute error (MAE), the root mean squared error (RMSE), and the relative standard error (RSE) for the training, test, and validation sets are shown in Table 3 and Figure 4, whilst the results from the sensitivity analysis for each ANN model are reported in Figure 5. The best five ANN architectures for each parameter are shown in Table S1 (Supplementary Materials).
Sensitivity analysis is one of the most widely used methods to rate the importance of a model's input variables. It is based on the partial derivatives method, and it consists of calculating the derivative of the output with respect to each input variable of the neural network, evaluated on each data sample of a given dataset, as described by Pizarroso et al. [50]. The contribution of each input is calculated in both magnitude and sign, considering the connection weights, the activation functions, and the values of each input. Once the sensitivity has been calculated for each variable and observation, a global sensitivity can be defined as the sum, over all samples, of the derivatives of the output of the k-th neuron in the output layer with respect to the i-th input variable, divided by the number of samples. If a variable is important, the global sensitivity should be large (>>1).
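The partial-derivative approach can be illustrated on a toy network. The sketch below assumes a single hidden layer with hyperbolic tangent activations and one identity output neuron; the weights, biases, and input samples are hypothetical and unrelated to the fitted models. The global sensitivity is computed as the mean over samples of the derivative of the output with respect to each input.

```python
import math


def mlp_output(x, params):
    """Forward pass: tanh hidden layer, single identity output neuron."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out, hidden


def input_gradient(x, params):
    """Analytic d(output)/d(x_i), using d tanh(z)/dz = 1 - tanh(z)^2."""
    w_hidden, _, w_out, _ = params
    _, hidden = mlp_output(x, params)
    return [
        sum(w_out[j] * (1.0 - hidden[j] ** 2) * w_hidden[j][i]
            for j in range(len(hidden)))
        for i in range(len(x))
    ]


def global_sensitivity(samples, params):
    """Signed mean of the per-sample derivatives for each input variable."""
    totals = [0.0] * len(samples[0])
    for x in samples:
        for i, g in enumerate(input_gradient(x, params)):
            totals[i] += g
    return [t / len(samples) for t in totals]


# Hypothetical network: 3 inputs, 2 hidden neurons; the third input is unused.
params = (
    [[0.5, -0.2, 0.0], [0.3, 0.8, 0.0]],  # hidden weights
    [0.1, -0.1],                          # hidden biases
    [1.0, -0.5],                          # output weights
    0.0,                                  # output bias
)
samples = [[0.2, 0.1, 0.9], [-0.4, 0.3, 0.5], [0.0, -0.6, 0.1]]
sens = global_sensitivity(samples, params)
```

In this toy network the third input has zero weight into every hidden neuron, so its sensitivity is zero; in a spectral model, wavelengths with sensitivities close to zero would be the ones flagged as uninformative.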
All models showed high prediction accuracy. In detail, the best model for the protein content was obtained with 11 neurons in the hidden layer and a hyperbolic tangent activation function for the hidden and output neurons. The very high values of r and R² (0.9976 and 0.9952, respectively) and the low values of RMSE, MAE, and RSE (1.2891, 0.2590, and 5.8128, respectively) for the test set showed an excellent prediction performance. Sensitivity analysis (Figure 5) showed bands with high-intensity peaks in the spectral range at around 1700-1750 nm and at around 1175-1225 nm, followed by several bands between 950 and 1000 nm, 1125 and 1175 nm, 1250 and 1300 nm, and 1560 and 1600 nm, characterized by lower intensities. Indeed, the spectral regions close to 1730 nm and 1200 nm were characterized by S-H first- and second-overtone absorption, whereas in the other regions an N-H first and second overtone was observed. Niemi et al. [33] reported a similar spectral range contribution to the development of an FT-NIR prediction model of protein in North Atlantic seaweeds. Specifically, they found a positive correlation between the major protein bands at around 1050-1350 nm, 1550-1600 nm, and 1700-1750 nm [33].
The best model for the lipid content, built with 24 neurons in the hidden layer, an exponential function for the hidden neurons, and a logistic activation function for the output neurons, presented a very good predictive ability, with values of r and R² equal to 0.9823 and 0.9767, respectively, and low values of the error metrics for the test set (RMSE = 0.4096 and MAE = 0.0834). Although the RSE metric is quite high (RSE = 15.6521), the estimate may still be considered reliable because the value does not exceed the 30% threshold, as indicated by Amoriello et al. [27]. Sensitivity analysis (Figure 5) revealed that the bands of lipids with high-intensity peaks were centered at around 1195-1215 nm for the C-H3 and C-H2 second overtone of C-H stretch and at around 1290-1310 nm for the C-H3 first overtone of C-H stretch. Absorptions at around 1680 nm were contributed by C-H stretch (-CH=CH-) and can be used to quantify unsaturated fatty acids [48,49]. As previously reported by Liu et al. [51], the NIR spectra within the wavelength ranges of 1030-1500 and 1600-1880 nm correspond to the area where fatty acids show dominant absorbance.
A high predictive accuracy was also found for the best model of fiber content, developed with 10 neurons in the hidden layer, an exponential function for the hidden neurons, and an identity function for the output neurons. In fact, the metrics for the test set were 0.9914, 0.9828, 2.3032, 0.7968, and 7.5954 for r, R², RMSE, MAE, and RSE, respectively. Sensitivity analysis highlighted the wavebands that exhibited the best predictive ability for fiber content (Figure 5). The spectral regions that contributed to the model development can be attributed to O-H bending and C-H stretching (characteristic of wavelengths between 1000 nm and 1100 nm), C-H stretching (characteristic of spectral bands between 1200 nm and 1350 nm), and C-H3, C-H2, and C-H stretching (characteristic of wavebands between 1550 nm and 1720 nm), which are mainly found in carbohydrates [52,53].
The good performances of the ANN models for the three algal nutritional parameters highlighted the advantages of using the ANN technique to develop accurate prediction models. Ordinary statistical techniques, such as partial least squares (PLS) regression, are not always able to precisely quantify the complex inter- and intra-relations between input and output variables [54]. Conversely, ANN, inspired by the biological neural networks of the human brain, has numerous advantages. First of all, it can model complex nonlinear relationships between dependent and independent variables, iteratively learning the characteristics of algae by extracting features from a large database (i.e., the spectral data). No prior knowledge of the relationships between the process variables, no constraints on input variables, and no fixed relationships in the data are required [55]. In addition, ANN can also be used for unstable, noisy, imprecise, and incomplete data [56].
The high predictive accuracy of the models may also be due to the inclusion of many algal genotypes, which yielded multi-species models. In fact, according to Gholipoor and Nadali [54], models can be more reliable the greater the number of genotypes, and therefore the greater the variability, provided that the genotypes differ substantially in terms of traits.
Finally, the sensitivity analysis made it possible to highlight the most important wavelengths of the three models. This result can be useful for building less expensive devices for screening or process control applications in the algal industry.
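As a rough illustration of how sensitivity scores could be turned into a short list of candidate wavebands for a simpler device, the sketch below groups consecutive wavelengths whose score exceeds a cutoff. The wavelengths, scores, function name, and the 50% cutoff are hypothetical values chosen for the example; they are not taken from the study.

```python
def sensitive_regions(wavelengths, scores, fraction=0.5):
    """Return (start_nm, end_nm) runs where score >= fraction * max(scores)."""
    cutoff = fraction * max(scores)
    regions = []
    start = None
    prev = None
    for wl, s in zip(wavelengths, scores):
        if s >= cutoff:
            if start is None:
                start = wl          # open a new region
            prev = wl               # extend the current region
        elif start is not None:
            regions.append((start, prev))  # close the region
            start = None
    if start is not None:           # region still open at the last band
        regions.append((start, prev))
    return regions

# Made-up example: seven bands with illustrative sensitivity scores.
wavelengths = [1000, 1050, 1100, 1150, 1200, 1250, 1300]
scores = [0.9, 0.8, 0.1, 0.05, 0.6, 0.7, 0.2]

regions = sensitive_regions(wavelengths, scores)
print(regions)  # [(1000, 1050), (1200, 1250)]
```

A relative cutoff is only one possible selection rule; any threshold used in practice would need to be checked against the prediction accuracy obtained with the reduced set of bands.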

Conclusions
The nutritional profile of macro- and microalgae species can vary a great deal, and their biochemical properties are strongly influenced by algal genotype, growth and environmental conditions, and nutrient availability. A fast, easy, and non-destructive assessment of the qualitative and quantitative characteristics of algae is increasingly requested by the food industry. The overall results demonstrated that an SWIR hyperspectral imaging device combined with machine learning techniques was able to successfully predict the algal composition. In fact, the multi-species models for proteins, lipids, and fibers, developed with the artificial neural network, showed excellent and robust predictive performances. A goal of this study was to reduce species-specific influences by using the SWIR spectra of seventeen algal species from different phylogenetic divisions. Therefore, the developed models can be applied to any species with a high confidence level. Moreover, the sensitivity analysis enabled the identification of the most informative spectral wavelengths and of those that were redundant or irrelevant. Although our results are very promising, the goodness of the models needs to be further tested on unknown samples before their application on a routine basis in the food industry.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods13142277/s1, Figure S1: Mean raw reflectance spectra between 935 and 1720 nm wavelength of each algae sample; Table S1: Neural network architectures and correlation coefficients for the developed ANN models.

Figure 2. Structure of multilayer perceptron artificial neural network.

Figure 3. Mean first-derivative reflectance spectra between wavelengths of 935 and 1720 nm for 41 algae samples. Samples were clustered in relation to the phylum in five subfigures (A-E). The species of the samples identified by the numbers are reported in Table 1.

Figure 4. Predicted vs. experimental values of the protein, lipid, and fiber contents using the optimal ANN topologies and first-derivative transformed SWIR spectra. The coefficients of determination (R2) for the training, test, and validation sets are reported.

Figure 5. Results of the sensitivity analysis for each ANN model. The most sensitive spectral regions are highlighted in red.

Table 1. Samples of algae divided by classes, species, and origin.

Table 2. Nutrient composition (protein, lipid, and total fiber) of algal samples. Different letters (a-t) in the same column indicate significant differences (p < 0.05).

Table 3. Neural network architectures; regression metrics for the highest training, test, and validation set predictions; goodness of fit; and residual analysis for the developed ANN models.