Detection of protein, starch, oil, and moisture content of corn kernels using one-dimensional convolutional autoencoder and near-infrared spectroscopy

Background. Analysis of the nutritional values and chemical composition of grain products plays an essential role in determining the quality of the products. Near-infrared spectroscopy has attracted the attention of researchers in recent years due to its advantages in the analysis process. However, preprocessing and regression models in near-infrared spectroscopy are usually determined by trial and error. Combining newly popular deep learning algorithms with near-infrared spectroscopy has brought a new perspective to this area.
Methods. This article presents a new method that combines a one-dimensional convolutional autoencoder with near-infrared spectroscopy to analyze the protein, moisture, oil, and starch content of corn kernels. First, a one-dimensional convolutional autoencoder model was created for three different spectra in the corn dataset. Thirty-two latent variables were obtained for each spectrum, which is a low-dimensional spectrum representation. Multiple linear regression models were built for each target using the latent variables of obtained autoencoder models.
Results. R², RMSE, and RMSPE were used to show the performance of the proposed model. The created one-dimensional convolutional autoencoder model achieved a high reconstruction rate with a mean RMSPE value of 1.90% and 2.27% for calibration and prediction sets, respectively. This way, a spectrum with 700 features was converted to only 32 features. The created MLR models which use these features as input were compared to partial least squares regression and principal component regression combined with various preprocessing methods. Experimental results indicate that the proposed method has superior performance, especially in MP5 and MP6 datasets.


INTRODUCTION
Near-infrared spectroscopy (NIRS) has become a widely used method in recent years due to its fast, low-cost, and non-destructive analysis capability (Roggo et al., 2007; Yi et al., 2017; Zhang et al., 2019). Although it is used in many different fields, the main [...] and can extract high-level features using stacked network layers. Due to this feature, it has an increasing use in the field of spectroscopy for noise reduction, feature extraction, classification, and regression (Yang et al., 2022b). Acquarelli et al. (2017), Cui & Fearn (2018), Kim et al. (2023) and Malek, Melgani & Bazi (2018) are some example studies that used CNN for quantitative or qualitative analysis. Chemometric models have generalization problems when used on a new instrument. Calibration transfer between different instruments is a popular application of deep learning on NIRS to address this problem. Yang et al. (2022a) developed a deep learning model with three stacked convolutional layers for calibration transfer. They used five instruments and two datasets (soybean and wheat) to validate model performance. The comparison with the conventional standardization method showed that critical features could be protected during calibration transfer between different instruments. While they obtained comparable RMSE values with the soybean dataset using CNN and PLSR (0.078 and 0.076), in the wheat dataset, CNN outperformed the PLSR method (0.053 and 0.130). Mishra & Passos (2021) conducted a similar study with the tablet dataset and olive dataset. They used two instruments: one for primary model development and the other for calibration transfer. Fine-tuning was performed on fully connected and convolutional layers of the model while protecting convolutional layers. An RMSE value of 3.258, which is lower than that obtained with instrument 1 (3.513), was obtained with instrument 2 using the calibration transfer method. Another deep learning model was developed by Yang et al. (2022b) to reduce the impact of interseasonal variations on spectral analysis. They used Cuiguan pear, Rocha pear, and mango datasets to validate the calibration transfer model and obtained RMSE values that were at least 9.2%, 17.5%, and 11.6% lower, respectively, than those of conventional methods. These studies have presented promising results for addressing device dependency, which is a critical problem in NIRS.
Autoencoders, a special type of deep learning model, aim to obtain a valuable representation of the input data while producing an output that reproduces the given input as closely as possible. Because of this property, autoencoders are classified as unsupervised learning methods. Autoencoders are often used for feature extraction, noise removal, or outlier detection. Stacked autoencoders, sparse autoencoders, convolutional autoencoders, and variational autoencoders are commonly used types (An et al., 2022). Le (2020) proposed a model combining a stacked sparse autoencoder with an affine transform-extreme learning machine to detect the amylose content of rice and the moisture content of corn. He obtained correlation coefficient values in the prediction set of 0.999 for the moisture parameter and 0.927 for the amylose parameter, meaning that this model performed better than partial least squares regression and the extreme learning machine. Mu & Chen (2022) proposed a variational autoencoder-based transfer model to deal with unlabeled spectrum problems in practical NIRS applications. They performed the test study using a dataset they created in a simulation environment. They achieved a high R² value of 0.9988 with their proposed autoencoder model, outperforming the nine methods it was compared with. Another application of the autoencoder was performed by Said, Wahba & Khalil (2022) to analyze the fat content of cow milk and detect water adulteration. They obtained R² values between 0.914 and 0.966 for fat content prediction and between 0.411 and 0.910 for water adulteration detection.
This study combines a one-dimensional convolutional autoencoder (1D-CAE) with NIRS to provide a convenient method for quantitative analysis. First, unlike the other studies, feature extraction was performed using a 1D-CAE model from the spectral data of corn kernels. Then, the obtained features were utilized in MLR modeling to determine the moisture, oil, protein, and starch parameters of corn kernels. The proposed method was tested on three different spectra of corn kernels obtained from three devices. The proposed method was compared with conventional chemometric methods and with results reported in the literature.

Dataset description
In this study, the corn dataset, which is commonly used in the literature, was used to test the validity of the proposed method. The corn dataset contains the spectra of 80 corn kernels measured on three different devices. These devices are called M5 (FOSS NIRSystems 5000), MP5 (FOSS NIRSystems 5000), and MP6 (FOSS NIRSystems 6000). The wavelength range covered in the dataset is 1,100-2,498 nm at 2 nm intervals. In addition, the reference values for the moisture, oil, protein, and starch targets of each corn kernel are included in the dataset. The mean spectra for each device in the corn dataset are given in Fig. 1. The corn dataset can be accessed at https://eigenvector.com/resources/data-sets/.
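As a quick check of the figures above, a wavelength axis running from 1,100 to 2,498 nm in 2 nm steps has 700 points, which matches the 700 spectral features mentioned in the abstract. The short Python sketch below builds that axis; the variable names are illustrative and are not taken from the dataset files.

```python
# Wavelength axis of the corn dataset as described in the text:
# 1,100-2,498 nm sampled every 2 nm (names here are illustrative).
START_NM, STOP_NM, STEP_NM = 1100, 2498, 2

wavelengths = list(range(START_NM, STOP_NM + STEP_NM, STEP_NM))
n_features = len(wavelengths)

print(n_features, wavelengths[0], wavelengths[-1])
```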

Autoencoders
Autoencoders are generative and unsupervised neural network algorithms. In this learning algorithm, the main goal is to obtain output values equal to the input values. An autoencoder framework consists of two main blocks: an encoder and a decoder. The encoder block compresses the input data into a low-dimensional representation called latent variables, which retains the valuable information in the input data. The decoder block takes these latent variables as input and attempts to reconstruct the original data. An autoencoder framework also includes a midpoint, called the bottleneck, between the encoder and the decoder (Zhang, Liu & Jin, 2020a). A simple diagram of an autoencoder is shown in Fig. 2. For a given input $x_i$, $i = 1, 2, \ldots, N$, the latent variables $h_i$ can be obtained as:
$$h_i = \psi(w_i x_i + b) \tag{1}$$
where $w_i$ denotes the coefficients, $b$ denotes the biases, and $\psi(x)$ denotes the activation function of the encoder layer. After the encoding process, the decoding process starts from the obtained $h_i$ using Eq. (2):
$$y_i = \psi(\tilde{w}_i h_i + \tilde{b}) \tag{2}$$
Here, $\tilde{w}_i$ and $\tilde{b}$ denote the coefficients and biases of the decoder layer. In an ideal autoencoder, $x_i$ and $y_i$ are expected to be equal. During the training phase, the network tries to minimize the loss function, $J(\theta)$, by searching for optimal values of the weight and bias parameters.
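The encoder and decoder steps described above can be sketched with a tiny fully connected autoencoder. This is only an illustration under stated assumptions: the weights below are hand-picked and untrained, the layer sizes (4 inputs, 2 latent variables) are hypothetical, and the helper name `dense_tanh` is not from the original work.

```python
import math

def dense_tanh(weights, biases, inputs):
    """One fully connected layer followed by the tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical, untrained parameters: 4 inputs -> 2 latent variables -> 4 outputs.
W_enc = [[0.5, -0.2, 0.1, 0.3],
         [-0.4, 0.6, 0.2, -0.1]]
b_enc = [0.0, 0.1]
W_dec = [[0.7, -0.3],
         [-0.2, 0.8],
         [0.4, 0.1],
         [0.3, -0.5]]
b_dec = [0.0, 0.0, 0.0, 0.0]

x = [0.2, -0.1, 0.4, 0.3]            # toy input vector
h = dense_tanh(W_enc, b_enc, x)      # encoder: latent variables
y = dense_tanh(W_dec, b_dec, h)      # decoder: reconstruction of x

# Mean squared reconstruction error, the quantity training would minimize.
loss = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(len(h), len(y), loss)
```

Because the parameters are not trained, the reconstruction error here is arbitrary; training would adjust the weights and biases to reduce it.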

1D convolutional autoencoders
A convolutional neural network (CNN) is a special form of neural network that uses the convolution operation in its layers. Previous works show that convolutional layers are more successful than fully connected layers in retrieving high-level features (Kiranyaz et al., 2021). Because of this, CNNs have substantially advanced the field of artificial intelligence. Similarly, a convolution operation can be applied to the layers of an autoencoder network. Thus, a convolutional autoencoder can extract high-level features that can be used for classification or regression. CNNs are mainly used for high-dimensional data such as images; however, they can also be applied to low-dimensional data such as signals or time series. The mathematical convolution operation of two discrete signals in one dimension can be defined as follows:
$$y(i) = \sum_{k=-\infty}^{\infty} x(k)\, h(i-k) \tag{4}$$
where $x(i)$, $y(i)$, and $h(i)$ are the input, output, and filter vectors, respectively. For a given vector, $x$, and a 1D filter, $w$, of length $m$, the convolution formula can be reorganized as:
$$y(i) = \sum_{j=0}^{m-1} w(j)\, x(i-j) \tag{5}$$
In the forward propagation step of the 1D CNN, we can generalize formula (5) for each neuron in each CNN layer:
$$x_k^{l} = \psi\left(b_k^{l} + w_k^{l-1} * x_k^{l-1}\right) \tag{6}$$
Here, $b_k^{l}$ is the bias of the $k$th neuron at layer $l$, $x_k^{l-1}$ is the output of layer $l-1$, $w_k^{l-1}$ denotes the filter coefficients of the $k$th neuron at layer $l-1$, $*$ denotes the 1D convolution of Eq. (5), and $\psi(x)$ is the activation function of the current layer. An activation function is used to ensure the nonlinearity of the system. The selection of the activation function is one of the essential phases in creating a network model. Sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU) are the most popular activation functions. Of these, the hyperbolic tangent is preferred when the input and output are constrained to values between −1 and 1 (Patil & Kumar, 2021). The formula of the hyperbolic tangent activation function is given in Eq. (7).
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7}$$
A convolution layer is generally followed by a pooling layer. The pooling layer reduces the number of features without changing the number of channels. Pooling layers do not contain any parameters, so no learning occurs in this layer. Max pooling and average pooling are the most popular pooling algorithms. An example of the 1D max pooling used in this study is given in Fig. 3.
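To make the convolution, activation, and pooling steps concrete, the following Python sketch applies them to a short toy signal. It is an illustration only: the signal, the filter, and the function names `conv1d_valid` and `max_pool1d` are assumptions made for this example and are not the trained filters or the code of the proposed model.

```python
import math

def conv1d_valid(signal, kernel):
    """Discrete 1D convolution (kernel flipped), 'valid' positions only."""
    m = len(kernel)
    flipped = kernel[::-1]
    return [sum(flipped[j] * signal[i + j] for j in range(m))
            for i in range(len(signal) - m + 1)]

def max_pool1d(values, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # toy "spectrum"
kernel = [1.0, 0.0, -1.0]                       # illustrative filter, not a trained one
bias = 0.0

feature_map = conv1d_valid(signal, kernel)               # convolution
activated = [math.tanh(v + bias) for v in feature_map]   # tanh activation
pooled = max_pool1d(activated, pool_size=2)              # 1D max pooling

print(feature_map, pooled)
```

Deep learning libraries typically implement the unflipped (cross-correlation) form of this operation; since the filter coefficients are learned, the distinction does not change what the layer can represent.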
A loss function needs to be utilized to evaluate the learning progress of the network. Mean squared error is the most preferred loss function for regression tasks, and its formula is given in Eq. (8).
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{8}$$
Here, $\hat{y}_i$ is the predicted output, and $y_i$ is the target output of the network, which is also equal to the input, $x_i$. For backward propagation, gradients must be calculated and propagated from the output layer to the input layer using the chain rule.
Various optimization algorithms have been proposed in the literature to update weights and biases, such as Stochastic Gradient Descent (SGD), RMSProp, Adam, and Adadelta. Among these, the Adam optimizer was used in our study. The Adam optimizer is based on adaptive moment estimation and combines Momentum and RMSProp (Kingma & Ba, 2014). To apply the Adam optimizer, the moving averages of the gradients and of the squared gradients, $m_t$ and $v_t$, first need to be calculated using Eqs. (13) and (14).
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \tag{13}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \tag{14}$$
where $g_t$ is the gradient of the loss function at step $t$, and $\beta_1$ and $\beta_2$ are decay rate parameters. Using Eqs. (13) and (14), we can calculate the bias-corrected $m_t$ and $v_t$.
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}} \tag{15}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \tag{16}$$
Using Eq. (17), we can update weights and biases.
$$\theta_{t+1} = \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{17}$$
Here, $\theta$ denotes the weight and bias parameters, $\epsilon$ is a small constant that prevents division by zero, and $\eta$ is the learning rate, another critical hyperparameter affecting the learning speed of the network.
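The Adam update can be illustrated on a single scalar parameter. The sketch below follows the update rule of Kingma & Ba (2014) and minimizes a toy quadratic; the objective, the function name `adam_minimize`, and the learning rate of 0.1 used in the example call are assumptions chosen so that the toy problem converges quickly, and they are not the settings of the proposed model.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Run Adam on a single scalar parameter and return the final value."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g          # moving average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g      # moving average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)             # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)             # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective J(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_final = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0,
                            lr=0.1, steps=500)
print(theta_final)
```

In a network, the same update is applied element-wise to every weight and bias, with one pair of moving averages per parameter.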

Proposed model
In Fig. 4, the proposed 1D-CAE model is shown. This model consists of two convolutional layers, two pooling layers, and two dense (fully connected) layers in the encoder sub-model and three convolutional layers, two upsampling layers, and one dense layer in the decoder sub-model. The main reason for choosing two stacked convolutional layers in our model is that some previous works show that two or three layers are sufficient for CNN-based NIRS applications (Zhang et al., 2020b). The optimal number of filters for each convolutional layer in the encoder and decoder models was determined using random search and is given in Tables 1 and 2. The hyperbolic tangent function was chosen to provide nonlinearity. The filter weights were randomly initialized. The autoencoder model was trained using randomly chosen samples. Because this training is unsupervised, the reference values in the dataset were not used. The backpropagation algorithm was used to update the convolution filter weights. Although 2, 4, 8, 16, and 32 neurons were tried as the number of latent variables, models with fewer than 32 neurons did not achieve satisfactory results, so 32 neurons were selected for our model. The optimization was done using the Adam optimizer (Kingma & Ba, 2014). The selected values for the learning rate, $\beta_1$, and $\beta_2$ were 0.001, 0.9, and 0.999, respectively. After the unsupervised training process, the 32 latent variables for each corn kernel were exported for further analysis.

In the second part of the proposed model, multiple linear regression was employed to establish linear relations between the latent variables and the reference outputs. A separate MLR model was developed for each reference output (moisture, oil, protein, and starch) using the same latent variables. In addition, all the processes mentioned above were performed for the three devices in the corn dataset to confirm that our results are device independent.
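The regression step can be sketched as an ordinary least-squares fit of one target on the latent variables. The example below is a dependency-free illustration, not the implementation used in this work (which relied on scikit-learn): it uses three made-up latent variables per sample instead of 32, a made-up target, and hypothetical helper names.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(X, y):
    """Ordinary least squares with intercept via the normal equations."""
    Xa = [[1.0] + list(row) for row in X]            # prepend intercept column
    p = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)             # [intercept, coef_1, ...]

def predict_mlr(coefs, X):
    """Evaluate the fitted linear model on new rows of latent variables."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in X]

# Made-up latent variables and a made-up target generated as
# y = 10 + 2*z1 - 1*z2 + 0.5*z3, so the fit should recover these coefficients.
latent = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.5], [-0.2, 0.2, 0.1],
          [0.5, 0.3, -0.4], [0.0, -0.3, 0.2], [0.4, 0.1, 0.3]]
target = [10 + 2 * a - 1 * b + 0.5 * c for a, b, c in latent]

coefs = fit_mlr(latent, target)
predictions = predict_mlr(coefs, latent)
print(coefs)
```

In the setting described above, one such model would be fitted per target, all sharing the same matrix of latent variables.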

Dataset processing
To create a reliable model and to make fair comparisons with known methods, the spectral data in the dataset were randomly divided into two sets: one for calibration and one for prediction. This way, the same samples were used for calibration and prediction on every device. Of the 80 samples, 60 were assigned to the calibration set and the remaining 20 to the prediction set. The calibration set was used to train the model, and the prediction set was used to evaluate the model's performance. The dataset was split into only two subsets because, for small datasets, additional splitting can result in a smaller training set that is prone to overfitting (Ashtiani et al., 2021; Féré et al., 2020). Statistics of the split dataset are given in Table 3.
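One way to obtain the same 60/20 split on every device is to draw the sample indices once and reuse them for each spectrum set. The sketch below shows this idea with placeholder data; the seed and all names are assumptions made for illustration, since the original random seed is not given.

```python
import random

N_SAMPLES, N_CALIBRATION = 80, 60
SEED = 0   # arbitrary seed chosen for this sketch; the original seed is not given

indices = list(range(N_SAMPLES))
random.Random(SEED).shuffle(indices)
calibration_idx = sorted(indices[:N_CALIBRATION])
prediction_idx = sorted(indices[N_CALIBRATION:])

def split_rows(rows):
    """Apply the shared index split to any per-sample list (spectra or references)."""
    return ([rows[i] for i in calibration_idx],
            [rows[i] for i in prediction_idx])

# Placeholder data standing in for the M5, MP5, and MP6 spectra.
devices = {name: [f"{name}_sample_{i}" for i in range(N_SAMPLES)]
           for name in ("M5", "MP5", "MP6")}
splits = {name: split_rows(rows) for name, rows in devices.items()}

print(len(calibration_idx), len(prediction_idx))
```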

Hyperparameters
In machine learning, tuning the hyperparameters of a model is an essential step that determines the performance of the model. In this work, the hyperparameters were optimized using random search over a lookup table. This table includes the kernel size, the number of latent variables, and the batch size. The lookup table is given in Table S1. During training, the maximum number of epochs was set to 20. The validation loss was tracked throughout the training of the model. When the validation loss increased for two consecutive epochs, training was stopped to avoid overfitting.
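The stopping rule described above can be written as a small function over a sequence of validation losses. This is a sketch of the rule as stated in the text, with a hypothetical function name and made-up loss values; it is not the training code of the proposed model. Keras also provides an `EarlyStopping` callback with a `patience` argument, whose criterion (no improvement over the best value so far) differs slightly from the rule sketched here.

```python
def stopping_epoch(val_losses, patience=2, max_epochs=20):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has increased, relative to the
    previous epoch, for `patience` consecutive epochs, or at `max_epochs`.
    """
    consecutive_increases = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if epoch > 1 and loss > val_losses[epoch - 2]:
            consecutive_increases += 1
        else:
            consecutive_increases = 0
        if consecutive_increases >= patience:
            return epoch
    return min(len(val_losses), max_epochs)

# Made-up validation losses: a single increase at epoch 5 is tolerated,
# two consecutive increases at epochs 7 and 8 trigger the stop.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.47, 0.50]
stopped_at = stopping_epoch(losses)
print(stopped_at)
```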

Test environment
In this article, the autoencoder and regression models were implemented in Python (version 3.7.13) using Keras (version 2.9.0), which is a high-level neural networks library (Chollet, 2022), and scikit-learn (version 1.0.2), which provides regression models and model evaluation tools (Lemaitre, 2021). All training and testing processes were performed on a computer with an Intel i7 10870H CPU, 16 GB RAM, and an Nvidia RTX 2070 GPU.

Performance evaluation criteria
The coefficient of determination (R²), root mean squared error (RMSE), and root mean squared percentage error (RMSPE) indicators were used to evaluate our proposed model. The R² and RMSE indicators were used for overall model evaluation, while RMSPE was used to determine the performance of the autoencoder model in reconstructing the input spectrum. The formulas of R², RMSE, and RMSPE are given in Eqs. (18), (19) and (20) (Ashtiani, Salarikia & Golzarian, 2017; Chen et al., 2008; Miles, 2005).
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{18}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{19}$$
$$\mathrm{RMSPE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^2} \times 100 \tag{20}$$
Here, $N$ is the sample size, and $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual output, the predicted output, and the mean value of the actual outputs, respectively. As one can understand from Eq. (18), the R² indicator is the proportion of the variation in the dependent variable explained by the independent variables and typically takes values between 0 and 1. RMSE, another indicator often used in regression tasks, is equal to the standard deviation of the residuals. Similarly, RMSPE gives the ratio of the error to the input spectrum in percent. Values closer to 0 are preferable for RMSE and RMSPE. Although most studies include the ratio of performance to deviation (RPD) metric to show model quality, some articles argue that RPD is not different from R² (Minasny & McBratney, 2013). For this reason, we did not find it necessary to include both metrics.
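The three indicators can be computed directly from their standard definitions. The sketch below is a plain-Python illustration with made-up reference and predicted values; the function names are chosen for this example, and RMSPE is taken here as the root mean squared relative error expressed in percent, which assumes nonzero reference values.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmspe(y_true, y_pred):
    """Root mean squared percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * math.sqrt(
        sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [10.0, 10.5, 9.8, 10.2]      # made-up reference values
y_pred = [10.1, 10.4, 9.9, 10.1]      # made-up predictions

print(r2_score(y_true, y_pred), rmse(y_true, y_pred), rmspe(y_true, y_pred))
```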
In the first experiment, the 1D-CAE model was created separately for the M5, MP5, and MP6 datasets. In this stage, the main objective was to obtain a reliable model that reconstructs a spectrum closely matching the input spectrum and to obtain meaningful latent variables. The RMSPE indicator was utilized to show the reconstruction performance of the model, and the obtained results are given in Table 4. In addition, sample input and reconstructed spectra for each dataset are shown in Fig. 5. The 1D-CAE model successfully reconstructed the spectrum and obtained a mean RMSPE value of 1.90% for calibration and 2.27% for prediction.
The most common regression methods used in NIR systems, PLSR and PCR, were used as comparison methods. Another popular regression model, MLR, was not applied directly to the spectra because MLR requires more samples than features, whereas here the number of samples is lower than the number of features. The latent variable and principal component parameters of the PLSR and PCR methods were selected as the optimal value between 1 and 10. RMSE and 5-fold cross-validation were used to find the optimal value for the latent variable and principal component parameters. In addition to the original spectrum, four different preprocessing methods were applied to the spectral data to increase the accuracy of these methods. The proposed method, 1D-CAE+MLR, was also applied to the spectral data. R² and RMSE values were calculated separately for each combination. The block diagram summarizing the whole process is given in Fig. 6.
The M5 dataset was used for the first test, and the obtained results are given in Table 5. A careful examination of these results indicates that the proposed method exhibits satisfactory performance compared to other methods, as evidenced by the minimum R² values of 0.9560 and 0.9012 in the calibration and prediction sets, respectively. Furthermore, the proposed method demonstrated superior performance when predicting oil and starch content, as it yielded higher R² values and lower RMSE values. However, when analyzing the prediction of moisture and protein content, it was found that the PLSR method yielded higher R² values and lower RMSE values.
The MP5 dataset was employed in another experiment, and the results are presented in Table 6. The proposed method demonstrates superior performance, as evidenced by the higher R² values for all targets in the calibration set. Similar to the M5 dataset, the proposed method outperforms conventional methods when predicting oil and starch content, as it yields higher R² values and lower RMSE values. However, when assessing the prediction of moisture and protein content, it is observed that the PLSR method yields higher R² values and lower RMSE values. A notable difference is observed, particularly in the oil parameter, with an increase of 20.9% in the R² metric.
The MP6 dataset was utilized in the final experiment, and the test procedure was applied in the same manner as in previous experiments. The obtained results are presented in Table 7. Although the R² values were lower than those obtained in the M5 and MP5 datasets, the proposed method showed improved performance on all targets in the MP6 dataset, as evidenced by the higher R² values and lower RMSE values. Additionally, when analyzing the prediction of the oil and starch parameters, it was found that conventional methods were unable to establish a viable model, as the R² value was below 0.7.

DISCUSSION
As mentioned before, preprocessing is an inevitable stage of NIRS modeling techniques. According to Tables 5, 6 and 7, the four different preprocessing methods yielded higher scores on different metrics, which supports this view. However, the trial and error inherent in selecting them has led researchers to look for new preprocessing methods. Although some innovative methods have been proposed, they have not been widely used (Helin et al., 2022; Xu et al., 2022). Nevertheless, DL-based approaches give promising results. The combination of 1D-CAE and MLR offers a new approach to this problem. The R² values obtained with the proposed method for each target parameter and dataset were calculated as a percentage and are illustrated in Fig. 7. The RMSE metric was not considered in the evaluation since it provided results consistent with the R² metric.
Upon a comprehensive evaluation of the results in Tables 5, 6 and 7, it was observed that, for the moisture parameter, the proposed method yielded a 3.52% increase in the mean R² metric in the calibration set compared to the highest R² value obtained with conventional method combinations. However, the value obtained in the prediction set was slightly lower, by 0.14%. The R² values were calculated for the combinations generated using the PLSR method, and their mean was compared with the mean R² value obtained with the proposed method. The results indicated that the proposed method yielded a 10.9% and 12.16% improvement in R² values for the calibration and prediction sets, respectively, compared to the PLSR method combinations. The comparison was also made with the PCR method for the prediction of moisture content. The results indicated that the proposed method yielded an improvement of 14.34% and 20.99% in R² values for the calibration and prediction sets, respectively, when compared to PCR combinations.
The evaluation was also performed for the oil parameter by utilizing the results from the three datasets. The proposed method was compared to the conventional method combination with the highest R² value, the mean R² value of PLSR combinations, and the mean R² value of PCR combinations. The results showed that the proposed method yielded an improvement of 5.63% and 19.43% in R² values for the calibration and prediction sets, respectively, when compared to the conventional method with the highest R² value. Additionally, the proposed method showed a 9.70% and 25.57% improvement in R² values for the calibration and prediction sets, respectively, when compared to the mean R² values of the PLSR combinations, and a 22.22% and 49.13% improvement in R² values for the calibration and prediction sets, respectively, when compared to the mean R² values of the PCR combinations.
In predicting the third target, protein content, the proposed method yielded a 1.43% improvement in the calibration set compared to the conventional method combination with the highest R² value. Conversely, a 2.37% decline was noted in the prediction set. Similarly, compared to the mean R² value of the PLSR combinations, the proposed method demonstrated a 2.03% enhancement in the calibration set and a 0.95% decline in the prediction set. The proposed method revealed an 8.33% and 12.25% increase compared to the mean R² value of the PCR combinations in the calibration and prediction set, respectively.
The proposed method for determining the starch content of corn samples was found to be highly efficacious, as evidenced by its significant improvement in performance when compared to PLSR and PCR combinations. Specifically, in the calibration and prediction sets, respectively, the proposed method exhibited an improvement of 1.98% and 7.40% over the conventional method combination with the highest R² value, 3.20% and 12.18% over the mean R² value of PLSR combinations, and 16.78% and 63.16% over the mean R² value of PCR combinations.
For the overall assessment, the proposed method yielded higher R² values, especially when predicting the oil and starch parameters for each dataset. The reference and predicted output for each target and spectrum are given in Table S2. These data are visualized in Fig. 8.
Table 8 compiles studies in the literature that use the corn dataset and compares them with the proposed method. Bian et al. (2016) employed four different PLSR-based methods for estimating the protein parameter utilizing the MP6 dataset. These methods were found to enhance the performance of the traditional PLSR method. Upon comparison with the four methods utilized in that study, it was observed that the proposed 1D-CAE+MLR method demonstrated superior results with higher R² and lower RMSE values. Yuanyuan & Zhibin (2018) proposed four different models based on neural networks and deep learning to estimate four parameters of the corn dataset. The dataset used in that study was not specified. Upon examination of the graph provided in the study, it is inferred that the MP5 dataset was used, and the comparison was made accordingly. It was observed that the proposed method gave a higher R² value in the protein and starch parameters compared to the methods used in that study, while it gave a lower R² value in the moisture and oil parameters. Fatemi, Singh & Kamruzzaman (2022) developed wavelength selection-based models to predict four parameters of corn seeds using the M5 dataset. They identified a specific wavelength range for each parameter. When we compare our results with that study, our method gives a higher R² value for the oil parameter, but the wavelength selection-based models give a higher R² value for the other three parameters. According to these studies, it can be seen that competitive results are obtained with the proposed method.
To confirm the statistical validity of the results obtained with the proposed method, a t-test was performed. Based on the results of the t-test, it was determined that all of the results obtained with the proposed method were significant at the 99.9% confidence level (p-value < 0.001).
Another highlight of this study is that although there are no significant changes in the obtained spectra due to the measurement of the same sample with different instruments, the success of conventional chemometric methods decreases significantly. This shows that the success of chemometric methods is spectrum dependent, as is the case with preprocessing methods.
Deep learning models require a larger quantity of training samples than traditional neural networks to construct an accurate model. Without enough samples, the result is underfitting, where the model is unable to capture the underlying pattern of the data. As the generation of NIR datasets and their corresponding reference values is a laborious process, such datasets often have a limited number of samples, as in the corn dataset. This is a limitation of the proposed method, as it is of other DL models.

CONCLUSIONS
A one-dimensional convolutional autoencoder-based NIR modeling technique is proposed to assess the quality parameters of corn kernels. With 1D-CAE, the need for preprocessing the spectrum, a step common to chemometric methods, is eliminated. The proposed method was tested on three different spectra obtained from different devices in the corn dataset, showing that our results are device independent. The results indicate that our method has superior performance over common preprocessing and chemometric method combinations according to the R² and RMSE metrics, especially for the oil and starch parameters. Our method provides a reliable model that ensures fast and precise analysis in near-infrared spectroscopy. Future investigations should focus on applying the proposed method to calibration transfer.

Notes.
a The results obtained in this study were reported according to the R (correlation coefficient) metric, and these values have been converted to the R² metric to ensure compliance.
b In the study, the used dataset was not specified, and this table was prepared considering that the used dataset was MP5, according to the graph given in the study.