Block-based compressive sensing in deep learning using AlexNet for vegetable classification

Vegetables can be distinguished according to differences in color, shape, and texture. The deep learning convolutional neural network (CNN) method is a technique that can be used to classify types of vegetables for various applications in agriculture. This study proposes a vegetable classification technique that uses the CNN AlexNet model and applies compressive sensing (CS) to reduce computing time and save storage space. In CS, discrete cosine transform (DCT) is applied for the sparsing process, Gaussian distribution for sampling, and orthogonal matching pursuit (OMP) for reconstruction. Simulation results on 600 images for four types of vegetables showed a maximum test accuracy of 98% for the AlexNet method, while the combined block-based CS using the AlexNet method produced a maximum accuracy of 96.66% with a compression ratio of 2×. Our results indicated that AlexNet CNN architecture and block-based CS in AlexNet can classify vegetable images better than previous methods.


INTRODUCTION
Indonesia is one of the world's top producers of horticultural crop commodities, especially vegetables.Vegetables produced by Indonesian farmers are mostly used to meet daily household needs, but some are industrially processed for various processed products that are even exported abroad.Vegetables are foodstuffs that contain several vitamins and minerals and provide many benefits for the health of the human body.In addition, growing vegetables gives farmers a high chance of increasing their profits in Indonesia.Therefore, there is a need for a digital approach to classify vegetables more easily.
Technology is very influential in everyday life because individuals can carry out activities more effectively and efficiently (Zeng, 2017).Previous research (Chauhan, Ghanshala & Joshi, 2018) used CNN to detect and recognize objects in images by imitating the image recognition system in the human visual cortex.Another study (Hameed, Chai & Rassau, 2020) utilized CNN for object categorization with promising results.CNN (Sudharshan Duth & Jayasimha, 2020).Another study developed by Litvak, Divekar & Rabaev (2022) provided the Urban Planter dataset for classifying plant species consisting of 1,500 images with 15 types of houseplants.This study also tested DCNN different configurations and pre-trained models and web applications to evaluate the architectural model.Deep learning-based real-time vegetable recognition system has been used for agricultural robots.Comparative experiments showed that the recognition system could effectively identify seven different types of vegetables (the average AP can reach 87.89%, and the detection speed is up to 38FPS) (Zheng et al., 2019).DCNN-based vegetable image classification was tested by Ahmed, Mahmud Mamun & Zaman Asif (2021).Comparison tests of several CNN architectures such as VGG16, MobileNet, InceptionV3, and ResNet were carried out using transfer learning scenarios to get the optimum architecture built upon the accuracy and effectiveness of the new image data set.Fruit and vegetable classification using an android application with a deep learning method is easy to access.In Agrawal et al. (2021), the research was split into two parts: the first part explored the model through its training and development, and the second part was developed to obtain a simple model through trimming techniques so that the process was lighter but still maintained accuracy.Previous research also used AlexNet architecture in classifying the types of vegetables, which consisted of five classes of vegetables.The result of the accuracy test in this study was 92.1% (Zhu et al., 2018).The same study was also conducted using three classes of vegetables and the You Only Look Once algorithm and succeeded in predicting with an accuracy of 61.6% (Sachin et al., 2019).
The need for storage space and image data transmission is often an obstacle in the classification process because storage requires sufficient space.In contrast, the image data transmission process requires considerable time and bandwidth.The application of CS on image data can minimize the delivery time of image data and requires less storage space.The research combining CS procedure and deep learning is rare.The novelty of our method is the CS contribution to the system with the deep learning procedure.In this study, the development of research was carried out by applying CS to the CNN, thus the main contributions of this article consist of (1) efficient storage usage, (2) delivery time reduction, (3) The accuracy of testing is tremendous although the data quality is reduced and (4) data bandwidth is reduced before training and testing process obtaining an efficient data transmission.

MATERIALS AND METHODS
The dataset used in this study is a secondary dataset obtained from Kaggle in JPG format as shown in Fig. 1 (Ahmed, Mahmud Mamun & Zaman Asif, 2021).The data consisted of four classes (image data of broccoli, carrot, potato, and radish), and each class was made up of RBG images.The dataset varied and was taken from different perspectives so that the CNN algorithm could recognize objects and learn more about the object under study.
Figure 2 shows the block diagram of the proposed method.The input was an RGB vegetable image during the preprocess stage.The original image was resized to 256 × 256 pixels.The next step in the process was block splitting, which made the image into blocks with a size of 8 × 8.In the CS stage, the sparse features of signals in a transform domain were used to perform non-uniform sampling of signals at a sample rate significantly lower than Nyquist's sampling frequency, allowing the sampling and compression processes to be merged (Ouyang, Hu & Hou, 2019).The sparsing process was carried out on the 8 × 8 block.The next step was to convert the two-dimensional or 8 × 8 signal into a 64 × 1 column vector.
CS consists of two stages: compression and reconstruction.For compression, the signal is acquired with a Gaussian measurement matrix (A) measuring M × 64 so that a matrix of M × 1 size is obtained.While the reconstruction process aims to restore the original signal, OMP is one of the greedy and iterative reconstruction algorithms that performs calculations repeatedly to get optimal results (Usman, 2017).The next stage is transforming the signal from the frequency domain to the spatial domain using IDCT to get the original signal (Irawati et al., 2022).
CNN is a neural network commonly used in image data, and consists of neurons with weight, bias, and activation functions (Doukhan & Carrive, 2017).It is an extension of the multilayer perceptron (MLP).The CNN architecture consists of two main parts, feature learning and classification.In the first stage of feature learning, the convolution stage uses a kernel of a certain size.The number of kernels used depends on the number of features generated from the calculation.After the activation function using a rectifier linear unit (ReLU) and the activation function process is exited, the pooling process is carried out.This process is repeated several times until it is obtained.This feature map is sufficient to continue into the classification section.
In the LSVRC-2010 ImageNet contest, AlexNet was proposed to conduct classification on 1.2 million high-resolution photos.Non-saturating neurons and a GPU implementation of convolution operations made training go faster (Sustika et al., 2018).The suggested neural network system came in first place among participants at the ILSVRC2012.A test error rate of 13.3% was achieved by the networks.AlexNet has been utilized for a variety of tasks since its initial publication, including object identification (Drayer & Brox, 2014), picture segmentation (Long, Shelhamer & Darrell, 2015), and video categorization (Ng et al., 2015).
The block-based CS algorithm in this study is described in two steps: the training and testing process.The training algorithm is described in detail according to the following steps: 1. Read an RGB image and resize to a certain resolution N Â N ð Þobtaining x n; m ð Þ.While the steps for the testing phase are as follows:

Generate random normal distributed
1. Read an RGB image and resize to the certain resolution 2. Generate random normal distributed A M Â L ð Þmatrix for CS acquisition where L ¼ B 2 ; B is the size of segmented image and M ( L.

Apply segmentation of the image into B Â B pixels obtaining x i B Â B
ð Þwhere i is a block number of the block-based image.

Transform x i into DCT domain using 2-D DCT to obtain
5. Reshape 2-D matrix X i into 1-D vector obtaining v i with size B 2 Â 1. 10. Repeat steps 4-9 until all blocks are processed.

Apply CS acquisition
11. Combine all blocks from step 10 to obtain the reconstructed image x n; m ð Þ with size N Â N.
12. Load the trained network and apply testing to the reconstructed image using AlexNet technique and the trained network.Save the testing result.
13. Repeat steps 1-12 for the next testing of the reconstructed image.Calculate the deep learning performance of all testing images according to accuracy, precision, recall and F1-score.
The training and testing process is also illustrated in Fig. 3.
The confusion matrix is one of the parameters used to evaluate the classification method (Sokolova & Lapalme, 2009).At the testing stage, the stored training data will be processed.The test will be presented in the form of a confusion matrix to obtain precision, recall, F1-scores, and accuracy as shown in Eqs.(1)-( 4).The true negative (TN) value is the amount of negative data that is correctly classified by the system.True positive (TP) is the amount of positive data that is correctly classified by the system.False positive (FP) is the number of positive data that is incorrectly classified by the system.False negatives (FN) are the number of negative data that the system incorrectly classifies.
The compression performance measurement is expressed in a compression ratio (CR), which is the ratio between the original signal and the compressed signal, as shown in Eq. ( 5) (Irawati, Suksmono & Edward, 2018).

RESULTS
In this study, the system model designed for dataset testing uses the AlexNet architecture with predetermined input parameters as shown in Table 1.The dataset was divided into three out of 600 images: 70% training data with 420 images, 10% validation data with 60 images, and 20% test data with 120 randomly selected images.The dataset used for the CNN process had been reconstructed during the CS process.The programming platform used in this research was Google Collaboratory, starting from the pre-processing, compressive sensing, and classification stages.
The accuracy and loss performance of the training and validation process using the AlexNet method are shown in Fig. 4, while the combination method of block-based CS on AlexNet with a CR of 2× is shown in Fig. 5. Figure 4 shows that the result of the training accuracy was 98% while the validation accuracy was 96%.The result of the training loss was 14% and validation loss was 15%.In Fig. 5, the results showed that the training accuracy was 97% while the validation accuracy was 86%.Loss training performance was 11% and validation loss was 36%.
In the testing phase, 20% of the images from the dataset were used.Figure 6 shows that 112 images were correctly predicted according to their class.This condition shows that the application of block-based CS on the CNN model using the AlexNet architecture is promising for classifying vegetables such as carrots, broccoli, radishes, and potatoes.The description of the confusion matrix on the model's ability to predict different types of vegetables is shown in Table 2.
In this article, we tested the effect of several CRs on the system performance when using AlexNet only vs the combination of blocked-based CS and AlexNet.There were five CRs: 32, 16, 8, 4, and 2. Table 3 shows the performance parameters based on the F1-score, recall, precision, testing accuracy, and testing loss.Based on these results, we determined that the smaller the CR, the higher the precision, recall, F1, and test accuracy values, and the value of testing loss was directly proportional to CR.There is a trade-off between accuracy and CR.When the CR is getting better or higher, the accuracy will be lower.This is in accordance with the quality of the image which gets worse when the CR is higher, so that the deep learning system will find it more difficult to get training accuracy.However, the lower the accuracy, the higher the CR, which has a good impact on the efficiency of the process.In this case we choose a moderate value between accuracy and CR so that both provide benefits or a win-win solution depending on the performance focus, whether to choose CR or accuracy.The performance analysis of the proposed method compared with a previous study is shown in Table 4.Both proposed methods have higher accuracy performance than other     2020) used 4,200 images, while only 600 images were used in this study.However, even with less data, our testing accuracy was higher.For our next research, we plan to optimize the CNN architecture after CS processing by several procedures, such as early stopping and over-fitting reduction as researcher suggestion from Naushad, Kaur & Ghaderpour (2021).

CONCLUSION
In this study, the CNN architecture AlexNet was used for vegetable image classification.
Comparison tests were carried out limited on the AlexNet system only and AlexNet combined with block-based CS.In the CS system, DCT sparsity technique, Gaussian normal measurement matrix, and OMP reconstruction algorithm were used.Both proposed methods were tested on 600 datasets for four types of vegetables.The maximum testing accuracy results reached 98% for the AlexNet method, while the block-based CS in AlexNet method achieved a maximum accuracy of 96.66% with 2× compression rate.This proved that the two proposed systems provided better accuracy than previous studies.Future studies need to explore the addition of plant species and implementation when using sensors.

Figure 1 Figure 2
Figure 1 Vegetable dataset examples.(A) Broccoli.(B) Carrots.(C) Potatoes.(D) Radishes (Kaggle, Ahmed, Mahmud Mamun & Zaman Asif, 2021).Full-size  DOI: 10.7717/peerj-cs.1551/fig-1 A M Â L ð Þmatrix for CS acquisition where L ¼ B 2 , B is the size of segmented image and M ( L. 3. Apply segmentation of the image into B Â B pixels obtaining x i B Â B ð Þwhere i is a block number of the block-based image.4. Transform x i into DCT domain by 2-D DCT obtaining X i B Â B ð Þ. 5. Reshape 2-D matrix X i into 1-D vector obtaining v i with size B 2 Â 1. 6. Apply CS acquisition Av i obtaining Y i with size M Â 1. Dyah Irawati et al. (2023), PeerJ Comput.Sci., DOI 10.7717/peerj-cs.15515/15 7. Reconstruct Y i by OMP using A obtaining b v i with size B 2 Â 1. 8. Reshape 1-D vector b v i into 2-D matrix b X i with size B Â B. 9. Apply 2-D IDCT to b X i obtaining b x i with size B Â B. 10.Repeat steps 4-9 until all blocks are processed.11.Combine all blocks from step 10 obtaining the reconstructed image x n; m ð Þ with size N Â N: 12. Repeat steps 1-11 until we have 400 reconstructed images for training.These images are ready to go to the deep learning training stage.13.Train the dataset consisting of 400 images using the AlexNet technique in CNN model to obtain the trained network.Save the trained network.
Av i obtaining Y i with size M Â 1. 7. Reconstruct Y i by OMP using A obtaining b v i with size B 2 Â 1. 8. Reshape 1-D vector b v i into 2-D matrix b X i with size B Â B. 9. Apply 2-D IDCT to b X i obtaining b x i with size B Â B.

Figure 4
Figure 4 The results of training and validation on the AlexNet method for (A) accuracy and (B) loss.The blue line indicates the training results, and the orange line represents the testing results.Full-size  DOI: 10.7717/peerj-cs.1551/fig-4

Figure 7
Figure7shows a comparison of the computational time of each CR.Looking at the figure, it can be determined that the larger the CR, the greater the computation time, and the computation time without CS was the longest compared to that of the proposed combination technique.The performance analysis of the proposed method compared with a previous study is shown in Table4.Both proposed methods have higher accuracy performance than other

Figure 5 Figure 6
Figure 5 The results of training and validation on the Block-based CS+AlexNet method for (A) accuracy and (B) loss.The blue line indicates the training results, and the orange line indicates the testing results.Full-size  DOI: 10.7717/peerj-cs.1551/fig-5

Figure 7
Figure 7 Computational time.The x-axis shows the compression ratio (CR) and the y-axis shows the times.Full-size  DOI: 10.7717/peerj-cs.1551/fig-7

Table 1
Predetermined input parameter.

Table 3
Performance parameters of the proposed method.

Table 2
Precision, recall and F1-score values of the model to predict vegetables: carrot, brocolli, radish, potato.