TRANSFER LEARNING USING MOBILENET FOR RICE SEED IMAGE CLASSIFICATION



INTRODUCTION
Rice is a staple food for half the world's population, especially in Asia, Africa, and Latin America.
Breeding superior seeds is necessary to produce quality rice. Seeds play an important role in rice growth, as they carry the genetic characteristics needed for successful plant development.
The selection of quality seeds aims to increase rice productivity. Seed breeding can be done by selecting sources based on shape, smell, and color. Seed selection based on visual inspection is feasible when the number of seeds is limited; however, as the number of seeds increases, identifying them becomes increasingly difficult.
In this study, classification was carried out based on the shape of the rice seed. Similar research has been carried out, as shown in Table 1. Research using machine learning classification requires initializing the features to be used in the classification process; the system can conduct classification only with features that are defined first. Apart from machine learning, deep learning has also been used in previous research, including Kiratinaranapuk et al., who used VGG16, VGG19 [8], Xception [9], Inception [10], and InceptionResnetV2, with the best accuracy of up to 95.15% achieved by InceptionResnetV2 [4]. Furthermore, research conducted by Koklu et al. used GoogleNet [12] and ResNet [13]; experimental results show that ResNet produces accuracy values of up to 86.08% [7].
Research using deep learning does not require initializing feature types. The model works in a black-box manner, mapping input to output without requiring the type of feature used to be understood. Generally, deep learning for classification consists of a first stage for feature extraction and a second stage for classification. In this research, we classified rice seeds using a deep learning method.

RESEARCH METHOD
This section explains the research stages carried out to obtain rice seed classification results. Figure 1 shows the proposed system. The dataset comprises 6,400 training images, 1,600 validation images, and 2,000 testing images.
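The split above can be checked with a quick calculation (a sketch assuming a total of 10,000 images, a 20% test hold-out, and 5-fold cross-validation on the remaining images; the 80/20 split ratio is inferred from the counts reported in the text):

```python
# Dataset split arithmetic for the proposed system (sketch; the 10,000
# total is the sum of the counts reported in the text).
total_images = 10_000

# 80% for training + validation, 20% held out for testing
trainval = int(total_images * 0.80)   # 8,000 images
testing = total_images - trainval     # 2,000 images

# 5-fold cross-validation: each fold validates on 1/5 of trainval
folds = 5
validation = trainval // folds        # 1,600 images per fold
training = trainval - validation      # 6,400 images per fold

print(training, validation, testing)  # matches the counts in the text
```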

MobileNet
MobileNet is a lightweight CNN from Google. Its characteristic feature is the depthwise separable convolution, which reduces model size and complexity: the model has fewer parameters, and its complexity involves fewer multiplications and additions (mult-adds) [14]-[16].
A depthwise separable convolution is a depthwise convolution followed by a pointwise convolution:
1. Depthwise convolution is a channel-wise kernel×kernel spatial convolution. If the data has five channels, it will have five kernel×kernel spatial convolutions.
2. Pointwise convolution is a 1×1 convolution that combines the depthwise outputs across channels.
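The reduction in mult-adds can be illustrated with the standard cost formulas for the two convolution types (a sketch; the 3×3 kernel, 32 input channels, 64 output channels, and 112×112 feature map below are hypothetical values, not taken from the paper):

```python
# Cost of a standard convolution vs. a depthwise separable convolution.
# The concrete sizes below are illustrative assumptions.
k = 3        # kernel size (k x k)
m = 32       # input channels
n = 64       # output channels
f = 112      # output feature map is f x f

# Standard convolution: one k x k x m kernel per output channel
standard_mult_adds = k * k * m * n * f * f

# Depthwise separable = depthwise (k x k per channel) + pointwise (1 x 1)
depthwise_mult_adds = k * k * m * f * f
pointwise_mult_adds = m * n * f * f
separable_mult_adds = depthwise_mult_adds + pointwise_mult_adds

# Reduction factor simplifies to 1/n + 1/k^2 (roughly 8-9x for 3x3 kernels)
reduction = separable_mult_adds / standard_mult_adds
print(f"{reduction:.3f}")
```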
The MobileNet architecture can be seen in Table 2. Meanwhile, Figure 2 shows the layers after convolution and depthwise separable convolution.

Transfer learning
The stages of transfer learning are as follows:
1. Obtain pre-trained models. The transfer learning process begins by obtaining a model that has undergone previous training. Pre-trained models generally use ImageNet weights.

RESULT
The experiment began with a training and validation process using 5-fold cross-validation [24]-[26]. The accuracy results of the training and validation process are shown in Figure 5. Performance measurement across the five folds shows that fold-1 produces the best accuracy, as shown in Figure 7. The test results showed only nine identification errors out of 2,000 testing images. The misidentified images were found in the Arborio, Basmati, and Karacadag seed types, with the incorrect predictions falling in the Basmati and Arborio classes. From Figure 8, precision, recall, and F1-score can be calculated as shown in Table 4. Visualization of the test results is shown in Figure 9; the x-axis is the predicted label, while the y-axis is the actual label. Figure 9 shows the results of random testing. The performance measured on the testing data is 99.55% accuracy, 99.55% precision, 99.08% recall, and 99.31% F1-score.
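The reported precision, recall, and F1-score can be computed directly from a confusion matrix; a minimal sketch with macro averaging, using a small hypothetical 3-class matrix rather than the paper's actual counts:

```python
# Macro-averaged precision, recall, and F1 from a confusion matrix.
# Rows = actual class, columns = predicted class (hypothetical counts).
cm = [
    [98, 1, 1],
    [0, 99, 1],
    [2, 0, 98],
]
n_classes = len(cm)

precisions, recalls = [], []
for c in range(n_classes):
    tp = cm[c][c]
    predicted_c = sum(cm[r][c] for r in range(n_classes))  # column sum
    actual_c = sum(cm[c])                                  # row sum
    precisions.append(tp / predicted_c)
    recalls.append(tp / actual_c)

precision = sum(precisions) / n_classes
recall = sum(recalls) / n_classes
f1 = 2 * precision * recall / (precision + recall)
accuracy = sum(cm[c][c] for c in range(n_classes)) / sum(map(sum, cm))

print(f"acc={accuracy:.4f} prec={precision:.4f} rec={recall:.4f} f1={f1:.4f}")
```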

Figure 1. Proposed system

Transfer learning utilizes feature representations from pre-trained models without training a new model from scratch. Pre-trained models are usually trained on large datasets that serve as benchmarks for computer vision. The weights from such a model can be reused in other computer vision tasks. Pre-trained models can perform new functions for image classification or be integrated into the training process of new models. Pre-trained models save training time and lower generalization errors. Transfer learning is suitable for small training datasets. The weights from the pre-trained models are used to initialize the new model [17]-[19].

Figure 2. Batch normalization and rectified linear unit after each convolutional layer

2. Create a base model. This step uses a base model, for example, MobileNet. The base model's old final output layer does not suit the new role, so remove it and add a final output layer that is compatible with the problem.
3. Freeze layers so they don't change during training. Freeze the low-level feature layers in the initial convolutional layers; their weights do not need to be re-initialized, since learning was already carried out in the pre-trained model.
4. Add new trainable layers. The next step is adding new trainable layers, which adapt the old features to the new dataset. The new trainable layers are essential because the pre-trained model comes without a final output layer.
5. Train the new layers on the dataset. The pre-trained model differs from the classification model that will be used: a pre-trained model such as one trained on ImageNet has 1,000 outputs, while the classification model here consists of five classes. So the model must be trained with a new output layer; usually, a new dense layer is added whose units correspond to the number of output classes.
6. Improve the model via fine-tuning. Fine-tuning unfreezes certain parts of the base model, usually the final convolutional layers that produce high-level features, and retrains them with a low learning rate. The low learning rate aims to improve performance and prevent overfitting.
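The steps above can be sketched with tf.keras (an illustrative outline, assuming TensorFlow 2.x; `weights=None` is used here only to avoid downloading the ImageNet weights that would normally be passed as `weights="imagenet"`, and the number of layers unfrozen for fine-tuning is an arbitrary choice):

```python
import tensorflow as tf  # assumption: TensorFlow 2.x available

# Steps 1-2: obtain a pre-trained base model without its old output layer.
# In practice weights="imagenet"; None here avoids a download.
base = tf.keras.applications.MobileNet(
    weights=None, include_top=False, pooling="avg",
    input_shape=(224, 224, 3),
)

# Step 3: freeze the base so its weights do not change during training.
base.trainable = False

# Steps 4-5: add a new trainable dense output layer for the five classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, ...)  # train the new head

# Step 6: fine-tune by unfreezing only the last few layers, with a low
# learning rate (the "-6" cutoff is an illustrative assumption).
base.trainable = True
for layer in base.layers[:-6]:   # keep early feature layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, ...)  # fine-tune
```

Recompiling after changing `trainable` is required for the change to take effect in Keras.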

Figure 4 shows the transfer learning process and the fine-tuning of certain parts of MobileNet.

Figure 5. (a)-(e) Results of training accuracy and data validation in 5-fold cross-validation (fold-1 until fold-5)

Figure 5 shows the success of the training and validation process. The validation results show that there is no overfitting in the data. The experimental results show that fold-1 leads to the best validation accuracy. Apart from accuracy, performance is measured using precision, recall, and F1-score. Figure 6 shows the performance of each measure.

Figure 6. Performance measures of training and validation results

Figure 7. Accuracy of each fold of cross-validation (CV)

For training, because the system uses Colab Pro, learning is relatively quick, around two to three minutes, as shown in Table 4. The fastest training time, for fold-5, is 2 minutes 43 seconds; the longest, for fold-4, is 3 minutes 39 seconds.

Table 4. Time of training data

The next stage is testing the 20% of data. Testing uses the best model obtained in fold-1 during the training and validation process. The test results are shown in Figure 8, which is the confusion matrix of the testing results.

Figure 8. Confusion matrix of testing data

Table 4. Testing result of rice seeds classification