Optimization of information gain interval on determining artificial ripeness of banana using image data with imbalanced class
Agriculture and Natural Resources -- formerly Kasetsart Journal (Natural Science), Volume 057, Issue 4, July 2023- August 2023, Pages 615-624
ISSN: 2452-316X(0075-5192)
DOI: doi.org/10.34044/j.anres.2023.57.4.06
Candra Dewi a,*, Endang Arisoesilaningsih b, Wayan Firdaus Mahmudy c, Solimun d
a Biology Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia. Informatics Department, Faculty of Computer Science, Universitas Brawijaya, Malang 64145, Indonesia
b Biology Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia
c Informatics Department, Faculty of Computer Science, Universitas Brawijaya, Malang 64145, Indonesia
d Statistics Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia
*Corresponding author, e-mail: dewi_candra@ub.ac.id
Importance of the work: The optimal number of information gain (IG) intervals for continuous data was determined automatically.
Objectives: To optimize the IG interval to obtain optimal features for identifying artificial banana ripeness with imbalanced class data.
Materials & Methods: Tests were conducted on six Indonesian banana cultivars, with 11,593 images in total. A total of 78 features were extracted using morphological descriptors, convex hull, local binary pattern and gray level co-occurrence matrix. IG was optimized using the Sturges rule, the Scott rule and K-means clustering. Oversampling with the synthetic minority oversampling technique (SMOTE) was applied to handle the imbalanced data.
Results: Identification using extreme learning machine (ELM) classification on the imbalanced data showed higher accuracy with IG optimized by the Sturges and Scott rules than with IG without optimization. Applying SMOTE also significantly increased accuracy, by 20% to 40% compared with the results on the imbalanced data. Most accuracy results obtained with the selected features for four cultivars (ambon lumut, hijau, kepok and raja) exceeded 80%. The two other banana cultivars, morosebo and susu, had accuracies above 71% and 76%, respectively.
Main finding: Given the complexity of choosing the optimum number of IG bin intervals for data with very similar characteristics, optimization using the Sturges and Scott rules was shown to produce higher accuracy, especially on imbalanced data.
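The interval selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal-width binning, the textbook forms of the Sturges rule (k = ceil(1 + log2 n)) and the Scott rule (bin width h = 3.49 s n^(-1/3)), and the function names are hypothetical. K-means binning, SMOTE and the ELM classifier are not shown.

```python
import numpy as np

def sturges_bins(x):
    """Number of bins by the Sturges rule: ceil(1 + log2(n))."""
    return int(np.ceil(1 + np.log2(len(x))))

def scott_bins(x):
    """Number of bins by the Scott rule, from bin width h = 3.49 * s * n^(-1/3)."""
    x = np.asarray(x, dtype=float)
    h = 3.49 * x.std(ddof=1) * len(x) ** (-1 / 3)
    if h == 0:  # constant feature: a single bin is enough
        return 1
    return max(1, int(np.ceil((x.max() - x.min()) / h)))

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y, n_bins):
    """IG of a continuous feature x for labels y after equal-width binning (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Use interior edges only, so bin ids run from 0 to n_bins - 1.
    bin_ids = np.digitize(x, edges[1:-1])
    conditional = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional

# Illustrative data only: one synthetic feature that separates two classes.
rng = np.random.default_rng(0)
feature = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 50)])
labels = np.array([0] * 200 + [1] * 50)  # imbalanced classes
for name, k in [("Sturges", sturges_bins(feature)), ("Scott", scott_bins(feature))]:
    print(name, k, round(information_gain(feature, labels, k), 3))
```

Under these assumptions, features would be ranked by the IG obtained with the rule-derived number of bins, instead of a hand-picked interval count.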
Keywords: Banana artificial ripeness, K-means, Optimized IG, Scott rule, Sturges rule
Copyright 2020 Kasetsart University Research and Development Institute