Paper

Authors: Reda Al-Bahrani; Dipendra Jha; Qiao Kang; Sunwoo Lee; Zijiang Yang; Wei-Keng Liao; Ankit Agrawal and Alok Choudhary

Affiliation: Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, U.S.A.

Keyword(s): Synthetic Data, Balancing, Oversampling, Classification, Imbalanced Dataset.

Abstract: Machine learning models trained on imbalanced datasets tend to produce sub-optimal results, because the learning of the minority classes is dominated by the learning of the majority class. Recommendations to overcome this obstacle include oversampling the minority class by synthesizing new instances and using alternative performance measures. We propose SIGRNN, a novel approach to handling dataset imbalance that uses a sequence-to-sequence recurrent neural network (RNN) to synthesize minority class instances. The generative network is first trained on the minority class instances to learn their data distribution, then used to synthesize new minority class instances, which augment the original dataset and balance the minority class. We evaluate the proposed approach on several imbalanced datasets. We train Decision Tree models on the original and augmented datasets and compare their results against the Synthetic Minority Over-sampling TEchnique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC). All results are averaged over multiple runs and compared across four performance metrics. SIGRNN performs well compared to SMOTE and ADASYN, particularly when the minority class is increased by smaller percentages. SIGRNN also outperforms SMOTE-NC on datasets with nominal features.
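The train, synthesize and augment loop described in the abstract can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it swaps the paper's sequence-to-sequence RNN for a much simpler first-order Markov chain over feature tokens, assumes every feature is already a discrete token (for example a nominal value or a binned number), and uses hypothetical names (`fit_chain`, `sample_row`, `augment_minority`). The parameter `pct` stands for the percentage increment applied to the minority class.

```python
import random
from collections import defaultdict

_START = object()  # sentinel "previous token" before the first feature


def fit_chain(rows):
    """Count transitions (position i, token at i-1) -> token at i over minority rows.

    Stand-in for training the generative model; NOT the paper's seq2seq RNN.
    """
    trans = defaultdict(lambda: defaultdict(int))
    for row in rows:
        prev = _START
        for i, tok in enumerate(row):
            trans[(i, prev)][tok] += 1
            prev = tok
    return trans


def sample_row(trans, length, rng):
    """Generate one synthetic row, feature by feature, from the fitted counts."""
    prev, out = _START, []
    for i in range(length):
        # Non-empty for equal-length rows: prev was observed at position i-1.
        counts = trans[(i, prev)]
        toks = list(counts)
        tok = rng.choices(toks, weights=[counts[t] for t in toks])[0]
        out.append(tok)
        prev = tok
    return tuple(out)


def augment_minority(X, y, minority_label, pct, seed=0):
    """Append round(pct% of the minority count) synthetic minority rows to (X, y)."""
    rng = random.Random(seed)
    minority = [tuple(x) for x, lab in zip(X, y) if lab == minority_label]
    n_new = round(len(minority) * pct / 100)
    trans = fit_chain(minority)  # "train" on the minority class only
    synth = [sample_row(trans, len(minority[0]), rng) for _ in range(n_new)]
    X_aug = [tuple(x) for x in X] + synth
    y_aug = list(y) + [minority_label] * n_new
    return X_aug, y_aug


if __name__ == "__main__":
    X = [("a", "x", "p"), ("b", "y", "q"), ("c", "z", "r"), ("c", "z", "r"), ("c", "z", "s")]
    y = [1, 1, 0, 0, 0]
    X_aug, y_aug = augment_minority(X, y, minority_label=1, pct=50)
    print(len(X_aug), y_aug.count(1))  # 6 3
```

Because the chain conditions each feature only on its position and the preceding feature, it captures dependencies between adjacent features but nothing longer-range, whereas an RNN can in principle condition on the whole prefix. A Markov chain can also reproduce training rows verbatim, which a real evaluation (for example, the Decision Tree comparison described above) would need to account for.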

CC BY-NC-ND 4.0

Paper citation in several formats:
Al-Bahrani, R.; Jha, D.; Kang, Q.; Lee, S.; Yang, Z.; Liao, W.; Agrawal, A. and Choudhary, A. (2021). SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-486-2; ISSN 2184-4313, SciTePress, pages 349-356. DOI: 10.5220/0010348103490356

@conference{icpram21,
author={Reda Al{-}Bahrani and Dipendra Jha and Qiao Kang and Sunwoo Lee and Zijiang Yang and Wei{-}Keng Liao and Ankit Agrawal and Alok Choudhary},
title={SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2021},
pages={349-356},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010348103490356},
isbn={978-989-758-486-2},
issn={2184-4313},
}

TY  - CONF
JO  - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI  - SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network
SN  - 978-989-758-486-2
IS  - 2184-4313
AU  - Al-Bahrani, R.
AU  - Jha, D.
AU  - Kang, Q.
AU  - Lee, S.
AU  - Yang, Z.
AU  - Liao, W.
AU  - Agrawal, A.
AU  - Choudhary, A.
PY  - 2021
SP  - 349
EP  - 356
DO  - 10.5220/0010348103490356
PB  - SciTePress
ER  - 