Paper: Improving NLP Model Performance on Small Educational Data Sets Using Self-Augmentation

Authors: Keith Cochran 1,2; Clayton Cohn 1,2 and Peter Hastings 1,2

Affiliations: 1 DePaul University, Chicago IL 60604, U.S.A.; 2 Vanderbilt University, Nashville TN 37240, U.S.A.

Keyword(s): Educational Texts, Natural Language Processing, BERT, Data Augmentation, Text Augmentation, Imbalanced Data Sets.

Abstract: Computer-supported education studies can perform two important roles. They can allow researchers to gather important data about student learning processes, and they can help students learn more efficiently and effectively by providing automatic immediate feedback on what the students have done so far. The evaluation of student work required for both of these roles can be relatively easy in domains like math, where there are clear right answers. When text is involved, however, automated evaluations become more difficult. Natural Language Processing (NLP) can provide quick evaluations of student texts. However, traditional neural network approaches require a large amount of data to train models with enough accuracy to be useful in analyzing student responses. Typically, educational studies collect data, but often only in small amounts and with a narrow focus on a particular topic. BERT-based neural network models have revolutionized NLP because they are pre-trained on very large corpora, developing a robust, contextualized understanding of the language. They can then be “fine-tuned” on a much smaller set of data for a particular task. However, these models still need a certain base level of training data to be reasonably accurate, and that base level can exceed what educational applications provide, which might be only a few dozen examples. In other areas of artificial intelligence, such as computer vision, model performance on small data sets has been improved by “data augmentation”: adding scaled and rotated versions of the original images to the training set. This has been attempted on textual data; however, augmenting text is much more difficult than simply scaling or rotating images. The newly generated sentences may not be semantically similar to the original sentence, resulting in an improperly trained model. In this paper, we examine a self-augmentation method that is straightforward and yields substantial performance improvements with different BERT-based models in two different languages and on two different tasks that have small data sets. We also identify the limitations of the self-augmentation procedure.
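The abstract describes fine-tuning BERT-based models on small educational data sets and enlarging the training data with additional, automatically labeled sentences. As an illustration only, a minimal self-training-style sketch of that idea using the Hugging Face transformers and datasets libraries might look like the following; the checkpoint name, the fine_tune and self_augment helpers, the confidence threshold, and the source of the extra (e.g., paraphrased) sentences are assumptions, not the procedure reported in the paper.

```python
# Hypothetical sketch of self-augmentation for a small text-classification data set.
# This is a generic self-training loop, not the paper's exact method.
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

def fine_tune(texts, labels, num_labels):
    """Fine-tune a fresh BERT classifier on the (possibly augmented) training set."""
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)
    ds = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)
    args = TrainingArguments(output_dir="out", num_train_epochs=3,
                             per_device_train_batch_size=8, logging_steps=10)
    Trainer(model=model, args=args, train_dataset=ds).train()
    return model

def self_augment(train_texts, train_labels, unlabeled_texts, num_labels, threshold=0.9):
    """Label extra (e.g., paraphrased) sentences with the current model, keep only
    high-confidence predictions, and retrain on the enlarged training set."""
    model = fine_tune(train_texts, train_labels, num_labels)
    model.eval()
    enc = tokenizer(unlabeled_texts, truncation=True, padding=True,
                    max_length=128, return_tensors="pt")
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1).cpu().numpy()
    keep = probs.max(axis=1) >= threshold          # confidence filter (assumed value)
    new_texts = [t for t, k in zip(unlabeled_texts, keep) if k]
    new_labels = probs.argmax(axis=1)[keep].tolist()
    # Retrain on the original data plus the self-labeled augmentations.
    return fine_tune(train_texts + new_texts, train_labels + new_labels, num_labels)
```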

License: CC BY-NC-ND 4.0


Paper citation in several formats:
Cochran, K.; Cohn, C. and Hastings, P. (2023). Improving NLP Model Performance on Small Educational Data Sets Using Self-Augmentation. In Proceedings of the 15th International Conference on Computer Supported Education - Volume 1: CSEDU; ISBN 978-989-758-641-5; ISSN 2184-5026, SciTePress, pages 70-78. DOI: 10.5220/0011857200003470

@conference{csedu23,
author={Keith Cochran and Clayton Cohn and Peter Hastings},
title={Improving NLP Model Performance on Small Educational Data Sets Using Self-Augmentation},
booktitle={Proceedings of the 15th International Conference on Computer Supported Education - Volume 1: CSEDU},
year={2023},
pages={70-78},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011857200003470},
isbn={978-989-758-641-5},
issn={2184-5026},
}

TY - CONF

JO - Proceedings of the 15th International Conference on Computer Supported Education - Volume 1: CSEDU
TI - Improving NLP Model Performance on Small Educational Data Sets Using Self-Augmentation
SN - 978-989-758-641-5
IS - 2184-5026
AU - Cochran, K.
AU - Cohn, C.
AU - Hastings, P.
PY - 2023
SP - 70
EP - 78
DO - 10.5220/0011857200003470
PB - SciTePress
ER -