ABSTRACT
Selecting high-quality ground truth data is a critical step in machine learning. Conventionally, human annotators label the data. While this human-centered strategy can provide accurate annotations of task-specific behaviors, it is difficult, costly, and error-prone. One method explored to address these problems is active learning, a model-centered approach that minimizes human involvement. In this work, we conduct an experiment comparing active learning and passive learning strategies for selecting ground truth data in a classification task: detecting the incidence of task-persistent behavior from students' interaction logs. Our findings suggest that active learning tends to be more effective and efficient than passive learning at reaching a given level of performance. However, the overall performance comparison shows that passive selection of ground truth data is as effective as active learning for applications with relatively small sample sizes.
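To make the active versus passive distinction concrete, the following is a minimal sketch and not the setup used in this study. It assumes a synthetic two-class pool, a nearest-centroid classifier, and margin-based uncertainty sampling as the active strategy; all function names, data, and parameters are hypothetical. The passive strategy labels a randomly chosen item each round, while the active strategy labels the item the current model is least sure about.

```python
import random

def make_pool(n, rng):
    """Synthetic 2-D pool of (features, label) pairs; stands in for real logs."""
    pool = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        pool.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return pool

def centroids(labeled):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    model = {}
    for cls in (0, 1):
        pts = [x for x, y in labeled if y == cls]
        model[cls] = (sum(p[0] for p in pts) / len(pts),
                      sum(p[1] for p in pts) / len(pts))
    return model

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def predict(model, x):
    return 0 if sq_dist(x, model[0]) <= sq_dist(x, model[1]) else 1

def margin(model, x):
    """Small margin = the two centroids are almost equally close = uncertain."""
    return abs(sq_dist(x, model[0]) - sq_dist(x, model[1]))

def accuracy(model, test):
    return sum(predict(model, x) == y for x, y in test) / len(test)

def run(strategy, pool, test, seed_size=4, rounds=10, rng=None):
    """Return test accuracy before each labeling round for one strategy."""
    rng = rng or random.Random(0)
    unlabeled = list(pool)
    labeled = []
    # Seed with the first few items of each class so both centroids exist.
    for cls in (0, 1):
        for item in [it for it in unlabeled if it[1] == cls][:seed_size // 2]:
            unlabeled.remove(item)
            labeled.append(item)
    curve = []
    for _ in range(rounds):
        model = centroids(labeled)
        curve.append(accuracy(model, test))
        if strategy == "active":
            # Query the item with the smallest margin (features only, no label).
            pick = min(unlabeled, key=lambda item: margin(model, item[0]))
        else:
            # Passive selection: label a uniformly random item.
            pick = rng.choice(unlabeled)
        unlabeled.remove(pick)
        labeled.append(pick)  # simulates asking the annotator for the label
    return curve

data_rng = random.Random(42)
pool = make_pool(200, data_rng)
test = make_pool(200, data_rng)
active_curve = run("active", pool, test, rng=random.Random(1))
passive_curve = run("passive", pool, test, rng=random.Random(1))
print("active :", [round(a, 2) for a in active_curve])
print("passive:", [round(a, 2) for a in passive_curve])
```

Both runs start from the same seed set, so the two curves begin at the same accuracy and differ only in which items are labeled afterwards. Which strategy comes out ahead depends on the data, the classifier, and the labeling budget, so this sketch should not be read as reproducing the findings above.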
Index Terms
- Exploring Active Learning for Student Behavior Classification