Learning Curves Prediction for a Transformers-Based Model
DOI: 10.28991/ESJ-2023-07-05-03
References
Peres, F., & Castelli, M. (2021). Combinatorial Optimization Problems and Metaheuristics: Review, Challenges, Design, and Development. Applied Sciences, 11(14), 6449. doi:10.3390/app11146449.
Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, 295–316. doi:10.1016/j.neucom.2020.07.061.
Kalayeh, H. M., & Landgrebe, D. A. (1983). Predicting the Required Number of Training Samples. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(6), 664–667. doi:10.1109/TPAMI.1983.4767459.
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H., Kianinejad, H., Patwary, M. M. A., Yang, Y., & Zhou, Y. (2017). Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409. doi:10.48550/arxiv.1712.00409.
Beleites, C., Neugebauer, U., Bocklitz, T., Krafft, C., & Popp, J. (2013). Sample size planning for classification models. Analytica Chimica Acta, 760, 25–33. doi:10.1016/j.aca.2012.11.007.
Dobbin, K. K., & Simon, R. M. (2007). Sample size planning for developing classifiers using high-dimensional DNA microarray data. Biostatistics, 8(1), 101–117. doi:10.1093/biostatistics/kxj036.
Dobbin, K. K., Zhao, Y., & Simon, R. M. (2008). How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Research, 14(1), 108–114. doi:10.1158/1078-0432.CCR-07-0443.
Kier, C., & Aach, T. (2006). Predicting the benefit of sample size extension in multiclass k-NN classification. 18th International Conference on Pattern Recognition (ICPR’06). doi:10.1109/icpr.2006.942.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 1–11, Long Beach, California, United States.
Viering, T., & Loog, M. (2023). The Shape of Learning Curves: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6), 7799–7819. doi:10.1109/tpami.2022.3220744.
Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., & He, Q. (2021). A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109(1), 43–76. doi:10.1109/jproc.2020.3004555.
Frey, L. J., & Fisher, D. H. (1999). Modeling decision tree performance with the power law. Seventh International Workshop on Artificial Intelligence and Statistics. PMLR, 3-6 January, 1999, Fort Lauderdale, United States.
Hess, K. R., & Wei, C. (2010). Learning Curves in Classification with Microarray Data. Seminars in Oncology, 37(1), 65–68. doi:10.1053/j.seminoncol.2009.12.002.
Brumen, B., Rozman, I., Heričko, M., Černezel, A., & Hölbl, M. (2014). Best-fit learning curve model for the C4.5 algorithm. Informatica, 25(3), 385–399. doi:10.15388/Informatica.2014.19.
Singh, S. (2005). Modeling performance of different classification methods: deviation from the power law. Project Report, Department of Computer Science, Vanderbilt University, Nashville, United States.
Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., & Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(1), 1–10. doi:10.1186/1472-6947-12-8.
Last, M. (2007). Predicting and Optimizing Classifier Utility with the Power Law. Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), Nebraska, United States. doi:10.1109/icdmw.2007.31.
Cortes, C., Jackel, L. D., Solla, S., Vapnik, V., & Denker, J. (1993). Learning curves: Asymptotic values and rate of convergence. Advances in Neural Information Processing Systems, 6, Denver, Colorado, United States.
Kolachina, P., Cancedda, N., Dymetman, M., & Venkatapathy, S. (2012). Prediction of learning curves in machine translation. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 8–14 July 2012, Jeju Island, South Korea.
Leite, R., & Brazdil, P. (2004). Improving Progressive Sampling via Meta-learning on Learning Curves. Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science, 3201, Springer, Berlin, Germany. doi:10.1007/978-3-540-30115-8_25.
Hoiem, D., Gupta, T., Li, Z., & Shlapentokh-Rothman, M. (2021). Learning curves for analysis of deep networks. International Conference on Machine Learning (ICML), 18–24 July 2021, Virtual Event.
Mukherjee, S., Tamayo, P., Rogers, S., Rifkin, R., Engle, A., Campbell, C., Golub, T. R., & Mesirov, J. P. (2003). Estimating Dataset Size Requirements for Classifying DNA Microarray Data. Journal of Computational Biology, 10(2), 119–142. doi:10.1089/106652703321825928.
Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., & Aroyo, L. M. (2021). “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. doi:10.1145/3411764.3445518.
Castelli, M., Pinto, D. C., Shuqair, S., Montali, D., & Vanneschi, L. (2022). The Benefits of Automated Machine Learning in Hospitality. Emerging Science Journal, 6(6), 1237-1254. doi:10.28991/ESJ-2022-06-06-02.
ICDAR. (2019). Overview - ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction. Robust Reading Competition. Available online: https://rrc.cvc.uab.es/?ch=13 (accessed on April 2023).
Cruz, F., & Castelli, M. (2022). Dataset of personal invoices and receipts including annotation of relevant fields. 16 October 2022, Version v1. doi:10.5281/ZENODO.7213544.
Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3394486.3403172.
Dai, A. M., & Le, Q. V. (2015). Semi-supervised sequence learning. Advances in Neural Information Processing Systems, 28, 1–9.
Nakayama, H. (2018). Chakki-works/seqeval: A Python framework for sequence labeling evaluation (named-entity recognition, pos tagging, etc...). GitHub, San Francisco, United States. Available online: https://github.com/chakki-works/seqeval (accessed on July 2023).
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., … Vázquez-Baeza, Y. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3), 261–272. doi:10.1038/s41592-019-0686-2.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
Copyright (c) 2023 Francisco Cruz, Mauro Castelli