Abstract
An electronic invoice (e-invoice) is a document that records transactions of goods or services and is stored and exchanged electronically. E-invoicing is an emerging practice and a valuable source of information for many areas. Processing these invoices is challenging, however, because the reported information is often incomplete or contains mistakes. Before the invoices can be treated in any meaningful way, the product represented in each file must be determined. This research proposes a conceptual framework that explains how to apply machine learning to extract meaningful information from invoices at different levels of aggregation. Related work in the field is contextualized within this framework. We present a case study based on real data from Electronic Invoice (NF-e) and Electronic Consumer Invoice (NFC-e) documents in Brazil, covering B2B and retail transactions. We compare traditional term frequency models with convolutional sentence classification models. Our experiments show that even though invoice text descriptions are short and contain many errors and typos, simple term frequency models can achieve high baseline results on product code assignment.
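To make the idea of a term frequency baseline for product code assignment concrete, the following is a minimal sketch and not the models evaluated in this chapter. It assumes character trigram counts as the terms and a nearest-centroid rule with cosine similarity as the classifier. The descriptions, the labels `CODE_BEVERAGE` and `CODE_FUEL`, and all function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split a description into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_centroids(samples):
    """Sum term-frequency vectors per product code: {code: Counter}."""
    centroids = defaultdict(Counter)
    for description, code in samples:
        centroids[code].update(char_ngrams(description))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(centroids, description):
    """Return the code whose centroid is most similar to the description."""
    vector = Counter(char_ngrams(description))
    return max(centroids, key=lambda code: cosine(vector, centroids[code]))


# Made-up training data: (description, hypothetical product code).
TRAIN = [
    ("refrigerante cola 2l", "CODE_BEVERAGE"),
    ("refrig guarana lata 350ml", "CODE_BEVERAGE"),
    ("agua mineral 500ml", "CODE_BEVERAGE"),
    ("gasolina comum", "CODE_FUEL"),
    ("gasolina aditivada", "CODE_FUEL"),
    ("oleo diesel s10", "CODE_FUEL"),
]

centroids = train_centroids(TRAIN)
print(predict(centroids, "gazolina comun"))    # misspelled fuel description
print(predict(centroids, "refrigerant cola"))  # truncated beverage description
```

Character n-grams are used in this sketch because a misspelled or truncated description still shares most of its trigrams with the correct spelling; a word-level term frequency model would be an equally valid choice of baseline.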
Acknowledgements
This work has been partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq) under grant number 309545/2021-8. Thanks to Mr. Sergio Neto and other colleagues from the Department of Economy of the Federal District in Brasilia.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Kieckbusch, D.S., Filho, G.P.R., Di Oliveira, V., Weigang, L. (2023). Towards Intelligent Processing of Electronic Invoices: The General Framework and Case Study of Short Text Deep Learning in Brazil. In: Marchiori, M., Domínguez Mayo, F.J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2020, WEBIST 2021. Lecture Notes in Business Information Processing, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-24197-0_5
DOI: https://doi.org/10.1007/978-3-031-24197-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24196-3
Online ISBN: 978-3-031-24197-0
eBook Packages: Computer Science, Computer Science (R0)