Distributed Classification - A Scalable Approach to Semi Supervised Machine Learning

doi:https://doi.org/10.46354/i3m.2022.emss.022

^aRainer Meindl,
^bSimone Sandler,
^cElisabeth Mayrhuber,
^dOliver Krauss

^a,b,c,d Research Group Advanced Information Systems and Technology, Research and Development Department, University of Applied Sciences Upper Austria

Cite as

Meindl R., Sandler S., Mayrhuber E., and Krauss O. (2022). Distributed Classification - A Scalable Approach to Semi Supervised Machine Learning. Proceedings of the 34th European Modeling & Simulation Symposium (EMSS 2022), 022. DOI: https://doi.org/10.46354/i3m.2022.emss.022

Abstract

Fitting real-world data into a model for classification is a challenging task. Modern approaches to classification are often resource-intensive and may become bottlenecks. This paper presents a microservice architecture that maintains a model of real-world data and adds new information as it becomes available. Updates to the model are handled by separate microservices. The architecture and its associated workflows are demonstrated in a use case that classifies text data into a taxonomy represented by a directed acyclic graph (DAG). The presented architecture removes the classification bottleneck, because multiple data points can be added independently of each other and read access to the model is not restricted. Additional microservices also enable manual intervention to update the model.

Text Classification | Distributed Environment | Microservice Architecture | Semi-Supervised Learning

Volume Details

Volume Title

Proceedings of the 34th European Modeling & Simulation Symposium (EMSS 2022)

Conference Location and Date

Rome, Italy

September 19-21, 2022

Conference ISSN

2724-0029

Volume ISBN

978-88-85741-72-0

Volume Editors

Michael Affenzeller

Upper Austria University of Applied Sciences, Austria

Agostino G. Bruzzone

MITIM-DIME, University of Genoa, Italy

Emilio Jimenez

University of La Rioja, Spain

Francesco Longo

University of Calabria, Italy

Antonella Petrillo

Parthenope University of Naples, Italy
