Distributed Classification - A Scalable Approach to Semi Supervised Machine Learning

  • Rainer Meindl,
  • Simone Sandler,
  • Elisabeth Mayrhuber,
  • Oliver Krauss

Research Group Advanced Information Systems and Technology, Research and Development Department, University of Applied Sciences Upper Austria

Cite as
Meindl R., Sandler S., Mayrhuber E., and Krauss O. (2022). Distributed Classification - A Scalable Approach to Semi Supervised Machine Learning. Proceedings of the 34th European Modeling & Simulation Symposium (EMSS 2022), 022. DOI: https://doi.org/10.46354/i3m.2022.emss.022

Abstract

Fitting real-world data into a classification model is a challenging task. Modern approaches to classification are often resource intensive and can become bottlenecks. This paper presents a microservice architecture that maintains a model of real-world data and incorporates new information as it becomes available. Updates to the model are handled by separate microservices. The architecture and its associated workflows are demonstrated in a use case that classifies text data into a taxonomy represented by a directed acyclic graph (DAG). The presented architecture removes the classification bottleneck because multiple data points can be added independently of each other and read access to the model is not restricted. Additional microservices also allow manual intervention to update the model.
