DOI: 10.1145/3493700.3493768
tutorial

End-to-end Machine Learning using Kubeflow

Published: 08 January 2022

ABSTRACT

Data scientists are typically adept at deriving valuable insights from data by applying appropriate machine learning algorithms. However, they are usually not skilled in developing or operating production-level software, which is the domain of ML operators. To move from initial experiments to production-grade systems, the code needs to run at scale on large, realistic data sets, on both on-premises equipment and public clouds. Additionally, the entire process needs to be part of a software development lifecycle (SDLC) that accounts for some flavour of continuous integration/continuous delivery (CI/CD).

In this tutorial, attendees will learn about the components of an end-to-end ML system and will get hands-on experience with model training, hyperparameter tuning, and model deployment. The tutorial will be based on Kubeflow, a widely used open-source (Apache License 2.0) machine learning toolkit for Kubernetes. The related code and examples can be accessed from a public GitHub repository.

Published in

CODS-COMAD '22: Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)
January 2022
357 pages

Copyright © 2022 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• tutorial
• Research
• Refereed limited
