tutorial

End-to-end Machine Learning using Kubeflow

Authors:
Johnu George (Nutanix, IN), Amit Saha (Cisco Systems, IN)

CODS-COMAD '22: Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), January 2022, Pages 336–338. https://doi.org/10.1145/3493700.3493768

Published: 08 January 2022

ABSTRACT

Data scientists are usually adept at deriving valuable insights from data by applying appropriate machine learning algorithms. However, they are usually not skilled in developing or operating production-level software, which is the domain of ML Operators. To move from initial experiments to production-grade systems, the code needs to run at scale on large, realistic data sets, and it must run both on on-premises equipment and on public clouds. Additionally, the entire process needs to be part of a Software Development Lifecycle (SDLC) that accounts for some flavour of continuous integration/continuous delivery (CI/CD).

In this tutorial, attendees will learn about the components of an end-to-end ML system and will get hands-on experience with model training, hyperparameter tuning, and model deployment. The tutorial is based on Kubeflow, a widely used open-source (Apache License 2.0) machine learning toolkit for Kubernetes. The related code and examples can be accessed from a public GitHub repository.

Index Terms

End-to-end Machine Learning using Kubeflow
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
2. Software and its engineering

Index terms have been assigned to the content through auto-classification.

Published in
CODS-COMAD '22: Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)
January 2022
357 pages
ISBN: 9781450385824
DOI: 10.1145/3493700
Editors: Gargi Dasgupta, Yogesh Simmhan, Balaji Vasan Srinivasan, Sourav Bhowmick, Amith Singhee, Maya Ramanath, Nipun Batra, Abhinandan SP
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Kubeflow
ML Operations
MLOps
Machine Learning
Qualifiers
- tutorial
- Research
- Refereed limited
Article Metrics
- Total Citations: 5
- Total Downloads: 435
- Downloads (Last 12 months): 143
- Downloads (Last 6 weeks): 21