ABSTRACT
Data scientists are usually adept at deriving valuable insights from data by applying appropriate machine learning algorithms. However, they are usually not skilled in developing or operating production-level software, which is the domain of ML operators. To move from initial experiments to production-grade systems, the code needs to run at scale on large, realistic data sets, and to run both on on-premise equipment and on public clouds. Additionally, the entire process needs to be part of a Software Development Lifecycle (SDLC) that accounts for some flavour of continuous integration/continuous delivery (CI/CD).
In this tutorial, attendees will learn about the components of an end-to-end ML system and will get hands-on experience with model training, hyperparameter tuning, and model deployment. The tutorial is based on Kubeflow, a widely used open-source (Apache License 2.0) machine learning toolkit for Kubernetes. The related code and examples can be accessed from a public GitHub repository.
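The three stages named above (training, hyperparameter tuning, deployment) can be illustrated with a minimal, library-free sketch. It does not use Kubeflow or Katib and is not taken from the tutorial's repository. The function names `train`, `tune`, and `deploy`, the toy data set, and the random-search strategy are all assumptions chosen only to show how the stages hand results to one another.

```python
import random


def train(learning_rate, steps=200):
    """Fit y = w * x to toy data with gradient descent; return (w, loss)."""
    # Hypothetical toy data set generated from the true weight 3.0.
    data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, loss


def tune(trials=20, seed=0):
    """Random search over the learning rate; keep the lowest-loss trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        # Sample log-uniformly between 1e-4 and 1e-1.
        lr = 10 ** rng.uniform(-4, -1)
        w, loss = train(lr)
        if best is None or loss < best["loss"]:
            best = {"learning_rate": lr, "w": w, "loss": loss}
    return best


def deploy(w):
    """Stand-in for model serving: wrap the trained weight in a predictor."""
    def predict(x):
        return w * x
    return predict


best = tune()
predict = deploy(best["w"])
print(best["learning_rate"], best["loss"], predict(2.0))
```

In a production setting each of these functions would typically run as a separate containerized step, which is the kind of orchestration the tutorial delegates to Kubeflow on Kubernetes.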
Index Terms
- End-to-end Machine Learning using Kubeflow