On Demand
Handling Large Datasets in Data Preparation & ML Training Using MLOps
In this technical training session, we'll explore how to use Dask, Kubernetes, and MLRun to scale data preparation and ML training efficiently.
Dask is an open-source Python library for parallel computing. Used together with MLRun, an open-source MLOps orchestration tool, on top of Kubernetes, it can handle large-scale datasets.
In this session, we will demonstrate how to use these tools to scale your data prep and ML training with ease.
Attend this session to explore:
- An overview of the tools available for large-scale data processing in Python (PySpark, Dask, Vaex, and more), and how they are used with existing ML frameworks
- What Dask is, and how it lets you run the same native Python code at scale without having to learn other technologies such as Spark
- How to run Dask in a distributed and elastic way over Kubernetes to improve resource utilization
- How to deploy Dask-based data engineering and ML pipelines with MLRun and Kubeflow, in one click
- Further optimizations for handling large-scale data effectively and efficiently
The MLOps Live Webinar Series is a collection of bi-weekly online events during which data science leaders explore the elements of managing and automating machine learning pipelines to bring data science into real business applications. The sessions go beyond theory, with industry leaders sharing challenges and practical solutions.
Presented By
Yaron Haviv,
Co-Founder and CTO, Iguazio
Idan Benaun,
Data Scientist, Iguazio