Watch On-Demand

The MLOps Live Webinar Series

Session #11

Handling Large Datasets in Data Preparation & ML Training Using MLOps

In this technical training session, we’ll explore how to use Dask, Kubernetes, and MLRun to scale data preparation and training with maximum performance.

Dask is an open-source Python library for parallel computing. Used together with MLRun, an open-source MLOps orchestration tool, it can run over Kubernetes to handle large-scale datasets.

In this session, we will provide a demonstration of how to use these tools to scale your data prep and ML training with ease.

Attend this session to explore:

An overview of the tools available for large-scale data processing in Python (PySpark, Dask, Vaex, and more), and how they are used with existing ML frameworks
How Dask lets you run the same native Python code at scale, without having to learn other technologies such as Spark
How to run Dask in a distributed and elastic way over Kubernetes to improve resource utilization
How to deploy Dask-based data engineering and ML pipelines with MLRun and Kubeflow, in one click
Further optimizations for handling large-scale data effectively and efficiently

The MLOps Live Webinar Series is a collection of bi-weekly online events during which data science leaders explore the elements of managing and automating machine learning pipelines to bring data science into real business applications. The sessions go beyond theory, with industry leaders sharing challenges and practical solutions.

Presented By

Yaron Haviv

Idan Benaun