Mahdi Askari - Field Engineering A/NZ, Databricks
Title: Making Apache Spark Better with Delta Lake
Abstract: Apache Spark™ is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality, reliable data. This webinar covers the use of Delta Lake to enhance data reliability for Spark environments.
Join us for a fun data science event focusing on two open-source data science technologies: MLflow and Koalas. While there will be slides, this will be a demo-heavy meetup!

MLflow: Managing the Machine Learning Lifecycle
In this session, we will discuss the machine learning lifecycle and the challenges associated with it. The fundamental problem is that data and ML - whether the people or the technology - are often siloed from each other. In these silos, it becomes next to impossible for data practitioners to standardize their ML lifecycle, from preparing the data and building the model to deploying the model. With MLflow, you and your teams can break down these walls and ensure that you can build, reproduce, and repeat your ML pipelines.

Koalas: Unifying Spark and pandas API
Pandas is very popular for data manipulation and analysis in Python. It is deeply integrated within the Python data science ecosystem (think sklearn, numpy, matplotlib, etc.). It can easily handle many situations, but it cannot easily scale beyond a single node. With Koalas, you get the distributed power of Apache Spark using the familiar (and powerful) pandas API. This allows data scientists to seamlessly transition from small data to large data.

Agenda:
6:00pm-6:15pm: Welcome
6:15pm-7:00pm: MLflow: Managing the Machine Learning Lifecycle
7:00pm-7:45pm: Koalas: Unifying Spark and pandas API
7:45pm-8:00pm: Q&A and Wrap up

Speakers:
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.
Please join Attunix, Microsoft, and Databricks for this half-day immersive experience into Databricks. We’ll cover best practices for enterprises to use powerful open source technologies to simplify and scale your ML efforts. We’ll discuss how to leverage Apache Spark™, the de facto data processing and analytics engine in enterprises today, for data preparation as it unifies data at massive scale across various sources. The event will run from 8:30AM - 12:30PM.
Join the Databricks team for a hands-on morning session dedicated to Delta Lake. During this event, you will:
Gain an understanding of the Delta Lake open source project
Learn how to build highly scalable and reliable data pipelines using Delta Lake
See Delta Lake in action with a demo and hands-on code walkthrough
Ask Databricks experts your most challenging data questions
Network and learn from your data engineering and data science peers
Register now!
Every enterprise today wants to accelerate innovation by building AI into its business. However, most companies struggle with preparing large datasets for analytics, managing the proliferation of ML frameworks, and moving models from development to production. In this workshop, we'll cover best practices for enterprises to use powerful open source technologies to simplify and scale your ML efforts. The event will run from 8:30AM - 12:30PM.
Join this hands-on lab to learn how Delta Lake can help you build robust production data pipelines at scale. Delta Lake is an open source storage layer that brings reliability to data lakes. Its reliability features include ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake, such as Azure Data Lake Storage, AWS S3, Hadoop HDFS, or on-premises storage, and is fully compatible with Apache Spark APIs.
Join this half-day workshop to learn how unified analytics can bring data science and engineering together to accelerate your ML efforts. In this workshop, we’ll cover best practices for enterprises to use powerful open source technologies to simplify and scale your ML efforts. We’ll discuss how to leverage Apache Spark™, the de facto data processing and analytics engine in enterprises today, for data preparation as it unifies data at massive scale across various sources.
Join BlueGranite, Microsoft, and Databricks for a 1-day immersive experience into Azure Databricks. This free, all-day session will provide attendees with a strong understanding of the Azure Databricks platform and hands-on experience in a live notebook environment. The event will run from 8:30AM - 3:30PM.
Join the London Women in Machine Learning & Data Science Meet-up group for an evening dedicated to Apache Spark.
6.30 - 7.00 pm - Drinks & Networking
7.00 - 7.30 pm - Machine learning in Finance - Maša Vujović
7.30 - 7.45 pm - Break/Q&A
7.45 - 8.30 pm - Lessons from the field - Databricks - Holly Smith
Register now!
Join us for our live webinar to see how real-world AI use cases are transforming businesses through a unified approach to big data and analytics.