This is a guest co-authored post. We thank Igor Alekseev, partner solution architect at AWS, for his contributions.
Data + AI Summit: Register now to join this free virtual event May 24-28 and learn from the global data community.
Amazon Web Services (AWS) is a Platinum Sponsor of Data + AI Summit 2021, one of the largest events in the industry. Join this event and learn from joint Databricks and AWS customers like Disney+, Capital One, Takeda and Comcast that have successfully leveraged the Databricks Lakehouse Platform for their business, bringing together data, AI and analytics on one common platform.
At Data + AI Summit, Databricks and AWS are center stage in a number of keynote talks. Attendees will have the opportunity to hear a candid discussion from Databricks CEO Ali Ghodsi and AWS Senior Vice President Matt Garman. Core AWS enterprise customers will also take the keynote stage, including data leaders from Atlassian on Day 1 and McDonald’s on Day 2.
The sessions below are a guide for everyone interested in Databricks on AWS and span a range of topics — from building recommendation engines to fraud detection to tracking patient interactions. If you have questions about Databricks on AWS or service integrations, visit the AWS booth at Data + AI Summit. In the meantime, you can learn more about how Databricks operates on AWS here.
Dream of getting the low cost of a data lake with the performance of a data warehouse? Welcome to the lakehouse. In this session, learn how to build a lakehouse on AWS using Amazon S3 and Delta Lake. You’ll also explore how companies have created an affordable, high-performance lakehouse to drive all their analytics efforts.
Launched in November 2019, Disney+ has grown to over 100 million users, and the analytics platform behind that growth is Databricks on AWS. Discover how Disney+ rapidly scaled to provide a personalized and seamless experience to its customers. This experience is powered by a robust data platform that ingests, processes and surfaces billions of events per hour using Delta Lake, Databricks and AWS technologies.
Capital One: Credit Card Fraud Detection using ML in Databricks
Illegitimate credit card usage is a serious problem that can significantly impact any organization – especially financial services – creating a need to accurately distinguish fraudulent transactions from legitimate ones. Even with regular fraud prevention measures in place, malicious actors constantly probe for ways to beat the system. To detect fraudulent transactions more dynamically, ML models can be trained on datasets that combine credit card transaction information with card and demographic information about the account owner. Learn how Capital One is building this use case by leveraging Databricks.
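The idea of training a model on transaction features can be sketched as follows. This is a toy illustration under stated assumptions, not Capital One's pipeline: the features, synthetic labels and logistic-regression choice are all hypothetical stand-ins for the real transaction, card and demographic data.

```python
# Hedged sketch: a simple fraud classifier on synthetic transaction features.
# Features and labels are fabricated for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
amount = rng.exponential(scale=100, size=n)   # transaction amount
hour = rng.integers(0, 24, size=n)            # hour of day
foreign = rng.integers(0, 2, size=n)          # card used abroad?

# Synthetic label: large foreign night-time transactions skew fraudulent
risk = 0.01 * amount + 1.5 * foreign + 0.5 * (hour < 6)
is_fraud = (risk + rng.normal(0, 1, size=n) > 2.5).astype(int)

X = np.column_stack([amount, hour, foreign])
X_train, X_test, y_train, y_test = train_test_split(
    X, is_fraud, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

In practice the heavy lifting – joining transaction, card and demographic tables at scale and retraining as fraud patterns shift – is where a platform like Databricks comes in; the model itself can stay simple.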
See firsthand how Comcast RDK is providing the backbone of telemetry to the industry. The RDK team at Comcast analyzes petabytes of data, collected every 15 minutes from 70 million devices (video, broadband and IoT devices) installed in customer homes. SQL Analytics on the Databricks platform allows customers to operate a lakehouse architecture that provides data warehousing performance at data lake economics, for up to 4x better price/performance for SQL workloads than traditional cloud data warehouses.
Discover the results of the “Test and Learn” initiative with SQL Analytics and the Delta Engine in partnership with the Databricks team. A quick demo will introduce the SQL native interface and the challenges with migration, the results of the execution and the journey of productionizing this at scale.
Northwestern Mutual: Northwestern Mutual Journey – Transform BI Space to Cloud
In this session, explore how Northwestern Mutual leverages data-driven decision making to improve both efficiency and effectiveness in its business. As a financial company, data security is as important as the ingestion of data. In addition to fast ingestion and compute, Northwestern Mutual needed a solution to support column-level encryption as well as role-based access to its data lake from many diverse teams. Learn how the data team moved hundreds of ELT jobs from an MSBI (Microsoft Business Intelligence) stack to Databricks and built a lakehouse, resulting in massive time savings.
Business leads, executives, analysts and data scientists rely on up-to-date information to make business decisions, adjust to the market, meet the needs of their customers and run effective supply chain operations. In this session, learn how Asurion used Databricks on AWS, including Delta Lake, Structured Streaming, Auto Loader and SQL Analytics, to improve production data latency from day-minus-one to near real time. Asurion’s technical team will share battle-tested tips and tricks you only get at a certain scale: the company’s production data lake on AWS executes 4,000+ streaming jobs and hosts over 4,000 tables.
Takeda’s Plasma Derived Therapies (PDT) business unit recently embarked on a project to use Spark Streaming on Databricks to empower how they deliver value to their plasma donation centers. As patients come in and interact with the clinics, Takeda stores and tracks all patient interactions in real time and delivers outputs and results based on those interactions. The entire process is integrated with AWS Glue as the metadata provider. Using Spark Streaming will enable Takeda to replace their existing ETL processes – based on Lambda functions, Step Functions and triggered jobs – with a purely stream-driven architecture.
Western Governors University: 10 Things Learned Releasing Databricks Enterprise Wide
Western Governors University (WGU) embarked on rewriting all of its ETL pipelines in Scala/Python, as well as migrating its enterprise data warehouse into Delta Lake – all on the Databricks platform. Starting with 4 users and rapidly growing to over 120 across 8 business units, WGU’s Databricks environment became a unified platform serving individuals of all skill levels, data requirements and internal security requirements.
This session will dive into user management from both an AWS and Databricks perspective, understanding and managing costs, creating custom pipelines for efficient code management and utilizing new Apache Spark snippets that drove massive savings.
The AWS Booth
Visit the AWS booth to see demos and take part in discussions about running Databricks on AWS. There will be three lightning talks in the AWS booth:
- Quickstart: 5/26, 12:30 PM PDT
- Delta Lake: 5/27, 2:00 PM PDT
- PrivateLink, Public Preview: 5/28, 11:00 AM PDT
Come take part in these discussions to learn best practices on running Databricks on AWS.
Register now to join this free virtual event and connect with the data community. Learn how companies are successfully building their lakehouse architecture with Databricks on AWS to create a simple, open and collaborative data platform.