Databricks was proud to be a Platinum sponsor at re:Invent. The past year has been an exciting one for our partnership with AWS, as we built new integrations and deepened existing ones with so many AWS services. re:Invent was a great opportunity to showcase how our joint customers have benefitted from those integrations and we wanted to share a recap of what was highlighted at the conference!
Session: Building Reliable Data Lakes for Analytics with Delta Lake
In this session, Michael Armbrust, the creator of Delta Lake, walked through the evolution of Delta Lake. He showed how customers using data lakes for analytics often build complex architectures that require many validation steps, and how Delta Lake simplifies those architectures with a three-step process to refine data and make it ready for analytics. Kyle Burke, head of data platform at Kabbage Inc., walked through how Kabbage has been using Delta Lake and some of their key improvement metrics. Kabbage has migrated from an on-premises Hadoop-based architecture to a Spark-based cloud architecture using AWS and Databricks. They expect to see 30-50% cost savings from this migration, but Kyle also pointed out that their new system is much more flexible and provides much more capability. Their new Delta Lake architecture enables them to handle all their streaming data as well as their batch data, and to deliver data for data science and BI reporting from a single source of data, removing data discrepancies.
Announcement: Databricks Achieves Retail Competency
Databricks was awarded the AWS Retail Competency in recognition of our solutions and number of joint customers in Retail. This adds to the list of competencies Databricks already holds in Machine Learning, Data and Analytics, Life Sciences and Public Sector. Learn more about how retail customers combine Databricks and AWS on our retail solutions page.
Delta Lake and Athena, Glue and Redshift
Delta Lake is an open source tool that customers are using to build powerful data lakes with Amazon's S3 service. Databricks includes Managed Delta Lake in our Unified Data Analytics Platform to provide schema enforcement, ACID transactions and a time travel feature that enables you to roll back datasets to any earlier version. Using AWS Glue as a data catalog, Delta Lake tables can be registered for access, and AWS services such as Redshift and Athena can query Glue to identify tables and query Delta Lake for datasets. You can find out more about the integrations with Glue, Athena and Redshift in this blog post: Transform Your AWS Data Lake using Databricks Delta and the AWS Glue Data Catalog Service.
MLflow and SageMaker
Databricks provides built-in support for Python and R, and also provides built-in ML frameworks such as Keras, TensorFlow, and PyTorch. Many customers are using the power of Databricks for creating and building models, and then using SageMaker to put those models into production through our integration built on MLflow. You can find out more about how our customer Brandless is using this integration in this blog post: Using Databricks, MLflow, and Amazon SageMaker at Brandless to Bring Recommendation Systems to Production.
Enterprise Security and Credentials
Databricks enables you to architect a complete data and analytics solution seamlessly integrated with AWS security, roles and other platform elements. These integrations provide enterprise-wide visibility and policies for various teams. Policy violations can be flagged, and departments can be billed with chargebacks. Learn more about IAM credential passthrough in this blog post: Introducing Databricks AWS IAM Credential Passthrough.
Koalas and pandas
Over the past few years pandas has emerged as a key Python framework for data science. To provide scalability, Databricks has developed Koalas that implements the pandas DataFrame API on top of Apache Spark, and enables data scientists to make the transition from a single machine to a distributed environment without needing to learn a new framework. You can learn more in this blog post: Koalas: Easy Transition from pandas to Apache Spark.
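To show what "without needing to learn a new framework" means in practice, here is a small sketch of the same groupby written with pandas, with the Koalas equivalent shown alongside. It assumes `pandas` is installed; the Koalas lines (commented out) additionally require the `koalas` package and a Spark runtime, and the sample data is illustrative.

```python
# Sketch: the pandas DataFrame API, which Koalas reproduces on top of Spark
import pandas as pd

pdf = pd.DataFrame({"city": ["SF", "NY", "SF"],
                    "sales": [10, 20, 30]})
totals = pdf.groupby("city")["sales"].sum()

# The same code scales out on a Spark cluster by swapping the import
# (requires the koalas package and a Spark runtime):
# import databricks.koalas as ks
# kdf = ks.DataFrame({"city": ["SF", "NY", "SF"],
#                     "sales": [10, 20, 30]})
# totals = kdf.groupby("city")["sales"].sum()
```

The point of the design is that the only change is the import: the DataFrame methods, indexing, and groupby semantics follow pandas, while execution is distributed by Spark.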
AWS Data Exchange
On November 13th, AWS announced AWS Data Exchange, which makes it easier for customers to subscribe to third-party datasets to mix with their own data and drive new insights. Databricks is used by both data providers and data subscribers to build and blend datasets at scale. You can learn more in this blog post: Databricks, AWS, and SafeGraph Team Up For Easier Analysis of Consumer Behavior.
To Learn More:
- Sign up now for our post-re:Invent webinar with SafeGraph. http://bit.ly/SAFEGRAPH
- Get six hours of free training using Databricks on AWS: http://bit.ly/TrainingAWS
- Talk to an expert: Contact us to get answers to questions you might have as you start your first project or to learn more about available training.