At Databricks, we have partnered with the team at Amazon Web Services (AWS) to provide a seamless integration with the AWS Glue Data Catalog. Databricks can easily use Glue as its metastore, even across multiple workspaces. YipitData, a longtime Databricks customer, has taken full advantage of this feature and stores all of its metadata in AWS Glue. Databricks' integration with Glue lets YipitData seamlessly work with all the data catalogued in that metastore.
YipitData is a data company that specializes in sourcing and analyzing alternative data to answer key questions for fundamental investors. YipitData relies on the scale and processing power of the Databricks Unified Data Analytics Platform for its competitive advantage: it can incorporate a far greater variety of data, enriched and analyzed in more ways, than competitors in its space. Using the AWS Glue Data Catalog as its metastore has been instrumental to YipitData's continued growth and success.
The key benefits YipitData gets from using AWS Glue with Databricks are:
- All of their metadata resides in one data catalog that is easily accessible across their data lake. Keeping multiple metastores synchronized had been a difficult challenge, and Glue removes this burden.
- They can quickly and seamlessly integrate other tools in their existing stack against the same metastore. For example, they often run quick queries with Amazon Athena. Data processed by ETL jobs in Databricks is readily accessible to any tool in the AWS stack, including Amazon CloudWatch for monitoring.
- AWS Glue's APIs are ideal for mass sorting and filtering. Understanding table expiry across tens of thousands of tables is core to YipitData's business; with Glue and Databricks working together, a task that used to take 8 hours now finishes in under 5 minutes.
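The expiry check above can be sketched with the Glue `GetTables` API. This is a minimal illustration, not YipitData's implementation: it assumes "expired" means a table whose `UpdateTime` (falling back to `CreateTime`) is older than a chosen age, and the helper names `iter_glue_tables` and `find_stale_tables` are hypothetical. The boto3 paginator handles Glue's paged responses; the filtering and sorting logic is kept separate so it can run without AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, List, Optional


def iter_glue_tables(database_name: str) -> Iterator[dict]:
    """Yield every table definition in a Glue database (hypothetical helper)."""
    import boto3  # imported lazily so the pure filtering logic has no AWS dependency

    glue = boto3.client("glue")
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        yield from page["TableList"]


def find_stale_tables(
    tables: Iterable[dict],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return names of tables not updated within max_age, oldest first.

    Assumption: a table "expires" when its last update (or creation, if it
    was never updated) is older than max_age. Timestamps must be timezone-aware,
    which is how boto3 returns them.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    stale = []
    for table in tables:
        last_change = table.get("UpdateTime") or table.get("CreateTime")
        if last_change is None:
            continue  # no timestamp to judge expiry by; skip the table
        if last_change < cutoff:
            stale.append((last_change, table["Name"]))
    stale.sort()  # oldest tables first
    return [name for _, name in stale]
```

Against a real catalog, a call might look like `find_stale_tables(iter_glue_tables("my_database"), timedelta(days=30))`, where `my_database` is a placeholder. Because the table listing is a generator, tens of thousands of tables stream through the filter one page at a time instead of being loaded all at once.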
Databricks also provides several advantages that help YipitData succeed. Notebooks let analysts share information quickly, breaking down the silos of tribal knowledge that were common in the past. Using AWS Single Sign-On has also been a huge benefit, because the team has not needed to implement a costly, complex third-party solution. And Databricks' ability to scale pays off directly: as Andrew Gross, Staff Engineer at YipitData, puts it, "Databricks allows us to effortlessly trade scale for speed, which was not possible before."
Get Started with Databricks and AWS Glue
You can apply the power of Databricks and AWS Glue to help solve your toughest data problems. Learn more at https://docs.databricks.com/data/metastores/aws-glue-metastore.html
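As a starting point, the sketch below builds a cluster definition that turns on the Glue integration. It assumes the request-body shape of the Databricks Clusters API and the `spark.databricks.hive.metastore.glueCatalog.enabled` Spark setting described in the documentation linked above. The helper name `glue_cluster_spec` and all values passed to it are hypothetical placeholders, and the instance profile is assumed to already grant the cluster access to Glue.

```python
import json


def glue_cluster_spec(
    cluster_name: str,
    instance_profile_arn: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int = 2,
) -> dict:
    """Build a hypothetical cluster definition that uses Glue as the metastore."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Point the cluster's Hive metastore at the AWS Glue Data Catalog.
        "spark_conf": {
            "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
        },
        # The instance profile must carry IAM permissions for Glue (assumed to exist).
        "aws_attributes": {
            "instance_profile_arn": instance_profile_arn,
        },
    }


if __name__ == "__main__":
    spec = glue_cluster_spec(
        cluster_name="glue-metastore-demo",
        instance_profile_arn="arn:aws:iam::<account-id>:instance-profile/<profile-name>",
        spark_version="<spark-version>",
        node_type_id="<node-type>",
    )
    print(json.dumps(spec, indent=2))
```

The Glue setting applies when the cluster starts, which is why it belongs in the cluster definition rather than being changed from inside a running notebook.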
Additional Resources
Using AWS Glue Data Catalog as the Metastore for Databricks
AWS Data Lake Delta Transformation Using AWS Glue