Active Governance Across the Delta Lake with Alation

Alation provides a single interface for users and stewards to deliver active and agile data governance across Databricks Delta Lake and the Databricks SQL Analytics service. Understand how Alation can expand adoption in the data lake while providing safe and responsible data consumption.

Speaker: Raja Perumal


– [Narrator] Thank you for joining us today. Today we will be talking about the Alation Data Catalog and delivering data governance on the Delta Lake. My name is Raja Perumal and I work in our partners practice to make sure that we are delivering customer value using technology partners. We’ll be splitting this discussion up into three main parts. First, we’ll be talking about the Alation Data Catalog, what makes us unique and why we are key to driving a data culture. Then we will be talking about Alation’s approach to data governance. And finally we will wrap up by talking about the unique capabilities of Alation and Databricks and what we can deliver to customers. If all goes right I should be in the chat right now, able to answer your questions live. So feel free to ask questions and I will do my best to answer them. Alation is founded on the principle of creating a data culture for our customers. Our customers are telling us that increasingly they need to make data-driven decisions. And it’s simply not enough to have a data science team or a data analyst team. A data culture needs to exist throughout the business in order to drive effective decision making that affects the bottom line. These days this driver is even more important. The pandemic is accelerating the need to make informed data-driven decisions. So Alation’s mission to empower a curious and rational world is more crucial than ever today. To do this we are working to instill a data culture within our customers, and we believe the key to doing that is three pillars. First is data search and discovery. When our customers have a question they need to answer with data, we allow them to find the best place to answer that question. Second is data literacy. When our customers find the data that they need to answer the question, we provide them the tools to know how to use that data well. And finally, a topic that we’ll be talking about at length in a few minutes is data governance.
We provide the tools for organizations to give authority to their data without hindering the business value they get from data as an asset. The explosion of data and necessary changes in the workforce due to COVID and other factors are making it difficult for our customers to realize the return on value from their data as an asset. And with an evolving set of laws, whether state level like CCPA or regional like GDPR, there are implications for how our customers use that data and what they can do with it. So we need to provide tools to ensure that our customers are driving value from their data. And to do that we have created a data catalog that answers four key questions. How do you find information? Can that information be used? Should it be used? And how should it be used? We answer these four questions with five key capabilities in our data catalog. First, we provide data intelligence. We provide intelligence on the data using our automated and intelligent algorithms. We ensure that right out of the box, we understand usage patterns and understand metadata to guide our customers to make intelligent decisions. Second, we provide a platform for our customers to build a community around data. Our customers can collaborate around how to use data and how to trust data. Third, we provide an accelerated onboarding experience through guided navigation. So when a customer comes to the catalog, regardless of how many data sets they have, they understand which is the right data set for them. Fourth, we provide active data governance. We don’t divorce governance or policies from the data. We put policies right with the data, so customers know how to use the data. And finally, we do this across the enterprise. We provide broad, deep connectivity. It’s not just about connecting to a lot of data sources. It’s about connecting to key data sources like Databricks very deeply, so we understand how people are using the data and what they’re doing with that data.
There are many types of data catalogs out there, and Alation really focuses on building a data catalog as a platform. That means that we’re not just focused on a defensive data governance strategy, meaning one that prevents people from getting access to the data, and we’re not an add-on to a tool that focuses simply on driving adoption of that one tool. We are a platform that allows the community of data users within an enterprise to both get access to the right data and understand when not to use that data. To build the platform, we connect to a series of data sources, key among which are Databricks and your cloud data lake infrastructure, but also your on-prem databases, so you can understand what is happening across your analytical environment. And with this information, we build a platform that provides an active metadata catalog with a behavior analysis engine that tells you what your users are doing with your data across your platform. And once you satisfy that use case, you can layer on solutions like data search and discovery, data privacy, cloud data migrations and, crucially, data governance, which we will now talk about. Data governance is more crucial now than ever because the rate at which data is being created is exploding due to a variety of factors. This is causing folks to be worried about data breaches in their environment. And it is causing concern among the C-suite to make sure that they’re responding to those changes. Traditional approaches to data governance, however, have been failing. These traditional approaches typically focus on defense, on preventing people from getting access to data. They tend to be done in isolation and not with the engagement of business stakeholders. And these approaches are often created separately from the actual consumption of data.
Altogether this creates a solution that does not focus on people, does not focus on the business, and gives the people who need to use data incentives to get around these approaches instead of actually delivering true compliance. So with Alation we take a different focus. We look at each of these pillars as core to delivering active data governance. The first is that it is about the people using the data. It’s about guiding them to use the right data. It’s about guiding them to make sure they’re using data effectively. Supporting that is operationalized policy. We want the data policy that is determined by our customers to actually be in action, actually be in use and be measurable. We want this approach to be collaborative: stewards and governance officers need to work with users, and users need to work with stewards. We want some level of automation, so this is not a manual effort throughout the catalog. And finally we need to react to changes across the business as it grows. We’ll talk more about the product and the process, the actual mechanics of how this works, in a few moments. But around active data governance, this is our lightweight approach to working with the business to make sure that we are getting started with an active data governance process. Now, let’s talk about Databricks. Our approach to active data governance incorporates a variety of personas across the business to ensure that we’re driving adoption of our analytical investments in the Delta Lake. This approach to governance ensures that users can use data easily because it’s well curated and well governed, and can also use the data safely knowing that policy concerns have already been taken into account. Here are a few of the capabilities that we take into consideration when we talk about governance that Alation helps scale and automate. Shifting gears, let’s talk about how this applies specifically to Alation and Databricks.
Working with our joint customers, we have identified three key challenges in using a machine learning platform like Databricks in the cloud that Alation can help address. First, moving to the cloud environment and getting complex projects off the ground can require a lot of time and a lot of resources that quick-moving, data-driven organizations just don’t have. Second, once you are on the Databricks platform, or on any modern machine learning stack, empowering your data scientists to find the right data sets can be time-consuming. And third, data scientists need to be able to collaborate with experts who really understand the data, so they’re not redoing analysis that’s already been done. Alation helps with each one of these challenges. First, Alation will identify the key data sets to move to a cloud environment. So instead of starting with a big bang project, which opens up risk and issues with time, Alation will help our customers identify which data assets to move and which user groups are associated with those data assets, and communicate with those data users within our data catalog to move those data assets in a timely fashion. All the while we take advantage of Databricks’ capability to manage ETL and to securely and safely move that data to the cloud. Together we de-risk the move to the cloud by focusing on key assets and user groups, and we can accelerate the time that it takes to do those processes. Alation’s core search and discovery capabilities allow data scientists to find the right sets of data to use for their analysis, regardless of where it exists across their enterprise analytical environment. This allows data scientists to create data pipelines across their environment and, again, to run efficient experiments using the right data regardless of where it is.
Finally, using the Alation Data Catalog, data scientists and data analysts can understand the context that already exists around the data sources and collaborate with other users of the data, whether they’re on the data science team or not. So they can understand what analysis has already been done and what they can reuse. Once that work is done they can track these experiments, publish models and continue to use Databricks to perform their experiments, focusing on the use cases that have core value to the business. This collaboration allows Databricks and Alation customers to ensure that they are making better models and that these business insights are shared across the environment. With the announcement of the Databricks SQL Analytics service, these core tenets apply to more and more parts of the business. Previously, Alation and Databricks worked together to empower data scientists to find the best data for their data science experiments. But with Alation and the SQL Analytics service, data analysts can now take advantage of the easy SQL endpoints that are spun up with the SQL Analytics service to query those same data sets in Delta Lake through the Alation SQL editor, so that our analysts can take advantage of the same capabilities that the data scientists can. Because Databricks makes it easy to spin up SQL Analytics endpoints, Alation can use the Databricks JDBC driver to ensure that we are using the best data, and using it in a very high-performance way, when we’re connecting with the SQL Analytics service. Thanks to the built-in concurrency, we don’t have to worry about database management. Users don’t need a ton of knowledge of Spark to get started with the Alation Compose SQL editor; they can get started using this data set very quickly. This gives an idea of how all of the different pieces we’ve talked about up until this point tie together. In a full demo we can get into each and every piece of this.
But what I would like to highlight is that within Alation, we’re going to tell you: Who are the top users in your Databricks environment? Who should you go ask when you have a question about this data and its usage? How popular are the data sets and their columns? Who are the stewards? Who has the authority and the responsibility to maintain those data sets? If you would like to know more, there’s a lot to show, and I encourage you to reach out to me so I can talk to you. Thank you very much.
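
The usage questions raised at the end of the talk (who the top users of a data set are, and how popular each data set is) can be illustrated with a small sketch. This is not Alation's behavior analysis engine or any Alation or Databricks API. It assumes a hypothetical query log reduced to (user, table) pairs, and the names `QUERY_LOG`, `table_popularity` and `top_users` are invented for illustration.

```python
from collections import Counter

# Hypothetical query log: one (user, table) pair per executed query.
# Real usage data would come from the platform's query history.
QUERY_LOG = [
    ("ana", "sales.orders"),
    ("ben", "sales.orders"),
    ("ana", "sales.orders"),
    ("cho", "sales.customers"),
    ("ana", "sales.customers"),
    ("ana", "sales.orders"),
]


def table_popularity(log):
    """Count how many logged queries touched each table."""
    return Counter(table for _user, table in log)


def top_users(log, table, n=3):
    """Return up to n users of one table, most frequent first."""
    counts = Counter(user for user, t in log if t == table)
    return [user for user, _count in counts.most_common(n)]


if __name__ == "__main__":
    print(table_popularity(QUERY_LOG).most_common())
    print(top_users(QUERY_LOG, "sales.orders"))
```

Counting queries is only one possible popularity signal. A catalog could also weight recency or distinct users, and stewardship is an assigned responsibility rather than something derived from the log.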

About Raja Perumal


Raja Perumal helps large organizations make sense of their vast amounts of data. At Alation, he works with partners to ensure they drive value for Alation customers.