Untested, undocumented assumptions about data in data pipelines create risk, waste time, and erode trust in data products. Automated testing has been one of the biggest productivity boosters in modern software development and is essential for managing complex codebases, yet data science and data engineering have largely missed out on it. This talk introduces Great Expectations, an open-source Python framework for bringing data pipelines and products under test. Like assertions in traditional Python unit tests, Expectations provide a flexible, declarative language for describing expected behavior. Unlike traditional unit tests, Great Expectations applies Expectations to data instead of code. We strongly believe that most of the pain caused by accumulating pipeline debt is avoidable.
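To make the idea of applying assertions to data rather than code concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API: the helpers `expect_not_null`, `expect_between`, and `validate`, as well as the sample rows and the 0 to 120 age range, are hypothetical names and values chosen only for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], Dict[str, Any]]


def expect_not_null(column: str) -> Check:
    """Hypothetical helper: declare that `column` is never None."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [r for r in rows if r.get(column) is None]
        return {
            "expectation": f"{column} is not null",
            "success": not bad,
            "unexpected_count": len(bad),
        }
    return check


def expect_between(column: str, low: float, high: float) -> Check:
    """Hypothetical helper: declare that non-null values lie in [low, high]."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [
            r for r in rows
            if r.get(column) is not None and not (low <= r[column] <= high)
        ]
        return {
            "expectation": f"{column} is between {low} and {high}",
            "success": not bad,
            "unexpected_count": len(bad),
        }
    return check


def validate(rows: List[Row], expectations: List[Check]) -> List[Dict[str, Any]]:
    """Run every declared expectation against the data and collect results."""
    return [check(rows) for check in expectations]


# Illustrative batch of data (made up for this sketch).
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 212},
]

# The "suite" is a declarative description of what the data should look like.
suite = [
    expect_not_null("user_id"),
    expect_not_null("age"),
    expect_between("age", 0, 120),
]

results = validate(rows, suite)
for result in results:
    print(result)
```

The point of the sketch is the separation of concerns: the suite describes expected properties of the data, and the same suite can be rerun against every new batch, much like a unit test suite is rerun against every code change.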
We built Great Expectations to make it very, very simple to:
We hope it helps you as much as it’s helped us. Main takeaways:
Eugene Mandel is Head of Product at Superconductive and a core contributor to the Great Expectations open source library. Prior to Superconductive, Eugene led data science at Directly, was a lead data engineer on the Jawbone data science team, and co-founded three startups that used data in diverse fields: internet telephony, marketing surveys, and social media. Eugene's core interest has been turning data into real products that make users happy.
Abe Gong is a core contributor to the Great Expectations open source library, and CEO and Co-founder at Superconductive. Prior to Superconductive, Abe was Chief Data Officer at Aspire Health, the founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe has been leading teams using data and technology to solve problems in health care, consumer wellness, and public policy for over a decade. Abe earned his PhD at the University of Michigan in Public Policy, Political Science, and Complex Systems. He speaks and writes regularly on data, healthcare, and data ethics.