At Databricks, we strongly believe ("know" you could say) that data and AI are mission-critical for solving the biggest problems our world faces. From healthcare to sustainability to transportation, data is a key to understanding and analyzing these issues at the deepest level – often in real time – and in turn shapes effective solutions.

That's why we're thrilled to announce that Databricks will be working with the University of Rochester's Goergen Institute for Data Science on student capstone projects to drive social change with public datasets. The core mission of Databricks is to solve the world's toughest problems with data, so we are very excited to add this work with Rochester alongside other work we have done with nonprofits, policy makers, NGOs and other organizations. One of my favorite examples is when we leveraged public healthcare datasets to empower the data community at the onset of the global pandemic.

Databricks' collaboration began with Rochester's membership in the Databricks University Alliance, a global program with more than 160 member universities worldwide that helps more than 7,000 students get hands-on experience using Databricks. In this new extended partnership, Databricks employees will work with students on identifying problems, selecting datasets, doing machine learning and sharing Databricks notebooks and models that highlight novel and actionable information.

This joint effort is especially exciting given the interest in leveraging data science for social good that Databricks and several University of Rochester faculty members share. Professor Lloyd Palum, an instructor for the course Data Science at Scale, first approached Databricks in August of 2020 with an interest in using our platform to introduce students to data-intensive applications for capstone projects. This ultimately culminated in Professor Palum, in collaboration with Dr. Ajay Anand, Deputy Director of the Goergen Institute for Data Science, presenting his approach to teaching large-scale analytics using Databricks to 50+ faculty members earlier this year. Part of that conversation involved using data for good, a subject that many Databricks employees are passionate about (for an example of such work, see this recent blog post from Chengyin Eng & Brooke Wenig exploring fatality rates in police shootings).

In follow-up conversations with Professor Palum and Dr. Anand, Databricks employees expressed great enthusiasm for working with Rochester on solving tough problems with public datasets. What does this program mean for solving the world's toughest problems? During the spring 2021 semester, Databricks engineers will investigate various social, public health and humanitarian issues that have publicly available datasets and present options for student capstone projects in the fall.

As we hit milestones in this collaboration, we will publish more blog posts to bring visibility to the important work that these students, professors and Databricks employees are undertaking in the interest of social good. We hope to make significant contributions to the ecosystem of responsible data scientists, ethical AI and datasets in the public domain that target real-world problems like climate change, pandemic management, social equity and sustainability at a global scale. Stay tuned!

How to Get Started with Databricks University Alliance

The Databricks University Alliance exists to help students and professors learn and use public-cloud-based analytical tools in college classrooms, whether virtual or in person. Enroll now and join more than 150 universities across the globe that are building the data science workforce of tomorrow. If you are a professor or student interested in working with Databricks on using public datasets to drive social change, please contact [email protected]. We believe that thoughtful collaboration can make a difference!

Upon acceptance, members will get access to curated content, training materials, sample notebooks and pre-recorded content for learning data science and data engineering tools, including Apache Spark, Delta Lake and MLflow. Students focused on individual skills development can sign up for the free Databricks Community Edition and follow along with these free one-hour hands-on workshops for aspiring data scientists, as well as access free self-paced courses from Databricks Academy, the training and certification organization within Databricks.

The Databricks University Alliance is powered by leading cloud providers such as Microsoft Azure, AWS and Google Cloud. Those educators looking for high-scale computing resources for their in-person and virtual classrooms may apply for cloud computing credits.