Role of Data Accessibility During Pandemic

May 27, 2021 11:00 AM (PT)

This talk focuses on the importance of data access, and on how crucial it is to have granular data openly available, since it fuels the work of researchers and data teams.

We present the work of the DS4C (Data Science for Covid-19) team, who made a large and detailed set of South Korean Covid-19 data available to the wider community. The DS4C dataset was one of the most impactful datasets on Kaggle, with over fifty thousand cumulative downloads and 300 unique contributors. What makes the DS4C dataset so potent is the sheer amount of data collected for each patient. The Korean government has been collecting and releasing patient information with unprecedented levels of detail. The released data includes infected people’s travel routes, the public transport they took, and the medical institutions treating them. This fine-grained detail is what makes the DS4C dataset valuable: it makes it easier for researchers and data scientists to identify trends, find evidence to support hypotheses, track down causes, and gain additional insights. We will cover the data challenges and the impact that making this data available on a public forum had on the community, and conclude with a visual representation of the data.

In this session watch:
Vini Jaiswal, Customer Success Engineer, Databricks
Isaac Lee, Data Scientist, Carnegie Mellon University


Vini Jaiswal

Vini Jaiswal is a Senior Developer Advocate at Databricks, where she helps data practitioners to be successful in building on Databricks and open source technologies like Apache Spark, Delta, and MLfl...

Isaac Lee

Isaac Lee is a software development team lead at Mindslab and the chief director for DS4C (Data Science for Covid). He is also pursuing a BS in computer science at Carnegie Mellon University. Isaac...