Isaac Lee is a software development team lead at Mindslab and the chief director of DS4C (Data Science for Covid). He is also pursuing a BS in computer science at Carnegie Mellon University. Isaac’s most notable work includes developing a Covid-19 patient data engineering tool at DS4C, researching a semi-supervised deep learning stereo vision model at Carnegie Mellon, and building a Text-To-Speech learning engine.
May 27, 2021 11:00 AM PT
This talk focuses on the importance of data access. In particular, it argues that granular data should be openly available, because that availability gives researchers and data teams the raw material for their work.
We present the research conducted by the DS4C (Data Science for Covid-19) team, which made a large and highly detailed body of South Korean Covid-19 data available to the wider community. The DS4C dataset was one of the most impactful datasets on Kaggle, with over fifty thousand cumulative downloads and 300 unique contributors. What makes the DS4C dataset so potent is the sheer amount of data collected for each patient. The Korean government has been collecting and releasing patient information at an unprecedented level of detail, including infected people’s travel routes, the public transport they took, and the medical institutions treating them. This fine-grained detail is what makes the DS4C dataset valuable: it helps researchers and data scientists identify trends, find evidence for or against hypotheses, trace the sources of infection, and gain additional insights. We will cover the data challenges and the impact that publishing this data on a public forum had on the community, and conclude with an insightful visual representation.