Ido Karavany - Databricks

Ido Karavany

Big Data Architect, Intel

Ido Karavany is a big data analytics architect and development manager in Intel's Advanced Analytics group. He is leading cutting-edge technology projects at Intel involving big data and stream analytics solutions for the Internet of Things and Parkinson's disease research. Ido has over 8 years of software development experience in data analytics and distributed computing solutions.



Using Spark in an IoT Analytics Platform Enable breakthroughs in Parkinson Disease Research

Summit Europe 2015

In this session, we will present a solution developed in partnership between Intel and the Michael J. Fox Foundation to enable breakthroughs in Parkinson's disease (PD) research by leveraging wearable sensors, smartphones, and big data analytics to monitor PD patients' motor movements 24/7. We have built an IoT big data analytics platform (on Amazon Cloud) based on open source technologies such as the Cloudera Distribution for Hadoop to enable the collection and processing of high-volume data streams (up to 1 GB per patient per day). The platform has been used successfully in multiple clinical trials and is ramping up toward having thousands of patients connected 24/7 by the end of 2015. The platform uses HBase and HDFS as its main scalable storage layer. The analytics batch layer leverages Apache Spark (over HBase and HDFS) and includes a set of complex machine learning algorithms, a sophisticated event-based rule engine, an automatic change detection engine, and a variety of PD-related measurements. Examples include activity recognition, patients' sleep quality, tremor detection, and PD gait recognition. This presentation will describe our solution and elaborate on how we use Spark to implement our machine learning algorithms. We'll focus on the challenges we faced with Spark, starting with the difficulties of extracting data from HBase and our solutions for batch and near-real-time calculations. We'll also review how our solution evolved and show what did and didn't work for us (e.g., many small jobs vs. fewer, larger consolidated jobs, and multiple vs. single Spark contexts).