Time Series Analytics with Spark

spark-timeseries is a Scala / Java / Python library for interacting with time series data on Apache Spark.
Time series are an important part of data science applications, but their sequential nature makes them notoriously difficult to handle in distributed systems. Getting this right is therefore a challenging but important step forward for distributed systems applied to data science.

This talk will cover the overall design of spark-timeseries and its current functionality, and will provide some usage examples. Because the project is still at an early stage, the talk will also cover its current weaknesses and the improvements planned on the project roadmap.

About Simon Ouellette

Simon Ouellette is Chief Data Scientist at Faimdata, a consumer intelligence company based in Montreal, Canada. With over 12 years of experience in machine learning and distributed systems engineering, he has contributed to the spark-timeseries project since the initial fundamental design discussions with the main project administrator, Sandy Ryza.