Using Apache Spark in the Cloud—A Devops Perspective

Toon is a leading brand in the European smart energy market, currently expanding internationally, providing energy usage insights, eco-friendly energy management and smart thermostat use for the connected home. As value-added services become ever more relevant in this market, we need to ensure that we can easily and safely onboard new tenants onto our data platform. In this talk we’re going to guide you through a less-discussed side of using Spark in production: devops. We will speak about our journey from an on-premise cluster to a managed solution in the cloud. A lot of moving parts were involved: ETL flows, data sharing with third parties and data migration to the new environment. Add to this the need to have a multi-tenant environment, revamp our toolset and deploy a live public-facing service. It’s easy to find great examples of how Spark is used for data-science purposes. On the data engineering side, however, we need to deploy production services, ensure data is cleaned, secured and available, and keep the data-science teams happy. We’d like to share some of the choices we made and some of the lessons learned from this (ongoing) transition.
Session hashtag: #EUde10

About Telmo Oliveira

As a data engineer at Toon (formerly Quby), Telmo has been helping his company and team transition from an on-premise data centre to a devops approach in which infrastructure is provisioned automatically in the cloud. Before that, he was part of the data engineering team at Sanoma, working on a platform to ingest and process data from dozens of different media outlets.