Opher Dubrovsky

Big Data Team Lead, Nielsen

Opher is a big data team lead at Nielsen. His team builds massive data pipelines that are cost-effective and scalable (~250 billion events/day). Their projects run on AWS, using Spark, serverless Lambda functions, Airflow, OpenFaaS, Kubernetes and more. He is passionate about new technologies, data, algorithms and machine learning. He loves to tackle difficult problems and come up with amazing solutions to them. He holds 4 patents in the area of security and has lots of ideas for more.

Past sessions

Summit 2021 Anomaly Detection at Scale!

May 27, 2021 11:00 AM PT

We all know how to create ML models, but the path to turning them into a highly scalable system that is easy for users to work with is not always clear. What happens when you need to run thousands of them, on many different datasets, simultaneously and at a huge scale? And do it reliably, so you can sleep well at night!

To achieve exactly that, we decided to go down the serverless route and build an anomaly detection system on top of it. We’ll go over the pros and cons of building such a system using serverless, and when such an approach could work for you.

Our SpotLight anomaly detection system can easily reuse ML models and scales to run millions of time series simultaneously. It eliminates manual work and lets end users with no scientific background configure which anomalies to detect in a plug-and-play way and get alerts in no time.

In this talk, we’ll walk you through the architecture and share useful ideas you can adopt and implement in your own projects.

In this session watch:
Opher Dubrovsky, Big Data Team Lead, Nielsen
Max Peres, Developer, Nielsen

Summit Europe 2020 Scale-Out Using Spark in Serverless Herd Mode!

November 18, 2020 04:00 PM PT

Spark is a beast of a technology and can do amazing things, especially with large datasets. But some big data pipelines require processing the data in small chunks, and running those chunks through a large Spark cluster can be inefficient and expensive.

In this talk we’ll describe a system we’ve built using many independent Spark clusters running in parallel, side by side, in serverless style. We run them on a Kubernetes cluster, but don’t confuse this with Spark on Kubernetes, which runs one large Spark cluster on Kubernetes. Our system scales up and down on the fly by spinning independent Spark clusters up or down, and is capable of processing huge amounts of data at an affordable cost.

We’ll walk you through the reasoning behind this unique Spark serverless architecture, its benefits and how we went about building it. You’ll learn how to evaluate your own Spark cluster architecture and figure out if you too should consider using such an approach to save costs and reduce processing time.

Topics include:

  • The task scheduling problem
  • Considerations for a cost-effective task workflow
  • And much more…

Speakers: Opher Dubrovsky and Ilai Malka