Senior big data engineer with over 20 years of experience in the software industry. For the last 8 years, I’ve been working as a senior big data engineer at Nielsen, building big data pipelines using Spark, Kafka, Druid, Airflow and more.
In addition to presenting at technical forums within Nielsen, I spoke at a Women in Big Data meetup in Israel in September and at Talks by Softbinator Foundation this December.
May 26, 2021 12:05 PM PT
Every day, millions of advertising campaigns are happening around the world.
For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it and purchased my product?”) is super important.
However, this task (often referred to as “funnel analysis”) is not easy, especially if the chronological order of events matters.
One way to mitigate this challenge is to combine Apache Druid with Apache DataSketches, which provides fast analytics on large volumes of data.
However, while that combination can answer some of these questions, it still can’t answer the question “how many distinct users viewed the brand’s homepage FIRST and THEN viewed the product X page?”
In this talk, we will discuss how we combine Spark, Druid and DataSketches to answer such questions at scale.
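To make the gap between the two kinds of questions concrete, here is a minimal Python sketch. It is not the approach presented in the talk, which this abstract does not detail. Exact Python sets stand in for DataSketches sketches, and the event log, page names and function names are all hypothetical. It shows that intersecting per-page user sets answers "who did both?" but loses the order of events, while an order-aware pass over the raw events can recover it.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, page, timestamp). Illustrative data only.
events = [
    ("u1", "homepage", 1),
    ("u1", "product_x", 2),   # u1: homepage first, then product X
    ("u2", "product_x", 1),
    ("u2", "homepage", 2),    # u2: product X first, then homepage
    ("u3", "homepage", 1),    # u3: homepage only
]

def users_who_did_both(events, first_page, second_page):
    """Unordered funnel: intersect per-page user sets.

    Exact sets are used here as a stand-in for set-oriented sketches;
    the timestamps are ignored, so event order is lost.
    """
    by_page = defaultdict(set)
    for user, page, _ts in events:
        by_page[page].add(user)
    return by_page[first_page] & by_page[second_page]

def users_in_order(events, first_page, second_page):
    """Ordered funnel: keep users whose earliest visit to first_page
    precedes their latest visit to second_page."""
    earliest_first = {}
    latest_second = {}
    for user, page, ts in events:
        if page == first_page:
            earliest_first[user] = min(ts, earliest_first.get(user, ts))
        elif page == second_page:
            latest_second[user] = max(ts, latest_second.get(user, ts))
    return {
        user
        for user, ts in earliest_first.items()
        if user in latest_second and ts < latest_second[user]
    }

unordered = users_who_did_both(events, "homepage", "product_x")
ordered = users_in_order(events, "homepage", "product_x")
print(sorted(unordered))  # u1 and u2 both visited the two pages
print(sorted(ordered))    # only u1 visited them in the required order
```

In this toy data the intersection counts both u1 and u2, while only u1 satisfies the "homepage FIRST and THEN product X" condition, which is why a set intersection alone cannot answer the ordered question.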