Data Engineer, Analytics & Behavior Science - Databricks

Data Engineer, Analytics & Behavior Science

The product analytics & behavior science team is responsible for nearly all of the product web, app and
set-top box (STB) datasets used at Comcast for analyzing the product experience. This means that we build
datasets for the purpose of analysis and collaborate with modeling experts to build production data pipelines.
Given the size and complexity of our datasets, this is not a trivial task. Nearly every product team member
depends on this data, and as a member of this team you would be at the center of product innovation.

This includes projects like:
• Building datasets in partnership with data scientists
• Creating source datasets for A/B testing
• Creating data streaming applications for machine learning
• Maintaining prediction APIs
• Establishing new patterns for efficient processing of datasets in the hundreds of terabytes
• Defining metrics for tracking how customers are interacting with products and services
• Partnering with UI engineering teams to enable new customer experiences
• Partnering with customer experience teams to create an improved experience and be the voice for
• Leading data-driven product development

The perfect person will have a background in a quantitative/technical field, will have rich experience
working with large data sets, and will have some experience in data-driven decision making.

• You have at least 5 years of experience in a data-driven environment designing and building
distributed data processing systems
• You have hands-on experience developing big data pipelines end to end
• You have proficiency programming in Python, Java and/or Scala.
• You strive to write beautiful code and you're comfortable working in a variety of tech stacks.
• You're self-motivated, ambitious and quick to take action, while also open to new ideas. You recognize
when you're wrong and move past your own mistakes.
• You enjoy staying current with technology and continually strive to be better at your craft.

Must Have:
• AWS experience developing data streaming pipelines
• Spark (EMR, Databricks, and/or Apache on EC2)
• Deep Spark understanding
• S3 as object store
• Kafka/Kinesis
• Presto/Athena
• Experienced programmer in Python, Java and/or Scala
• Good communication skills
• Self-starter and highly motivated