Lead Big Data Engineer - Databricks

Come join our growing Digital Platform Group (DPG) at McGraw-Hill. You’ll create innovative digital learning solutions in an environment that feels like a startup but has the foundation of a successful EdTech company. As a member of this group, you’ll use state-of-the-art technologies to deliver intuitive learning experiences. We build data-driven digital products that are used by millions of students around the globe. As both our business and the EdTech space quickly evolve, we need hard-working, passionate people to be a part of our dynamic team.

McGraw-Hill is a learning science company that builds digital products for the K-12, higher education, and professional markets. McGraw-Hill has an exciting opportunity for a Lead Big Data Engineer within our Analytics and Reporting team in Boston, MA, in the heart of the Innovation District. The Lead Engineer provides technical expertise and leadership in developing our core data ingestion, processing, and machine learning platform built in Spark, and will contribute to shaping the future of education technology by enabling advanced analytical insights across all McGraw-Hill products. Join our team and help us positively impact the next generation of learners with innovative new learning technologies!

Your contribution to the team includes:

  • Developing applications in Scala and Spark that ingest and process millions of behavioral events from products and components in the McGraw-Hill ecosystem in real time.
  • Creating highly efficient and scalable structured streaming workloads that perform complex joins and aggregations across multiple streams on large in-memory dataframes, while handling data issues such as out-of-order, late, and missing data.
  • Contributing to designs and architectural decisions across the entire data infrastructure, from event ingestion (Kafka) to data warehousing (Postgres, Cassandra, DynamoDB, MarkLogic), while surfacing insights and predictive annotations which can be consumed by multiple applications.
  • Building machine learning applications such as classifiers which predict student outcomes from large sets of labeled and unlabeled training data.
  • Providing technical leadership across several teams with a focus on strong engineering discipline, including data-driven planning, automated testing, code reviews, continuous integration, and building the right thing, the right way, at the right time.
  • Supporting on-time delivery across multiple disciplines, including engineering, user experience, product management, system administration, and release management.
  • Researching production issues and working with colleagues to quickly resolve them.
  • Working collaboratively with Product Management and PMO to define scope.
  • Prototyping with emerging technologies to continuously improve data freshness, accuracy, and value for our customer teams.

What you’ll need to be successful:

  • Bachelor’s degree in Computer Science or equivalent experience in software engineering.
  • Several years of hands-on experience as a data engineer working on both infrastructure and application development problems, with proven experience building Spark Streaming and ML applications and deploying them to production.
  • Strong understanding of functional programming (Scala).
  • Experience with Python and commonly used data science libraries.
  • Experience implementing software systems for applications developed with cloud technologies (AWS).
  • Strong understanding of database architectures, including row and column stores, data lake topologies, NoSQL DBs, graph databases, query optimization, and ANSI SQL.
  • Strong hands-on experience in Linux, including shell scripting.
  • Familiarity with and interest in classical machine learning models (classifiers, regression, neural networks, support vector machines) and their training are highly desirable.
  • Ability to innovate quickly and fail fast, in order to determine the best path.
  • Familiarity with the Agile methodology and tracking tools (Jira).
  • Excellent interpersonal skills, the ability to collaborate across teams, and excellent verbal and written communication.

This position can be filled in Boston, MA or New York City, NY.

When you join our team, you become part of a company that impacts millions of students and teachers every day. As a leader in the EdTech space, McGraw-Hill offers flexibility and collaboration while creating innovative products that positively impact learning. Our mission is to unlock the potential of every learner and every employee.

Join us for a career where you’ll grow both personally and professionally in a welcoming, diverse, and inclusive environment.