NVIDIA: Senior Distributed System Engineer – Apache Spark

We are seeking experienced System Software Engineers to accelerate Apache Spark and related frameworks on GPUs.

Apache Spark is the most popular data processing engine in data centers for data science. It is used for a wide variety of data workloads, from data preparation to running machine learning (ML) experiments to deploying ML applications. Data scientists spend a considerable amount of time exploring data and iterating over ML experiments. Every hour of compute required to sort through datasets, extract features and fit ML algorithms impedes an efficient business workflow.

At NVIDIA, we are passionate about working on hard problems that have an impact. You will work with the open source community to enable Apache Spark data processing with GPUs. Data workflows can benefit tremendously from being accelerated, enabling data scientists to explore many more and larger datasets to achieve their business goals, faster and more efficiently.

What you’ll be doing:

  • Creating a collection of GPU-accelerated libraries for data processing, data analytics and ML

  • Designing and implementing solutions to enhance Apache Spark for GPU-aware scheduling, distributed ML execution and beyond

  • Engaging open source communities, including Apache Spark and RAPIDS, for technical discussions and contributions

  • Working with NVIDIA strategic partners on deploying sophisticated machine learning and data analytics solutions in public cloud or on-premises clusters

  • Presenting technical solutions at industry conferences and meetups

What we need to see:

  • BS, MS, or PhD in Computer Science, Computer Engineering, or a closely related field

  • 8+ years of work or research experience in software development

  • 4+ years working as a contributor or committer on key open source big-data projects such as Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka, Apache Storm or Apache Hive

  • Outstanding technical skills in crafting and implementing high-quality distributed systems

  • Excellent programming skills in C++, Java, and/or Scala

  • Proven knowledge of distributed system schedulers: Kubernetes, Hadoop YARN, Spark standalone, and/or Mesos

  • Able to work successfully with multi-functional teams across organizational boundaries and geographies

  • Highly motivated with strong communication skills

Ways to stand out from the crowd:

  • Committership at major open source projects (such as Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka) would be a huge plus

  • Working experience with acceleration libraries (CUDA, RAPIDS, UCX) is helpful

  • Basic ML/DL experience with Spark ML and XGBoost would be valuable


NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, contact us!

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.