Principal Distributed System Software Engineer (Apache Spark) - Databricks

NVIDIA is seeking an experienced Principal Distributed System Software Engineer to accelerate Apache Spark and related frameworks for data science.

Data scientists spend a considerable amount of time exploring data and iterating over machine learning (ML) experiments. Every hour of compute required to sort through datasets, extract features, and fit ML algorithms hinders the ability of data scientists to drive towards results. Apache Spark is the most popular data processing engine in data centers for data science. It is used for interactive data science, from data preparation, to running ML experiments, all the way to deployment of ML applications.

In this role, you will have the opportunity to collaborate with the open source community to accelerate Apache Spark with GPUs for data science. NVIDIA believes that data science workflows can benefit tremendously from acceleration, enabling data scientists to explore more and larger datasets and to reach their business goals faster and more reliably.

What you’ll be doing:

  • Building a collection of GPU-accelerated libraries for ML and data analytics
  • Crafting and implementing solutions to enhance Apache Spark for GPU-aware scheduling, distributed ML execution, and beyond
  • Engaging open source communities, including Apache Spark and RAPIDS, for technical discussion and contribution
  • Working with NVIDIA strategic partners on deploying advanced machine learning and data analytics solutions in public cloud or on-premises clusters
  • Presenting technical solutions at industry conferences and meetups

What we need to see:

  • You have a BS, MS, or PhD in Computer Science, Computer Engineering, or a closely related field
  • 12+ years of work or research experience in software development
  • 6+ years of hands-on experience as a contributor or committer with key open source big-data projects, including Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka, Apache Storm, and Apache Hive
  • Exceptional technical skills in designing and implementing high-quality distributed systems
  • Excellent programming skills in C++, Java, Scala and/or Python
  • Solid knowledge of distributed system schedulers: Kubernetes, Hadoop YARN, Spark standalone, and/or Mesos
  • Able to work successfully with multi-functional teams across organizational boundaries and geographies
  • Highly motivated with strong communication skills

Ways to stand out from the crowd:

  • Committership at major open source big-data projects (such as Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka) would be a huge plus
  • Working experience with GPU-accelerated libraries (CUDA, cuBLAS, cuSPARSE, NCCL, nvGraph) is very helpful
  • Basic ML/DL training and/or experience with Spark ML and XGBoost would be valuable

With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.