Software Engineer - Spark Benchmarking Infrastructure - Databricks


As a Software Engineer on the Spark Benchmarking team at Databricks, you are responsible for ensuring that the Databricks Runtime is the world’s best Spark execution environment in terms of performance and scalability. You will be part of the team that continuously improves the benchmarking methodology and infrastructure, helping to increase release frequency while maintaining high quality and performance standards. Continuously improving performance is increasingly challenging given the high volume of commits that go into a release. To meet this challenge, your team will keep raising the level of automation and provide powerful benchmarking tools to evaluate the performance impact of each change. Engineers on the Spark Benchmarking team also drive the Databricks Runtime performance sign-off process: they are the gatekeepers who make sure that all performance regressions are addressed before a new version is released.


  • Experience with large-scale distributed computing and big data engines (e.g., Spark, Hadoop).
  • Passion for software automation and experience with continuous integration.
  • Excellent communication and teamwork.
  • Strong foundation in algorithms and data structures and their real-world use cases.
  • Solid understanding of computer systems and networks.
  • Production-quality coding standards and patterns.


  • 4+ years of general software programming experience.
  • 4+ years of modern, production-level experience in one of: Java, Scala, JavaScript, or C++.
  • BS in Computer Science, Math, or a related technical field, or equivalent practical experience.
  • Experience with benchmarking big data systems.
  • Experience with developing infrastructure for testing distributed systems.

About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell, and HP.  For more information, visit

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.