Software Engineer - Storage & IO - Databricks

As a Software Engineer on the Storage & IO team, you are responsible for ensuring that our customers can successfully store and access huge volumes of data on the Databricks platform. You will be part of the team that owns the interface between Spark’s processing engine and the storage services.

Ensuring both correctness and performance in the cloud environment is a challenging job that requires understanding the inner mechanisms of storage at both the file format level and the file system level. Cloud storage providers offer a variety of services that differ in features, performance, authentication mechanisms, semantics, and pricing. The team develops the Databricks File System (DBFS), a unified namespace that reconciles multiple cloud storage backends to give data scientists and data engineers a smooth experience when working with multiple data sources and data lakes. On top of that, the team builds optimizations that exploit caching and data layout to ensure state-of-the-art performance of data access and processing.


  • Excellent problem analysis and solving skills.
  • Excellent communication and teamwork.
  • Strong foundation in algorithms and data structures and their real-world use cases.
  • Solid understanding of computer systems and networks.
  • Production-quality coding standards and patterns.
  • BS in Computer Science, Math, or a related technical field, or equivalent practical experience.
  • Preferred skills: experience with Hadoop or other open-source projects; experience with file systems, file formats, storage engines, and performance optimization.

About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell, and HP.  For more information, visit

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.