Announcing New Databricks APIs for Faster Production Apache Spark Application Deployment
Today we are excited to announce the release of a set of APIs on Databricks that enable our users to manage Apache Spark clusters and production jobs via a RESTful interface.
You can read the press release here.
For the impatient, the full documentation of the APIs is here.
API + GUI: The Best of Both Worlds
The graphical user interface in Databricks has already simplified Spark operations for our users when they need to launch a cluster or schedule a job quickly. However, many want something more than a point-and-click interface because they prefer the command line, or they need to automate common operations using scripts or continuous integration tools such as Jenkins. These new APIs expose the core infrastructure functionality of Databricks so that users have complete freedom to choose how they want to manage their clusters and put applications into production.
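For instance, a build step in a continuous integration pipeline can drive these endpoints from an ordinary shell script. The sketch below is illustrative only: it assumes the instance URL and credentials are stored in environment variables (DATABRICKS_URL, DATABRICKS_USER, and DATABRICKS_PWD are hypothetical names you would configure in your CI tool) and calls the clusters/list endpoint to check which clusters already exist before a deployment step.
# Hypothetical CI step: list existing clusters before deploying
curl -s -u "$DATABRICKS_USER:$DATABRICKS_PWD" \
  "$DATABRICKS_URL/api/2.0/clusters/list"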
One Platform For Data Science and Production Spark Applications
To deploy data-driven applications effectively, organizations need a wide variety of capabilities from their data platforms because of the different skill sets and responsibilities of the teams involved. Spark application developers typically work with the command line and APIs to be efficient; DevOps teams want to automate as many processes as possible to improve reliability; and data scientists and analysts just want easy access to powerful, reliable clusters and an interactive environment in which to develop algorithms and visualize data.
Typically, each team pursues its own solution in an uncoordinated fashion. As a result, organizations end up with a complex IT infrastructure, or become deeply unproductive as release cycles get bogged down in a sprawl of tools and manual processes.
No platform has been able to meet these disparate needs out of the box. With the release of these APIs, we are proud to say that Databricks is the first company to unify the full spectrum of capabilities in one Spark platform.
What’s Next
The APIs are very simple to use - you can try them out in a terminal with the cURL command. A few basic examples are below:
Create a new cluster
curl -u user:pwd -H "Content-Type: application/json" -X POST -d \
'{ "cluster_name": "flights", "spark_version": "1.6.x-ubuntu15.10",
   "spark_conf": { "spark.speculation": true },
   "aws_attributes": { "availability": "SPOT", "zone_id": "us-west-2c" },
   "num_workers": 2 }' \
https://yourinstance.cloud.databricks.com/api/2.0/clusters/create
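If the request succeeds, the response should contain the ID of the new cluster (for example, a JSON body of the form {"cluster_id": "..."}). Assuming a companion clusters/get endpoint in the same API family, a script could then poll the cluster's state before submitting work to it; the cluster ID below is a placeholder:
# Check the state of the cluster created above (cluster ID is a placeholder)
curl -u user:pwd -X GET \
  "https://yourinstance.cloud.databricks.com/api/2.0/clusters/get?cluster_id=<your-cluster-id>"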
Delete a cluster
curl -u user:pwd -H "Content-Type: application/json" -X POST -d \
'{ "cluster_id": "0321-233513-urn580" }' \
https://yourinstance.cloud.databricks.com/api/2.0/clusters/delete
Run a job
curl -u user:pwd -H "Content-Type: application/json" -X POST -d \
'{ "job_id": 2, "jar_params": ["param1", "param2"] }' \
https://yourinstance.cloud.databricks.com/api/2.0/jobs/run-now
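The run-now call should return an identifier for the triggered run (for example, {"run_id": ...}). Assuming a companion jobs/runs/get endpoint in the same API family, a script could poll that ID to wait for the run to complete; the run ID below is a placeholder:
# Check the status of the run triggered above (run ID is a placeholder)
curl -u user:pwd -X GET \
  "https://yourinstance.cloud.databricks.com/api/2.0/jobs/runs/get?run_id=<your-run-id>"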
We will continue to release more APIs as we add new features to the Databricks platform - stay tuned. In the meantime, try out these APIs for yourself in Databricks for free.