Skip to main content

Databricks SQL Statement Execution API – Announcing the Public Preview

Adriana Ispas
Bogdan Ionut Ghit
Ben Fleis
Pearl Ubaru
Share this post

Today, we are excited to announce the public preview of the Databricks SQL Statement Execution API, available on AWS and Azure. You can now connect to your Databricks SQL warehouse over a REST API to access and manipulate data managed by the Databricks Lakehouse Platform.

The Databricks SQL Statement Execution API simplifies access to your data and makes it easier to build data applications tailored to your needs. The API is asynchronous, which removes the need to manage connections like you do with JDBC or ODBC. Moreover, you can connect to your SQL warehouse without having to first install a driver. You can use the Statement Execution API to connect your traditional and Cloud-based applications, services and devices to Databricks SQL. You can also create custom client libraries for your programming language of choice.

Databricks SQL Statement Execution

In this blog, we review some key features available in the public preview and show how to leverage your data in a spreadsheet using the Statement Execution API and JavaScript.

Statement Execution API in brief

The Statement Execution API manages the execution of SQL statements and fetching of result data on all types of Databricks SQL warehouses via HTTP endpoints for the following operations:

Submit a SQL statement for executionPOST /sql/statements
Check the status and retrieve resultsGET /sql/statements/{statement_id}
Cancel a SQL statement executionPOST /sql/statements/{statement_id}/cancel

For example, let's assume we want to retrieve the monthly order revenue for the current year for display in our data application. Assuming data on orders is already managed by our Lakehouse, a SQL statement could be as shown below:

SELECT 
	month(o_orderdate) as Month, 
	sum(o_totalprice) as `Monthly Revenue` 
FROM `samples`.`tpch`.`orders` 
WHERE year(o_orderdate) == year(current_date()) 
GROUP BY 1
ORDER BY 1

We can initiate the execution of our SQL statement by sending a POST request to the /api/2.0/sql/statements endpoint. The string representing the SQL statement is provided as a request body payload, along with the ID of a SQL warehouse to be used for executing the statement. The HTTP request must also contain the host component of your Databricks workspace URL and an access token for authentication.

POST /api/2.0/sql/statements HTTP/1.1
Host: <your_HOST>
Authorization: Bearer <your_access_token>
Content-Type: application/json
{
    "statement": "SELECT month(o_orderdate) as Month, sum(o_totalprice) as `Monthly Revenue` FROM `samples`.`tpch`.`orders` WHERE year(o_orderdate) == year(current_date()) GROUP BY 1 ORDER BY 1",
    "warehouse_id": "<your_SQL_warehouse_ID>"
}

If the statement completes quickly, the API returns the results as a direct response to the POST request. Below is an example response:

{
   "statement_id": "01ed9a57-ad5e-1333-8e76-8c718723abf2",
   "status": {
       "state": "SUCCEEDED"
   },
   "manifest": {
       "format": "JSON_ARRAY",
       "schema": {
           "column_count": 2,
           "columns": [
               {
                   "name": "Month",
                   "type_name": "INT",
                   "position": 0
               },
               {
                   "name": "Monthly Revenue",
                   "type_name": "DECIMAL",
                   "position": 1,
                   "type_precision": 28,
                   "type_scale": 2
               }
           ]
       }
   },
   "result": {
       "chunk_index": 0,
       "row_offset": 0,
       "row_count": 2,
       "data_array": [
           [
               "1",
               "14615808096.95"
           ],
           [
               "2",
               "945871268.15"
           ]
       ]
   }
}

If the statement takes longer, the API continues asynchronously. In this case, the response contains a statement ID and a status.

{
    "statement_id": "01ed9a50-c9c9-178e-9be7-0ab52bc998b0",
    "status": {
        "state": "PENDING"
    }
}

You can use the statement ID to check the execution status and, if ready, retrieve the results by sending a GET request to the /api/2.0/sql/statements/{statement_id} endpoint:

GET /api/2.0/sql/statements/<statement_id> HTTP/1.1
Host: <your_HOST>
Authorization: Bearer <your_access_token>

You can also use the statement ID to cancel the request by sending a POST request to the /api/2.0/sql/statements/cancel endpoint.

POST /api/2.0/sql/statements/<statement_id>/cancel HTTP/1.1
Host: <your_HOST>
Authorization: Bearer <your_access_token>

The API can be configured to behave synchronously or asynchronously by further configuring your requests. To find out more, check the tutorial (AWS | Azure) and the documentation (AWS | Azure).

Using the Databricks SQL Statement Execution API in JavaScript

You can send Databricks SQL Statement Execution API requests from any programming language. You can use methods like the Fetch API in JavaScript; Python Requests in Python; the net/http package in Go, and so on.

We show how you can use the Statement Execution API to populate a Google Sheet using the JavaScript Fetch API from a Spreadsheet App.

Example Spreadsheet app

Let's imagine we want to build a Google Spreadsheet App that populates a spreadsheet with data on orders. Our users can fetch monthly order revenue data based on predefined criteria, such as the monthly order revenue for the current month, the current year, or between a start and an end date. For each criterion, we can write corresponding SQL statements, submit them for execution, fetch and handle the results using the Statement Execution API.

In the next section, we outline the main building blocks to implement this example. To follow along, you can download the spreadsheet from GitHub.

Building the Spreadsheet App

Given a statement that we want to execute using the SQL Statement API, the executeStatement function below captures the overall logic for handling the default mode of the API. In this mode, statement executions start synchronously and continue asynchronously after a default timeout of 10 seconds.

We start by submitting a statement for execution using the submitStatement function. If the statement completes within the defined timeout, we fetch the results by calling the handleResult function. Otherwise, the execution proceeds asynchronously, which means we need to poll for the execution status until completion – the checkStatus function covers the required logic. Once finished, we retrieve the results using the same handleResult function.

function executeStatement(statement) {
  var response = submitStatement(statement);
  if (response.status.state == "SUCCEEDED") {
   return handleResult(response.manifest, response.result);
  } else {
    response = checkStatus(response.statement_id)
    while (response.status.state == "PENDING" || response.status.state == "RUNNING"
) {
      response = checkStatus(response.statement_id)
    }
   return handleResult(response.manifest, response.result);
  }
}

The submitStatement function defines the request body where we set execution parameters such as the wait timeout of 10 seconds (default), the execution mode and the SQL statement. It further invokes a generic fetchFromUrl function for submitting an HTTP request. We also define a HTTP_REQUEST_BASE constant to pass in the access token for the Databricks workspace user. We will reuse this constant for all HTTP requests we will be making.

const HTTP_REQUEST_BASE = {
 headers: {
   Authorization: `Bearer ${AUTH_TOKEN}`,
 },
 contentType: "application/json",
 method: "GET",
 payload: null,
 muteHttpExceptions: true,
};


function submitStatement(statement) {
 let body = {
   "statement": statement,
   "warehouse_id": WAREHOUSE,
   "wait_timeout": "10s",
   "on_wait_timeout": "CONTINUE",
 };
 let request = Object.assign({}, HTTP_REQUEST_BASE, { method: "POST", payload: JSON.stringify(body) });
 let response = fetchFromUrl(`https://${HOST}/api/2.0/sql/statements`, request);
 if (response.status.state == "FAILED") {
     showError(`Submit request failed with unexpected state: ${response.status.state}`)
     return null;
 }
 return response;
}

The fetchFromUrl function is a generic function for submitting HTTP requests with minimal error handling, as shown below.

function fetchFromUrl(url, request) {
  try {
    let response = UrlFetchApp.fetch(url, request);
    let responseJson = JSON.parse(response);
    let statusCode = response.getResponseCode();
    switch (statusCode) {
      case 200:
        return responseJson;
      default:
        showError(`Error: code=${responseJson["error_code"]} message=${responseJson["message"]}`);
        return null;
    }
  } catch (error) {
    showError(`Error: error=${error}`);
    return null;
  }
}

In the checkStatus function, if the wait timeout is exceeded, we poll the server to retrieve the status of the statement execution and determine when the results are ready to fetch.

function checkStatus(statement_id) {
 let response = fetchFromUrl(`https://${HOST}/api/2.0/sql/statements/${statement_id}`, HTTP_REQUEST_BASE);
 if (response.status.state == "FAILED") {
     let error = responseJson["status"]["error"]
     showError(`Fetch request failed: code=${error["error_code"]} message=${error["message"]}`);
     return null;
 }
 return response;
}

In the handleResult function, if the statement has completed successfully and the results are available, a fetch response will always contain the first chunk of rows. The function handles the result and attempts to fetch the subsequent chunks if available.

function handleResult(manifest, result) {
 var columnNames = manifest["schema"]["columns"].map(col => col["name"]);
 var chunks = [result.data_array];


 while (result["next_chunk_internal_link"]) {
   chunk = result["next_chunk_internal_link"];
   result = fetchFromUrl(`https://${HOST}${chunk}`, HTTP_REQUEST_BASE);
   chunks.push(result["data_array"]);
 }


 return [[columnNames]].concat(chunks).flat()
}

All that is left is to connect the executeStatement function to JavaScript event handlers for the various user interface widgets, passing in the SQL statement corresponding to the user selection. The Google Apps Script documentation provides instructions on populating the spreadsheet with the returned data.

Getting started with the Databricks SQL Statement Execution API

The Databricks SQL Statement Execution API is available with the Databricks Premium and Enterprise tiers. If you already have a Databricks account, follow our tutorial (AWS | Azure), the documentation (AWS | Azure), or check our repository of code samples. If you are not an existing Databricks customer, sign up for a free trial.

The Databricks SQL Statement Execution API complements the wide range of options to connect to your Databricks SQL warehouse. Check our previous blog post to learn more about native connectivity to Python, Go, Node.js, the CLI, and ODBC/JDBC. Data managed by the Databricks Lakehouse Platform can truly be accessed from anywhere!

Join us at the Data + AI Summit 2023 to learn more about the Databricks SQL Statement Execution API and to get updates on what is coming next.

Try Databricks for free

Related posts

See all Platform Blog posts