Quickstart Notebook (SQL)


Databricks in 5 minutes

Create a quickstart cluster

  1. In the sidebar, right-click the Compute button and open the link in a new window.
  2. On the Clusters page, click Create Cluster.
  3. Name the cluster Quickstart.
  4. In the Databricks Runtime Version drop-down, select 7.3 LTS (Scala 2.12, Spark 3.0.1).
  5. Click Create Cluster.
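If you prefer to script cluster creation instead of clicking through the UI, the steps above map onto a single call to the Databricks Clusters REST API. A minimal sketch, assuming an AWS workspace; the `node_type_id` is a hypothetical value that varies by cloud and workspace:

```python
import json

# Sketch: the payload for POST /api/2.0/clusters/create.
# "7.3.x-scala2.12" is the runtime key for 7.3 LTS (Scala 2.12, Spark 3.0.1).
payload = {
    "cluster_name": "Quickstart",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",  # assumption: an AWS workspace
    "num_workers": 1,
}
body = json.dumps(payload)

# An HTTP client would POST this body to <workspace-url>/api/2.0/clusters/create
# with an "Authorization: Bearer <token>" header; the response carries cluster_id.
```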

Attach the notebook to the cluster and run all commands in the notebook

  1. Return to this notebook.
  2. In the notebook menu bar, select the cluster drop-down (Detached) > Quickstart.
  3. When the cluster state changes from Pending to Running, click Run All.

The next command creates a table from a Databricks dataset

DROP TABLE IF EXISTS diamonds;
 
CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
 
SELECT * FROM diamonds
 
| _c0 | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 3 | 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 4 | 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.2 | 4.23 | 2.63 |
| 5 | 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 6 | 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
| 7 | 0.24 | Very Good | I | VVS1 | 62.3 | 57 | 336 | 3.95 | 3.98 | 2.47 |
| 8 | 0.26 | Very Good | H | SI1 | 61.9 | 55 | 337 | 4.07 | 4.11 | 2.53 |
| 9 | 0.22 | Fair | E | VS2 | 65.1 | 61 | 337 | 3.87 | 3.78 | 2.49 |
| 10 | 0.23 | Very Good | H | VS1 | 59.4 | 61 | 338 | 4 | 4.05 | 2.39 |
| 11 | 0.3 | Good | J | SI1 | 64 | 55 | 339 | 4.25 | 4.28 | 2.73 |
| 12 | 0.23 | Ideal | J | VS1 | 62.8 | 56 | 340 | 3.93 | 3.9 | 2.46 |
| 13 | 0.22 | Premium | F | SI1 | 60.4 | 61 | 342 | 3.88 | 3.84 | 2.33 |
| 14 | 0.31 | Ideal | J | SI2 | 62.2 | 54 | 344 | 4.35 | 4.37 | 2.71 |
| 15 | 0.2 | Premium | E | SI2 | 60.2 | 62 | 345 | 3.79 | 3.75 | 2.27 |
| 16 | 0.32 | Premium | E | I1 | 60.9 | 58 | 345 | 4.38 | 4.42 | 2.68 |

Truncated results, showing first 1000 rows.

The next commands convert the table to Delta format

First use Python to rewrite the data as Delta files, then recreate the table on top of them:

%python
diamonds = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")
diamonds.write.format("delta").mode("overwrite").save("/delta/diamonds")

DROP TABLE IF EXISTS diamonds;
 
CREATE TABLE diamonds USING DELTA LOCATION '/delta/diamonds/'
SELECT * FROM diamonds
 
(Output identical to the table above: the same rows from the diamonds dataset, truncated at 1000 rows.)

The next command manipulates the data and displays the results

Specifically, the command:

  1. Selects color and price columns, averages the price, and groups and orders by color.
  2. Displays a table of the results.
SELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY color
[Bar chart: average price (0–5k) by color, D–J]
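As a mental model for what the GROUP BY and avg() do, here is an illustrative pure-Python equivalent over the sample rows shown earlier (plain Python, not a Spark API):

```python
from collections import defaultdict

# (color, price) pairs from the sample rows above
rows = [("E", 326), ("E", 326), ("E", 327), ("I", 334), ("J", 335), ("J", 336),
        ("I", 336), ("H", 337), ("E", 337), ("H", 338), ("J", 339), ("J", 340),
        ("F", 342), ("J", 344), ("E", 345), ("E", 345)]

# GROUP BY color: keep a running sum and count per color
totals = defaultdict(lambda: [0, 0])
for color, price in rows:
    totals[color][0] += price
    totals[color][1] += 1

# avg(price) per group, ORDER BY color
avg_price = {color: s / n for color, (s, n) in sorted(totals.items())}
```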

Convert the table to a chart

Under the table, click the bar chart icon.

Repeat the same operations using the Python DataFrame API

This is a SQL notebook; by default, each cell is passed to the SQL interpreter. To run a cell through the Python interpreter instead, start it with the %python magic command.

The next command creates a DataFrame from a Databricks dataset

%python
diamonds = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")

The next command manipulates the data and displays the results

%python
from pyspark.sql.functions import avg
 
display(diamonds.select("color","price").groupBy("color").agg(avg("price")).sort("color"))
[Bar chart: avg(price) (0–5k) by color, D–J]
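As a rough mental model of how the chained DataFrame calls compose (select, then groupBy/agg, then sort), here is a toy pure-Python imitation. This is an illustration only, not the PySpark API; `ToyFrame` and `group_avg` are made-up names:

```python
from statistics import mean

class ToyFrame:
    """A toy stand-in for a DataFrame: a list of dicts with chainable methods."""
    def __init__(self, rows):
        self.rows = rows

    def select(self, *cols):
        # Keep only the named columns
        return ToyFrame([{c: r[c] for c in cols} for r in self.rows])

    def group_avg(self, key, value):
        # groupBy(key).agg(avg(value)) collapsed into one step
        groups = {}
        for r in self.rows:
            groups.setdefault(r[key], []).append(r[value])
        return ToyFrame([{key: k, "avg": mean(v)} for k, v in groups.items()])

    def sort(self, col):
        return ToyFrame(sorted(self.rows, key=lambda r: r[col]))

rows = [{"color": "E", "price": 326}, {"color": "J", "price": 335},
        {"color": "E", "price": 327}, {"color": "J", "price": 339}]
result = ToyFrame(rows).select("color", "price").group_avg("color", "price").sort("color").rows
```

Each method returns a new frame, which is what makes the fluent chaining in the real DataFrame API possible.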