Connect data and AI tools to the lakehouse
Build new use cases with Databricks-validated data and AI solutions
Becoming a Databricks partner puts you in a unique position to deliver analytics and insights to your customers faster. Take advantage of Databricks resources for developers and partners, along with our open, cloud-based platform, and let's grow our businesses together! Become a partner
“Our integration with Partner Connect builds on our long-standing partnership with Databricks. Partner Connect lets us deliver a better experience to many more users. Whether they already use Fivetran or are trying it for the first time through Partner Connect, users can easily connect hundreds of data sources to the lakehouse, accelerating the extraction of insights from data and the exploration of analytics use cases, and maximizing the value of the data in their lakehouse.”
George Fraser, CEO of Fivetran
Connecting to Fivetran from Databricks simplifies data acquisition and maintenance. Fivetran provides fully managed connectors for more than 180 data sources and also supports change data capture on those sources.
Users can now discover and connect to Fivetran with a few clicks in Partner Connect
Clicking into Fivetran in Partner Connect starts an automated workflow between the two products where:
– Databricks automatically provisions a SQL Endpoint and associated credentials for Fivetran to interact with, with best practices baked into the endpoint configuration.
– Databricks passes the user’s identity and the SQL endpoint configuration to Fivetran automatically via a secure API.
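The provisioning step above can be sketched as a REST call. This is only an illustrative request builder: the endpoint name, sizing, and auto-stop values are hypothetical defaults, and the Databricks SQL API path has changed over time (SQL endpoints were later renamed SQL warehouses), so check the current API reference before relying on it.

```python
import json

# Rough sketch of the kind of SQL endpoint provisioning request Partner Connect
# automates on the user's behalf. All values below are hypothetical defaults.
def endpoint_provision_request(workspace_url, partner_name):
    body = {
        "name": f"{partner_name}_endpoint",  # hypothetical naming convention
        "cluster_size": "2X-Small",          # small default size for ingestion
        "auto_stop_mins": 10,                # stop when idle to save cost
    }
    return ("POST", f"{workspace_url}/api/2.0/sql/endpoints", json.dumps(body))
```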
We are then redirected to Fivetran’s product to either sign up for a Fivetran trial or to log in if we are an existing user. Fivetran automatically sets up a trial account.
Fivetran recognizes that this user came from Databricks Partner Connect and automatically creates a Databricks destination configured to ingest into Delta via the SQL Endpoint that was auto-configured by Partner Connect.
With the Databricks Delta destination already set up, the user now chooses the source they want to ingest from – we will use Salesforce (note that the user is free to choose any of the hundreds of sources that Fivetran supports). The user authenticates to the Salesforce source, chooses the Salesforce objects they want to ingest into Databricks Delta (in this case the Account and Contact objects), and starts the Initial Sync.
By clicking on logs, we can see that Fivetran is using APIs to read data from Salesforce and is then ingesting that data into Databricks Delta via the SQL endpoint that was automatically stood up.
The sync frequency from Salesforce to Databricks Delta can also be configured from Fivetran
If we click on Destination, we can see the details of the SQL endpoint configuration that was automatically created when we came into Fivetran via Databricks Partner Connect. This automation saves the user dozens of manual steps and the copying and pasting of configuration that a manual setup would require. It also protects the user from unintentional configuration errors and the time spent debugging them.
Coming back into the Databricks UI, we can see the SQL Endpoint that was automatically created by Partner Connect for Fivetran.
Now that the Salesforce data is seamlessly flowing in from Fivetran to Databricks Delta via this SQL Endpoint, we can view the ingested Delta tables in the Databricks Data Explorer
We can now query these Salesforce tables via SQL queries and analyze the data as it flows in from Fivetran for downstream BI analytics and blending with other datasets in the Lakehouse
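As a rough illustration of the kind of downstream query this enables, here is a local sketch using Python's built-in sqlite3 as a stand-in for Databricks SQL. The table and column names loosely mirror the demo's Salesforce Account and Contact objects but are simplified; in Databricks you would run the equivalent SQL against the ingested Delta tables.

```python
import sqlite3

# Local stand-in for the ingested Delta tables; schemas and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, account_id INTEGER, email TEXT);
    INSERT INTO account VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contact VALUES
        (10, 1, 'a@acme.com'), (11, 1, 'b@acme.com'), (12, 2, 'c@globex.com');
""")

# Count contacts per account -- the same query shape works in Databricks SQL.
rows = conn.execute("""
    SELECT a.name, COUNT(c.id) AS contacts
    FROM account a LEFT JOIN contact c ON c.account_id = a.id
    GROUP BY a.name ORDER BY a.name
""").fetchall()
print(rows)  # [('Acme', 2), ('Globex', 1)]
```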
Power BI Demo
Use the native connector to start getting insights from all kinds of data — both structured and unstructured — then communicate those insights visually through tables, charts, maps, KPIs, and dashboards.
To start your analysis in Power BI, connect Power BI Desktop to the Databricks SQL endpoint.
Click on Power BI in Databricks Partner Connect to initiate a simplified workflow
Select a SQL Endpoint and download the connection file. Connecting from Power BI Desktop is easy, as the connection file comes preconfigured with the details required to connect to the Databricks cluster.
To get started,
– Generate a Databricks personal access token
– Install Power BI and the Databricks ODBC Driver.
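Token generation normally happens in the Databricks UI, but the same step can be done through the Databricks Token API (POST /api/2.0/token/create). The sketch below only builds the request rather than sending it; the workspace URL, comment, and lifetime are placeholders.

```python
import json

# Request builder for creating a Databricks personal access token.
# Values are placeholders; authentication headers are omitted for brevity.
def token_create_request(workspace_url, comment, lifetime_seconds=86400):
    return (
        "POST",
        f"{workspace_url}/api/2.0/token/create",
        json.dumps({"comment": comment, "lifetime_seconds": lifetime_seconds}),
    )
```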
On opening the connection file,
– Power BI automatically recognizes the Databricks SQL endpoint connection details that were preconfigured in the connection file
– Power BI prompts you for your access credentials.
Start building your analysis in Power BI
– Select the database and table you want to analyze
– Drag and drop the required fields and build your visualization
Tableau and Databricks empower all users with a data Lakehouse for modern analytics.
Clicking on Tableau in Partner Connect starts a simplified workflow for using Tableau Desktop with Databricks
You can select a SQL Endpoint and download a connection file
The connection file comes preconfigured with all the details that you need to connect to the cluster.
To get started with Tableau Desktop from Databricks Partner Connect,
– Generate a Databricks personal access token
– Install Tableau and the Databricks ODBC Driver.
On opening the connection file,
– Tableau Desktop automatically recognizes the SQL endpoint connection details that were preconfigured in the connection file
– Tableau Desktop prompts you for your access credentials.
You can now focus on building your dashboard in Tableau Desktop
– Select the Data Source tab
– Select the database and table
– Create a Sheet,
– Drag and drop the required fields
– Then build the visualizations and dashboards
From ingestion through transformation to delivery, every stage of data processing powered by Delta Lake becomes simpler, promoting data sharing across your entire organization. Pre-built connectors cover more than 150 data sources, with support for change data capture.
Users can now discover and connect to Rivery with a few clicks in Partner Connect
Clicking into Rivery in Partner Connect starts an automated workflow between the two products where:
– Databricks automatically provisions a SQL Endpoint and associated credentials for Rivery to interact with, with best practices baked into the endpoint configuration.
– Databricks passes the user’s identity and the SQL endpoint configuration to Rivery automatically via a secure API.
We are then redirected to Rivery’s product console to either sign up for a Rivery trial or to log in if we are an existing user. Rivery automatically sets up a trial account.
Now we are ready to leverage Rivery’s native data source connectors to load data into Delta Lake.
Rivery recognizes that this user came from Databricks Partner Connect and automatically creates a Databricks destination configured to ingest into Delta via the SQL Endpoint that was auto-configured by Partner Connect
Now, go to Connections, which lists both source and target connections. There is already one target connection: Databricks SQL.
With the Databricks Delta destination already set up, the user now chooses the source they want to ingest from – we will use Salesforce CRM as the source (note that the user is free to choose from the 150+ pre-built data source connectors that Rivery supports). The user authenticates to the Salesforce CRM source and saves the connection once it passes the test. It then shows up in the Connections list.
We click “Create New River” and select “Source to Target” to start the data ingestion.
– Choose Salesforce CRM as our data source. Rivery automatically populates the Salesforce connection that we set up earlier.
– For ingestion configuration, you can choose to load multiple tables simultaneously or only load one table from Salesforce. In this demo, we only select one table which is the “Account” table. Save it.
– On the “Target” tab, for ingestion into the already-configured Databricks Delta destination, a user can enter an existing database name on the Databricks side or create a new database.
We enter our own database name, add a table prefix, and choose “Overwrite” as the default ingestion mode.
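For readers unfamiliar with the ingestion modes, here is a tiny plain-Python illustration of how “Overwrite” differs from an append-style load, using lists as a stand-in for the target Delta table (Rivery's actual mode names and options may differ):

```python
# Stand-in for the target table: a list of (id, name) rows.
def ingest(target_rows, new_rows, mode="overwrite"):
    if mode == "overwrite":
        return list(new_rows)                # replace the table contents entirely
    elif mode == "append":
        return target_rows + list(new_rows)  # keep existing rows, add new ones
    raise ValueError(f"unknown mode: {mode}")

existing = [("acct-1", "Acme")]
incoming = [("acct-2", "Globex")]
print(ingest(existing, incoming, "overwrite"))  # [('acct-2', 'Globex')]
print(ingest(existing, incoming, "append"))     # [('acct-1', 'Acme'), ('acct-2', 'Globex')]
```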
– Save and click the “Run” button to start the ingestion workflow.
Once ingestion has completed, we can come back to the Databricks UI to view the ingested Delta tables in the Databricks SQL Data Explorer
We can see the schema, sample data, and other details of this table. Easy and straightforward.
We can now query these Salesforce tables via SQL queries and analyze the data as it flows in from Rivery for downstream BI analytics and blending with other datasets in the Lakehouse
Use the Labelbox Connector for Databricks to easily prepare unstructured data for AI and Analytics in the Lakehouse. Labelbox supports annotation of images, text, video, sound, and geospatial tiled images.
Click on Labelbox in Databricks Partner Connect
– A cluster will automatically be created so you can easily run a tutorial notebook that we will also provide
– Next, verify the email address for your Labelbox trial
Labelbox deposits a tutorial notebook into your shared directory in your Databricks workspace.
You’ll also get a link to that file right here.
Finish the trial sign up.
Now you’re in Labelbox with a free trial.
Let’s go back into Databricks and check out the tutorial notebook.
If I go into my workspace and click “Shared” I will find the Labelbox Demo folder. In that folder is a single notebook.
This tutorial notebook guides you through a typical workflow: Identify unstructured data in your data lake and pass the URLs to Labelbox for annotation. You’ll be able to annotate your dataset and get the labels back in Databricks for AI and analytics.
The first thing we need to do is to connect to our cluster. There’s the Labelbox cluster that was just created. I’ll run the first line to install the Labelbox SDK and the Labelbox Connector for Databricks.
This next cell requires an API key.
Navigate back to the Labelbox trial, click “Account”, then “API”, and create a demo API key.
Copy that key and navigate back to Databricks and include it in the cell. We recommend using the Databricks Secrets API for this, but for this demo we’re simply pasting in the key.
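A minimal sketch of that recommendation, with a hypothetical secret scope and key name; the environment-variable fallback is only there so the sketch also runs outside a notebook, where dbutils is not defined:

```python
import os

def get_labelbox_api_key():
    # Inside a Databricks notebook, prefer the Secrets API. The scope and key
    # names here ("labelbox" / "api_key") are hypothetical examples.
    try:
        return dbutils.secrets.get(scope="labelbox", key="api_key")  # noqa: F821
    except NameError:
        # dbutils only exists in a Databricks notebook; fall back to an
        # environment variable so this sketch also runs locally.
        return os.environ.get("LABELBOX_API_KEY", "")
```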
For this notebook demo we’re going to seed your Databricks account with a table of sample images, but you can easily use assets from your Cloud Storage like AWS S3, Azure Blob, or Google Cloud Storage.
After I run these cells I’ll have a table with file names and URLs to image assets.
Then we take that table and pass it to Labelbox to create the dataset in Labelbox.
There’s the dataset with all of our demo images.
Before we can label the dataset, we must create a new project with an ontology. The ontology describes what kind of objects and classifications you are interested in annotating.
Once the project is ready, we can go in and label a few items.
Now that we have some annotated data, we can go back into our notebook in Databricks and run the final command to bring these annotations into Databricks for downstream use.
The Label column includes a JSON of all the objects and classifications we placed on that asset.
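To make that concrete, here is a hypothetical example of unpacking one value of the Label column. The exact JSON layout depends on your ontology; this sample assumes a simple objects/classifications shape for illustration only.

```python
import json

# Made-up sample of one Label column value; real labels depend on your ontology.
sample_label = json.dumps({
    "objects": [
        {"title": "car", "bbox": {"top": 10, "left": 20, "height": 50, "width": 80}},
        {"title": "pedestrian", "bbox": {"top": 5, "left": 90, "height": 40, "width": 20}},
    ],
    "classifications": [{"title": "scene", "answer": "street"}],
})

def summarize_label(label_json):
    """Return the object titles and classification answers from one label."""
    label = json.loads(label_json)
    objects = [o["title"] for o in label.get("objects", [])]
    classes = {c["title"]: c["answer"] for c in label.get("classifications", [])}
    return objects, classes

print(summarize_label(sample_label))  # (['car', 'pedestrian'], {'scene': 'street'})
```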
You can store these annotations into Delta Lake and then train your AI models.
This notebook walks you through the basics of the Labelbox Connector for Databricks. Please check out our documentation for more information about advanced capabilities like Model Assisted Labeling, how to use the Catalog to locate and prioritize assets for labeling, and how to use Model Diagnostics to look for areas to improve your model.
Connect from Databricks to Prophecy, a low-code data engineering platform, with a single click. Build and deploy Apache Spark™ and Delta pipelines interactively with a visual, drag-and-drop interface.
– From here, open the partner connect page and choose Prophecy to sign in.
– When you create a Prophecy account, Databricks automatically establishes a secure connection to run your pipelines directly on your workspace.
Because your email credentials are passed along, you only have to choose a new password to sign up for Prophecy.
Now that you have signed into Prophecy, let’s see how easy it is to develop and run your Spark data pipelines.
Let’s choose one of the “get started” example pipelines and open the workflow.
This shows a visual canvas on which we can start building our pipeline.
Let’s start by spinning up a new Databricks cluster.
Now that our cluster has spun up, with just a single click we can go to the Databricks interface and see the cluster in our workspace.
Coming back to the Prophecy UI, let’s explore our pipeline. Here we are reading two data sources, ‘Customers’ and ‘Orders’, and joining them together…
…. and aggregating them by summing up the amounts column.
Later, we are sorting the data and writing it directly into a Delta table
With Prophecy, we can directly run our workflow with just a single click to see the data after each step
We can see our ‘Customer’ data, ‘Orders’ data, data joined together….
…..the aggregated field with the summed amounts…..
.. and finally, our sorted data that is written to our target Delta table
Now, let’s modify our pipeline by cleaning some of the fields
To do that, we can just drag and drop a new ‘Gem’ called ‘Reformat’…..
… connect it within our existing pipeline….
…. and choose the columns. We can add a new column called ‘full name’, concatenate our first and last name, and add a cleaned up amount column that will have the rounded up value.
Let’s also rename this Gem ‘Cleanup’.
With that, we can directly run our workflow and explore the data right after the Cleanup step.
As you see, we have very easily added a Cleanup step to our pipeline.
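In plain Python, the Cleanup step amounts to something like the following sketch (the field names come from the description above; the rows and the exact rounding behavior are illustrative, since Prophecy actually generates Spark code for this Gem):

```python
# Illustrative stand-in for the 'Cleanup' Reformat Gem: build a full_name
# column from first and last name, and add a rounded amount column.
def cleanup(row):
    return {
        **row,
        "full_name": f"{row['first_name']} {row['last_name']}",
        "amount_clean": round(row["amount"]),  # rounding logic is illustrative
    }

rows = [{"first_name": "Ada", "last_name": "Lovelace", "amount": 12.6}]
print([cleanup(r) for r in rows])
```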
But Prophecy is not just a visual editor. Behind the scenes, everything is saved as high-quality Spark code that you can edit.
Additionally, Prophecy lets you follow software engineering best practices by storing the code directly in your Git repository.
Here, we can see our workflow, with the latest changes, as Scala code in Git.