Analyzing Algorand Blockchain Data with Databricks Delta
Algorand is a public, decentralized blockchain system that uses a proof-of-stake consensus protocol. It is fast and energy-efficient, with a transaction commit time under 5 seconds and throughput of one thousand transactions per second. The Algorand system is composed of a network of distributed nodes that work collaboratively to process transactions and add...
Measuring Advertising Effectiveness with Sales Forecasting and Attribution
Click below to download the notebooks for this solution accelerator: Campaign Effectiveness -- ETL; Campaign Effectiveness -- Machine Learning. How do you connect the impact of marketing and your ad spend toward driving sales? As the advertising landscape continues to evolve, advertisers are finding it increasingly challenging to efficiently pinpoint the impact of various revenue-generating...
Automate Azure Databricks Platform Provisioning and Configuration
Table of Contents: Introduction; Automation options; Common workflow; Pre-Requisites; Create Azure Resource Group and Virtual Network; Provision Azure Application / Service Principal; Assign Role to Service Principal; Configure Postman Environment; Provision Azure Databricks Workspace; Generate AAD Access Token; Deploy Workspace using the ARM template; Get workspace URL; Generate Access Token for Auth; Generate AAD Access...
Announcing Databricks Labs Terraform integration on AWS and Azure
We are pleased to announce integration for deploying and managing Databricks environments on Microsoft Azure and Amazon Web Services (AWS) with HashiCorp Terraform. Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. With this release, our customers can manage their entire Databricks workspaces along with the...
Improving Public Health Surveillance During COVID-19 with Data Analytics and AI
As the leader of the State and Local Government business at Databricks, I get to see what governments all over the U.S. are doing to address the Novel Coronavirus and COVID-19 crisis. I am continually inspired by the work of public servants as they go about their business to save lives and address this crisis....
Profit-Driven Retention Management with Machine Learning
Companies with the highest loyalty ratings and retention rates grew revenues 250% faster than their industry peers and delivered two to five times the shareholder returns over a 10-year period. Earning loyalty and getting the largest number of customers to stick around is something that is in the best interest of both a company...
A look at the new Structured Streaming UI in Apache Spark 3.0
This is a guest community post from Genmao Yu, a software engineer at Alibaba. Structured Streaming was initially introduced in Apache Spark 2.0. It has proven to be the best platform for building distributed stream processing applications. The unification of SQL/Dataset/DataFrame APIs and Spark’s built-in functions makes it easy for developers to achieve their complex...
How to Extract Market Drivers at Scale Using Alternative Data
Watch the on-demand webinar "Alternative Data Analytics with Python" for a demonstration of the solution discussed in this blog and/or download the following notebooks to try it yourself. Stock Analysis - Plant-based Meat Historical Data GDELT News Source In Lakehouse Text Analytics on GDELT Alternative Data Time Series Foot Traffic Forecasting Introduction Why Alternative data...
Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies
What is a Databricks cluster policy? A Databricks cluster policy is a template that restricts the way users interact with cluster configuration. Today, any user with cluster creation permissions is able to launch an Apache Spark™ cluster with any configuration. This leads to a few issues: Administrators are forced to choose between control and flexibility....
Introducing Delta Engine
Today, we announced Delta Engine, which ties together a 100% Apache Spark-compatible vectorized query engine, built to take advantage of modern CPU architecture, with optimizations to Spark 3.0’s query optimizer and caching capabilities that were launched as part of Databricks Runtime 7.0. Together, these features significantly accelerate query performance on data lakes, especially those enabled by...