Databricks on Alibaba

Databricks DataInsight

Databricks DataInsight is a fully managed data and analytics platform based on Apache Spark™. It is built on the Databricks Runtime and Delta Lake. Integrated with Alibaba Cloud services, it ensures data security and lets you configure monitoring and alert policies as well as dynamic cluster scaling. It meets the analytics needs of data analysts, data engineers, and data scientists.

Better performance

Databricks Runtime provides a 50x performance improvement over open-source Apache Spark™.

Streaming & Batch integration

Databricks Delta Lake provides ACID transaction capabilities for data lake analytics and processes both batch and streaming datasets.
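To make the ACID idea concrete, here is a minimal sketch of optimistic concurrency over an ordered commit log, the general approach a transactional table format can use so that batch and streaming writers share one table. It is a toy in-memory model, not Delta Lake's implementation or API; `ToyTableLog`, `CommitConflict`, and `write_with_retry` are hypothetical names introduced only for illustration.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer has already claimed the target version."""


class ToyTableLog:
    """Hypothetical in-memory stand-in for a table's ordered commit log."""

    def __init__(self):
        self._commits = []              # commit i holds the rows added in version i
        self._lock = threading.Lock()

    def latest_version(self):
        with self._lock:
            return len(self._commits) - 1   # -1 means the table is empty

    def try_commit(self, read_version, rows):
        """Atomically publish rows as version read_version + 1, or fail."""
        with self._lock:
            if read_version != len(self._commits) - 1:
                raise CommitConflict(f"version {read_version + 1} already taken")
            self._commits.append(list(rows))
            return read_version + 1

    def snapshot(self):
        """Return every committed row; a failed commit leaves nothing behind."""
        with self._lock:
            return [row for commit in self._commits for row in commit]


def write_with_retry(log, rows):
    """Re-read the newest version and retry until the commit succeeds."""
    while True:
        try:
            return log.try_commit(log.latest_version(), rows)
        except CommitConflict:
            continue


log = ToyTableLog()
batch_version = write_with_retry(log, [("batch", i) for i in range(3)])
stream_version = write_with_retry(log, [("stream", 0)])   # one micro-batch
```

A writer that tries to commit against a stale version gets a `CommitConflict` and the table is unchanged, which is the all-or-nothing property readers rely on.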

Collaborative analysis

Databricks DataInsight meets the analytics needs of data scientists, data engineers, and business analysts, and provides an interactive, collaborative Notebook environment.

Real-time Data Insight

Separating compute from storage reduces data redundancy, enables data access for multiple audiences, lowers data storage costs, and provides independent scalability.

A fully managed analytics platform

Quickly start fully managed clusters with simple operations, and pay only for what you use.

Cluster size

Set the number of nodes according to job needs, with support for high-availability clusters.

Instance selection

Supports three ECS instance families: general-purpose, compute-optimized, and memory-optimized.

Interactive collaborative work

Multiple user roles share data and collaborate interactively.

Notebook

A collaborative workspace that provides an interactive job execution mode, supports Apache Spark, PySpark, SparkR, and Spark SQL jobs, and displays analytics results visually.

Unified metadata

Metadata for databases and tables can be shared between clusters without duplication.

Fully compatible with Apache Spark ecosystem

100% compatible with open-source Apache Spark.

Databricks Runtime

A performance-optimized Databricks Runtime based on Apache Spark, with I/O optimized for Alibaba Cloud OSS, providing a faster and more efficient analytics engine.

Databricks Delta Lake

An optimized version of Delta Lake integrated with Alibaba Cloud services.

Enterprise security

Integrates with Alibaba Cloud RAM (Resource Access Management) to control permissions by user and role, ensuring data security.

Big Data Analysis Engine That Unifies Batch and Stream Processing

Deeply integrated with Alibaba Cloud services and features, such as the data governance and data lineage of DataWorks and Machine Learning Platform for AI (PAI), to provide a more comprehensive data solution.

Databricks DataInsight typical architecture

Integrates deeply with Alibaba Cloud products to build a real-time and offline data warehouse.

Key Roles

  • Data collection
    Receive real-time streaming data and batch data on external cloud storage.
  • Data ETL
    Continuously and efficiently process incremental data, support data rollback and deletion, and provide ACID transactional guarantee.
  • BI data analysis
    Support ad hoc queries and integrate seamlessly with a variety of BI analysis tools.
  • AI data exploration
    Provide a complete machine learning platform.
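The Data ETL role above (incremental processing, deletion, and rollback) can be pictured with a small sketch of a versioned table: every change is appended to a log as a commit, and any earlier version can be rebuilt by replaying the log. This is a toy model under stated assumptions, not the DataInsight or Delta Lake API; `ToyVersionedTable` and its methods are hypothetical names chosen for illustration.

```python
class ToyVersionedTable:
    """Hypothetical model: each commit is a list of ("add" | "remove", row) actions."""

    def __init__(self):
        self._log = []

    def commit(self, actions):
        """Append one commit and return its version number."""
        self._log.append(list(actions))
        return len(self._log) - 1

    def append(self, rows):
        """Incremental load: add new rows without rewriting existing ones."""
        return self.commit([("add", r) for r in rows])

    def delete_where(self, predicate):
        """Deletion: record a remove action for every matching row."""
        doomed = [r for r in self.read() if predicate(r)]
        return self.commit([("remove", r) for r in doomed])

    def read(self, version=None):
        """Replay the log up to `version` (default: latest) to rebuild the rows."""
        if version is None:
            version = len(self._log) - 1
        rows = []
        for actions in self._log[: version + 1]:
            for kind, row in actions:
                if kind == "add":
                    rows.append(row)
                else:
                    rows.remove(row)
        return rows

    def restore(self, version):
        """Rollback: commit the changes that make the table match an old version."""
        target, current = self.read(version), self.read()
        return self.commit(
            [("remove", r) for r in current] + [("add", r) for r in target]
        )


table = ToyVersionedTable()
v0 = table.append([{"id": 1, "city": "Hangzhou"}, {"id": 2, "city": "Beijing"}])
v1 = table.append([{"id": 3, "city": "Shanghai"}])   # incremental batch
v2 = table.delete_where(lambda r: r["id"] == 2)      # deletion
v3 = table.restore(v1)                               # rollback to version 1
```

Because the rollback is itself a new commit, the history stays append-only and the deleted state at `v2` remains readable with `table.read(v2)`.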