EU - Databricks






Data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. Apache Spark has been the only unified analytics engine that combines large-scale data processing with the execution of state-of-the-art machine learning and AI algorithms.

At this conference, we want to cover great data engineering and data science content along with best practices for productionizing AI: keeping training data fresh with stream processing, monitoring quality, testing, and serving models at massive scale. We will also have deep dive sessions on popular software frameworks, e.g., TensorFlow, scikit-learn, Keras, PyTorch, DeepLearning4J, BigDL, and Deep Learning Pipelines.

Do you have big ideas to share on these themes and topics with the community members embarking on the same Spark + AI journey? Do you have a new developer tool or an AI application to showcase? If so, we’d love to put your ideas, case studies, best practices, and technical knowledge in front of the largest gathering of Spark, AI, and big data professionals.

The CFP will be open from April 5th to May 6th.

Talk Tracks:


In this developer-focused theme, presenters cover technical content across a wide range of topics, from Apache Spark engine internals to machine learning and streaming.

You will be able to categorize your talk into one of three sections:

  • Structured Streaming and Continuous Applications
  • Core Spark
  • ETL

Please make sure to categorize your talk if you would like to be included in a specific section.

AI Use Cases and New Opportunities

The tremendous rise and resurgence of AI is due to advances in technology: in computer hardware, with dedicated use of GPUs, ASICs, and FPGAs; in processing big data at scale and speed with distributed, unified engines like Spark; in better predictions from machine learning algorithms; and in gathering data from the proliferation of devices.

All this has led data scientists and engineers to write sophisticated applications that automate enterprise processes. If you have an AI use case or case study, or have solved a specific problem in automating a process, device, or automaton; recognizing and analyzing speech, video, images, or text; improving conversational interfaces such as chatbots and intelligent personal assistants; or playing intelligent games, this thematic category is for you, whether you used neural networks, natural language processing, or a rule-based engine.

Share your journey of automation with the Spark and AI community and show what’s possible in this pervasive field that is helping modern businesses innovate.

You will be able to categorize your talk into different use case scenarios including:

    • Automation, self-driving automatons or vehicles
    • Speech, image, or video recognition
    • Intelligent personal assistant devices or chatbots
    • Learning and playing intelligent games
    • Recommendation engines
    • Other

Please make sure to categorize your talk if you would like to be included in a specific section.

Deep Learning Techniques

As a class of machine learning algorithms, deep learning has fueled the development of AI and predictive analytics applications that learn from data and transferred knowledge, rather than merely accomplishing one specific task well, such as playing a complicated game like Go or chess.

Myriad open source and proprietary frameworks (TensorFlow, Keras, CNTK, Caffe, Torch, PyTorch, DeepLearning4J, MXNet, BigDL, TensorFlowOnSpark, Deep Learning Pipelines, and others) have flourished. By using existing models or building their own neural networks, data scientists and engineers have developed real-world applications employing these frameworks.

If you have implemented a real-world application—in speech recognition, image and video processing, natural language processing, recommendation engines, ad tech or mobile advertising—using any of these frameworks and the techniques they offer, this category is for you.

Share your technical details and implementation with the Spark + AI community and tell us about the gains, pain points, and merits of your solutions.

You will be able to categorize your talk by the DL framework used:

    • TensorFlow
    • Keras
    • CNTK
    • Caffe
    • Torch
    • PyTorch
    • DeepLearning4J
    • MXNet
    • BigDL
    • Deep Learning Pipelines
    • TensorFlowOnSpark
    • Other

Productionizing Machine Learning

How do you build and deploy machine learning models to a production environment? How do you embed what you’ve learned into customer-facing data applications? How do you update your model with new features? And what are the best practices and agile data architectures that data scientists and engineers employ to productionize machine learning models?

Whether your model is a deep learning model or strictly a Spark machine learning model, how do you score your trained model with real-time data?

If you have answers to these questions and have addressed them in your design, implementation, and deployment schemes in production, then we want to hear from you.

Share your technical details and your model implementation and deployment with the Spark + AI community, and tell us about the gains, pain points, and merits of your solutions.

Deep Dives

As the name suggests, this is a 60-minute slot that allows a presenter to go deeper into a topic than the normal 30-minute sessions allow. The session should be highly technical and include some demonstration. Examples include “Cost-Based Optimizer in Apache Spark 2.2”, “Deep Dive into Deep Learning Pipelines”, and “Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark”.

This thematic category is not restricted to Spark, though. It can cover deep learning practices and techniques as well.


This theme is dedicated to academic and advanced industrial research. We want talks spanning systems research that involves and extends Spark + AI in use cases such as genomics, GPUs, I/O storage devices, MPP, self-operating automatons, and image scanning and disease detection in cancer.

Data Science

While data science is a broad theme that overlaps with deep learning, machine learning, and AI, this thematic category spotlights the practice of data science using Spark, including SparkR. Sessions cover innovative techniques, algorithms, and systems that refine raw data into actionable insight using visualization, statistics, feature engineering, and machine learning algorithms, both supervised and unsupervised.


This theme features use cases on how businesses deploy Spark and the lessons learned. Talks offer an exploration into business use cases across industries, ROI, best practices, relevant business metrics, compliance requirements for specific industries, and customer testimonials.

Apache Spark Experience and Use Cases

For this thematic category, we seek organizations’ experiences using Spark, developing on Spark, and running Spark in production. We are also interested in hearing about experiences migrating from Spark 1.x to Spark 2.x, moving data workloads from on-premises to the cloud, and migrating large workloads from other big data processing engines to Spark.

Apache Spark Ecosystem

Spark has an expanding ecosystem. Every day we hear of a new data store, streaming engine, or in-memory data store integrated with Spark. In this topic, we want to feature open source and proprietary applications, libraries, or frameworks in the Spark ecosystem: how they interact and connect with Spark and how they use Data Source APIs to implement connectors.

Python and Advanced Analytics

This theme is dedicated to talks on using Python with data at scale, not only for writing data science and machine learning applications but also for writing ETL Spark applications. If you have a use case implemented in PySpark and you wish to share it with the Python user community, this thematic category is for you. If you have developed libraries or Spark packages in Python, share them with the Spark + AI community.