Call for Presentations Spark + AI Summit 2019

In June of 2018, we expanded the summit’s scope and renamed it Spark + AI Summit. Our expanded focus incorporates unified aspects of data and AI, allowing practitioners to share their innovative applications that require big data, Apache Spark, and state-of-the-art AI technology: from autonomous vehicles to voice and image recognition; from intelligent chatbots and new deep learning frameworks to machine learning algorithms.

Do you have novel ideas to share on the suggested themes with community members embarking on the Spark + AI journey? Do you have a compelling Spark-related tool, new deep learning framework or technique, or an AI application to showcase?

If so, we’d love to put your creativity, novelty, case studies, use cases, research, best practices, or technical knowledge in front of the largest gathering of Spark, AI, and big data professionals and innovators.

Suggested Themes and Topics

These topics are just guidelines and suggestions; we are open to your creativity, so surprise us. The CFP will be open from September 17 to October 21.


In this developer-focused theme, presenters cover technical content across a wide range of topics, from Spark engine internals to API extensions and machine learning algorithms.

You will be able to categorize your talk into different sections including:

  • Core Spark Internals
  • Spark SQL Engine and Cost Based Optimizer Extensions
  • Extensions to Spark RDDs, DataFrames, Datasets, and MLlib APIs
  • ETL

Please make sure to categorize your talk if you would like to be included in a specific section.

AI Use Cases and New Opportunities

The tremendous rise and resurgence of AI stems from advances in technology: in computer hardware with dedicated use of GPUs, ASICs, and FPGAs; in the ability to process big data at scale with unified engines like Spark; in efficient machine learning algorithms; and in the massive data gathered from the proliferation of devices.

All this has led data scientists and engineers to write sophisticated applications automating problems in the enterprise. If you have an AI use case or case study, or have solved a specific problem in automating a process or an automaton; in recognizing and analysing speech, video, image, or text; in improving conversational interfaces like chatbots and intelligent personal assistants; or in playing intelligent games (whether you used neural networks, natural language processing, or a rule-based engine), this thematic category is for you.

Share your journey of automation with the Spark and AI community and show what’s possible in this pervasive field that is helping modern businesses innovate.

You will be able to categorize your talk into different use case scenarios including:

  • Automation, self-driving automatons or vehicles
  • Speech, image, or video recognition
  • Intelligent personal assistant devices or chatbots
  • Learning and playing intelligent games
  • Recommendation engines
  • Other

Please make sure to categorize your talk if you would like to be included in a specific section.

Deep Learning Techniques

As a class of machine learning techniques, deep learning has fueled development of AI and predictive analytic applications that learn from big data and transferred knowledge.

Myriad open source and proprietary frameworks (TensorFlow, Keras, CNTK, Caffe2, PyTorch, DeepLearning4J, MXNet, BigDL, TensorFlowOnSpark, Deep Learning Pipelines, and others) have flourished. By using existing models or building their own neural networks, data scientists and engineers have developed real-world applications employing these frameworks.

If you have implemented a real-world application—in speech recognition, image and video processing, natural language processing, or recommendation engines—using any of these frameworks and the techniques they offer, this category is for you.

Share your technical details and implementation with the Spark + AI community and tell us about the gains, pain points, and merits of your solutions.

You will be able to categorize your talk by the DL framework used:

  • TensorFlow
  • Keras
  • CNTK
  • Caffe2
  • Theano
  • PyTorch
  • DeepLearning4J
  • MXNet
  • BigDL
  • Deep Learning Pipelines
  • TensorFlowOnSpark
  • Other


Productionizing Machine Learning

How do you build and deploy machine learning models to a production environment? How do you track or reproduce experiments? How do you update your model with new features? And what are the best practices and agile data architectures that data scientists and engineers employ to productionize machine learning models?

Whether your model is a deep learning model or strictly a machine learning model, how do you serve and score your trained model with real-time data?

If you have answers to these questions, or have addressed them in your design, implementation, and deployment schemes in production, then we want to hear from you.

Share your technical details and model implementation and deployment with the Spark + AI community, and tell us about the gains, pain points, and merits of your solutions.

Deep Dives

As the name suggests, this theme offers a 60-minute slot that allows a presenter to go deeper into a topic than the normal 30-minute sessions allow. The session should be highly technical and include some demonstration. Examples include “Cost-Based Optimizer in Apache Spark 2.”, “Deep Dive into Deep Learning Pipelines”, and “Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark”.

This thematic category is not restricted to Spark, though. It can cover deep learning practices and techniques, too.

Research and/or Hardware in the Cloud

This theme is dedicated to academic and advanced industrial research. We want talks spanning systems research involving and extending Spark + AI in use cases (e.g., genomics, GPUs, I/O storage devices, MPP, self-operating automatons, and image scanning and disease detection in cancer).

As cloud computing usage grows, so does the demand for larger and faster storage for scalable distributed computing. Hardware vendors have met these demands with new hardware.

We want you to share your research or showcase use cases and examples of using new hardware in the cloud at scale.

Data Science

While data science is a broad theme that overlaps with statistics, deep learning, machine learning, and AI, this thematic category spotlights the practice of data science using Spark, including SparkR, in enterprises and industries. Sessions cover innovative techniques, algorithms, statistical models, and systems that refine raw data into actionable insights using visualization, statistics, feature engineering, and machine learning algorithms, both supervised and unsupervised.


Dedicated to how businesses deploy Spark and the lessons learned, talks in this session can offer an exploration into business use cases across industries, ROI, best practices, relevant business metrics, compliance requirements for specific industries, and customer testimonials.

Apache Spark Use Cases and Ecosystem

For this thematic category, we seek organizations’ use cases for using Spark, developing on Spark, and running Spark in production. We are also interested in hearing about experiences migrating to Apache Spark, moving data workloads from on-premises to the cloud, or migrating workloads from other big data processing engines to Spark.

Because Spark has an expanding ecosystem, we regularly hear of new data stores or in-memory data stores integrated with Spark.

We want to feature open source and proprietary applications, libraries, data stores, or frameworks in the Spark ecosystem: how they interact and connect with Spark, how they use Data Source APIs to implement connectors, and how they use Databricks Delta to accomplish fast and reliable analytics or architect end-to-end ETL workflows.

So if you have Spark use cases in your industry or applications that extend Spark’s ecosystem, please share them with us.

Python and Advanced Analytics

Since the release of Apache Spark 2.2, Spark users can easily install PySpark via PyPI and use it to develop advanced analytics applications. Dedicated to the use of Python on data at scale, not only for data science or machine learning applications but also for writing Spark ETL applications, this track is for Python lovers.

If you have a use case, library, or Spark package that you have developed in Python and wish to share with the community, then submit your talk here.

Structured Streaming and Continuous Applications

Structured Streaming is widely used as a fault-tolerant distributed streaming engine to write end-to-end Continuous Applications. It is an integral part of real-time data ingestion, ETL, and SQL analytics for both batch and real-time data. Developers employ Structured Streaming to build continuous applications that span from IoT analytics to real-time fraud, anomaly, or threat detection, and from integrating ML model scoring with real-time data ingestion to analysing and monitoring streaming data from devices, turbines, oil rigs, factory floors, atomic accelerators, and weather stations.

If you have built a continuous application using Apache Spark’s Structured Streaming APIs or Databricks Delta, share your knowledge with the community and submit your abstract for this theme.


Required information

You’ll need to include the following information for your proposal:

  • Proposed title
  • Presentation overview and extended description
  • Suggested themes and topics from the thematic categories above
  • Speaker(s): Biography and headshot
  • A video, or a YouTube link to a video, of the speaker
  • Level of difficulty of your talk (beginner, intermediate, advanced)

Tips for submitting a successful proposal

Help us understand why your presentation is the right one for Spark + AI Summit. Please keep in mind that this event is by and for professionals. All presentations and supporting materials must be respectful and inclusive.

  • Be authentic. Your peers need original ideas in real-world scenarios, relevant examples, and knowledge transfer.
  • Give your proposal a simple and straightforward title.
  • Include as much detail about the presentation as possible.
  • Keep proposals free of marketing and sales.
  • If you are not the speaker, provide the contact information of the person you’re suggesting. We tend to ignore proposals submitted by PR agencies, and we need to be able to reach the suggested participant directly. Improve the proposal’s chances of being accepted by working closely with the presenter(s) to write a jargon-free proposal that offers clear value for attendees.
  • Keep the audience in mind: they are professional and already pretty smart.
  • Limit the scope: in 30 minutes, you won’t be able to cover ‘everything about framework X’. Instead, pick a useful aspect, a particular technique, or walk through a simple program.
  • Explain why people will want to attend and what they’ll take away from it.
  • Don’t assume that your company’s name buys you credibility. If you’re talking about something important that you have specific knowledge of because of what your company does, spell that out in the description.
  • Does your presentation have the participation of a woman, person of color, or member of another group often underrepresented at a tech conference? Diversity is one of the factors we seriously consider when reviewing proposals as we seek to broaden our speaker roster.