May 24-28 | Virtual
Call for presentations
now closed


Call for Presentations Data + AI Summit 2021

Data + AI Summit will take place virtually from May 24-28. While the
program will be optimized for North American time zones, attendees
across the globe can still participate because of its digital format.
The call for presentations is now closed.

Data teams grapple with and solve the world’s toughest data problems by
using advanced data analytics, building data pipelines, and developing
AI applications and machine learning models. All of these require a
massive amount of processing power and sophisticated systems. We are
looking for technical content from practitioners who have solved these
tough data problems using Apache Spark™, Delta Lake and the lakehouse
pattern, MLflow, BI and SQL Analytics, and deep learning and machine
learning frameworks.

Themes and topics

Data scientists, data engineers, analysts, developers, researchers and ML
practitioners all attend Summit to learn from the world’s leading experts
on topics such as:

AI Use Cases and New Opportunities

If you have an AI use case or case study, or if you have solved a
specific problem in automating a process, device or automaton;
recognizing and analyzing speech, video, image or text; improving
conversational interfaces like chatbots and intelligent personal
assistants; or playing intelligent games, this topical category is for
you, whether you used neural networks, reinforcement learning, natural
language processing or a rule-based engine. Share
your journey of automation with the community and tell us what’s
possible in this pervasive field helping innovate modern businesses.
You will be able to categorize your talk into different use case
scenarios including:

  • Automation, Self-Driving Automatons or Vehicles
  • Speech, Image or Video Recognition
  • Natural Language Processing (NLP) or Sentiment Analysis
  • Model Interpretability and Explainability
  • Intelligent Personal Assistant Devices or Chatbots
  • Learning and Playing Intelligent Games
  • Using AI Techniques in Health and Life Sciences, Financial
    Services or Retail
  • Recommendation Engines

Please make sure to categorize your talk if you would like it
included in a specific subtopic or category.

Apache Spark Internals and Best Practices

In this developer-focused and practitioner-oriented topic, presenters
cover technical content, best practices, and use cases across a wide
range of subtopics. Ranging from Apache Spark 3.x engine internals,
Spark performance and optimizations, extending or using Spark APIs,
Spark SQL, managing MLlib models with MLflow to constructing data
pipelines with Structured Streaming and Delta Lake, share your
innovations in areas including:

  • Structured Streaming
  • Core Spark Internals: Adaptive Query Execution, Cost-Based
    Optimizer, Code Generation, Dynamic Partition Pruning, etc.
  • Building ETL Pipelines
  • Extending Spark SQL or Spark APIs
  • Using or Adding New Spark DataSources or Data Connectors
  • Building Data Pipelines With Delta Lake

Please make sure to categorize your talk if you would like it
included in a specific subtopic or category.

Data Engineering and Data Architecture

For this thematic category, we seek speakers’ experiences in building
complex and robust data and streaming infrastructure that enables data
teams to do data analytics using Apache Spark and/or Delta Lake. In
particular, we want to hear how your data teams grappled with data
quality issues, capturing changing data, and complexities in building
end-to-end data pipelines: from ingestion to ETL to cleaning data for
consumption downstream for machine learning models or other BI and SQL
applications. If you have answers to how you architected, implemented,
monitored and deployed these data pipelines or how you combined
streaming data with historical data for visualization using SQL and BI
tools from myriad sources or used Delta Lake for your data pipelines,
then we want your stories. Some examples of tracks in this theme:

SQL Analytics and BI Using Data Warehouses and Data Lakes

The BI and analytics landscape has been evolving rapidly. Not only is
data getting bigger, but data sources are becoming more varied, and
the need for real-time reporting is growing. These trends have led
data analysts to collaborate with data engineers to do reporting and
analytics directly on the data lake rather than waiting for data to be
moved to other systems. This has been particularly true with SQL-based
analytics where technical teams have been able to use self-serve
reporting solutions. If you’ve been focused on streamlining reporting
and analytics to help your organization move faster, the data
community can benefit from your experience and expertise.

Data Science at Scale

Among data scientists, Python and its PyData ecosystem are popular
tools. While data science is a broad theme and overlaps with advanced
analytics, deep learning, machine learning and AI, this thematic
category spotlights the practice of data science using PySpark, the
Python ecosystem and SparkR. Sessions can cover innovative statistical
techniques to analyze data, algorithms and systems that refine raw
data for exploratory data analysis (EDA) to garner actionable insight
with data visualization, feature engineering, statistical computing
modeling and machine learning algorithms (reinforcement, supervised
and unsupervised learning).

Machine Learning and Deep Learning Applications

Machine learning is being embedded in applications across all domains
and industry sectors. If you have implemented a real-world application
(in speech recognition, computer vision, natural language processing,
recommendation engines, forecasting and anomaly detection) in areas
including media and advertising, healthcare, financial services and
more, using any of the frameworks listed below and the techniques they
offer, this category is for you.

  • PyTorch or Fastai
  • TensorFlow/Keras
  • MXNet
  • XGBoost
  • scikit-learn
  • Other machine learning and deep learning tools

Share your technical details and implementation with the community and
tell us your gains, pain points and merits of your solutions.

Productionizing Machine Learning With MLOps Best Practices

How do you build and deploy machine learning models to a production
environment? How do you manage an entire machine learning lifecycle?
How do you update your model with new features or use feature stores?
And what are the best practices and agile data architectures that data
scientists and ML engineers employ to productionize machine learning
models? Whether your model is a deep learning model or a classical
machine learning model, how do you track experiment outcomes, train
and score your trained model with real-time data? More importantly,
how did you design and implement your MLOps processes for your ML
models, from development to production? If you have answers to these
questions, if you have used open source tools such as MLflow, if you
have addressed these challenges in your design, implementation and
deployment schemes in production, then we want to hear from you. Share
your technical details on model implementation and deployment and best
practices with the community, and tell us your gains, pain points and
merits of your solutions.

Research on Large-Scale Data Analytics and ML

Dedicated to academic and advanced industrial research, we want talks
on large-scale data analytics and machine learning systems, the
hardware that powers them (GPUs, I/O storage devices, etc.) as well as
applications of such systems for use cases like genomics, astronomy,
image scanning, disease detection, etc.

Industry and Business Use Cases

Data analytics, machine learning and AI are having a profound impact
on how organizations across industries are solving their toughest data
challenges. In this track, we’ll explore how open-source technologies,
data analytics and AI are being applied to solve business challenges
in the hottest industries including topics like:

  • Consumer Personalization
  • Customer Lifetime Value
  • Cyber Threat Prevention
  • Demand Forecasting or Simulations
  • Environmental, Social and Corporate Governance
  • Financial Risk Management
  • Fraud Prevention
  • Personalized Healthcare
  • Predictive Maintenance, and more.

Open Source Data and ML Tools

Over the past decade, open source tools have been a key enabler in
solving difficult data problems in the data community and enterprise.
With the growing maturity of these tools, support and adoption in the
community and enterprise, data teams have employed them to tackle
their data challenges: in building and scaling data pipelines,
ingesting data, ensuring data quality or orchestrating tasks within a
data pipeline, using them in data science or ML, or implementing their
usage in best practices in data engineering. If you have used open
source tools such as dbt, Presto, Apache Kafka, Apache Pulsar, Ray,
Dask, Airflow, Kubernetes, Apache Iceberg, Apache Druid or Amundsen
to tackle tough data problems, we want you to share your production
use cases with the community.

Databricks Production Use Cases

If you are a Databricks user or customer using the following
technologies in your data stack across AWS and Azure and in
production, we want to hear what, why and how. Share your insights
with the larger Databricks technical community of practitioners:

  • Use of Delta Engine/Photon
  • Use of Workspace 2.0 for collaboration
  • Use of SQL Analytics for real-time dashboards, reports and
  • Use of MLflow and MLOps capabilities in production
  • Adoption or migration of workloads to lakehouse
  • Put your favorite Databricks product here and tell us how your
    data teams use it

Do you have big ideas, innovative and impactful stories, production
use cases or case studies to share on these topics, including tips and
tricks, how-tos and best practices? Have you implemented data
architectures using the lakehouse paradigm? Or have you built the
latest and greatest features in popular open source technologies? Have
you constructed complex data pipelines using Apache Spark, Structured
Streaming and Delta Lake for doing advanced data analytics? Or have
you productionized machine learning models, built with popular ML
frameworks, at scale? If so, our virtual global Summit community of
over 70,000 would love to hear from you. So pen down your proposal for
a lightning 15-minute talk, 30-minute session or 60-minute technical
deep dive on how-to-and-why. We’d love to put your ideas, case studies
or production use cases, best practices, and technical knowledge in
front of the largest gathering of data, AI and Spark professionals.
Our virtual global Summit community would love to hear from you!

Required information

You’ll need to include the following information for your proposal:

  • Proposed title
  • Presentation overview and extended description
  • Suggested themes and topics from the above thematic categories
  • Speaker(s): Biography and headshot
  • A video or a YouTube link of you speaking. If you don’t have a
    previous talk, please record yourself explaining your suggested
    talk. This is required to complete your submission.
  • Level of difficulty of your talk: beginner (just getting started),
    intermediate (familiar with concepts and implementations),
    advanced (expert)

Tips for submitting a successful proposal

Help us understand why your presentation is the right one for Summit.
Please keep in mind that this event is by and for professionals. All
presentations and supporting materials must be respectful and
inclusive. Here is some advice on how to write a good conference
proposal:

  • Be authentic. Your peers need original ideas in real-world
    scenarios, relevant examples and knowledge transfer.
  • Give your proposal a simple and straightforward title.
  • Include as much detail about the presentation as possible.
  • Keep proposals free of product, marketing or sales pitches.
  • If you are not the speaker, provide the contact information of the
    person you’re suggesting. We tend to ignore proposals submitted by
    PR agencies and require that we can reach the suggested
    participant directly. Improve the proposal’s chances of being
    accepted by working closely with the presenter(s) to write a
    jargon-free proposal that contains a clear value for attendees.
  • Keep the audience in mind: They are professional and already
    pretty smart.
  • Limit the scope: In 30 minutes, you won’t be able to cover
    “everything about framework X.” Instead, pick a useful aspect, a
    particular technique, or walk through a simple program.
  • Your talk must be technical and show code snippets or some
    demonstration of working code.
  • Explain why people will want to attend and what they’ll take away
    from it.
  • Don’t assume that your company’s name buys you credibility. If
    you’re talking about something important that you have specific
    knowledge of because of what your company does, spell that out in
    the description.
  • Does your presentation have the participation of a woman, person
    of color, or member of another group often underrepresented at a
    tech conference? Diversity is one of the factors we seriously
    consider when reviewing proposals as we seek to broaden our
    speaker roster.