Portable Scalable Data Visualization Techniques for Apache Spark and Python Notebook-based Analytics


Python notebooks are great for communicating data analysis and research, but how do you port these data visualizations between the many available platforms (Jupyter, Databricks, Zeppelin, Colab, …)? You will also learn how to scale up your visualizations using Spark. This talk will address:

  • 6-8 strategies for rendering Matplotlib figures that generalize well
  • A review of the landscape of Python visualization packages, calling out gotchas
  • Headless rendering and how to scale from one visualization to 10,000
  • How to create a cool animation
  • How to connect your big data to these visualizations via Spark

Data visualization is the only way most analytics consumers understand data science and big data. Visualizing big data is challenging, and getting it to work across multiple open platforms is harder still. The difficulty doubles when rendering the 100,000 visualizations needed for ML Operations automation and data-driven animations. Our presentation covers popular visualization packages (the Python-based Matplotlib, D3.js-based packages, Bokeh, and high-density visualization packages) and the best ways to integrate them with massive data sets managed by Spark. We will demonstrate common strategies (image, SVG, HTML embed) and the gotchas that come with integrating Spark, Jupyter, and non-Jupyter environments. Headless data visualization strategies are used to automate Machine Learning Operations and data-driven animations. A Python notebook will be the center of this demo. The strategies presented are accessible to anyone with passing experience of Python-based data visualization packages.
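As a rough illustration of the image, SVG, and HTML embed strategies mentioned above, the sketch below renders a Matplotlib figure headlessly and produces all three forms. It is not taken from the talk itself: the helper names `render_figure` and `png_to_html` and the sample data are hypothetical, and the choice of the non-interactive `Agg` backend is an assumption about how one might render without a display.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no display or GUI toolkit required
import matplotlib.pyplot as plt


def render_figure(values, fmt="png"):
    """Hypothetical helper: draw a simple line chart and return it as bytes."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(values)
    ax.set_title("example")
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)  # fmt can be "png" or "svg", among others
    plt.close(fig)  # free the figure; important when rendering thousands in a loop
    return buf.getvalue()


def png_to_html(png_bytes):
    """Hypothetical helper: wrap PNG bytes in an <img> tag via a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'


sample = [1, 4, 9, 16]  # made-up data for illustration only

png_bytes = render_figure(sample, fmt="png")                # strategy 1: image
svg_text = render_figure(sample, fmt="svg").decode("utf-8")  # strategy 2: SVG
html_snippet = png_to_html(png_bytes)                        # strategy 3: HTML embed
```

The resulting HTML string is self-contained, so it can typically be handed to whatever HTML display hook a given notebook platform offers (for example `IPython.display.HTML` in Jupyter or `displayHTML` in Databricks), which is one reason this form tends to port well between platforms.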

About Douglas Moore


I'm passionate about helping customers find value in data analytics and helping the people I work with succeed. I am a 25+ year data veteran, with experience ranging from embedded systems to massive cloud-based data lakes. My early career interest centered on producing 3D animations of finite element models of elastic waves. Career-wise, I came for the data visualizations and stayed for the computation and data. Past roles have included Solutions Architect, Data Architect, CTO, and Engineer. Current specialties: Big Data Strategy & Architecture, Data Lakes, Streaming, Delta Lake, Spark, and Databricks.