We are proud to announce Apache Spark Essentials, the first in a series of free technical workshops tailored for the public sector. Data scientists, engineers, and analysts who attend Session 1: Spark Essentials will be introduced to the Spark platform and ecosystem. The session provides a hands-on introduction to using Spark’s processing engines and higher-level libraries effectively to tackle a unified use case.

Space is limited, so sign up today for the Apache Spark for Public Sector workshop:

  • Date: Friday, January 29th
  • Time: 9:00 a.m. to 4:00 p.m.
  • Location: CACI Inc, 14360 Newbrook Dr, Chantilly, VA

Subsequent sessions will follow, covering in-depth exercises on machine learning, real-time stream analysis, and graph processing.

Agenda

  • Learn how to combine different Spark engines for sophisticated analysis:
    • DataFrames + Spark SQL
    • RDDs
    • Spark Streaming
    • MLlib (Machine Learning)
    • GraphX
  • ETL to/from various data sources: S3, HDFS, Parquet, Hive, Cassandra, MySQL, MongoDB, Neo4j, etc.
  • Launch and monitor Spark clusters in the Amazon Cloud
  • Understand Spark Architecture fundamentals
  • Use the Spark UI to analyze the performance of a job
  • Use various visualization tools (Databricks native, matplotlib, Google Charts, D3.js, etc.) to surface insights

For more information, check out Spark for the Public Sector: A Workshop Series from Databricks.