We are proud to announce Apache Spark Essentials - the first in a series of free technical workshops tailored for the public sector. Data scientists, engineers, and analysts who attend Session 1: Spark Essentials will be introduced to the Spark platform and ecosystem. The session provides a hands-on introduction to using Spark’s various processing engines and higher-level libraries effectively to tackle a unified use case.
Space is limited, so sign up today at Apache Spark for Public Sector workshop:
- Date: Friday, January 29th
- Time: 9:00 a.m. to 4:00 p.m.
- Location: CACI Inc, 14360 Newbrook Dr, Chantilly, VA
Subsequent sessions will follow, covering in-depth exercises on machine learning, real-time stream analysis, and graph processing.
Agenda
- Learn how to combine different Spark engines for sophisticated analysis:
  - DataFrames + Spark SQL
  - RDDs
  - Spark Streaming
  - MLlib (Machine Learning)
  - GraphX
- Perform ETL to and from various data sources: S3, HDFS, Parquet, Hive, Cassandra, MySQL, MongoDB, Neo4j, etc.
- Launch and monitor Spark clusters in the Amazon Cloud
- Understand Spark Architecture fundamentals
- Use the Spark UI to analyze the performance of a job
- Use various visualization tools (Databricks native, matplotlib, Google Charts, D3.js, etc.) to surface insights
For more information, check out Spark for the Public Sector: A Workshop Series from Databricks.