I am a software engineer on the open source team at Databricks, Inc. Since 2017, I have contributed to Apache Spark, in particular to the JSON/CSV datasources, date-time APIs, and datasource v2 commands. At Databricks, I have worked in several departments, from L1/L2 support to engineering. Before Databricks, I developed applications on top of Apache Spark at Huawei, Fyber GmbH and Cisco Systems.
May 28, 2021 10:30 AM PT
An overview of intervals in Apache Spark before version 3.2 and the changes coming in future releases, including conformance to the ANSI SQL standard. We will discuss existing issues with the interval APIs and how they will be solved by two new types: year-month interval and day-time interval. I will show how to use the interval API of Spark SQL and PySpark, and how to avoid potential problems. I will also demonstrate constructing intervals from external types, and saving/loading them via Spark's built-in datasources.
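To give a feel for why splitting intervals into two types helps, here is a minimal sketch in plain Python, not Spark code. It assumes a year-month interval can be modeled as a single count of months and a day-time interval as a fixed duration. The `YearMonthInterval` class is a hypothetical name introduced only for this illustration. `datetime.timedelta` stands in for the day-time interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True, order=True)
class YearMonthInterval:
    """Hypothetical model: a year-month interval as one count of months."""
    months: int

    @classmethod
    def of(cls, years: int = 0, months: int = 0) -> "YearMonthInterval":
        # Normalize years and months into a single comparable number.
        return cls(years * 12 + months)


# A day-time interval is a fixed duration, so timedelta models it directly.
one_day = timedelta(days=1)
day_in_hours = timedelta(hours=24)

year_month_a = YearMonthInterval.of(years=1, months=2)
year_month_b = YearMonthInterval.of(months=14)

print(year_month_a == year_month_b)                   # True: both are 14 months
print(one_day == day_in_hours)                        # True: same fixed duration
print(year_month_a < YearMonthInterval.of(years=2))   # True: 14 months < 24 months
```

Within each type, values have a well-defined order. Mixing months with days in one value loses that property, because the length of a month in days depends on the date it is added to.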
November 17, 2020 04:00 PM PT
The talk is about date-time processing in Spark 3.0: its API and the implementation changes made since Spark 2.4. In particular, I am going to cover the following topics:
1. Definition and internal representation of dates/timestamps in Spark SQL. Comparison of the Spark 3.0 date-time API with previous versions and other DBMSs.
2. Date/timestamp functions of Spark SQL. Nuances of behavior and details of implementation. Use cases and corner cases of the date-time API.
3. Migration from the hybrid calendar (Julian and Gregorian calendars) to the Proleptic Gregorian calendar in Spark 3.0.
4. Parsing of date/timestamp strings, and saving and loading date/time data via Spark's datasources.
5. Support for the Java 8 time API in Spark 3.0.
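The calendar migration in topic 3 can be illustrated with a small sketch in plain Python, not Spark code. Python's `datetime.date` uses the proleptic Gregorian calendar, so the sketch converts a date written in the Julian calendar to a Julian Day Number with the standard integer formula and then maps it to a `date`. The function names and the `JDN_OFFSET` constant are introduced here for illustration only.

```python
from datetime import date

# Offset between the Julian Day Number and Python's proleptic Gregorian
# ordinal (date.toordinal() returns 1 for 0001-01-01).
JDN_OFFSET = 1721425


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date written in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Same physical day, relabeled in the proleptic Gregorian calendar."""
    return date.fromordinal(julian_calendar_to_jdn(year, month, day) - JDN_OFFSET)


# 1582-10-04 is the last day the hybrid calendar labels with Julian rules.
last_julian_day = julian_to_proleptic_gregorian(1582, 10, 4)
print(last_julian_day)  # 1582-10-14
```

The same day carries two different labels depending on the calendar. That is why old dates written under one calendar and read under the other can appear shifted by several days.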
Speaker: Maxim Gekk