Kazuaki Ishizaki

Researcher, Senior Technical Staff Member, IBM

Dr. Kazuaki Ishizaki is a senior technical staff member at IBM Research – Tokyo. He has over 25 years of experience in research and development of dynamic compilers for Java and other languages. He is an expert in compiler optimizations, runtime systems, and parallel processing. He has worked on the IBM Java just-in-time compiler and virtual machine from JDK 1.0 through recent Java releases. His research focuses on how system software can enable programmers to automatically exploit hardware accelerators from high-level languages and frameworks. He is an Apache Spark committer, working on the SQL components. He is an ACM Distinguished Member.

Past sessions

Summit 2021 Enabling Vectorized Engine in Apache Spark

May 26, 2021 03:15 PM PT

This talk explains how to enable a vectorized engine in Apache Spark to accelerate Apache Spark programs. Vectorization is an exciting approach to maximizing performance, and Delta Lake and other commercial databases use it. However, Apache Spark does not use vectorization yet, because it is not easy to use vector instructions in the current Java language.

First, this talk reviews the Vector API, which makes vector instructions easy to use from Java 16. Then, this talk discusses three possible approaches to vectorizing the Apache Spark engine with the Vector API: 1) replacing external libraries such as the BLAS library, 2) using a vectorized runtime such as a sort routine, and 3) generating vectorized Java code with Catalyst from a given SQL query. Finally, this talk shares analysis and performance results of these approaches.
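As an illustration of what the Vector API looks like, here is a minimal sketch of a BLAS-style `axpy` kernel (`y = alpha * x + y`) written with the `jdk.incubator.vector` module that shipped as an incubator API in JDK 16. The class name `VectorAxpyDemo` and the `axpy` method are hypothetical and are not code from the talk or from Apache Spark; the sketch also assumes `x.length <= y.length`. Because the module is incubating, it must be compiled and run with `--add-modules jdk.incubator.vector`.

```java
import java.util.Arrays;

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorAxpyDemo {
    // The widest float vector shape the running CPU supports (e.g. 8 lanes with AVX2).
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Hypothetical BLAS-style axpy: y[i] = alpha * x[i] + y[i].
    // Assumes x.length <= y.length.
    static void axpy(float alpha, float[] x, float[] y) {
        int i = 0;
        // Largest multiple of the lane count that does not exceed x.length.
        int upper = SPECIES.loopBound(x.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector vx = FloatVector.fromArray(SPECIES, x, i);
            FloatVector vy = FloatVector.fromArray(SPECIES, y, i);
            // Multiply every lane by alpha, add y, and store the lanes back into y.
            vx.mul(alpha).add(vy).intoArray(y, i);
        }
        // Scalar tail loop for the elements that do not fill a whole vector.
        for (; i < x.length; i++) {
            y[i] = alpha * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = new float[19];
        float[] y = new float[19];
        for (int k = 0; k < x.length; k++) {
            x[k] = k;
            y[k] = 1.0f;
        }
        axpy(2.0f, x, y);
        System.out.println(Arrays.toString(y));
    }
}
```

Run it with `javac --add-modules jdk.incubator.vector VectorAxpyDemo.java` followed by `java --add-modules jdk.incubator.vector VectorAxpyDemo`; each `y[k]` becomes `2k + 1`. The array length of 19 is chosen so that, on typical lane counts, both the vector loop and the scalar tail run. The same source adapts to whatever SIMD width the CPU offers, which is what makes this kind of API attractive for replacing native libraries from Java.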

Here are the takeaways of this talk:

1. Overview of Vector API to vectorize Java programs
2. Multiple approaches to use a vectorized engine in Apache Spark
3. Analysis and performance results of these vectorization approaches

In this session:
Kazuaki Ishizaki, Researcher, Senior Technical Staff Member, IBM


Summit 2020 SQL Performance Improvements at a Glance in Apache Spark 3.0

June 24, 2020 05:00 PM PT

This talk explains how Spark 3.0 can improve the performance of SQL applications. Spark 3.0 provides many performance features, such as dynamic partition pruning and enhanced pushdown. Each of them can improve the performance of a different type of SQL application. Because there are so many features, it is not easy for application developers to understand them at a glance. This talk briefly explains each feature with an example program, showing how it works and how it improves performance.

Here are the takeaways of this talk:

1. Which SQL optimization features Spark 3.0 supports
2. Which programs Spark 3.0 can accelerate