Haoyuan Li is the founder and CEO of Alluxio Inc. (formerly Tachyon Nexus). Before founding the company, he was working on his Ph.D. at the UC Berkeley AMPLab, where he co-created Alluxio, a memory-speed virtual distributed storage system. Haoyuan is also a founding committer of Apache Spark. Before the AMPLab, he worked at Conviva and Google. Haoyuan has an MS from Cornell University and a BS from Peking University.
Over the past several years, Spark has significantly changed the landscape of big data computing and dramatically improved application performance. However, several challenges remain, e.g., high GC overhead. In this talk, I will introduce Tachyon, a distributed in-memory storage system. I will also discuss how Tachyon can further improve Spark’s performance and how the two systems integrate.
Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and accelerate access to data in different storage systems. Alluxio has a quickly growing open source community of developers and users and is deployed at organizations such as Alibaba, Baidu, Barclays, Intel, Huawei, and Qunar. Many of these deployments use Alluxio with Spark, and some of them scale to petabytes of data. While Spark is already gaining great adoption, Alluxio can enable Spark to be even more effective. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments in which Alluxio and Spark work together. We will also provide live demos for some of the use cases.