SESSION
Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Data Engineering and Streaming |
TECHNOLOGIES | AI/Machine Learning, Delta Lake, Developer Experience |
SKILL LEVEL | Intermediate |
DURATION | 20 min |
In machine learning workflows, data typically take the form of tensors. Unfortunately, most input data arrive in a variety of formats and require cumbersome, inefficient loading and storage steps before they can be used. In this talk, we present Delta Tensor, an approach to storing tensors directly in Delta Lake. Delta Tensor delegates data loading to the query engine, uses chunking to reduce the I/O cost of tensor slicing, and applies sparse encoding methods to significantly improve the storage efficiency of sparse tensors. Together, these provide an efficient storage and management solution for a cloud-native Lakehouse environment.
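The abstract names two techniques, chunking and sparse encoding, without describing how either is implemented. As a rough illustration only, the Python sketch below assumes a regular grid of fixed-size chunks (where each chunk could map to one row of a Delta table) and uses the COO (coordinate) format as the sparse encoding. Neither choice is confirmed by the session description, and every function name here (`chunk_ids_for_slice`, `to_coo`, `from_coo`, `ravel`, `unravel`) is hypothetical rather than part of any Delta Tensor API.

```python
from itertools import product
from math import prod

# --- Chunking (assumed layout: a regular grid of fixed-size chunks) ---

def chunk_ids_for_slice(shape, chunk_shape, slices):
    """Return the grid indices of every chunk that overlaps a rectangular slice.

    `slices` holds one half-open (start, stop) pair per dimension. Under the
    assumed layout, only these chunks would need to be read to serve the slice.
    """
    ranges = []
    for (start, stop), dim, size in zip(slices, shape, chunk_shape):
        stop = min(stop, dim)  # clamp the slice to the tensor bounds
        first, last = start // size, (stop - 1) // size
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# --- Sparse encoding (COO, chosen here purely for illustration) ---

def unravel(pos, shape):
    """Convert a row-major flat offset into a multi-dimensional index."""
    idx = []
    for dim in reversed(shape):
        idx.append(pos % dim)
        pos //= dim
    return tuple(reversed(idx))

def ravel(idx, shape):
    """Convert a multi-dimensional index into a row-major flat offset."""
    pos = 0
    for i, dim in zip(idx, shape):
        pos = pos * dim + i
    return pos

def to_coo(flat, shape):
    """Keep only the non-zero entries as parallel lists of indices and values."""
    indices, values = [], []
    for pos, value in enumerate(flat):
        if value != 0:
            indices.append(unravel(pos, shape))
            values.append(value)
    return indices, values

def from_coo(indices, values, shape):
    """Rebuild the dense row-major tensor from its COO encoding."""
    flat = [0] * prod(shape)
    for idx, value in zip(indices, values):
        flat[ravel(idx, shape)] = value
    return flat

if __name__ == "__main__":
    # A 1000x1000 tensor split into 100x100 chunks: slicing rows 250..349 and
    # columns 0..99 touches only 2 of the 100 chunks.
    print(chunk_ids_for_slice((1000, 1000), (100, 100), [(250, 350), (0, 100)]))
    # prints [(2, 0), (3, 0)]
    dense = [0, 5, 0, 0, 0, 7]  # shape (2, 3), stored row-major
    indices, values = to_coo(dense, (2, 3))
    print(indices, values)  # prints [(0, 1), (1, 2)] [5, 7]
    assert from_coo(indices, values, (2, 3)) == dense
```

The chunk lookup shows why slicing gets cheaper: the I/O cost scales with the number of overlapping chunks rather than the size of the whole tensor. The COO encoding stores one index tuple and one value per non-zero entry, so it only saves space when most entries are zero; denser tensors may be better served by other encodings.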
SESSION SPEAKERS
Zhiyu Wu
Student
Northeastern University