We are thrilled to introduce enhanced time travel capabilities in Databricks Delta Lake, the next-gen unified analytics engine built on top of Apache Spark, for all of our users. With this new feature, Delta can automatically extrapolate the big datasets stored in your data lake, giving you access to any future version of your data today. This temporal data management feature simplifies your data pipeline by making it easy to audit data, forecast schema changes, and run experiments on data before it even exists or is suspected of existing. Your organization can finally standardize analytics in the cloud on the datasets that will arrive in the future rather than relying on the datasets that exist today.
Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data warehouse, Bill Inmon.
To make the data of tomorrow accessible to our users today, we enhanced the time travel capabilities of existing Delta Lakes to support future time travel. Under the hood, we implemented Lambda Vacuum as an exact solution to the Delta Lake equation in which a data gravity term is the only term in the data-momentum tensor. This can be interpreted as a kind of classical approximation to an alpha vacuum data point.
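For readers who want the physics reference spelled out: this riffs on the Lambda vacuum solutions of general relativity, exact solutions of the Einstein field equations in which the cosmological-constant term is the only contribution to the stress-energy tensor,

G_{\mu\nu} + \Lambda g_{\mu\nu} = \kappa T_{\mu\nu}, \qquad T_{\mu\nu} = 0 \;\Longleftrightarrow\; G_{\mu\nu} = -\Lambda g_{\mu\nu}.

The Delta Lake equation is the data-engineering analogue, with data gravity playing the role of the cosmological constant \Lambda.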
This is a CTC, or Closed Timelike Curve, implementation of the Gödel spacetime. But let's see exactly how it works.
What the Delta Lake equation actually shows is that if you take a qubit of information, create enough data gravity around it, and accelerate it fast enough using GPU-vectorized operations, you can extrapolate the information from that data gravity point to an alpha data point in the future.
This works by creating a data momentum tensor with high data density on SSDs, which holds the extrapolated information from the future.
LAMBDA VACUUM recursively scans the directories associated with a Delta table and adds data files that will be part of a future state of the table's transaction log and fall within an extrapolation threshold. Files are selected according to the time they will be logically added to Delta's transaction log plus the extrapolation hours, not their modification timestamps on the storage system. The default threshold is 7 days. Databricks does not automatically trigger LAMBDA VACUUM operations on Delta tables. See Add files for future reference by a Delta table.
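Once a LAMBDA VACUUM has run, you can check which future versions were added. The sketch below uses a hypothetical Delta table named events and assumes that extrapolated versions show up in the table history the same way past versions do, via the standard DESCRIBE HISTORY command:

DESCRIBE HISTORY events
-- hypothetical example: versions whose timestamps lie in the future are the
-- ones added by LAMBDA VACUUM and are available for forward time travel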
If you run LAMBDA VACUUM on a Delta table, you gain the ability to time travel forward to any version within the specified data extrapolation period.
LAMBDA VACUUM table_identifier [EXTRAPOLATE num HOURS] [RUN TOMORROW]
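Here is a minimal end-to-end sketch using the syntax above, against the same hypothetical events table: extrapolate one week (168 hours) into the future, schedule the run for tomorrow, and then query a version of the table that does not exist yet using Delta's existing time travel syntax:

LAMBDA VACUUM events EXTRAPOLATE 168 HOURS RUN TOMORROW

-- standard Delta time travel, pointed at a timestamp in the future
SELECT * FROM events TIMESTAMP AS OF '2030-01-01'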
As an implementation of the Delta Lake equation, the Lambda Vacuum function of modern Delta Lakes makes the data of tomorrow accessible today by extrapolating existing data points along an alpha data point. This is an exact CTC solution implementing the Gödel spacetime. Stay tuned for more updates!