Frequently Asked Questions About the Data Lakehouse

Question Index:
What is a Data Lakehouse?
How is a Data Lakehouse different from a Data Warehouse?
How is the Data Lakehouse different from a Data Lake?
How easy is it for data analysts to use a Data Lakehouse?
How do Data Lakehouse systems compare in performance and cost to data warehouses?
What data governance...

Announcing the Launch of Delta Live Tables: Reliable Data Engineering Made Easy

As the amount of data, data sources and data types at organizations grows, building and maintaining reliable data pipelines has become a key enabler for analytics, data science and machine learning (ML). Prioritizing these initiatives puts increasing pressure on data engineering teams because processing the raw, messy data into clean, fresh, reliable data is a...

Introducing Delta Sharing: An Open Protocol for Secure Data Sharing

Data sharing has become critical in the modern economy as enterprises look to securely exchange data with their customers, suppliers and partners. For example, a retailer may want to publish sales data to its suppliers in real time, or a supplier may want to share real-time inventory. But so far, data sharing has been severely...

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons. First, it made the developer’s experience with the APIs simpler: the APIs did not have to account for micro-batches. Second, it allowed developers to treat a stream as an infinite table to which they could issue queries as...
