Session Title: Delta Lake with Azure Databricks – Let’s build a reliable Data Lake!
Speaker: Mohit Batra
Abstract: While Data Lakes allow us to store massive amounts of data, many challenges come with them: managing data spread across many files, providing consistent reads to downstream applications, handling updates, recovering data after failures, slow transactional query performance, consolidating streaming data, and much more. Data Warehouses built on relational databases, on the other hand, handle many of these challenges, but they do not scale as well and are expensive to maintain.
Are you interested in building a large-scale, reliable Data Lake?
This session will take you through Delta Lake – an open-source storage layer that brings ACID transactions and reliability to your Data Lake. It scales to terabytes and petabytes of data, handles updates, combines batch and streaming data in the same tables, and manages both data and metadata. Here is what we will look into, via demos:
1. What Delta Lake is, and its key features
2. How to get started with Delta Lake using Azure Databricks
3. How to handle Inserts, Updates, Deletes and Merges on records (see the first sketch after this list)
4. How it provides ACID guarantees using the transaction log
5. How Time Travel works
6. How schema enforcement and schema evolution work
7. How data clustering and clean-up work (see the second sketch below)
8. How it compares with the Parquet format in terms of features and performance
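To make the demo topics concrete, here is a minimal PySpark sketch of the basic write operations from the list above (items 2 and 3). It assumes an Azure Databricks notebook, where `spark` is predefined and Delta Lake is built in; the table path and data are hypothetical.

```python
from delta.tables import DeltaTable

path = "/tmp/demo/customers"  # hypothetical demo path

# Create a Delta table by writing a DataFrame in Delta format (Insert)
spark.createDataFrame(
    [(1, "Alice"), (2, "Bob")], ["id", "name"]
).write.format("delta").mode("overwrite").save(path)

customers = DeltaTable.forPath(spark, path)

# Update and Delete individual records in place
customers.update(condition="id = 1", set={"name": "'Alicia'"})
customers.delete(condition="id = 2")

# Merge (upsert) a batch of changes into the table
changes = spark.createDataFrame([(1, "Alice B."), (3, "Carol")], ["id", "name"])
(customers.alias("t")
    .merge(changes.alias("s"), "t.id = s.id")
    .whenMatchedUpdate(set={"name": "s.name"})
    .whenNotMatchedInsertAll()
    .execute())
```

Each of these operations appends a new commit to the table's transaction log, which is what gives Delta Lake its ACID guarantees (item 4).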
These features of Delta Lake can truly help you build a large-scale data warehouse on top of your Data Lake.
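As a second sketch, Time Travel and clean-up (items 5 and 7) might look like this, under the same assumptions as the first example:

```python
path = "/tmp/demo/customers"  # hypothetical demo path, as above

# Time Travel: read the table as it was at an earlier version,
# reconstructed from the transaction log (_delta_log)
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
v0.show()

# Inspect the commit history captured by the transaction log
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(truncate=False)

# Clustering and clean-up: compact small files (optionally Z-Ordering
# by a column), then remove files no longer referenced by the log
spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (id)")
spark.sql(f"VACUUM delta.`{path}` RETAIN 168 HOURS")
```

Note that VACUUM permanently removes old data files, so it also limits how far back Time Travel can reach; the 168-hour retention here matches the default of 7 days.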