Lake Database in Azure Synapse: Spark SQL vs Delta vs Parquet | Full Demo
📘 Overview
A lake database in Azure Synapse lets you define tables over files in your data lake and query them with Spark SQL. In this tutorial, we demonstrate how to:
- Create and use Lake Databases
- Compare Parquet, Delta, and Spark SQL tables
- Insert, update, and read data using Apache Spark in Synapse
🏗️ Step 1: Create Spark SQL Table (Unmanaged)
```sql
%%sql
CREATE TABLE IF NOT EXISTS lakehouse.spark_table (
    id INT, name STRING, age INT
)
USING PARQUET
LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/tables/spark_table'
```
Note: in Synapse notebooks, `%%sql` is the Spark SQL cell magic (`%%spark` runs Scala). The `LOCATION` clause makes this an external (unmanaged) table, so dropping it removes only the metadata, not the files.
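To confirm the table was registered as external rather than managed, you can inspect its metadata. A quick check, assuming the table above was created successfully:

```sql
%%sql
-- "Type" should show EXTERNAL and "Location" should match the abfss:// path above
DESCRIBE TABLE EXTENDED lakehouse.spark_table
```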
💾 Step 2: Create Delta Table
```sql
%%sql
CREATE TABLE IF NOT EXISTS lakehouse.delta_table (
    id INT, product STRING, quantity INT
)
USING DELTA
LOCATION 'abfss://lake@storageaccount.dfs.core.windows.net/tables/delta_table'
```
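Delta stores a transaction log (`_delta_log`) alongside the Parquet data files, which is what enables ACID guarantees and versioning. You can verify the format and location with Delta's `DESCRIBE DETAIL` command:

```sql
%%sql
-- Returns one row with format ('delta'), location, numFiles, and other metadata
DESCRIBE DETAIL lakehouse.delta_table
```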
🔄 Step 3: Insert Data into Tables
```sql
%%sql
INSERT INTO lakehouse.spark_table VALUES (1, 'Ali', 30), (2, 'Zara', 25);
INSERT INTO lakehouse.delta_table VALUES (1, 'Phone', 10), (2, 'Tablet', 20);
```
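As a quick sanity check on the inserts (assuming both statements above succeeded, each table should hold two rows):

```sql
%%sql
-- Expect a count of 2 for each table
SELECT 'spark_table' AS tbl, COUNT(*) AS row_count FROM lakehouse.spark_table
UNION ALL
SELECT 'delta_table' AS tbl, COUNT(*) AS row_count FROM lakehouse.delta_table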
📊 Step 4: Query Tables
```sql
%%sql
SELECT * FROM lakehouse.spark_table;
SELECT * FROM lakehouse.delta_table;
```
✏️ Step 5: Update Delta Table
Delta supports ACID transactions, including `UPDATE` and `DELETE` operations.
```sql
%%sql
UPDATE lakehouse.delta_table SET quantity = 15 WHERE id = 1;
```
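`DELETE` works the same way, and because every write creates a new table version in the transaction log, older snapshots remain queryable ("time travel"). A sketch, where the version number is illustrative:

```sql
%%sql
DELETE FROM lakehouse.delta_table WHERE id = 2;

-- Audit the change history recorded in the Delta transaction log
DESCRIBE HISTORY lakehouse.delta_table;

-- Time travel: read the table as it was at an earlier version
SELECT * FROM lakehouse.delta_table VERSION AS OF 0;
```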
📈 Comparison: Parquet vs Delta vs Spark Tables
| Feature | Parquet | Delta | Spark SQL Table |
|---|---|---|---|
| Storage format | Columnar files | Parquet files + transaction log | Depends on `USING` format |
| ACID support | No | Yes | Only if backed by Delta |
| UPDATE/DELETE | No | Yes | Only if backed by Delta |
| Performance | Fast reads | Fast reads, plus time travel (versioning) | Depends on underlying format |
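Delta's ACID support also enables `MERGE INTO` for upserts, something a plain Parquet table cannot do. A minimal sketch, reusing the `delta_table` created above with an inline source row:

```sql
%%sql
MERGE INTO lakehouse.delta_table AS target
USING (SELECT 2 AS id, 'Tablet' AS product, 25 AS quantity) AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.quantity = source.quantity
WHEN NOT MATCHED THEN
  INSERT (id, product, quantity) VALUES (source.id, source.product, source.quantity)
```

If a row with a matching `id` exists it is updated in place; otherwise the source row is inserted, all within one atomic transaction.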
📌 Best Practices
- Use Delta Lake for write-heavy transactional workloads
- Use Parquet for optimized reads and archival
- Validate permissions for Lake DB and storage paths
📺 Watch the Full Demo
📚 Credit: Content created with the help of ChatGPT and Gemini.