Managed vs External Tables in Spark SQL Lake Database | Azure Synapse Analytics Explained
📘 Overview
In Azure Synapse Analytics, you can create both Managed and External Tables within a Lake Database using Spark SQL. Understanding the difference is essential for managing data, storage, and performance effectively.
📂 What Are Managed Tables?
Managed tables are those where both the metadata and the data are controlled by Synapse Spark. When you drop the table, the data and metadata are both deleted.
✅ Create a Managed Table
%%spark
CREATE TABLE lakehouse.managed_employee (
id INT,
name STRING,
salary FLOAT
)
✔️ No LOCATION
clause is specified — Spark decides where to store the data internally.
📁 What Are External Tables?
External tables store only the metadata in the Lake Database while the data resides externally in user-defined paths like ADLS Gen2.
✅ Create an External Table
%%spark
CREATE TABLE lakehouse.external_sales (
id INT,
product STRING,
amount DOUBLE
)
USING DELTA
LOCATION 'abfss://data@yourstorageaccount.dfs.core.windows.net/sales/external/'
✔️ Data lives outside Spark's internal catalog. Deleting the table will NOT remove the data.
📊 Key Differences
Feature | Managed Table | External Table |
---|---|---|
Data Location | Managed by Synapse | Custom ADLS path |
Drop Table Deletes Data | Yes | No |
Usage | Quick prototyping | Production pipelines |
📌 Best Practices
- Use external tables for production where you need to retain data
- Use managed tables for temporary analysis or rapid testing
- Prefer Delta format for ACID support and versioning
📂 Example: Read External Table
%%sql
SELECT * FROM lakehouse.external_sales;
📺 Watch the Full Video Explanation
📚 Credit: Content created with the help of ChatGPT and Gemini.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.