Delta Table Operations in Lakehouse: Insert, Update, Delete, Merge
Microsoft Fabric Tutorial
Overview
This guide demonstrates how to perform Delta Table operations inside a Microsoft Fabric Lakehouse using PySpark. You’ll learn how to read data, write it as a Delta table, and perform INSERT, UPDATE, DELETE, and MERGE operations efficiently in a notebook environment.
What is a Delta Table?
A Delta table is a high-performance storage format that supports:
- ACID transactions
- Efficient updates and deletes
- Schema evolution (the table schema can change as new columns arrive)
- Versioning and time travel (see the sketch after this list)
Delta tables combine the scalability of Apache Spark with the transactional consistency of a data warehouse.
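Every committed write to a Delta table produces a new version that you can inspect and query later. As a minimal sketch of time travel (assuming the customerdelta table created in Step 2 below), the Delta SQL looks like this:

# List the table's version history: one row per committed write
display(spark.sql("DESCRIBE HISTORY customerdelta"))

# Time travel: read the table as it looked at version 0
display(spark.sql("SELECT * FROM customerdelta VERSION AS OF 0"))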
Step 1: Read the CSV File from the Lakehouse
# Replace path if needed
df = spark.read.option("header", True).option("inferSchema", True).csv("Files/sample_customers_dw.csv")
display(df)
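Before writing the data out, it is worth checking what Spark inferred from the CSV, for example:

# Inspect the inferred column names and data types
df.printSchema()
print(df.count(), "rows loaded")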
Step 2: Write to a Delta Table
df.write.format("delta").mode("overwrite").saveAsTable("customerdelta")
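At this point customerdelta is registered as a managed Delta table in the Lakehouse. One way to confirm this is to read it back and look at its metadata:

# Preview a few rows from the newly created Delta table
display(spark.table("customerdelta").limit(10))

# Show table details such as format, location, and file count
display(spark.sql("DESCRIBE DETAIL customerdelta"))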
➕ Insert Operation
from pyspark.sql import Row
new_row = Row(customerid=300, firstname="Zeeshan", lastname="Ali", city="New York")
df_new = spark.createDataFrame([new_row])
df_new.write.format("delta").mode("append").saveAsTable("customerdelta")
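The append adds one new row. Delta enforces the table schema on write, so the column names and types of the new DataFrame should line up with what was inferred from the CSV. You can confirm the row landed with a quick filter:

# Verify the inserted customer is now in the Delta table
display(spark.table("customerdelta").filter("customerid = 300"))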
✏️ Update Operation
from delta.tables import DeltaTable
target = DeltaTable.forName(spark, "customerdelta")
target.update(
    condition="customerid = 300",
    set={"city": "'Los Angeles'"}
)
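Note that the values in set are SQL expression strings, which is why 'Los Angeles' carries inner quotes. If you prefer SQL, the same update can be written directly against the Delta table:

# Equivalent UPDATE expressed in Spark SQL
spark.sql("UPDATE customerdelta SET city = 'Los Angeles' WHERE customerid = 300")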
Delete Operation
target.delete(condition="customerid = 300")
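After the delete, the row for customerid 300 should be gone; a count on that filter returns 0. The same operation is also available in SQL:

# Confirm the row was removed (expected output: 0)
print(spark.table("customerdelta").filter("customerid = 300").count())

# Equivalent DELETE expressed in Spark SQL
spark.sql("DELETE FROM customerdelta WHERE customerid = 300")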
Merge (Upsert) Operation
updatesDf = spark.createDataFrame([
    Row(customerid=101, firstname="Aamir", lastname="Shahzad", city="Charlotte")
])

target.alias("t").merge(
    updatesDf.alias("s"),
    "t.customerid = s.customerid"
).whenMatchedUpdate(set={"city": "s.city"}) \
.whenNotMatchedInsert(values={
    "customerid": "s.customerid",
    "firstname": "s.firstname",
    "lastname": "s.lastname",
    "city": "s.city"
}).execute()
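After the merge, customer 101 reflects the updated city, or is inserted as a new row if it did not exist. You can also review the table history to see each step of this tutorial recorded as its own version:

# Inspect the upserted row
display(spark.table("customerdelta").filter("customerid = 101"))

# Each operation (WRITE, UPDATE, DELETE, MERGE) appears as a separate version
display(target.history())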