Load Files from Staging Folder to Delta Table Using Scheduled Notebook
Microsoft Fabric Tutorial
📘 Overview
This tutorial demonstrates how to automate the process of reading customer and order CSV files from a Staging folder in a Microsoft Fabric Lakehouse, performing a join, and saving the output to a Delta table named customerorder. The process is managed using a scheduled notebook.
🔹 Step 1: Read CSV Files from Staging
# Read the staged CSV files; header=True treats the first row as column names
df_customer = spark.read.option("header", True).csv("Files/Staging/customer.csv")
df_order = spark.read.option("header", True).csv("Files/Staging/order.csv")
display(df_customer)
display(df_order)
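Without an explicit schema, every column is read as a string. If downstream logic depends on column types, you can supply a schema up front. The sketch below is illustrative only; the field names and types other than CustomerID (CustomerName, Country) are assumptions to be replaced with your actual file layout.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical schema -- replace field names/types with those in your customer.csv
customer_schema = StructType([
    StructField("CustomerID", IntegerType(), False),
    StructField("CustomerName", StringType(), True),
    StructField("Country", StringType(), True),
])

df_customer = (
    spark.read
        .option("header", True)
        .schema(customer_schema)   # enforce expected columns instead of inferring
        .csv("Files/Staging/customer.csv")
)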
🔹 Step 2: Join Data on CustomerID
# Inner join: keep only orders with a matching CustomerID in the customer file
df_joined = df_customer.join(df_order, on="CustomerID", how="inner")
display(df_joined)
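An inner join silently drops any order whose CustomerID has no match in the customer file. If that matters for your data, a quick row-count comparison after the join can surface it; a minimal sanity check:

# Sanity check: flag orders that were dropped by the inner join
orders_total = df_order.count()
orders_joined = df_joined.count()
if orders_joined < orders_total:
    print(f"Warning: {orders_total - orders_joined} orders had no matching customer")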
🔹 Step 3: Save to Delta Table
# Overwrite replaces the table contents on every run, keeping the load idempotent
df_joined.write.format("delta").mode("overwrite").saveAsTable("customerorder")
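Once the write completes, the table is immediately queryable with Spark SQL, which makes for a quick verification step at the end of the notebook; for example:

# Confirm the Delta table exists and see how many rows were loaded
spark.sql("SELECT COUNT(*) AS row_count FROM customerorder").show()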
⏰ Step 4: Schedule the Notebook
In Microsoft Fabric:
- Open the notebook used in the steps above
- Click on the Schedule tab in the notebook menu
- Define a time-based trigger (e.g., every hour, daily)
- Choose compute settings and activate the schedule
This automates refreshing the customerorder table: each scheduled run reprocesses whatever files are currently in the Files/Staging/ directory.
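If you also want each run to clear out files it has already processed, you can move them to an archive folder at the end of the notebook. The sketch below assumes the mssparkutils file-system utilities are available in your Fabric Spark session and that a Files/Archive/ folder already exists; both are assumptions, not requirements of the scheduled-notebook approach.

# Optional housekeeping: move processed files out of Staging after a successful load
# Assumes mssparkutils is available and Files/Archive/ already exists
from notebookutils import mssparkutils

for file_name in ["customer.csv", "order.csv"]:
    mssparkutils.fs.mv(f"Files/Staging/{file_name}", f"Files/Archive/{file_name}")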
💡 Benefits of Using Scheduled Notebooks
- Ensures Lakehouse tables are always up to date
- Eliminates manual refresh and load tasks
- Enables full ETL automation using native Spark
- Improves operational efficiency for data engineering teams