Load Files from Staging Folder to Delta Table Using Scheduled Notebook

Microsoft Fabric Tutorial

📘 Overview

This tutorial shows how to automate loading customer and order CSV files from a Staging folder in a Microsoft Fabric Lakehouse: read both files, join them on CustomerID, and save the result to a Delta table named customerorder. A scheduled notebook drives the whole process.

🔹 Step 1: Read CSV Files from Staging

# Read the raw CSV files from the Lakehouse Staging folder,
# treating the first row of each file as column headers.
df_customer = spark.read.option("header", True).csv("Files/Staging/customer.csv")
df_order = spark.read.option("header", True).csv("Files/Staging/order.csv")

display(df_customer)
display(df_order)
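
By default, header-only CSV reads treat every column as a string. If you need typed columns (for example a numeric CustomerID), you can supply an explicit schema. Here is a minimal sketch; the customer columns CustomerName and Country are assumptions, so adjust the fields to match your actual file:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical customer schema; replace the fields with your real columns.
customer_schema = StructType([
    StructField("CustomerID", IntegerType(), True),
    StructField("CustomerName", StringType(), True),
    StructField("Country", StringType(), True),
])

df_customer_typed = (
    spark.read
    .option("header", True)
    .schema(customer_schema)
    .csv("Files/Staging/customer.csv")
)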

🔹 Step 2: Join Data on CustomerID

# Inner join keeps only customers that have at least one matching order.
df_joined = df_customer.join(df_order, on="CustomerID", how="inner")
display(df_joined)
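
If you also want customers that have no orders yet to appear in the output, a left join is a drop-in alternative; their order columns come back as nulls. This variant is optional and not part of the main flow:

# Left join keeps every customer, with null order columns where no order exists.
df_all_customers = df_customer.join(df_order, on="CustomerID", how="left")
display(df_all_customers)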

🔹 Step 3: Save to Delta Table

# Overwrite the Delta table with the latest joined result on every run.
df_joined.write.format("delta").mode("overwrite").saveAsTable("customerorder")
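
Overwrite mode rebuilds the whole table on every run, which is simple and predictable for small datasets. For larger tables you may prefer to upsert only changed rows with Delta's MERGE API. The following is a sketch, not part of the original flow: it assumes a hypothetical OrderID column that, together with CustomerID, uniquely identifies each row. Adjust the merge condition to your real keys.

from delta.tables import DeltaTable

if spark.catalog.tableExists("customerorder"):
    target = DeltaTable.forName(spark, "customerorder")
    (
        target.alias("t")
        .merge(
            df_joined.alias("s"),
            # OrderID is an assumed key column; replace with your own.
            "t.CustomerID = s.CustomerID AND t.OrderID = s.OrderID",
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    # First run: create the table.
    df_joined.write.format("delta").saveAsTable("customerorder")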

⏰ Step 4: Schedule the Notebook

In Microsoft Fabric:

  • Open the notebook used in the steps above
  • Click on the Schedule tab in the notebook menu
  • Define a time-based trigger (e.g., every hour, daily)
  • Choose compute settings and activate the schedule

Each scheduled run then refreshes the customerorder table with whatever files are currently in the Files/Staging/ directory.
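
Note that the trigger is time-based rather than file-based, so the notebook runs whether or not new files have landed. If a run without staging files should exit quietly instead of failing, you can add a guard at the top of the notebook. A minimal sketch, assuming the built-in mssparkutils helper (exposed as notebookutils in newer Fabric runtimes) accepts Lakehouse-relative paths:

# Exit early and gracefully when the staging files are missing.
staging_files = ["Files/Staging/customer.csv", "Files/Staging/order.csv"]

if not all(mssparkutils.fs.exists(path) for path in staging_files):
    # Ends the notebook run with a status message instead of an error.
    mssparkutils.notebook.exit("Staging files not found; skipping this run.")

# ...otherwise continue with the read, join, and save steps above.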

💡 Benefits of Using Scheduled Notebooks

  • Keeps Lakehouse tables current on a defined cadence
  • Eliminates manual refresh and load tasks
  • Enables full ETL automation using native Spark
  • Improves operational efficiency for data engineering teams


Blog created with help from ChatGPT and Gemini.
