How to Get the Size of All Tables and Files in Microsoft Fabric

Microsoft Fabric Tutorial

📘 Overview

This tutorial shows how to programmatically determine the storage size of every Delta table and file in a Microsoft Fabric Lakehouse using PySpark and Spark SQL.

📏 Step 1: Use DESCRIBE DETAIL on a Table

To check the size of a specific table, run DESCRIBE DETAIL, which returns table metadata such as the storage location, number of files, and total size in bytes:

# Run SQL and store result in a DataFrame
df = spark.sql("DESCRIBE DETAIL customer_csv")
display(df)
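
The DESCRIBE DETAIL output includes a sizeInBytes column (the total size of the table's data files) and a numFiles column. A minimal sketch for pulling those two values out of the result, assuming the same customer_csv table as above:

# Grab the single row returned by DESCRIBE DETAIL
detail = spark.sql("DESCRIBE DETAIL customer_csv").collect()[0]
print(f"customer_csv: {detail['sizeInBytes'] / (1024**2):.2f} MB in {detail['numFiles']} files")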

🔄 Step 2: Loop Through All Tables

Fetch table metadata for each Delta table in the Lakehouse catalog:

# Get all registered Delta tables
tables = spark.catalog.listTables()
tables
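
Each entry returned by spark.catalog.listTables() is a Table object with attributes such as name and tableType. A quick sketch to print them:

# Print the name and type of every table registered in the Lakehouse
for t in tables:
    print(t.name, t.tableType)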

📦 Step 3: Get Sizes Using PySpark

Loop over each table, look up its storage location, and use Fabric's notebook file utilities (mssparkutils, the Fabric counterpart to dbutils in Databricks) to sum the size of its data files:

# Fabric's notebook file utility (mssparkutils) replaces dbutils from Databricks
from notebookutils import mssparkutils

def folder_size(path):
    """Recursively sum the sizes of all files under the given path."""
    total = 0
    for item in mssparkutils.fs.ls(path):
        if item.isDir:
            total += folder_size(item.path)
        else:
            total += item.size
    return total

table_sizes = []

for table in tables:
    table_name = table.name
    try:
        # DESCRIBE EXTENDED includes a 'Location' row holding the table's storage path
        desc_df = spark.sql(f"DESCRIBE EXTENDED {table_name}")
        location_row = desc_df.filter(desc_df.col_name == 'Location').collect()[0]
        path = location_row.data_type
        size_in_bytes = folder_size(path)
        table_sizes.append((table_name, path, size_in_bytes))
    except Exception as e:
        print(f"Error reading table {table_name}: {e}")

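Once the loop has filled table_sizes, the results can be turned into a DataFrame for sorting and display. A small sketch, assuming the table_sizes list built above is non-empty:

# Convert the collected (name, location, size) tuples into a DataFrame
sizes_df = spark.createDataFrame(table_sizes, ["table_name", "location", "size_in_bytes"])

# Show the largest tables first
display(sizes_df.orderBy("size_in_bytes", ascending=False))
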
📁 Optional: List Files in Staging Folder

To get the size of each file under Files/Staging/, load the folder with the binaryFile reader; the length column holds each file's size in bytes:

files_df = spark.read.format("binaryFile").load("Files/Staging/*")
files_df.select("path", "length").show()
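
To report the total size of the folder rather than per-file sizes, aggregate the length column:

from pyspark.sql.functions import sum as sum_

# Sum the sizes of all files under Files/Staging/ and report the total in MB
files_df.agg((sum_("length") / (1024 * 1024)).alias("total_mb")).show()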

🎯 Use Cases

  • Monitor storage usage in Fabric Lakehouse
  • Identify heavy tables for optimization
  • Clean up unused or duplicate files
  • Track growth over time in scheduled notebooks (see the sketch below)
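
For the last use case, one option is to append each run's results to a Delta table together with a timestamp, so a scheduled notebook builds up a size history over time. A minimal sketch, assuming the sizes_df DataFrame from Step 3 and a hypothetical tracking table named table_size_history:

from pyspark.sql.functions import current_timestamp

# Append this run's snapshot to a (hypothetical) history table
snapshot_df = sizes_df.withColumn("captured_at", current_timestamp())
snapshot_df.write.mode("append").format("delta").saveAsTable("table_size_history")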

🎬 Watch the Tutorial

Blog created with help from ChatGPT and Gemini.
