How to Get the Size of All Tables and Files in Microsoft Fabric
Microsoft Fabric Tutorial
📘 Overview
This tutorial shows how to programmatically determine the storage size of all Delta Tables and files within the Microsoft Fabric Lakehouse using PySpark and Spark SQL.
📏 Step 1: Use DESCRIBE DETAIL on a Table
To check the size of a specific table:
# Run SQL and store result in a DataFrame
df = spark.sql("DESCRIBE DETAIL customer_csv")
display(df)
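The output of DESCRIBE DETAIL includes sizeInBytes and numFiles columns, so the size can also be pulled out directly in Python. A minimal sketch, reusing the customer_csv table from the example above:
# Read the size straight from the DESCRIBE DETAIL output
detail = spark.sql("DESCRIBE DETAIL customer_csv").collect()[0]
print(f"customer_csv: {detail['sizeInBytes'] / (1024 * 1024):.2f} MB across {detail['numFiles']} files")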
🔄 Step 2: Loop Through All Tables
Fetch table metadata for each Delta table in the Lakehouse catalog:
# Get all registered Delta tables
tables = spark.catalog.listTables()
tables
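listTables() returns Table objects that carry the table name and type. A quick sketch to confirm what the catalog returned before looping over it:
# Print each table's name and type (MANAGED, EXTERNAL, ...)
for t in tables:
    print(t.name, t.tableType)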
📦 Step 3: Get Sizes Using PySpark
Loop over each table, look up its storage location, and use the Fabric file utilities (mssparkutils, the Fabric equivalent of Databricks' dbutils) to add up the size of its files:
# mssparkutils is available by default in Microsoft Fabric notebooks
def folder_size(path):
    # Recursively add up the size of every file under a folder
    total = 0
    for f in mssparkutils.fs.ls(path):
        total += folder_size(f.path) if f.isDir else f.size
    return total

table_sizes = []
for table in tables:
    table_name = table.name
    try:
        # DESCRIBE EXTENDED exposes the table's storage path in its 'Location' row
        desc_df = spark.sql(f"DESCRIBE EXTENDED {table_name}")
        location_row = desc_df.filter(desc_df.col_name == 'Location').collect()[0]
        path = location_row.data_type
        table_sizes.append((table_name, folder_size(path)))
    except Exception as e:
        print(f"Error reading table {table_name}: {e}")
📁 Optional: List Files in Staging Folder
For file sizes in Files/Staging/:
files_df = spark.read.format("binaryFile").load("Files/Staging/*")
files_df.select("path", "length").show()
🎯 Use Cases
- Monitor storage usage in Fabric Lakehouse
- Identify heavy tables for optimization
- Clean up unused or duplicate files
- Track growth over time in scheduled notebooks