Explore Databases and Tables in PySpark using spark.catalog

πŸ” How to Explore Tables and Databases in PySpark using spark.catalog

Learn how to interact with databases, tables, views, cached data, and registered functions using PySpark's powerful spark.catalog interface. Ideal for metadata exploration and data pipeline control.
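All the steps below assume a running SparkSession and a few sample objects to inspect. Here is a minimal setup sketch; demo_db, demo_table, and demo_view are placeholder names used throughout the rest of this tutorial.

from pyspark.sql import SparkSession

# Start (or reuse) a local SparkSession
spark = SparkSession.builder.appName("CatalogDemo").getOrCreate()

# Create placeholder objects to explore in the steps below
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
df.write.mode("overwrite").saveAsTable("demo_table")
df.createOrReplaceTempView("demo_view")

# Cache the table so Step 6 has something to inspect and release
spark.catalog.cacheTable("demo_table")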

📘 Step 1: Inspect Current Catalog State

print("Current Database:", spark.catalog.currentDatabase())
print("Current Catalog:", spark.catalog.currentCatalog())

📂 Step 2: List Databases and Tables

print("List of Databases:")
for db in spark.catalog.listDatabases():
    print(db.name)

print("List of Tables:")
for tbl in spark.catalog.listTables():
    print(tbl.name, tbl.tableType)
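listTables() also accepts a database name, which is useful when the tables you care about live outside the current database. A short sketch using the placeholder demo_db:

print("Tables in demo_db:")
for tbl in spark.catalog.listTables("demo_db"):
    print(tbl.name, tbl.tableType, tbl.isTemporary)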

🧪 Step 3: Check Existence of Tables & Databases

print("Table Exists:", spark.catalog.tableExists("demo_table"))
print("Database Exists:", spark.catalog.databaseExists("demo_db"))

πŸ” Step 4: Explore Table Columns

for col in spark.catalog.listColumns("demo_table"):
    print(col.name, col.dataType)
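Each item returned by listColumns() also carries nullability and partitioning metadata, which can be printed in the same loop:

for col in spark.catalog.listColumns("demo_table"):
    print(col.name, col.dataType, col.nullable, col.isPartition)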

🧼 Step 5: Drop Temporary Views

spark.catalog.dropTempView("demo_view")
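dropTempView() returns True when the view existed and was removed. If you registered a global temporary view with createGlobalTempView(), the matching cleanup call is shown below (the view name here is just a hypothetical example):

spark.catalog.dropGlobalTempView("demo_global_view")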

🧠 Step 6: Manage Cache

print("Is Cached:", spark.catalog.isCached("demo_view"))
spark.catalog.uncacheTable("demo_view")
spark.catalog.clearCache()
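The counterpart to uncacheTable() is cacheTable(), which works on tables as well as temp views. For example, to cache the placeholder table again and confirm it:

spark.catalog.cacheTable("demo_table")
print("Is Cached:", spark.catalog.isCached("demo_table"))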

🎥 Watch Full Video Tutorial

