How to use withColumnsRenamed function in PySpark to Rename Multiple Columns in DataFrame

This tutorial covers how to rename multiple columns in a PySpark DataFrame using the withColumnsRenamed() function. It’s a cleaner, reusable alternative to chaining multiple withColumnRenamed() calls.

What is withColumnsRenamed() in PySpark?

PySpark added the DataFrame.withColumnsRenamed() method in version 3.4.0 to rename multiple columns in a single call. You pass a dictionary that maps existing column names to new ones, and the method returns a new DataFrame; the original DataFrame is left unchanged.
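Because the method only exists from 3.4.0 onward, code that must also run on older clusters needs a fallback. The sketch below is one possible approach, not part of PySpark itself: `rename_columns` is a hypothetical helper name. It calls withColumnsRenamed() when the DataFrame has it and otherwise applies withColumnRenamed() once per dictionary entry.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Rename columns using `mapping` (old name -> new name).

    Hypothetical helper for illustration; not a PySpark API.
    """
    if hasattr(df, "withColumnsRenamed"):
        # PySpark 3.4.0+: a single call handles the whole dictionary
        return df.withColumnsRenamed(mapping)
    # Older PySpark: chain withColumnRenamed once per (old, new) pair
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

Given any DataFrame `df`, `rename_columns(df, {"FullName": "Name"})` should return the renamed DataFrame on either side of the 3.4.0 boundary.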

Step 1: Create Sample Data

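from pyspark.sql import SparkSession

# Reuse the active SparkSession if one exists (notebooks usually provide `spark`)
spark = SparkSession.builder.appName("withColumnsRenamedDemo").getOrCreate()

# Sample rows: (full name, country, age in years)
data = [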
    ("Aamir Shahzad", "Pakistan", 25),
    ("Ali Raza", "USA", 30),
    ("Bob", "UK", 45),
    ("Lisa", "Canada", 35)
]

df = spark.createDataFrame(data, ["FullName", "Country", "AgeYears"])

print("Original DataFrame:")
df.show()

Original DataFrame:
+-------------+--------+--------+
|     FullName| Country|AgeYears|
+-------------+--------+--------+
|Aamir Shahzad|Pakistan|      25|
|     Ali Raza|     USA|      30|
|          Bob|      UK|      45|
|         Lisa|  Canada|      35|
+-------------+--------+--------+

Step 2: Rename Multiple Columns using withColumnsRenamed()

renamed_df = df.withColumnsRenamed({
    "FullName": "Name",
    "Country": "Nationality",
    "AgeYears": "Age"
})

print("DataFrame with Renamed Columns:")
renamed_df.show()

DataFrame with Renamed Columns:
+-------------+-----------+---+
|         Name|Nationality|Age|
+-------------+-----------+---+
|Aamir Shahzad|   Pakistan| 25|
|     Ali Raza|        USA| 30|
|          Bob|         UK| 45|
|         Lisa|     Canada| 35|
+-------------+-----------+---+
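One thing to watch for: a key in the dictionary that does not match any existing column is skipped without an error, so a typo in an old column name can go unnoticed. A small pre-check, sketched below, can catch that before renaming. `unknown_columns` is a hypothetical helper name, and the column list simply mirrors the df.columns of this tutorial's DataFrame.

```python
def unknown_columns(columns, mapping):
    """Return the keys of `mapping` that are not present in `columns`."""
    existing = set(columns)
    return [old for old in mapping if old not in existing]


# Same column names as df.columns in this tutorial
columns = ["FullName", "Country", "AgeYears"]

# "Contry" is a deliberate typo to show what the check reports
mapping = {"FullName": "Name", "Contry": "Nationality"}

print(unknown_columns(columns, mapping))  # ['Contry']
```

In a real job you might call `unknown_columns(df.columns, mapping)` and raise an error if the returned list is not empty.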

Why use withColumnsRenamed()?

Cleaner syntax for renaming multiple columns.
More readable and concise than chaining multiple withColumnRenamed() calls.
Reduces the chance of human error when renaming several columns.
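The dictionary argument is also what makes the approach reusable: the mapping can be generated from the column names instead of typed by hand. As a minimal sketch, assuming you want snake_case names, the hypothetical helpers `to_snake_case` and `build_rename_map` below build such a mapping from a list of column names.

```python
import re


def to_snake_case(name):
    """Convert a CamelCase name such as 'FullName' to 'full_name'."""
    # Insert "_" before every uppercase letter except the first character
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_rename_map(columns):
    """Map each column name to its snake_case form (hypothetical helper)."""
    return {col: to_snake_case(col) for col in columns}


rename_map = build_rename_map(["FullName", "Country", "AgeYears"])
print(rename_map)
# {'FullName': 'full_name', 'Country': 'country', 'AgeYears': 'age_years'}
```

With the DataFrame from Step 1, `df.withColumnsRenamed(build_rename_map(df.columns))` would apply the whole mapping in one call.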
