How to Clean Strings in PySpark | lower(), trim(), initcap() Explained with Real Data

📌 What You’ll Learn

  • How to use lower() to convert text to lowercase
  • How to use trim() to remove leading/trailing spaces
  • How to use initcap() to capitalize the first letter of each word
  • Chaining multiple string functions (see step 4️⃣ below)

📊 Sample Data

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()

# Raw names with inconsistent casing and stray spaces
data = [
    (" Aamir ",),
    ("LISA ",),
    ("  charLie   ",),
    ("BOB",),
    (" eli",)
]
columns = ["raw_name"]
df = spark.createDataFrame(data, columns)
df.show(truncate=False)
Output:
+------------+
|raw_name    |
+------------+
| Aamir      |
|LISA        |
|  charLie   |
|BOB         |
| eli        |
+------------+

🔧 Cleaning the Data with PySpark Functions

1️⃣ Apply trim()

from pyspark.sql.functions import trim

# Remove leading and trailing spaces from raw_name
df_trimmed = df.withColumn("trimmed", trim("raw_name"))
df_trimmed.show(truncate=False)
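
trim() removes spaces only from the ends of a string; spaces inside the value are kept. If you only need to strip one side, PySpark also provides ltrim() and rtrim(). A minimal sketch on the same DataFrame (the left_trimmed/right_trimmed column names are only illustrative):

from pyspark.sql.functions import ltrim, rtrim

# ltrim() removes only leading spaces, rtrim() only trailing spaces
df_sides = (
    df.withColumn("left_trimmed", ltrim("raw_name"))
      .withColumn("right_trimmed", rtrim("raw_name"))
)
df_sides.show(truncate=False)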

2️⃣ Apply lower() and upper()

from pyspark.sql.functions import lower, upper

# Convert the trimmed names to all lowercase and all uppercase
df_lower = df_trimmed.withColumn("lowercase", lower("trimmed"))
df_upper = df_trimmed.withColumn("uppercase", upper("trimmed"))
df_lower.show(truncate=False)
df_upper.show(truncate=False)
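
A common reason to normalize case with lower() (or upper()) is case-insensitive matching. A minimal sketch that finds "BOB" regardless of how it was typed (the literal "bob" is just an example value):

from pyspark.sql.functions import lower

# Compare the lowercased column against a lowercase literal
df_trimmed.filter(lower("trimmed") == "bob").show(truncate=False)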

3️⃣ Apply initcap()

from pyspark.sql.functions import initcap

# Capitalize the first letter of each word (title case)
df_initcap = df_trimmed.withColumn("titlecase", initcap("trimmed"))
df_initcap.show(truncate=False)
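
4️⃣ Chain multiple functions

As promised in the list above, these steps can be combined in a single expression. A minimal sketch that trims and title-cases raw_name in one pass (clean_name is just an illustrative column name):

from pyspark.sql.functions import trim, initcap

# trim() runs first, then initcap() capitalizes each word of the trimmed value
df_clean = df.withColumn("clean_name", initcap(trim("raw_name")))
df_clean.show(truncate=False)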


