PySpark when() and otherwise() Explained | Apply If-Else Conditions to DataFrames #pysparktutorial

In this tutorial, you'll learn how to use the when() and otherwise() functions in PySpark to apply if-else style conditional logic directly to DataFrames. These functions are useful for transforming values in a column based on conditions.

🔹 Step 1: Create SparkSession & Sample Data

from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.appName("WhenOtherwiseExample").getOrCreate()

data = [
    (1, "Aamir", 50000),
    (2, "Ali", None),
    (3, "Bob", 45000),
    (4, "Lisa", 60000),
    (5, "Zara", None),
    (6, "ALINA", 55000)
]
columns = ["id", "name", "salary"]
df = spark.createDataFrame(data, columns)
df.show()

🔹 Step 2: Apply when() and otherwise()

# Create a new column with conditional labels
df_with_label = df.withColumn(
    "salary_label",
    when(col("salary") >= 55000, "High")
    .otherwise("Low")
)

df_with_label.show()

🔹 Step 3: Apply Multiple Conditions

# Multiple when conditions
df_with_category = df.withColumn(
    "salary_category",
    when(col("salary") > 60000, "Very High")
    .when((col("salary") >= 50000) & (col("salary") <= 60000), "Medium")
    .when(col("salary") < 50000, "Low")
    .otherwise("Unknown")
)

df_with_category.show()
