PySpark when() and otherwise() Explained
In this tutorial, you'll learn how to use the when() and otherwise() functions in PySpark to apply if-else style conditional logic directly to DataFrames. These functions are useful for transforming values in a column based on conditions.
🔹 Step 1: Create SparkSession & Sample Data
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.appName("WhenOtherwiseExample").getOrCreate()

data = [
    (1, "Aamir", 50000),
    (2, "Ali", None),
    (3, "Bob", 45000),
    (4, "Lisa", 60000),
    (5, "Zara", None),
    (6, "ALINA", 55000)
]
columns = ["id", "name", "salary"]

df = spark.createDataFrame(data, columns)
df.show()
🔹 Step 2: Apply when() and otherwise()
# Create a new column with conditional labels
df_with_label = df.withColumn(
    "salary_label",
    when(col("salary") >= 55000, "High")
    .otherwise("Low")
)
df_with_label.show()
🔹 Step 3: Apply Multiple Conditions
# Multiple when conditions evaluated top to bottom; otherwise() catches the rest (including nulls)
df_with_category = df.withColumn(
    "salary_category",
    when(col("salary") > 60000, "Very High")
    .when((col("salary") >= 50000) & (col("salary") <= 60000), "Medium")
    .when(col("salary") < 50000, "Low")
    .otherwise("Unknown")
)
df_with_category.show()