Top PySpark Built-in DataFrame Functions Explained | col(), lit(), when(), expr(), rand() & More | PySpark Tutorial

In this tutorial, we walk through five of the most frequently used PySpark built-in functions: col(), lit(), when(), expr(), and rand().

1️⃣ Setup Spark Session

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, expr, when, rand

spark = SparkSession.builder.appName("BuiltinFunctionsDemo").getOrCreate()

2️⃣ Create Sample DataFrame

data = [
    ("Alice", 34),
    ("Bob", 45),
    ("Cathy", None)
]
df = spark.createDataFrame(data, ["name", "age"])
df.show()
Output:
+-----+----+
| name| age|
+-----+----+
|Alice|  34|
|  Bob|  45|
|Cathy|null|
+-----+----+

3️⃣ Using col() and lit()

df.select(col("name"), col("age"), lit(100).alias("lit_col")).show()
Output:
+-----+----+-------+
| name| age|lit_col|
+-----+----+-------+
|Alice|  34|    100|
|  Bob|  45|    100|
|Cathy|null|    100|
+-----+----+-------+

4️⃣ Conditional Logic using when()

df.select("name", "age",
          when(col("age") > 40, "Above 40")
          .otherwise("Below 40").alias("category")
).show()
Output (Cathy's age is null, so the comparison is not true and she falls into the otherwise() branch):
+-----+----+--------+
| name| age|category|
+-----+----+--------+
|Alice|  34|Below 40|
|  Bob|  45|Above 40|
|Cathy|null|Below 40|
+-----+----+--------+

5️⃣ Expression Evaluation using expr()

df.select(expr("age + 5 as age_plus_5")).show()
Output:
+----------+
|age_plus_5|
+----------+
|        39|
|        50|
|      null|
+----------+

6️⃣ Generate Random Numbers with rand()

df.select("name", rand().alias("random_val")).show()
Output (the values are random, so yours will differ):
+-----+------------------+
| name|        random_val|
+-----+------------------+
|Alice|0.6348754580941226|
|  Bob|0.2984509329806971|
|Cathy|0.8883241025348764|
+-----+------------------+

📌 Learn and explore more PySpark examples with real-world data transformation use cases.

👨‍💻 Author: Aamir Shahzad
