Top PySpark Built-in DataFrame Functions Explained
In this tutorial, we walk through some of the most frequently used PySpark built-in functions, such as col(), lit(), when(), expr(), rand(), and more.
1️⃣ Setup Spark Session
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, expr, when, rand
spark = SparkSession.builder.appName("BuiltinFunctionsDemo").getOrCreate()
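If you are running the example locally rather than on a cluster, you may want to pin the master URL explicitly and stop the session when you are done; a minimal sketch, assuming a local standalone run:
from pyspark.sql import SparkSession

# Local run using all available cores (assumed setup; adjust for your cluster)
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("BuiltinFunctionsDemo")
    .getOrCreate()
)
# ... run the examples below ...
# spark.stop()  # release resources when finished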
2️⃣ Create Sample DataFrame
data = [
    ("Alice", 34),
    ("Bob", 45),
    ("Cathy", None),
]
df = spark.createDataFrame(data, ["name", "age"])
df.show()
Output:
+-----+----+
| name| age|
+-----+----+
|Alice| 34|
| Bob| 45|
|Cathy|null|
+-----+----+
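If you prefer not to rely on schema inference, you can pass an explicit schema instead of a list of column names; a minimal sketch using pyspark.sql.types:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema: column names, types, and nullability are spelled out instead of inferred
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.createDataFrame(data, schema)
df.printSchema()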
3️⃣ Using col() and lit()
df.select(col("name"), col("age"), lit(100).alias("lit_col")).show()
Output:
+-----+----+--------+
| name| age|lit_col |
+-----+----+--------+
|Alice| 34| 100|
| Bob| 45| 100|
|Cathy|null| 100|
+-----+----+--------+
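Because col() returns an ordinary column expression, it composes with arithmetic and with lit() constants. A minimal sketch adding a derived column (the column name age_plus_bonus is just illustrative):
# Combine a column reference with a literal constant; Cathy's null age stays null
df.withColumn("age_plus_bonus", col("age") + lit(5)).show()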
4️⃣ Conditional Logic using when()
df.select("name", "age",
when(col("age") > 40, "Above 40")
.otherwise("Below 40").alias("category")
).show()
Output:
+-----+----+---------+
| name| age| category|
+-----+----+---------+
|Alice| 34| Below 40|
| Bob| 45| Above 40|
|Cathy|null| Below 40|
+-----+----+---------+
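Note that Cathy's null age does not satisfy the comparison, so it falls through to otherwise() and is labelled "Below 40". If you want missing values handled separately, when() calls can be chained and isNull() checked first; a minimal sketch (the labels "Unknown" and "40 or below" are just illustrative):
# Chained branches are evaluated in order; nulls get their own label here
df.select(
    "name",
    when(col("age").isNull(), "Unknown")
    .when(col("age") > 40, "Above 40")
    .otherwise("40 or below")
    .alias("category"),
).show()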
5️⃣ Expression Evaluation using expr()
df.select(expr("age + 5 as age_plus_5")).show()
Output:
+----------+
|age_plus_5|
+----------+
|        39|
|        50|
|      null|
+----------+
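expr() accepts any SQL expression string, so it is not limited to arithmetic; built-in SQL functions and predicates work as well. A minimal sketch (the alias name_upper is just illustrative):
# A SQL function call inside select(), and a SQL predicate inside filter()
df.select("name", expr("upper(name) AS name_upper")).show()
df.filter(expr("age IS NOT NULL AND age > 40")).show()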
6️⃣ Generate Random Numbers with rand()
df.select("name", rand().alias("random_val")).show()
Output (your values will differ on each run):
+-----+------------------+
| name| random_val|
+-----+------------------+
|Alice|0.6348754580941226|
| Bob|0.2984509329806971|
|Cathy|0.8883241025348764|
+-----+------------------+
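rand() is non-deterministic, so the column is recomputed with different values each time unless you pass a seed; a minimal sketch with a fixed seed (42 chosen arbitrarily):
# A seeded rand() produces reproducible pseudo-random values for the same plan
df.select("name", rand(seed=42).alias("random_val")).show()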