Transforming Arrays and Maps in PySpark
This tutorial explains three higher-order functions in PySpark for manipulating array columns (their map counterparts are noted briefly at the end):
transform()
filter()
zip_with()
All three are available as DataFrame API functions from PySpark 3.1 onward.
Sample Data Setup
from pyspark.sql import SparkSession
from pyspark.sql.functions import transform, filter, zip_with
spark = SparkSession.builder.appName("TransformFilterZipWith").getOrCreate()
# Each row holds a name and an array of integers
data = [("John", [1, 2, 3, 4]), ("Lisa", [10, 20, 30]), ("Aamir", [5, 15, 25])]
df = spark.createDataFrame(data, ["name", "numbers"])
df.show()
Output:
+-----+------------+
| name|     numbers|
+-----+------------+
| John|[1, 2, 3, 4]|
| Lisa|[10, 20, 30]|
|Aamir| [5, 15, 25]|
+-----+------------+
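These higher-order functions operate on array columns; printSchema() can be used to confirm that numbers was inferred as an array of longs:
df.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- numbers: array (nullable = true)
#  |    |-- element: long (containsNull = true)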
1️⃣ transform()
Definition: Applies a function to every element of an array and returns a new array containing the results.
df.select("name", transform("numbers", lambda x: x * 2).alias("transformed")).show()
Output:
+-----+------------+
| name| transformed|
+-----+------------+
| John|[2, 4, 6, 8]|
| Lisa|[20, 40, 60]|
|Aamir|[10, 30, 50]|
+-----+------------+
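transform() also accepts a two-argument lambda, in which case the second argument is the element's 0-based index. A minimal sketch (the alias scaled_by_index is only illustrative):
# Multiply each element by its position in the array
df.select("name", transform("numbers", lambda x, i: x * i).alias("scaled_by_index")).show()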
2️⃣ filter()
Definition: Keeps only the array elements for which the given condition returns true. (Note that importing filter from pyspark.sql.functions shadows Python's built-in filter in this script.)
df.select("name", filter("numbers", lambda x: x > 10).alias("filtered")).show()
Output:
+-----+--------+
| name|filtered|
+-----+--------+
| John|      []|
| Lisa|[20, 30]|
|Aamir|[15, 25]|
+-----+--------+
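For reference, the same filter can also be written as a SQL lambda via expr(); this form is equivalent to the DataFrame API call above and is the way to use these higher-order functions on Spark versions before 3.1, where they exist only as SQL expressions:
from pyspark.sql.functions import expr
# Equivalent SQL lambda syntax for the same filter
df.select("name", expr("filter(numbers, x -> x > 10)").alias("filtered")).show()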
3️⃣ zip_with()
Definition: Combines two arrays element-wise using a binary function. If the arrays have different lengths, the shorter one is padded with null before the function is applied.
from pyspark.sql.functions import lit, array
# Add a constant four-element array to combine with each row's numbers
df2 = df.withColumn("multiplier", array(lit(10), lit(20), lit(30), lit(40)))
df2.select("name", zip_with("numbers", "multiplier", lambda x, y: x + y).alias("zipped")).show()
Output:
+-----+------------------+
| name|            zipped|
+-----+------------------+
| John|  [11, 22, 33, 44]|
| Lisa|[20, 40, 60, null]|
|Aamir|[15, 35, 55, null]|
+-----+------------------+
John's four-element array lines up with the multiplier exactly, while Lisa's and Aamir's three-element arrays are padded with null in the fourth position, so x + y yields null there.
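If the null padding is not wanted, the lambda can substitute a default value; a minimal sketch, assuming 0 is an acceptable fill:
from pyspark.sql.functions import coalesce, lit
# Treat missing elements of the shorter array as 0 instead of null
df2.select("name", zip_with("numbers", "multiplier", lambda x, y: coalesce(x, lit(0)) + coalesce(y, lit(0))).alias("zipped")).show()
The map counterparts follow the same higher-order pattern: transform_keys(), transform_values(), and map_filter() each take a lambda over the key and value. A minimal sketch, assuming a small map column named scores that is not part of the sample data above:
from pyspark.sql.functions import transform_values, map_filter
# Hypothetical DataFrame with a map column: subject -> score
map_df = spark.createDataFrame([("John", {"math": 80, "science": 95})], ["name", "scores"])
# Add 5 bonus points to every value
map_df.select("name", transform_values("scores", lambda k, v: v + 5).alias("boosted")).show(truncate=False)
# Keep only entries with a score above 90
map_df.select("name", map_filter("scores", lambda k, v: v > 90).alias("top")).show(truncate=False)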