Mastering PySpark Map Functions: create_map(), map_keys(), map_concat(), map_values() | PySpark Tutorial

In this tutorial, you'll learn how to use key PySpark map functions, including create_map(), map_keys(), map_values(), map_concat(), and map_contains_key(), with practical examples and real outputs.

🔧 Step 1: Initialize Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("MapFunctions").getOrCreate()

📦 Step 2: Create DataFrame with map column

from pyspark.sql.functions import create_map, lit

data = [
    ("Aamir", "USA", "English"),
    ("Sara", "Canada", "French"),
    ("John", "UK", "English"),
    ("Lina", "Mexico", "Spanish")
]

df = spark.createDataFrame(data, ["name", "country", "language"])
df.show()
Output:
+-----+-------+--------+
| name|country|language|
+-----+-------+--------+
|Aamir|    USA| English|
| Sara| Canada|  French|
| John|     UK| English|
| Lina| Mexico| Spanish|
+-----+-------+--------+

🧱 create_map()

Creates a new map from key-value pairs.

df_map = df.select("name", create_map(
    lit("country"), lit("USA"),
    lit("language"), lit("English")
).alias("new_map"))
df_map.show(truncate=False)
Output:
+-----+-------------------------------------+
|name |new_map                              |
+-----+-------------------------------------+
|Aamir|{country -> USA, language -> English}|
|Sara |{country -> USA, language -> English}|
|John |{country -> USA, language -> English}|
|Lina |{country -> USA, language -> English}|
+-----+-------------------------------------+

🗝 map_keys()

Returns an array of all keys from a map.

from pyspark.sql.functions import map_keys

df_keys = df_map.select("name", map_keys("new_map").alias("keys"))
df_keys.show(truncate=False)

📥 map_values()

Returns an array of all values from a map.

from pyspark.sql.functions import map_values

df_values = df_map.select("name", map_values("new_map").alias("values"))
df_values.show(truncate=False)

🔁 map_concat()

Concatenates two or more maps into one.

from pyspark.sql.functions import map_concat

df_concat = df.select("name", map_concat(
    create_map(lit("status"), lit("active")),
    create_map(lit("region"), lit("east"))
).alias("concatenated_map"))
df_concat.show(truncate=False)

🔍 map_contains_key()

Checks whether a key exists in a map.

from pyspark.sql.functions import map_contains_key

df_contains_key = df_map.select("name", map_contains_key("new_map", lit("country")).alias("has_country"))
df_contains_key.show(truncate=False)

© 2025 Aamir Shahzad | PySpark Tutorials
