Mastering PySpark Map Functions
In this tutorial, you'll learn how to use key PySpark map functions, including create_map(), map_keys(), map_values(), map_concat(), and map_contains_key(), with practical examples and real outputs.
🔧 Step 1: Initialize Spark
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession for this tutorial
spark = SparkSession.builder.appName("MapFunctions").getOrCreate()
📦 Step 2: Create DataFrame with map column
from pyspark.sql.functions import create_map, lit

# Sample data: one row per person with a country and a language
data = [
    ("Aamir", "USA", "English"),
    ("Sara", "Canada", "French"),
    ("John", "UK", "English"),
    ("Lina", "Mexico", "Spanish")
]

df = spark.createDataFrame(data, ["name", "country", "language"])
df.show()
Output:
+-----+-------+--------+
| name|country|language|
+-----+-------+--------+
|Aamir|    USA| English|
| Sara| Canada|  French|
| John|     UK| English|
| Lina| Mexico| Spanish|
+-----+-------+--------+
🧱 create_map()
Builds a map column from an even number of arguments interpreted as alternating key-value pairs (key1, value1, key2, value2, ...). In this example the keys and values are all literals, so every row gets the same map.
df_map = df.select("name", create_map(
    lit("country"), lit("USA"),
    lit("language"), lit("English")
).alias("new_map"))
df_map.show(truncate=False)
Output:
+-----+-------------------------------------+
|name |new_map                              |
+-----+-------------------------------------+
|Aamir|{country -> USA, language -> English}|
|Sara |{country -> USA, language -> English}|
|John |{country -> USA, language -> English}|
|Lina |{country -> USA, language -> English}|
+-----+-------------------------------------+
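Because the example above uses lit() for the values, the map is identical for every row. To build each row's map from its own columns, pass col() expressions as the values instead. Here's a minimal sketch (the df_row_map and profile_map names are just illustrative):
from pyspark.sql.functions import col

# Use column references as values so each row's map reflects its own data
df_row_map = df.select("name", create_map(
    lit("country"), col("country"),
    lit("language"), col("language")
).alias("profile_map"))
df_row_map.show(truncate=False)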
🗝 map_keys()
Returns an array of all keys from a map.
from pyspark.sql.functions import map_keys
df_keys = df_map.select("name", map_keys("new_map").alias("keys"))
df_keys.show(truncate=False)
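Expected output:
+-----+-------------------+
|name |keys               |
+-----+-------------------+
|Aamir|[country, language]|
|Sara |[country, language]|
|John |[country, language]|
|Lina |[country, language]|
+-----+-------------------+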
📥 map_values()
Returns an array of all values from a map.
from pyspark.sql.functions import map_values
df_values = df_map.select("name", map_values("new_map").alias("values"))
df_values.show(truncate=False)
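Expected output:
+-----+--------------+
|name |values        |
+-----+--------------+
|Aamir|[USA, English]|
|Sara |[USA, English]|
|John |[USA, English]|
|Lina |[USA, English]|
+-----+--------------+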
🔁 map_concat()
Concatenates two or more maps into one.
from pyspark.sql.functions import map_concat
df_concat = df.select("name", map_concat(
    create_map(lit("status"), lit("active")),
    create_map(lit("region"), lit("east"))
).alias("concatenated_map"))
df_concat.show(truncate=False)
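Expected output:
+-----+----------------------------------+
|name |concatenated_map                  |
+-----+----------------------------------+
|Aamir|{status -> active, region -> east}|
|Sara |{status -> active, region -> east}|
|John |{status -> active, region -> east}|
|Lina |{status -> active, region -> east}|
+-----+----------------------------------+
Note that if the same key appears in more than one input map, Spark 3.x throws an error by default; set spark.sql.mapKeyDedupPolicy to LAST_WIN to keep the last value instead.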
🔍 map_contains_key()
Checks whether a key exists in a map (available in PySpark 3.4+).
from pyspark.sql.functions import map_contains_key
df_contains_key = df_map.select("name", map_contains_key("new_map", lit("country")).alias("has_country"))
df_contains_key.show(truncate=False)
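Expected output:
+-----+-----------+
|name |has_country|
+-----+-----------+
|Aamir|true       |
|Sara |true       |
|John |true       |
|Lina |true       |
+-----+-----------+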