PySpark String Functions Explained
In this tutorial, you'll learn how to use PySpark string functions such as contains(), startswith(), endswith(), and substr(). These functions are helpful for filtering, searching, and extracting string data in PySpark DataFrames.
🔹 Sample Data
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Create (or reuse) a SparkSession
spark = SparkSession.builder.appName("StringFunctionsExample").getOrCreate()

# Sample rows with a mix of upper- and lower-case names
data = [
    (1, "Aamir"),
    (2, "Ali"),
    (3, "Bob"),
    (4, "Lisa"),
    (5, "Zara"),
    (6, "ALINA"),
    (7, "amrita"),
    (8, "Sana")
]
columns = ["id", "name"]

df = spark.createDataFrame(data, columns)
df.show()
🔹 contains() Function
Filter rows where name contains the letter "a". Note that contains() is case-sensitive, so "ALINA" does not match:
df.filter(col("name").contains("a")).show()
🔹 startswith() Function
Filter rows where name starts with "A" (also case-sensitive):
df.filter(col("name").startswith("A")).show()
🔹 endswith() Function
Filter rows where name ends with "a":
df.filter(col("name").endswith("a")).show()
🔹 substr() Function
Extract the first two characters of name. substr(startPos, length) uses a 1-based start position:
df.withColumn("first_two", col("name").substr(1, 2)).show()