PySpark String Functions Explained
In this tutorial, you'll learn how to use the PySpark string functions contains(), startswith(), endswith(), and substr(). These functions are useful for filtering, searching, and extracting string data in PySpark DataFrames.
🔹 Sample Data
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.appName("StringFunctionsExample").getOrCreate()
data = [
    (1, "Aamir"),
    (2, "Ali"),
    (3, "Bob"),
    (4, "Lisa"),
    (5, "Zara"),
    (6, "ALINA"),
    (7, "amrita"),
    (8, "Sana")
]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)
df.show()
🔹 contains() Function
Filter rows where name contains "a":
df.filter(col("name").contains("a")).show()
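Note that contains() matches case-sensitively: "ALINA" is not returned even though it contains an uppercase "A". To illustrate the semantics on the same sample names, here is a plain-Python sketch (not Spark itself):

```python
# Plain-Python mirror of col("name").contains("a") on the sample data.
# Spark's Column.contains("a") keeps a row when the lowercase letter "a"
# appears anywhere in the string -- the check is case-sensitive.
names = ["Aamir", "Ali", "Bob", "Lisa", "Zara", "ALINA", "amrita", "Sana"]

matches = [n for n in names if "a" in n]
print(matches)  # ['Aamir', 'Lisa', 'Zara', 'amrita', 'Sana']
```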
🔹 startswith() Function
Filter rows where name starts with "A":
df.filter(col("name").startswith("A")).show()
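startswith() is also case-sensitive, so "amrita" (lowercase "a") is excluded even though it starts with the same letter. A plain-Python sketch of the same check:

```python
# Plain-Python mirror of col("name").startswith("A") -- case-sensitive,
# so "amrita" (lowercase "a") does not match.
names = ["Aamir", "Ali", "Bob", "Lisa", "Zara", "ALINA", "amrita", "Sana"]

matches = [n for n in names if n.startswith("A")]
print(matches)  # ['Aamir', 'Ali', 'ALINA']
```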
🔹 endswith() Function
Filter rows where name ends with "a":
df.filter(col("name").endswith("a")).show()
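Here, "ALINA" is filtered out because it ends with an uppercase "A", not a lowercase "a". The same logic in plain Python, for illustration:

```python
# Plain-Python mirror of col("name").endswith("a") -- "ALINA" ends with an
# uppercase "A", so the case-sensitive match drops it.
names = ["Aamir", "Ali", "Bob", "Lisa", "Zara", "ALINA", "amrita", "Sana"]

matches = [n for n in names if n.endswith("a")]
print(matches)  # ['Lisa', 'Zara', 'amrita', 'Sana']
```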
🔹 substr() Function
Extract first two characters from name:
df.withColumn("first_two", col("name").substr(1, 2)).show()
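Unlike Python slicing, Column.substr(startPos, length) takes a 1-based start position and a length, so substr(1, 2) means "start at character 1 and take 2 characters". In plain Python that corresponds to s[0:2], as this sketch on the sample names shows:

```python
# substr(1, 2) in Spark == s[0:2] in Python: 1-based start, then a length.
names = ["Aamir", "Ali", "Bob", "Lisa", "Zara", "ALINA", "amrita", "Sana"]

first_two = [n[0:2] for n in names]  # mirrors col("name").substr(1, 2)
print(first_two)  # ['Aa', 'Al', 'Bo', 'Li', 'Za', 'AL', 'am', 'Sa']
```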


