PySpark Null & Comparison Functions Explained | between(), isNull(), isin(), like(), rlike(), ilike()

This PySpark tutorial explains how to use essential functions for handling nulls, filtering data, and performing pattern matching in DataFrames using:

  • between()
  • isNull() and isNotNull()
  • isin()
  • like(), rlike(), and ilike()

1. Create a Sample DataFrame

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("NullComparisonOps").getOrCreate()

data = [
    (1, "Aamir", 50000),
    (2, "Ali", None),
    (3, "Bob", 45000),
    (4, "Lisa", 60000),
    (5, "Zara", None),
    (6, "ALINA", 55000)
]

columns = ["id", "name", "salary"]
df = spark.createDataFrame(data, columns)
df.show()
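
For reference, df.show() should print a table like the one below (newer Spark versions render missing values as NULL, older ones as null):

+---+-----+------+
| id| name|salary|
+---+-----+------+
|  1|Aamir| 50000|
|  2|  Ali|  NULL|
|  3|  Bob| 45000|
|  4| Lisa| 60000|
|  5| Zara|  NULL|
|  6|ALINA| 55000|
+---+-----+------+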

2. Use the between() Function

Select employees whose salary is between 45000 and 60000 (both bounds are inclusive):

df.filter(col("salary").between(45000, 60000)).show()

3. Use isNull() and isNotNull()

Filter rows where salary is null:

df.filter(col("salary").isNull()).show()

Filter rows where salary is not null:

df.filter(col("salary").isNotNull()).show()

4. Use the isin() Function

Filter names that are in the list ["Aamir", "Lisa"]:

df.filter(col("name").isin("Aamir", "Lisa")).show()

5. Use like(), rlike(), and ilike()

Names that start with 'A' (like() uses SQL wildcards: % matches any sequence of characters, _ matches exactly one character):

df.filter(col("name").like("A%")).show()

Names matching a regular expression, here all names ending in a lowercase 'a':

df.filter(col("name").rlike(".*a$")).show()

Case-insensitive LIKE (available in Spark 3.3 and later):

df.filter(col("name").ilike("ali%")).show()
