Date Calculations in PySpark | Add, Subtract, Datediff, Months Between & More | PySpark Tutorial

Date Calculations in PySpark | Add, Subtract, DateDiff & More

Date Calculations in PySpark

This tutorial explains how to use PySpark functions to calculate dates, such as adding/subtracting days or months, calculating date difference, and more.

📘 Initialization

from pyspark.sql import SparkSession
from pyspark.sql.functions import (
    col, to_date, add_months, date_add, date_sub, datediff,
    months_between, next_day, current_date
)

spark = SparkSession.builder.appName("DateArithmeticDemo").getOrCreate()

📘 Sample Data

sample_data = [
    ("Aamir", "2023-12-15"),
    ("Sara", "2024-01-01"),
    ("John", "2024-03-20")
]

df = spark.createDataFrame(sample_data, ["name", "start_date"])
df = df.withColumn("start_date", to_date(col("start_date"), "yyyy-MM-dd"))
df.show()
+------+-----------+
| name |start_date |
+------+-----------+
|Aamir |2023-12-15 |
|Sara  |2024-01-01 |
|John  |2024-03-20 |
+------+-----------+

➕ Add Months

df.select("name", "start_date", add_months(col("start_date"), 2).alias("plus_2_months")).show()
+------+-----------+-------------+
| name |start_date |plus_2_months|
+------+-----------+-------------+
| Aamir|2023-12-15 |2024-02-15   |
| Sara |2024-01-01 |2024-03-01   |
| John |2024-03-20 |2024-05-20   |
+------+-----------+-------------+

➕ Add 10 Days

df.select("name", "start_date", date_add(col("start_date"), 10).alias("plus_10_days")).show()
+------+-----------+-------------+
| name |start_date |plus_10_days |
+------+-----------+-------------+
| Aamir|2023-12-15 |2023-12-25   |
| Sara |2024-01-01 |2024-01-11   |
| John |2024-03-20 |2024-03-30   |
+------+-----------+-------------+

➖ Subtract 5 Days

df.select("name", "start_date", date_sub(col("start_date"), 5).alias("minus_5_days")).show()
+------+-----------+-------------+
| name |start_date |minus_5_days |
+------+-----------+-------------+
| Aamir|2023-12-15 |2023-12-10   |
| Sara |2024-01-01 |2023-12-27   |
| John |2024-03-20 |2024-03-15   |
+------+-----------+-------------+

📏 Difference Between Two Dates

df.select("name", "start_date", datediff(current_date(), col("start_date")).alias("days_diff")).show()
+------+-----------+----------+
| name |start_date |days_diff|
+------+-----------+----------+
| Aamir|2023-12-15 |115       |
| Sara |2024-01-01 |98        |
| John |2024-03-20 |19        |
+------+-----------+----------+

📆 Months Between Two Dates

df.select("name", "start_date", months_between(current_date(), col("start_date")).alias("months_diff")).show()
+------+-----------+------------+
| name |start_date |months_diff |
+------+-----------+------------+
| Aamir|2023-12-15 |3.8         |
| Sara |2024-01-01 |3.2         |
| John |2024-03-20 |0.6         |
+------+-----------+------------+

🔜 Get Next Specific Day

df.select("name", "start_date", next_day(col("start_date"), "Tuesday").alias("next_Tuesday")).show()
+------+-----------+--------------+
| name |start_date |next_Tuesday  |
+------+-----------+--------------+
| Aamir|2023-12-15 |2023-12-19    |
| Sara |2024-01-01 |2024-01-02    |
| John |2024-03-20 |2024-03-26    |
+------+-----------+--------------+

🎥 Watch the Full Tutorial

© 2025 Aamir Shahzad | PySpark Tutorials

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.