Date Calculations in PySpark
This tutorial explains how to use PySpark functions to calculate dates, such as adding/subtracting days or months, calculating date difference, and more.
📘 Initialization
from pyspark.sql import SparkSession
from pyspark.sql.functions import (
col, to_date, add_months, date_add, date_sub, datediff,
months_between, next_day, current_date
)
spark = SparkSession.builder.appName("DateArithmeticDemo").getOrCreate()
📘 Sample Data
sample_data = [
("Aamir", "2023-12-15"),
("Sara", "2024-01-01"),
("John", "2024-03-20")
]
df = spark.createDataFrame(sample_data, ["name", "start_date"])
df = df.withColumn("start_date", to_date(col("start_date"), "yyyy-MM-dd"))
df.show()
+------+-----------+
| name |start_date |
+------+-----------+
|Aamir |2023-12-15 |
|Sara |2024-01-01 |
|John |2024-03-20 |
+------+-----------+
➕ Add Months
df.select("name", "start_date", add_months(col("start_date"), 2).alias("plus_2_months")).show()
+------+-----------+-------------+
| name |start_date |plus_2_months|
+------+-----------+-------------+
| Aamir|2023-12-15 |2024-02-15 |
| Sara |2024-01-01 |2024-03-01 |
| John |2024-03-20 |2024-05-20 |
+------+-----------+-------------+
➕ Add 10 Days
df.select("name", "start_date", date_add(col("start_date"), 10).alias("plus_10_days")).show()
+------+-----------+-------------+
| name |start_date |plus_10_days |
+------+-----------+-------------+
| Aamir|2023-12-15 |2023-12-25 |
| Sara |2024-01-01 |2024-01-11 |
| John |2024-03-20 |2024-03-30 |
+------+-----------+-------------+
➖ Subtract 5 Days
df.select("name", "start_date", date_sub(col("start_date"), 5).alias("minus_5_days")).show()
+------+-----------+-------------+
| name |start_date |minus_5_days |
+------+-----------+-------------+
| Aamir|2023-12-15 |2023-12-10 |
| Sara |2024-01-01 |2023-12-27 |
| John |2024-03-20 |2024-03-15 |
+------+-----------+-------------+
📏 Difference Between Two Dates
df.select("name", "start_date", datediff(current_date(), col("start_date")).alias("days_diff")).show()
+------+-----------+----------+
| name |start_date |days_diff|
+------+-----------+----------+
| Aamir|2023-12-15 |115 |
| Sara |2024-01-01 |98 |
| John |2024-03-20 |19 |
+------+-----------+----------+
📆 Months Between Two Dates
df.select("name", "start_date", months_between(current_date(), col("start_date")).alias("months_diff")).show()
+------+-----------+------------+
| name |start_date |months_diff |
+------+-----------+------------+
| Aamir|2023-12-15 |3.8 |
| Sara |2024-01-01 |3.2 |
| John |2024-03-20 |0.6 |
+------+-----------+------------+
🔜 Get Next Specific Day
df.select("name", "start_date", next_day(col("start_date"), "Tuesday").alias("next_Tuesday")).show()
+------+-----------+--------------+
| name |start_date |next_Tuesday |
+------+-----------+--------------+
| Aamir|2023-12-15 |2023-12-19 |
| Sara |2024-01-01 |2024-01-02 |
| John |2024-03-20 |2024-03-26 |
+------+-----------+--------------+
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.