PySpark Date & Time Creation Tutorial
Learn how to use PySpark's date and time creation functions: make_date(), make_timestamp(), and make_interval() with examples.
Introduction
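In this tutorial, we will explore three PySpark functions, make_date(), make_timestamp(), and make_interval(), which build date, timestamp, and interval values from separate numeric columns. They are useful whenever a dataset stores the parts of a date or time (year, month, day, and so on) in individual columns.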
make_date() Function
Definition: The make_date() function in PySpark is used to create a date from the given year, month, and day columns.
Example:
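from pyspark.sql import SparkSession
from pyspark.sql.functions import make_date  # PySpark 3.3+

spark = SparkSession.builder.getOrCreate()

# Create DataFrame of year, month, and day values
data = [
    (2023, 1, 1),
    (2022, 12, 31),
    (2021, 11, 15)
]
df = spark.createDataFrame(data, ["year", "month", "day"])
# Use make_date() to combine the three columns into a single date column
df = df.withColumn("date", make_date("year", "month", "day"))
df.show()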
Output:
+----+-----+---+----------+
|year|month|day|      date|
+----+-----+---+----------+
|2023|    1|  1|2023-01-01|
|2022|   12| 31|2022-12-31|
|2021|   11| 15|2021-11-15|
+----+-----+---+----------+
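Spark evaluates make_date() row by row, so its behavior is easy to model outside Spark. The sketch below is a plain-Python model, not Spark's implementation, and the helper name make_date_model is hypothetical. It assumes ANSI mode is off (spark.sql.ansi.enabled=false, the default before Spark 4.0), in which case a NULL input or an impossible date such as February 30 or month 13 produces NULL. With ANSI mode on, Spark raises an error instead.

```python
import datetime
from typing import Optional


def make_date_model(year: Optional[int], month: Optional[int],
                    day: Optional[int]) -> Optional[datetime.date]:
    """Hypothetical model of what make_date() returns for one row (ANSI mode off)."""
    # A NULL in any input column gives a NULL date.
    if year is None or month is None or day is None:
        return None
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # Impossible combinations (month 13, February 30, ...) become NULL.
        return None


# The three rows from the example above, plus two invalid rows.
rows = [(2023, 1, 1), (2022, 12, 31), (2021, 11, 15), (2023, 2, 30), (2023, 13, 1)]
for year, month, day in rows:
    print(year, month, day, make_date_model(year, month, day))
```

Running it prints the three valid dates from the table above, followed by None for the two invalid rows.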
make_timestamp() Function
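Definition: The make_timestamp() function in PySpark creates a timestamp from year, month, day, hour, minute, and second columns. The second value may include a fractional part, and an optional timezone argument sets the time zone used to interpret the fields (the session time zone by default).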
Example:
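from pyspark.sql import SparkSession
from pyspark.sql.functions import make_timestamp  # PySpark 3.5+

spark = SparkSession.builder.getOrCreate()

# Create DataFrame of date and time parts
data = [
    (2023, 1, 1, 12, 30, 45),
    (2022, 12, 31, 14, 45, 20),
    (2021, 11, 15, 9, 15, 5)
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])
# Use make_timestamp() to build a timestamp column (interpreted in the session time zone)
df = df.withColumn("timestamp", make_timestamp("year", "month", "day", "hour", "minute", "second"))
df.show()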
Output:
+----+-----+---+----+------+------+-------------------+
|year|month|day|hour|minute|second|          timestamp|
+----+-----+---+----+------+------+-------------------+
|2023|    1|  1|  12|    30|    45|2023-01-01 12:30:45|
|2022|   12| 31|  14|    45|    20|2022-12-31 14:45:20|
|2021|   11| 15|   9|    15|     5|2021-11-15 09:15:05|
+----+-----+---+----+------+------+-------------------+
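The same row-by-row view applies to make_timestamp(). The plain-Python model below (the helper name make_timestamp_model is hypothetical, and it assumes ANSI mode is off) shows two details of the seconds argument: it may carry a fractional part down to microseconds, and, per Spark's SQL documentation, a value of exactly 60 sets the seconds field to 0 and adds one minute. The model ignores time zones, whereas Spark interprets the fields in the session time zone or in the optional timezone argument.

```python
import datetime
from decimal import Decimal
from typing import Optional


def make_timestamp_model(year: int, month: int, day: int, hour: int,
                         minute: int, sec: Decimal) -> Optional[datetime.datetime]:
    """Hypothetical model of what make_timestamp() returns for one row (ANSI mode off)."""
    sec = Decimal(sec)
    whole = int(sec)
    # Keep the fraction down to microseconds, the precision Spark timestamps use.
    micros = int((sec - whole) * 1_000_000)
    add_minute = whole == 60 and micros == 0
    if add_minute:
        # sec == 60: set seconds to 0 and add one minute afterwards.
        whole = 0
    try:
        ts = datetime.datetime(year, month, day, hour, minute, whole, micros)
    except ValueError:
        # Any other out-of-range field becomes NULL in this model.
        return None
    return ts + datetime.timedelta(minutes=1) if add_minute else ts


print(make_timestamp_model(2023, 1, 1, 12, 30, Decimal("45")))         # 2023-01-01 12:30:45
print(make_timestamp_model(2023, 1, 1, 12, 30, Decimal("45.123456")))  # 2023-01-01 12:30:45.123456
print(make_timestamp_model(2022, 12, 31, 23, 59, Decimal("60")))       # 2023-01-01 00:00:00
print(make_timestamp_model(2023, 1, 1, 25, 0, Decimal("0")))           # None
```

Passing the seconds as Decimal strings avoids binary floating-point rounding in the fractional part.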
make_interval() Function
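Definition: The make_interval() function creates an interval from years, months, weeks, days, hours, minutes, and seconds values. Each argument is optional and defaults to zero. Because weeks sits between months and days in the parameter order, pass columns by keyword when you skip it.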
Example:
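from pyspark.sql import SparkSession
from pyspark.sql.functions import make_interval  # PySpark 3.5+

spark = SparkSession.builder.getOrCreate()

# Create DataFrame of interval parts
data = [
    (1, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0)
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])
# Use make_interval() to create an interval. Pass columns by keyword: the positional
# order is years, months, weeks, days, hours, mins, secs, so a positional call would put "day" into weeks
df = df.withColumn("interval", make_interval(years="year", months="month", days="day",
                                             hours="hour", mins="minute", secs="second"))
df.show()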
Output:
+----+-----+---+----+------+------+--------+
|year|month|day|hour|minute|second|interval|
+----+-----+---+----+------+------+--------+
|   1|    0|  0|   0|     0|     0| 1 years|
|   0|    1|  0|   0|     0|     0|1 months|
|   0|    0|  1|   0|     0|     0|  1 days|
+----+-----+---+----+------+------+--------+
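make_interval() takes seven components in the order years, months, weeks, days, hours, mins, secs, so a call that passes six columns positionally puts the day column into weeks. Spark stores a calendar interval as three fields (months, days, and microseconds). The plain-Python model below folds the seven components into those three fields; the names Interval and make_interval_model are hypothetical, and this is a sketch of the arithmetic, not Spark's code.

```python
from decimal import Decimal
from typing import NamedTuple

MICROS_PER_SECOND = 1_000_000


class Interval(NamedTuple):
    """Hypothetical stand-in for a Spark calendar interval value."""
    months: int
    days: int
    microseconds: int


def make_interval_model(years: int = 0, months: int = 0, weeks: int = 0,
                        days: int = 0, hours: int = 0, mins: int = 0,
                        secs: Decimal = Decimal(0)) -> Interval:
    """Fold the seven make_interval() components into three fields."""
    total_months = years * 12 + months   # years are stored as months
    total_days = weeks * 7 + days        # weeks are stored as days
    # Hours, minutes, and seconds are stored together as microseconds.
    micros = int(Decimal(secs) * MICROS_PER_SECOND)
    micros += (hours * 3600 + mins * 60) * MICROS_PER_SECOND
    return Interval(total_months, total_days, micros)


# One year is stored as 12 months, which show() renders as "1 years".
print(make_interval_model(years=1))           # Interval(months=12, days=0, microseconds=0)
# Passed by keyword, days=1 is one day.
print(make_interval_model(days=1))            # Interval(months=0, days=1, microseconds=0)
# Passed positionally, the third value fills weeks: 1 week is 7 days.
print(make_interval_model(0, 0, 1, 0, 0, 0))  # Interval(months=0, days=7, microseconds=0)
```

The last line is the pitfall in miniature: the third row of the example would read "7 days" instead of "1 days" if the columns were passed positionally.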