PySpark Date & Time Creation Tutorial
Learn how to use PySpark's date and time creation functions, make_date(), make_timestamp(), and make_interval(), with examples.
Introduction
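In this tutorial, we will explore three essential PySpark functions: make_date(), make_timestamp(), and make_interval(). Each one builds a date, timestamp, or interval value from separate numeric columns, which is useful when source data stores the year, month, day, and time components in separate fields.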
make_date() Function
Definition: The make_date() function in PySpark creates a date from the given year, month, and day columns.
Example:
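from pyspark.sql import SparkSession
from pyspark.sql.functions import make_date

spark = SparkSession.builder.getOrCreate()

# Create DataFrame
data = [
    (2023, 1, 1),
    (2022, 12, 31),
    (2021, 11, 15),
]
df = spark.createDataFrame(data, ["year", "month", "day"])

# Use make_date() to create date
df = df.withColumn("date", make_date("year", "month", "day"))
df.show()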
Output:
+----+-----+---+----------+
|year|month|day|      date|
+----+-----+---+----------+
|2023|    1|  1|2023-01-01|
|2022|   12| 31|2022-12-31|
|2021|   11| 15|2021-11-15|
+----+-----+---+----------+
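The example above uses only valid dates. According to the Spark documentation, when the three values do not form a real calendar date, make_date() returns NULL if spark.sql.ansi.enabled is false and raises an error otherwise. The following is a plain-Python sketch of the NULL-returning case, not Spark code: the helper name make_date_or_none is hypothetical, and it only models the result for a single row.

```python
from datetime import date
from typing import Optional


def make_date_or_none(year: int, month: int, day: int) -> Optional[date]:
    """Hypothetical helper: model make_date() for one row with ANSI mode off."""
    try:
        return date(year, month, day)
    except ValueError:
        # Invalid combinations such as February 30 become None (NULL in Spark).
        return None


rows = [(2023, 1, 1), (2023, 2, 30), (2021, 13, 15)]
for year, month, day in rows:
    print((year, month, day), "->", make_date_or_none(year, month, day))
```

Only the first row yields a date; the other two print None, which corresponds to a NULL cell in the resulting DataFrame column.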
make_timestamp() Function
Definition: The make_timestamp() function in PySpark creates a timestamp from the year, month, day, hour, minute, and second columns.
Example:
from pyspark.sql.functions import make_timestamp

# Create DataFrame (assumes an active SparkSession named spark)
data = [
    (2023, 1, 1, 12, 30, 45),
    (2022, 12, 31, 14, 45, 20),
    (2021, 11, 15, 9, 15, 5),
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])

# Use make_timestamp() to create timestamp
df = df.withColumn(
    "timestamp",
    make_timestamp("year", "month", "day", "hour", "minute", "second"),
)
df.show()
Output:
+----+-----+---+----+------+------+-------------------+
|year|month|day|hour|minute|second|          timestamp|
+----+-----+---+----+------+------+-------------------+
|2023|    1|  1|  12|    30|    45|2023-01-01 12:30:45|
|2022|   12| 31|  14|    45|    20|2022-12-31 14:45:20|
|2021|   11| 15|   9|    15|     5|2021-11-15 09:15:05|
+----+-----+---+----+------+------+-------------------+
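The seconds value does not have to be a whole number: the Spark documentation describes it as the second of the minute plus an optional fraction, down to microseconds. A plain-Python sketch of how such a value maps onto a timestamp is shown below. The name make_timestamp_py is hypothetical, and the code models the arithmetic only, not Spark's implementation or its time zone handling.

```python
from datetime import datetime
from decimal import Decimal


def make_timestamp_py(year, month, day, hour, minute, second):
    """Hypothetical helper: build a timestamp from parts, keeping fractional seconds."""
    # Go through Decimal(str(...)) so that 45.887 is not distorted by binary floats.
    sec = Decimal(str(second))
    whole = int(sec)
    micros = int((sec - whole) * 1_000_000)
    return datetime(year, month, day, hour, minute, whole, micros)


print(make_timestamp_py(2023, 1, 1, 12, 30, 45))      # 2023-01-01 12:30:45
print(make_timestamp_py(2023, 1, 1, 12, 30, 45.887))  # 2023-01-01 12:30:45.887000
```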
make_interval() Function
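Definition: The make_interval() function creates an interval from year, month, week, day, hour, minute, and second values. Its parameters are, in order, years, months, weeks, days, hours, mins, and secs; each one is optional and defaults to 0.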
Example:
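from pyspark.sql.functions import make_interval

# Create DataFrame (assumes an active SparkSession named spark)
data = [
    (1, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0),
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])

# Use make_interval() to create interval.
# Pass the columns by keyword: positionally, the third argument is weeks, not days.
df = df.withColumn(
    "interval",
    make_interval(
        years="year", months="month", days="day",
        hours="hour", mins="minute", secs="second",
    ),
)
df.show()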
Output:
+----+-----+---+----+------+------+--------+
|year|month|day|hour|minute|second|interval|
+----+-----+---+----+------+------+--------+
|   1|    0|  0|   0|     0|     0| 1 years|
|   0|    1|  0|   0|     0|     0|1 months|
|   0|    0|  1|   0|     0|     0|  1 days|
+----+-----+---+----+------+------+--------+
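make_interval() also accepts a weeks value, which comes third in its argument order, so a value meant as days lands in the weeks slot when it is passed by position. As a rough plain-Python model (not Spark code), an interval can be thought of as three counters: months, days, and microseconds, with years folded into months and weeks folded into days. The names IntervalSketch and make_interval_py are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class IntervalSketch(NamedTuple):
    """Hypothetical model of an interval as three independent counters."""
    months: int
    days: int
    microseconds: int


def make_interval_py(years=0, months=0, weeks=0, days=0, hours=0, mins=0, secs=0):
    """Fold the seven inputs into months, days, and microseconds."""
    total_months = years * 12 + months
    total_days = weeks * 7 + days
    total_micros = round(((hours * 60 + mins) * 60 + secs) * 1_000_000)
    return IntervalSketch(total_months, total_days, total_micros)


# A "day" value passed in the third position is read as weeks.
print(make_interval_py(0, 0, 1))  # IntervalSketch(months=0, days=7, microseconds=0)
print(make_interval_py(days=1))   # IntervalSketch(months=0, days=1, microseconds=0)
print(make_interval_py(years=1, hours=2))
```

Naming each argument, as in make_interval_py(days=1), avoids the positional mix-up and keeps the call readable when only some components are needed.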