102 - PySpark Date & Time Creation Tutorial | make_date(), make_timestamp(), make_interval() Explained with Examples

Learn how to use PySpark's date and time creation functions: make_date(), make_timestamp(), and make_interval() with examples.

Introduction

In this tutorial, we will explore three PySpark functions: make_date(), make_timestamp(), and make_interval(). Each one builds a date, timestamp, or interval value from separate numeric columns such as year, month, and day.

make_date() Function

Definition: The make_date() function in PySpark is used to create a date from the given year, month, and day columns.

Example:

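from pyspark.sql import SparkSession
from pyspark.sql.functions import make_date  # available since PySpark 3.3

# Reuse the active SparkSession, or start one
spark = SparkSession.builder.getOrCreate()

# Create DataFrame
data = [
    (2023, 1, 1),
    (2022, 12, 31),
    (2021, 11, 15)
]
df = spark.createDataFrame(data, ["year", "month", "day"])

# Use make_date() to create date
df = df.withColumn("date", make_date("year", "month", "day"))
df.show()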

Output:

+----+-----+---+----------+
|year|month|day|      date|
+----+-----+---+----------+
|2023|    1|  1|2023-01-01|
|2022|   12| 31|2022-12-31|
|2021|   11| 15|2021-11-15|
+----+-----+---+----------+
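
It can help to see what make_date() does to a single row. The sketch below is plain Python, not Spark code: make_date_py is a hypothetical helper written only for illustration, and the third row is an invented invalid date. It mirrors the documented behavior that invalid input yields NULL when spark.sql.ansi.enabled is false (with ANSI mode on, Spark raises an error instead).

```python
from datetime import date
from typing import Optional


def make_date_py(year: int, month: int, day: int) -> Optional[date]:
    """Plain-Python stand-in for one row of make_date() (hypothetical helper)."""
    try:
        return date(year, month, day)
    except ValueError:
        # Assumption: non-ANSI mode, where Spark returns NULL for invalid input.
        return None


# The last row (February 30) is an invented invalid date.
rows = [(2023, 1, 1), (2022, 12, 31), (2023, 2, 30)]
dates = [make_date_py(*row) for row in rows]
print(dates)
```

The first two rows become date objects, and the invalid row becomes None, the Python counterpart of a NULL in the date column.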
        

make_timestamp() Function

Definition: The make_timestamp() function in PySpark is used to create a timestamp from the year, month, day, hour, minute, and second columns.

Example:

# make_timestamp() is available in pyspark.sql.functions since PySpark 3.5
from pyspark.sql.functions import make_timestamp

# Create DataFrame
data = [
    (2023, 1, 1, 12, 30, 45),
    (2022, 12, 31, 14, 45, 20),
    (2021, 11, 15, 9, 15, 5)
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])

# Use make_timestamp() to create timestamp
df = df.withColumn("timestamp", make_timestamp("year", "month", "day", "hour", "minute", "second"))
df.show()
        

Output:

+----+-----+---+----+------+------+-------------------+
|year|month|day|hour|minute|second|          timestamp|
+----+-----+---+----+------+------+-------------------+
|2023|    1|  1|  12|    30|    45|2023-01-01 12:30:45|
|2022|   12| 31|  14|    45|    20|2022-12-31 14:45:20|
|2021|   11| 15|   9|    15|     5|2021-11-15 09:15:05|
+----+-----+---+----+------+------+-------------------+
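
The seconds argument of make_timestamp() is documented as accepting a fraction (for example 13.123) and the special value 60, which sets the seconds field to 0 and adds one minute. The plain-Python sketch below imitates that rule for a single row. It is not Spark code: make_timestamp_py is a hypothetical helper, and it ignores time zones, which Spark resolves through the session time zone.

```python
from datetime import datetime, timedelta
from decimal import Decimal


def make_timestamp_py(year, month, day, hour, minute, sec):
    """Plain-Python stand-in for one row of make_timestamp() (hypothetical helper)."""
    sec = Decimal(str(sec))
    whole = int(sec)                           # whole seconds
    micros = int((sec - whole) * 1_000_000)    # fractional part as microseconds
    if whole == 60:
        if micros != 0:
            raise ValueError("the fraction of sec must be zero when sec is 60")
        # sec == 60 rolls over: seconds become 0 and one minute is added.
        return datetime(year, month, day, hour, minute, 0) + timedelta(minutes=1)
    return datetime(year, month, day, hour, minute, whole, micros)


with_fraction = make_timestamp_py(2023, 1, 1, 12, 30, 45.5)
rolled_over = make_timestamp_py(2023, 1, 1, 12, 30, 60)
print(with_fraction, rolled_over)
```

Using Decimal here avoids binary floating-point noise when splitting the fraction from the whole seconds.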
        

make_interval() Function

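Definition: The make_interval() function is used to create an interval from years, months, weeks, days, hours, minutes, and seconds values. Any argument that is omitted defaults to 0.

Example:

# make_interval() is available in pyspark.sql.functions since PySpark 3.5
from pyspark.sql.functions import make_interval

# Create DataFrame
data = [
    (1, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0)
]
df = spark.createDataFrame(data, ["year", "month", "day", "hour", "minute", "second"])

# Use make_interval() to create interval.
# The positional order is years, months, weeks, days, hours, mins, secs,
# so keyword arguments are used here to skip weeks and map "day" to days.
df = df.withColumn("interval", make_interval(years="year", months="month", days="day", hours="hour", mins="minute", secs="second"))
df.show()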

Output:

+----+-----+---+----+------+------+--------+
|year|month|day|hour|minute|second|interval|
+----+-----+---+----+------+------+--------+
|   1|    0|  0|   0|     0|     0| 1 years|
|   0|    1|  0|   0|     0|     0|1 months|
|   0|    0|  1|   0|     0|     0|  1 days|
+----+-----+---+----+------+------+--------+
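
Spark's make_interval() takes seven arguments in the order years, months, weeks, days, hours, mins, secs, and a calendar interval is stored as three fields: months, days, and microseconds. The plain-Python sketch below shows how the seven inputs fold into those three fields. It is an illustration only: CalendarIntervalPy and make_interval_py are hypothetical names, not part of PySpark.

```python
from typing import NamedTuple


class CalendarIntervalPy(NamedTuple):
    """Hypothetical mirror of the three fields of a Spark calendar interval."""
    months: int
    days: int
    microseconds: int


def make_interval_py(years=0, months=0, weeks=0, days=0, hours=0, mins=0, secs=0):
    """Plain-Python stand-in for one row of make_interval() (hypothetical helper)."""
    return CalendarIntervalPy(
        months=years * 12 + months,   # years are folded into months
        days=weeks * 7 + days,        # weeks are folded into days
        microseconds=round(((hours * 60 + mins) * 60 + secs) * 1_000_000),
    )


one_year = make_interval_py(years=1)
third_positional = make_interval_py(0, 0, 1)   # third position is weeks, not days
one_day = make_interval_py(days=1)
print(one_year, third_positional, one_day)
```

Note that the third positional argument is weeks, so passing a day count there produces seven times as many days; keyword arguments avoid that mix-up.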
        

Thank you for reading! Stay tuned for more PySpark tutorials.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.