PySpark Date Formatting & Conversion Tutorial | to_date(), to_timestamp(), unix_timestamp(), from_unixtime() | PySpark Tutorial

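Learn how to parse date and timestamp strings and convert to and from Unix timestamps in PySpark, using to_date(), to_timestamp(), to_timestamp_ltz(), to_timestamp_ntz(), from_unixtime(), and unix_timestamp(). The to_timestamp_ltz() and to_timestamp_ntz() steps require PySpark 3.5 or later.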

Step 1: Sample Data Setup

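from pyspark.sql import SparkSession
from pyspark.sql.functions import (to_date, to_timestamp, to_timestamp_ltz,
    to_timestamp_ntz, from_unixtime, unix_timestamp)

spark = SparkSession.builder.getOrCreate()  # reuses the active session if there is one

data = [
    ("Aamir", "2025-04-07", "2025-04-07 15:30:00", 1744049400),
    ("Sara", "2024-12-25", "2024-12-25 09:00:00", 1735126800)
]

df = spark.createDataFrame(data, ["name", "date_str", "timestamp_str", "unix_ts"])
df.show()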

Output:

+-----+----------+-------------------+----------+
| name|  date_str|      timestamp_str|   unix_ts|
+-----+----------+-------------------+----------+
|Aamir|2025-04-07|2025-04-07 15:30:00|1744049400|
| Sara|2024-12-25|2024-12-25 09:00:00|1735126800|
+-----+----------+-------------------+----------+

Step 2: Convert to DateType

df.select("name", to_date("date_str", "yyyy-MM-dd").alias("date_converted")).show()

Output:

+-----+--------------+
| name|date_converted|
+-----+--------------+
|Aamir|    2025-04-07|
| Sara|    2024-12-25|
+-----+--------------+

Step 3: Convert to TimestampType

df.select("name", to_timestamp("timestamp_str", "yyyy-MM-dd HH:mm:ss").alias("ts_converted")).show()

Output:

+-----+-------------------+
| name|       ts_converted|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+

Step 4: Convert to Timestamp with Local Timezone

df.select("name", to_timestamp_ltz("timestamp_str").alias("ts_ltz")).show()

Output:

+-----+-------------------+
| name|             ts_ltz|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+

Step 5: Convert to Timestamp without Timezone

df.select("name", to_timestamp_ntz("timestamp_str").alias("ts_ntz")).show()

Output:

+-----+-------------------+
| name|             ts_ntz|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+
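
The printed values in Steps 4 and 5 look identical, but the column types differ: to_timestamp_ltz() yields a timestamp that represents an instant and is rendered in the session time zone, while to_timestamp_ntz() yields a wall-clock value with no zone attached. The sketch below is a plain-Python analogy built on the standard datetime module, not Spark code; the UTC and UTC+05:00 zones are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

raw = "2025-04-07 15:30:00"
fmt = "%Y-%m-%d %H:%M:%S"

# NTZ analogy: a naive datetime is just a wall-clock reading.
ntz = datetime.strptime(raw, fmt)

# LTZ analogy: the string is interpreted in a zone (UTC assumed here),
# which pins it to a single instant on the timeline.
ltz = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)

# Viewing that instant from another zone changes the displayed clock time.
plus_five = timezone(timedelta(hours=5))  # illustrative fixed offset
ltz_elsewhere = ltz.astimezone(plus_five)

print(ntz.strftime(fmt))            # 2025-04-07 15:30:00
print(ltz_elsewhere.strftime(fmt))  # 2025-04-07 20:30:00
print(ltz == ltz_elsewhere)         # True: same instant, different rendering
```

In Spark, calling printSchema() on the results of Steps 4 and 5 is a simple way to see that the two columns carry different types.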

Step 6: Convert from Unix Timestamp

df.select("name", from_unixtime("unix_ts").alias("from_unix_ts")).show()

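Output (with the session time zone, spark.sql.session.timeZone, set to UTC; other settings shift the displayed time):

+-----+-------------------+
| name|       from_unix_ts|
+-----+-------------------+
|Aamir|2025-04-07 18:10:00|
| Sara|2024-12-25 11:40:00|
+-----+-------------------+

Note that the sample unix_ts values correspond to 18:10:00 and 11:40:00 UTC, not to the times stored in timestamp_str.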
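
Because from_unixtime() renders epoch seconds in the session time zone, it can help to sanity-check the expected string outside Spark. The following is a minimal sketch in plain Python, assuming UTC; epoch_to_utc_string is a hypothetical helper name, not a PySpark function.

```python
from datetime import datetime, timezone


def epoch_to_utc_string(seconds: int) -> str:
    """Render epoch seconds as 'yyyy-MM-dd HH:mm:ss' in UTC (hypothetical helper)."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Same unix_ts values as the sample data in Step 1.
for name, unix_ts in [("Aamir", 1744049400), ("Sara", 1735126800)]:
    print(name, epoch_to_utc_string(unix_ts))
# Aamir 2025-04-07 18:10:00
# Sara 2024-12-25 11:40:00
```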

Step 7: Convert Timestamp to Unix Timestamp

df.select("name", unix_timestamp("timestamp_str").alias("unix_time")).show()

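Output (with the session time zone set to UTC; the string is interpreted in the session time zone, so other settings give different values):

+-----+----------+
| name| unix_time|
+-----+----------+
|Aamir|1744039800|
| Sara|1735117200|
+-----+----------+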
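
unix_timestamp() interprets the input string in the session time zone, so the same string maps to different epoch values under different settings. The sketch below reproduces that arithmetic in plain Python rather than Spark; string_to_epoch is a hypothetical helper, and the UTC+05:30 offset is an assumption used only to show the shift.

```python
from datetime import datetime, timedelta, timezone


def string_to_epoch(ts: str, tz: timezone) -> int:
    """Interpret a 'yyyy-MM-dd HH:mm:ss' string in tz and return epoch seconds."""
    wall_clock = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return int(wall_clock.replace(tzinfo=tz).timestamp())


utc = timezone.utc
plus_0530 = timezone(timedelta(hours=5, minutes=30))  # illustrative fixed offset

print(string_to_epoch("2025-04-07 15:30:00", utc))        # 1744039800
print(string_to_epoch("2025-04-07 15:30:00", plus_0530))  # 1744020000
```

The two results differ by 19800 seconds, which is exactly the 5 hours 30 minutes between the two zones.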
