PySpark Date Formatting & Conversion Tutorial
This tutorial shows how to parse date and timestamp strings and convert to and from Unix timestamps in PySpark, using functions such as to_date(), to_timestamp(), and unix_timestamp().
Step 1: Sample Data Setup
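The examples assume PySpark 3.5 or later, because to_timestamp_ltz() and to_timestamp_ntz() were added to pyspark.sql.functions in 3.5. They also pin the session time zone to UTC so the Unix timestamp conversions give the same results on every machine.
from pyspark.sql import SparkSession
from pyspark.sql.functions import (from_unixtime, to_date, to_timestamp,
                                   to_timestamp_ltz, to_timestamp_ntz, unix_timestamp)
spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")  # make epoch conversions reproducible
# unix_ts holds the epoch seconds of timestamp_str read as UTC
data = [
    ("Aamir", "2025-04-07", "2025-04-07 15:30:00", 1744039800),
    ("Sara", "2024-12-25", "2024-12-25 09:00:00", 1735117200),
]
df = spark.createDataFrame(data, ["name", "date_str", "timestamp_str", "unix_ts"])
df.show()
Output:
+-----+----------+-------------------+----------+
| name|  date_str|      timestamp_str|   unix_ts|
+-----+----------+-------------------+----------+
|Aamir|2025-04-07|2025-04-07 15:30:00|1744039800|
| Sara|2024-12-25|2024-12-25 09:00:00|1735117200|
+-----+----------+-------------------+----------+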
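The unix_ts column agrees with timestamp_str only if both describe the same instant in a known time zone. Epoch values typed by hand drift easily. The plain-Python sketch below uses the standard library only, not Spark. It derives each row's epoch seconds from its timestamp_str, assuming the string is read as UTC. epoch_seconds_utc is a hypothetical helper name.

```python
import calendar
from datetime import datetime


def epoch_seconds_utc(ts_str: str) -> int:
    """Hypothetical helper: epoch seconds for a 'yyyy-MM-dd HH:mm:ss' string read as UTC."""
    parsed = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S")
    # calendar.timegm interprets the time tuple as UTC (time.mktime would use local time)
    return calendar.timegm(parsed.timetuple())


raw_rows = [
    ("Aamir", "2025-04-07", "2025-04-07 15:30:00"),
    ("Sara", "2024-12-25", "2024-12-25 09:00:00"),
]
# Derive unix_ts from timestamp_str instead of typing it by hand
data = [(name, d, ts, epoch_seconds_utc(ts)) for name, d, ts in raw_rows]
for row in data:
    print(row)
```

The resulting data list has the same shape as the sample rows, so it can be passed to spark.createDataFrame() unchanged.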
Step 2: Convert to DateType
df.select("name", to_date("date_str", "yyyy-MM-dd").alias("date_converted")).show()
Output:
+-----+--------------+
| name|date_converted|
+-----+--------------+
|Aamir|    2025-04-07|
| Sara|    2024-12-25|
+-----+--------------+
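to_date() takes a Spark datetime pattern such as yyyy-MM-dd rather than Python's % directives. The plain-Python sketch below is not Spark code. It translates a few common pattern letters and then parses a string to a date, roughly as the call above does. The helper names are hypothetical, and the small mapping covers only yyyy, MM, dd, HH, mm and ss. Spark's own pattern language is much larger.

```python
import re
from datetime import date, datetime

# Hypothetical mapping for a handful of Spark pattern letters (not exhaustive)
_SPARK_TO_STRPTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d",
                      "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN = re.compile("|".join(_SPARK_TO_STRPTIME))


def spark_pattern_to_strptime(pattern: str) -> str:
    """Translate the supported tokens in a single pass so replacements never cascade."""
    return _TOKEN.sub(lambda m: _SPARK_TO_STRPTIME[m.group(0)], pattern)


def to_date_like(value: str, pattern: str) -> date:
    """Plain-Python analogue of to_date(col, pattern) for one string."""
    return datetime.strptime(value, spark_pattern_to_strptime(pattern)).date()


print(spark_pattern_to_strptime("yyyy-MM-dd"))   # %Y-%m-%d
print(to_date_like("2025-04-07", "yyyy-MM-dd"))  # 2025-04-07
print(to_date_like("25/12/2024", "dd/MM/yyyy"))  # 2024-12-25
```

A single regex pass keeps, for example, the already-translated %m from being rewritten again by a later rule.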
Step 3: Convert to TimestampType
df.select("name", to_timestamp("timestamp_str", "yyyy-MM-dd HH:mm:ss").alias("ts_converted")).show()
Output:
+-----+-------------------+
| name|       ts_converted|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+
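to_timestamp() needs the string to match the pattern exactly. With ANSI mode disabled (spark.sql.ansi.enabled=false), Spark returns null for a string it cannot parse instead of failing the query. The plain-Python sketch below mirrors that behavior by returning None. to_timestamp_or_none is a hypothetical name, and strptime stands in for Spark's parser.

```python
from datetime import datetime
from typing import Optional

PATTERN = "%Y-%m-%d %H:%M:%S"  # strptime equivalent of yyyy-MM-dd HH:mm:ss


def to_timestamp_or_none(value: str) -> Optional[datetime]:
    """Return a naive datetime, or None when the string does not fit the pattern."""
    try:
        return datetime.strptime(value, PATTERN)
    except ValueError:
        # Stand-in for the null Spark produces for unparseable input (non-ANSI mode)
        return None


for s in ["2025-04-07 15:30:00", "2025-04-07T15:30:00"]:
    print(repr(s), "->", to_timestamp_or_none(s))
```

The second string differs only in its ISO "T" separator, which is enough to make the pattern fail.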
Step 4: Convert to Timestamp with Local Timezone
df.select("name", to_timestamp_ltz("timestamp_str").alias("ts_ltz")).show()
Output:
+-----+-------------------+
| name|             ts_ltz|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+
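to_timestamp_ltz() produces a TimestampType value. Spark reads the wall-clock string in the session time zone and stores the resulting instant, so the same text maps to different instants under different session zones. The plain-Python sketch below models this with fixed UTC offsets. That is an assumption: session zones are often region IDs with daylight-saving rules. ltz_epoch is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def ltz_epoch(ts_str: str, session_offset: timedelta) -> int:
    """Epoch seconds of a wall-clock string read in a fixed-offset 'session' zone."""
    naive = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S")
    # Attach the session offset, then take the absolute instant
    return int(naive.replace(tzinfo=timezone(session_offset)).timestamp())


ts = "2025-04-07 15:30:00"
print(ltz_epoch(ts, timedelta(0)))                    # 1744039800 (UTC)
print(ltz_epoch(ts, timedelta(hours=5, minutes=30)))  # 1744020000 (UTC+05:30)
```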
Step 5: Convert to Timestamp without Timezone
df.select("name", to_timestamp_ntz("timestamp_str").alias("ts_ntz")).show()
Output:
+-----+-------------------+
| name|             ts_ntz|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+
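Steps 4 and 5 print the same text because Spark parses and displays the to_timestamp_ltz() result in the same session time zone. The underlying types still differ. TimestampType (LTZ) stores an instant, while TimestampNTZType stores a wall-clock value with no zone attached. Once an LTZ value has been computed and stored, a different session zone renders it differently, while the NTZ text stays put.

The plain-Python sketch below models both values from a UTC session. It then displays them under another offset. The fixed offsets and the render helper are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Model an LTZ value as an absolute instant, parsed once in a UTC session
ltz_instant = datetime.strptime("2025-04-07 15:30:00", FMT).replace(tzinfo=timezone.utc)
# Model an NTZ value as a naive wall-clock datetime with no zone attached
ntz_value = datetime.strptime("2025-04-07 15:30:00", FMT)


def render(session_offset: timedelta) -> tuple:
    """Format both values as they would display under a fixed-offset session zone."""
    ltz_text = ltz_instant.astimezone(timezone(session_offset)).strftime(FMT)
    ntz_text = ntz_value.strftime(FMT)  # unaffected by the session zone
    return ltz_text, ntz_text


print(render(timedelta(0)))                    # ('2025-04-07 15:30:00', '2025-04-07 15:30:00')
print(render(timedelta(hours=5, minutes=30)))  # ('2025-04-07 21:00:00', '2025-04-07 15:30:00')
```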
Step 6: Convert from Unix Timestamp
df.select("name", from_unixtime("unix_ts").alias("from_unix_ts")).show()
Output:
+-----+-------------------+
| name|       from_unix_ts|
+-----+-------------------+
|Aamir|2025-04-07 15:30:00|
| Sara|2024-12-25 09:00:00|
+-----+-------------------+
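from_unixtime() returns a string, not a timestamp. It is formatted in the session time zone with the default pattern yyyy-MM-dd HH:mm:ss, and an optional second argument sets a different pattern. The plain-Python sketch below is an analogue under an assumed UTC zone. from_unixtime_like is a hypothetical name. The second call corresponds to passing "dd/MM/yyyy" as the format.

```python
from datetime import datetime, timezone


def from_unixtime_like(epoch: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format epoch seconds as a UTC string, like from_unixtime under a UTC session."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(fmt)


print(from_unixtime_like(1744039800))              # 2025-04-07 15:30:00
print(from_unixtime_like(1735117200, "%d/%m/%Y"))  # 25/12/2024
```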
Step 7: Convert Timestamp to Unix Timestamp
df.select("name", unix_timestamp("timestamp_str").alias("unix_time")).show()
Output:
+-----+----------+
| name| unix_time|
+-----+----------+
|Aamir|1744039800|
| Sara|1735117200|
+-----+----------+
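With a fixed session time zone, from_unixtime() and unix_timestamp() are inverses. unix_timestamp() reads the string in the session zone and returns epoch seconds as a long, and from_unixtime() formats those seconds back in the same zone. The plain-Python sketch below checks that round trip for the two sample timestamps, assuming UTC. Both helper names are hypothetical analogues, not Spark functions.

```python
import calendar
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp_like(ts_str: str) -> int:
    """Epoch seconds for a wall-clock string read as UTC."""
    return calendar.timegm(datetime.strptime(ts_str, FMT).timetuple())


def from_unixtime_like(epoch: int) -> str:
    """Epoch seconds formatted back to a UTC wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(FMT)


results = {}
for s in ["2025-04-07 15:30:00", "2024-12-25 09:00:00"]:
    epoch = unix_timestamp_like(s)
    # Converting back reproduces the original string because the zone is fixed
    assert from_unixtime_like(epoch) == s
    results[s] = epoch
    print(s, "->", epoch)
```

With a region-based zone, the round trip can break for local times that daylight-saving transitions skip or repeat. Fixing the zone to UTC avoids that.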