103 - PySpark Date and Time Extraction Tutorial | year(), hour(), dayofweek(), date_part() with Examples
In this tutorial, we will cover PySpark's functions for extracting parts of dates and times, including year(), hour(), dayofweek(), and date_part(), with practical examples. All four are imported from pyspark.sql.functions.
1. Extracting Year with year()
Definition: The year() function extracts the year from a date or timestamp. A string column in yyyy-MM-dd format, like the one below, is cast to a date automatically.
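from pyspark.sql.functions import year
# Assumes an active SparkSession named `spark` (as in the pyspark shell or a notebook).
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
df.select(year("date").alias("year")).show()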
Output:
+----+
|year|
+----+
|2020|
|2021|
+----+
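The sample strings are in ISO yyyy-MM-dd form, so you can double-check the expected years outside Spark. The following is a minimal plain-Python sketch that uses only the standard datetime module. The helper name expected_years is hypothetical, and it handles only ISO-formatted dates, not the other patterns Spark can parse.

```python
from datetime import date

def expected_years(date_strings):
    """Hypothetical helper: mirror year() for ISO 'yyyy-MM-dd' strings."""
    return [date.fromisoformat(s).year for s in date_strings]

print(expected_years(["2020-12-25", "2021-01-10"]))  # [2020, 2021]
```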
2. Extracting Hour with hour()
Definition: The hour() function extracts the hour (0 to 23, on a 24-hour clock) from a timestamp.
from pyspark.sql.functions import hour
df = spark.createDataFrame([("2020-12-25 14:30:00",), ("2021-01-10 06:15:00",)], ["timestamp"])
df.select(hour("timestamp").alias("hour")).show()
Output:
+----+
|hour|
+----+
|  14|
|   6|
+----+
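Because hour() uses a 24-hour clock, 06:15 yields 6 and 14:30 yields 14. The plain-Python sketch below reproduces the same numbers for a quick cross-check. The helper expected_hours is hypothetical and assumes strings in 'yyyy-MM-dd HH:mm:ss' form.

```python
from datetime import datetime

def expected_hours(timestamp_strings):
    """Hypothetical helper: mirror hour() for 'yyyy-MM-dd HH:mm:ss' strings (0-23 clock)."""
    # fromisoformat accepts a space between the date and time parts (Python 3.7+).
    return [datetime.fromisoformat(s).hour for s in timestamp_strings]

print(expected_hours(["2020-12-25 14:30:00", "2021-01-10 06:15:00"]))  # [14, 6]
```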
3. Extracting Day of the Week with dayofweek()
Definition: The dayofweek() function returns the day of the week from a date or timestamp as an integer, where 1 = Sunday and 7 = Saturday.
from pyspark.sql.functions import dayofweek
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
df.select(dayofweek("date").alias("day_of_week")).show()
Output:
+-----------+
|day_of_week|
+-----------+
|          6|
|          1|
+-----------+
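A common source of bugs is mixing weekday numbering conventions. Python's date.weekday() counts Monday as 0, and date.isoweekday() counts Monday as 1, while dayofweek() counts Sunday as 1. The sketch below uses a hypothetical helper, spark_dayofweek (not part of PySpark), to convert Python's ISO weekday into the dayofweek() numbering. It also confirms that 2020-12-25 was a Friday (6) and 2021-01-10 was a Sunday (1).

```python
from datetime import date

def spark_dayofweek(d: date) -> int:
    """Hypothetical helper reproducing Spark's dayofweek() numbering:
    1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday(): Monday = 1 ... Sunday = 7; the modulo shifts Sunday to 1.
    return d.isoweekday() % 7 + 1

for s in ["2020-12-25", "2021-01-10"]:
    d = date.fromisoformat(s)
    print(s, d.strftime("%A"), spark_dayofweek(d))
# 2020-12-25 Friday 6
# 2021-01-10 Sunday 1
```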
4. Extracting Date Part with date_part()
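Definition: The date_part() function extracts a named field, such as the year, month, or day, from a date or timestamp. In PySpark it is available in pyspark.sql.functions from Spark 3.5 onward. Wrap the field name in lit(), for example lit("year"), because PySpark treats a plain string argument as a column name.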
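from pyspark.sql.functions import date_part, lit
# Requires Spark 3.5+; lit() keeps "year" from being read as a column name.
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
df.select(date_part(lit("year"), "date").alias("year_part")).show()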
Output:
+---------+
|year_part|
+---------+
|     2020|
|     2021|
+---------+
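Unlike year() or hour(), date_part() takes the field to extract as an argument. The hypothetical plain-Python stand-in below, date_part_py, illustrates this idea of one function dispatching on a field name. It covers only a handful of fields. Spark itself supports more field names and aliases, so this sketch is not how Spark implements date_part().

```python
from datetime import datetime

# Hypothetical mapping from a few date_part() field names to Python equivalents.
# Only a small subset is covered here, for illustration.
_FIELDS = {
    "YEAR": lambda dt: dt.year,
    "MONTH": lambda dt: dt.month,
    "DAY": lambda dt: dt.day,
    "HOUR": lambda dt: dt.hour,
    "MINUTE": lambda dt: dt.minute,
    "DAYOFWEEK": lambda dt: dt.isoweekday() % 7 + 1,  # 1 = Sunday, as in dayofweek()
}

def date_part_py(field: str, value: str) -> int:
    """Hypothetical stand-in: field names are matched case-insensitively."""
    dt = datetime.fromisoformat(value)  # accepts 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'
    return _FIELDS[field.upper()](dt)

print(date_part_py("year", "2020-12-25"))       # 2020
print(date_part_py("month", "2021-01-10"))      # 1
print(date_part_py("dayofweek", "2021-01-10"))  # 1
```

A lookup table keeps adding a new field to a one-line change. An unsupported field name raises KeyError rather than silently returning a wrong value.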