103 - PySpark Date and Time Extraction Tutorial | year(), hour(), dayofweek(), date_part() with Examples
In this tutorial, we cover PySpark's functions for extracting parts of a date or timestamp, including year(), hour(), dayofweek(), and date_part(), with practical examples.
1. Extracting Year with year()
Definition: The year() function is used to extract the year from a date or timestamp.
from pyspark.sql.functions import year  # assumes an active SparkSession named spark
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
df.select(year("date").alias("year")).show()
Output:
+----+
|year|
+----+
|2020|
|2021|
+----+
2. Extracting Hour with hour()
Definition: The hour() function extracts the hour from a timestamp.
from pyspark.sql.functions import hour  # assumes an active SparkSession named spark
df = spark.createDataFrame([("2020-12-25 14:30:00",), ("2021-01-10 06:15:00",)], ["timestamp"])
df.select(hour("timestamp").alias("hour")).show()
Output:
+----+
|hour|
+----+
|  14|
|   6|
+----+
3. Extracting Day of the Week with dayofweek()
Definition: The dayofweek() function returns the day of the week (1 = Sunday, 7 = Saturday) from a date or timestamp.
from pyspark.sql.functions import dayofweek  # assumes an active SparkSession named spark
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
df.select(dayofweek("date").alias("day_of_week")).show()
Output:
+-----------+
|day_of_week|
+-----------+
|          6|
|          1|
+-----------+
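As a sanity check on the numbering, the 1 = Sunday through 7 = Saturday convention can be reproduced in plain Python. This is a minimal sketch using only the standard library; spark_style_dayofweek is a hypothetical helper name for illustration, not part of PySpark.

```python
from datetime import date

def spark_style_dayofweek(d: date) -> int:
    """Map a date to 1 (Sunday) through 7 (Saturday), mirroring dayofweek()."""
    # isoweekday() returns 1 for Monday through 7 for Sunday,
    # so taking it modulo 7 and adding one puts Sunday first.
    return d.isoweekday() % 7 + 1

for text in ("2020-12-25", "2021-01-10"):
    print(text, spark_style_dayofweek(date.fromisoformat(text)))
```

2020-12-25 is a Friday and maps to 6, while 2021-01-10 is a Sunday and maps to 1.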
4. Extracting Date Part with date_part()
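Definition: The date_part() function extracts a specific part of a date or timestamp, such as the month, day, or year. The part is named by the first argument, which the Python API expects as a literal column such as lit("year"). date_part() is available in pyspark.sql.functions from Spark 3.5.
from pyspark.sql.functions import date_part, lit  # date_part() requires Spark 3.5+
df = spark.createDataFrame([("2020-12-25",), ("2021-01-10",)], ["date"])
# The field must be a literal column; a bare "year" string would be read as a column name.
df.select(date_part(lit("year"), "date").alias("year_part")).show()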
Output:
+---------+
|year_part|
+---------+
|     2020|
|     2021|
+---------+


