Top PySpark Math Functions Explained with Examples
Explore PySpark math functions such as abs(), round(), log(), and more, with worked examples and expected outputs to boost your data engineering skills.
📌 Sample Data
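from pyspark.sql import SparkSession
# Note: these imports shadow Python's built-in abs, round, and pow in this script.
from pyspark.sql.functions import abs, sqrt, ceil, floor, round, exp, log, pow, sin, cos, tan

spark = SparkSession.builder.appName("math-functions-demo").getOrCreate()

# Three rows: an id, a positive value, and an offset that can be negative
data = [
    (1, 4.0, -3.5),
    (2, 9.0, 0.0),
    (3, 16.0, 2.75),
]
df = spark.createDataFrame(data, ["id", "value", "offset"])
df.show()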
Expected Output:
+---+-----+------+
| id|value|offset|
+---+-----+------+
|  1|  4.0|  -3.5|
|  2|  9.0|   0.0|
|  3| 16.0|  2.75|
+---+-----+------+
✅ abs() - Absolute Value
df.select("id", abs("offset").alias("abs_offset")).show()
Expected Output:
+---+----------+
| id|abs_offset|
+---+----------+
|  1|       3.5|
|  2|       0.0|
|  3|      2.75|
+---+----------+
✅ sqrt() - Square Root
df.select("id", sqrt("value").alias("sqrt_value")).show()
Expected Output:
+---+----------+
| id|sqrt_value|
+---+----------+
|  1|       2.0|
|  2|       3.0|
|  3|       4.0|
+---+----------+
✅ ceil() - Rounds Up
df.select("id", ceil("offset").alias("ceil_offset")).show()
Expected Output (ceil() returns a bigint for a double column, so no decimal point is shown):
+---+-----------+
| id|ceil_offset|
+---+-----------+
|  1|         -3|
|  2|          0|
|  3|          3|
+---+-----------+
✅ floor() - Rounds Down
df.select("id", floor("offset").alias("floor_offset")).show()
Expected Output (floor() returns a bigint for a double column, so no decimal point is shown):
+---+------------+
| id|floor_offset|
+---+------------+
|  1|          -4|
|  2|           0|
|  3|           2|
+---+------------+
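Both functions round in a fixed direction rather than toward zero, which is easy to get wrong for negative values such as -3.5. The sketch below is a driver-side cross-check using Python's standard math module, not Spark itself; the variable names are illustrative.

```python
import math

offsets = [-3.5, 0.0, 2.75]  # same values as the "offset" column

# ceil moves toward +infinity, floor moves toward -infinity
ceil_values = [math.ceil(x) for x in offsets]
floor_values = [math.floor(x) for x in offsets]

print(ceil_values)   # [-3, 0, 3]
print(floor_values)  # [-4, 0, 2]
```

Note that for -3.5 the two results differ by one in the opposite order from what truncation toward zero would suggest: ceil gives -3 and floor gives -4.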
✅ round() - Round to Decimal
df.select("id", round("offset", 1).alias("rounded")).show()
Expected Output:
+---+-------+
| id|rounded|
+---+-------+
|  1|   -3.5|
|  2|    0.0|
|  3|    2.8|
+---+-------+
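Spark's round() uses HALF_UP rounding, so ties move away from zero, while Python's built-in round() sends ties to the nearest even digit. The sketch below imitates HALF_UP with the standard decimal module to make the difference visible; round_half_up is a hypothetical helper written for this illustration, not a PySpark function. If half-even behavior is what you want in Spark, bround() is the function to look at.

```python
import builtins
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float, scale: int = 0) -> float:
    """Hypothetical helper: round ties away from zero, like HALF_UP."""
    quantum = Decimal(1).scaleb(-scale)  # scale=1 -> Decimal('0.1')
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


# builtins.round is used explicitly in case round was imported from PySpark
half_up = [round_half_up(x) for x in (0.5, 2.5, -2.5)]
half_even = [builtins.round(x) for x in (0.5, 2.5, -2.5)]

print(half_up)                 # [1.0, 3.0, -3.0]
print(half_even)               # [0, 2, -2]
print(round_half_up(2.75, 1))  # 2.8
```

For the sample offsets the two modes agree, which is why the table above shows no surprises; the difference only appears on exact ties such as 2.5.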
✅ exp() - Exponential
df.select("id", exp("offset").alias("exp_offset")).show()
Expected Output (approx., rounded to 10 decimal places; show() prints full double precision):
+---+-------------+
| id|   exp_offset|
+---+-------------+
|  1| 0.0301973834|
|  2| 1.0000000000|
|  3|15.6426318842|
+---+-------------+
✅ log() - Natural Log
df.select("id", log("value").alias("log_value")).show()
Expected Output:
+---+----------------+
| id|       log_value|
+---+----------------+
|  1|1.38629436111989|
|  2|2.19722457733622|
|  3|2.77258872223978|
+---+----------------+
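With a single argument, log() is the natural logarithm, which makes it the inverse of exp(). A quick way to sanity-check expected values before running a job is to compute them on the driver with Python's math module. This is a plain-Python sketch under that assumption, not Spark code, and the variable names are illustrative.

```python
import math

values = [4.0, 9.0, 16.0]     # the "value" column
offsets = [-3.5, 0.0, 2.75]   # the "offset" column

log_values = [math.log(v) for v in values]    # natural log (base e)
exp_values = [math.exp(o) for o in offsets]   # e raised to each offset

# exp undoes log, up to floating-point error
round_trip = [math.exp(math.log(v)) for v in values]

for v, lv, rt in zip(values, log_values, round_trip):
    print(v, lv, rt)
```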
✅ pow() - Raise to Power
df.select("id", pow("value", "offset").alias("value_pow_offset")).show()
Expected Output:
+---+----------------+
| id|value_pow_offset|
+---+----------------+
|  1|       0.0078125|
|  2|             1.0|
|  3|          2048.0|
+---+----------------+
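Fractional and negative exponents are easy to miscalculate by hand, so it helps to work the sample rows out explicitly: 4 to the power -3.5 is 1 / (4^3 × √4) = 1/128, and 16 to the power 2.75 is 2^11. The sketch below checks this with Python's math.pow on the driver; it is a cross-check of the arithmetic, not Spark code.

```python
import math

# (value, offset) pairs from the sample DataFrame
rows = [(4.0, -3.5), (9.0, 0.0), (16.0, 2.75)]

powers = [math.pow(value, offset) for value, offset in rows]

print(powers[0], 1 / 128)   # both are 0.0078125
print(powers[1])            # anything to the power 0 is 1.0
print(powers[2], 2 ** 11)   # 16^2.75 equals 2^11 = 2048
```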
✅ sin(), cos(), tan()
df.select("id", sin("offset").alias("sin"), cos("offset").alias("cos"), tan("offset").alias("tan")).show()
Expected Output (approx., rounded to 3 decimal places; inputs are interpreted as radians):
+---+-----+------+------+
| id|  sin|   cos|   tan|
+---+-----+------+------+
|  1|0.351|-0.936|-0.375|
|  2|0.000| 1.000| 0.000|
|  3|0.382|-0.924|-0.413|
+---+-----+------+------+
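The trigonometric functions expect radians, so a column holding degrees has to be converted first (PySpark offers radians() for this). The plain-Python sketch below shows how far off the result is when a degree value is passed straight in, and recomputes the row for offset -3.5; it uses the standard math module rather than Spark, and the variable names are illustrative.

```python
import math

angle_deg = 90.0
angle_rad = math.radians(angle_deg)  # 90 degrees = pi/2 radians

sin_converted = math.sin(angle_rad)  # sine of 90 degrees
sin_raw = math.sin(angle_deg)        # sine of 90 radians, a different angle

print(sin_converted)
print(sin_raw)

# Row 1 of the sample data, offset = -3.5 radians
offset = -3.5
row = (math.sin(offset), math.cos(offset), math.tan(offset))
print(row)
```

Because sine and tangent are odd functions, the signs for -3.5 are the opposite of those for 3.5, while cosine keeps the same sign.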