Top PySpark Math Functions Explained | abs(), round(), log(), pow(), sin(), degrees() & More | PySpark Tutorial

Explore PySpark math functions such as abs(), round(), log(), pow(), and sin() through short examples run against a small sample DataFrame, each shown with its expected output.

📌 Sample Data

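from pyspark.sql import SparkSession
# Note: abs, round, and pow shadow Python's built-ins of the same name
from pyspark.sql.functions import abs, sqrt, ceil, floor, round, exp, log, pow, sin, cos, tan

# Reuses the active session if one already exists (e.g. in a notebook or the pyspark shell)
spark = SparkSession.builder.getOrCreate()

data = [
    (1, 4.0, -3.5),
    (2, 9.0, 0.0),
    (3, 16.0, 2.75)
]
df = spark.createDataFrame(data, ["id", "value", "offset"])
df.show()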
Expected Output:
+---+-----+------+
| id|value|offset|
+---+-----+------+
|  1|  4.0|  -3.5|
|  2|  9.0|   0.0|
|  3| 16.0|  2.75|
+---+-----+------+

✅ abs() - Absolute Value

df.select("id", abs("offset").alias("abs_offset")).show()
Expected Output:
+---+----------+
| id|abs_offset|
+---+----------+
|  1|       3.5|
|  2|       0.0|
|  3|      2.75|
+---+----------+

✅ sqrt() - Square Root

df.select("id", sqrt("value").alias("sqrt_value")).show()
Expected Output:
+---+----------+
| id|sqrt_value|
+---+----------+
|  1|       2.0|
|  2|       3.0|
|  3|       4.0|
+---+----------+

✅ ceil() - Rounds Up

df.select("id", ceil("offset").alias("ceil_offset")).show()
Expected Output:
+---+-----------+
| id|ceil_offset|
+---+-----------+
|  1|         -3|
|  2|          0|
|  3|          3|
+---+-----------+

✅ floor() - Rounds Down

df.select("id", floor("offset").alias("floor_offset")).show()
Expected Output:
+---+------------+
| id|floor_offset|
+---+------------+
|  1|          -4|
|  2|           0|
|  3|           2|
+---+------------+
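
The ceil() and floor() results can be sanity-checked without a Spark session using Python's standard math module, which rounds in the same directions. This is only a local cross-check of the arithmetic, not PySpark code, and the list name offsets is illustrative.

```python
import math

# The "offset" values from the sample data
offsets = [-3.5, 0.0, 2.75]

# ceil moves toward +infinity and floor toward -infinity,
# so -3.5 becomes -3 with ceil but -4 with floor
ceil_values = [math.ceil(x) for x in offsets]
floor_values = [math.floor(x) for x in offsets]

print(ceil_values)   # [-3, 0, 3]
print(floor_values)  # [-4, 0, 2]
```

Note the negative row: rounding "up" means toward zero for -3.5, while rounding "down" moves away from zero.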

✅ round() - Round to Decimal

df.select("id", round("offset", 1).alias("rounded")).show()
Expected Output:
+---+-------+
| id|rounded|
+---+-------+
|  1|   -3.5|
|  2|    0.0|
|  3|    2.8|
+---+-------+
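
PySpark's round() is documented as using HALF_UP rounding, while its sibling bround() uses HALF_EVEN (banker's rounding, which is also what Python's built-in round() does). The sketch below illustrates the difference between the two modes with Python's decimal module rather than Spark itself; the helper round_decimal is a hypothetical name introduced for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def round_decimal(text, places, mode):
    """Round a decimal string to `places` digits using the given rounding mode."""
    quantum = Decimal(1).scaleb(-places)  # places=1 -> Decimal("0.1")
    return Decimal(text).quantize(quantum, rounding=mode)

# 2.75 rounds to 2.8 under both modes (8 is the even neighbor)
print(round_decimal("2.75", 1, ROUND_HALF_UP))    # 2.8
print(round_decimal("2.75", 1, ROUND_HALF_EVEN))  # 2.8

# A tie such as 2.5 shows where the modes disagree
print(round_decimal("2.5", 0, ROUND_HALF_UP))     # 3
print(round_decimal("2.5", 0, ROUND_HALF_EVEN))   # 2
```

If ties matter in your data (for example, monetary amounts ending in 5), choose between round() and bround() deliberately rather than by habit.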

✅ exp() - Exponential

df.select("id", exp("offset").alias("exp_offset")).show()
Expected Output (rounded to 10 decimal places; Spark prints full double precision):
+---+-------------+
| id|   exp_offset|
+---+-------------+
|  1| 0.0301973834|
|  2|          1.0|
|  3|15.6426318842|
+---+-------------+

✅ log() - Natural Log

df.select("id", log("value").alias("log_value")).show()
Expected Output:
+---+------------------+
| id|         log_value|
+---+------------------+
|  1|1.3862943611198906|
|  2|2.1972245773362196|
|  3| 2.772588722239781|
+---+------------------+
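
Since exp() and log() are inverses, applying one after the other should return the original value (up to floating-point error). The following is a small cross-check in plain Python using the math module, not PySpark; the variable names are illustrative. PySpark also provides log2() and log10() for other bases.

```python
import math

values = [4.0, 9.0, 16.0]  # the "value" column from the sample data

natural_logs = [math.log(v) for v in values]
recovered = [math.exp(x) for x in natural_logs]  # exp undoes log

for v, ln_v, back in zip(values, natural_logs, recovered):
    print(f"ln({v}) = {ln_v:.6f}, exp(ln({v})) = {back:.6f}")

# Other bases: log2(16) is 4 because 2**4 == 16
print(math.log2(16.0))  # 4.0
```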

✅ pow() - Raise to Power

df.select("id", pow("value", "offset").alias("value_pow_offset")).show()
Expected Output:
+---+----------------+
| id|value_pow_offset|
+---+----------------+
|  1|       0.0078125|
|  2|             1.0|
|  3|          2048.0|
+---+----------------+
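
The pow() results are easy to verify by hand: 4.0 raised to -3.5 is 1 / (4^3 * sqrt(4)) = 1/128 = 0.0078125, anything raised to 0 is 1, and 16.0 raised to 2.75 is 2^(4 * 2.75) = 2^11 = 2048. The sketch below repeats that arithmetic with Python's math.pow as a local check; it is not PySpark code, and the rows list simply mirrors the sample data.

```python
import math

# (id, value, offset) rows mirroring the sample DataFrame
rows = [(1, 4.0, -3.5), (2, 9.0, 0.0), (3, 16.0, 2.75)]

# value raised to the power of offset, keyed by id
powers = {row_id: math.pow(value, offset) for row_id, value, offset in rows}

for row_id, result in powers.items():
    print(f"{row_id} {result:.7f}")
# 1 0.0078125
# 2 1.0000000
# 3 2048.0000000
```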

✅ sin(), cos(), tan()

df.select("id", sin("offset").alias("sin"), cos("offset").alias("cos"), tan("offset").alias("tan")).show()
Expected Output (approx, rounded to 3 decimals; inputs are interpreted as radians):
+---+-----+------+------+
| id|  sin|   cos|   tan|
+---+-----+------+------+
|  1|0.351|-0.936|-0.375|
|  2|  0.0|   1.0|   0.0|
|  3|0.382|-0.924|-0.413|
+---+-----+------+------+
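
The trigonometric functions take their input in radians, which explains the signs: -3.5 rad is roughly -200.5 degrees, where sine is positive and cosine is negative. The sketch below cross-checks the values with Python's math module and converts each offset to degrees; PySpark offers degrees() and radians() for the same conversion on columns. The helper trig_row is a hypothetical name used only here.

```python
import math

def trig_row(x):
    """Return (sin, cos, tan) of x in radians, rounded to 3 decimals."""
    return (round(math.sin(x), 3), round(math.cos(x), 3), round(math.tan(x), 3))

offsets = [-3.5, 0.0, 2.75]  # the "offset" column, treated as radians

for x in offsets:
    s, c, t = trig_row(x)
    print(f"{x} rad = {math.degrees(x):.2f} deg -> sin={s}, cos={c}, tan={t}")
```

If your source data is in degrees, convert it with radians() before calling sin(), cos(), or tan().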
