Sort Functions in PySpark Explained with Examples
In this post, you'll learn how to use PySpark's sort functions to order a DataFrame in ascending or descending order and to control where null values appear in the result. If you work with big data in Spark, these come up constantly!
1️⃣ Setup
from pyspark.sql import SparkSession
from pyspark.sql.functions import asc, desc, asc_nulls_first, desc_nulls_last

# Create (or reuse) a SparkSession for this demo
spark = SparkSession.builder.appName("SortFunctionsDemo").getOrCreate()
2️⃣ Sample Data
# None values in the tuples become SQL nulls in the DataFrame
sample_data = [
("Alice", 5000),
("Bob", None),
("Cara", 6200),
("Dan", None),
("Eli", 4500),
("Fay", 7000)
]
columns = ["employee", "sales"]
df = spark.createDataFrame(sample_data, columns)
df.show()
Output:
+--------+-----+
|employee|sales|
+--------+-----+
|   Alice| 5000|
|     Bob| null|
|    Cara| 6200|
|     Dan| null|
|     Eli| 4500|
|     Fay| 7000|
+--------+-----+
3️⃣ Sort by Ascending
df.orderBy(asc("sales")).show()
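Spark's ascending sort places nulls first by default, so Bob and Dan rise to the top. Expected output (the relative order of the two null rows may vary between runs):
+--------+-----+
|employee|sales|
+--------+-----+
|     Bob| null|
|     Dan| null|
|     Eli| 4500|
|   Alice| 5000|
|    Cara| 6200|
|     Fay| 7000|
+--------+-----+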
4️⃣ Sort with Nulls First
df.orderBy(asc_nulls_first("sales")).show()
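asc_nulls_first simply makes ascending's default null placement explicit, so the result should match the previous example: the two null rows first, then sales from lowest to highest.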
5️⃣ Sort with Nulls Last
df.orderBy(desc_nulls_last("sales")).show()
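desc_nulls_last sorts sales from highest to lowest and pushes the null rows to the bottom. Expected output:
+--------+-----+
|employee|sales|
+--------+-----+
|     Fay| 7000|
|    Cara| 6200|
|   Alice| 5000|
|     Eli| 4500|
|     Bob| null|
|     Dan| null|
+--------+-----+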
6️⃣ Descending Order
df.orderBy(desc("sales")).show()
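Descending sorts place nulls last by default in Spark, so plain desc should produce the same output as desc_nulls_last in the previous section. If you ever need the opposite placements, PySpark also provides asc_nulls_last and desc_nulls_first.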