PySpark Tutorial: Master PySpark Sorting: sort(), asc(), desc() Explained with Examples

Sort Functions in PySpark Explained with Examples


In this post, you'll learn how to use the sort functions in PySpark to order data in ascending or descending order and to control where null values are placed. This guide is perfect for anyone working with big data in Spark!

1️⃣ Setup

from pyspark.sql import SparkSession
from pyspark.sql.functions import asc, desc, asc_nulls_first, desc_nulls_last

spark = SparkSession.builder.appName("SortFunctionsDemo").getOrCreate()

2️⃣ Sample Data

sample_data = [
    ("Alice", 5000),
    ("Bob", None),
    ("Cara", 6200),
    ("Dan", None),
    ("Eli", 4500),
    ("Fay", 7000)
]

columns = ["employee", "sales"]
df = spark.createDataFrame(sample_data, columns)
df.show()

Output:

+--------+-----+
|employee|sales|
+--------+-----+
|   Alice| 5000|
|     Bob| null|
|    Cara| 6200|
|     Dan| null|
|     Eli| 4500|
|     Fay| 7000|
+--------+-----+

3️⃣ Sort by Ascending

df.orderBy(asc("sales")).show()

df.sort() is an alias for df.orderBy(), so df.sort(asc("sales")) produces the same result. Note that asc() places null values first by default:

Output:

+--------+-----+
|employee|sales|
+--------+-----+
|     Bob| null|
|     Dan| null|
|     Eli| 4500|
|   Alice| 5000|
|    Cara| 6200|
|     Fay| 7000|
+--------+-----+

(The relative order of the two null rows is not guaranteed.)

4️⃣ Sort Ascending with Nulls First

df.orderBy(asc_nulls_first("sales")).show()

This makes the ascending default explicit: null rows come first, followed by the remaining rows in ascending order of sales.
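For intuition, the nulls-first ordering can be mimicked in plain Python (an analogy, not Spark code) using sorted() with a tuple key, applied to the same sample data:

```python
# Plain-Python analogy of asc_nulls_first: None values sort before
# everything else, then the remaining rows sort ascending by sales.
rows = [("Alice", 5000), ("Bob", None), ("Cara", 6200),
        ("Dan", None), ("Eli", 4500), ("Fay", 7000)]

def asc_nulls_first_key(row):
    sales = row[1]
    # (False, 0) for None rows sorts before (True, sales) for the rest.
    return (sales is not None, sales if sales is not None else 0)

for name, sales in sorted(rows, key=asc_nulls_first_key):
    print(name, sales)
# Bob and Dan (null sales) print first, then 4500, 5000, 6200, 7000.
```

Because Python's sorted() is stable, Bob stays ahead of Dan here; Spark gives no such guarantee for rows that compare equal.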

5️⃣ Sort Descending with Nulls Last

df.orderBy(desc_nulls_last("sales")).show()

Output:

+--------+-----+
|employee|sales|
+--------+-----+
|     Fay| 7000|
|    Cara| 6200|
|   Alice| 5000|
|     Eli| 4500|
|     Bob| null|
|     Dan| null|
+--------+-----+

6️⃣ Descending Order

df.orderBy(desc("sales")).show()

desc() places null values last by default, so this produces the same ordering as desc_nulls_last("sales").
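The descending default can be sketched the same way in plain Python (again an analogy, not Spark code): non-null rows sort by negated sales, and None rows fall to the end:

```python
# Plain-Python analogy of desc / desc_nulls_last: non-null rows come
# first (largest sales leading), and None rows trail at the end.
rows = [("Alice", 5000), ("Bob", None), ("Cara", 6200),
        ("Dan", None), ("Eli", 4500), ("Fay", 7000)]

def desc_nulls_last_key(row):
    sales = row[1]
    # (True, 0) for None rows sorts after (False, -sales) for the rest.
    return (sales is None, -sales if sales is not None else 0)

for name, sales in sorted(rows, key=desc_nulls_last_key):
    print(name, sales)
# 7000, 6200, 5000, 4500, then Bob and Dan with null sales.
```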


© 2024 Aamir Shahzad — All rights reserved.
Some of the contents in this website were created with assistance from ChatGPT and Gemini.
