PySpark coalesce() Function Tutorial: Optimize Partitioning for Faster Spark Jobs
This tutorial will help you understand how to use the coalesce()
function in PySpark to reduce the number of partitions in your DataFrame and improve performance.
1. What is coalesce() in PySpark?
coalesce()
reduces the number of partitions in a DataFrame.- It is preferred over
repartition()
when reducing partitions because it avoids full data shuffle. - Ideal for optimizing small files and preparing data for output operations.
2. Create Spark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder \\
.appName("PySpark coalesce() Example") \\
.getOrCreate()
3. Create Sample DataFrame
data = [
(1, "Aamir Shahzad", 35),
(2, "Ali Raza", 30),
(3, "Bob", 25),
(4, "Lisa", 28)
]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)
df.show()
+---+--------------+---+
| id| name|age|
+---+--------------+---+
| 1| Aamir Shahzad| 35|
| 2| Ali Raza| 30|
| 3| Bob| 25|
| 4| Lisa| 28|
+---+--------------+---+
| id| name|age|
+---+--------------+---+
| 1| Aamir Shahzad| 35|
| 2| Ali Raza| 30|
| 3| Bob| 25|
| 4| Lisa| 28|
+---+--------------+---+
4. Check Number of Partitions Before coalesce()
print("Partitions before coalesce:", df.rdd.getNumPartitions())
Partitions before coalesce: 4
5. Apply coalesce() to Reduce Partitions
df_coalesced = df.coalesce(1)
6. Check Number of Partitions After coalesce()
print("Partitions after coalesce:", df_coalesced.rdd.getNumPartitions())
Partitions after coalesce: 1
7. Show Transformed Data
df_coalesced.show()
+---+--------------+---+
| id| name|age|
+---+--------------+---+
| 1| Aamir Shahzad| 35|
| 2| Ali Raza| 30|
| 3| Bob| 25|
| 4| Lisa| 28|
+---+--------------+---+
| id| name|age|
+---+--------------+---+
| 1| Aamir Shahzad| 35|
| 2| Ali Raza| 30|
| 3| Bob| 25|
| 4| Lisa| 28|
+---+--------------+---+
O bônus de R$ 7.500 da Parimatch é real, sim! Claro que é preciso depositar um valor maior e cumprir o rollover de 40x, mas eu fiz isso aos poucos, jogando cassino ao vivo e alguns slots que contam pro bônus. Acesse BR Login e resgate seu bônus de cadastro ao se registrar! O site é bem claro nas regras, e o suporte no chat tirou todas as minhas dúvidas. É uma das melhores promoções de boas-vindas para quem curte cassino de verdade.
ReplyDelete