How to Use toJSON() in PySpark - Convert DataFrame Rows to JSON Strings | PySpark Tutorial

PySpark Tutorial: How to Use toJSON() – Convert DataFrame Rows to JSON Strings

This tutorial demonstrates how to use the PySpark DataFrame method toJSON() to convert each row of a DataFrame into a JSON string. The method returns an RDD of strings, one JSON document per row, which is useful for exporting data, posting to APIs, or sending JSON records to systems such as Kafka or NoSQL databases.

1. Import SparkSession and Create a Session

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PySpark toJSON Example").getOrCreate()

2. Create a Sample DataFrame

data = [
    (1, "Aamir Shahzad", "Pakistan", 5000),
    (2, "Ali Raza", "Pakistan", 6000),
    (3, "Bob", "USA", 5500),
    (4, "Lisa", "Canada", 7000)
]

columns = ["ID", "Name", "Country", "Salary"]

df = spark.createDataFrame(data, columns)

df.show()

Output:

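+---+-------------+--------+------+
| ID|         Name| Country|Salary|
+---+-------------+--------+------+
|  1|Aamir Shahzad|Pakistan|  5000|
|  2|     Ali Raza|Pakistan|  6000|
|  3|          Bob|     USA|  5500|
|  4|         Lisa|  Canada|  7000|
+---+-------------+--------+------+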

3. Convert Rows to JSON using toJSON()

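# toJSON() returns an RDD of strings, one JSON document per row
json_rdd = df.toJSON()

# collect() pulls every record to the driver; fine for this small sample,
# but prefer take(n) on large DataFrames
for item in json_rdd.collect():
    print(item)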

Output:

{"ID":1,"Name":"Aamir Shahzad","Country":"Pakistan","Salary":5000}
{"ID":2,"Name":"Ali Raza","Country":"Pakistan","Salary":6000}
{"ID":3,"Name":"Bob","Country":"USA","Salary":5500}
{"ID":4,"Name":"Lisa","Country":"Canada","Salary":7000}
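Each string in the RDD is a complete JSON document, so a downstream consumer can parse the records one at a time. Here is a minimal sketch of that consuming side, assuming the four strings printed above have reached a plain Python process. The `records` list is copied from that output for illustration, and no Spark is needed to read it.

```python
import json

# JSON strings copied from the toJSON() output above; in practice these
# would arrive from a file, a queue, or an API call (assumed here).
records = [
    '{"ID":1,"Name":"Aamir Shahzad","Country":"Pakistan","Salary":5000}',
    '{"ID":2,"Name":"Ali Raza","Country":"Pakistan","Salary":6000}',
    '{"ID":3,"Name":"Bob","Country":"USA","Salary":5500}',
    '{"ID":4,"Name":"Lisa","Country":"Canada","Salary":7000}',
]

# Parse each record independently into a Python dict
parsed = [json.loads(record) for record in records]

# Column names become keys, and numeric columns stay numeric
for row in parsed:
    print(row["Name"], row["Salary"])

total_salary = sum(row["Salary"] for row in parsed)
print(total_salary)
```

Because every line stands alone, a consumer can process records as they arrive without waiting for the full dataset.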

4. Why Use toJSON()?

  • To convert structured data into JSON for downstream systems
  • To write JSON records to a file or stream
  • To integrate with Kafka, APIs, or NoSQL databases
  • To prepare data for logging or message queues

Author: Aamir Shahzad

© 2025 PySpark Tutorials. All rights reserved.
