PySpark writeTo() Explained: Save, Append, Overwrite DataFrames to Tables | PySpark Tutorial

In this tutorial, you'll learn how to use the writeTo() method in PySpark to write DataFrames into managed or external catalog tables using actions such as createOrReplace, create, append, and overwrite.

Step 1: Create Spark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WriteToExample").getOrCreate()
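
Note: writeTo() is part of the DataFrameWriterV2 API, so the target catalog and table format must support v2 operations (for example, Delta tables on Databricks or an Apache Iceberg catalog). As a rough sketch only, if you are running outside Databricks with Iceberg, the session might be configured like this. The catalog name, warehouse path, and the need for the Iceberg runtime JAR are assumptions, not part of the original example:

from pyspark.sql import SparkSession

# Hypothetical Iceberg setup: "local" and the warehouse path are placeholders,
# and the iceberg-spark-runtime JAR must be on the classpath for this to work.
spark = (
    SparkSession.builder
        .appName("WriteToExample")
        .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.local.type", "hadoop")
        .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg_warehouse")
        .getOrCreate()
)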

Step 2: Create a Sample DataFrame

data = [
  ("Aamir Shahzad1", "Lahore", "Pakistan"),
  ("Ali Raza1", "Karachi", "Pakistan"),
  ("Bob1", "New York", "USA"),
  ("Lisa", "Toronto", "Canada")
]
columns = ["full_name", "city", "country"]

df = spark.createDataFrame(data, schema=columns)
df.show()
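
If everything is set up correctly, df.show() should print output close to the following (illustrative; exact spacing may vary):

+--------------+--------+--------+
|     full_name|    city| country|
+--------------+--------+--------+
|Aamir Shahzad1|  Lahore|Pakistan|
|     Ali Raza1| Karachi|Pakistan|
|          Bob1|New York|     USA|
|          Lisa| Toronto|  Canada|
+--------------+--------+--------+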

Step 3: Write to Table using writeTo()

# This will create or replace the table if it exists
df.writeTo("default.people_table").createOrReplace()
print("✅ Table 'people_table' written using writeTo(). You can query it using SQL.")

Step 4: Query Table using Spark SQL

spark.sql("SELECT * FROM default.people_table").show()

Notes:

  • You can use .create() to fail if the table already exists.
  • Use .append() to add data to an existing table (both are shown in the sketch after this list).
  • This works with tables in Hive metastore or Unity Catalog in Databricks.
  • The table will be stored under the 'default' database unless otherwise specified.
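
A rough sketch of the other write actions mentioned above (the second table name is illustrative):

# Fail if the table already exists (compare with createOrReplace above).
df.writeTo("default.people_table_new").create()

# Append the same rows to the existing table, adding to its row count.
df.writeTo("default.people_table").append()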

📺 Watch the Full Tutorial
