PySpark writeTo() Explained – Save, Append, Overwrite DataFrames
In this tutorial, you'll learn how to use the writeTo() function in PySpark to write DataFrames into managed or external tables using write modes such as create, append, and overwrite.
Step 1: Create Spark Session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("WriteToExample").getOrCreate()
Step 2: Create a Sample DataFrame
data = [
("Aamir Shahzad1", "Lahore", "Pakistan"),
("Ali Raza1", "Karachi", "Pakistan"),
("Bob1", "New York", "USA"),
("Lisa", "Toronto", "Canada")
]
columns = ["full_name", "city", "country"]
df = spark.createDataFrame(data, schema=columns)
df.show()
Step 3: Write to Table using writeTo()
# This will create or replace the table if it exists
df.writeTo("default.people_table").createOrReplace()
print("✅ Table 'people_table' written using writeTo(). You can query it using SQL.")
Step 4: Query Table using Spark SQL
spark.sql("SELECT * FROM default.people_table").show()
Notes:
- Use .create() to fail if the table already exists (see the sketch after these notes).
- Use .append() to add data to an existing table.
- This works with tables in the Hive metastore or with Unity Catalog in Databricks.
- The table will be stored under the 'default' database unless otherwise specified.
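As a rough illustration of the .create(), .append(), and .overwrite() actions mentioned in these notes, here is a minimal sketch that reuses df and default.people_table from the earlier steps. Note that .overwrite(condition) assumes a table format and catalog that support row-level replacement (for example Delta or Iceberg), so it may not work on the built-in catalog.

from pyspark.sql.functions import col

# .create() would raise an error here because the table already exists from Step 3
# df.writeTo("default.people_table").create()

# .append() adds the DataFrame rows to the existing table
df.writeTo("default.people_table").append()

# .overwrite(condition) replaces only the rows matching the condition
# (assumes a v2-capable table format such as Delta or Iceberg)
df.writeTo("default.people_table").overwrite(col("country") == "Pakistan")

spark.sql("SELECT * FROM default.people_table").show()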