How to Use createDataFrame Function with Schema in PySpark to create DataFrame | PySpark Tutorial

How to use createDataFrame() with Schema in PySpark

In PySpark, when creating a DataFrame using createDataFrame(), you can specify a schema to define column names and data types explicitly. This is useful when you want to control the structure and data types of your DataFrame instead of relying on PySpark's automatic inference.

Why define a Schema?

  • Ensures consistent column names and data types
  • Improves data quality: by default, values are checked against the declared types when the DataFrame is created
  • Gives later transformations predictable column types to work with

Example Usage

The following example creates a DataFrame with an explicit schema:

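from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Create (or reuse) a SparkSession; notebooks and the pyspark shell usually provide `spark` already
spark = SparkSession.builder.appName("CreateDataFrameWithSchema").getOrCreate()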

# Define schema
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Sample data
data = [
    (1, "Alice", 25),
    (2, "Bob", 30),
    (3, "Charlie", 35),
    (4, "Amir", 40)
]

# Create DataFrame using schema
df = spark.createDataFrame(data, schema=schema)

# Show the DataFrame
df.show()

# Check the schema of the DataFrame
df.printSchema()

Output

+---+-------+---+
| id|   name|age|
+---+-------+---+
|  1|  Alice| 25|
|  2|    Bob| 30|
|  3|Charlie| 35|
|  4|   Amir| 40|
+---+-------+---+

Check the Schema

root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
