How to use createDataFrame() with Schema in PySpark
In PySpark, when you create a DataFrame with createDataFrame(), you can specify a schema to define column names and data types explicitly. This is useful when you want to control the structure and data types of your DataFrame instead of relying on PySpark's automatic type inference.
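Why Define a Schema?
- Ensures consistent column names and data types (for example, IntegerType where inference would pick LongType)
- Improves data quality: by default, createDataFrame() checks every row against the schema and rejects wrongly typed values and NULLs in non-nullable fields
- Gives downstream transformations a predictable structure, because column types are fixed up front rather than guessed from the data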
Example Usage
Below is an example that creates a DataFrame with an explicit schema in PySpark:
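from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# Create (or reuse) a SparkSession; the pyspark shell and many notebooks already provide `spark`
spark = SparkSession.builder.appName("createDataFrame-with-schema").getOrCreate()
# Define schema: StructField(name, dataType, nullable)
schema = StructType([
    StructField("id", IntegerType(), False),   # required column, NULLs not allowed
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
# Sample data: the values in each tuple map to the schema fields by position
data = [
    (1, "Alice", 25),
    (2, "Bob", 30),
    (3, "Charlie", 35),
    (4, "Amir", 40)
]
# Create DataFrame using schema
df = spark.createDataFrame(data, schema=schema)
# Show the DataFrame
df.show()
# Check the schema of the DataFrame
df.printSchema()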
Output
+---+-------+---+
| id|   name|age|
+---+-------+---+
|  1|  Alice| 25|
|  2|    Bob| 30|
|  3|Charlie| 35|
|  4|   Amir| 40|
+---+-------+---+
Check the Schema
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)