How to Add Columns and Check Schema in PySpark DataFrame | PySpark Tutorial

How to Add Columns to DataFrame and Check Schema in PySpark

In this tutorial, we’ll cover how to add columns to a DataFrame and also how to check the schema of a DataFrame using PySpark.

1. Creating a DataFrame

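from pyspark.sql import SparkSession

# Create a SparkSession, or reuse the one the pyspark shell or a notebook already provides as `spark`
spark = SparkSession.builder.appName("add-columns-demo").getOrCreate()

# Sample rows as (id, name, age) tuples
data = [
    (1, "Alice", 25),
    (2, "Bob", 30),
    (3, "Charlie", 35),
    (4, "David", 40)
]

# Column names are given as a list; Spark infers the column types from the data
df = spark.createDataFrame(data, ["id", "name", "age"])
df.show()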

2. Adding New Columns

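We can add a new column with the withColumn() method. It takes the name of the new column and a Column expression, and it returns a new DataFrame, leaving the original df unchanged. To fill a column with a constant value, wrap the value in lit().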

from pyspark.sql.functions import lit

# lit() wraps a Python constant as a Column, so every row gets "USA"
df_new = df.withColumn("country", lit("USA"))
df_new.show()

3. Adding Columns Using Expressions

from pyspark.sql.functions import col

# col("age") refers to an existing column; multiplying it builds a new Column expression
df_exp = df.withColumn("age_double", col("age") * 2)
df_exp.show()

4. Adding Multiple Columns

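# Each withColumn() call returns a new DataFrame, so the calls can be chained
# (lit and col were imported in the previous steps)
df_multi = df \
    .withColumn("country", lit("USA")) \
    .withColumn("age_plus_ten", col("age") + 10)

df_multi.show()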

5. Checking the Schema of DataFrame

df.printSchema()

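printSchema() prints the schema as a tree, listing each column's name, its data type, and whether it can contain nulls. For the DataFrame created above, id and age appear as long, because Spark infers Python integers as LongType, and name appears as string.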

Conclusion

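The withColumn() method is the most common way to add a column or modify an existing one: give it a new name to append a column, or an existing name to replace that column. Combined with lit() for constant values and col() for expressions built from other columns, it covers most everyday cases, and the printSchema() method provides a quick view of the DataFrame’s structure afterward.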

