PySpark cast() vs astype() Explained | Convert String to Int, Float & Double

In this tutorial, we'll convert PySpark DataFrame columns from one type to another using cast() and astype(), which is an alias for cast(). The examples start from a DataFrame whose columns are all strings and convert them to integer, float, and double types.

1. Sample DataFrame

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("CastExample").getOrCreate()

data = [
    ("1", "Aamir", "50000.5"),
    ("2", "Ali", "45000.0"),
    ("3", "Bob", None),
    ("4", "Lisa", "60000.75")
]

columns = ["id", "name", "salary"]
df = spark.createDataFrame(data, columns)
df.printSchema()
df.show()

2. Using `cast()` Function

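Convert id to integer and salary to float. Calling withColumn() with an existing column name replaces that column, and the null salary stays null after the cast: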

df_casted = df.withColumn("id", col("id").cast("int")) \
              .withColumn("salary", col("salary").cast("float"))
df_casted.printSchema()
df_casted.show()

3. Using `astype()` Function

astype() is an alias for cast() and is used in the same way. The example below starts from df_casted, so salary is converted from float to double:

df_astype = df_casted.withColumn("salary", col("salary").astype("double"))
df_astype.printSchema()
df_astype.show()

Output:

Original DataFrame (all columns as strings):
+---+-----+--------+
| id| name|  salary|
+---+-----+--------+
|  1|Aamir| 50000.5|
|  2|  Ali| 45000.0|
|  3|  Bob|    null|
|  4| Lisa|60000.75|
+---+-----+--------+

After cast():
root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- salary: float (nullable = true)

After astype():
root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- salary: double (nullable = true)