How to Use the display() Function in Databricks | PySpark Tutorial for Beginners

The display() function in Databricks renders a DataFrame as an interactive table, with optional charts, directly in your Databricks notebook. It is not part of the standard PySpark API; it is a helper that Databricks provides in its notebooks.

Advantages of display()

  • Auto-formats the table output: Displays DataFrame results in a well-formatted table automatically.
  • Interactive sorting and filtering: Allows you to sort and filter columns directly in the notebook interface.
  • Built-in charts and visualizations: Provides options to visualize data using bar charts, pie charts, line graphs, etc., without writing extra code.

How to Use display() in Databricks

Here’s a simple example of how to use the display() function inside a Databricks notebook:

# Create a sample DataFrame
data = [("James", "Smith", "USA", "CA"), 
        ("Michael", "Rose", "USA", "NY"), 
        ("Robert", "Williams", "USA", "CA"), 
        ("Maria", "Jones", "USA", "FL")]

columns = ["firstname", "lastname", "country", "state"]

# `spark` is the SparkSession that Databricks notebooks create for you
df = spark.createDataFrame(data, columns)

# Display the DataFrame in Databricks notebook
display(df)

This renders your DataFrame as an interactive table in the cell output of the Databricks notebook. From there you can sort columns, filter rows, and switch to a chart view.

When Should You Use display()?

Use display() when working in Databricks notebooks and you need a quick, interactive way to explore and visualize your data.

Note: The display() function works only within Databricks and isn’t available in standalone PySpark environments.
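
Because display() is missing outside Databricks, code that has to run in both places needs a fallback. The sketch below is one possible approach, not an official Databricks pattern. preview() is a hypothetical helper name, and the code assumes that Databricks Runtime sets the DATABRICKS_RUNTIME_VERSION environment variable and that display is defined in the notebook's namespace. Everywhere else it falls back to the standard PySpark df.show(). Note that Jupyter and IPython define their own, unrelated display(), which is why the helper checks the environment first.

```python
import os


def preview(df, n=20):
    """Show a DataFrame with display() on Databricks, else with df.show().

    Hypothetical helper for illustration; returns the name of the path taken.
    """
    # Assumption: Databricks Runtime sets this environment variable.
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        try:
            display(df)  # only defined inside Databricks notebooks
            return "display"
        except NameError:
            pass  # display() is not defined here, so use the fallback
    # Standard PySpark: print the first n rows as plain text, untruncated.
    df.show(n, truncate=False)
    return "show"
```

In a notebook you would call preview(df) in place of display(df). Keep in mind that the plain-text output of show() has no sorting, filtering, or chart options.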

Author: Aamir Shahzad
