PySpark Tutorial: limit() Function to Display Limited Rows | PySpark tutorial for Data Engineers


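The limit() function in PySpark returns a new DataFrame containing at most the specified number of rows. It is a transformation, so nothing is computed until an action such as show() is called. This makes it handy for pulling a small subset of a large dataset for quick inspection. Note that limit() is not a random sample, and without an explicit orderBy() Spark does not guarantee which rows are returned.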

Sample Data

data = [
    (1, "Alice", 5000),
    (2, "Bob", 6000),
    (3, "Charlie", 7000),
    (4, "David", 8000),
    (5, "Eve", 9000),
    (6, "Frank", 10000),
    (7, "Grace", 11000),
    (8, "Hannah", 12000),
    (9, "Ian", 13000),
    (10, "Jack", 14000)
]

Create a DataFrame

# Assumes an active SparkSession named `spark` (for example, in the pyspark shell or a notebook)
df = spark.createDataFrame(data, ["id", "name", "salary"])

Show the Full DataFrame

df.show()

Example 1: Get the First 5 Rows

df.limit(5).show()

Example 2: Get the First 3 Rows

df.limit(3).show()

Example 3: Store the Limited DataFrame

df_limited = df.limit(4)
df_limited.show()
