How to Join Parquet File with CSV Using OPENROWSET in Serverless SQL Pool - Azure Synapse

🔗 How to Join Parquet File with CSV Using OPENROWSET in Serverless SQL Pool - Azure Synapse

Azure Synapse Analytics Serverless SQL Pool allows you to query data stored in various file formats such as CSV and Parquet directly from Azure Data Lake using the OPENROWSET function. This blog post shows how you can join data between a Parquet file and a CSV file without ingesting them into a database.

🧠 File Format Overview

CSV (Comma Separated Values): A flat file format used to store tabular data, often with headers and delimited values.
Parquet: A columnar storage file format optimized for analytical workloads — fast to read specific columns and space-efficient.

📘 JOINing CSV and Parquet Files using OPENROWSET

Both files are read directly from Azure Data Lake using their respective OPENROWSET definitions, then joined using T-SQL.

🧪 Example Query


SELECT 
    csv.CustomerID,
    csv.Name,
    prq.Country,
    prq.TotalSpent
FROM 
    OPENROWSET(
        BULK 'https://yourlakehouse.dfs.core.windows.net/sales/customer.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS csv
JOIN 
    OPENROWSET(
        BULK 'https://yourlakehouse.dfs.core.windows.net/sales/customer_spending.parquet',
        FORMAT = 'PARQUET'
    ) AS prq
ON csv.CustomerID = prq.CustomerID;

✅ Benefits of This Approach

No need to load files into tables
Low-cost, pay-per-query model
Supports ad hoc exploration of raw data

⚠️ Best Practices

Ensure that both files have compatible data types for join keys
Use SELECT only on required columns to reduce data scanned
Preview file schemas with TOP 10 or sp_describe_first_result_set before joining

📺 Watch the Full Tutorial

Credit: This blog post was created with the help of ChatGPT and Gemini.

Welcome To TechBrothersIT

Label