๐ How to Join Parquet File with CSV Using OPENROWSET in Serverless SQL Pool - Azure Synapse
Azure Synapse Analytics Serverless SQL Pool allows you to query data stored in various file formats such as CSV and Parquet directly from Azure Data Lake using the OPENROWSET function. This blog post shows how you can join data between a Parquet file and a CSV file without ingesting them into a database.
๐ง File Format Overview
- CSV (Comma Separated Values): A flat file format used to store tabular data, often with headers and delimited values.
- Parquet: A columnar storage file format optimized for analytical workloads — fast to read specific columns and space-efficient.
๐ JOINing CSV and Parquet Files using OPENROWSET
Both files are read directly from Azure Data Lake using their respective OPENROWSET definitions, then joined using T-SQL.
๐งช Example Query
SELECT
csv.CustomerID,
csv.Name,
prq.Country,
prq.TotalSpent
FROM
OPENROWSET(
BULK 'https://yourlakehouse.dfs.core.windows.net/sales/customer.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE
) AS csv
JOIN
OPENROWSET(
BULK 'https://yourlakehouse.dfs.core.windows.net/sales/customer_spending.parquet',
FORMAT = 'PARQUET'
) AS prq
ON csv.CustomerID = prq.CustomerID;
✅ Benefits of This Approach
- No need to load files into tables
- Low-cost, pay-per-query model
- Supports ad hoc exploration of raw data
⚠️ Best Practices
- Ensure that both files have compatible data types for join keys
- Use
SELECTonly on required columns to reduce data scanned - Preview file schemas with
TOP 10orsp_describe_first_result_setbefore joining
๐บ Watch the Full Tutorial
Credit: This blog post was created with the help of ChatGPT and Gemini.



No comments:
Post a Comment
Note: Only a member of this blog may post a comment.