SQL GROUP BY with External Table and CSV File | Azure Synapse Analytics Tutorial
In Azure Synapse Analytics, especially when using Serverless SQL Pools, the GROUP BY clause is essential for summarizing data. It allows you to perform aggregations such as COUNT, SUM, AVG, and more, grouped by one or more columns.
✅ What is GROUP BY?
The GROUP BY clause groups rows that have the same values into summary rows, like "total sales by region". It's often used with aggregate functions.
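To make the "total sales by region" idea concrete, here is a minimal sketch that needs no files or tables. The inline rows and the names Sales, Region, and Amount are hypothetical stand-ins for real data, and the query assumes a T-SQL engine that accepts a VALUES table constructor.

```sql
-- Hypothetical inline rows stand in for a real sales table.
SELECT Region,
       COUNT(*)    AS OrderCount,   -- number of rows in each group
       SUM(Amount) AS TotalSales,   -- total per region
       AVG(Amount) AS AverageSale   -- mean per region
FROM (VALUES
        ('East', 100.00),
        ('East', 250.00),
        ('West',  75.00)
     ) AS Sales (Region, Amount)
GROUP BY Region
ORDER BY Region;
```

The three input rows collapse into two summary rows: East with two orders totaling 350, and West with one order totaling 75.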
Using GROUP BY with External CSV Files
In Synapse Serverless SQL Pools, you can use OPENROWSET to query external files stored in Azure Data Lake without needing to load them into a database.
Example 1: Total Customers per Country from CSV File
SELECT Country, COUNT(*) AS TotalCustomers
FROM OPENROWSET(
    BULK 'https://yourstorage.dfs.core.windows.net/container/customers/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result]
GROUP BY Country;
Example 2: Total Revenue per Product from External Table
SELECT ProductID, SUM(Revenue) AS TotalRevenue
FROM dbo.ExternalSalesTable
GROUP BY ProductID;
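Example 2 assumes that dbo.ExternalSalesTable already exists. As a rough sketch of how such a table might be defined over CSV files in a serverless SQL pool, the statements below could be run in a user database (not master). The names SalesDataSource and CsvWithHeader, the sales/*.csv path, and the column types are all assumptions made for illustration, and the storage account must already be accessible to the pool.

```sql
-- Hypothetical data source pointing at the storage container used in Example 1.
CREATE EXTERNAL DATA SOURCE SalesDataSource
WITH (LOCATION = 'https://yourstorage.dfs.core.windows.net/container');

-- Hypothetical file format: comma-delimited text, skipping the header line.
CREATE EXTERNAL FILE FORMAT CsvWithHeader
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);

-- Column names and types are assumed; they must match the order of the CSV columns.
CREATE EXTERNAL TABLE dbo.ExternalSalesTable (
    ProductID INT,
    Revenue   DECIMAL(18, 2)
)
WITH (
    LOCATION = 'sales/*.csv',
    DATA_SOURCE = SalesDataSource,
    FILE_FORMAT = CsvWithHeader
);
```

Once the table exists, the GROUP BY query reads the files through it just like a regular table, which keeps the file path and schema out of every query.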
Tips
- Every column in the SELECT clause must either appear in GROUP BY or be wrapped in an aggregate function
- Use aliases for aggregate expressions for better readability
- Test query performance: file format has a significant impact, and columnar Parquet files typically scan faster than CSV
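The first two tips can be sketched with a small self-contained query. The inline rows and the names Customers, Country, City, and CustomerName are hypothetical; in practice the source would be an OPENROWSET call or an external table.

```sql
-- Hypothetical inline rows stand in for a customers file.
SELECT Country,
       City,
       COUNT(*) AS TotalCustomers   -- alias gives the aggregate a readable name
FROM (VALUES
        ('USA', 'Seattle', 'Ann'),
        ('USA', 'Seattle', 'Bob'),
        ('USA', 'Austin',  'Cy'),
        ('UK',  'London',  'Di')
     ) AS Customers (Country, City, CustomerName)
GROUP BY Country, City              -- every non-aggregated SELECT column is listed here
ORDER BY Country, City;

-- Adding CustomerName to the SELECT list without grouping or aggregating it
-- would fail, because it has no single value per (Country, City) group.
```

Grouping by two columns produces one row per distinct Country and City pair, so the four input rows become three summary rows.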
Credit: This blog was created with the help of ChatGPT and Gemini.