SQL GROUP BY with External Table and CSV File | Azure Synapse Analytics Tutorial

In Azure Synapse Analytics, and in serverless SQL pools in particular, the GROUP BY clause is the main tool for summarizing data. It lets you compute aggregates such as COUNT, SUM, and AVG for each group defined by one or more columns.

✅ What is GROUP BY?

The GROUP BY clause collapses rows that share the same values in the grouping columns into one summary row per group, as in "total sales by region". It is usually paired with aggregate functions, which return one value for each group.
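To see the idea without a storage account, here is a minimal sketch that builds a tiny, made-up sales table inline with a common table expression and groups it by region. The `Sales` name, its `Region` and `Amount` columns, and the amounts are purely illustrative. It uses only standard T-SQL, so it should also run in a serverless SQL pool.

```sql
-- Hypothetical inline data: three sales rows in two regions.
WITH Sales AS (
    SELECT 'East' AS Region, CAST(100.00 AS DECIMAL(10, 2)) AS Amount
    UNION ALL SELECT 'East', 50.00
    UNION ALL SELECT 'West', 200.00
)
-- One output row per distinct Region, with several aggregates per group.
SELECT
    Region,
    COUNT(*)    AS Orders,       -- number of rows in the group
    SUM(Amount) AS TotalSales,   -- sum of Amount within the group
    AVG(Amount) AS AverageSale   -- mean of Amount within the group
FROM Sales
GROUP BY Region
ORDER BY Region;
```

The three input rows collapse into two output rows: East with 2 orders totalling 150.00 (average 75), and West with 1 order totalling 200.00.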

📂 Using GROUP BY with External CSV Files

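In Synapse serverless SQL pools, you can use OPENROWSET to query files stored in Azure Data Lake Storage directly, without first loading them into a database. You can also define an external table over the same files and query it like a regular table, as Example 2 does.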
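Before aggregating, it can help to preview a few rows and confirm that the header row and column names are read as expected. A minimal sketch, reusing the placeholder storage URL from Example 1:

```sql
-- Preview 10 rows straight from the CSV files; nothing is loaded into a database.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://yourstorage.dfs.core.windows.net/container/customers/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE   -- read column names from the header row
) AS [result];
```

Without an ORDER BY, TOP 10 returns an arbitrary set of ten rows, which is fine for a quick look at the columns.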

๐Ÿงช Example 1: Total Customers per Country from CSV File


SELECT Country, COUNT(*) AS TotalCustomers
FROM OPENROWSET(
    BULK 'https://yourstorage.dfs.core.windows.net/container/customers/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result]
GROUP BY Country;

๐Ÿงช Example 2: Total Revenue per Product from External Table


SELECT ProductID, SUM(Revenue) AS TotalRevenue
FROM dbo.ExternalSalesTable
GROUP BY ProductID;
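Example 2 assumes `dbo.ExternalSalesTable` already exists. One possible definition in a serverless SQL pool is sketched below. The data source name `SalesStorage`, the file format name `SalesCsvFormat`, the `sales/` folder, and the column types are all hypothetical, and the files are assumed to contain exactly these two columns in this order, because an external table over CSV maps columns by position. Run the statements in a user database rather than `master`; if your login cannot read the container directly, the data source also needs a database scoped credential.

```sql
-- Hypothetical data source pointing at the container root.
CREATE EXTERNAL DATA SOURCE SalesStorage
WITH (LOCATION = 'https://yourstorage.dfs.core.windows.net/container');

-- CSV file format that starts reading at row 2, skipping the header line.
CREATE EXTERNAL FILE FORMAT SalesCsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2,          -- row 1 is the header
        PARSER_VERSION = '2.0'
    )
);

-- External table over the CSV files in the hypothetical sales/ folder.
CREATE EXTERNAL TABLE dbo.ExternalSalesTable (
    ProductID INT,
    Revenue   DECIMAL(18, 2)
)
WITH (
    LOCATION    = 'sales/*.csv',
    DATA_SOURCE = SalesStorage,
    FILE_FORMAT = SalesCsvFormat
);
```

Once the table exists, the GROUP BY query in Example 2 reads the files on every execution; no data is copied into the database.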

🔍 Tips

  • In the SELECT list, use only columns that also appear in the GROUP BY clause, or wrap them in an aggregate function
  • Use aliases for aggregate expressions for better readability
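  • Test query performance: the file format has a large impact, and columnar Parquet files are usually much faster to aggregate than CSV because only the referenced columns are read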
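To compare formats, the same aggregation can be pointed at a Parquet copy of the data. This sketch assumes a hypothetical `customers-parquet/` folder holding Parquet files with a `Country` column. Parquet files carry their own schema, so the PARSER_VERSION and HEADER_ROW options are not needed.

```sql
-- Same GROUP BY as Example 1, reading Parquet instead of CSV.
SELECT Country, COUNT(*) AS TotalCustomers
FROM OPENROWSET(
    BULK 'https://yourstorage.dfs.core.windows.net/container/customers-parquet/*.parquet',
    FORMAT = 'PARQUET'
) AS [result]
GROUP BY Country;
```

Running this and the CSV version against the same data and comparing their durations is a simple way to act on the performance tip.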

๐Ÿ“บ Watch the Tutorial

Credit: This blog was created with the help of ChatGPT and Gemini.
