Using OPENROWSET with Explicit Schema in Azure Synapse | Azure Synapse Analytics Tutorial

Azure Synapse Analytics Demo: OPENROWSET with Schema

💡 Azure Synapse Analytics Demo: OPENROWSET with Schema

Author: Aamir Shahzad

Description: Demonstrating how to explicitly define schema using the WITH clause in OPENROWSET for CSV, Parquet, and JSON files using Serverless SQL Pool in Azure Synapse.


🔹 Example 1: Read Specific Columns from a CSV File

When reading CSV files, you can define schema using column positions. This avoids loading unnecessary columns and improves query performance.

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/Europe/customer_20250313_194510_EMEA.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIRSTROW = 2
) WITH (
    customer_id INT 1,           -- Customer ID
    FName       VARCHAR(50) 2 ,  -- First Name
    EmployeeSalary VARCHAR(50) 4 -- Salary
) AS tbl;

Note: If ordinal positions are not provided, Synapse assumes left-to-right order starting at column 1.

🔹 Example 2: Read Parquet File with Explicit Schema

Parquet files support named fields. You don’t need to specify positions — but your column names in the WITH clause must match the actual schema inside the Parquet file.

SELECT *
FROM OPENROWSET(
    BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/titanic.parquet',
    FORMAT = 'PARQUET'
) WITH (
    Sex  VARCHAR(10),      -- Gender
    Age  FLOAT,            -- Age
    Name VARCHAR(200)      -- Full Name
) AS rows;

Tip: If the column names do not match exactly, the result will show NULLs.

🔹 Example 3: Parse and Read Values from a JSON File

This method lets you extract values from NDJSON (newline-delimited JSON) using OPENROWSET combined with OPENJSON.

SELECT
    jsonContent,
    jsonData.customer_id,
    jsonData.name,
    jsonData.email
FROM OPENROWSET(
    BULK 'https://techbrotherssynapsestg.dfs.core.windows.net/synpasecontainer/input/sample_customers.json',
    FORMAT = 'CSV',
    FIELDQUOTE = '0x0b',
    FIELDTERMINATOR = '0x0b',
    ROWTERMINATOR = '0x0a'
) WITH (
    jsonContent VARCHAR(MAX)  -- Read JSON as text blob
) AS raw
CROSS APPLY OPENJSON(jsonContent)
WITH (
    customer_id INT,
    name        VARCHAR(100),
    email       VARCHAR(100)
) AS jsonData;

Explanation: This approach tricks Synapse into treating the whole line as a single string using 0x0b delimiters, allowing JSON parsing with OPENJSON().


📺 Watch the Full Video Tutorial

Here’s a complete walkthrough demo on YouTube:



This blog post was created with assistance from ChatGPT and Gemini AI to ensure technical clarity and accuracy.

No comments:

Post a Comment