How to Use COPY INTO or Bulk Load CSV/Parquet Files into Dedicated SQL Pool | Azure Synapse Analytics Tutorial

📦 What is COPY INTO in Azure Synapse?

COPY INTO is a high-performance T-SQL command used in Azure Synapse Analytics to bulk load data from external files (CSV or Parquet) in Azure Data Lake Storage Gen2 into a table in a Dedicated SQL Pool.

The command is optimized for loading large datasets: it supports common file formats such as .csv and .parquet and uses parallel processing under the hood, which makes it much faster than row-by-row inserts.

🗂️ Supported File Formats

  • CSV — with or without headers
  • Parquet — columnar storage format for analytics

🛠️ Syntax of COPY INTO

COPY INTO target_table
FROM 'https://<storage-account>.dfs.core.windows.net/<container>/path/'
WITH (
    FILE_TYPE = 'CSV', -- or 'PARQUET'
    CREDENTIAL = (IDENTITY= 'Shared Access Signature', SECRET='<sas-token>'),
    FIELDTERMINATOR = ',',  -- CSV specific
    ROWTERMINATOR = '0x0A', -- CSV specific
    FIRSTROW = 2            -- skip header if present
);

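COPY INTO expects the target table to already exist in the Dedicated SQL Pool. Below is a minimal sketch of a table the later sales example could load into; the column names, types, and distribution choice are assumptions for illustration, not part of the original post.

CREATE TABLE dbo.Sales
(
    SaleId    INT           NOT NULL,
    SaleDate  DATE          NOT NULL,
    ProductId INT           NOT NULL,
    Quantity  INT           NOT NULL,
    Amount    DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,  -- simple choice for load/staging tables; HASH may suit large fact tables
    CLUSTERED COLUMNSTORE INDEX  -- default storage for Dedicated SQL Pool tables
);
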
✅ Example: Load CSV File into Sales Table

COPY INTO dbo.Sales
FROM 'https://techstorage.dfs.core.windows.net/raw/sales.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY= 'Shared Access Signature', SECRET='sv=2024...'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2
);

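After the load completes, a quick sanity check helps confirm the expected number of rows arrived. A simple sketch, assuming the dbo.Sales table from the example above:

-- Confirm rows were loaded and spot-check a few records
SELECT COUNT(*) AS loaded_rows FROM dbo.Sales;
SELECT TOP 10 * FROM dbo.Sales;
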
✅ Example: Load Parquet File into Product Table

COPY INTO dbo.Product
FROM 'https://techstorage.dfs.core.windows.net/raw/productdata/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY= 'Shared Access Signature', SECRET='sv=2024...')
);

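If the Synapse workspace's Managed Identity has been granted access to the storage account (for example, the Storage Blob Data Reader role), the SAS token can be replaced with a Managed Identity credential. A hedged variant of the Parquet example above:

COPY INTO dbo.Product
FROM 'https://techstorage.dfs.core.windows.net/raw/productdata/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- no SECRET needed; uses the workspace identity
);
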
📌 Tips for Using COPY INTO

  • Make sure your Dedicated SQL Pool is resumed (not paused) before starting a load.
  • Store source files in Azure Data Lake Storage Gen2 for optimal performance.
  • Test COPY INTO with a small sample file before loading TBs of data (see the error-handling sketch after this list).
  • Provide a valid SAS token, or grant the workspace Managed Identity the required permissions on the storage account.

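While testing, the error-handling options of COPY INTO make bad rows easier to diagnose. A sketch based on the earlier CSV example; the /errors/ folder and the MAXERRORS threshold are illustrative assumptions:

COPY INTO dbo.Sales
FROM 'https://techstorage.dfs.core.windows.net/raw/sales.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY= 'Shared Access Signature', SECRET='sv=2024...'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2,
    MAXERRORS = 10,        -- tolerate up to 10 rejected rows before the load fails
    ERRORFILE = '/errors/' -- rejected rows and error details are written under this folder in the container
);
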
🚀 When to Use COPY INTO

  • Bulk data ingestion from raw zones in your data lake (see the multi-file sketch after this list)
  • Loading files created by Synapse Pipelines or Azure Data Factory
  • Loading historical or batch data into a reporting database

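For the raw-zone and batch scenarios above, COPY INTO can point at a whole folder or use wildcards to pick up many files in one statement. A sketch assuming a hypothetical folder of daily CSV extracts:

COPY INTO dbo.Sales
FROM 'https://techstorage.dfs.core.windows.net/raw/sales/2024/*.csv'  -- loads every matching file in the folder
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY= 'Shared Access Signature', SECRET='sv=2024...'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2
);
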
📺 Watch the Video Tutorial

📚 Credit: Content created with the help of ChatGPT and Gemini.
