SSIS - How to Find The Version Of SSIS Package From Dtsx File

Scenario: 

Let's say you just started working for a company, and they pointed you to a folder that holds SSIS Packages. You need to find out the version of these SSIS Packages so you can schedule them on SQL Server 2008 or SQL Server 2012 accordingly.

Solution:

To find out the version of an SSIS Package, we need to read the .dtsx file itself. We can open the file with different programs such as Internet Explorer, Notepad, or WordPad. The .dtsx files are XML files, and the property we need to look for is "PackageFormatVersion".

For SSIS 2008, the version is stored as a property element:
<DTS:Property DTS:Name="PackageFormatVersion">3</DTS:Property>

For SSIS 2012, it is stored as an attribute of the package element:
DTS:PackageFormatVersion="6"

Fig 1: SSIS Package with different versions

As you can see, I have two SSIS Packages, but I can't tell whether they are SSIS 2008 or SSIS 2012.

Right click on Package.dtsx and go to Open With. You can choose the program you want to use; I opened it with Notepad.

Once the file is open, look for PackageFormatVersion. You can use Ctrl+F (Find) to locate it.

Fig 2: Find Version of SSIS Package from dtsx file

As PackageFormatVersion=6, this package was developed in SSIS 2012.


Let's check the package1.dtsx by using the same method.
Fig 3: Read SSIS Package file (dtsx) to find out version

As PackageFormatVersion=3, the version of this SSIS Package is 2008/2008 R2.
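
If you have a whole folder of packages, opening each one in Notepad gets tedious. Below is a minimal C# sketch, not part of the original post, that scans a folder of .dtsx files and prints each package's PackageFormatVersion; the folder path is an assumption.

using System;
using System.IO;
using System.Text.RegularExpressions;

class DtsxVersionScanner
{
    static void Main()
    {
        string folder = @"C:\SSISPackages"; // assumed folder that holds the .dtsx files

        foreach (string file in Directory.GetFiles(folder, "*.dtsx"))
        {
            string xml = File.ReadAllText(file);

            // SSIS 2008 stores the version as an element value: ...PackageFormatVersion">3
            Match m = Regex.Match(xml, @"PackageFormatVersion""\s*>\s*(\d+)");

            // SSIS 2012 stores it as an attribute: DTS:PackageFormatVersion="6"
            if (!m.Success)
                m = Regex.Match(xml, @"PackageFormatVersion\s*=\s*""(\d+)""");

            Console.WriteLine("{0}: PackageFormatVersion = {1}",
                Path.GetFileName(file),
                m.Success ? m.Groups[1].Value : "not found");
        }
    }
}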

SSIS - How To Use Flat File Or Excel File In Lookup Transformation [Cache Transformation]

Scenario: 

We use the Lookup Transformation to join input data with reference data and pull the required columns from the reference data. With the Lookup Transformation we can connect to databases to get that reference data, but our company has an Excel file they want to use as the reference. Business users can add or remove records in this Excel file. Before we load any data to our destination table, we need to verify it against this Excel sheet by matching on the required columns.


Solution:  

One solution to this problem is to create a staging table, truncate it on each load, load it from the flat file, and then use it as the reference in the Lookup Transformation.

In SSIS 2008, the Cache Transformation was introduced so we don't have to load the data into staging/temp tables to use it in a Lookup Transformation. We can load the data into memory by using the Cache Transformation and then use it in the Lookup Transformation.

Let's say I have a source table as shown below.
Fig 1: Source Data for Cache Transformation Example

As we can see, the source data doesn't have the CountryName. I need to get the CountryName from the flat file by joining the input and reference data on CountryCode.


Fig 2: Flat File Reference Data for Cache Transformation
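
To make the goal concrete, the Lookup we are about to build behaves like the following join; this is just a sketch, and the table names and the CustomerName column are assumptions for illustration:

SELECT s.CustomerName,
       s.CountryCode,
       r.CountryName                  -- the column we retrieve from the reference data
FROM dbo.SourceCustomers s            -- rows coming from the OLE DB Source (assumed name)
JOIN CachedCountries r                -- rows loaded into the cache from the flat file (assumed name)
    ON s.CountryCode = r.CountryCode; -- the join column we will index in the Cache Transformation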


Step 1: 

Create an SSIS Package and add a Data Flow Task. In this Data Flow Task we will load the data from the flat file into the Cache Transformation. Bring in the Flat File Source, create a connection manager for the flat file, then bring in the Cache Transformation and configure it as shown below.
Fig 3: Create Configure Cache Transformation in SSIS Package

Fig 4: Configure Index Position for Mapping Columns in Cache Transformation.

As we are going to use CountryCode for joining with our input column, set the Index Position for this column to 1. If you are joining on more than one column, set the Index Position of the first column to 1, the second to 2, and so on. Columns that are not used in the join do not need an index position.

Fig 5: Load Reference Data To Cache by using Cache Transformation in First Data Flow Task

In Fig 5, we have loaded the reference data from the Flat File Source into the cache by using the Cache Transformation in the first Data Flow Task.

Step 2: 

Bring in a second Data Flow Task and connect it to the first one. Inside the second Data Flow Task, read the data from the SQL table by using an OLE DB Source. Bring in the Lookup Transformation, attach the OLE DB Source to it, and configure it as shown below.

Fig 6: Configure OLE DB Source


Double click the Lookup Transformation and then, for Connection Type, choose Cache Connection Manager as shown below.

I left "Fail component" selected for "Specify how to handle rows with no matching entries". You can configure this per your requirements: redirect the rows or fail the component if some records do not match the reference dataset.
Fig 7: Use Cache Connection Manager in Lookup Transformation


Fig 8: Choose Cache Connection Manager in Lookup Transformation

Go to Columns and connect the columns on which you want to join. I have connected the CountryCode column from the source to CountryCode in the reference dataset. Using the check boxes, select the columns you want to retrieve from the reference dataset; in our case we want CountryName. Finally, hit OK.
Fig 9: Map the Columns from Source to Reference Data Set


Add the destination and map the columns. Just to show the output, I have used a Multicast Transformation as a test destination and added a data viewer to show the records.

Fig 10: Output from Lookup Transformation


Our final package will look like the figure below.
Fig 11: How to Use Cache Transformation in SSIS Package




SSIS - How To Delete Bottom N Rows From Flat File in SSIS Package

Scenario:

In our previous post, we learnt how to delete the top N rows from a flat file by using SSIS. In this post we will learn how to delete the bottom (last) N rows from a text file in SSIS by using the Script Task. In the figure below we can see four rows of company information and file information. We want to delete these rows before using this file as a source to load into our destination.
Fig 1: Source file with extra information that needs to be deleted.


Solution:

We will be using Script Task in SSIS Package to delete bottom N rows from text file. Here are the steps.

Step 1:

Create an SSIS Package and, inside it, create two variables: one to hold the source file path and the other to hold the number of rows to be deleted from the bottom.
Fig 2: Create variables in Delete Bottom N Rows SSIS Package

Step 2:

Bring the Script Task to Control Flow Pane and then map the variables as shown in fig 3.
Fig 3: Map the variables in Script Task for Delete Last N Rows from Flat File

Click the Edit Script button and then paste the script below. The code I wrote is the body of Main() plus the System.IO and System.Linq using directives; the rest is auto generated by the Script Task, so you can copy just that part into your own Script Task.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using System.Linq;


#endregion

namespace ST_c62f3dcfb0964917aade179aac4edfab
{
   
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
       
        public void Main()
        {
            // Read the file path and the number of rows to delete from the package variables.
            string FilePath = Dts.Variables["User::VarFilePath"].Value.ToString();
            Int32 DeleteBottomNRows = Convert.ToInt32(Dts.Variables["User::VarDeleteBottomNRows"].Value);

            // Read the whole file into a string array, one element per line.
            string[] lines = System.IO.File.ReadAllLines(FilePath);

            // Reverse the lines, skip the first N (which are the bottom N of the
            // original file), then reverse again to restore the original order.
            Array.Reverse(lines);
            lines = lines.Skip(DeleteBottomNRows).ToArray();
            Array.Reverse(lines);

            // Overwrite the original file with the remaining lines.
            System.IO.StreamWriter file = new System.IO.StreamWriter(FilePath);

            foreach (string line in lines)
            {
                file.WriteLine(line);
            }

            file.Close();
            Dts.TaskResult = (int)ScriptResults.Success;
        }

     
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        

    }
}
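
A side note on the script: the two Array.Reverse calls can be avoided, since the total line count is known up front. A minimal alternative for the middle of Main(), assuming the same variables:

// Keep everything except the last N lines; replaces the Reverse/Skip/Reverse sequence.
string[] lines = System.IO.File.ReadAllLines(FilePath);
lines = lines.Take(lines.Length - DeleteBottomNRows).ToArray();
System.IO.File.WriteAllLines(FilePath, lines);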


Let's run our SSIS Package now and check whether the required rows were deleted from the bottom of the file.

Fig 4: Bottom N rows are deleted from the flat file by using the SSIS Package.

As we can see, the required rows were deleted from the source file. We read the file into a string array, delete the required rows, and overwrite the original source file. The number of rows deleted can be changed through the value of the VarDeleteBottomNRows variable.





SSIS - How to Delete Top N Rows from CSV or Text File in SSIS by using Script Task

Scenario:

We have received a text or CSV file. The file has some company information, date created, etc. before the data rows start. Our goal is to delete these information rows and regenerate the file starting from the header row.
Fig: Source File With Company Information

Solution:

We will be using the Script Task to delete the top N rows from the text file. We will read all the data from the file into a string array and then overwrite the original file after removing the top N rows.

Step 1: 

Create two variables as shown in the figure.
VarFilePath holds the file path; VarDeleteTopNRows holds the number of rows you want to delete from the start of the file.
Fig 1: Create variables in SSIS Package To Delete Top N Rows from Flat File by using Script Task

Step 2:

Bring the Script Task to the Control Flow Pane. Map the above variables as ReadOnlyVariables in the Script Task.
Fig 2: Map Variables in Script Task

Paste the code below. The code I wrote is the body of Main() plus the System.IO and System.Linq using directives; the rest is auto generated by the Script Task.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using System.Linq;

#endregion

namespace ST_c62f3dcfb0964917aade179aac4edfab
{
   
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
       
        public void Main()
        {
            // Read the file path and the number of rows to delete from the package variables.
            string FilePath = Dts.Variables["User::VarFilePath"].Value.ToString();
            Int32 DeleteTopNRows = Convert.ToInt32(Dts.Variables["User::VarDeleteTopNRows"].Value);

            // Read the whole file, then skip the top N lines and keep the rest.
            string[] lines = System.IO.File.ReadAllLines(FilePath);
            lines = lines.Skip(DeleteTopNRows).ToArray();

            // Overwrite the original file with the remaining lines.
            System.IO.StreamWriter file = new System.IO.StreamWriter(FilePath);

            foreach (string line in lines)
            {
                file.WriteLine(line);
            }

            file.Close();
            Dts.TaskResult = (int)ScriptResults.Success;
        }

     
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        

    }
}


Save the code and run the Script Task. Once it completes, check the file again.
Fig 3: Top 3 Rows Deleted By SSIS Package.

You can change the value of variable VarDeleteTopNRows according to your requirements. 

DBA Posts - Shrinking SQL Server Database

Before I start on this topic, I want to make everyone aware that a lot of considerations need to be taken into account prior to shrinking a database. Below are some facts about shrinking a database:

  1. Never shrink a database unless you absolutely have to.
  2. Shrinking a database can be very costly in terms of database performance.
  3. Avoid Auto Shrink.
  4. Avoid shrinking by planning database usage ahead of time.
The first and foremost consideration is database performance. Shrinking database or log files might be alright in TEST and DEV environments; however, I would avoid it in production. Shrinking a database gives you unused space back, which might look great for the time being, but behind the scenes it causes high fragmentation, and high fragmentation means poor database performance; a query to measure that fragmentation is sketched below. We will talk about the adverse effects of shrinking a database later in this post. First, let's see how to shrink the database files.
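
To quantify what a shrink did to your indexes, you can check the average fragmentation per index; a sketch against the SQLAge database used later in this post:

SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('SQLAge'), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC; -- values above roughly 30% usually call for a rebuild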


Shrinking database files in SQL Server 2005  

 a) Using T-SQL:


BACKUP LOG SQLAge WITH no_log;
USE SQLAge;
GO

DBCC SHRINKFILE (SQLAge_Log, 10); -- This will reduce the log file size to 10 MB.

I would avoid shrinking the entire database; the best way is to shrink the files of a database, log and/or data files, separately.
b) Shrinking database files using Management Studio
Step 1:
BACKUP LOG SQLAge WITH no_log -- SQLAge is your database name; you can use the TRUNCATE_ONLY option as well
Step 2:
Right click on the database and go to Tasks > Shrink > Files

Fig 1 - Shrinking Database Files in SQL Server 2005

Select the file type which you would like to Shrink (Data or Log)

Fig 2 - Shrinking Database Files in SQL Server 2005
Select the "Reorganize pages before releasing unused space" option and set the target size under "Shrink file to". Fig 3 shows shrinking the log file to 10 MB. Then click OK.

Fig 3 - Shrinking Database Files in SQL Server 2005

Shrinking database files in SQL Server 2008/2012

a) Using T-SQL :

USE SQLAge
GO
 
ALTER DATABASE SQLAge -- Switching the recovery model to SIMPLE allows the log to be truncated
SET RECOVERY SIMPLE
GO
 
DBCC SHRINKFILE (SQLAge_Log, 10);--This will Shrink the log file to 10 mb
GO
 
ALTER DATABASE SQLAge
SET RECOVERY FULL
GO
 

Note: Keep in mind that running the above T-SQL will break the transaction log backup chain; you should take a FULL backup of the database right after this operation.
b) Shrinking database files in SQL Server 2008/2012 using Management Studio
 Step 1.

Right click on the database that you would like to shrink and go to properties

Fig 1 - Shrinking Database files in SQL Server 2008/2012



Click on Options, set the recovery model to Simple, and click OK.

Fig 2 - Shrinking Database files in SQL Server 2008/2012


Step 2.

Right click on the database and go to Tasks > Shrink > Files

Fig 3 - Shrinking Database files in SQL Server 2008/2012

Select the file type which you would like to Shrink (Data or Log)


Fig 4 - Shrinking Database files in SQL Server 2008/2012
Select the "Reorganize pages before releasing unused space" option and set the target size under "Shrink file to". Fig 5 shows shrinking the log file to 10 MB. Then click OK.


Fig 5 - Shrinking Database files in SQL Server 2008/2012


What's New in SQL Server 2014 - Overview


What are the new features in SQL Server 2014?  


In this post I will briefly cover the new and enhanced features of SQL Server 2014. The goal is to give visibility into these features; I will not cover the technical details (process and procedure) of each one.

Below are the new or enhanced features of SQL Server 2014

1.       Memory Optimized Objects

In SQL Server 2014, a memory-optimized engine is integrated into the SQL Server engine. You can define SQL Server objects such as tables and stored procedures to run in memory. In previous versions, almost all operations on SQL Server objects involved I/O, meaning trips to disk during operations such as SELECT, INSERT, UPDATE, and DELETE. With the memory-optimized engine, these operations take place in memory instead of SQL Server going back and forth to the disk drives.

Memory-optimized OLTP is fully transactional; all processing takes place in the form of transactions, but it happens in memory if you declare the objects to take advantage of the memory-optimized feature. There are some prerequisites that need to be considered before you can use this feature; I will cover all that in a separate post. A minimal sketch is shown below.
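
A sketch of declaring a memory-optimized table, assuming the SQLAge database and illustrative object names; the database first needs a filegroup for memory-optimized data:

-- Prerequisite: a filegroup that can hold memory-optimized data (the path is an assumption).
ALTER DATABASE SQLAge ADD FILEGROUP SQLAge_MemFG CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE SQLAge ADD FILE (NAME = 'SQLAge_Mem', FILENAME = 'D:\Datafiles\SQLAge_Mem')
TO FILEGROUP SQLAge_MemFG;
GO

-- A table that lives in memory; rows are still fully durable (SCHEMA_AND_DATA).
CREATE TABLE dbo.SessionCache
(
    SessionID INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserName  NVARCHAR(100) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);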

2.       SQL Server Data Files in Windows Azure

This feature lets you place your database files on Windows Azure Blob storage. Azure Blob storage is essentially a huge storage system chunked into 1 TB blobs. Keeping your database files there makes it easier to migrate a database from one location to another, to replicate database files across locations within the Azure cluster, or to copy them back to an on-premises system to restore the database using the attach/detach method.

3.       Hosting SQL Server in Windows VM

This feature enables you to deploy your existing on-premises database to a SQL Server instance running in a Windows Azure VM; in other words, you can migrate your existing on-premises SQL Server to Azure using the migration wizard.

4.       Backup and Restore

If you are familiar with SQL Server 2012 SP1, a new option introduced there was backing up a SQL Server database to a URL; however, that option could only be used through T-SQL, PowerShell, or SMO. In SQL Server 2014, you can use Management Studio to back up and restore your databases to a URL.

Keep in mind that when scheduling SQL Server database backups, the only URL type supported is a Windows Azure URL; you can't use your existing URLs, e.g. a SharePoint file system URL. Maybe future releases of SQL Server will support URLs other than Windows Azure. The T-SQL form is sketched below.
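
Roughly what the T-SQL form looks like; the storage account, container, and key are placeholder assumptions:

-- Credential holding the storage account name and access key (values are placeholders).
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET   = '<storage access key>';
GO

-- Back the database up straight to a Windows Azure Blob storage URL.
BACKUP DATABASE SQLAge
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SQLAge.bak'
WITH CREDENTIAL = 'AzureBackupCredential';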

5.       Redesign of Query Plans and Performance boost

Query plans are built using the cardinality estimator in SQL Server. In SQL Server 2014, the estimator was redesigned to improve the accuracy and quality of query plans, which helps improve query performance. Again, this is an enhancement aimed at producing better plans. Personally, I didn't see much of a change using it; some queries may take advantage of this enhancement while others remain the same.

6.       Reduced Latency – Delayed Durability

I found this feature quite beneficial, though it depends on the organization. For a finance-oriented organization it is very important NOT to lose any data, while other organizations can tolerate losing a little data as long as most of it can be recovered. SQL Server 2014 introduces a concept called delayed durability: the client application does not wait for SQL Server to confirm that its transactions are written to disk. Transactions are kept in memory on the SQL Server side and written to the log in batches, which reduces I/O operations and hence latency. But you have to be very careful deciding whether to make your transactions delayed durable. A sketch is shown below.
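
A sketch of how delayed durability is enabled and used; the table name is illustrative:

-- Allow individual transactions to opt in to delayed durability.
ALTER DATABASE SQLAge SET DELAYED_DURABILITY = ALLOWED; -- other options: DISABLED, FORCED
GO

BEGIN TRANSACTION;
    INSERT INTO dbo.AuditLog (EventText) VALUES ('nightly load started'); -- illustrative table
-- COMMIT returns to the client before the log records are flushed to disk.
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);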

7.       Always On – Availability Group (AG)

a)      The Always On concept was introduced in SQL Server 2012. Its purpose is high availability of the database, and it also isolates heavy reads such as reporting from the primary database. Always On is accomplished through a feature called Availability Groups: you provide a replica of your production database as a read-only database on another SQL Server instance so that reporting and other heavy reads can run on the secondary replica without putting load on the actual production database. In SQL Server 2012, you were limited to 4 secondary replicas; in SQL Server 2014, that is increased to 8.

b)      In SQL Server 2014, if the primary replica goes down, secondary replicas remain online for read operations.

c)       Failover Cluster Instances can use Cluster Shared Volumes as clustered shared disks.

8.       Table Partitioning and Indexing

In SQL Server 2014, you can rebuild an individual partition of a partitioned table, as sketched below.
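
For example; the index, table, and partition number are illustrative:

-- Rebuild just partition 3 of the index, online; per-partition ONLINE rebuild is new in 2014.
ALTER INDEX IX_Sales_OrderDate ON dbo.Sales
REBUILD PARTITION = 3
WITH (ONLINE = ON);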

9.       Columnstore Indexes

This feature is one of my favorites. Columnstore indexes were introduced in 2012, but in 2012 you could only create nonclustered columnstore indexes. In SQL Server 2014, you can create clustered columnstore indexes, and they are updatable. This greatly enhances data warehouse performance. A sketch is shown below.
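
A one-liner sketch; the table name is illustrative:

-- The clustered columnstore index becomes the table's storage and is updatable in 2014.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;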

10.   Buffer Pool Extension

Buffer Pool Extension is a great feature of SQL Server 2014; it lets the buffer pool extend beyond RAM onto an SSD (solid state drive). To give a little background if you don't already know it: the buffer pool is the primary memory allocation source of SQL Server and a key component in achieving efficiency. It serves two purposes:

1. Accessing and updating database pages

2. Reducing I/O

This feature helps the SQL Server buffer pool hold larger working sets in the pool, making SQL Server 2014 more efficient in its operations. A configuration sketch is shown below.
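
A configuration sketch; the file path and size are assumptions, and the target should be an SSD volume:

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\SSDCACHE\SqlBpe.bpe', SIZE = 32 GB);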



11.   Resource Governor Enhancement
       Resource Governor helps you control resources such as CPU and I/O. In SQL Server 2014 you can govern the physical I/O of a particular resource pool; depending on the priority and urgency of a workload, these values can be set to manage resources. A sketch is shown below.
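
A sketch, assuming a resource pool named ReportPool already exists:

-- MIN_IOPS_PER_VOLUME / MAX_IOPS_PER_VOLUME are the new physical I/O controls in 2014.
ALTER RESOURCE POOL ReportPool
WITH (MIN_IOPS_PER_VOLUME = 50, MAX_IOPS_PER_VOLUME = 500);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;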

12.   Incremental Statistics
       This feature allows you to create or update statistics for a particular partition of a partitioned table. Before this, you had to update statistics for the whole table. A sketch is shown below.
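
A sketch; the statistics name, table, and partition number are illustrative, and dbo.Sales is assumed to be partitioned:

-- Create statistics that can be maintained per partition...
CREATE STATISTICS Stat_Sales_OrderDate ON dbo.Sales (OrderDate)
WITH INCREMENTAL = ON;
GO
-- ...then refresh only the partition that changed.
UPDATE STATISTICS dbo.Sales (Stat_Sales_OrderDate)
WITH RESAMPLE ON PARTITIONS (3);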

13.   Online Index Operation
       The progress of an online index operation can now be monitored in real time; a PartitionID and a PartitionNumber column were added in SQL Server 2014, and you can query the progress to get the real-time status of an online index rebuild job.

14.   Database compatibility Level 

      Compatibility level 90 (SQL Server 2005) is not valid in SQL Server 2014.

SSIS - How to Convert Excel File To CSV (Comma Separated Values) File in SSIS Package

Scenario:

Our business users create an Excel file. We need to send this file to a vendor, but the vendor only accepts CSV (comma separated values) files. We need to convert the Excel file into a CSV file by using an SSIS Package.

Solution:

In this post we will be using the Script Task to convert our Excel file into a comma separated values file. The SSIS Package can be extended with a Foreach Loop Container in case you want to read multiple Excel files and convert all of them to CSV files.

Step 1: 

Create an Excel file if you don't have one. I have created one for this post, as shown below.

Fig 1: Sample Excel file to convert to CSV by SSIS Package

Step 2: 

Create two variables, one pointing to the source Excel file and one for the destination CSV file, as shown below.
Fig 2: Create Source and Destination variables for Excel to CSV conversion SSIS Package


Step 3: 

Bring the Script Task to the Control Flow Pane and then choose the above variables in the ReadOnlyVariables list.
Fig 3: Map Source and Destination Path variables in Script Task

Click on Edit Script and paste the below script.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb;
using System.IO;
#endregion

namespace ST_a916a8b3a6d640be9f6302fae0a06c8e
{
    /// <summary>
    /// ScriptMain is the entry point class of the script. Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            // Fill a DataTable from the Excel source file. I have used Sheet1 in my case;
            // if your sheet name is different, change it in the query below.
            string ConnString;
            ConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + Dts.Variables["User::Var_ExcelFilePath"].Value.ToString() +
                ";Extended Properties=\"Excel 12.0;HDR=No;IMEX=1\";";
            var conn = new OleDbConnection(ConnString);
            conn.Open();
            string query = "SELECT * FROM [Sheet1$]";
            var command = new OleDbCommand(query, conn);
            OleDbDataAdapter adap = new OleDbDataAdapter(command);
            var datatable = new DataTable();
            adap.Fill(datatable);
            conn.Close();

            // Create the CSV file, writing one quoted value at a time.
            using (var sw = new StreamWriter(Dts.Variables["User::Var_CsvFilePath"].Value.ToString()))
            {
                for (int row = 0; row < datatable.Rows.Count; row++)
                {
                    var strRow = "";

                    for (int col = 0; col < datatable.Columns.Count; col++)
                    {
                        strRow += "\"" + datatable.Rows[row][col].ToString() + "\",";
                    }
                    // Remove the trailing comma from the row.
                    strRow = strRow.Remove(strRow.Length - 1);
                    // Write the row to the file.
                    sw.WriteLine(strRow);
                }
            }
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        #region ScriptResults declaration

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

    }
}

The code I added is the body of Main(); you can copy that into your Script Task and leave the rest of the auto generated code as it is.
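
One caveat: the script wraps every value in double quotes but does not escape quotes that might appear inside the data. If your cells can contain double quotes, doubling them keeps the CSV valid; a one-line change inside the inner loop:

// Double up any embedded quotes so the field remains valid CSV.
strRow += "\"" + datatable.Rows[row][col].ToString().Replace("\"", "\"\"") + "\",";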

Let's run the Package and check if the file is created successfully.

Fig 4: Converted csv File from Excel by using SSIS Package

DBA Posts - How to add data file to a filegroup?

You can add a new data file to an existing filegroup, or you can move data files from one filegroup to another.


1- Using T-SQL
 
ALTER DATABASE SQLAge
ADD FILE
(
NAME = SQLAge2,
FILENAME = 'D:\Datafiles\SQLAge\SQLAge_2.ndf',
SIZE = 10MB,
MAXSIZE = 100MB,
FILEGROWTH = 10MB
)
TO FILEGROUP SQLAgeFG1;
GO
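
To confirm where each file landed, you can query the catalog views; a quick check to run in the SQLAge database:

-- List each data file with the filegroup it belongs to.
SELECT fg.name AS FileGroupName,
       df.name AS LogicalFileName,
       df.physical_name
FROM sys.filegroups fg
JOIN sys.database_files df
    ON df.data_space_id = fg.data_space_id;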
 
2- Using Management Studio

a)      Right click on the database and go to Properties

b)      Click on Files
c)      Click on Add
d)      Provide the logical name of the file and click on the dropdown menu of filegroups
e)      Click on the new filegroup that you have created






Fig 1 - How to add a data file in a filegroup

Fig 2 - How to add a data file in a filegroup

What is filegroup in SQL Server? When to use multiple filegroups?


In my previous post, I explained the file types of a database. A filegroup is essentially a container that groups database objects and database data files for allocation and administrative purposes.
There can be only ONE default filegroup for the secondary data files of a database, and you cannot place one database object in multiple filegroups. All system objects are placed in the primary filegroup; secondary data files can be moved from one filegroup to another, or you can specify the filegroup of a data file manually while creating it.

It is a good idea to create a separate filegroup when:

1-      You have database objects (tables, views) that are accessed heavily. SQL Server accesses data using threads; each thread is responsible for retrieving or updating data on specific pages at specific locations on disk. If you have multiple filegroups and the data is spread across disks, SQL Server can take advantage of parallel threads, which enhances database performance. For example, consider a join operation that involves several tables: if the tables are all placed in one filegroup on the same disk, SQL Server gains little from parallel I/O (for the most part); however, if the joined tables reside in different filegroups placed on different disks or LUNs, execution against those heavily accessed tables improves because SQL Server can take advantage of parallel threads.

2-      Your database performs poorly while reading or updating data; multiple threads can work concurrently to retrieve data spread across different filegroups.

3-      You have a table that contains several years of data but you only use the few most recent years. This is very useful when you are going to partition the table and indexes (partitioning will be explained in another post); it makes it easy to archive data and avoids unnecessary scans of records that are not important at the moment.

4-      You have a large database and the recovery time using a full or differential backup is not acceptable. If you find corruption in objects belonging to a particular filegroup, you can restore only that filegroup in order to recover data very quickly.

How to create a filegroup?

1-      Using T-SQL

USE master
GO
ALTER DATABASE SQLAge
ADD FILEGROUP SQLAgeFG1
GO
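
Once the filegroup exists, you can place new objects on it; a quick sketch, with an illustrative table definition:

-- Create a table on the new filegroup instead of the default one.
CREATE TABLE dbo.SalesArchive
(
    SalesID  INT NOT NULL,
    SaleDate DATETIME NOT NULL
) ON SQLAgeFG1;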

2-      Using Management Studio

a)      Right click on the database that you want to create filegroup in and go to properties

b)      Click on Filegroups

c)      Click on Add

d)     Name the filegroup appropriately and click OK


Fig 1 - Adding filegroups to an existing database

Fig 2 - Adding filegroups to an existing database

Fig 3 - Adding filegroups to an existing database