SSIS Hints & Tips: Handling Flat Files

Introduction

After working with flat files (especially text and CSV) for some time in my current role I have decided to come up with this little discussion which I believe may benefit peers who might not yet have come across some of the issues that I have raised in this piece. This piece is just a summary of what I have thought would be useful hints for anyone to consider when they are working with CSV or text files. Everything in this writing is purely from my experience and I am discussing it at SSIS level. I believe most of the actions can be applied across in SSMS when dealing with flat files there.

Flat Files as Destination File

Exporting data to flat file destination is relatively easy and straight forward. There aren’t as many issues experienced here as they are when one imports data from flat files. However, whatever one does here hugely impacts on how easy and straightforward importing data down the line from your flat file that you create at this stage will be. There are a few things to note. When you open your Flat File Destination component, of the options that you will see – listed below is a discussion of some of them. I have found it useful to take the actions which I have mentioned below.

  1. Column names in the first row – If you want the column names to be in the first row on your file make sure that you always check the box with that option. SSIS does not do that for you by default.
  2. Column delimiter – I always prefer to use Tab {t}. The default delimiter is Comma {,}. The problem that I have found with use of comma delimiters is that if the values have inherent commas within them the system does not always get it right in determining where the column ends. As a result you may end up getting some rows with shifted and misplaced values from their original columns owing to the wrong column allocation by the system.
  3. AlwaysCheckForRowDelimiters – this is a Flat File Connection Manager property. The default setting of this property is True. At one time I found myself having to use this after I faced a huge problem with breaking and misplacement of rows in the dataset that I was dealing with. The problem emanated from the values in one of the columns. The offending column had values of varchar datatype which were presented in the form of paragraphs with all sorts of special characters within them, e.g.

This is ++, an example of what – I mean… the characters ;

in the dataset which gave me: nearly 100% ++ headaches – looked like {well}; this piece

OF: example??

You can see from the above italicised dummy value example what I mean. Values such as that make the system to prematurely break the rows. I don’t know why but the somehow painful experience that I had about this led me to the conclusion that I should not leave the system to auto-decide where the row ends. As such, when I changed the property AlwaysCheckForRowDelimiters from True to False, along with the recommendations mentioned in items 1 and 2 above, breaking and misplacement of rows was solved. By breaking I mean -  you will find one row in a table being broken into two or three separate rows in the flat file. This is carried over to the new table where that flat will is loaded.

Text or CSV file?? – In my experience going with the text file is always efficient. Besides, some of the things recommended above only work in text file (I suppose so. I stand to be corrected on this). An example of this is column delimiters. Item 2 above recommends use of Tab {t} column delimiter whereas in CSV, as the name suggests, the delimiters are commas.

Flat Files as Source File

In my experience, many headaches of working with flat files are seen at importing data from flat files. A few examples of the headaches that I’m talking about are things such as,

  1. Datatypes and datatype length, if using string
  2. Shifting and misplacement of column values
  3. Broken rows, with some pseudo-rows appearing in your import file
  4. Double quotation marks in your values

Below I will address some of the common things which I have personally experienced and hope will be useful to other people. When you open your Flat File Source component, of the options that you will see – listed below is a discussion of some of them. I have found it useful to take the actions which I have mentioned below.

  1. Retain null values from the source as null values in the data flow – this option comes unchecked by default. From the time I noticed the importance of putting a check mark in it, I always make sure that I check it. It was after some of my rows in the destination table were coming up with shifted and misplaced column values. By shifted and misplaced column values I mean certain values appearing under columns where you do not expect them, by so doing showing that purely the value has been moved from its original column to another column where it does not belong.
  2. Text qualifier – the default entry here is <none>. I have found that it is always handy to insert double quotes here (“). This will eliminate any double quotes which the system may have included at the time when the flat file was created. This happens when the values in question have commas as part of the characters in them.
  3. Column delimiter – this solely depends on the column delimiter which was specified at the time when the flat file was created. The system default is Comma {,}. Please note that if the delimiter specified here is different from the one in your flat file the system will throw up an error with a message like “An error occurred while skipping data rows”.
  4. Column names in the first data row – if you want the first row to be column names put a check mark on this option.

Datatypes and Datatypes Length

By default when you import a flat file your datatypes for all the columns come up as varchar (50) in SSIS. More often than not if you leave this default setup your package will fail when you run it. This is because some of the values in some of your columns will be more than 50 characters, the default length. The resulting error will be a truncation error. I have found two ways of dealing with this.

  1. Advanced – This is an option found on the Flat File Source Editor. Once this option is selected on your Flat File Source Editor you will be presented with a list of columns from your flat file. To determine your datatypes and length there are two possible things that you can do at this stage.
    1. Go column by column – going column by column you can manually input your desired datatypes and lengths on the Flat File Source Editor through the Advanced option.
    2. Suggest types – this is another option under Advanced selection. What this option does is suggest datatypes and lengths for you based on the sample data amount that you mention in the pop-up dialog box. I have noticed that while this is a handy functionality, the problem with it is that if some of the values from the non-sampled data have lengths bigger than what the system would have suggested the package will fail with a truncation error.
  2. View code – this is viewing of XML code. If for example you want all your columns to be of 255 characters length in your landing staging table
    1. Go to your package name, right click on it and select the option View code from the list presented to you. XML code will then come up.
    2. Hit Ctrl + F to get a “Find and Replace:” window. On “Find What” type in DTS:MaximumWidth="50" and on “Replace with:” type in DTS:MaximumWidth="255". Make sure that under “Look in” the selection is Current Document.
    3. Click “Replace All” and all your default column lengths of 50 characters will be changed to 255 characters.
    4. Once done, save the changes. Close the XML code page. Go to your package GUI designer. You will find that the Flat File Source component at this point will be highlighted with a yellow warning triangle. This is because the metadata definition has changed. Double click the Flat File Source component and then click on Ok. The warning will disappear and you will be set to pull and load your data to your staging database with all columns being varchar (255). If you need to change any columns to specific data types you can either use Data Conversion Component or Derived Column component for that purpose, OR you can use both components depending on the data types that you will converting to.

Dynamic Flat File Name and Date

Please see this blog http://www.bidn.com/blogs/mikedavis/ssis/153/using-expression-ssis-to-save-a-file-with-file-name-and-date

There is so much to flat files to be discussed in one piece