SSIS Performance tips

The article will guide you how to resolve  performance issues which most SSIS developers face, offering some tips which will increase their ETL performance.

  1. In lookups, using cache connection manager we can do a lookup against non-OLEDB sources unlike lookup's restriction of OLEDB source. Cache connection manager reduce memory usage and database load as it can exist at project level and package level to be shared by multiple users and packages.

2.  To improve performance, use "IGNORE FAILURE" in lookup instead of using "Redirect rows to no match".  However, note ignore failure adds these non matching rows to destination with NULL values.

  1. SCD is good ONLY for small dimensions having not more than 10,000 rows. It degrades the performance as it uses OLEDB command transformation which works at row level for updating the SCD rows.

  2. For loading millions of records , apply CDC to get changed rows. If CDC is not feasible then firstly apply hash values to columns to identify the change and update to be performed and then use set based operations i.e. drag an execute sql task . Use a MERGE query inside the execute sql task to apply insert as well as update to the rows.

  3. For loading fact, create a new partition for latest data . Load it , apply compression , add columnstore index and then switch this partition into fact table.

  4. Variables when used to store object variables the remember that you could run out of memory. Consider storing large objects into database or SSIS raw files

  5. While logging if expected number of log entries is huge then use SQL server  or SQL server profiler to log them. SQL profiler is good to monitor the SSIS performance ,reliability and is easily aligned with OS. However, at the same time,  log it into files when your requirement is of portability i.e to move the files over email independent of OS and database.

  6. For source adapters  optimization and performance, consider the capabilities of different PROVIDERs in SSIS.

  7. To improve the performance of data flow, you can configure the engine thread property of data flow. Engine thread govern or create source threads and workers threads to perform work. while configuring do take into consideration the other task running in parallel.

  8. Non blocking transformations re-consume same buffers and hence no performance impact. While partially and blocking transformations have per-requisites of having sufficient memory.

  9. Lookups using full cache or partial cache require sufficient memory. If it is not available then lookup does not swap rows to disk instead lookup FAILS . Hence, do reserve sufficient memory for lookups. Do not add many heavy lookups in one data flow. Split these lookups in different transformations and use temporary table to transfer data.

  10. SSIS fuzzy transformations are useful for huge amount of data and you do not have reference data source. DQS is useful for small amount of data and when reference data (knowledge base) is available.

  11. When using fuzzy lookup, to increase performance and reduce their load, firstly add a lookup to categorize match and non-match rows then direct the non-match row only the fuzzy lookup transformation.

  12. In case of fuzzy grouping, firstly add an aggregate transformation and then add fuzzy grouping transformation to increase performance.

  13. Fuzzy transformations work only with STRING data while DQS matches integer, numeric /decimal,string and dates.

  14. SSAS Data mining transformation can be used when SSAS is installed in Multi-dimensional mode and not TABULAR mode.

  15. Use custom components only when you cannot perform the task with in-built transformations and scripts task/components. Remember to develop custom components it is necessary to understand the SSIS engine workflow.