Working with Excel Files with the Script Task
Integration Services provides the Excel connection manager, Excel source, and Excel destination for working with data stored in spreadsheets in the Microsoft Excel file format. The techniques described in this topic use the Script task to obtain information about available Excel databases (workbook files) and tables (worksheets and named ranges). These samples can easily be modified to work with any of the other file-based data sources supported by the Microsoft Jet OLE DB Provider.
Configuring a Package to Test the Samples
Example1: Check Whether an Excel File Exists
Example 2: Check Whether an Excel Table Exists
Example 3: Get a List of Excel Files in a Folder
Example 4: Get a List of Tables in an Excel File
Displaying the Results of the Samples
Note
If you want to create a task that you can more easily reuse across multiple packages, consider using the code in this Script task sample as the starting point for a custom task. For more information, see Developing a Custom Task.
Configuring a Package to Test the Samples
You can configure a single package to test all the samples in this topic. The samples use many of the same package variables and the same .NET Framework classes.
To configure a package for use with the examples in this topic
Create a new Integration Services project in Business Intelligence Development Studio and open the default package for editing.
Variables. Open the Variables window and define the following variables:
ExcelFile, of type String. Enter the complete path and filename to an existing Excel workbook.
ExcelTable, of type String. Enter the name of an existing worksheet or named range in the workbook named in the value of the ExcelFile variable. This value is case-sensitive.
ExcelFileExists, of type Boolean.
ExcelTableExists, of type Boolean.
ExcelFolder, of type String. Enter the complete path of a folder that contains at least one Excel workbook.
ExcelFiles, of type Object.
ExcelTables, of type Object.
Imports statements. Most of the code samples require you to import one or both of the following .NET Framework namespaces at the top of your script file:
System.IO, for file system operations.
System.Data.OleDb, to open Excel files as data sources.
References. The code samples that read schema information from Excel files require an additional reference in the script project to the System.Xml namespace.
Set the default scripting language for the Script component by using the Scripting language option on the General page of the Options dialog box. For more information, see General Page.
Example 1 Description: Check Whether an Excel File Exists
This example determines whether the Excel workbook file specified in the ExcelFile variable exists, and then sets the Boolean value of the ExcelFileExists variable to the result. You can use this Boolean value for branching in the workflow of the package.
To configure this Script Task example
Add a new Script task to the package and change its name to ExcelFileExists.
In the Script Task Editor, on the Script tab, click ReadOnlyVariables and enter the property value using one of the following methods:
Type ExcelFile.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelFile variable.
Click ReadWriteVariables and enter the property value using one of the following methods:
Type ExcelFileExists.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelFileExists variable.
Click Edit Script to open the script editor.
Add an Imports statement for the System.IO namespace at the top of the script file.
Add the following code.
Example 1 Code
Public Class ScriptMain
Public Sub Main()
Dim fileToTest As String
fileToTest = Dts.Variables("ExcelFile").Value.ToString
If File.Exists(fileToTest) Then
Dts.Variables("ExcelFileExists").Value = True
Else
Dts.Variables("ExcelFileExists").Value = False
End If
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
public class ScriptMain
{
public void Main()
{
string fileToTest;
fileToTest = Dts.Variables["ExcelFile"].Value.ToString();
if (File.Exists(fileToTest))
{
Dts.Variables["ExcelFileExists"].Value = true;
}
else
{
Dts.Variables["ExcelFileExists"].Value = false;
}
Dts.TaskResult = (int)ScriptResults.Success;
}
}
Example 2 Description: Check Whether an Excel Table Exists
This example determines whether the Excel worksheet or named range specified in the ExcelTable variable exists in the Excel workbook file specified in the ExcelFile variable, and then sets the Boolean value of the ExcelTableExists variable to the result. You can use this Boolean value for branching in the workflow of the package.
To configure this Script Task example
Add a new Script task to the package and change its name to ExcelTableExists.
In the Script Task Editor, on the Script tab, click ReadOnlyVariables and enter the property value using one of the following methods:
Type ExcelTable and ExcelFile separated by commas.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelTable and ExcelFile variables.
Click ReadWriteVariables and enter the property value using one of the following methods:
Type ExcelTableExists.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelTableExists variable.
Click Edit Script to open the script editor.
Add a reference to the System.Xml assembly in the script project.
Add Imports statements for the System.IO and System.Data.OleDb namespaces at the top of the script file.
Add the following code.
Example 2 Code
Public Class ScriptMain
Public Sub Main()
Dim fileToTest As String
Dim tableToTest As String
Dim connectionString As String
Dim excelConnection As OleDbConnection
Dim excelTables As DataTable
Dim excelTable As DataRow
Dim currentTable As String
fileToTest = Dts.Variables("ExcelFile").Value.ToString
tableToTest = Dts.Variables("ExcelTable").Value.ToString
Dts.Variables("ExcelTableExists").Value = False
If File.Exists(fileToTest) Then
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & fileToTest & _
";Extended Properties=Excel 8.0"
excelConnection = New OleDbConnection(connectionString)
excelConnection.Open()
excelTables = excelConnection.GetSchema("Tables")
For Each excelTable In excelTables.Rows
currentTable = excelTable.Item("TABLE_NAME").ToString
If currentTable = tableToTest Then
Dts.Variables("ExcelTableExists").Value = True
End If
Next
End If
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
public class ScriptMain
{
public void Main()
{
string fileToTest;
string tableToTest;
string connectionString;
OleDbConnection excelConnection;
DataTable excelTables;
string currentTable;
fileToTest = Dts.Variables["ExcelFile"].Value.ToString();
tableToTest = Dts.Variables["ExcelTable"].Value.ToString();
Dts.Variables["ExcelTableExists"].Value = false;
if (File.Exists(fileToTest))
{
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + fileToTest + ";Extended Properties=Excel 8.0";
excelConnection = new OleDbConnection(connectionString);
excelConnection.Open();
excelTables = excelConnection.GetSchema("Tables");
foreach (DataRow excelTable in excelTables.Rows)
{
currentTable = excelTable["TABLE_NAME"].ToString();
if (currentTable == tableToTest)
{
Dts.Variables["ExcelTableExists"].Value = true;
}
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
}
Example 3 Description: Get a List of Excel Files in a Folder
This example fills an array with the list of Excel files found in the folder specified in the value of the ExcelFolder variable, and then copies the array into the ExcelFiles variable. You can use the Foreach from Variable enumerator to iterate over the files in the array.
To configure this Script Task example
Add a new Script task to the package and change its name to GetExcelFiles.
Open the Script Task Editor, on the Script tab, click ReadOnlyVariables and enter the property value using one of the following methods:
Type ExcelFolder
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelFolder variable.
Click ReadWriteVariables and enter the property value using one of the following methods:
Type ExcelFiles.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelFiles variable.
Click Edit Script to open the script editor.
Add an Imports statement for the System.IO namespace at the top of the script file.
Add the following code.
Example 3 Code
Public Class ScriptMain
Public Sub Main()
Const FILE_PATTERN As String = "*.xls"
Dim excelFolder As String
Dim excelFiles As String()
excelFolder = Dts.Variables("ExcelFolder").Value.ToString
excelFiles = Directory.GetFiles(excelFolder, FILE_PATTERN)
Dts.Variables("ExcelFiles").Value = excelFiles
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
public class ScriptMain
{
public void Main()
{
const string FILE_PATTERN = "*.xls";
string excelFolder;
string[] excelFiles;
excelFolder = Dts.Variables["ExcelFolder"].Value.ToString();
excelFiles = Directory.GetFiles(excelFolder, FILE_PATTERN);
Dts.Variables["ExcelFiles"].Value = excelFiles;
Dts.TaskResult = (int)ScriptResults.Success;
}
}
Alternate Solution
Instead of using a Script task to gather a list of Excel files into an array, you can also use the ForEach File enumerator to iterate over all the Excel files in a folder. For more information, see How to: Loop through Excel Files and Tables by Using a Foreach Loop Container.
Example 4 Description: Get a List of Tables in an Excel File
This example fills an array with the list of worksheets and named ranges found in the Excel workbook file specified by the value of the ExcelFile variable, and then copies the array into the ExcelTables variable. You can use the Foreach from Variable Enumerator to iterate over the tables in the array.
Note
The list of tables in an Excel workbook includes both worksheets (which have the $ suffix) and named ranges. If you have to filter the list for only worksheets or only named ranges, you may have to add additional code for this purpose.
To configure this Script Task example
Add a new Script task to the package and change its name to GetExcelTables.
Open the Script Task Editor, on the Script tab, click ReadOnlyVariables and enter the property value using one of the following methods:
Type ExcelFile.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelFile variable.
Click ReadWriteVariables and enter the property value using one of the following methods:
Type ExcelTables.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, select the ExcelTablesvariable.
Click Edit Script to open the script editor.
Add a reference to the System.Xml namespace in the script project.
Add an Imports statement for the System.Data.OleDb namespace at the top of the script file.
Add the following code.
Example 4 Code
Public Class ScriptMain
Public Sub Main()
Dim excelFile As String
Dim connectionString As String
Dim excelConnection As OleDbConnection
Dim tablesInFile As DataTable
Dim tableCount As Integer = 0
Dim tableInFile As DataRow
Dim currentTable As String
Dim tableIndex As Integer = 0
Dim excelTables As String()
excelFile = Dts.Variables("ExcelFile").Value.ToString
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & excelFile & _
";Extended Properties=Excel 8.0"
excelConnection = New OleDbConnection(connectionString)
excelConnection.Open()
tablesInFile = excelConnection.GetSchema("Tables")
tableCount = tablesInFile.Rows.Count
ReDim excelTables(tableCount - 1)
For Each tableInFile In tablesInFile.Rows
currentTable = tableInFile.Item("TABLE_NAME").ToString
excelTables(tableIndex) = currentTable
tableIndex += 1
Next
Dts.Variables("ExcelTables").Value = excelTables
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
public class ScriptMain
{
public void Main()
{
string excelFile;
string connectionString;
OleDbConnection excelConnection;
DataTable tablesInFile;
int tableCount = 0;
string currentTable;
int tableIndex = 0;
string[] excelTables = new string[5];
excelFile = Dts.Variables["ExcelFile"].Value.ToString();
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + excelFile + ";Extended Properties=Excel 8.0";
excelConnection = new OleDbConnection(connectionString);
excelConnection.Open();
tablesInFile = excelConnection.GetSchema("Tables");
tableCount = tablesInFile.Rows.Count;
foreach (DataRow tableInFile in tablesInFile.Rows)
{
currentTable = tableInFile["TABLE_NAME"].ToString();
excelTables[tableIndex] = currentTable;
tableIndex += 1;
}
Dts.Variables["ExcelTables"].Value = excelTables;
Dts.TaskResult = (int)ScriptResults.Success;
}
}
Alternate Solution
Instead of using a Script task to gather a list of Excel tables into an array, you can also use the ForEach ADO.NET Schema Rowset Enumerator to iterate over all the tables (that is, worksheets and named ranges) in an Excel workbook file. For more information, see How to: Loop through Excel Files and Tables by Using a Foreach Loop Container.
Displaying the Results of the Samples
If you have configured each of the examples in this topic in the same package, you can connect all the Script tasks to an additional Script task that displays the output of all the examples.
To configure a Script task to display the output of the examples in this topic
Add a new Script task to the package and change its name to DisplayResults.
Connect each of the four example Script tasks to one another, so that each task runs after the preceding task completes successfully, and connect the fourth example task to the DisplayResults task.
Open the DisplayResults task in the Script Task Editor.
On the Script tab, click ReadOnlyVariables and use one of the following methods to add all seven variables listed in Configuring a Package to Test the Samples:
Type the name of each variable separated by commas.
-or-
Click the ellipsis (…) button next to the property field, and in the Select variables dialog box, selecting the variables.
Click Edit Script to open the script editor.
Add Imports statements for the Microsoft.VisualBasic and System.Windows.Forms namespaces at the top of the script file.
Add the following code.
Run the package and examine the results displayed in a message box.
Code to Display the Results
Public Class ScriptMain
Public Sub Main()
Const EOL As String = ControlChars.CrLf
Dim results As String
Dim filesInFolder As String()
Dim fileInFolder As String
Dim tablesInFile As String()
Dim tableInFile As String
results = _
"Final values of variables:" & EOL & _
"ExcelFile: " & Dts.Variables("ExcelFile").Value.ToString & EOL & _
"ExcelFileExists: " & Dts.Variables("ExcelFileExists").Value.ToString & EOL & _
"ExcelTable: " & Dts.Variables("ExcelTable").Value.ToString & EOL & _
"ExcelTableExists: " & Dts.Variables("ExcelTableExists").Value.ToString & EOL & _
"ExcelFolder: " & Dts.Variables("ExcelFolder").Value.ToString & EOL & _
EOL
results &= "Excel files in folder: " & EOL
filesInFolder = DirectCast(Dts.Variables("ExcelFiles").Value, String())
For Each fileInFolder In filesInFolder
results &= " " & fileInFolder & EOL
Next
results &= EOL
results &= "Excel tables in file: " & EOL
tablesInFile = DirectCast(Dts.Variables("ExcelTables").Value, String())
For Each tableInFile In tablesInFile
results &= " " & tableInFile & EOL
Next
MessageBox.Show(results, "Results", MessageBoxButtons.OK, MessageBoxIcon.Information)
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
public class ScriptMain
{
public void Main()
{
const string EOL = "\r";
string results;
string[] filesInFolder;
//string fileInFolder;
string[] tablesInFile;
//string tableInFile;
results = "Final values of variables:" + EOL + "ExcelFile: " + Dts.Variables["ExcelFile"].Value.ToString() + EOL + "ExcelFileExists: " + Dts.Variables["ExcelFileExists"].Value.ToString() + EOL + "ExcelTable: " + Dts.Variables["ExcelTable"].Value.ToString() + EOL + "ExcelTableExists: " + Dts.Variables["ExcelTableExists"].Value.ToString() + EOL + "ExcelFolder: " + Dts.Variables["ExcelFolder"].Value.ToString() + EOL + EOL;
results += "Excel files in folder: " + EOL;
filesInFolder = (string[])(Dts.Variables["ExcelFiles"].Value);
foreach (string fileInFolder in filesInFolder)
{
results += " " + fileInFolder + EOL;
}
results += EOL;
results += "Excel tables in file: " + EOL;
tablesInFile = (string[])(Dts.Variables["ExcelTables"].Value);
foreach (string tableInFile in tablesInFile)
{
results += " " + tableInFile + EOL;
}
MessageBox.Show(results, "Results", MessageBoxButtons.OK, MessageBoxIcon.Information);
Dts.TaskResult = (int)ScriptResults.Success;
}
}
|