Getting those XML files into your brand spanking new relational database system
Up until now, XML documents have generally languished as mere files in the lowly file-system, our relational database systems haven’t seen them as important enough to warrant inclusion, up until SQL Server 2005 that is. In SQL Server 2005, XML documents are now first class (data type) citizens; no more painful shredding required to hammer those triangular trees into oblong tables.
With the new XML data type in SQL Server 2005, the rush is now on to get our XML documents out of the file-system and into the relational database, where they can, at last, feel validated (XML Schema validated that is!).
Let’s take a look at how we can get those XML documents sitting in the filesystem into SQL Server 2005.
A BULKed up OPENROWSET provider We have extended the OPENROWSET provider with the ‘BULK’ option; this allows the provider to read data directly from a file. We then added three optional arguments (SINGLE_CLOB, SINGLE_NCLOB, SINGLE_BLOB) for the BULK option, these arguments enable the contents of a file to be returned as a single-row, single-column row-set of, respectively, VARCHAR(max), NVARCHAR(max) or VARBINARY(max).
Example: The SINGLE_CLOB (non-Unicode) and SINGLE_NCLOB (Unicode) arguments enable us to read an XML file and insert it into an XML data type column.
[UPDATE 7/19: John Gallardo, rightly points out... "For importing XML data using OPENROWSET we recommend people use SINGLE_BLOB rather than SINGLE_CLOB or SINGLE_NCLOB.". I have updated the usage below. Thanks John!]
CREATE TABLE xml_documents( x XML );
INSERT INTO xml_documents ( x )
SELECT * FROM OPENROWSET(BULK N'myXmlFile.xml', SINGLE_BLOB) AS x;
GO
A parameterization problem That was easy, but there is a problem. I have a lot of .xml files on my machine and they are all called different names. Unfortunately OPENROWSET does not allow me to parameterize the filename parameter (I have still yet to find out why), i.e. this fails:
DECLARE @filename NVARCHAR(max);
SET @filename = N'myOtherXmlFile.xml';
INSERT INTO xml_documents ( x )
SELECT * FROM OPENROWSET(BULK @filename, SINGLE_BLOB) AS x;
GO
Msg 102, Level 15, State 1, Line 5
Incorrect syntax near '@filename'.
So unless all my file names are the same, which of course they are not, I am left with building statements at runtime using dynamic SQL -- slow, horrible, and often vulnerable to a sql injection attack -- which we want to avoid.
DECLARE @filename NVARCHAR(max);
SET @filename = N'myOtherXmlFile.xml';
DECLARE @SQLString NVARCHAR(max);
SET @SQLString = N'INSERT INTO xml_documents ( x ) ' +
'SELECT * FROM OPENROWSET(BULK ''' +
@filename + ''', SINGLE_BLOB) AS x';
EXEC sp_executesql @SQLString;
GO
SQL/CLR integration to the rescue Here is a great example of how a quick, small and neat use of the CLR integration in SQL Server 2005 can save you from a T/SQL coding mess.
I create a function in C#, let’s say LoadFile( SqlString filename ), that does allow me to parameterize the filename, and then I allow access to that function from T/SQL. Here is how:
Save this file as ‘LoadFile.cs’
using System.IO;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static byte[] LoadFile( string filename )
{
using (BinaryReader binaryReader = new BinaryReader((Stream)File.OpenRead(filename)))
{
byte[] bytes = new byte[binaryReader.BaseStream.Length];
binaryReader.Read(bytes, 0, bytes.Length);
return bytes;
}
}
};
[Update 7/21: This function passes the contents of the file back as binary bytes (original post was returning a string). We return binary bytes instead of reading as text, so we do not lose the XML document encoding]
Compile the C# file (just using the standard C# compiler sitting in your Windows\Microsoft.Net directory):
C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\csc /target:library /out:"c:\temp\SqlClrObjects.dll" "LoadFile.cs"
Import the assembly library into SQL Server and expose the C# function as a TSQL function.
/* First enable SQL CLR functionality (which is off by default) */
sp_configure "clr enabled", 1
GO
RECONFIGURE
GO
/* Create the assembly. Do a defensive drop of assembly and function (that maybe bound to the assembly) first */
IF EXISTS (SELECT * FROM sys.objects WHERE name = N'LoadFile')
DROP FUNCTION LoadFile;
IF EXISTS (SELECT * FROM sys.assemblies WHERE name = N'SqlClrAssembly')
DROP ASSEMBLY SqlClrAssembly;
CREATE ASSEMBLY SqlClrAssembly
FROM N'c:\temp\SqlClrObjects.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO
// Note the PERMISSION_SET = EXTERNAL_ACCESS is required because the C# function we have written accesses the file-system
/* Expose the C# function in the Assembly as a T/SQL function */
CREATE FUNCTION LoadFile ( @filename NVARCHAR(max) )
RETURNS VARBINARY(max)
AS EXTERNAL NAME SqlClrAssembly.UserDefinedFunctions.LoadFile;
GO
/* Test the function */
SELECT CONVERT( XML, dbo.LoadFile( 'c:\myOtherXmlFile.xml' ) ) AS xml
GO
xml
-----------------------------------------
<please>Don't forget me</please>
(1 row(s) affected)
So Easy Now, we can use our new parameterized LoadFile function to insert an XML file (where the filename is specified as a variable) into the XML data type column
DECLARE @filename NVARCHAR(max);
SET @filename = N'myOtherXmlFile.xml';
INSERT INTO xml_documents ( x )
SELECT dbo.LoadFile ( @filename );
GO
This is far neater, safer (and I bet faster) than the horrible dynamic SQL (sp_executesql) above.
If Only If I had a table containing the names of all the .XML files on my file-system, i.e. like this:
CREATE TABLE files ( filename NVARCHAR(max) );
INSERT INTO files VALUES( N'c:\myXmlFile.xml' );
INSERT INTO files VALUES( N'c:\myOtherXmlFile.xml' );
GO
I could write a single T/SQL statement to insert the contents of all my .xml files as rows in the xml_documents table
INSERT INTO xml_documents ( x )
SELECT dbo.LoadFile ( filename ) FROM files;
GO
(2 row(s) affected)
SELECT * FROM xml_documents;
GO
<please>Put me in the RDBMS</please>
<please>Don't forget me</please>
But that is for another post.
In this post we have seen some of the enhancements to the OPENROWSET provider for loading files, we have seen a glimpse of the XML data type, and we have seen what may be the shortest SQL/CLR function you will ever see. If these help get you started, let me know. If I have left you with more questions than answers, then fire the questions my way and I will get back to you.
In my next post we will look at how we can create a SQL/CLR Table Valued Function (TVF) that returns a table containing a directory listing of all the .xml files on my machine.
Comments
Anonymous
July 21, 2005
Stuart Padley, a Test Lead here in the SQL Server Engine team, posts about loading XML documents using...Anonymous
July 21, 2005
In my last post, I said we would look at how we could return a table that contains a directory listing...Anonymous
October 06, 2005
Very well described, extremely simple to understand.
Looking forward for more articles :)Anonymous
December 03, 2005
The comment has been removedAnonymous
January 25, 2007
pecan Zermatt http://www.autosjobs.info/atelier_Spain/grist_Comunidad%20de%20Madrid/pecan_Madrid_1.htmlAnonymous
February 07, 2007
Very much clear and useful article. Thanks for sharing your idea.Anonymous
February 14, 2007
Awesome! This is precisely what I am trying to do, but I got stuck when I discovered that BULK doesn't accept a parameter. I've been wanting to use CLR assemblies in SQL for a while, but haven't had time to plunge in. This was a perfect first attempt. NOTE: CREATE ASSEMBLY didn't work for me until I signed the assembly, even though I was using a login with the sysadmin role.Anonymous
March 13, 2007
The comment has been removedAnonymous
March 29, 2007
pecan Zermatt http://www.autosjobs.info/atelier_Spain/grist_Comunidad%20de%20Madrid/pecan_Madrid_1.html I do not agree. Go to http://www.cyberworkz.info/crouton_United%20Kingdom/infiltration_Scotland/unprotected_Ayr_1.htmlAnonymous
December 06, 2007
I have a table that already contains a column with a separate GUIDnumber for each row. I have corresponding xml files where each file has the name GUID number.xml. I would like to load these XML files into this existing table in a new column called XMLRepr. Any suggestions as to how to use what you have here. Can I do an update query using dbo.LoadFile? Thanks! LarryAnonymous
November 26, 2008
Brilliant, just what I was looking for!Anonymous
June 16, 2009
PingBack from http://fixmycrediteasily.info/story.php?id=8889Anonymous
June 18, 2009
PingBack from http://gardendecordesign.info/story.php?id=3035