SharePoint 2013: What happens when you index an XML file?

I have seen this question several times. How does SharePoint 2013 handle XML files out of the box? The simplest way to find out is to just test it!

I decided to test using the standard books.xml example file found on MSDN. This is a single file with multiple books in it. I split this into one file per book, but I kept the surrounding catalog tag.

Create a folder with test files 

I created 4 separate XML files in a regular folder. I used the ID of the book as the filename.
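The split can also be scripted. What follows is a minimal sketch, not the way the test files were actually produced: it assumes the books.xml layout (a catalog root with book children that carry an id attribute), and the abbreviated SAMPLE data and the split_catalog name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Abbreviated sample in the books.xml layout (illustration only).
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
</catalog>
"""


def split_catalog(xml_text, out_dir):
    """Write one file per <book>, each wrapped in its own <catalog> root.

    Files are named after the book's id attribute, e.g. bk101.xml.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for book in ET.fromstring(xml_text).findall("book"):
        catalog = ET.Element("catalog")  # keep the surrounding catalog tag
        catalog.append(book)
        path = out_dir / f"{book.get('id')}.xml"
        ET.ElementTree(catalog).write(
            str(path), encoding="utf-8", xml_declaration=True
        )
        written.append(path)
    return written
```

Calling split_catalog(SAMPLE, "xmlbooks") would leave bk101.xml and bk102.xml in a local xmlbooks folder, ready to be shared.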

Share the folder

This is my favourite technique for testing something quickly: dump some files in a folder, share it with Everyone (to avoid boring security issues) and start a crawl.

Add the content source and crawl.

In Search Administration, under Content Sources, I create a content source of type File Shares called xmlbooks and add the path of the shared folder as its start address. Since my test server is called sp2013 the path is .

Check the crawl log

Next I check the crawl log. I always do this when I crawl something new. After all, I don't know what to expect. To my surprise I got 5 successes and 0 errors.

At this point I'm curious about what search will actually return. For this kind of testing I always prefer to use the REST API directly. The SharePoint 2013 Search Query Tool from Nadeem Ishqair is my weapon of choice. I do a simple search for xmlbooks, and I get my 4 documents and their folder in return. That's good: at least they are in the index.
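For reference, the query the tool sends boils down to a GET request against the search REST endpoint. Here is a minimal sketch that only builds the URL; the http://sp2013 site address, the build_search_url helper and the choice of Title and Path as returned properties are assumptions for illustration, not something taken from the test above.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, select_properties=("Title", "Path")):
    """Build a SharePoint 2013 search REST URL for a simple keyword query.

    The endpoint expects the query text wrapped in single quotes.
    """
    url = f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(query_text)}'"
    if select_properties:
        # Comma-separated list of managed properties to return.
        props = quote(",".join(select_properties), safe=",")
        url += f"&selectproperties='{props}'"
    return url
```

A real request also needs Windows authentication and, if you want JSON back, an Accept header of application/json;odata=verbose. The Search Query Tool takes care of both.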

What parts of an XML file are searchable out of the box?

Finally we need to answer the question, what is searchable? It turns out it works exactly like I would expect.

  • XML tags and attributes are not searchable
    I search for author and catalog and get 0 hits.
  • XML data is searchable
    I can search for the names of authors, descriptions of books.
  • Teasers in the default search interface show an extract of all the XML data.
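One way to picture this behaviour: the index ends up with roughly what you get when you throw away the markup and keep the text nodes. The sketch below only mimics the observed result and says nothing about how SharePoint's XML handling is implemented. The searchable_text name and the sample are made up, and the sketch drops attribute values along with the tags.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the books.xml layout (illustration only).
SAMPLE = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
</catalog>
"""


def searchable_text(xml_text):
    """Return only the text content of an XML document.

    Element names and attributes are discarded, so a query for a tag
    name such as "author" would find nothing in the result.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields text nodes in document order; skip pure whitespace.
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```

For the sample above this keeps the author name and the title, while catalog, book, author, title and bk101 are gone.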

Mapping XML to managed properties

Usually you want to make a more intelligent search application than this. For example, you want to use a set of XML files with orders, people, books or whatever as input data for your search system. You want to have refiners on price, date, author and genre for your books. And you certainly don't want users to open an XML file when they click on a link in the search results.

What you need is a way to map XML properties to your managed properties. You can do this via a BCS Connector. Check out Nadeem's XML Connector for SharePoint 2010 or the SharePoint 2013 version on Anders Fagerhaug's blog.

Note: If you don't want to create a BCS Connector, you could always use the Content Enrichment Web Service in SharePoint 2013 and map XML properties to managed properties there. Maybe I'll make a post on this some time in the future.
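Whichever route you take, the core of the job is the same: parse the XML and turn elements into typed property values. Here is a minimal sketch of just that step, assuming the XML text of one test file is already in hand. It is not the BCS or Content Enrichment API. The managed property names (BookAuthor, BookPrice and so on) and the map_book helper are hypothetical; only the element names come from books.xml.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Element name -> (hypothetical managed property name, converter).
PROPERTY_MAP = {
    "author": ("BookAuthor", str),
    "title": ("BookTitle", str),
    "genre": ("BookGenre", str),
    "price": ("BookPrice", Decimal),
    "publish_date": ("BookPublishDate", date.fromisoformat),
}

# One test file: a single book inside the catalog tag (illustration only).
SAMPLE = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
  </book>
</catalog>
"""


def map_book(xml_text):
    """Map the first <book> in a catalog to a dict of property values."""
    book = ET.fromstring(xml_text).find("book")
    if book is None:
        raise ValueError("no <book> element found")
    props = {"BookId": book.get("id")}
    for tag, (name, convert) in PROPERTY_MAP.items():
        text = book.findtext(tag)
        if text is not None:  # skip elements the file does not contain
            props[name] = convert(text.strip())
    return props
```

Typed values matter here: a price stored as a decimal and a date stored as a date are what make refiners on price and date possible, while plain strings are enough for author and genre.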

Comments

  • Anonymous
    February 04, 2015
    Hi,
    Hope you are doing well.
    We want to index XML files using fileshare since we are having problem with XML connector.
    We want to map XML properties to Managed properties in CEWS. We want to know how we can achieve that in CEWS. We know that only managed properties can be given as Input & Output to CEWS.
    Your guidance will help. Thanks.

    Thanks,
    Sumanth
  • Anonymous
    July 21, 2015
    I have the same question. we are crawling some XML with file share content, we don't want to create BCS, but using CEWS instead. A question is that we usually loop through the items.properties to get item values and then map to some managed properties, but how to get the whole XML with class Microsoft.Office.Server.Search.ContentProcessingEnrichment.item? we want to get the XML and parse it and map to managed properties. Could you post the answer? Thanks!

    Jennifer