MOSS Search: How to control content to be crawled
This morning I was answering a question on a DL. The question was: "How do I control what content Search crawls? There are some paths (URLs) that I do not want crawled.
I was wondering if there is a way to configure MOSS Search to exclude these paths and library names from the search results."
Yes! You can easily control which content gets crawled using the following technique.
Create a new page, Index.html, and use it to control what gets crawled. Details:
1. Create a new path containing only one page (Index.html).
2. Add links to all the path URLs you want crawled to the Index.html page. [Do not include the paths you do not want crawled.]
3. Use the Index.html page URL to define a content source, specifying the newly created path where the index file lives (say https://MyServer/Search/Index.html):
- Content Source Type: select the Web Sites radio button
- Start Addresses: type the Index.html page URL
4. In the crawl settings, use custom settings to control server hops and page depth:
- Choose the radio button option "Custom - specify page depth and server hops"
- Use the Limit Page Depth and Limit Server Hops options to control how far the crawler goes from the Index.html page
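If you have many paths to list, you could generate Index.html instead of writing it by hand. The sketch below is a minimal illustration, not part of MOSS: the URLs and the `EXCLUDED_PREFIXES` list are hypothetical placeholders, and `build_index_html` is a made-up helper. It simply writes one hyperlink per path you want crawled and leaves out anything under an excluded prefix, so the crawler has no link to follow there.

```python
from html import escape

# Hypothetical URLs: replace with the paths you actually want crawled.
CANDIDATE_URLS = [
    "https://MyServer/sites/HR/Documents/",
    "https://MyServer/sites/Finance/Reports/",
    "https://MyServer/sites/Legal/Contracts/",
]

# Hypothetical prefixes you do NOT want crawled; they get no link on the page.
EXCLUDED_PREFIXES = [
    "https://MyServer/sites/Legal/",
]


def build_index_html(urls, excluded_prefixes):
    """Return an HTML page with one hyperlink per URL that is not excluded."""
    included = [u for u in urls
                if not any(u.startswith(p) for p in excluded_prefixes)]
    links = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in included
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>Crawl index</title></head>\n"
        "<body>\n  <ul>\n" + links + "\n  </ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Print the page; save the output as Index.html in the new path.
    print(build_index_html(CANDIDATE_URLS, EXCLUDED_PREFIXES))
```

Keeping the include list in one place makes it easy to review what the crawler can reach; the exclusion filter is just a convenience for building that list.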
This way you have full control over the content to be crawled.
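To see why the page depth and server hop limits matter, here is a conceptual model of a depth- and hop-limited breadth-first crawl. It is not the MOSS crawler: the link graph is hypothetical, and counting the start page as depth 0 and a "hop" as any link to a different host are assumptions made for illustration. It shows that a small depth keeps the crawl close to the links on Index.html, while a larger depth can wander into a path you left off the page (here, Legal via a link from Finance).

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(start_url, links, max_depth, max_hops):
    """Breadth-first crawl over an in-memory link graph (conceptual model).

    links: dict mapping a URL to the list of URLs it links to.
    Depth counts links followed from start_url (start page = depth 0, an assumption).
    Hops count how many times a path moved to a different host.
    """
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0, 0)])
    while queue:
        url, depth, hops = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links any deeper
        host = urlsplit(url).netloc.lower()
        for target in links.get(url, []):
            hop = 1 if urlsplit(target).netloc.lower() != host else 0
            if hops + hop > max_hops or target in seen:
                continue
            seen.add(target)
            queue.append((target, depth + 1, hops + hop))
    return order


# Hypothetical link graph starting from the Index.html page.
INDEX = "https://MyServer/Search/Index.html"
SITE = {
    INDEX: ["https://MyServer/sites/HR/Documents/",
            "https://MyServer/sites/Finance/Reports/"],
    "https://MyServer/sites/HR/Documents/": [
        "https://MyServer/sites/HR/Documents/Policy.aspx",
        "https://OtherServer/intranet/"],
    "https://MyServer/sites/Finance/Reports/": [
        "https://MyServer/sites/Legal/Contracts/"],
}

if __name__ == "__main__":
    print(crawl(INDEX, SITE, max_depth=1, max_hops=0))
    print(crawl(INDEX, SITE, max_depth=2, max_hops=0))
```

With `max_depth=1` only the pages linked from Index.html are visited; with `max_depth=2` the Legal path is reached through Finance, and raising `max_hops` to 1 would also pull in OtherServer.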