Robots & Sitemaps

February 20, 2013 11:41

The robots.txt file tells search engine crawlers which folders and files of your site they may crawl and which they should not. It is normally located in the root of the website.

A typical robots file would contain the following to allow crawlers to view the entire site:
User-agent: *

Then there are Allow and Disallow directives to differentiate what to crawl and what not to. This raises one issue: as the file is publicly viewable, anyone can see which sections and folders of the site you don't want robots to see. The best practice is not to use robots.txt to hide information.
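To see how Allow and Disallow rules are interpreted, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the /private/ folder and the "ExampleBot" user agent are all made up for illustration. Real crawlers may resolve conflicting rules a little differently from this parser, so treat it as a rough check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/help.html
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it from a site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler name; it falls under the "*" group.
for path in ("/index.html", "/private/report.html", "/private/help.html"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(path, "->", verdict)
```

Note that the blocked folder name is still sitting in plain sight in the file, which is exactly why robots.txt should not be relied on to hide anything.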

A sitemap file (sitemap.xml) typically contains the list of URLs you want search engines to crawl and index. It's a quick way for them to locate every page of your site.
The following is the format for sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>

There are additional tags you can associate with each link: lastmod, changefreq and priority. There are many tools available for generating sitemaps. I normally use an online tool to generate mine.
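If you would rather generate the file yourself, the structure is simple enough to build with a few lines of Python. The sketch below is only one possible approach, not the method used by any particular sitemap tool. The example.com URLs, the dates and the build_sitemap helper are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Optional per-URL tags, written in this order after <loc>.
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # the only required tag
        for tag in OPTIONAL_TAGS:
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace with the real URLs of your site.
pages = [
    {"loc": "https://www.example.com/", "lastmod": "2013-02-20",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://www.example.com/about.html", "changefreq": "yearly"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only loc is required for each url entry; the other three tags are hints that search engines are free to ignore.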

Besides setting up these two files, another important step is to configure your Google Webmaster Tools account and submit the sitemap to the search engine.

