A short guide to what a standard WordPress robots.txt file should contain, and what each line of it means. There are many questions about what robots.txt does, but not to worry, it's not a big issue. Essentially, the file tells search-engine bots (spiders) which parts of your website they may and may not crawl. Depending on your objectives, you can block the spiders from crawling specific folders or files.

As you can see on the second line of the file below, the /cgi-bin folder is blocked from crawling, since it contains nothing useful that you would want indexed.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
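You can check how the rules above behave with Python's standard-library urllib.robotparser. One caveat: this parser does simple prefix matching and does not implement Google's wildcard (*) extensions, so the lines containing * are treated literally and results for those patterns may differ from what Googlebot actually does. A minimal sketch, using the file above verbatim and a made-up bot name:

```python
import urllib.robotparser

# The robots.txt rules from above, pasted verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# /wp-admin is disallowed for all bots, so this is blocked.
print(rp.can_fetch("TestBot", "http://www.example.com/wp-admin/options.php"))
# Uploads are not covered by any Disallow prefix, so images stay crawlable.
print(rp.can_fetch("TestBot", "http://www.example.com/wp-content/uploads/photo.jpg"))
# duggmirror is banned from the whole site.
print(rp.can_fetch("duggmirror", "http://www.example.com/"))
```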

Within the same /wp-content/ folder, you can allow certain folders to be crawled while restricting others.

Pictures are usually all saved in the uploads folder, so it will serve you well to have those images indexed and pick up extra search traffic.

Allow: /wp-content/uploads

The line below prevents busybodies from finding out which plugins you are using. It is also a good idea to place a blank index.htm file in the folder so that individuals cannot browse it directly.

Disallow: /wp-content/plugins
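As a quick sketch, the blank placeholder file can be created from the WordPress root like this (the wp-content/plugins path assumes a standard install; adjust if yours differs):

```shell
# Run from the WordPress root of a standard install.
mkdir -p wp-content/plugins
# Blank placeholder: a direct request to the folder now returns an
# empty page instead of a file listing (on servers with listings enabled).
touch wp-content/plugins/index.htm
```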

Note that the domain name need not be included in the robots.txt file. The directory paths and file names alone will do, so you can reuse a similar robots.txt across all your blogs (editing and customizing it to suit each blog's requirements).

For those using Google Webmaster Tools: if pages from your site are reported as blocked by your robots.txt file, you can use the robots.txt testing tool to test the blocked URLs and see exactly which rule is causing the crawling error. You can then fine-tune the file and make the necessary corrections.
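If you want a rough offline check before reaching for the testing tool, a small helper like the one below can report which Disallow line matches a given path. This is a hypothetical illustration, not part of any WordPress or Google tool, and it only does plain prefix matching (no wildcard support), so treat its answer as a first guess:

```python
def first_blocking_rule(rules, path):
    """Return the first Disallow prefix matching `path`, or None if allowed."""
    for line in rules.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            prefix = line.split(":", 1)[1].strip()
            if prefix and path.startswith(prefix):
                return prefix
    return None

# A trimmed-down rule set for illustration.
rules = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins
Allow: /wp-content/uploads
"""

print(first_blocking_rule(rules, "/wp-admin/options.php"))      # matches /wp-admin
print(first_blocking_rule(rules, "/wp-content/uploads/a.jpg"))  # no match
```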

Lincoln is a fan of Apple products, loves red wine and traveling. He blogs on internet marketing, social media, WordPress tips and guides, and using technology to maximize efficiency.