The Robots.txt File Explained

The robots.txt file is an important plain text file that sits in the document root of your website. You can view it in any browser by adding /robots.txt to your domain name, for example https://www.example.com/robots.txt. To view it on the server, log into cPanel, open File Manager and select the document root for the domain you want to check. Then right-click the robots.txt file and select View.
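Because robots.txt always lives at the root of a host, its address can be worked out from any page address. The short Python sketch below keeps only the scheme and host of a page URL and appends /robots.txt. The function names robots_txt_url and fetch_robots_txt are hypothetical, example.com is a placeholder domain, and fetch_robots_txt needs network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address that governs page_url.

    robots.txt is only looked up at the root of a host, so the page's
    path, query and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> str:
    """Download and decode the robots.txt for page_url's host (needs network access)."""
    with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, robots_txt_url("https://www.example.com/blog/post?id=7") returns https://www.example.com/robots.txt.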

The robots.txt file is part of the Robots Exclusion Protocol. It tells robots (crawlers) which parts of your website they may crawl. Robots such as Googlebot discover pages by harvesting all the links on each page they visit and following them. This includes the internal links between pages on your own website and the external links pointing to other websites on the internet. The pages found this way are then indexed by Google so that other users can find them when they search.
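To see how the protocol regulates different robots, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the /tmp/ and /search/ paths and the ExampleBot crawler name are made up for illustration. Googlebot gets its own group of rules, and every other robot falls back to the User-agent: * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group for Googlebot and one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow: /tmp/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/search/results.html"
# Googlebot matches its own group, which does not block /search/.
print("Googlebot:", parser.can_fetch("Googlebot", url))
# ExampleBot has no group of its own, so the User-agent: * rules apply.
print("ExampleBot:", parser.can_fetch("ExampleBot", url))
```

A robot obeys only the group that matches its name, which is why Googlebot is not blocked from /search/ here while ExampleBot is.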

Every website contains files and folders that you do not want search engines to crawl and index. Examples include your installation files, libraries and logs. These files are of no use to anyone searching, and having them indexed could expose information about your site. Such paths are added to a Disallow rule, which looks like this in your robots.txt file:

Disallow: /installation/

Compliant robots will then skip the installation files when they crawl your site. Keep in mind that Disallow is a request to well-behaved robots, not access control. The file is public, and anyone can read the paths listed in it, so sensitive files still need to be protected on the server.
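The installation example can be checked with the same module. This sketch assumes a site whose robots.txt blocks three hypothetical folders, /installation/, /libraries/ and /logs/. It then asks whether a compliant crawler may fetch a few sample URLs on the placeholder domain www.example.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of installation files,
# libraries and logs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /installation/
Disallow: /libraries/
Disallow: /logs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow rules match by path prefix, so everything inside a blocked folder is skipped.
for path in ("/index.php", "/installation/index.php",
             "/libraries/vendor/autoload.php", "/logs/error.php"):
    url = "https://www.example.com" + path
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "skip"
    print(f"{path:32} {verdict}")
```

Only /index.php is reported as crawl. The three files inside the disallowed folders are reported as skip.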

Files that you do want indexed, including images (which can contribute to your SEO), can be named in an Allow rule, which looks like this in your robots.txt file:

Allow: /*.jpg*

An Allow rule is mainly useful for re-opening files inside a folder that another rule disallows. The robots.txt file must be placed in the root of the site, not in a subfolder within the main domain, or crawlers will not find it. You can also add the full URL of your XML sitemap to robots.txt with a Sitemap line, such as Sitemap: https://www.example.com/sitemap.xml. Crawlers then find the sitemap and crawl the pages it lists when they visit your website.
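The sketch below shows an Allow rule and a Sitemap line being read by urllib.robotparser. The /media/ paths and the sitemap URL are hypothetical, and site_maps() needs Python 3.8 or later. One caution: this parser compares paths by plain prefix and does not understand the * wildcard used in Allow: /*.jpg*, so the sketch re-opens an image folder instead. It also applies the first rule that matches, while Google uses the most specific one, so the narrower Allow line is written first to give the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /media/ but keep the image folder inside it
# crawlable, and point crawlers at the XML sitemap.
ROBOTS_TXT = """\
User-agent: *
Allow: /media/images/
Disallow: /media/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://www.example.com/media/images/logo.jpg"))  # True
print(parser.can_fetch("Googlebot", "https://www.example.com/media/cache/page.html"))  # False
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```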

So how does robots.txt work? Crawlers move across the web by following links from page to page and from website to website, billions of links worldwide. When a well-behaved crawler arrives at your website, or any other website, it requests the robots.txt file first and reads it before crawling the rest of the site. The file tells the crawler which parts of your site it may crawl. The crawler then follows the links to the next website, where it reads that site's robots.txt in turn.
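The order of events described above, robots.txt first and then the pages, can be sketched as a tiny breadth-first crawler over a made-up, in-memory web. The ROBOTS and LINKS tables, the hosts and the ExampleBot name are all hypothetical; a real crawler would download each file over HTTP. The sketch reads each host's robots.txt once, the first time it reaches that host, and checks every URL against it before "visiting" the page.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory web: the robots.txt served by each host ...
ROBOTS = {
    "https://www.example.com": "User-agent: *\nDisallow: /logs/\n",
    "https://blog.example.org": "User-agent: *\nDisallow:\n",  # empty Disallow = allow all
}
# ... and the links found on each page.
LINKS = {
    "https://www.example.com/": [
        "https://www.example.com/logs/today.log",
        "https://blog.example.org/post",
    ],
    "https://blog.example.org/post": ["https://www.example.com/"],
}


def crawl(start: str, agent: str = "ExampleBot") -> list[str]:
    """Breadth-first crawl that reads a host's robots.txt before any of its pages."""
    parsers: dict[str, RobotFileParser] = {}
    queue = deque([start])
    seen = {start}
    visited = []
    while queue:
        url = queue.popleft()
        parts = urlsplit(url)
        host = f"{parts.scheme}://{parts.netloc}"
        if host not in parsers:
            # First time on this host: read its robots.txt (a missing file allows everything).
            parser = RobotFileParser()
            parser.parse(ROBOTS.get(host, "").splitlines())
            parsers[host] = parser
        if not parsers[host].can_fetch(agent, url):
            continue  # disallowed: never request the page
        visited.append(url)
        # Follow internal and external links alike.
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("https://www.example.com/"))
```

Starting from the home page, the crawler visits the home page and the external blog post but never requests the disallowed log file.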

A content management system such as Joomla or WordPress normally provides a default robots.txt when you install it, using a standard configuration. Depending on your needs, you can edit the file to allow or disallow whatever you do or do not want crawled. If you have a subdomain, each subdomain needs its own robots.txt, because crawlers only read the file at the root of the host they are visiting. The file is also useful when your website has duplicated content: you can allow one copy to be crawled and indexed and disallow the duplicate.
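The points about subdomains and duplicate content can be combined in one small sketch. Here each host has its own robots.txt, and a URL is only ever checked against the file of its own host. The hosts, the /print/ folder holding printer-friendly duplicates, the /cart/ folder and the allowed helper are all hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical files: the main site hides its printer-friendly duplicates,
# while the subdomain has its own, separate robots.txt.
ROBOTS_BY_HOST = {
    "www.example.com": "User-agent: *\nDisallow: /print/\n",
    "shop.example.com": "User-agent: *\nDisallow: /cart/\n",
}


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Check url against the robots.txt of its own host only."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(agent, url)
```

allowed("https://www.example.com/print/articles/robots-txt") returns False, so only the original article is crawled. allowed("https://shop.example.com/print/catalogue") returns True, because the main site's Disallow: /print/ rule does not reach the subdomain.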
