What is Robots.txt File in 2020

Robots.txt is a simple text file located in the root directory of a website. It is mainly used to tell web crawlers (also called spiders) which pages and files on your website they may crawl. If you want to block parts of your website from spiders, you add rules to the robots.txt file, following its syntax, to disallow those paths. You can also set different rules for different spiders.
Googlebot is Google's web crawler. It comes in two variants: a desktop crawler that simulates a user on a desktop computer, and a mobile crawler that simulates a user on a mobile device. Google uses these crawlers to gather information about websites and store it in its index, which is how Google ranks different websites on the search results page.

Robots.txt is part of SEO, and using this file on your website follows a web standard. Google's crawlers, and those of other search engines, look for robots.txt in the root directory (main folder) of your website. The file must be named "robots.txt". You can view your own robots.txt file by visiting it directly on your website.

For example: https://www.yourwebsite.net/robots.txt

If the file exists on your website, it will be displayed in the browser; otherwise you will get a 404 error.

What does a robots.txt file look like?


Here is the basic format of a robots.txt file.


User-agent: [bot identifier]
[directive 1]
[directive 2]
[directive ...]

User-agent: [another bot identifier]
[directive 1]
[directive 2]
[directive ...]
  
Sitemap: [URL location of sitemap]

Now, let's look at a live example of a robots.txt file.
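For instance, a small site's robots.txt might combine these pieces as follows (the domain, paths, and sitemap URL here are illustrative, not from a real site):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.yourwebsite.net/sitemap.xml
```

This blocks every crawler from the admin area while still allowing one file inside it, and points crawlers to the sitemap.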


User-agents

Search engines identify themselves with different user-agents. You can set custom rules for each crawler in your robots.txt file. There are hundreds of bot identifiers (user-agents), but here I am showing some useful ones for SEO.


  • Google: Googlebot
  • Google Images: Googlebot-Image
  • Yahoo: Slurp
  • Bing: Bingbot
  • Baidu: Baiduspider
  • DuckDuckGo: DuckDuckBot

Note: User-agent values are not case-sensitive, but the paths you write in directives are. You can use the star (*), known as the "wildcard", to apply directives to all user-agents in robots.txt.

For Example:

If you want to block every bot except Googlebot from crawling your site, it looks like this:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /


Note: A crawler follows only the rules declared under the user-agent group that matches it. Googlebot matches its own group here and ignores the wildcard group, so the code above blocks every bot except Googlebot from crawling the site.
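You can check how such rules behave with Python's standard-library robots.txt parser, urllib.robotparser (the example.com URLs below are hypothetical):

```python
from urllib import robotparser

# The same rules as the example above: block everything for every bot,
# then allow everything for Googlebot.
rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches its own group, so the wildcard group is ignored.
print(parser.can_fetch("Googlebot", "https://example.com/page"))  # True
# Every other bot falls back to the wildcard group and is blocked.
print(parser.can_fetch("Bingbot", "https://example.com/page"))    # False
```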


Disallow

Disallow is used to tell search engine crawlers not to access the pages and files that fall under a specific path. For example, here is how to block all bots from accessing your FAQ section:

User-agent: *
Disallow: /FAQ


Allow

Allow is used to tell search engine crawlers that they may access a page or path, even one inside a directory that is otherwise disallowed. For example:


User-agent: *
Disallow: /FAQ
Allow: /
Allow: /FAQ/FAQ-QUESTION
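When Allow and Disallow rules conflict, Google resolves them by specificity: the matching rule with the longest path wins, and on a tie Allow wins. A minimal sketch of that precedence logic, assuming simple prefix rules without wildcards (the helper name and rule format are made up for illustration):

```python
def is_allowed(rules, path):
    """Return True if `path` is crawlable under `rules`.

    `rules` is a list of (directive, prefix) pairs, e.g. ("allow", "/FAQ").
    The most specific (longest) matching prefix wins; on a tie, Allow wins.
    """
    best_len, best_allow = -1, True  # no matching rule means allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, best_allow = len(prefix), allow
    return best_allow

# The rules from the example above.
rules = [("disallow", "/FAQ"),
         ("allow", "/"),
         ("allow", "/FAQ/FAQ-QUESTION")]

print(is_allowed(rules, "/FAQ"))               # False: /FAQ is the longest match
print(is_allowed(rules, "/FAQ/FAQ-QUESTION"))  # True: the Allow rule is longer
print(is_allowed(rules, "/about"))             # True: only / matches
```

So the FAQ section stays blocked while one specific FAQ page remains crawlable.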
