As you know, configuring robots.txt is important for any website that is working on its SEO. In particular, when you configure the sitemap to allow search engines to index your store, you also need to give web crawlers instructions in the robots.txt file so that they skip the pages you want to keep out of the index. The robots.txt file, which resides in the root of your Magento installation, contains directives that search engines such as Google, Yahoo, and Bing can recognize and follow easily. In this post, I will walk you through configuring the robots.txt file so that it works well with your site.

Steps to Configure robots.txt in Magento 2

  • On the Admin panel, click Stores. In the Settings section, select Configuration.
  • Select Design under General in the panel on the left.
  • Open the Search Engine Robots section, and continue with the following:
    • In Default Robots, select one of the following:
      • INDEX, FOLLOW
      • NOINDEX, FOLLOW
      • INDEX, NOFOLLOW
      • NOINDEX, NOFOLLOW
    • In the Edit Custom instruction of robots.txt File field, enter custom instructions if needed.
    • To restore the default instructions, click the Reset to Default button.
  • When complete, click Save Config. (A command-line alternative is sketched below.)
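
If you manage your store from the command line, the same setting can usually be applied with bin/magento config:set (available since Magento 2.2). This is a minimal sketch assuming the design/search_engine_robots/default_robots configuration path; verify it against your own core_config_data table before relying on it.

# Assumed config path: design/search_engine_robots/default_robots
bin/magento config:set design/search_engine_robots/default_robots "INDEX,FOLLOW"
# Flush the config cache so the new value takes effect
bin/magento cache:flush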


Examples of Custom Robots.txt Files

  • Allows Full Access

User-agent: *
Disallow:

  • Disallows Access to All Folders

User-agent: *
Disallow: /
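
Since this post opened with the sitemap, note that robots.txt can also point crawlers at your XML sitemap via the standard Sitemap directive. The URL below is a placeholder; substitute your store's actual sitemap location.

User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml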

Default Robots.txt for Magento 2

User-agent: *
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Disallow: /*?

# Disable checkout & customer account
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

# Disable search pages
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

# Disable common folders
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/

# Disable tag & review (avoid duplicate content)
Disallow: /tag/
Disallow: /review/

# Common files
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /CONTRIBUTING.md
Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
Disallow: /COPYING.txt
Disallow: /Gruntfile.js
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /nginx.conf.sample
Disallow: /package.json
Disallow: /php.ini.sample
Disallow: /RELEASE_NOTES.txt

# Disable sorting (avoid duplicate content)
Disallow: /*?*product_list_mode=
Disallow: /*?*product_list_order=
Disallow: /*?*product_list_limit=
Disallow: /*?*product_list_dir=

# Disable version control folders and others
Disallow: /*.git
Disallow: /*.CVS
Disallow: /*.zip$
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$
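
Once your file is saved, it is worth confirming that it is actually served from the root of your installation. A quick check from the command line (www.example.com is a placeholder for your store's domain):

curl https://www.example.com/robots.txt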

More Robots.txt examples

Block Googlebot from a folder

User-agent: Googlebot
Disallow: /subfolder/
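
If one page inside the blocked folder should remain crawlable, Google also honors an Allow directive that carves out an exception. The page path below is hypothetical:

User-agent: Googlebot
Disallow: /subfolder/
Allow: /subfolder/public-page.html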

Block Googlebot from a page

User-agent: Googlebot
Disallow: /subfolder/page-url.html
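
Each group of rules applies only to the user-agent(s) named directly above it, so a bot-specific restriction can sit alongside a permissive default for everyone else:

User-agent: Googlebot
Disallow: /subfolder/page-url.html

User-agent: *
Disallow: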

Common Web crawlers (Bots)

Here are some common bots on the internet, listed by the User-agent token you would use in robots.txt.

User-agent: Googlebot
User-agent: Googlebot-Image/1.0
User-agent: Googlebot-Video/1.0
User-agent: Bingbot
User-agent: Slurp # Yahoo
User-agent: DuckDuckBot
User-agent: Baiduspider
User-agent: YandexBot
User-agent: facebot # Facebook
User-agent: ia_archiver # Alexa
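
Several User-agent lines can share one rule set, which is handy when the same restriction should apply to more than one crawler. For example, to keep both Bing and Yahoo out of Magento's catalog search pages (the path comes from the default file above):

User-agent: Bingbot
User-agent: Slurp
Disallow: /catalogsearch/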