Robots.txt Generator & Editor
Create a professional robots.txt file to control how search engines crawl your website. Block unwanted bots, protect sensitive directories, and optimize crawl budget.
Use /directory/ to block entire directories, /page.html to block specific pages, or /*?* to block all URLs with query parameters.
Allow rules carve out exceptions to Disallow rules: when both match a URL, the more specific (longer) path wins. Useful for allowing subdirectories within blocked directories.
Add your sitemap location to help search engines discover all your pages.
Sets the delay, in seconds, between requests from the same user agent. Use 5-10 for large sites, 1-2 for small sites. Googlebot ignores this directive.
Understanding Robots.txt (Complete Guide)
Robots.txt is a text file placed in your website's root directory that tells search engine crawlers which pages or sections of your site they may or may not crawl. It's the first file search engines check when visiting your site, before any other content.
Think of robots.txt as a "gatekeeper" for search engines. When Googlebot arrives at your site, it immediately looks for `https://example.com/robots.txt`. This file instructs the bot where it can and cannot go. Well-configured robots.txt files save crawl budget, protect sensitive content, and improve SEO performance.
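To make the gatekeeper idea concrete, here is a minimal sketch of how a well-behaved crawler might locate and consult robots.txt before fetching a page. It uses Python's standard `urllib.robotparser`; the rules and the `/blog/post` URL are hypothetical, and the rules are parsed from a string so the example needs no network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the scheme + host (+ port).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical rules, parsed from a string instead of fetched over HTTP.
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(robots_url("https://example.com/blog/post?id=7"))
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))
```

A real crawler would call `parser.set_url(robots_url(...))` followed by `parser.read()` to download the live file instead of parsing a string.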
- Crawl Budget Optimization: Search engines allocate limited crawl time to your site. Blocking low-value pages (search results, archives, admin) ensures important pages get crawled more frequently.
- Content Protection: Keep crawlers out of staging environments, admin panels, login pages, and duplicate content. Note: robots.txt blocks crawling, not indexing; a blocked URL can still be indexed if it is linked externally.
- Faster Indexing: Directing crawlers to your most important content helps new pages get discovered and indexed sooner.
- Block Spam Bots: Restrict resource-draining bots that waste server resources without providing SEO value.
- Sitemap Discovery: Specify your sitemap location so crawlers find all your important pages even if they are not linked internally.
- Server Load Reduction: Blocking unnecessary crawling reduces server load, improving site performance for real users.
- Prevent Duplicate Content: Block URL parameters, printer-friendly versions, and other duplicate content sources that dilute SEO value.
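Robots.txt is advisory, not an enforcement mechanism. Well-behaved crawlers respect it, but malicious bots ignore it. For true content protection, use password protection. To keep a page out of search results, use a noindex meta tag and leave the page crawlable, because a crawler cannot see noindex on a page it is blocked from fetching. Also, if other sites link to blocked pages, Google may still index their URLs without crawling the content.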
Robots.txt Syntax & Directives Guide
Specifies which search engine crawler the rules apply to. Use "*" for all crawlers, or specific names like "Googlebot", "Bingbot", "YandexBot".
User-agent: *
Blocks crawlers from accessing specific URLs or directories. Use "/directory/" to block entire folders, "/page.html" for specific pages.
Disallow: /admin/
Carves out exceptions to Disallow rules. When both an Allow and a Disallow rule match a URL, the more specific (longer) path wins. Useful for allowing subdirectories within blocked directories.
Allow: /admin/public/
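How an Allow rule wins over a Disallow rule depends on the parser. The sketch below assumes longest-match precedence, the approach described in RFC 9309 and in Google's documentation: the longest matching path decides, and Allow wins a tie. The function name `is_allowed` and the rule format are illustrative, and wildcards are deliberately left out.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access using longest-match precedence (no wildcard support).

    rules is a list of ("allow" | "disallow", path_prefix) pairs.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        # An empty value (e.g. "Disallow:") restricts nothing.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        # A longer prefix is more specific; on a tie, allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed("/admin/settings", rules))     # False
print(is_allowed("/admin/public/help", rules))  # True
print(is_allowed("/blog/", rules))              # True
```

Some parsers instead apply the first rule that matches in file order, so listing the Allow line before the Disallow line is a safe habit.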
Specifies the location of your XML sitemap. Helps crawlers discover all your important pages.
Sitemap: https://example.com/sitemap.xml
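As a rough illustration of how a crawler picks up the declaration, Python's standard `urllib.robotparser` exposes declared sitemaps through `site_maps()` (available from Python 3.8). The rules below are a hypothetical file parsed from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Sitemap line is independent of any group.
rules = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns a list of sitemap URLs, or None if none are declared.
sitemaps = parser.site_maps()
print(sitemaps)
```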
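Sets the delay (in seconds) between successive requests from the same crawler. Reduces server load. Support varies: Googlebot ignores Crawl-delay, while some other crawlers, such as Bingbot, honor it.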
Crawl-delay: 5
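From the crawler's side, honoring this directive could look like the sketch below. It reads the value with `crawl_delay()` from Python's standard `urllib.robotparser`; the helper `polite_delay`, the bot name `ExampleBot`, and the one-second fallback are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content parsed from a string.
rules = """\
User-agent: *
Crawl-delay: 5
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def polite_delay(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests, falling back to default if unset."""
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else default


print(polite_delay(parser, "ExampleBot"))
```

A crawler would then call `time.sleep(polite_delay(parser, "ExampleBot"))` between requests to the same site.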
Matches any sequence of characters. Use "/*?*" to block all URLs with parameters, "/2023/*" to block specific year archives.
Disallow: /*?*
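One way to reason about wildcard patterns is to translate them into regular expressions. The sketch below is an illustration, not any search engine's actual matcher. It also handles a trailing `$` (end-of-URL anchor), which this guide does not cover but which Google and RFC 9309 support. The names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern ('*' and trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # Rules match from the start of the path, so use match(), not search().
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*?*", "/shop/shoes?color=red"))           # True
print(matches("/*?*", "/shop/shoes"))                     # False
print(matches("/2023/*", "/2023/05/post"))                # True
print(matches("/*.pdf$", "/files/report.pdf"))            # True
print(matches("/*.pdf$", "/files/report.pdf?download=1")) # False
```

Because rules already match as prefixes, the trailing `*` in `/*?*` is redundant: `/*?` matches the same URLs.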
User-agent: *
Disallow: /search/
Disallow: /*?s=
Disallow: /*?q=
This prevents search engines from crawling internal search result pages, which are typically low-value and generate infinite URLs.
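If you prefer to produce such a file from a script, the following sketch renders the same rules from a small data structure. The function `build_robots_txt` and its argument layout are assumptions made for this example, not the generator's actual implementation.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from a list of (user_agent, rules) groups.

    rules is a list of (directive, value) pairs, e.g. ("Disallow", "/search/").
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, value in rules:
            lines.append(f"{directive}: {value}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Trim trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


text = build_robots_txt(
    [("*", [("Disallow", "/search/"), ("Disallow", "/*?s="), ("Disallow", "/*?q=")])],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(text)
```

Keeping the rules as data makes it easier to review the disallow list in code review before the file goes live.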
12 Costly Robots.txt Mistakes
Blocking CSS, JavaScript, or image files prevents Google from rendering your page correctly, harming mobile usability scores and rankings.
Robots.txt is public: anyone can view it. Never use it to hide sensitive information (passwords, personal data, payment pages). Use authentication instead.
Without a sitemap declaration, crawlers may miss important pages that are not linked internally. Always add a Sitemap directive.
Disallow: / on a live site blocks all crawling and can drop your site from search results entirely. Only use it during development or maintenance.
Missing colons or misspelled directive names cause rules to be silently ignored. Paths are case-sensitive, so /Admin/ and /admin/ are different rules. Use our generator to avoid syntax errors.
The file must be at `https://example.com/robots.txt` (root directory). Placing it elsewhere makes it invalid, because crawlers only look for it at the root.
Accidentally blocking product pages, blog posts, or category pages destroys SEO. Review your disallow list carefully.
Parameter URLs (?sort=asc, ?page=2) create infinite crawl space. Block them unless they contain unique content.
Different crawlers may need different rules. Googlebot renders JavaScript; many other crawlers don't. Configure per user-agent as needed.
Always test robots.txt changes before deploying them, for example with the robots.txt report in Google Search Console (the older robots.txt tester has been retired). One typo can block your entire site.
Crawl-delay: 0.1 may still overwhelm small servers. Start with 5-10 seconds for shared hosting, adjust based on server logs.
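Rules for "*" don't automatically apply to bots that have their own group. A crawler that finds a group naming it follows only that group and ignores the "*" group, so you must repeat any shared rules under each specific user-agent.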
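The last mistake is easy to reproduce. In this sketch, which uses Python's standard `urllib.robotparser` with hypothetical paths and bot names, Googlebot has its own group, so the `/private/` rule under `*` no longer applies to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a catch-all group plus a Googlebot-specific group.
rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot follows only its own group, so /private/ is open to it.
print(parser.can_fetch("Googlebot", "/private/report"))  # True
print(parser.can_fetch("Googlebot", "/drafts/post"))     # False
# A bot without its own group falls back to the "*" group.
print(parser.can_fetch("OtherBot", "/private/report"))   # False
```

To block `/private/` for Googlebot as well, add `Disallow: /private/` to the Googlebot group.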
Platform-Specific Robots.txt Best Practices
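Block /wp-admin/ and allow /wp-admin/admin-ajax.php for functionality. Block /?s= (search), /feed/, and /trackback/. Be careful with blocking /wp-includes/ and plugin directories: they serve CSS and JavaScript that Google needs to render your pages.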
Shopify automatically generates robots.txt. You can customize it only through the robots.txt.liquid theme template. Block /collections/*/products/, /pages/*/comments.
Block /checkout/, /catalogsearch/, /customer/, /wishlist/, and parameter URLs. Block version-specific directories.
Block /admin/, /includes/, /logs/, /temp/, /backup/, and /config/. Block script files (.php, .inc) unless necessary.
Generate Your Robots.txt File Now
Free robots.txt generator for SEO professionals and webmasters. Control crawlers, save crawl budget, improve SEO.