- **Parametric memory:** What AI learned during training. If crawlers couldn't access your site when models were built, you're not in the weights.
- **Retrieval-augmented memory:** What AI fetches live when answering questions. Block crawlers and you lose real-time context too.
AI has two memory layers — parametric knowledge from training data, and retrieval-augmented knowledge from live crawling. If your robots.txt blocks AI crawlers, you lose both. Your brand becomes invisible to the models that shape buyer perception.
This isn't theoretical. When we launched Optimly's Brand Directory, OpenAI's crawlers indexed 150+ pages on day one. That content now shapes how ChatGPT describes every brand in our directory. The front door was open — and it mattered.
**150+** pages crawled by OpenAI on our Brand Directory launch day
Copy this, customize the Disallow paths for your site, replace the domain, and upload to your root. That's it.
```
# =============================================================
# AI-Friendly robots.txt Template
# Generated by Optimly — https://optimly.ai
# Last updated: March 2026
# =============================================================

# Default rules for all crawlers
User-agent: *
Allow: /

# Block admin and internal paths
Disallow: /admin/
Disallow: /api/internal/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /login/

# Block URL parameters that create duplicate content
# (robots.txt rules are prefix matches, so the * wildcard is
# needed to catch these parameters on any path, not just /)
Disallow: /search?
Disallow: /*?utm_
Disallow: /*?ref=
Disallow: /*?session=

# =============================================================
# Explicitly allow AI crawlers
# Why: Some crawlers check bot-specific rules first.
# An explicit Allow signals intent — you WANT to be indexed.
# =============================================================

# OpenAI
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Anthropic
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Google AI
User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# You.com
User-agent: YouBot
Allow: /

# Cohere
User-agent: cohere-ai
Allow: /

# Apple
User-agent: Applebot-Extended
Allow: /

# Microsoft / Bing
User-agent: bingbot
Allow: /

# Meta
User-agent: FacebookBot
Allow: /

# =============================================================
# Sitemap — replace with your actual sitemap URL
# =============================================================
Sitemap: https://YOUR-DOMAIN.com/sitemap.xml

# =============================================================
# Companion files for AI discoverability
# llms.txt               → Token-efficient index of your best content
# llms-full.txt          → Extended version with full page context
# ai-agent-manifest.json → Machine-readable brand positioning
# sitemap.xml            → Full crawlable URL structure
# =============================================================
```
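Before uploading, you can sanity-check the rules locally. A minimal sketch using Python's standard-library `urllib.robotparser`; the domain, paths, and the `SomeOtherBot` name are placeholders:

```python
# Quick local check that AI-bot groups behave as intended.
# Note: urllib.robotparser resolves rules first-match, so within a
# group put Disallow lines before any blanket Allow when testing.
from urllib.robotparser import RobotFileParser

# Condensed excerpt of the template: wildcard group plus one AI-bot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group, so its explicit Allow wins everywhere.
print(parser.can_fetch("GPTBot", "https://example.com/admin/settings"))  # True
# Other bots fall back to the wildcard group and stay out of /admin/.
print(parser.can_fetch("SomeOtherBot", "https://example.com/admin/settings"))  # False
```

Swap in your real file contents (or point `RobotFileParser` at your live URL with `set_url()` and `read()`) and test the paths you care about.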
Four files. One system. Together they control how AI models discover, read, and represent your brand. See the full series →
Known AI crawler user-agents as of March 2026. Bookmark this — we keep it updated.
| User-Agent | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training & inference |
| ChatGPT-User | OpenAI | Live browsing (ChatGPT) |
| OAI-SearchBot | OpenAI | SearchGPT results |
| ClaudeBot | Anthropic | Training & retrieval |
| anthropic-ai | Anthropic | Research crawling |
| Google-Extended | Google | Gemini training |
| GoogleOther | Google | AI features & research |
| PerplexityBot | Perplexity | Answer engine retrieval |
| YouBot | You.com | AI search results |
| cohere-ai | Cohere | Enterprise AI training |
| Applebot-Extended | Apple | Apple Intelligence features |
| bingbot | Microsoft | Search + Copilot retrieval |
| FacebookBot | Meta | Meta AI features |
| CCBot | Common Crawl | Open training datasets |
- **Blocking your blog, docs, or other key content paths.** Removes your highest-value content from AI categorization. These pages contain the signals that shape how models describe your brand.
- **Using `Disallow: /` across the board.** Makes you invisible to every model. You lose both parametric memory and live retrieval — the two ways AI forms opinions about brands.
- **Leaving tracking parameters crawlable.** Creates thousands of duplicate pages. Crawlers waste budget on `?utm_`, `?ref=`, and `?session=` variants instead of your real content.
- **Omitting the Sitemap directive.** Crawlers miss important pages. Without an explicit Sitemap line, bots rely on link discovery alone — and skip orphaned pages entirely.
- **Contradicting your llms.txt.** Conflicting signals confuse crawlers. If robots.txt blocks a path that llms.txt links to, neither file is trusted.
- **Relying on `Crawl-delay`.** Throttles how much AI indexes per visit. Handle rate limiting at the CDN layer instead — it's more precise and doesn't penalize legitimate crawlers.
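The robots.txt/llms.txt conflict is easy to catch automatically. A minimal sketch, assuming llms.txt lists its URLs as markdown-style links; every URL and path below is illustrative:

```python
# Flag any llms.txt URL that robots.txt would block for crawlers.
# urllib.robotparser resolves rules first-match, so the Disallow line
# is placed before the blanket Allow for this local check.
import re
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Allow: /
"""

LLMS_TXT = """\
# Example Corp
- [Pricing](https://example.com/pricing)
- [Draft post](https://example.com/drafts/unfinished)
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pull markdown-link targets out of llms.txt, keep the blocked ones.
links = re.findall(r"\((https?://[^)]+)\)", LLMS_TXT)
conflicts = [url for url in links if not parser.can_fetch("*", url)]
print(conflicts)  # ['https://example.com/drafts/unfinished']
```

Run this against your real files whenever either one changes; an empty `conflicts` list means the two files agree.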
1. Download the template above
2. Customize the Disallow paths for your site structure
3. Replace `YOUR-DOMAIN.com` with your actual domain
4. Upload to your site root (must be at `/robots.txt`)
5. Verify in Google Search Console
6. Set up llms.txt and BrandVault companion files
7. Monitor crawl activity in your server logs
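The last step, monitoring crawl activity, can be sketched as a simple log tally. The crawler list mirrors the table earlier; the sample access-log lines are fabricated placeholders:

```python
# Count AI-crawler requests in a combined-format access log by
# matching known user-agent substrings (case-insensitive).
from collections import Counter

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "GoogleOther", "PerplexityBot", "YouBot", "cohere-ai",
    "Applebot-Extended", "bingbot", "FacebookBot", "CCBot",
]

SAMPLE_LOG = [
    '203.0.113.7 - - [12/Mar/2026:10:01:44 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '203.0.113.9 - - [12/Mar/2026:10:02:01 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '198.51.100.4 - - [12/Mar/2026:10:02:30 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/122.0"',
]

hits = Counter()
for line in SAMPLE_LOG:
    for bot in AI_CRAWLERS:
        if bot.lower() in line.lower():
            hits[bot] += 1
            break  # attribute each request line to one crawler

print(hits.most_common())  # [('GPTBot', 1), ('PerplexityBot', 1)]
```

In practice you would read lines from your real log file instead of `SAMPLE_LOG`; substring matching is a rough heuristic, so pair it with reverse-DNS or published IP-range checks if you need to rule out spoofed user-agents.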
Your robots.txt opens the door. But what are AI models actually saying once they walk in? Search our directory to find out.