    Updated March 2026

    Your robots.txt Is the Front Door for AI

    Most robots.txt advice tells you to block AI crawlers. We take the opposite approach. Here's a free, production-ready template — and the reasoning behind every line.

    Why Your robots.txt Matters for AI

    Parametric Memory

    What AI learned during training. If crawlers couldn't access your site when models were built, you're not in the weights.

    Retrieval Memory

    What AI fetches live when answering questions. Block crawlers and you lose real-time context too.

    AI has two memory layers — parametric knowledge from training data, and retrieval-augmented knowledge from live crawling. If your robots.txt blocks AI crawlers, you lose both. Your brand becomes invisible to the models that shape buyer perception.

    This isn't theoretical. When we launched Optimly's AI Brand Index, OpenAI's crawlers indexed 150+ pages on day one. That content now shapes how ChatGPT describes every brand in our directory. The front door was open — and it mattered.


    The Template

    Copy this, customize the Disallow paths for your site, replace the domain, and upload to your root. That's it.

      # =============================================================
      # AI-Friendly robots.txt Template
      # Generated by Optimly (https://optimly.ai)
      # Last updated: March 2026
      # =============================================================

      # Default rules for all crawlers
      # Comment lines (not blank lines) separate the rules inside a
      # group, because some parsers end a group at a blank line.
      User-agent: *
      # Block admin and internal paths
      Disallow: /admin/
      Disallow: /api/internal/
      Disallow: /cart/
      Disallow: /checkout/
      Disallow: /account/
      Disallow: /login/
      # Block URL parameters that create duplicate content
      Disallow: /search?
      Disallow: /*?utm_
      Disallow: /*?ref=
      Disallow: /*?session=
      # Allow comes last so that parsers applying the first matching
      # rule still honor the Disallow lines above.
      Allow: /

      # =============================================================
      # Explicitly name AI crawlers
      # Why: A crawler uses the group that names it in preference to
      # the "*" group. Naming it signals intent: you WANT to be indexed.
      # Caveat: a crawler that matches a named group follows only that
      # group and ignores the "*" rules, so the blocked paths are
      # repeated here. Keep the two lists in sync.
      # =============================================================
      # OpenAI
      User-agent: GPTBot
      User-agent: ChatGPT-User
      User-agent: OAI-SearchBot
      # Anthropic
      User-agent: ClaudeBot
      User-agent: anthropic-ai
      # Google AI
      User-agent: Google-Extended
      User-agent: GoogleOther
      # Perplexity
      User-agent: PerplexityBot
      # You.com
      User-agent: YouBot
      # Cohere
      User-agent: cohere-ai
      # Apple
      User-agent: Applebot-Extended
      # Microsoft / Bing
      User-agent: bingbot
      # Meta
      User-agent: FacebookBot
      Disallow: /admin/
      Disallow: /api/internal/
      Disallow: /cart/
      Disallow: /checkout/
      Disallow: /account/
      Disallow: /login/
      Disallow: /search?
      Disallow: /*?utm_
      Disallow: /*?ref=
      Disallow: /*?session=
      Allow: /

      # =============================================================
      # Sitemap: replace with your actual sitemap URL
      # =============================================================
      Sitemap: https://YOUR-DOMAIN.com/sitemap.xml

      # =============================================================
      # Companion files for AI discoverability
      # llms.txt               → Token-efficient index of your best content
      # llms-full.txt          → Extended version with full page context
      # ai-agent-manifest.json → Machine-readable brand positioning
      # sitemap.xml            → Full crawlable URL structure
      # =============================================================
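
    Under RFC 9309, a crawler that matches a named group follows only that group and ignores the "*" group. This matters for any file that pairs shared Disallow rules with per-bot "Allow: /" groups. The sketch below checks that behavior with Python's standard urllib.robotparser. The two sample files and the can_fetch helper are illustrative assumptions, not part of the template. Python's parser applies the first matching rule in file order and does not implement wildcards, so treat it as an approximation of how production crawlers behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: shared rules plus a named group that only allows.
NAMED_GROUP_ONLY_ALLOWS = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

# Same file with the blocked path repeated inside the named group.
NAMED_GROUP_REPEATS_RULES = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /admin/
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The named group replaces the "*" group for GPTBot, so /admin/ is open to it.
print(can_fetch(NAMED_GROUP_ONLY_ALLOWS, "GPTBot", "/admin/"))    # True
# Repeating the Disallow inside the named group closes the gap.
print(can_fetch(NAMED_GROUP_REPEATS_RULES, "GPTBot", "/admin/"))  # False
```

    Running a check like this against your own file before uploading it is a cheap way to confirm that private paths stay blocked for every crawler you name.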

    Design Decisions

    The AI Discoverability Stack

    Four files. One system. Together they control how AI models discover, read, and represent your brand.

    robots.txt (Front Door, you are here): tells AI crawlers what they can access

    llms.txt (Concierge): token-efficient index of your best content

    llms-full.txt (Deep Content): RAG-ready structured content for retrieval

    ai-agent-manifest.json (Identity): machine-readable brand positioning

    Next in the stack: llms.txt

    AI Crawler Reference

    Known AI crawler user-agents as of March 2026. Bookmark this — we keep it updated.

    User-Agent          Company        Purpose
    GPTBot              OpenAI         Training & inference
    ChatGPT-User        OpenAI         Live browsing (ChatGPT)
    OAI-SearchBot       OpenAI         SearchGPT results
    ClaudeBot           Anthropic      Training & retrieval
    anthropic-ai        Anthropic      Research crawling
    Google-Extended     Google         Gemini training
    GoogleOther         Google         AI features & research
    PerplexityBot       Perplexity     Answer engine retrieval
    YouBot              You.com        AI search results
    cohere-ai           Cohere         Enterprise AI training
    Applebot-Extended   Apple          Apple Intelligence features
    bingbot             Microsoft      Search + Copilot retrieval
    FacebookBot         Meta           Meta AI features
    CCBot               Common Crawl   Open training datasets

    Common Mistakes

    Blocking /blog/ or /resources/

    Removes your highest-value content from AI categorization. These pages contain the signals that shape how models describe your brand.

    Using Disallow: / for all AI bots

    Makes you invisible to every model. You lose both parametric memory and live retrieval — the two ways AI forms opinions about brands.

    Forgetting URL parameters

    Creates thousands of duplicate pages. Crawlers waste budget on ?utm_, ?ref=, and ?session= variants instead of your real content.
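
    How a parameter rule is written decides which URLs it covers. A pattern such as /?utm_ and a wildcard pattern such as /*?utm_ match different sets of URLs. The sketch below is a simplified reading of RFC 9309 path matching (prefix match, "*" for any run of characters, trailing "$" to anchor the end). The rule_matches helper is a hypothetical name, and percent-encoding is ignored for brevity.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Simplified RFC 9309 matching: the pattern is a prefix match,
    "*" matches any run of characters, and a trailing "$" anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None


# Without a wildcard, the rule only covers the homepage.
print(rule_matches("/?utm_", "/?utm_source=x"))                 # True
print(rule_matches("/?utm_", "/pricing?utm_source=x"))          # False
# With a wildcard, it covers any path where utm_ is the first parameter.
print(rule_matches("/*?utm_", "/pricing?utm_source=x"))         # True
# A parameter in a later position needs its own "&" pattern.
print(rule_matches("/*?utm_", "/pricing?page=2&utm_source=x"))  # False
print(rule_matches("/*&utm_", "/pricing?page=2&utm_source=x"))  # True
```

    Testing a handful of real URLs from your analytics against each pattern shows quickly whether a rule blocks too little or too much.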

    No Sitemap directive

    Crawlers miss important pages. Without an explicit Sitemap line, bots rely on link discovery alone — and skip orphaned pages entirely.

    Inconsistent robots.txt and llms.txt

    Conflicting signals undercut both files. If robots.txt blocks a path that llms.txt links to, a compliant crawler cannot fetch the page llms.txt points it at, so the link is wasted.
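
    One way to catch this conflict is to test every link in llms.txt against robots.txt before publishing. The sketch below assumes llms.txt lists its pages as Markdown links of the form [title](url). The blocked_links helper and the sample file contents are hypothetical. It relies on Python's urllib.robotparser, which does not implement wildcards, so wildcard rules are not evaluated here.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches Markdown links and captures the URL: [title](url)
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Hypothetical sample files for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /account/
"""

SAMPLE_LLMS = """\
# Example Co

- [Pricing](https://example.com/pricing)
- [Billing help](https://example.com/account/billing)
"""


def blocked_links(robots_txt: str, llms_txt: str, user_agent: str) -> list[str]:
    """Return the llms.txt link targets that robots_txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url in MARKDOWN_LINK.findall(llms_txt)
        if not parser.can_fetch(user_agent, url)
    ]


print(blocked_links(SAMPLE_ROBOTS, SAMPLE_LLMS, "GPTBot"))
# ['https://example.com/account/billing']
```

    An empty list means every page that llms.txt advertises is reachable for that crawler.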

    Aggressive Crawl-delay

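    Throttles how much a crawler fetches per visit, and only for crawlers that honor it: Crawl-delay is not part of RFC 9309, and Googlebot ignores it. Handle rate limiting at the CDN layer instead; it's more precise and doesn't penalize legitimate crawlers.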

    Implementation Checklist

    Step 1: Download the template above
    Step 2: Customize Disallow paths for your site structure
    Step 3: Replace YOUR-DOMAIN.com with your actual domain
    Step 4: Upload to your site root (must be at /robots.txt)
    Step 5: Verify in Google Search Console
    Step 6: Set up llms.txt and Brand Vault companion files
    Step 7: Monitor crawl activity in your server logs
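
    For the final step, a short script can tally which AI crawlers are visiting. The sketch below assumes access logs in the common "combined" format, where the user agent is the last double-quoted field. The sample lines, IP addresses, and shortened user-agent strings are invented for illustration. Google-Extended and Applebot-Extended are, to our knowledge, robots.txt control tokens rather than strings that appear in requests, so expect zero hits for them.

```python
from collections import Counter
from typing import Iterable

# User-agent tokens taken from the reference table above.
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "GoogleOther", "PerplexityBot", "YouBot", "cohere-ai",
    "Applebot-Extended", "bingbot", "FacebookBot", "CCBot",
]

# Invented sample lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Mar/2026:10:00:02 +0000] "GET /pricing HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Mar/2026:10:00:03 +0000] "GET /gptbot-guide HTTP/1.1" 200 100 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Mar/2026:10:00:05 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]


def user_agent_of(log_line: str) -> str:
    """Return the last double-quoted field of a log line, or "" if absent."""
    parts = log_line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_ai_crawler_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per AI crawler token, matching case-insensitively."""
    hits: Counter = Counter()
    for line in log_lines:
        agent = user_agent_of(line).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                break  # attribute each request to one crawler only
    return hits


print(dict(count_ai_crawler_hits(SAMPLE_LOG)))  # {'GPTBot': 2, 'ClaudeBot': 1}
```

    Matching on the user-agent field rather than the whole line avoids counting ordinary visitors who request a URL that happens to contain a crawler name. User-agent strings can be spoofed, so treat the counts as a signal and not as proof of who crawled.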

    See What AI Believes About Your Brand

    Your robots.txt opens the door. But what are AI models actually saying once they walk in? Search our directory to find out.


    Don't want to maintain these files yourself?

    Claim your profile in the AI Brand Index, and we generate, maintain, and serve your entire Discoverability Stack automatically. Your llms.txt stays current. Your manifest reflects your latest positioning. Your brand-feed updates when you update your Brand Vault.

    All served via our agentic brand API so AI models always have your latest truth.
