    Updated March 2026

    Your robots.txt Is the Front Door for AI

    Most robots.txt advice tells you to block AI crawlers. We take the opposite approach. Here's a free, production-ready template — and the reasoning behind every line.

    Why Your robots.txt Matters for AI

    Parametric Memory

    What AI learned during training. If crawlers couldn't access your site when models were built, you're not in the weights.

    Retrieval Memory

    What AI fetches live when answering questions. Block crawlers and you lose real-time context too.

    AI has two memory layers — parametric knowledge from training data, and retrieval-augmented knowledge from live crawling. If your robots.txt blocks AI crawlers, you lose both. Your brand becomes invisible to the models that shape buyer perception.

    This isn't theoretical. When we launched Optimly's Brand Directory, OpenAI's crawlers indexed 150+ pages on day one. That content now shapes how ChatGPT describes every brand in our directory. The front door was open — and it mattered.

    150+

    pages crawled by OpenAI on our Brand Directory launch day

    The Template

    Copy this, customize the Disallow paths for your site, replace the domain, and upload to your root. That's it.

    # =============================================================
    # AI-Friendly robots.txt Template
    # Generated by Optimly — https://optimly.ai
    # Last updated: March 2026
    # =============================================================
    
    # Default rules for all crawlers
    User-agent: *
    Allow: /
    
    # Block admin and internal paths
    Disallow: /admin/
    Disallow: /api/internal/
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /account/
    Disallow: /login/
    
    # Block URL parameters that create duplicate content
    Disallow: /search?
    Disallow: /*?utm_
    Disallow: /*?ref=
    Disallow: /*?session=
    
    # =============================================================
    # Explicitly allow AI crawlers
    # Why: Crawlers follow the most specific User-agent group
    # that matches them. An explicit Allow makes your intent
    # unambiguous: you WANT to be indexed.
    # =============================================================
    
    # OpenAI
    User-agent: GPTBot
    Allow: /
    
    User-agent: ChatGPT-User
    Allow: /
    
    User-agent: OAI-SearchBot
    Allow: /
    
    # Anthropic
    User-agent: ClaudeBot
    Allow: /
    
    User-agent: anthropic-ai
    Allow: /
    
    # Google AI
    User-agent: Google-Extended
    Allow: /
    
    User-agent: GoogleOther
    Allow: /
    
    # Perplexity
    User-agent: PerplexityBot
    Allow: /
    
    # You.com
    User-agent: YouBot
    Allow: /
    
    # Cohere
    User-agent: cohere-ai
    Allow: /
    
    # Apple
    User-agent: Applebot-Extended
    Allow: /
    
    # Microsoft / Bing
    User-agent: bingbot
    Allow: /
    
    # Meta
    User-agent: FacebookBot
    Allow: /
    
    # =============================================================
    # Sitemap — replace with your actual sitemap URL
    # =============================================================
    Sitemap: https://YOUR-DOMAIN.com/sitemap.xml
    
    # =============================================================
    # Companion files for AI discoverability
    # llms.txt          → Token-efficient index of your best content
    # llms-full.txt     → Extended version with full page context
    # ai-agent-manifest.json → Machine-readable brand positioning
    # sitemap.xml       → Full crawlable URL structure
    # =============================================================
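    Before deploying, you can sanity-check rules like these locally with Python's standard-library urllib.robotparser. The snippet below uses a hypothetical, reduced version of the template, not the full file; it also illustrates a subtlety worth knowing: under the Robots Exclusion Protocol (RFC 9309), a crawler follows only the most specific User-agent group that matches it, so a bot-specific Allow: / replaces the default group's Disallow lines entirely for that bot.

    ```python
    from urllib import robotparser

    # Hypothetical, reduced version of the template above.
    robots_txt = """\
    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/

    User-agent: GPTBot
    Allow: /
    """

    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())

    # GPTBot matches its own group, which allows everything;
    # the default group's Disallow lines no longer apply to it.
    print(rp.can_fetch("GPTBot", "https://example.com/admin/"))     # True
    print(rp.can_fetch("SomeBot", "https://example.com/admin/"))    # False
    print(rp.can_fetch("SomeBot", "https://example.com/blog/post")) # True
    ```

    One caveat: Python's parser applies the first matching rule within a group, while Google documents longest-match precedence, so keep the rules inside each group unambiguous rather than relying on ordering.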


    AI Crawler Reference

    Known AI crawler user-agents as of March 2026. Bookmark this — we keep it updated.

    User-Agent           Company        Purpose
    GPTBot               OpenAI         Training & inference
    ChatGPT-User         OpenAI         Live browsing (ChatGPT)
    OAI-SearchBot        OpenAI         SearchGPT results
    ClaudeBot            Anthropic      Training & retrieval
    anthropic-ai         Anthropic      Research crawling
    Google-Extended      Google         Gemini training
    GoogleOther          Google         AI features & research
    PerplexityBot        Perplexity     Answer engine retrieval
    YouBot               You.com        AI search results
    cohere-ai            Cohere         Enterprise AI training
    Applebot-Extended    Apple          Apple Intelligence features
    bingbot              Microsoft      Search + Copilot retrieval
    FacebookBot          Meta           Meta AI features
    CCBot                Common Crawl   Open training datasets

    Common Mistakes

    Blocking /blog/ or /resources/

    Removes your highest-value content from AI categorization. These pages contain the signals that shape how models describe your brand.

    Using Disallow: / for all AI bots

    Makes you invisible to every model. You lose both parametric memory and live retrieval — the two ways AI forms opinions about brands.

    Forgetting URL parameters

    Creates thousands of duplicate pages. Crawlers waste budget on ?utm_, ?ref=, and ?session= variants instead of your real content.
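    The duplication is easy to see in a short sketch: stripping tracking parameters (the prefix list here is an assumption; adjust it to your analytics setup) collapses the parameter variants back to one canonical URL, which is exactly the waste the Disallow patterns prevent.

    ```python
    from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

    # Assumed tracking-parameter prefixes; adjust for your analytics setup.
    TRACKING_PREFIXES = ("utm_", "ref", "session")

    def canonical(url: str) -> str:
        """Strip tracking parameters so duplicate variants collapse to one URL."""
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                if not k.startswith(TRACKING_PREFIXES)]
        return urlunsplit(parts._replace(query=urlencode(kept)))

    print(canonical("https://example.com/post?utm_source=news&id=7"))  # https://example.com/post?id=7
    print(canonical("https://example.com/post?ref=tw"))                # https://example.com/post
    ```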

    No Sitemap directive

    Crawlers miss important pages. Without an explicit Sitemap line, bots rely on link discovery alone — and skip orphaned pages entirely.

    Inconsistent robots.txt and llms.txt

    Conflicting signals confuse crawlers. If robots.txt blocks a path that llms.txt links to, neither file is trusted.
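    A consistency check between the two files can be automated. This is a sketch, assuming llms.txt contains plain absolute URLs and using crude regex link extraction; it flags every llms.txt link that your robots.txt would block for a given agent.

    ```python
    import re
    from urllib import robotparser

    def find_conflicts(robots_txt: str, llms_txt: str, agent: str = "GPTBot") -> list:
        """Return llms.txt URLs that the given agent is not allowed to fetch."""
        rp = robotparser.RobotFileParser()
        rp.parse(robots_txt.splitlines())
        urls = re.findall(r"https?://[^\s)]+", llms_txt)  # crude link extraction
        return [u for u in urls if not rp.can_fetch(agent, u)]

    robots = "User-agent: *\nDisallow: /admin/\n"
    llms = "# Example llms.txt\n- https://example.com/blog/post\n- https://example.com/admin/secret\n"
    print(find_conflicts(robots, llms))  # ['https://example.com/admin/secret']
    ```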

    Aggressive Crawl-delay

    Throttles how much AI indexes per visit. Handle rate limiting at the CDN layer instead — it's more precise and doesn't penalize legitimate crawlers.

    Implementation Checklist

    Step 1

    Download the template above

    Step 2

    Customize Disallow paths for your site structure

    Step 3

    Replace YOUR-DOMAIN.com with your actual domain

    Step 4

    Upload to your site root (must be at /robots.txt)

    Step 5

    Verify in Google Search Console

    Step 6

    Set up llms.txt and BrandVault companion files

    Step 7

    Monitor crawl activity in your server logs
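    For step 7, a minimal log-monitoring sketch: it assumes a combined-format access log and matches on user-agent substrings only. User-agent strings can be spoofed, so for a real audit you would also verify source IPs against the ranges the crawler operators publish.

    ```python
    from collections import Counter

    # AI crawler tokens to watch for (a subset of the reference table above).
    AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Google-Extended"]

    def crawl_counts(log_lines):
        """Count hits per AI crawler by substring-matching the user-agent field."""
        counts = Counter()
        for line in log_lines:
            for bot in AI_BOTS:
                if bot in line:
                    counts[bot] += 1
        return counts

    sample = [
        '1.2.3.4 - - [01/Mar/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "GPTBot/1.1"',
        '5.6.7.8 - - [01/Mar/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 256 "-" "PerplexityBot/1.0"',
    ]
    print(crawl_counts(sample))
    ```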

    See What AI Believes About Your Brand

    Your robots.txt opens the door. But what are AI models actually saying once they walk in? Search our directory to find out.