
    How to Audit Your Brand's AI Visibility

    An AI brand audit isn't just asking ChatGPT about yourself. It's a structured assessment across multiple dimensions, query types, and models. We've run this on 5,829 brands. Here's the framework.

    The BAI Scoring Framework

    The Brand Authority Index measures three weighted dimensions. Each tells you something different about how AI sees your brand.

    Answer Presence (Weight: ~40%)

    Does AI mention your brand when asked relevant category and intent queries? Not just "what is [brand]" but "what are the best tools for [your use case]?" The distinction matters — most brands pass the identity test but fail the category and intent tests. The most common Answer Presence failure isn't total absence — it's partial absence. A brand might appear in identity queries but be completely missing from buyer intent queries. This means AI knows who you are but doesn't recommend you when someone's actually buying.

    Among 5,829 brands, 7.5% score 0-19 on this dimension (completely absent from AI responses). Incumbents average 85%+ presence across query types while Phantoms average 0%.

    Message Pull-Through (Weight: ~35%)

When AI does mention you, does it get the description right? We measure this against a brand's "ground truth" — the actual positioning, category, products, and differentiators as defined by the brand itself. How closely AI output matches that ground truth determines the Message Pull-Through score. This dimension carries the highest business impact per point because inaccurate descriptions actively steer buyers away, whereas low presence simply means you're not in the conversation.

    60% of brands have at least one significant misrepresentation — wrong category being the most common (59.8% of errors). Among Misread brands, the average Message Pull-Through score is below 30.

    Owned Citations (Weight: ~25%)

    Does AI reference your authoritative sources? When Claude answers a question about your brand, does it cite your website, your documentation, your case studies — or a third-party article from 2022? Owned citations create a positive feedback loop for future accuracy. When AI models cite your sources, future training reinforces correct information. When they cite outdated third-party content, the errors compound.

    Brands with high owned citation rates have 3x more stable BAI scores week-over-week. Improving owned citations is the single best predictor of sustained BAI improvement over 90 days.
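The weighted combination described above can be sketched in a few lines of Python. The ~40/35/25 weights come from this guide; the function names and the banding helper are illustrative, not an actual Optimly API:

```python
# Sketch of the BAI weighted combination. Weights are the approximate
# figures from this guide; everything else is illustrative.
WEIGHTS = {
    "answer_presence": 0.40,
    "message_pull_through": 0.35,
    "owned_citations": 0.25,
}

def compute_bai(scores: dict) -> float:
    """Combine three 0-100 dimension scores into a 0-100 BAI."""
    return round(sum(scores[dim] * w for dim, w in WEIGHTS.items()), 1)

def band(bai: float) -> str:
    """Map a BAI score to the bands used in the distribution below."""
    if bai >= 80:
        return "Strong"
    if bai >= 60:
        return "Present"
    if bai >= 40:
        return "Gaps"
    if bai >= 20:
        return "Invisible"
    return "Unknown"

# Example: decent identity presence, weak pull-through and citations.
scores = {"answer_presence": 70, "message_pull_through": 30, "owned_citations": 40}
print(compute_bai(scores), band(compute_bai(scores)))  # 48.5 Gaps
```

Note how a brand can be mentioned in most queries and still land in the "Gaps" band: the weighting means weak pull-through and citations drag the overall index down.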

    BAI distribution across 5,829 brands:

80-100 (Strong): 58.4%
60-79 (Present): 25.3%
40-59 (Gaps): 4.9%
20-39 (Invisible): 3.9%
0-19 (Unknown): 7.5%

    What separates a 40 from an 80: it's usually the difference between showing up in identity queries only vs. showing up in buyer intent queries.

    The Audit Process Step by Step

1. Define your ground truth. What should AI say about you? Category, products, differentiators, competitive positioning, key messages. If you can't define this clearly, no audit framework will help.

2. Design your query set. The 5 categories: identity, category, buyer intent, sentiment, competitor displacement. Each probes a different dimension of AI's understanding.

3. Run queries across models. Minimum 3 models (ChatGPT, Claude, Gemini), ideally 5+ (add Perplexity, Copilot). Each model has different training data.

4. Score each response. Against your ground truth: is the brand mentioned? Is the description accurate? Are your sources cited?

5. Calculate your BAI. Combine the three dimension scores into an overall Brand Authority Index.

6. Diagnose your archetype. Incumbent, Challenger, Phantom, or Misread. Each has a distinct remediation path.

7. Build a remediation plan. Specific fixes based on where you scored lowest. See our fix guide for the detailed playbook.
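The query-design, scoring, and diagnosis steps can be sketched as a small script. Everything below is a hypothetical illustration: the brand, query templates, scoring heuristics, and archetype thresholds are ours (loosely based on the figures in this guide), not an actual Optimly tool.

```python
# Hypothetical sketch of steps 2, 4, and 6. Names and thresholds are
# illustrative assumptions, not an official methodology.
BRAND = "ExampleCo"              # placeholder brand
USE_CASE = "invoice automation"  # placeholder use case

# Step 2: one query per category (a real set would have many per category).
QUERY_TEMPLATES = {
    "identity": f"What is {BRAND}?",
    "category": f"What companies make {USE_CASE} software?",
    "buyer_intent": f"What are the best tools for {USE_CASE}?",
    "sentiment": f"Is {BRAND} any good?",
    "competitor_displacement": f"What are alternatives to {BRAND}?",
}

def score_response(response: str, ground_truth: dict) -> dict:
    """Step 4: naive string checks standing in for real evaluation.
    ground_truth needs 'category' and 'domain' keys in this sketch."""
    text = response.lower()
    return {
        "mentioned": BRAND.lower() in text,
        "accurate": ground_truth["category"].lower() in text,
        "cited": ground_truth["domain"].lower() in text,
    }

def diagnose(presence: float, pull_through: float) -> str:
    """Step 6: rough archetype mapping implied by the guide's figures
    (Incumbents average 85%+ presence, Phantoms 0%, Misread brands
    average pull-through below 30)."""
    if presence >= 85:
        return "Incumbent"
    if presence == 0:
        return "Phantom"
    if pull_through < 30:
        return "Misread"
    return "Challenger"

ground_truth = {"category": "invoice automation", "domain": "exampleco.com"}
result = score_response(
    "ExampleCo is invoice automation software (exampleco.com).", ground_truth
)
print(result, diagnose(presence=50, pull_through=20))
```

In practice the string checks would be replaced by an LLM-graded or human rubric, and each query would be run across 3+ models before averaging, but the shape of the loop is the same.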

    See the full query methodology →

    Mistakes We See Teams Make When Running Their First Audit

    Testing only identity queries

    Asking "What is [brand]?" and stopping there. Most brands pass the identity test. The real failures show up in category and buyer intent queries. If you only test identity queries, you'll think your AI brand reputation is fine when it's not.

    Testing only one model

    ChatGPT has the highest crawl volume (10,816 requests/week to our directory) and often has the most current data. If you only test ChatGPT and it looks good, you'll miss that Claude or Perplexity has you wrong. Always test 3+ models.

    Running the audit once

    We track 8,008 score changes per week. A single audit is a photograph of a moving target. Schedule recurring audits — monthly at minimum, weekly if you're actively remediating.

    Not defining ground truth first

    If you don't know what AI should say about you, you can't evaluate whether it's getting it right. Start with a 3-sentence ground truth statement before running any queries: what you are, who you serve, and what differentiates you.

    Benchmarks — Where You Stand

    Category-level data from the directory:

    SaaS / Cloud Software

    (429 brands)

    Largest category. Wide BAI distribution — well-known players score 90+, niche tools often fall below 40. SaaS brands with clear subcategory positioning (e.g., 'AI-native CRM' vs. generic 'CRM') score 15-20 points higher on Message Pull-Through. Most common archetype: Challenger (AI knows these brands exist but the category is too crowded for consistent recommendation).

    Fintech / Financial Services

    (89 brands)

    Generally strong presence due to regulatory content and press coverage. Most common archetype: Incumbent. The most common error pattern: AI describes fintech companies by their original product, not their current platform. A company that started as a 'payment processor' and evolved into 'financial infrastructure' will be described by the older term because financial services press coverage tends to use legacy terminology.

    Retail / E-commerce

    (63 brands)

High variance. Retail brands generally have the highest BAI scores because strong consumer presence produces abundant, consistent data, but DTC brands often score lower because AI models prioritize aggregator sites over brand sites. The exception: DTC brands that pivoted to B2B or marketplace models, which AI models still describe as 'online retailers' when they've become 'commerce infrastructure.'

    Healthcare / Life Sciences

    (32 brands)

    Highest Misread rate of any category. AI frequently conflates 'medical devices' with 'pharmaceuticals,' 'health tech' with 'telehealth,' and 'clinical decision support' with 'EHR systems.' If you're in healthcare, the audit should specifically test whether AI places you in the correct medical subcategory.

If your category isn't listed above, your benchmark is the overall directory, where 58.4% of brands score 80-100 and 7.5% score 0-19. If you're a SaaS company scoring below 60, you're underperforming your category. If you're a healthcare brand scoring above 70, you're ahead of most of your peers. For the full research, see What We Learned Scoring 5,829 Brands.

    What to Do Next

    The audit tells you where you stand. The fix guide tells you how to move.
