How LLMs decide which brands to cite (ChatGPT, Perplexity, Gemini, Claude)

How ChatGPT, Perplexity, Gemini, and Claude pick sources, audited live against Yuba Home Buyer and Fast Home Buyer California on May 4, 2026. Three citation surfaces, RAG retrieval mechanics, why brand mentions outweigh backlinks, query fan-out for niche operators, what becoming the representative source actually requires, and the productized AI Brand Mentions path.

YK Kuliev, May 5, 2026

REI Spark is a B2B SEO platform run by a licensed California real estate agent (DRE #02006033). The operator runs two active cash-buying businesses on the same stack, Yuba Home Buyer in the hyperlocal Yuba-Sutter market and Fast Home Buyer California with statewide positioning, and audited both sites against Google search and Perplexity on May 4, 2026 to produce the data this article runs on. Most AI-search advice on the open web was written for SaaS and e-commerce. Investor queries are mostly low-volume long-tail, which is where AI engines fan out and reward niche operators with structured content over generic publishers. This post is the AI-search branch of the topical map and sits under the entity-based SEO root post.

What is an LLM citation?

An LLM citation is the source an AI engine attributes when it answers a query. Three surfaces appear: a clickable URL in the response, an in-text reference to a named publication, and an unlinked brand mention woven into the prose.

The three surfaces matter separately because each rewards a different kind of authority. Clickable URLs appear in AI search results pages — Perplexity's source list, ChatGPT search's web results, Gemini's "show sources" panel. In-text references appear when the model names a publication in the answer prose — "according to The Wall Street Journal." Unlinked brand mentions are the third — the model writes "Yuba Home Buyer covers the Yuba-Sutter area" without a hyperlink, surfacing the brand as a recognized entity rather than a retrieved page. All three count toward AI visibility; only the first looks like traditional SEO at first glance.

Platform behavior diverges sharply. Per the 5W AI Platform Citation Source Index 2026, ChatGPT concentrates citations on Wikipedia, Reddit, Forbes, and Business Insider. Perplexity cites roughly three times more sources per response than ChatGPT (per Qwairy's 118,000-response analysis) and rewards primary sources and named B2B authority. Gemini favors brand-owned websites with structured schema: 52% of its citations come from brand domains in Yext's 6.8 million-citation analysis. Claude leans toward established journalism such as The New York Times, The Atlantic, The New Yorker, and The Economist. The same query asked across four engines returns four different source mixes.

Citation is not the same system as ranking. Google ranks URLs against a query through a relevance-and-authority model tuned for click-through. An AI engine retrieves passages from a much larger candidate set, ranks them at the passage level rather than the URL level, and synthesizes the top few into a single answer with attributions. A page that ranks #4 on Google can be the only source cited by Perplexity for the same query, because the two systems weight passages and pages on different signals.

How does RAG pick sources?

RAG parses the question, embeds it as a vector, retrieves matching passages from indexed corpora or live web search, ranks them by relevance, and feeds top passages to the model as context. The model writes the answer citing those passages.

Retrieval-augmented generation runs five stages on every AI-search query. Parsing identifies entities, intent, and scope. Embedding converts the query into a vector — a numerical representation of meaning rather than exact words. Retrieval matches that vector against an indexed corpus, a live web search result set, or both in hybrid retrieval. Reranking evaluates candidate passages and orders them by how well they actually answer the question, not just how similar they look. Generation feeds the top passages to the language model as context, and the model writes the answer with citations pointing to the passages it used.
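The five stages can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how any named engine is implemented: the "embedding" is a bag-of-words count vector rather than a learned dense vector, the reranker simply counts distinct query terms covered, parsing is reduced to tokenization, and generation is replaced by returning the passages that would be handed to the model. The corpus, domains, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (real systems use learned dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=3):
    """Retrieval: keep the k passages most similar to the query vector."""
    q = embed(query)
    scored = [(cosine(q, embed(p["text"])), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]

def rerank(query, passages):
    """Toy reranker: prefer passages that cover more distinct query terms."""
    q_terms = set(embed(query))
    return sorted(passages, key=lambda p: len(q_terms & set(embed(p["text"]))), reverse=True)

def build_context(query, corpus, k=3, top_n=2):
    """Return (source, passage) pairs that would be fed to the model as context."""
    top = rerank(query, retrieve(query, corpus, k))[:top_n]
    return [(p["source"], p["text"]) for p in top]

# Hypothetical corpus; the domains are placeholders.
CORPUS = [
    {"source": "a.example", "text": "Probate sales in Yuba City close after court confirmation."},
    {"source": "b.example", "text": "Cash home buyers in Yuba City can close a probate sale quickly."},
    {"source": "c.example", "text": "Recipe for sourdough bread with a long fermentation."},
]

context = build_context("probate sale cash buyer Yuba City", CORPUS)
for source, passage in context:
    print(source, "->", passage)
```

The unit of ranking here is the passage, not the URL, which is the point the section makes: a page only enters the answer through the passages it contributes.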

The consensus-engine effect runs underneath all five stages. Information that appears consistently across the training corpus and the retrieved sources gets weighted higher because the model has higher confidence in it. A fact stated identically by ten sources lands in the answer; a fact stated five different ways across ten sources fragments and may be omitted to avoid hallucination. NAP (name, address, phone) inconsistency, such as the operator's address listed three different ways across the web, degrades retrieval for that entity before any ranking question gets asked.
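A NAP consistency check is straightforward to script. The sketch below is a minimal illustration, assuming listings have already been collected by hand from each directory; the abbreviation table, the sample listings, and the function names are hypothetical, and the normalization rules are deliberately crude.

```python
import re
from collections import Counter

# Hypothetical abbreviation table; extend it for the variants seen in a real audit.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "suite": "ste", "road": "rd"}

def normalize_nap(name, address, phone):
    """Reduce a listing to a comparable (name, address, phone) tuple."""
    def norm_text(value):
        tokens = re.findall(r"[a-z0-9]+", value.lower())
        return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)
    digits = re.sub(r"\D", "", phone)[-10:]  # keep the last 10 digits, dropping a country code
    return (norm_text(name), norm_text(address), digits)

def nap_report(listings):
    """Count distinct normalized variants and flag sources that disagree with the majority."""
    keys = [normalize_nap(l["name"], l["address"], l["phone"]) for l in listings]
    canonical, count = Counter(keys).most_common(1)[0]
    outliers = [l["source"] for l, key in zip(listings, keys) if key != canonical]
    return {"variants": len(set(keys)), "agreement": count / len(listings), "outliers": outliers}

# Hypothetical listings for a made-up business.
LISTINGS = [
    {"source": "own-site", "name": "Example Home Buyer LLC",
     "address": "123 Main Street, Suite 4", "phone": "(530) 555-0100"},
    {"source": "directory-b", "name": "Example Home Buyer LLC",
     "address": "123 Main St Ste 4", "phone": "530-555-0100"},
    {"source": "directory-c", "name": "Example Home Buyer LLC",
     "address": "125 Main St", "phone": "+1 530 555 0100"},
]

report = nap_report(LISTINGS)
print(report)
```

Formatting differences ("Street" versus "St") normalize away; a genuinely different street number does not, and that listing is the one to correct first.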

The retrieve-rank-serve pipeline follows the same structure as traditional search. Google retrieves URLs and ranks them by hundreds of signals; RAG retrieves passages and ranks them through embedding similarity plus reranking. Same fundamentals, different surface. The implication for operators is that the SEO disciplines that build authority for traditional search (entity strength, structured data, NAP consistency, topical depth) build the same authority for AI search, with the citation surface as the new ranking proxy.

Why are brand mentions, not just backlinks, the signal LLMs reward?

LLMs reward brand mentions because the model estimates entity authority by cross-validating mentions across sources, not only by counting inbound links. Unlinked mentions count when the engine can disambiguate the entity through context: credentials, NAP consistency, and registry sameAs references.

The cross-validation mechanic is what separates LLM authority signals from Google's link graph. A backlink is a single edge in a graph. A brand mention is evidence that a named entity exists, has stable attributes, and is referenced by independent sources for those attributes. The model aggregates mentions during training and live retrieval, builds confidence around entity facts, and surfaces the entity when that confidence is high enough. Inbound link count feeds that confidence; consistent unlinked mentions feed it equally.

Reddit ranks #1 across every major AI engine in aggregate — roughly 40% citation frequency per the 5W AI Platform Citation Source Index 2026. That aggregate is real for broad-topic queries where Reddit hosts the conversation, and it is misleading for niche commercial queries where Reddit doesn't have the relevant content. A May 2026 spot-check ran seven California cash-buyer fan-out queries on Google and Perplexity. Reddit appeared in zero citations across the seven queries. The cited sources were brand-domain pages — investor company sites, hyperlocal cash-buyer pages, BBB profiles, regional news writeups. Source preferences vary by query type, not just by platform. The operator-relevant takeaway is that platform-level aggregate citation share doesn't predict where investor-niche citation slots actually go.

For investor niches, the brand-mention surfaces that move the needle are niche directories, B2B authority sites, local civic and chamber pages, and the operator's own entity-rich site. Each contributes a consistent factual mention of the brand — LLC name, operator name, DRE license, service area. When those mentions agree, model confidence rises. When they disagree, the model hedges or omits.

This is the same authority pillar covered in branded anchor distribution, measured by a different system. Branded anchors signal entity to Google's link graph; brand mentions signal the same entity to LLM training corpora. An operator who has built a clean branded-anchor profile is already partway through the brand-mention work — the mentions exist on the donor side, just not yet measured against AI citation behavior.

How does query fan-out surface niche brands?

Query fan-out is when an AI engine decomposes a compound query into smaller sub-queries searched independently. For low-volume investor queries — like "sell house fast Yuba City probate" — the niche operator with deep sub-query coverage often wins multiple slots.

Fan-out is decomposition followed by synthesis. The engine reads a compound query — "sell house fast Yuba City probate cash" — and breaks it into smaller sub-queries that each get their own retrieval pass: "Yuba City probate process," "cash home buyer Yuba City," "selling inherited property California," "how does a probate sale close." Each sub-query pulls its own candidate passage set. The synthesized answer draws from passages across the union of those sets. Whichever entity covers the most sub-query slots with retrievable passages dominates the final answer.
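The slot-counting logic above can be made concrete. This is a minimal sketch, assuming the per-sub-query retrieval results are already known; the sub-queries are the ones from the example, while the domains and the function name are hypothetical.

```python
from collections import defaultdict

def fan_out_coverage(retrieved):
    """Count how many sub-query slots each domain covers.

    `retrieved` maps each sub-query to the domains whose passages were
    retrieved for it. Returns (domain, slots) pairs, most slots first.
    """
    slots = defaultdict(set)
    for sub_query, domains in retrieved.items():
        for domain in domains:
            slots[domain].add(sub_query)
    return sorted(((d, len(qs)) for d, qs in slots.items()), key=lambda item: (-item[1], item[0]))

# Hypothetical retrieval results for the compound query in the text.
RETRIEVED = {
    "Yuba City probate process": ["niche-operator.example", "county-court.example"],
    "cash home buyer Yuba City": ["niche-operator.example"],
    "selling inherited property California": ["generic-publisher.example", "niche-operator.example"],
    "how does a probate sale close": ["niche-operator.example"],
}

coverage = fan_out_coverage(RETRIEVED)
for domain, count in coverage:
    print(f"{domain}: {count} of {len(RETRIEVED)} sub-queries")
```

In this made-up data the niche operator covers all four sub-queries while the generic publisher covers one, which is the asymmetry the paragraph describes.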

This rewards niche operators with deep coverage on a tight topic. A generic publisher with one broad article on "selling inherited property" wins one sub-query slot. A niche operator with separate articles on probate process, on Yuba-Sutter probate court timelines, on cash-buyer mechanics, on inherited-property tax implications, and on closing logistics wins five. This is the Semantic Content Network thesis applied to LLM retrieval: every article must encode overlapping attributes so multiple passages of the same site can be retrieved across the fan-out.

Carrot template homogeneity becomes a fan-out problem in this layer. Carrot is a hosted website platform that many investor sites run on. Passage-level deduplication strips redundant content during retrieval: fifty thousand Carrot sites with identical service-area template copy collectively contribute one passage's worth of retrievable content for the queries that template addresses. The platform isn't broken; the template is doing what templates do. The variance in citation outcomes within Carrot is operator-driven, not platform-driven.

A May 2026 spot-check pulled ten cited cash-buyer domains from seven California fan-out queries on Google; five of the ten ran on Carrot's platform (identified by the image-cdn.carrot.com signature plus the shared GTM-WWPZRDH Tag Manager container). The most-cited single domain in the set, laurelbuyshouses.com, appeared in four of the seven queries and is itself a heavily customized Carrot install with a personal author entity, four published books, BBB credentials, and original photography. Default-template Carrot installs without entity customization appeared once each. Custom WordPress builds (osbornehomes.com, socalhomebuyers.com) won citations through the same mechanism: a named author entity, dedicated taxonomies, original content.
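To see why identical template copy collapses to a single retrievable passage, here is a minimal near-duplicate filter. It is a sketch of one common technique (word shingles compared by Jaccard similarity), not a description of how any specific engine deduplicates; the threshold, the sample passages, and the domains are all assumptions.

```python
import re

def shingles(text, n=3):
    """Set of overlapping n-word tuples, after lowercasing and stripping punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dedupe(passages, threshold=0.8):
    """Keep a passage only if it is not a near-duplicate of one already kept."""
    kept = []
    for passage in passages:
        candidate = shingles(passage["text"])
        if all(jaccard(candidate, shingles(k["text"])) < threshold for k in kept):
            kept.append(passage)
    return kept

# Hypothetical passages: three sites share template copy, one wrote its own.
TEMPLATE = "We buy houses for cash. No repairs, no fees, close on your schedule."
PASSAGES = [
    {"source": "site-a.example", "text": TEMPLATE},
    {"source": "site-b.example", "text": TEMPLATE.upper()},
    {"source": "site-c.example", "text": TEMPLATE.replace(",", "")},
    {"source": "site-d.example",
     "text": "Probate sales in Sutter County require court confirmation before closing."},
]

kept = dedupe(PASSAGES)
print([p["source"] for p in kept])
```

Only the first template copy and the original passage survive, so three sites that shipped the default copy compete for one slot while the customized site keeps its own.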

The operator framing follows: Carrot dominates the AI-search investor niche, and the citation variance inside that dominance is determined by how much entity customization the operator layered on top of the defaults. Migrating off Carrot is one path to representative-source status. Customizing on Carrot is another. The platform is not the constraint.

What does "being the representative source" mean for a real estate investor?

A representative source is the entity an AI engine treats as canonical when answering a query about a topic. For a real estate investor, three earnable inputs build it: verifiable credentials with sameAs links, proprietary data, and consistent NAP across every indexed surface.

Representative-source status is the AI-search analog of Knowledge Graph eligibility. The engine has decided this entity is a reliable, citable source on this topic; it surfaces the entity by name in answers and cites its pages. The status is earned, not declared. Three inputs build it.

Verifiable credentials with sameAs anchors disambiguate the entity to the engine. A Person schema block with hasCredential pointing to a state license registry and sameAs URLs pointing to that registry's public lookup gives the engine a fact-checking surface. It can resolve "YK Kuliev, DRE #02006033" against the public DRE lookup, confirm the credential is active, and treat downstream content under that author as higher-confidence. Most investor sites ship no Person schema at all, or ship it without sameAs.

Proprietary data and operator tests are the second input. AI engines reward content competitors can't reproduce: a transaction-volume figure from the operator's own deal records, a market timing chart pulled from MLS access, an audit run on the operator's own portfolio. Generic content gets retrieved; primary-source content gets cited.

Consistent NAP and entity attributes across every indexed surface is the third: the same LLC name, business address, phone number, service area, and founder name appearing identically across the operator's site, BBB, the DRE registry, Secretary of State filings, and every directory. Inconsistency reduces engine confidence; the model hedges or omits when it can't reconcile competing attributes for the same entity.
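The credentials input can be sketched as a small generator for the JSON-LD block. This is an illustration under stated assumptions, not the markup either audited site ships: the schema.org types and properties used (Person, hasCredential, EducationalOccupationalCredential, recognizedBy, sameAs) exist, but the function name is hypothetical and both URLs are placeholders, not real registry or profile addresses.

```python
import json

def person_jsonld(name, license_id, registry_name, registry_lookup_url, profile_urls):
    """Build a schema.org Person block with a license credential and sameAs anchors."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "credentialCategory": "license",
            "identifier": license_id,
            "recognizedBy": {"@type": "Organization", "name": registry_name},
            "url": registry_lookup_url,
        },
        # sameAs gives the engine independent pages that describe the same entity.
        "sameAs": [registry_lookup_url, *profile_urls],
    }

doc = person_jsonld(
    name="YK Kuliev",
    license_id="02006033",
    registry_name="California Department of Real Estate",
    # Placeholder URLs: substitute the registry's real public lookup and real profiles.
    registry_lookup_url="https://registry.example/lookup?license=02006033",
    profile_urls=["https://directory.example/profiles/yk-kuliev"],
)

# Paste the output inside a <script type="application/ld+json"> tag on the author page.
print(json.dumps(doc, indent=2))
```

The design choice worth noting is that the registry URL appears twice, once on the credential and once in sameAs, so the license claim and the identity claim both resolve to the same checkable page.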

The investor moat is that most operators don't bother. Carrot defaults don't ship Person schema with credentials, most agency-built sites run identical bio copy across clients, and NAP audits aren't part of the standard investor SEO checklist. Licensed investors running structured Person schema with DRE registry sameAs and proprietary transaction data are a small population.

The May 2026 audit referenced in the introduction surfaces the pattern directly. Yuba Home Buyer (hyperlocal Yuba-Sutter, DRE in schema, operator-authored content depth) was cited by Perplexity with the verbatim description "Yuba Home Buyer (yubahomebuyer.com): Local expert covering Yuba City, Marysville, and nearby areas like Olivehurst; offers cash in 24 hours and handles code violations." Google placed the same site at #2 with the DRE license number surfaced in the meta description. Fast Home Buyer California (FHBC), with the same operator, statewide positioning, and less per-city content depth, did not surface on either platform across four California-wide commercial queries (Sacramento cash buyers, best California cash-buyer companies, California cash-buyer process, California probate inherited).

Same operator, same credentials, two outcomes determined by content depth at the entity-relevant query layer. Hyperlocal entity strength wins narrow fan-out slots; broader-territory positioning without proportional depth fails to surface even on stated specialties: FHBC publishes probate content but did not surface for the canonical California probate query. One caveat: the Perplexity session was signed in and shows some account-aware personalization, so the directional finding is defensible but the exact ordering is not.

How does AI brand-mention distribution actually work?

AI brand-mention distribution is a productized 4-pack service category: 5–100 unique branded mentions per pack, published across 20,000+ sites indexed by LLM training pipelines. The function is data saturation across crawler-accessible surfaces — not premium backlinks.

The service is a managed pipeline that does, at scale, the brand-mention work the previous five sections describe. Each pack uses 15 angle-varied templates so the published mentions surface different facets of the same entity (DRE license, service area, probate specialty, transaction volume, operator background) rather than repeating one piece of copy across every placement. The intake form is pre-tuned for real estate investors so the entity facets the engines need are the ones the form collects. Turnaround is one working day, fully managed.

Straight talk on what the packs are not: these are not premium backlinks, and operators should not expect meaningful Google SERP lift. The publishing network is built for breadth and indexation across surfaces LLM training pipelines scrape, not for the editorial authority that moves Google rankings. Buying this to climb Google is the wrong tool for the job.

What the packs do is plant consistent factual brand mentions across the indexable web. The mechanism is the consensus-engine effect described above: more consistent factual mentions across crawler-accessible surfaces increase LLM confidence in citing or describing the brand, which lifts how ChatGPT, Claude, Perplexity, and Gemini answer questions about the operator's market. The pack tiers, from 5 mentions on the entry pack up to 100 on the largest, let an operator dose the spend against an AI-visibility audit baseline. An operator already cited well in their hyperlocal market spends differently than one trying to surface in a broader metro for the first time.

AI brand-mention distribution is one of several paths to representative-source status: the productized one for operators who would rather buy distribution than run it themselves. The other paths in this article (earning credentials with sameAs, building proprietary content, tightening NAP consistency, customizing on or off Carrot) work just as well, but they take longer.

The same retrieval, ranking, and serving fundamentals run underneath AI search. Citation is the new ranking surface; brand mention consensus is the new authority currency. The action: pick three canonical investor queries for your market, ask each on ChatGPT, Perplexity, Gemini, and Claude with a fresh signed-out session, and record which entities surface in the citations. The gap between where you appear and where you don't is your AI-visibility roadmap. Return upstream to the entity-based SEO root post for the topical authority context this diagnosis sits within. AI search rewards niche operators who build entity strength deliberately — the same thing semantic SEO rewards in Google, with a wider citation surface.
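The audit in the action step fits in a spreadsheet, but a short script keeps the gap list honest. The sketch below assumes the citations were recorded by hand from signed-out sessions; the brand, queries, observations, and function name are hypothetical examples, not data from the May 2026 audit.

```python
ENGINES = ("ChatGPT", "Perplexity", "Gemini", "Claude")

def visibility_report(observations, brand, queries, engines=ENGINES):
    """Summarize where a brand does and does not surface.

    `observations` maps (query, engine) to the entities seen in that answer's
    citations. Pairs with no recorded observation count as missing.
    """
    cells = [(q, e) for q in queries for e in engines]
    present = [cell for cell in cells if brand in observations.get(cell, [])]
    missing = [cell for cell in cells if cell not in present]
    return {"share": len(present) / len(cells), "present": present, "missing": missing}

# Hypothetical audit for a made-up brand and three made-up queries.
BRAND = "Example Home Buyer"
QUERIES = [
    "sell house fast Example City",
    "cash home buyer Example City probate",
    "best cash buyers Example County",
]
OBSERVATIONS = {
    (QUERIES[0], "Perplexity"): ["Example Home Buyer", "Other Buyer Co"],
    (QUERIES[0], "Gemini"): ["Example Home Buyer"],
    (QUERIES[1], "Perplexity"): ["Example Home Buyer"],
    (QUERIES[2], "ChatGPT"): ["Other Buyer Co"],
}

audit = visibility_report(OBSERVATIONS, BRAND, QUERIES)
print(f"Surfaced in {audit['share']:.0%} of query/engine pairs")
for query, engine in audit["missing"]:
    print(f"gap: {engine} / {query}")
```

The missing list is the roadmap the paragraph refers to: each gap is a query and engine pair where another entity, or no entity, currently holds the citation slot.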

By YK Kuliev, California DRE #02006033 — operating cash-buyer brand sites since 2018, REI Spark since 2025.