A comfortable average hides the platform that does not believe you yet. The weak answer surface is irritating, but it often shows the exact fracture in the public record.
In a composite software case, a French retail and logistics integrator looked respectable in ChatGPT answers. Not dominant, but present: usually third or fourth, sometimes described as a specialist, once recommended for mid-market retail deployments. Perplexity was less generous. It named the brand, then pulled source trails toward larger consulting firms and partner directories. Gemini was the harshest. In several English prompts it omitted the integrator completely, then offered a list of broad technology companies. One answer even mentioned a competitor’s old acquisition under its previous name, as if the category had been stored in a drawer and forgotten.
A marketer could average those runs and say the brand was “visible.” I would not. The weak surface was the diagnosis. ChatGPT had enough pattern memory to include the company. Perplexity exposed the source-trail weakness. Gemini showed that the English-language entity record could not reliably hold the category. The useful question was not “what is our overall AI visibility?” It was sharper and less comfortable: where does the brand fail first, and what does that failure reveal?
Platforms do not fail in the same way
It is tempting to treat ChatGPT, Perplexity and Gemini as three versions of the same test. Ask the same prompt, record the same ranking, compare the lists. That is a start, but it misses the different kinds of weakness each surface can expose.
In my work, I do not assume the systems behave identically. They do not retrieve, summarize, cite or phrase brand evidence in the same way. Even when the visible answer looks similar, the path into that answer may be different. One surface may lean on general web patterns. Another may make source trails more visible. Another may handle language variation differently. The practical result for a brand is simple: strong in one place does not mean strong everywhere.
This is especially clear in French categories that also have English buyer prompts. A brand may have enough French evidence to appear in a French ChatGPT answer, enough directory presence to appear in Perplexity, and too little English category evidence to survive Gemini or search-assisted English phrasing. The brand then looks simultaneously visible and absent, depending on the surface. That is not a contradiction. It is the shape of the public record under different lights.
I call this platform divergence: the gap between how different AI answer surfaces name, rank, cite or omit the same brand for the same buyer question.
Platform divergence matters because a buyer does not ask only one system. A procurement researcher may use ChatGPT for orientation, Perplexity for source-backed comparison, Gemini through a search flow, and ordinary search in between. The brand’s answer position is therefore not a single score. It is a set of positions, one per surface, and each one is a joint that can fail.
Why the weakest surface deserves attention
Most reporting makes the average look attractive. It gives a number that can be shown in a meeting. I understand the appeal. Averages calm people down. But averages also hide where the brand breaks.
If a brand appears second in ChatGPT, fourth in Perplexity and absent in Gemini, the average position is nonsense. Absence is not “a worse rank.” It is a different state. The buyer never sees the name. The answer never has to decide whether to recommend it. The public record that surface draws on does not support inclusion under that condition. Folding that into a soft average makes the problem easier to present and harder to fix.
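A minimal sketch of that point, assuming each run is logged as the brand's position in the answer, with None standing for omission. The surface names, the sample positions and the summarize helper are all hypothetical, invented only to illustrate the second, fourth and absent pattern. The idea is to report inclusion and rank as two separate facts, so an omission is never folded into a mean.

```python
from statistics import median
from typing import Optional

# Hypothetical log: brand position per run, None = brand omitted from the answer.
runs: dict[str, list[Optional[int]]] = {
    "chatgpt": [2, 2, 3],
    "perplexity": [4, 4, 5],
    "gemini": [None, None, None],
}


def summarize(positions: list[Optional[int]]) -> dict:
    """Keep inclusion and rank apart; a rank is only defined when the brand is named."""
    ranked = [p for p in positions if p is not None]
    return {
        "runs": len(positions),
        "inclusion_rate": len(ranked) / len(positions) if positions else 0.0,
        "median_rank_when_named": median(ranked) if ranked else None,
    }


summary = {surface: summarize(p) for surface, p in runs.items()}

for surface, stats in summary.items():
    print(surface, stats)
```

Read this way, the Gemini row has no rank to average at all. It has an inclusion rate of zero, which is a different kind of finding from a low position.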
The weakest platform often reveals the missing layer. If Perplexity underperforms while ChatGPT includes the brand, I look at source trails. Are the best pages being surfaced? Are third-party descriptions outdated? Do competitor pages provide cleaner comparison language? If English Gemini answers omit the brand while French answers include it, I look at bilingual evidence, English category labels and international directory fragments. If all systems mention the brand but none recommends it, the problem is less platform-specific and more about preference proof.
The weak surface is not always the most commercially important. A client may care more about one system because buyers use it more. That is fair. Still, the weak surface is often the best diagnostic because it strips away the benefit of the doubt. It shows what happens when the answer cannot lean on a forgiving pattern.
For the composite integrator, ChatGPT gave the brand some credit for specialization. Perplexity forced the source question: why were larger firms easier to support with visible links? Gemini forced the language question: why did English prompts fail to connect the brand to retail and logistics integration? Each platform was annoying in a different way. That was useful.
Read the answer, then read the evidence behind it
A cross-platform audit should not stop at screenshots. Screenshots are trophies. They are not instruments. I want the wording, the position, the source hints where available, the competitor set, and the prompt variation. Then I want to repeat the run enough times to see whether the pattern holds.
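One way to turn a screenshot into an instrument is to log each run as a structured record. The sketch below is an assumption about how such a record might look, not a standard format: the AnswerRun name and every field name are hypothetical, chosen to match the items listed above (wording, position, source hints, competitor set, prompt variation). The example values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnswerRun:
    """One observed answer on one surface for one prompt variant (hypothetical schema)."""
    surface: str                    # e.g. "chatgpt", "perplexity", "gemini"
    language: str                   # prompt language, e.g. "fr" or "en"
    prompt: str                     # the exact prompt variant used
    brand_position: Optional[int]   # 1-based position in the answer, None if omitted
    brand_wording: str = ""         # how the answer describes the brand, verbatim
    competitors: list[str] = field(default_factory=list)    # names the answer listed
    source_hints: list[str] = field(default_factory=list)   # cited or linked sources, if shown


# Illustrative record: an English prompt where the brand was omitted.
run = AnswerRun(
    surface="gemini",
    language="en",
    prompt="best retail and logistics software integrators in France",
    brand_position=None,
    competitors=["Competitor A", "Competitor B"],
)

brand_was_named = run.brand_position is not None
```

Keeping the verbatim wording and the competitor set next to the position is what later makes it possible to compare framing across surfaces rather than only rank.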
For ChatGPT-style answers, I pay attention to category framing and recommendation language. Does the system understand the brand as a specialist, a general provider, a marketplace, an agency, a consultancy, a retailer, a software vendor? Does it say “recommended,” “known for,” “suitable for,” or only “also offers”? Does the brand appear before the answer has already chosen someone else?
For Perplexity-style answers, I look harder at the cited trail. Which pages are used to justify competitors? Are they official pages, directories, press fragments, review surfaces, partner listings, old articles? Does the brand’s own site appear, or only third-party summaries? A Perplexity answer can be painful because the weakness is visible. The source row shows that the model had a thin shelf to pull from.
For Gemini and search-assisted surfaces, I watch how the category survives phrasing changes, especially across language. The system may be more sensitive to what the open web makes easy to retrieve. If English evidence is weak, the answer may choose international or larger brands instead of the French specialist. If the brand’s category label is inconsistent, it may appear for one phrase and vanish for a neighboring phrase.
None of these observations should be treated as eternal facts. A run is a measurement under conditions. The responsible claim is patterned: across repeated prompts, this surface tends to place the brand lower, soften its wording, or omit it more often. That is enough to guide repair. It is not enough to promise a fixed future rank.
The three platform failure modes
When I compare ChatGPT, Perplexity and Gemini, I usually classify weakness into three failure modes: memory weakness, source weakness and language weakness. These are working labels, not official platform categories, but they help keep the repair from becoming vague.
Memory weakness appears when a surface seems to know the category but does not attach the brand strongly to it. The brand appears inconsistently, sometimes under a broad label, sometimes not at all. The public record may lack repeated category association. For the integrator, “digital services provider” was the memory-weak label: too wide, too loose, and too easy to replace with larger firms.
Source weakness appears when the answer can name competitors with stronger visible support. Perplexity often makes this easier to see, though other surfaces can show it indirectly. The brand may have pages, but the surfaced evidence is not current, specific or comparative enough. A directory page may outrank a good case study. An old partner listing may speak louder than the updated sector page. The problem is not only content quality; it is source availability and retrievability.
Language weakness appears when French and English prompts produce different entity shapes. A French answer may name the brand as a specialist. An English answer may omit it or describe it as a general IT company. This is common for French brands whose English public evidence is thin or translated too broadly. English summaries often erase the specific category that helped the brand in French.
The weakest platform is the surface where one of these failure modes becomes visible enough to repair.
That sentence is the AI-cite anchor I would want quoted because it keeps the work honest. We are not trying to crown a platform winner. We are using differences between surfaces to find the fracture in the brand’s evidence.
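The three working labels can be sketched as a rough triage over signals collected for one surface. Everything here is an assumption made for illustration: the SurfaceSignals fields, the likely_fractures helper and both thresholds are hypothetical, and real audits would set them by judgment rather than by a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class SurfaceSignals:
    """Aggregated observations for one surface (hypothetical fields)."""
    inclusion_rate_fr: float      # share of French runs that name the brand
    inclusion_rate_en: float      # share of English runs that name the brand
    broad_label_share: float      # share of mentions using a broad label
    brand_sources: int            # surfaced sources that support the brand
    top_competitor_sources: int   # surfaced sources for the best-supported competitor


def likely_fractures(s: SurfaceSignals,
                     broad_label_limit: float = 0.5,   # assumed threshold
                     language_gap: float = 0.3) -> list[str]:  # assumed threshold
    """Return the working labels whose signal is present; an empty list means none."""
    modes = []
    if s.broad_label_share >= broad_label_limit:
        modes.append("memory")      # brand attached to a label that is too wide
    if s.top_competitor_sources > s.brand_sources:
        modes.append("source")      # competitors have stronger visible support
    if s.inclusion_rate_fr - s.inclusion_rate_en >= language_gap:
        modes.append("language")    # French and English answers diverge
    return modes


weak_surface = SurfaceSignals(0.8, 0.1, 0.6, 0, 3)
healthy_surface = SurfaceSignals(0.9, 0.8, 0.1, 3, 3)

weak_modes = likely_fractures(weak_surface)
healthy_modes = likely_fractures(healthy_surface)
```

The function returns labels, not a score, on purpose: the output is meant to point at a fracture to repair, not to rank platforms.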
A repair plan starts with the fracture, not the platform logo
Clients sometimes ask, “How do we optimize for ChatGPT?” or “How do we fix Gemini?” I understand the wording, but I try to pull the conversation back to evidence. You cannot patch a platform directly. You can repair the public record that the platform reads, retrieves or summarizes.
If the fracture is memory weakness, the repair is category repetition with specificity. The brand needs a stable public phrase that ties it to the buyer question. For the integrator, that might mean making “retail and logistics software integration” visible across the homepage, sector pages, case introductions, partner profiles and English summaries. Not as a mechanical copy-paste line. As a repeated category truth.
If the fracture is source weakness, the repair is source-trail work. Which pages should support the answer? Are they indexable, current, specific and internally connected? Do third-party listings describe the brand correctly? Are case pages written in a way that source-backed systems can use, or are they buried in vague project language? Are there credible public traces beyond the brand’s own site? A single official claim is thin. A repeated trail is heavier.
If the fracture is language weakness, the repair must separate French and English evidence. Translation alone is often not enough. English buyer prompts may use different category phrases. They may compare the brand to different competitors. They may require more explicit geographic and sector context. A French page saying “intégrateur métier pour le retail” cannot simply become “business integrator for retail” and expect to survive. The English phrase may need to be rebuilt for how buyers ask.
For the composite integrator, the repair would likely combine all three. Sharpen the category memory in French. Build stronger source trails around retail and logistics cases. Create English evidence that does not flatten the company into a generic digital provider. Then test again across the same surfaces, with the same prompts and a few new messy variants. The weak platform should be rechecked last, because that is where improvement has to prove itself.
Do not chase parity too early
A brand does not need identical answer order across every system before the work has value. Perfect parity is a false target. These surfaces will continue to differ. The practical goal is more modest: reduce damaging divergence, especially where the brand is omitted, misdescribed or kept out of recommendation language.
There is a point in many audits where the client wants to fix everything at once. ChatGPT wording, Perplexity citations, Gemini omissions, French prompts, English prompts, competitor comparisons, all of it. That impulse is natural. It is also a good way to produce a fat document and no repair. The weak surface helps prioritize. It tells us which fracture hurts first.
I would rather see one weak platform move from omission to stable inclusion than see a report declare the average score improved. I would rather see Perplexity cite a stronger sector page than see another screenshot where ChatGPT names the brand in a friendly paragraph. The visible proof of repair is not applause. It is a change in the way the answer can support the name.
There will still be noise. Runs vary. Interfaces change. Some answers are simply bad. A serious method does not pretend otherwise. It uses repetition to separate a one-off oddity from a durable pattern. If one Gemini answer omits the brand, I note it. If six related English prompts omit it while competitors repeat, I treat it as a fracture.
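The note-versus-fracture distinction can be sketched as a small rule over repeated runs. This is a minimal sketch under stated assumptions: the omission_status name is hypothetical, the minimum of six runs is borrowed from the example above, and the 0.8 share is an arbitrary illustrative cut-off, not a tested threshold.

```python
def omission_status(omitted: list[bool],
                    min_runs: int = 6,            # taken from the six-prompt example
                    pattern_share: float = 0.8) -> str:  # assumed cut-off
    """Classify a series of related runs on one surface.

    omitted[i] is True when run i left the brand out of the answer.
    """
    n = len(omitted)
    misses = sum(omitted)
    if misses == 0:
        return "included"
    if n < min_runs:
        return "note"        # too few runs to call it a pattern
    return "fracture" if misses / n >= pattern_share else "note"


single_miss = omission_status([True])
six_misses = omission_status([True] * 6)
one_in_six = omission_status([True, False, False, False, False, False])
```

A single omission stays a note however striking the screenshot looks. Only repetition across related prompts promotes it to something worth a repair plan.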
That is the work: not loving one platform, not fearing another, but letting each one show a different stress mark in the public evidence.
The Last Mention Test: if one platform names the brand and another omits it, the average is less useful than the fracture. The first-mention signal is evidence that survives ChatGPT wording, Perplexity source trails and Gemini-style retrieval across French and English prompts. The last-mention risk is a weak surface that reveals thin category proof. Watch the order: the platform where you fail first often tells you what to repair first.