How ChatGPT, Claude, and Perplexity use your product reviews in 2026

Quick answer

ChatGPT, Claude, and Perplexity use product reviews in two different ways: training-time absorption (reviews scraped from public sites like Trustpilot, Reddit, and Amazon become part of the model's general knowledge) and live retrieval (Perplexity and ChatGPT Search query the open web in real time and cite review pages by URL). The practical consequence: reviews living on a public, crawlable domain with schema markup get surfaced. Reviews locked inside a Shopify product page widget that loads via JavaScript often do not. This article explains exactly how each model ingests reviews and the five tactics that actually move the needle.

Reviewed by Nicolas Provost, founder of Reviewz.ai. Insights based on auditing 500+ Shopify review setups and analyzing public pricing, schema, and conversion data across the leading review platforms. LinkedIn

Training-time knowledge vs live retrieval: the two ways AI uses your reviews

Every large language model has two distinct ways of knowing what people say about your product. Understanding the difference is the entire game.

Training-time knowledge is everything the model absorbed during its training run. For GPT-4 class models, that means a snapshot of the open web up to a cutoff date (mid-2024 for GPT-4o, January 2026 for Claude Opus 4.7). The training corpus is dominated by Common Crawl, a non-profit web archive that has scraped over 250 billion pages since 2008. If your reviews live on a public, indexable URL, they almost certainly made it into Common Crawl, and from there into the training data of every major model.

Live retrieval (also called RAG, retrieval-augmented generation) is what happens when a user asks ChatGPT, Perplexity, or Claude a current shopping question. The model fires off a real-time search, fetches the top 5 to 15 URLs, and synthesizes an answer with citations. This is how Perplexity has always worked. ChatGPT added it via ChatGPT Search in late 2024. Claude added web search in early 2025.
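As a rough illustration of that retrieve-then-cite loop, here is a toy sketch. It assumes an in-memory list of pages and a naive word-overlap score; the `Page` class, the example URLs, and the scoring rule are all hypothetical stand-ins, not how any of these products actually rank results.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, pages: list[Page], k: int = 5) -> list[Page]:
    """Rank pages by how many query words they share and keep the top k."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p.text)), p) for p in pages]
    # Drop pages with no overlap, then sort by score, highest first.
    # The sort is stable, so ties keep their original order.
    hits = [(score, page) for score, page in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in hits[:k]]


def answer_with_citations(query: str, pages: list[Page], k: int = 5) -> str:
    """Build a stub answer listing each retrieved snippet with a numbered citation."""
    lines = [f"Q: {query}"]
    for i, page in enumerate(retrieve(query, pages, k), start=1):
        lines.append(f"[{i}] {page.text} ({page.url})")
    return "\n".join(lines)


# Hypothetical pages standing in for live search results.
PAGES = [
    Page("https://example.com/reviews/acme", "Acme coffee maker setup took 4 minutes"),
    Page("https://example.com/blog/grinders", "Best burr grinders compared"),
    Page("https://example.com/forum/acme", "My Acme coffee maker leaked after a month"),
]

print(answer_with_citations("is the acme coffee maker good", PAGES, k=2))
```

The point of the sketch is the shape of the pipeline: only pages the retriever can fetch and score ever reach the answer, which is why crawlability comes before everything else.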

Most ecommerce founders ask the wrong question. They ask: "How do I get ChatGPT to mention my product?" The right question is: "Which of my review pages are publicly crawlable, have schema.org/Review markup, and contain quotable sentences that match how shoppers ask questions?" That is the AI optimization checklist. Everything else is theater.

How ChatGPT actually ingests review data

OpenAI has never published a complete list of training data sources, but the company's model cards and Common Crawl's own documentation both confirm that Common Crawl is a major source. Common Crawl scrapes the public web at scale. That includes Trustpilot review pages, Reddit threads, Amazon product Q&A (until Amazon started blocking it more aggressively in 2023), TripAdvisor, Google Maps review snippets that leak into the SERP, and forum posts. It does NOT meaningfully include reviews behind a login wall, reviews loaded via JavaScript that crawlers do not render, or reviews stored inside a Shopify product widget that requires the page to execute JS before the reviews appear in the DOM.

That last point matters more than people realize. A native Shopify product reviews app that injects reviews into the page via client-side JavaScript may render fine for human shoppers, but Common Crawl's basic crawler does not execute JS on every page. Your reviews might be invisible to the crawl that fed GPT-5's training data. If you want your reviews to enter the LLM's general knowledge, they need to live in the initial HTML response, ideally with schema markup so the crawler knows what they are. Our review schema generator outputs the exact JSON-LD that Google and AI crawlers parse.
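One way to check this yourself is to look at the raw HTML your server returns, before any JavaScript runs, and count the schema.org Review objects in it. The sketch below does that with Python's standard library only. The function names and both sample pages are hypothetical, and it only looks at JSON-LD (not microdata or RDFa), so treat it as a quick smoke test, not a full validator.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def count_reviews(node) -> int:
    """Recursively count objects whose @type is (or includes) "Review"."""
    if isinstance(node, list):
        return sum(count_reviews(item) for item in node)
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        own = 1 if "Review" in types else 0
        return own + sum(count_reviews(value) for value in node.values())
    return 0


def reviews_in_initial_html(html: str) -> int:
    """Count Review objects present in the raw HTML, without running any JavaScript."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    total = 0
    for block in collector.blocks:
        try:
            total += count_reviews(json.loads(block))
        except json.JSONDecodeError:
            continue  # Skip malformed JSON-LD rather than crash.
    return total


# Hypothetical server-rendered page: the review is in the initial response.
SERVER_RENDERED = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Acme Coffee Maker",
 "review": [{"@type": "Review", "reviewBody": "The setup took 4 minutes."}]}
</script>
</head><body></body></html>
"""

# Hypothetical widget page: reviews only appear after /widget.js executes.
JS_ONLY = '<html><body><div id="reviews"></div><script src="/widget.js"></script></body></html>'

print(reviews_in_initial_html(SERVER_RENDERED))
print(reviews_in_initial_html(JS_ONLY))
```

To run it against a live page, fetch the URL with any plain HTTP client (one that does not execute JavaScript) and pass the response body to `reviews_in_initial_html`. A count of zero on a page that visibly shows reviews in the browser suggests they are being injected client-side.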

The second route in is Reddit. OpenAI signed a data licensing deal with Reddit in May 2024. The terms were not disclosed, though Reddit's earlier deal with Google was reported at about $60M/year. That means Reddit threads where someone says "I bought [your brand] and here's what happened" can now reach the training pipeline directly. The implication: a single viral Reddit post about your product carries more long-term LLM weight than 200 product page reviews.

How Perplexity surfaces reviews (and why it is different)

Perplexity does not rely on training-time knowledge for shopping questions. It runs a live search every time. When a user asks "Is the [Your Brand] coffee maker any good?", Perplexity searches the web in real time, ranks the top results, and writes an answer with numbered citations linked to the source pages.

In practice, Perplexity heavily favors three types of sources for ecommerce review queries:

Trustpilot domain pages (because they have high domain authority and clean schema markup)
Reddit threads (high authority, unfiltered opinions, often the most-cited single source)
Blog comparison posts ("X vs Y in 2026", listicles, "is X worth it" pages)

Notably absent from Perplexity's typical citation set: native Shopify product review widgets. The reason is mechanical. Perplexity's crawler does render JavaScript (unlike basic Common Crawl), but it weights pages by domain authority and content density. A product page that mostly contains the product description plus a widget showing 12 reviews ranks lower than a Trustpilot category page with 200 reviews on a high-DA domain.

A Trustpilot company page packs dozens of reviews onto one high-authority URL with clean schema, exactly the structure Perplexity and ChatGPT Search reward when citing review sources.

This is why we keep telling Shopify founders: your Trustpilot profile is your AI moat, not your product page widgets. Read our breakdown of whether Trustpilot is legit if you are still on the fence.

Turn every purchase into a 5-star review with Reviewz on Shopify

Reviewz · Shopify

Route happy customers to Trustpilot & Google, capture negatives privately.

Install Reviewz on Shopify

How Claude and Gemini differ from ChatGPT

Claude (Anthropic) added native web search in early 2025 but treats it more conservatively than Perplexity. Claude tends to search the web only when the user explicitly asks for current information or when the model recognizes a recency-sensitive query. For shopping questions, Claude often falls back on training-time knowledge first, which means Trustpilot domain authority matters disproportionately.

Gemini (Google) has the biggest unfair advantage and the messiest behavior. Gemini is built on Google Search infrastructure, so it has access to the full Google index in real time. In practice, Gemini's review citations skew heavily toward Google's own Shopping graph (Google Merchant Center product feeds), Google Maps reviews, and YouTube. If your store is in Google Merchant Center with review feeds enabled, Gemini knows about you. If not, you are dramatically under-represented relative to ChatGPT or Perplexity.

Model	Primary review source	Live retrieval default	Best optimization lever
ChatGPT (GPT-4o/5)	Common Crawl + Reddit license	Sometimes (ChatGPT Search)	Trustpilot + Reddit presence
Perplexity	Live web search	Always	High-DA review pages with schema
Claude	Training data first, search second	Only on explicit cue	Trustpilot domain authority
Gemini	Google index + Merchant Center	Always	Google Merchant Center reviews feed

Which review platforms are over-represented in LLM training data

Here is the stance we will defend: Trustpilot reviews are over-represented in LLM training data by roughly 10x relative to their share of total ecommerce reviews. The reason is simple. Trustpilot lives on one massive public domain (trustpilot.com), every review has its own clean URL, schema markup is correct out of the box, and the domain has been continuously crawlable for 15+ years. Common Crawl loves it. So does every downstream LLM trainer.

Compare that to the typical Shopify product reviews app. Reviews on Loox, Judge.me, Yotpo, and Stamped live inside iframes or get injected via JavaScript on tens of thousands of individual merchant domains. Each domain has low authority. Each review is fragmented across small pages. Crawlers struggle to attribute reviews to the right product. For a complete view of how these apps compare on this dimension, see our best Shopify review apps teardown.

Rank-ordered by our estimate of LLM training data representation per review collected:

1. Google Maps / Google Business reviews: massively over-represented because Google indexes itself
2. Trustpilot: high representation, clean schema, single domain
3. Amazon reviews: high historic representation, declining as Amazon blocks scrapers
4. Reddit shopping threads: very high LLM weight due to OpenAI licensing deal
5. Yotpo / Loox / Judge.me product widgets: low representation per review unless they leak to comparison blogs
6. Native Shopify product reviews (the legacy free app, now retired): essentially zero unless you exposed the JSON-LD

This is exactly why a 5-star Shopify store with 8,000 product reviews stuck inside a Loox widget often loses LLM mindshare to a competitor with 200 reviews on Trustpilot. See Trustpilot vs Judge.me for the side-by-side.

5 tactics to make your reviews AI-friendly

Stop trying to game the model. The model is unbeatable at scale. What you can do is make your reviews easier to ingest, cite, and quote. These five tactics have measurable impact across our audit base:

1. Put reviews on a public, crawlable URL with schema markup. Each product page should have schema.org/Review JSON-LD in the initial HTML response, not injected after page load. Use our review schema generator to produce the correct markup. Validate it with Google's Rich Results Test. If Google can render it, Common Crawl and Perplexity probably can too.

2. Send happy customers to public domains (Trustpilot, Google) rather than to your product page widget. This is exactly what Reviewz.ai does on the Shopify side: route 4-5 star sentiment to Trustpilot or Google, capture 1-3 star privately. Our breakdown of how to get more Trustpilot reviews on Shopify walks through the routing logic.

3. Write quotable sentences. LLMs cite specific phrasing. "The setup took 4 minutes" is quotable. "Easy to use" is not. Encourage reviewers with specific prompts ("How long did setup take?", "What's one thing you wish you'd known?"). The reviews that get cited by Perplexity are the ones that contain concrete claims, specific numbers, and short standalone sentences.

4. Build presence on Reddit. Not by spamming. By being mentioned organically. Every honest mention in a comparison thread or a "what should I buy" post compounds because OpenAI now licenses Reddit data directly. Even better: build a real Reddit profile, answer questions in your category honestly, and occasionally name your product.

5. Get cited by third-party blog posts. A single "X vs Y" comparison post that includes your product becomes a Perplexity citation magnet. Most stores under-invest here because the SEO payoff is slow. The AI payoff is immediate. See our take on how to respond to negative reviews for content patterns that get cited.
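To make tactic 1 concrete, here is a minimal sketch of generating the markup server-side so it ships in the initial HTML response. The schema.org types and properties used (Product, AggregateRating, Review, reviewRating, reviewBody, author) are real vocabulary; the function names and the shape of the input dicts (`author`, `rating`, `body`) are assumptions made for the example, and the output should still be checked with Google's Rich Results Test.

```python
import json


def build_review_jsonld(product_name: str, reviews: list[dict]) -> str:
    """Return a schema.org Product JSON-LD string with nested Review objects."""
    if not reviews:
        raise ValueError("at least one review is required")
    ratings = [r["rating"] for r in reviews]
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product_name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "reviewCount": len(reviews),
        },
        "review": [
            {
                "@type": "Review",
                "author": {"@type": "Person", "name": r["author"]},
                "reviewRating": {
                    "@type": "Rating",
                    "ratingValue": r["rating"],
                    "bestRating": 5,
                },
                "reviewBody": r["body"],
            }
            for r in reviews
        ],
    }
    # Escape "<" so review text can never close the surrounding <script> tag.
    return json.dumps(data, indent=2).replace("<", "\\u003c")


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script tag that belongs in the server-rendered HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Hypothetical reviews pulled from your own store database.
SAMPLE_REVIEWS = [
    {"author": "Sam", "rating": 5, "body": "The setup took 4 minutes."},
    {"author": "Lee", "rating": 4, "body": "Brews a full pot in under 6 minutes."},
]

print(script_tag(build_review_jsonld("Acme Coffee Maker", SAMPLE_REVIEWS)))
```

The design choice that matters is where this runs: in the template or server code that builds the page, not in a script that fires after load. Only mark up reviews that are actually visible on the page.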

What NOT to do: cloaking, AI-detection trickery, and other dead ends

Three patterns are wasting founders' time in 2026:

Cloaking reviews for AI crawlers. Serving different content to GPTBot than to humans was an early SEO trick. It is detectable, against OpenAI's published crawler policy, and gets your domain delisted from training data. The penalty is silent and permanent. Do not do it.

AI-generated fake reviews to inflate count. The FTC's 2024 final rule banning fake reviews made this an offense punishable by civil penalties of up to $51,744 per violation. Beyond the legal risk, LLMs are increasingly good at detecting AI-generated text patterns. Our fake review checker shows what the detectors are looking for, and our hands-on test of 12 detection tools measures how accurate they actually are.

Stuffing structured data with keywords that do not match review content. Schema markup that misrepresents the underlying page is flagged as spammy by both Google and the major LLMs. Mark up what is actually there, in a clean format.

The honest version of AI optimization for reviews is boring and slow: collect more real reviews, route them to public domains, mark them up correctly, write content that quotes them. There is no shortcut and no clever hack. The stores winning AI mindshare in 2026 are doing the unsexy fundamentals.

FAQ

Does ChatGPT actually read product reviews?

Yes, in two ways. First, ChatGPT was trained on a snapshot of the public web (heavily sourced from Common Crawl) which includes review pages from Trustpilot, Reddit, and historic Amazon. Second, ChatGPT Search (the live retrieval feature added in late 2024) fetches current web pages including review sites and cites them in answers. The catch: reviews loaded via JavaScript on Shopify product pages often do not get ingested because Common Crawl's basic crawler does not execute JS on every page. Put reviews in your initial HTML response with schema.org/Review markup if you want them ingested.

Why does Perplexity cite Trustpilot so often?

Perplexity ranks review sources by domain authority and content density. Trustpilot is a single massive domain with 15+ years of clean, crawlable reviews, and each merchant gets its own URL. Each Trustpilot company page concentrates dozens or hundreds of reviews in one place with proper schema markup, which is exactly what Perplexity's ranking algorithm rewards. Native Shopify review widgets, by contrast, fragment reviews across thousands of low-authority merchant domains and often hide them behind JavaScript. The mechanics favor Trustpilot regardless of which platform has "better" reviews.

Will fake AI-generated reviews fool ChatGPT?

No, and they will likely get you penalized twice. First, the FTC's 2024 rule banning fake reviews carries civil penalties of up to $51,744 per violation in the US, and the EU has similar enforcement under the Omnibus Directive. Second, modern LLMs are getting better at detecting AI-generated text patterns (repetitive sentence structures, generic adjectives, no specific numbers or dates). Reviews flagged as AI-generated reduce your overall trust signal in the model's ranking. Use a tool like Reviewz.ai's fake review checker to audit your own pile before regulators do.

Should I block GPTBot from crawling my reviews?

For an ecommerce store with real customer reviews, no. Blocking GPTBot removes your store from ChatGPT's training data and reduces the chance ChatGPT cites or recommends you. The publishers blocking GPTBot are typically news sites or content businesses worried about cannibalization. For a Shopify merchant, every mention by ChatGPT is free top-of-funnel traffic. Let it crawl. The exception: if you have a paywalled premium content section that competes directly with AI-generated answers, blocking GPTBot on those specific URLs makes sense.
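The "allow everywhere except specific URLs" setup can be sanity-checked before you deploy it. The sketch below uses Python's built-in `urllib.robotparser` to test a robots.txt against the GPTBot user agent. The robots.txt contents, the `/premium/` path, and the example.com URLs are hypothetical; swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl everything except /premium/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Disallow: /cart
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/products/acme-coffee-maker", "/premium/buying-guide"):
    url = f"https://example.com{path}"
    print(path, can_crawl(ROBOTS_TXT, "GPTBot", url))
```

One detail worth knowing: once a crawler has its own `User-agent` group, it follows that group only, so the `/cart` rule under `*` no longer applies to GPTBot in this example. Repeat any shared rules inside the GPTBot group if you need them there too.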

How fast do new reviews show up in ChatGPT answers?

For training-time knowledge, the lag is months to over a year (the next time the model is retrained). For live retrieval via ChatGPT Search or Perplexity, the lag is hours to days, depending on how quickly their crawler revisits your URL. Trustpilot pages typically get re-crawled within 48 hours of a new review going live. Native Shopify widget reviews may take weeks. If you need an AI-cited review by next week, get it on Trustpilot or Google, not on a product widget.

Does Reddit really matter for AI search?

More than most founders realize. OpenAI signed a licensing deal with Reddit in May 2024 (terms undisclosed), giving its models direct, structured access to Reddit's content firehose. Google has a similar deal, reportedly worth about $60M/year. In practice, a single organic Reddit mention in r/BuyItForLife or your category's subreddit carries more long-term LLM weight than 100 product widget reviews. Perplexity also weights Reddit threads heavily in its citations for shopping queries. The trick: it has to be organic. Reddit's anti-spam community is brutal, and shadow-banned posts do not enter the training pipeline.

About the author

Nicolas Provost · Founder of Reviewz.ai

Nicolas built Reviewz.ai after auditing 500+ Shopify review setups while running Kanal (WhatsApp marketing for Shopify). He has spent four years inside the Shopify ecosystem and writes about review collection, brand trust SEO, and the actual economics of running customer-feedback flows on ecommerce sites.

LinkedIn · Reviewz.ai · Kanal (WhatsApp for Shopify)
