Fake Product Reviews are Indistinguishable to Humans and Machines

This project investigates how large language models (LLMs) can generate highly convincing fake product reviews and examines whether humans or AI systems can reliably distinguish them from real reviews.

  • Funder

    Innovate UK

  • Duration

    May 2024 – ongoing

  • Investigators

    Weiyao Meng, John Harvey, James Goulding, Chris James Carter, Evgeniya Lukinova, Andrew Smith, Paul Frobisher, Mina Forrest, and Georgiana Nica-Avram

  • Partners

    Strategic Innovation Ltd

Project Description

Online product reviews play a critical role in consumer decision-making. However, the rapid development of generative AI has made it easier than ever to produce realistic fake reviews. This research explores whether people and AI systems can reliably detect such AI-generated content.

Through a series of experimental studies, the project compares human judgement with LLMs when evaluating real and AI-generated product reviews. The research reveals how generative AI may influence online information ecosystems and consumer trust, highlighting emerging risks for digital marketplaces and review platforms.

Method

The research consists of three empirical studies examining the ability of humans and LLMs to identify AI-generated product reviews.

First, LLMs were used to generate synthetic product reviews designed to resemble authentic online reviews. Human participants were then asked to judge whether reviews were real or AI-generated.

Second, LLMs were also tested as detectors, tasked with identifying whether reviews were genuine or AI-generated. The performance of humans and LLMs was then compared using accuracy and standard classification metrics.
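The comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual analysis: the labels, the eight-review sample, and the choice of metrics (accuracy, precision, recall) are all assumptions for the sake of example.

```python
# Hypothetical sketch: scoring human and LLM judgements against ground truth.
# Labels are illustrative only. 1 = "AI-generated", 0 = "genuine".

def classification_metrics(truth, predicted):
    """Return accuracy, precision, and recall for binary labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    accuracy = correct / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

truth      = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth for eight reviews
human_says = [0, 0, 1, 0, 1, 0, 1, 1]   # hypothetical human judgements
llm_says   = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical LLM judgements

for name, preds in [("human", human_says), ("llm", llm_says)]:
    acc, prec, rec = classification_metrics(truth, preds)
    print(f"{name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Beyond accuracy, precision and recall separate the two error types (flagging a genuine review as fake versus missing a fake one), which is what makes the human–LLM comparison informative.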

Finally, the study analysed the decision-making patterns used by humans and machines to evaluate authenticity, revealing differences in judgement strategies.

Results

The results show that humans are largely unable to distinguish authentic from AI-generated reviews, achieving an average accuracy of only 50.8%, which is effectively no better than random guessing.

LLMs also struggle to detect synthetic reviews and sometimes perform as poorly as, or worse than, humans. However, humans and AI systems make different types of errors: humans tend to be sceptical of positive reviews but are more likely to accept fake negative reviews as genuine.

These findings suggest that review platforms may become increasingly vulnerable to automated review manipulation unless stronger verification mechanisms, such as purchase verification, are implemented.

Associated Publications

Large Language Models as ‘Hidden Persuaders’: Fake Product Reviews are Indistinguishable to Humans and Machines.
Reading and evaluating product reviews is central to how most people decide what to buy and consume online. However, the recent emergence of Large Language Models and Generative Artificial Intelligence now means writing fraudulent or fake reviews is potentially easier than ever. Through three studies we demonstrate that…

Media, Blogs and News Stories