Diagnosing Disease with Shopping Data

Retailers' loyalty card data is underused and under-explored in health research, despite containing large-scale, longitudinal behavioural information on the population's diet, product use and self-medication.

GDPR now gives individuals the right to access a copy of the personal data that commercial companies hold on them. As more studies provide evidence of how what we consume affects our health, the opportunity to link datasets on our purchasing habits to health outcomes should not be missed.

  • Funder


  • Duration

    Sept 2019 – Dec 2023

  • Investigators

    Elizabeth Dolan, James Goulding, Anya Skatova, Alexandra Lang, Laila J Tata

  • Partners

    ALSPAC, Boots, ONS, JBC, NHS

Project Description

Personal commercial transactional data is the information stored when an exchange occurs between an individual and a business, including customer shopping data. This research will connect store sales and loyalty card data (customer shopping information held by a retailer) to data on respiratory disease and to information from women with ovarian cancer. The connected datasets will be used to investigate whether shopping data can help women with ovarian cancer get diagnosed earlier, and/or whether it can help inform public health decisions in a pandemic.

The aim of this project is to create recommendations for using shopping data in medical research. It asks the question:

How can personal transactional data be collected and analysed for the purposes of health research in a way that is acceptable to society and works for both infectious and chronic disease?

This project is connected to a wider project by partners ALSPAC at Bristol University and the Alan Turing Institute, titled "Donating personal transactional data for research: investigating the public acceptability of using commercial transactional data in public health research".


A collection of studies will be carried out to iteratively create machine learning models whose predictions could help in the earlier diagnosis of ovarian cancer and/or the understanding of outbreaks of influenza-like illness (ILI).

The project uses a mixed-methods approach, collecting and analysing both qualitative and quantitative data for integrated interpretation. The studies will inform the models' schema creation and feature engineering, and will help to understand and validate the models' outputs and any interpretations made from them. The iterative design allows the models to be adjusted for successful implementation in a clinical setting.


Watch this space.

Associated Publications

Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models

The COVID-19 pandemic led to unparalleled pressure on healthcare services. Improved healthcare planning in relation to diseases affecting the respiratory system has consequently become a key concern. We investigated the value of integrating sales of non-prescription medications commonly bought for managing respiratory symptoms, to improve forecasting of weekly registered deaths from respiratory disease at local levels across England… [more]

Qualitative Investigation of the Novel Use of Shopping Loyalty Card Data in Medical Decision Making

This paper describes early results of a small qualitative study investigating the potential impact of shopping loyalty card data (SLCD) in the diagnostic pathway for ovarian cancer. There is early evidence that pharmaceutical products such as pain relief and medications for irritable bowel syndrome and bloating are bought by women to manage the early symptoms of ovarian cancer… [more]

Using Shopping Data to Improve the Diagnosis of Ovarian Cancer: Computational Analysis of a Web-Based Survey 2023

Background: Shopping data can be analyzed using machine learning techniques to study population health. It is unknown if the use of such methods can successfully investigate prediagnosis purchases linked to self-medication of symptoms of ovarian cancer.

Objective: The aims of this study were to gain new domain knowledge from women’s experiences, understand how women’s shopping behavior relates to their pathway to the diagnosis of ovarian cancer, and inform research on computational analysis of shopping data for population health… [more]

Public attitudes towards sharing loyalty card data for academic health research: a qualitative study 2022

A growing number of studies show the potential of loyalty card data for use in health research. However, research into public perceptions of using this data is limited. This study aimed to investigate public attitudes towards donating loyalty card data for academic health research, and the safeguards the public would want to see implemented … [more]

Psychology of personal data donation 2019

Advances in digital technology have led to large amounts of personal data being recorded and retained by industry, constituting an invaluable asset to private organizations. The implementation of the General Data Protection Regulation in the EU … enables the general public to access data collected about them by organisations, opening up the possibility of this data being used for research that benefits the public themselves; for example, to uncover lifestyle causes of poor health outcomes… [more]


Value of Commercial Product Sales Data in Healthcare Prediction

The technical report and code for the above project, conducted with the NHS, can be viewed at


Media, Blogs and News Stories

The World Education News (WEN) Article

New research could predict deaths based on shopping habits
Angela Chau


News Medical Article

Sales of respiratory medications key to predicting disease mortality, UK study finds
Pooja Toshniwal Paharia



Applying a novel variable importance technique, MCR (model class reliance), to machine learning models in order to assess the Value of Commercial Product Sales Data in Healthcare Prediction


Cancer Therapy Advisor News Story

Could Shopping Data Be Used to Predict Cancer and Diagnose It Earlier?
Carina Storrs, PhD