Bundle Entropy as a Measure of Product Choice Combinations

Bundle entropy as an optimised measure of consumers’ systematic product choice combinations in mass transactional data

In this work, a novel entropy-based measure, bundle entropy, is developed to directly measure the predictability of basket composition: zero denotes a totally predictable bundle and one a totally unpredictable bundle.

Funder
EPSRC
Duration
Oct 2020 – Oct 2023
Investigators
Roberto Mansilla, Gavin Smith, Andrew Smith, and James Goulding
Partners
Tesco

Project Description

Short Summary

Understanding and measuring the predictability of consumer purchasing (basket) behaviour is of significant value. While predictability measures such as entropy have been well studied and leveraged in other sectors, their development and application to very large multi-dimensional data sets present in the retailing sector are less common. While a small number of methods exist, we demonstrate they fail to accord with intuition, leading to the potential for misunderstandings between those who conduct the analysis and those who act on the insights. We delineate the requirements for such a measure in this domain to demonstrate these issues in context. A novel measure is then developed based on entropy to directly measure the predictability of basket composition. The measure is designated as bundle entropy (zero denotes a bundle’s total predictability, one the total unpredictability). We empirically compare the proposed bundle entropy against existing measures using two large-scale real-world transactional data sets, each including more than 2,000 households (frequent shoppers) over two years. First, we demonstrate how the proposed measure is the only measure that behaves according to the desired properties. Second, we show empirically that bundle entropy differs noticeably from the other measures. Finally, we consider some use case analyses and discuss the utility of the proposed measure in practice.

Partner info - Tesco

Funding info - EPSRC

This research was supported by the EPSRC project “Neo-demographics: Opening Developing World Markets by Using Personal Data and Collaboration” EP/L021080/1, the EPSRC project “Trusted Data-Driven Products” EP/T022493/1 and the EPSRC project “The CIVIC Project: A Sustainable Platform for COVID-19 syndromic-surveillance via Health, Deprivation and Mass Loyalty-Card Datasets” EP/V053922/1.

Method

This research paper sets out the requirements that a measure must satisfy to capture consumers’ systematic purchasing behaviours accurately. The study introduces a novel metric called “bundle entropy,” which uses the concept of entropy to measure how predictable the composition of a consumer’s shopping baskets is. The proposed bundle entropy is then compared empirically against established measures, using extensive real-world transactional data covering over 2,000 frequent-shopper households across two years.
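To give a feel for the zero-to-one scale, the following is a minimal sketch of a normalised Shannon entropy computed over whole-basket compositions. It is not the bundle entropy defined in the paper, whose formulation is not reproduced on this page; the function name `normalised_basket_entropy` and the choice to normalise by the logarithm of the number of baskets are assumptions made purely for illustration.

```python
import math
from collections import Counter


def normalised_basket_entropy(baskets):
    """Illustrative only: normalised Shannon entropy of basket compositions.

    Each basket is treated as an unordered set of product identifiers.
    Returns 0.0 when every basket is identical (fully predictable) and
    1.0 when every basket is different (fully unpredictable).
    This is a hypothetical helper, not the paper's bundle entropy.
    """
    n = len(baskets)
    if n <= 1:
        # With zero or one basket there is nothing to be uncertain about.
        return 0.0

    # Count how often each distinct basket composition occurs.
    counts = Counter(frozenset(basket) for basket in baskets)

    # Shannon entropy in bits: sum over compositions of p * log2(1 / p).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # Assumed normalisation: the maximum is log2(n), reached when all
    # n baskets are distinct.
    return entropy / math.log2(n)


# A household that always buys the same bundle versus one that never repeats.
habitual = [{"milk", "bread"}] * 4
varied = [{"milk"}, {"bread"}, {"eggs"}, {"tea"}]
mixed = [{"milk", "bread"}, {"milk", "bread"}, {"eggs"}, {"tea"}]
```

Under these assumptions the habitual household scores 0, the fully varied household scores 1, and the mixed household falls in between. A measure of this simple kind treats two baskets that differ by a single item as entirely different, which is one reason a purpose-built measure may be needed.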

The evaluations are based on two different real-world mass transactional data sets. The first is Dunnhumby's The Complete Journey, a freely available data set. It covers grocery purchases at the household level over two years from 2,500 frequent shoppers, providing a cohort for tracking systematic choices over time, and contains over 2.5 million records. The second is a large transactional data set from 2,181 loyalty card holders over 20 months (between 2014 and 2016) from a large UK grocery retailer.

Results

The empirical analysis shows that bundle entropy is the only measure of those compared that satisfies the desired properties, and that its values differ noticeably from those of the existing measures. Practical use cases are also considered, with a discussion of the measure's potential utility in real-world applications.

Associated Publications

A refined limit on the predictability of human mobility
It has been recently claimed that human movement is highly predictable. While an upper bound of 93% predictability was shown, this was based upon human movement trajectories of very high spatiotemporal granularity. Recent studies reduced this spatiotemporal granularity down to the level of GPS data, and under a similar methodology results once again suggested a high predictability upper bound (i.e. 90% when movement was quantized down to a spatial resolution approximately the size of a large building). In this work we reconsider the derivation of the upper bound to movement predictability. By considering…

G. Smith, R. Wieser, J. Goulding, and D. Barrack. IEEE International Conference on Pervasive Computing and Communications, 2014.