Automatic Lifestate Identification and Clustering

While numerous methods to segment and/or summarise time series exist, their properties often do not align with the needs of those who use the summaries, or they require parameters that cannot realistically be set in advance. To address this, we define a set of general properties that lead to high utility across a broad class of domains, and we propose a model whose complexity is controlled by normalised maximum likelihood (NML). The model automatically produces a summarisation with these properties, based on an information-theoretic notion of optimality.

  • Funder

    EPSRC

  • Duration

    September 2021 – September 2025

  • Investigators

    Samuel Smith, Gavin Smith, John Harvey

  • Partners

    RSSB

Project Description

Summarising high-dimensional time series data across multiple entities is an increasingly common problem, because mass data collection is now routine in most domains. Examples include regular surveys, consumer purchasing histories from transactional data (where the number of possible items is large), and other repeatedly sampled digital footprint data. In this setting, summarisation must reduce both the high dimensionality of each observation and the large number of time points.

While numerous methods to segment and/or summarise time series exist, their properties often do not align with the needs of those who use the summaries, or they require parameters that cannot realistically be set in advance. To address this, we define a set of general properties that lead to high utility across a broad class of domains. Intuitively, these properties reflect the summarisation of such data into life-states, where (1) the number of states is limited and shared across entities, to allow interpretation and comparison, and (2) the number of state transitions is jointly controlled, to provide a parameter-free, optimal summarisation of both the high sample and temporal dimensionality. Specifically, the aim is a segmentation that optimally trades off the number of states and segments that humans must interpret against the need to capture salient state changes. Building on prior work, we propose a model whose complexity is controlled by normalised maximum likelihood (NML), which automatically produces a summarisation with these properties, based on an information-theoretic notion of optimality.
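For readers unfamiliar with NML, the standard textbook definition is given here for reference. The notation (a model class \mathcal{M}, a data sequence x^n of length n, and the maximum-likelihood estimate \hat{\theta}) is our own choice for illustration and is not taken from the project.

```latex
\[
P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}
         {\mathcal{C}(\mathcal{M}, n)},
\qquad
\mathcal{C}(\mathcal{M}, n)
  = \sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)
\]
\[
-\log P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \underbrace{-\log P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}_{\text{fit to the data}}
  + \underbrace{\log \mathcal{C}(\mathcal{M}, n)}_{\text{regret (model complexity)}}
\]
```

The second line shows why no tuning parameter is needed in this general formulation: adding states or segments improves the fit term but increases the regret term, and the model minimising the total code length is selected.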

Method

Existing methods of summarising high-dimensional time series employ the minimum description length (MDL) and NML principles. This project follows a similar approach but imposes additional constraints on how datasets can be segmented. These constraints significantly change both the calculation of the NML regret term and the interpretation of the results after analysis.
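To give a sense of what an NML regret term looks like in the unconstrained case, the following is a minimal sketch for a plain categorical (multinomial) model, using the well-known linear-time recurrence for the multinomial normaliser. The choice of a categorical model, the function names, and the toy sequences are all assumptions made for illustration. The project's model family is not stated here, and its additional segmentation constraints change the regret calculation, which this sketch does not attempt to reproduce.

```python
import math
from collections import Counter


def multinomial_regret(n: int, k: int) -> float:
    """Log of the NML normaliser C(k, n) for a k-category model and n samples.

    Illustrative only: uses the recurrence
    C(j, n) = C(j-1, n) + n / (j-2) * C(j-2, n), with C(1, n) = 1.
    """
    if n == 0 or k == 1:
        return 0.0
    # C(2, n) by direct summation over the count h of the first category.
    c2 = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    c_prev, c_curr = 1.0, c2  # C(1, n) and C(2, n)
    for j in range(3, k + 1):
        c_prev, c_curr = c_curr, c_curr + n / (j - 2) * c_prev
    return math.log(c_curr)


def nml_code_length(symbols, k: int) -> float:
    """NML code length in nats: maximum-likelihood fit plus regret."""
    n = len(symbols)
    counts = Counter(symbols)
    fit = -sum(c * math.log(c / n) for c in counts.values())
    return fit + multinomial_regret(n, k)


def split_cost(symbols, cut: int, k: int) -> float:
    """Cost of coding two segments separately.

    Hypothetical simplification: the cost of encoding the cut position
    itself is omitted.
    """
    return nml_code_length(symbols[:cut], k) + nml_code_length(symbols[cut:], k)


# Toy data: one sequence with a clear state change, one without.
changing = [0] * 20 + [1] * 20
stable = [0, 1] * 20

for name, seq in [("changing", changing), ("stable", stable)]:
    whole = nml_code_length(seq, k=2)
    split = split_cost(seq, cut=20, k=2)
    print(f"{name}: one segment = {whole:.2f} nats, two segments = {split:.2f} nats")
```

On the sequence with a state change, coding the two halves separately is cheaper despite paying the regret twice. On the sequence without one, the extra regret outweighs any gain in fit, so a single segment is preferred. This is the trade-off between fit and number of segments in its simplest form.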

Results

The project is ongoing and no results have been published yet.
