MSE Research Project Database

Robust datamining for medicine

Project Leader: Uwe Aickelin
Staff: Pauline Lin
Primary Contact: Uwe Aickelin
Keywords: artificial intelligence; data mining; health and bioinformatics; machine learning; optimisation
Disciplines: Computing and Information Systems

Our goal is to develop datamining methodologies for medical problems that are robust to problems in the data. By robust we mean that solutions remain optimal without having to re-solve the problem or (re-)manipulate the underlying raw data. In a medical context, we may want to cluster patients so that the correct course of treatment is prescribed. In this example, it is unhelpful if a patient is assigned to different clusters because of immaterial changes in the data.
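One way to make the clustering example concrete is to measure how often cluster assignments change when the data is perturbed slightly. The sketch below is an illustration only, not the project's method: it assumes a single numeric feature per patient, uses a plain 1-D k-means written for this example, and the function names (kmeans_1d, assignment_instability), the noise level and the sample values are all hypothetical.

```python
import random


def kmeans_1d(values, k=2, iters=50):
    """Plain 1-D k-means. Initial centres are taken at evenly spaced
    positions in the sorted data, so the result is deterministic and
    cluster 0 always holds the lowest values."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    def assign(vals):
        # Each value goes to the nearest centre.
        return [min(range(k), key=lambda j: abs(v - centres[j])) for v in vals]

    for _ in range(iters):
        labels = assign(values)
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = sum(members) / len(members)
    return assign(values)


def assignment_instability(values, noise=0.01, trials=20, seed=0):
    """Fraction of (patient, trial) pairs whose cluster label differs
    from the label obtained on the unperturbed data."""
    rng = random.Random(seed)
    base = kmeans_1d(values)
    changed = 0
    for _ in range(trials):
        perturbed = [v + rng.uniform(-noise, noise) for v in values]
        labels = kmeans_1d(perturbed)
        changed += sum(a != b for a, b in zip(base, labels))
    return changed / (trials * len(values))


# Made-up measurements (not patient data): two well-separated groups.
well_separated = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
# The same kind of data with one value lying between the groups.
borderline = [1.0, 1.1, 0.9, 3.0, 5.0, 5.1, 4.9]

print(assignment_instability(well_separated))
print(assignment_instability(borderline, noise=0.2))
```

An instability of zero means no assignment changed under the simulated perturbations; larger values indicate that some patients move between clusters for reasons unrelated to their condition. Any real assessment would need a perturbation model that reflects the actual sources of noise in the medical data.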

Also of relevance here is the concept of uncertainty. Conventional data mining and statistical techniques work best for data that is certain. However, in the real world, uncertain data arises for many reasons, such as imprecise measurement systems, natural variations, missing observations, human nature and linguistic expression. Such uncertain data is found across many problem domains and is particularly prevalent in medicine. Uncertainty is not always bad. Sometimes an uncertain measurement can be beneficial as it captures the ‘gold standard’, e.g. a range of expert opinions. The key step is to model, quantify and handle uncertain data so as to improve the datamining, and hence the decision support capabilities, of the system.
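As a minimal sketch of what modelling and handling an uncertain measurement might look like, the example below keeps a range of expert opinions as an interval rather than collapsing it to a single number. The interval representation is only one of several possible models and is chosen here for illustration; the class IntervalValue, the helper functions and the numbers are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalValue:
    """An uncertain measurement stored as a closed interval [low, high]."""
    low: float
    high: float

    @property
    def width(self):
        # A simple way to quantify the uncertainty: the spread of opinions.
        return self.high - self.low


def from_opinions(opinions):
    """Model a set of expert opinions as the interval that spans them."""
    return IntervalValue(min(opinions), max(opinions))


def classify(value, threshold):
    """Compare an interval with a decision threshold.

    Returns 'above' or 'below' only when the whole interval lies on one
    side; otherwise the uncertainty is reported as 'undetermined'."""
    if value.low > threshold:
        return "above"
    if value.high < threshold:
        return "below"
    return "undetermined"


# Made-up scores from three hypothetical experts for one patient.
score = from_opinions([3.0, 4.0, 5.0])
print(score.width)
print(classify(score, threshold=2.0))
print(classify(score, threshold=4.0))
```

The design choice illustrated is that the uncertainty is carried through to the decision step instead of being discarded early: a threshold that falls inside the interval yields an explicit undetermined result rather than a potentially arbitrary answer.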

In the field of optimisation, robustness and uncertainty have previously been explored and there are a number of mature approaches, such as stochastic programming. In the field of datamining, this is a newer concept and only some basic approaches exist, such as robust Principal Component Analysis. The goal of this project is to develop new approaches and apply them to live medical data sets, such as electronic health care records, with the aim of making a genuine difference for patients and doctors.