# Robust datamining

**Project Leader:** Uwe Aickelin

**Primary Contact:** Uwe Aickelin (uwe.aickelin@unimelb.edu.au)

**Keywords:** artificial intelligence; data mining; health and bioinformatics; machine learning; optimisation

**Disciplines:** Computing and Information Systems

The research goal is to develop datamining methodologies that are robust to changes in the data. By robust, we mean that solutions remain ‘optimal’, or can easily be repaired, when the data change. The focus here is not on gradual drift, but on cases where data items lie on the borderline between classes or can easily be perturbed.
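To make ‘borderline’ concrete, here is one possible formalisation. It is an illustrative assumption, since the project has not yet fixed a definition. Take nearest-centroid clustering with centroids $\mu_1, \dots, \mu_K$ and distances $d_j(x) = \|x - \mu_j\|$, and suppose $x$ is assigned to cluster $k$. Its margin is the gap to the next-closest centroid. By the triangle inequality, $|d_j(x+\delta) - d_j(x)| \le \|\delta\|$, so a perturbation of size at most $\varepsilon$ shrinks every gap by at most $2\varepsilon$:

$$
m(x) = \min_{j \neq k} d_j(x) - d_k(x)
$$

$$
\|\delta\| \le \varepsilon \;\Rightarrow\; d_j(x+\delta) - d_k(x+\delta) \ge m(x) - 2\varepsilon \quad \text{for all } j \neq k .
$$

The assignment is therefore guaranteed to be stable whenever $m(x) > 2\varepsilon$. This is a sufficient condition, not a necessary one. Items with a small margin are exactly the easily perturbed, borderline items, and a margin requirement is one way to read ‘slack’.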

Broadly, the desired robustness can be achieved in two ways: first, by building ‘slack’ into the solution; or second, by constructing the solution so that it is easily repairable, e.g. so that failures are isolated. In a medical context, we want to cluster patients so that the correct course of treatment is prescribed. It would be unhelpful if the same patient were repeatedly assigned to different groups because of small updates to the data. In a transportation problem, a solution represents a complex distribution schedule. It would be unhelpful if the complete schedule had to be revised because one depot had unexpected problems.
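The patient-clustering concern can be measured empirically. The sketch below adds small Gaussian noise to each data item and counts how often its cluster assignment changes. The assumptions are ours. Centroids are held fixed so that the effect of borderline items is isolated, rather than re-running a full clustering algorithm. The function names `nearest` and `reassignment_rate` are hypothetical.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def reassignment_rate(points, centroids, noise, trials=100, seed=0):
    """Fraction of (point, trial) pairs whose cluster changes after adding
    Gaussian noise with standard deviation `noise` to every coordinate.

    Assumption: centroids stay fixed; only the items are perturbed.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    base = [nearest(p, centroids) for p in points]
    changed = 0
    for _ in range(trials):
        for p, b in zip(points, base):
            perturbed = [x + rng.gauss(0.0, noise) for x in p]
            if nearest(perturbed, centroids) != b:
                changed += 1
    return changed / (trials * len(points))
```

With centroids at `(0, 0)` and `(4, 0)`, the items `(0.1, 0.2)` and `(3.9, -0.1)` essentially never change cluster at `noise=0.1`. Adding the borderline item `(2.05, 0.0)` produces a non-zero rate, and that item accounts for essentially all of the reassignments. Repeating this for increasing `noise` gives a simple stability curve for a data set.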

Robustness has previously been explored in the field of optimisation, where mature approaches such as stochastic programming exist. In datamining it is a newer concept, and only a few basic approaches exist, such as robust Principal Component Analysis.
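For reference, one widely used formulation of robust PCA is Principal Component Pursuit. It is shown here as a representative variant; the project description does not commit to a specific one. It splits a data matrix $M \in \mathbb{R}^{n_1 \times n_2}$ into a low-rank part $L$ and a sparse part $S$. A few grossly perturbed entries are then absorbed by $S$ rather than distorting the principal structure captured by $L$:

$$
\min_{L,\,S}\; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M, \qquad \lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}
$$

Here $\|L\|_*$ is the nuclear norm (the sum of the singular values of $L$), and $\|S\|_1$ is the sum of the absolute values of the entries of $S$. The value of $\lambda$ shown is the commonly used default weight.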

This project has five steps:

1. Confirm that changing behaviour exists in data sets.
2. Obtain suitable definitions of ‘robustness’ (or ‘slack’, etc.).
3. Implement an established Operational Research method (e.g. stochastic programming) to address the problem.
4. Compare these results with a basic robust datamining method (e.g. ‘Robust Principal Component Analysis’).
5. Implement a new, advanced method.
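As an illustration of the kind of Operational Research method named in step 3, here is a minimal two-stage stochastic program with recourse, loosely themed on the depot example above. The scenario is entirely hypothetical: trucks are reserved at a depot in advance at one unit cost, demand is revealed afterwards, and any shortfall is covered by short-notice hire at a higher cost. The probabilities, costs, and function names are all invented for illustration. Exhaustive enumeration over integer decisions stands in for the linear-programming solver a real study would use.

```python
from typing import List, Tuple

Scenario = Tuple[float, int]  # (probability, demand); probabilities should sum to 1

def expected_cost(x: int, scenarios: List[Scenario],
                  reserve_cost: float, recourse_cost: float) -> float:
    """First-stage cost of reserving x trucks plus the probability-weighted
    second-stage (recourse) cost of hiring trucks for any shortfall."""
    recourse = sum(p * recourse_cost * max(demand - x, 0) for p, demand in scenarios)
    return reserve_cost * x + recourse

def solve_two_stage(scenarios: List[Scenario], reserve_cost: float,
                    recourse_cost: float) -> Tuple[int, float]:
    """Enumerate x = 0..max demand and return the cheapest decision and its cost."""
    max_demand = max(d for _, d in scenarios)
    best = min(range(max_demand + 1),
               key=lambda x: expected_cost(x, scenarios, reserve_cost, recourse_cost))
    return best, expected_cost(best, scenarios, reserve_cost, recourse_cost)
```

With `scenarios = [(0.3, 10), (0.5, 14), (0.2, 20)]`, a reserve cost of `2.0` and a recourse cost of `5.0`, `solve_two_stage` reserves 14 trucks at an expected cost of about 34. This decision deliberately carries ‘slack’ for the most likely scenarios while accepting some recourse cost in the worst one.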