Project Leader: Uwe Aickelin
Staff: Pauline Lin
Primary Contact: Uwe Aickelin (firstname.lastname@example.org)
Keywords: artificial intelligence; bioinformatics; data mining; machine learning; optimisation
Disciplines: Computing and Information Systems
“Anti-learning” is a new concept in Machine Learning. It challenges a foundational viewpoint that “nearby things tend to have the same label”. This intuition is instantiated in SVMs, nearest neighbour classifiers, decision trees, and neural networks. However, it turns out there are natural problems where this intuition is incorrect.
This violation of the proximity intuition means that, when the number of training examples is small, negating a classifier that attempts to exploit proximity can provide predictive power (hence the term “anti-learning”).
Consider the following examples:
- Consider the XOR problem with four instances. If you perform 4-fold cross-validation (here equivalent to leave-one-out) with a linear SVM classifier, you are guaranteed to misclassify every held-out instance.
- Assume you do not perform stratified cross-validation, and that your learning algorithm ignores all attributes and simply predicts the majority class of the training set. If your data are 50% white and 50% black instances, then any training/test split in which the training set contains more than 50% white instances leaves a test set containing fewer than 50% white instances, so the majority-class prediction is wrong on the majority of the test set.
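The XOR example above can be reproduced directly. The bullet names a linear SVM; the dependency-free sketch below instead uses 1-nearest-neighbour, another proximity-based classifier mentioned in the text, which exhibits the same guaranteed failure under leave-one-out (i.e. 4-fold) cross-validation on these four points.

```python
# Leave-one-out cross-validation on XOR with a 1-nearest-neighbour classifier.
# Every held-out point's nearest neighbours carry the opposite label, so the
# proximity-based prediction is wrong on all four folds.

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR instances

def nn_predict(train, x):
    """Predict the label of x with the 1-nearest-neighbour rule."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda pair: sq_dist(pair[0], x))[1]

errors = 0
for i, (x, label) in enumerate(data):
    train = data[:i] + data[i + 1:]  # hold out one instance per fold
    if nn_predict(train, x) != label:
        errors += 1

print(errors)  # all 4 held-out points are misclassified -> 4
```

Negating this classifier (predicting the opposite of the 1-NN vote) would therefore achieve perfect leave-one-out accuracy on XOR, which is exactly the anti-learning effect described above.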
This research project will investigate anti-learning further and aims to produce algorithms that can exploit the phenomenon, ideally such that a data set can be partitioned into ‘learnable’, ‘anti-learnable’ and noise components.