MSE Research Project Database

Accurate and efficient data integration

Project Leader: Ben Rubinstein
Staff: Shen Wang
Student: Neil Marchant
Sponsors: Australian Research Council, Australian Bureau of Statistics
Primary Contact: Ben Rubinstein (
Keywords: database systems; machine learning
Disciplines: Computing and Information Systems
Domains: Networks and data in society

This project explores the frontiers of data integration (also known as record linkage), which seeks to match records representing the same entity across multiple heterogeneous, noisy datasets. A highly practical problem first studied for national Census data, data integration is now used across numerous sectors, including technology, medicine, finance and government. Working in the database community (publishing in venues such as VLDB), we bring a machine learning and mathematical statistics perspective to the area, seeking scalable algorithms with guarantees on data efficiency. This project connects with another project on differential privacy, as it also considers data privacy in collaboration with security colleagues in the school.
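To make the matching problem concrete, the following is a minimal sketch of a naive record linkage baseline. It is not the project's method. It assumes each record is just a name string, uses Python's standard `difflib.SequenceMatcher` as the similarity measure, and applies a hypothetical fixed threshold of 0.85. The helper names (`normalise`, `similarity`, `link`) and the toy `census`/`survey` lists are illustrative only.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and collapse runs of whitespace so trivial formatting noise is ignored."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalised records."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link(left, right, threshold=0.85):
    """Return (i, j) index pairs whose records score at or above the threshold.

    Naive all-pairs comparison: the cost grows as len(left) * len(right),
    and the threshold is a hand-picked assumption rather than a learned value.
    """
    matches = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            if similarity(a, b) >= threshold:
                matches.append((i, j))
    return matches


# Toy, made-up records standing in for two noisy sources describing the same people.
census = ["Jane  Smith", "Robert Jones", "Ana Lopez"]
survey = ["jane smith", "Bob Jones", "Anna Lopez"]

print(link(census, survey))  # prints [(0, 0), (2, 2)]
```

Note that "Robert Jones" and "Bob Jones" (similarity of roughly 0.76) are not linked even though a person would likely treat them as the same individual, and that the all-pairs loop would not scale to Census-sized datasets. Both limitations illustrate why the area calls for scalable algorithms with statistical guarantees rather than hand-tuned string thresholds.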
