
Towards parameter-free blocking for scalable record linkage

No. TR-CS-07-03. (August 2007)


rgayler's tags for this article

credit_scoring linkage

Abstract

Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not available otherwise, or that would be too expensive to collect. A main challenge when linking large databases is the complexity of the linkage process: potentially each record in one database has to be compared with all records in the other database. Various techniques, collectively known as 'blocking', have been developed to deal with this quadratic complexity. Most of these techniques require several parameters to be set by the user in order to achieve good results. In this paper we evaluate six blocking techniques within a common framework with regard to the number and quality of the candidate record pairs generated. We propose a modification to two existing techniques that reduces the variance in the quality of the blocking results over a range of parameter values, enabling more robust, practical record linkage without the need for time-consuming manual parameter tuning.
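
To make the quadratic cost mentioned in the abstract concrete, the following is a minimal sketch of simple key-based blocking in Python. It is an illustration only and is not taken from the paper: the abstract does not describe the six evaluated techniques, and the toy records, the `first_letter` blocking key, and the function names `all_pairs` and `blocked_pairs` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import product


def all_pairs(db_a, db_b):
    """Pair every record in db_a with every record in db_b (quadratic)."""
    return list(product(db_a, db_b))


def blocked_pairs(db_a, db_b, key):
    """Return only the pairs whose records share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in db_b:
        blocks[key(rec)].append(rec)
    # .get() avoids creating empty blocks for keys that occur only in db_a
    return [(a, b) for a in db_a for b in blocks.get(key(a), [])]


def first_letter(rec):
    """Hypothetical blocking key: first letter of the surname."""
    return rec[0][0]


# Hypothetical toy records: (surname, given name)
db_a = [("smith", "anna"), ("smyth", "anna"), ("jones", "bob")]
db_b = [("smith", "ana"), ("jones", "robert"), ("brown", "carl"), ("smithe", "ann")]

full = all_pairs(db_a, db_b)
candidates = blocked_pairs(db_a, db_b, first_letter)
print(len(full), len(candidates))  # 12 candidate pairs without blocking, 5 with
```

In this sketch the blocking key plays the role of a user-set parameter: a stricter key (for example, the first three letters of the surname) would generate fewer candidate pairs but would no longer pair "smyth" with "smith".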


