Approximate personal name-matching through finite-state graphs

Journal of the American Society for Information Science and Technology, Vol. 58, No. 13. (2007), pp. 1960-1976.

RafG's tags for this article

evaluation precision recall

Abstract

This article shows how finite-state methods can be employed in a new and different task: the conflation of personal name variants in standard forms. In bibliographic databases and citation index systems, variant forms create problems of inaccuracy that affect information retrieval, the quality of information from databases, and the citation statistics used for the evaluation of scientists' work. A number of approximate string matching techniques have been developed to validate variant forms, based on similarity and equivalence relations. We classify the personal name variants as nonvalid and valid forms. In establishing an equivalence relation between valid variants and the standard form of its equivalence class, we defend the application of finite-state transducers. The process of variant identification requires the elaboration of: (a) binary matrices and (b) finite-state graphs. This procedure was tested on samples of author names from bibliographic records, selected from the Library and Information Science Abstracts and Science Citation Index Expanded databases. The evaluation involved calculating the measures of precision and recall, based on completeness and accuracy. The results demonstrate the usefulness of this approach, although it should be complemented with methods based on similarity relations for the recognition of spelling variants and misspellings.
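
The evaluation step mentioned in the abstract can be illustrated with a minimal sketch of how precision and recall might be computed for a name-conflation procedure. The abstract does not give the article's exact definitions or data, so the sketch assumes the standard set-based formulas over (variant, standard form) pairs. The function name `precision_recall` and all author names below are hypothetical and are not taken from the article.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (name variant, standard form it is conflated to)


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted conflations against a gold standard.

    Assumption: standard set-based definitions, which may differ from the
    exact measures used in the article.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard: each variant paired with its correct standard form.
gold = {
    ("Smith, J.", "Smith, John"),
    ("Smith, John A.", "Smith, John"),
    ("J. Smith", "Smith, John"),
    ("Garcia, M.", "Garcia, Maria"),
}

# Hypothetical system output: two correct pairs, one wrong pair, one variant missed.
predicted = {
    ("Smith, J.", "Smith, John"),
    ("Smith, John A.", "Smith, John"),
    ("Garcia, M.", "Garcia, Marta"),  # conflated to the wrong standard form
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case precision reflects how many of the proposed conflations are correct (accuracy), while recall reflects how many of the known variants were recovered (completeness).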

