
Topic model methods for automatically identifying out-of-scope resources

In JCDL '09: Proceedings of the 2009 joint international conference on Digital libraries (2009), pp. 19-28.


AlisonBabeu's tags for this article

digital_libraries topic-modeling


Abstract

Recent years have seen the rise of subject-themed digital libraries, such as the NSDL pathways and the Digital Library for Earth System Education (DLESE). These libraries often need to manually verify that contributed resources cover topics that fit within the theme of the library. We show that such scope judgments can be automated using a combination of text classification techniques and topic modeling. Our models address two significant challenges in making scope judgments: only a small number of out-of-scope resources are typically available, and the topic distinctions required for digital libraries are much more subtle than classic text classification problems. To meet these challenges, our models combine support vector machine learners optimized to different performance metrics and semantic topics induced by unsupervised statistical topic models. Our best model is able to distinguish resources that belong in DLESE from resources that don't with an accuracy of around 70%. We see these models as the first steps towards increasing the scalability of digital libraries and dramatically reducing the workload required to maintain them.
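The abstract mentions two points that a small example may make concrete: out-of-scope resources are rare, and the learners are optimized to different performance metrics. The sketch below is not the authors' method or data. It uses hypothetical labels (95 in-scope, 5 out-of-scope) and two hypothetical classifiers to show why accuracy alone can mislead on such imbalanced collections, while F1 on the out-of-scope class does not. All function names and counts are assumptions made for illustration.

```python
# Illustrative only: hypothetical labels, not results from the paper.

def accuracy(y_true, y_pred):
    """Fraction of resources whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 (harmonic mean of precision and recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical collection: 95 in-scope resources followed by 5 out-of-scope ones.
y_true = ["in"] * 95 + ["out"] * 5

# Classifier A never flags anything as out-of-scope.
always_in = ["in"] * 100

# Classifier B raises 10 false alarms but catches 4 of the 5 out-of-scope items.
flagging = ["out"] * 10 + ["in"] * 85 + ["out"] * 4 + ["in"] * 1

for name, y_pred in [("always_in", always_in), ("flagging", flagging)]:
    acc = accuracy(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, positive="out")
    print(f"{name}: accuracy={acc:.2f}, F1(out)={f1:.2f}")
```

In this made-up setting, the classifier that never flags anything has the higher accuracy yet finds no out-of-scope resources, which is one plausible reason to train learners against more than one metric.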



