Automatic Extraction of Useful Facet Hierarchies from Text Databases

by: Wisam Dakka, Panagiotis G. Ipeirotis
In ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (25 April 2008), pp. 466-475, doi:10.1109/icde.2008.4497455  Key: citeulike:2846006

Abstract

Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a new, powerful paradigm that has proved to be a successful complement to searching. Thus far, the identification of the facets was either a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for automatic extraction of facets useful for browsing text databases. In particular, we observe through a pilot study that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. For this, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources, such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies, using the Amazon Mechanical Turk service, show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.
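
The three steps named in the abstract (find important phrases, expand them with context terms, compare term distributions) can be illustrated with a toy sketch. This is not the authors' implementation. The CONTEXT dictionary is a hypothetical stand-in for an external resource such as WordNet or Wikipedia, the phrase extractor is a plain substring match, and the comparison uses a simple document-frequency difference with an assumed threshold (min_shift), which may differ from the statistic used in the paper.

```python
import re
from collections import Counter

# Hypothetical stand-in for an external resource (e.g. WordNet or Wikipedia):
# maps an important phrase to broader "context" terms.
CONTEXT = {
    "jacques chirac": ["politician", "france"],
    "angela merkel": ["politician", "germany"],
    "paris": ["city", "france"],
}

def important_phrases(doc, context):
    """Toy phrase extractor: known phrases found in the text by substring match."""
    text = doc.lower()
    return [phrase for phrase in context if phrase in text]

def original_terms(doc, context):
    """Terms of a document in the original database: word tokens plus phrases."""
    tokens = set(re.findall(r"[a-z]+", doc.lower()))
    return tokens | set(important_phrases(doc, context))

def expanded_terms(doc, context):
    """Terms of a document in the expanded database: original plus context terms."""
    terms = original_terms(doc, context)
    for phrase in important_phrases(doc, context):
        terms.update(context[phrase])
    return terms

def document_frequency(term_sets):
    """Count in how many documents each term occurs."""
    df = Counter()
    for terms in term_sets:
        df.update(terms)
    return df

def candidate_facet_terms(docs, context, min_shift=2):
    """Rank terms by how much their document frequency grows after expansion.

    min_shift is an assumed cutoff chosen for illustration only.
    """
    df_orig = document_frequency(original_terms(d, context) for d in docs)
    df_exp = document_frequency(expanded_terms(d, context) for d in docs)
    shifts = {term: df_exp[term] - df_orig[term] for term in df_exp}
    ranked = sorted(shifts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, shift) for term, shift in ranked if shift >= min_shift]

docs = [
    "Jacques Chirac met Angela Merkel in Paris.",
    "Angela Merkel addressed the parliament.",
    "Jacques Chirac visited a farm fair.",
]
print(candidate_facet_terms(docs, CONTEXT))
```

In this toy collection, none of the context terms occurs in the raw text, which mirrors the pilot-study observation that facet terms rarely appear in the documents themselves. Terms whose frequency rises most after expansion are the candidates for browsing facets.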

