Learning to Generate Labels for Organizing Search Results from a Domain-Specified Corpus

In WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (2006), pp. 390-396.


AlisonBabeu's tags for this article

cluster-analysis clustering web-searching


Abstract

Organizing Web search results into labeled categories is a difficult but very useful task. The idea is to group the many results that each user query generates into well-labeled categories so that users can browse them much more easily. In the past, clustering-based methods have been applied to the search-result organization problem, but it has been difficult to extract human-readable descriptions for the resulting clusters. An alternative solution is to first generate a series of labels from the search results and then assign documents to the relevant labels to form labeled categories. In this approach, the major task is generating the labels for the documents. In this paper, we propose a novel label generation method. First, we extract phrases from the search results as label candidates and use a binary classifier as our learning model to classify each candidate as either a useful or a meaningless label. Then, the candidates classified as useful form the final results. Because our method is applied to search results retrieved from a domain-specified corpus rather than a general corpus, the labels have some special features that can be used for classification. Experimental results show that the accuracy of our system is nearly 10% higher than that of label selection using the mutual information criterion, an unsupervised method for this problem.
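
The two-stage pipeline described in the abstract (extract candidate phrases, then classify each one as useful or meaningless) can be sketched roughly as follows. The abstract does not specify the extraction rule, the features, or the classifier, so everything here is an assumption made for illustration: the n-gram extractor, the small stopword list, the two features (document frequency ratio and phrase length), and the perceptron are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter

# Hypothetical stopword list; the paper's actual filtering rules are not given.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on", "is"}


def extract_candidates(snippets, max_len=2):
    """Count, for each candidate phrase, how many snippets contain it."""
    df = Counter()
    for text in snippets:
        tokens = text.lower().split()
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip phrases that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                seen.add(" ".join(gram))
        df.update(seen)  # each phrase counted once per snippet
    return df


def features(phrase, df, n_docs):
    """Assumed features: document frequency ratio, phrase length, bias term."""
    return [df[phrase] / n_docs, float(len(phrase.split())), 1.0]


def score(x, weights):
    return sum(w * v for w, v in zip(weights, x))


def train_perceptron(examples, epochs=20):
    """Train a binary perceptron on (feature_vector, label) pairs, label in {0, 1}."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if score(x, weights) > 0 else 0
            if pred != y:
                delta = y - pred  # +1 or -1
                weights = [w + delta * v for w, v in zip(weights, x)]
    return weights


def is_useful(x, weights):
    """Classify a candidate's feature vector as useful (True) or meaningless (False)."""
    return score(x, weights) > 0
```

In use, one would call `extract_candidates` on the retrieved snippets, build a feature vector for each phrase with `features`, train on a set of hand-labeled candidates with `train_perceptron`, and keep the phrases for which `is_useful` returns true as the final labels.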


