TopicRank: Bringing Insight to Users
Traditional search engines based on exact keyword matching return too many documents in response to a user query, and most of the returned documents are irrelevant. Very often the user simply does not know how to formulate a query that expresses the intended meaning. Furthermore, as observed by the authors of Scatter/Gather, users may not only search for documents but also browse the collection to discover the general information content of the corpus. In addition, users generally reject complex interfaces for formulating advanced queries and demand fast response times.

To overcome these problems, major search engines go to considerable lengths to provide the user with an intuitive interface to:
- help formulate a query that represents the user's intent
- browse long lists of documents
- discover related topics

Several methods based on Document Clustering [1, 2, 8], Faceted Categories, or, more recently, Tag Clouds [3, 5, 7], introduced by the blog community, are used to satisfy these needs. Google Labs Suggestion, Yahoo! Search Assistant, and Clusty remix clustering are examples of this kind of interface.

In this paper, we describe TopicRank, a Word Clustering based approach that automatically and dynamically generates an interactive Tag Cloud related to the user query, in which the layout of the presented keywords relies on a semantic closeness metric. In contrast to prior work, we found that Tag Clouds used in this way are both an efficient navigational tool and a good tool for understanding abstract information.
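To make the idea of a query-dependent Tag Cloud with a closeness-driven layout more concrete, the following is a minimal sketch and not the TopicRank implementation, whose details are not given in this section. It assumes whitespace tokenization, document frequency as the tag weight, and Jaccard overlap of the documents containing two terms as a stand-in for the semantic closeness metric. The names `tokenize`, `tag_cloud`, and `STOPWORDS` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def tag_cloud(docs, query, top_k=5):
    """Return [(term, weight), ...] ordered so that neighbouring terms are close.

    Assumptions: weight is the number of matching documents containing the
    term; closeness is the Jaccard overlap of the terms' document sets.
    """
    q = query.lower()
    # Keep only the documents that contain the query term.
    hits = [set(toks) for toks in map(tokenize, docs) if q in toks]

    # Weight each co-occurring term by its document frequency among the hits.
    df = Counter(t for doc in hits for t in doc if t != q)
    tags = [t for t, _ in sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]

    # For each tag, the set of matching documents in which it appears.
    postings = {t: {i for i, doc in enumerate(hits) if t in doc} for t in tags}

    def closeness(a, b):
        union = postings[a] | postings[b]
        return len(postings[a] & postings[b]) / len(union) if union else 0.0

    # Greedy layout: start from the heaviest tag, then repeatedly place the
    # remaining tag that is closest to the one placed last.
    layout, remaining = [], list(tags)
    while remaining:
        if layout:
            nxt = max(remaining, key=lambda t: closeness(layout[-1], t))
        else:
            nxt = remaining[0]
        layout.append(nxt)
        remaining.remove(nxt)
    return [(t, df[t]) for t in layout]


if __name__ == "__main__":
    corpus = [
        "jaguar car engine",
        "jaguar car engine speed",
        "jaguar cat jungle",
        "jaguar cat jungle prey",
        "python snake jungle",
    ]
    print(tag_cloud(corpus, "jaguar", top_k=4))
```

In this toy corpus the terms tied to each sense of the query end up adjacent in the layout, which is the effect a closeness-based arrangement is meant to achieve. A real system would need a proper tokenizer, a better-founded closeness measure, and a two-dimensional layout.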