CiteULike is a free online bibliography manager. Register and you can start organising your references online.
Tags

A Statistical Model for Topic Segmentation and Clustering

by: Shafiei, EvangelosE Milios

edited by: Sabine Bergler

In Advances in Artificial Intelligence, Vol. 5032 (2008), pp. 283-295, doi:10.1007/978-3-540-68825-9_27  Key: citeulike:11863352

Formatted Citation


Show HTML

Likes (beta)

This copy of the article hasn't been liked by anyone yet.

View FullText article


Abstract

This paper presents a statistical model for discovering topical clusters of words in unstructured text. The model uses a hierarchical Bayesian structure and it is also able to identify segments of text which are topically coherent. The model is able to assign each segment to a particular topic and thus categorizes the corresponding document to potentially multiple topics. We present some initial results indicating that the word topics discovered by the proposed model are more consistent compared to other models. Our early experiments show that our model clustering performance compares well with other clustering models on a real text corpus, which do not provide topic segmentation. Segmentation performance of our model is also comparable to a recently proposed segmentation model which does not provide document clustering.


hamidnourashraf's tags for this article

Citations (CiTO)

No CiTO relationships defined

X There are no reviews yet

X Posting History


X Export records

Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.