Discovery of Novel Term Associations in a Document Collection
In: Bisociative Knowledge Discovery

by: Teemu Hynönen, Sébastien Mahler, Hannu Toivonen

edited by: Michael R. Berthold

Vol. 7250 (2012), pp. 91-103, doi:10.1007/978-3-642-31830-6_7  Key: citeulike:11476729

Abstract

We propose a method to mine novel, document-specific associations between terms in a collection of unstructured documents. We believe that documents are often best described by the relationships they establish. This is also evidenced by the popularity of conceptual maps, mind maps, and similar methodologies for organizing and summarizing information. Our goal is to discover term relationships that can be used to construct conceptual maps, or so-called BisoNets. The model we propose, tpf–idf–tpu, looks for pairs of terms that are associated in an individual document. It considers three aspects, two of which have been generalized from tf–idf to term pairs: term pair frequency (tpf; importance for the document), inverse document frequency (idf; uniqueness in the collection), and term pair uncorrelation (tpu; independence of the terms). The last component is needed to filter out statistically dependent pairs that the user is unlikely to consider novel or interesting. We present experimental results on two collections of documents: one extracted from Wikipedia, and one containing text mining articles with manually assigned term associations. The results indicate that the tpf–idf–tpu method can discover novel associations, that these associations differ from simply pairing tf–idf keywords, and that they better match a reader's subjective associations.
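
The abstract names the three components of tpf–idf–tpu but does not give their formulas. The sketch below is only an illustration of how three such factors could be combined, not the authors' definitions. It assumes that tpf is the fraction of a document's sentences in which both terms co-occur, that idf is the log of the collection size over the number of documents containing the pair, and that tpu is one minus the Jaccard overlap of the two terms' document sets. The function name `score_pairs`, the product combination, and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def score_pairs(docs):
    """Score term pairs per document.

    docs: list of documents; each document is a list of sentences;
    each sentence is a list of terms.
    Returns {doc_index: {(term_a, term_b): score}}.
    All three factors are illustrative stand-ins, not the paper's formulas.
    """
    n_docs = len(docs)
    # Set of terms appearing anywhere in each document.
    doc_terms = [{t for sent in doc for t in sent} for doc in docs]
    # Per document: number of sentences in which each pair co-occurs.
    doc_pairs = []
    for doc in docs:
        counts = Counter()
        for sent in doc:
            for pair in combinations(sorted(set(sent)), 2):
                counts[pair] += 1
        doc_pairs.append(counts)
    # Document frequencies of single terms and of pairs.
    term_df = Counter(t for terms in doc_terms for t in terms)
    pair_df = Counter(p for counts in doc_pairs for p in counts)

    scores = {}
    for i, counts in enumerate(doc_pairs):
        n_sent = max(1, len(docs[i]))
        scores[i] = {}
        for (a, b), k in counts.items():
            tpf = k / n_sent                        # assumed: share of sentences
            idf = math.log(n_docs / pair_df[(a, b)])  # assumed: pair-level idf
            both = sum(1 for terms in doc_terms if a in terms and b in terms)
            either = term_df[a] + term_df[b] - both
            tpu = 1.0 - both / either               # assumed: 1 - Jaccard overlap
            scores[i][(a, b)] = tpf * idf * tpu
    return scores


# Toy collection (hypothetical data, for illustration only).
docs = [
    [["graph", "mining"], ["graph", "music"]],
    [["graph", "mining"]],
    [["music", "jazz"]],
]
scores = score_pairs(docs)
for pair, value in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 4))
```

In this toy collection, "graph" and "mining" always appear in the same documents, so the stand-in tpu factor is zero and the pair is filtered out. The pair "graph" and "music" co-occurs only in the first document while each term also appears elsewhere, so it receives a positive score there.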

