CiteULike is a free online bibliography manager. Register and you can start organising your references online.

A New Approach to Data Driven Clustering Export

In ICML '06: 23rd International Conference on Machine Learning (25 June 2006)

Citation Format

[Posts]

View FullText article


ldietz's tags for this article

clustering community

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

We consider the problem of clustering in its most basic form where only a local metric on the data space is given. No parametric statistical model is assumed, and the number of clusters is learned from the data. We introduce, analyze and demonstrate a novel approach to clustering where data points are viewed as nodes of a graph, and pairwise similarities are used to derive a transition probability matrix P for a Markov random walk between them. The algorithm automatically reveals structure at increasing scales by varying the number of steps taken by this random walk. Points are represented as rows of Pt, which are the t-step distributions of the walk starting at that point; these distributions are then clustered using a KL-minimizing iterative algorithm. Both the number of clusters, and the number of steps that ‘best reveal’ it, are found by optimizing spectral properties of P.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.