CiteULike is a free online bibliography manager. Register and you can start organising your references online.

CUR matrix decompositions for improved data analysis Export

Proceedings of the National Academy of Sciences, Vol. 106, No. 3. (20 January 2009), pp. 697-702.

Citation Format

[Posts]

View FullText article


jzujovic's tags for this article

approximation decomposition matrix onur2009 svd

X Reviews [Write a review of this article]

X Notes for this article

jzujovic has 1 private note and 0 public notes for this article. If you are jzujovic then you can log in to see the private note.

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

10.1073/pnas.0803205106 Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high “statistical leverage” and, thus, in a very precise statistical sense, exert a disproportionately large “influence” on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.