High-dimensional Data Analysis: From Optimal Metrics to Feature Selection

(06 May 2008)


sdvillal's tags for this article

concentration-of-measure high-dimensionality kernel-machines locality local-learning nearest-neighbors overfitting


Abstract

High-dimensional data are everywhere: texts, sounds, spectra, images, etc. However, many data analysis tools (coming from statistics, artificial intelligence, etc.) were designed for low-dimensional data. Many of the assumptions behind data analysis tools are not transposable to high-dimensional data. For instance, the Euclidean distance concentrates in high-dimensional spaces; all distances seem identical! It furthermore does not distinguish between relevant and irrelevant features. In Part One of the book, the phenomenon of the concentration of the distances is considered, and its consequences on data analysis tools are studied. Part Two focuses on the problem of feature selection in the case of a large number of initial features. Most of the concepts studied and presented in this thesis are illustrated on chemometric data, and more particularly on spectral data, with the objective of inferring a physical or chemical property of a material by analysing the spectrum of the light it reflects.
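
The concentration of the Euclidean distance mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the thesis: it assumes points drawn uniformly from the unit cube, and the function name `relative_contrast` and its parameters are hypothetical choices made for illustration. It measures the relative contrast (d_max - d_min) / d_min between the farthest and nearest of a set of random points as seen from a random query point; under these assumptions the value should shrink as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly from
    the unit cube [0, 1]^dim (an assumption made for this illustration)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same value
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensions to see the trend.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

A small relative contrast means the nearest and farthest neighbours lie at almost the same distance, which is the sense in which "all distances seem identical" for tools such as nearest-neighbour search.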



