Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

Proceedings of the National Academy of Sciences, Vol. 105, No. 39. (30 September 2008), pp. 14790-14795.

zufar's tags for this article

classification feature-selection

Abstract

DOI: 10.1073/pnas.0807471105

In important application fields today (genomics and proteomics are examples), selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For $i = 1, 2, \ldots, p$, let $\pi_i$ denote the two-sided P-value associated with the $i$th feature Z-score and $\pi_{(i)}$ denote the $i$th order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective $(i/p - \pi_{(i)}) / \sqrt{(i/p)(1 - i/p)}$. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
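The threshold rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the Z-scores are standard normal under the null, so the two-sided P-value is `erfc(|z| / sqrt(2))`. The function names `two_sided_p_value`, `hc_threshold`, and `select_features` are hypothetical, as is the `alpha0` parameter, which restricts the search to the smallest fraction of P-values; that restriction is an assumption added here and is not stated in the abstract.

```python
import math


def two_sided_p_value(z):
    """Two-sided P-value of a Z-score, assuming a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def hc_threshold(z_scores, alpha0=0.10):
    """Return (threshold, hc_max) for a list of feature Z-scores.

    The threshold is the absolute Z-score whose P-value maximizes
    (i/p - pi_(i)) / sqrt((i/p) * (1 - i/p)) over ranks i.
    alpha0 (an assumption of this sketch) limits the search to the
    smallest alpha0 fraction of P-values; use 1.0 to search all ranks.
    """
    p = len(z_scores)
    if p == 0:
        raise ValueError("z_scores must not be empty")

    # Sorting |Z| in decreasing order sorts the P-values in increasing order,
    # so position i-1 holds the i-th order statistic pi_(i).
    abs_z = sorted((abs(z) for z in z_scores), reverse=True)
    p_values = [two_sided_p_value(z) for z in abs_z]

    max_rank = max(1, int(alpha0 * p))
    best_hc, best_rank = float("-inf"), 1
    for i in range(1, max_rank + 1):
        frac = i / p
        if frac >= 1.0:
            break  # the denominator vanishes at i = p
        hc = (frac - p_values[i - 1]) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_hc, best_rank = hc, i

    return abs_z[best_rank - 1], best_hc


def select_features(z_scores, threshold):
    """Indices of features whose absolute Z-score reaches the threshold."""
    return [j for j, z in enumerate(z_scores) if abs(z) >= threshold]


# Toy input: one strong feature among nine null features.
toy_z = [4.0] + [0.0] * 9
toy_threshold, toy_hc = hc_threshold(toy_z, alpha0=1.0)
toy_selected = select_features(toy_z, toy_threshold)
```

On the toy input, the objective is positive only at rank 1, so the threshold equals the single large absolute Z-score and only feature 0 is selected. A classifier built on this rule would then use only the selected features.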

