Exploring Potential of Leave-One-Out Estimator for Calibration of SVM in Text Mining

Advances in Knowledge Discovery and Data Mining (2004), pp. 361-372.


sdvillal's tags for this article

calibration imbalanced kernel-machines multiclass text-classification

Abstract

This paper investigates a number of techniques for calibrating the output of a Support Vector Machine to provide a posterior probability P(target class | instance). Five basic calibration techniques are combined with five ways of correcting the SVM scores on the training set. The calibration techniques used are addition of a simple ramp function, allocation of a Gaussian density, fitting of a sigmoid to the output, and two binning techniques. The correction techniques include three methods that are based on recent theoretical advances in leave-one-out estimators and two that are variants of hold-out validation. This leads us to thirty different settings (including calibration on uncorrected scores). All thirty methods are evaluated for two linear SVMs (one with linear and one with quadratic penalty) and for the ridge regression model (regularisation network) on three categories of the Reuters Newswires benchmark and the WebKB dataset. The performance of these methods is compared with the probabilities generated by a naive Bayes classifier and by a calibrated centroid classifier. The main conclusions of this research are: (i) simple calibrators such as ramps and sigmoids perform remarkably well; (ii) score correctors using leave-one-out techniques can perform better than those using validation sets; however, cross-validation methods allow more reliable estimation of test error from the training data.
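
As a rough illustration of one of the calibrators named in the abstract (fitting a sigmoid to the classifier output), the sketch below maps raw scores to probabilities by fitting P(y=1 | s) = 1 / (1 + exp(-(a*s + b))) with plain gradient descent on the log loss. This is not the paper's procedure: the parameterisation, the optimiser, the function names `fit_sigmoid` and `calibrate`, and the toy scores and labels are all assumptions made for the example.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit P(y=1|s) = 1 / (1 + exp(-(a*s + b))) by gradient descent.

    scores: raw classifier outputs (e.g. SVM decision values).
    labels: 0/1 class labels for the same instances.
    Returns the fitted slope a and offset b.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            # Gradient of the log loss with respect to a and b.
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a probability using the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical, non-separable toy data: higher scores mostly mean class 1.
toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_sigmoid(toy_scores, toy_labels)
p_low = calibrate(-2.0, a, b)
p_high = calibrate(2.0, a, b)
print(f"a={a:.3f}, b={b:.3f}, P(-2.0)={p_low:.3f}, P(2.0)={p_high:.3f}")
```

In the setting the abstract describes, the scores fed to such a fit would be either the uncorrected training scores or scores corrected by a leave-one-out or hold-out scheme; the sketch does not model that correction step.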

