A Dynamic AdaBoost Algorithm With Adaptive Changes of Loss Function
AdaBoost improves the classification accuracy of a given learning algorithm by combining its hypotheses. Adaptivity, one of AdaBoost's significant advantages, drives it to maximize the smallest margin, which gives it good generalization ability. However, when the samples with large negative margins are noisy or atypical, the maximized margin is actually a “hard margin.” This adaptivity therefore makes AdaBoost sensitive to sampling fluctuations and prone to overfitting. Traditional schemes prevent overfitting by heavily damping the influence of samples with large negative margins. However, such samples are not always noisy or atypical, so these schemes may not be reasonable. To learn a classifier with high generalization performance while preventing overfitting, it is necessary to analyze the margins of the training samples statistically. Herein, the Hoeffding inequality is adopted as a statistical tool to divide the training samples into reliable samples and temporarily unreliable samples. A new boosting algorithm, named DAdaBoost, is introduced to treat the two groups separately. Because DAdaBoost adjusts its weighting scheme dynamically, its loss function is not fixed: it is a series of nonconvex functions that gradually approach the 0–1 function as the algorithm evolves. By defining a virtual classifier, the dynamically adjusted weighting scheme is unified into the DAdaBoost procedure, and an upper bound on the training error is derived. Experiments on both synthetic and real-world data indicate that DAdaBoost can effectively prevent AdaBoost from overfitting.