A novel tree kernel partial least squares for modeling the structure–activity relationship
Kernel partial least squares (KPLS), a nonlinear extension of linear PLS, has become a popular technique for regression and classification of complex data sets. In KPLS, training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in that feature space. In the present study, we develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel from decision tree ensembles. The tree kernel captures the similarities between samples and, through variable importance ranking during kernel construction, selects informative features. The same kernel also allows TKPLS to handle nonlinear relationships in structure–activity relationship data. Three data sets covering different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that the TKPLS algorithm can be regarded as a promising alternative classification technique. Copyright © 2013 John Wiley & Sons, Ltd.
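To make the idea of a kernel built from decision tree ensembles concrete, the following is a minimal sketch, not the TKPLS implementation. The abstract does not state how the kernel is derived from the ensemble, so the sketch assumes one common construction: the similarity of two samples is the fraction of trees in which they fall into the same leaf (a random-forest-style proximity). The function names (`gini`, `grow_tree`, `leaf_of`, `tree_kernel`), the parameters, and the toy data are all hypothetical. The variable importance ranking and the PLS step itself are not shown.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def grow_tree(X, y, rows, depth, rng, n_try):
    """Grow one classification tree on the sample indices in `rows`."""
    labels = [y[i] for i in rows]
    if depth == 0 or len(set(labels)) == 1:
        return {"leaf": True}
    best = None
    for f in rng.sample(range(len(X[0])), n_try):        # random feature subset
        for t in sorted({X[i][f] for i in rows})[:-1]:   # candidate thresholds
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant on these rows
        return {"leaf": True}
    _, f, t, left, right = best
    return {"leaf": False, "feature": f, "threshold": t,
            "left": grow_tree(X, y, left, depth - 1, rng, n_try),
            "right": grow_tree(X, y, right, depth - 1, rng, n_try)}

def leaf_of(tree, x):
    """Return an identifier of the leaf that sample `x` reaches."""
    node = tree
    while not node["leaf"]:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return id(node)  # unique among the leaves of one live tree

def tree_kernel(X, y, n_trees=50, depth=3, seed=0):
    """Assumed proximity kernel: K[i][j] is the fraction of trees in
    which samples i and j end up in the same leaf."""
    rng = random.Random(seed)
    n = len(X)
    n_try = max(1, int(len(X[0]) ** 0.5))
    K = [[0.0] * n for _ in range(n)]
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        tree = grow_tree(X, y, rows, depth, rng, n_try)
        leaves = [leaf_of(tree, x) for x in X]
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    K[i][j] += 1.0 / n_trees
    return K

# Toy data (made up for illustration): two descriptors, two activity classes.
X_demo = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
          [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
y_demo = [0, 0, 0, 1, 1, 1]
K_demo = tree_kernel(X_demo, y_demo)
for row in K_demo:
    print(" ".join(f"{v:.2f}" for v in row))
```

Under these assumptions the resulting matrix is symmetric with a unit diagonal, and it would take the place of a conventional kernel (for example, a Gaussian kernel) as the input to a KPLS routine after centering.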