Note on Naive Bayes Based on Binary Descriptors in Cheminformatics
Over the past decade, many articles in the cheminformatics literature have studied naive Bayes classifiers in which the chemical compounds to be classified are represented by binary-valued (absent or present) descriptors. The principal goal of this paper is to describe how a naive Bayes classifier based on binary descriptors (NBCBBD) can be used as an efficient feature selector suited to cheminformatics. In the process, we point out a fact well documented in other disciplines: NBCBBD is a linear classifier and is therefore intrinsically suboptimal for classifying compounds that are not linearly separable in their binary descriptor space. We evaluate the proposed algorithm by classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds.
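
The linearity claim can be illustrated with a minimal sketch. For binary descriptors, the two-class naive Bayes log-odds rearranges into b + w·x, so each descriptor contributes a fixed weight. The code below is not the algorithm proposed in the paper. The function names, the Laplace smoothing constant `alpha`, and the ranking of descriptors by absolute weight are all assumptions made for illustration.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate per-class descriptor probabilities and class priors.

    alpha is an assumed Laplace smoothing constant; it keeps every
    probability strictly between 0 and 1 so the logarithms stay finite.
    Both classes (0 and 1) are assumed to be present in y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        # Smoothed estimate of P(x_j = 1 | class c) for every descriptor j
        p = (Xc.sum(axis=0) + alpha) / (Xc.shape[0] + 2 * alpha)
        prior = Xc.shape[0] / X.shape[0]
        params[c] = (p, prior)
    return params

def linear_form(params):
    """Rewrite the naive Bayes log-odds as b + w @ x."""
    p1, prior1 = params[1]
    p0, prior0 = params[0]
    absent_ratio = np.log((1 - p1) / (1 - p0))
    w = np.log(p1 / p0) - absent_ratio
    b = np.log(prior1 / prior0) + absent_ratio.sum()
    return w, b

def log_odds_direct(params, x):
    """Log-odds computed straight from the class-conditional likelihoods."""
    x = np.asarray(x, dtype=float)
    scores = []
    for c in (1, 0):
        p, prior = params[c]
        scores.append(np.log(prior)
                      + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))
    return scores[0] - scores[1]

def rank_descriptors(w):
    """Hypothetical selection rule: descriptors ordered by decreasing |w_j|."""
    return np.argsort(-np.abs(w))
```

On an XOR-style labelling of two binary descriptors, where a compound is active exactly when one of the two descriptors is present, the fitted weights and bias in this sketch are all zero, which is a small instance of the limitation on data that are not linearly separable.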