Towards effective subspace clustering with an evolutionary algorithm
We propose a new evolutionary algorithm for subspace clustering in very large and high-dimensional databases. The design includes task-specific coding and genetic operators, along with a nonrandom initialization procedure. Experimental results show that the algorithm scales almost linearly with the size and dimensionality of the database as well as the dimensionality of the hidden clusters. Our algorithm is able to discover clusters of different densities embedded in both low- and high-dimensional subspaces of the original space. Finally, the discovered knowledge is presented in the form of nonoverlapping clustering rules in which only those features relevant to the clustering are reported. These two properties, the absence of overlap and the restriction to relevant features, contribute to the relatively high comprehensibility of the clustering output.
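To make the output format concrete, the following is a minimal sketch of how a nonoverlapping clustering rule restricted to relevant features might be represented. It is an illustration only, not the encoding used by the algorithm: the names `ClusteringRule`, `covers`, and `assign`, the interval-per-feature representation, and the example bounds are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class ClusteringRule:
    """Hypothetical rule: a conjunction of intervals over relevant features only.

    Features that do not appear in `intervals` are treated as irrelevant
    and are never inspected.
    """
    label: str
    intervals: Dict[int, Tuple[float, float]]  # feature index -> (low, high)

    def covers(self, point: Sequence[float]) -> bool:
        # A point satisfies the rule if every relevant feature lies in its interval.
        return all(lo <= point[f] <= hi for f, (lo, hi) in self.intervals.items())


def assign(point: Sequence[float], rules: List[ClusteringRule]) -> Optional[str]:
    """Return the label of the rule covering `point`, or None if no rule does.

    With nonoverlapping rules, at most one rule can cover any given point.
    """
    for rule in rules:
        if rule.covers(point):
            return rule.label
    return None


# Two illustrative rules in a 3-dimensional space. They are disjoint on
# feature 0, so no point can satisfy both.
rules = [
    ClusteringRule("A", {0: (0.0, 1.0), 2: (5.0, 6.0)}),    # subspace {0, 2}
    ClusteringRule("B", {0: (3.0, 4.0), 1: (10.0, 12.0)}),  # subspace {0, 1}
]
```

Under these assumptions, `assign((0.5, 99.0, 5.5), rules)` selects rule A regardless of the value of feature 1, since that feature is not part of the rule. Each rule reads as a short conjunction of conditions, which is the sense in which such output is comparatively easy to interpret.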