Diverse dimension decomposition for itemset spaces
We introduce the problem of diverse dimension decomposition in transactional databases, where a dimension is a set of mutually exclusive itemsets. The problem we consider requires to find a decomposition of the itemset space into dimensions, which are orthogonal to each other and which provide high coverage of the input database. The mining framework we propose can be interpreted as a dimensionality-reducing transformation from the space of all items to the space of orthogonal dimensions. Relying on information-theoretic concepts, we formulate the diverse dimension decomposition problem with a single objective function that simultaneously captures constraints on coverage, exclusivity, and orthogonality. We show that our problem is NP -hard, and we propose a greedy algorithm exploiting the well-known FP-tree data structure. Our algorithm is equipped with strategies for pruning the search space deriving directly from the objective function. We also prove a property that allows assessing the level of informativeness for newly added dimensions, thus allowing to define criteria for terminating the decomposition. We demonstrate the effectiveness of our solution by experimental evaluation on synthetic datasets with known dimension and three real-world datasets, flickr , del.icio.us and dblp . The problem we study is largely motivated by applications in the domain of collaborative tagging; however, the mining task we introduce in this paper is useful in other application domains as well.