MuMoD: a Bayesian approach to detect multiple modes of protein–DNA binding from genome-wide ChIP data
High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However this approach often fails: many bound regions do not contain the ‘expected’ motif. This is because binding DNA directly at its recognition site is not the only way the protein can cause the region to immunoprecipitate. Its binding specificity can change through association with different co-factors, it can bind DNA indirectly, through intermediaries, or even enforce its function through long-range chromosomal interactions. Conventional motif discovery methods, though largely capable of identifying overrepresented motifs from bound regions, lack the ability to characterize such diverse modes of protein–DNA binding and binding specificities. We present a novel Bayesian method that identifies distinct protein–DNA binding mechanisms without relying on any motif database. The method successfully identifies co-factors of proteins that do not bind DNA directly, such as mediator and p300. It also predicts literature-supported enhancer–promoter interactions. Even for well-studied direct-binding proteins, this method provides compelling evidence for previously uncharacterized dependencies within positions of binding sites, long-range chromosomal interactions and dimerization.