A new approach for the joint analysis of multiple ChIP-seq libraries with application to histone modification.
Most approaches for analyzing ChIP-seq data focus on inferring exact protein binding sites from a single library. However, multiple ChIP-seq libraries derived from different cell lines or tissue types of the same individual are frequently available. In such a situation, analyzing each tissue or cell line separately may be inefficient. Here, we describe a novel method that uses the joint information from multiple related ChIP-seq libraries. We present our method as a two-stage procedure. In the first stage, each cell line is analyzed separately, using a novel mixture regression approach to infer the subset of genes most likely to be involved in protein binding in that cell line. In the second stage, we combine the separate single cell line analyses using an Empirical Bayes algorithm that implicitly incorporates inter-cell line correlation. We demonstrate the usefulness of our method using both simulated data and real H3K4me3 and H3K27me3 histone methylation libraries.
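The two-stage structure can be illustrated with a small sketch. The abstract does not specify the model, so everything below is an assumption made for illustration and not the authors' implementation: stage one is taken to be a two-component Gaussian mixture of linear regressions (ChIP signal regressed on a control covariate) fit by EM, and stage two is taken to be an empirical Bayes step that estimates a prior over joint binding patterns across cell lines from the data and returns marginal posterior probabilities. The function names `mixture_regression` and `empirical_bayes_combine` and all parameters are hypothetical.

```python
import itertools
import numpy as np


def mixture_regression(x, y, n_iter=200):
    """Stage 1 (assumed form): two-component mixture of linear regressions via EM.

    x, y: 1-D arrays with one value per gene (control covariate, ChIP signal).
    Returns an (n_genes, 2) array of component densities: [background, enriched].
    """
    X = np.column_stack([np.ones(len(x)), x])
    # Initialise: genes above the median residual of a single fit start as "enriched".
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    r = (resid > np.median(resid)).astype(float)  # responsibility of "enriched"
    for _ in range(n_iter):
        dens = []
        for w in (1.0 - r, r):
            # M-step: weighted least squares and weighted residual scale.
            w = np.clip(w, 1e-8, None)
            sw = np.sqrt(w)
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            res = y - X @ beta
            sigma = np.sqrt(np.sum(w * res**2) / np.sum(w)) + 1e-8
            dens.append(np.exp(-0.5 * (res / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)))
        # E-step: update responsibilities with the current mixing proportion.
        pi = r.mean()
        num = pi * dens[1]
        r = num / (num + (1.0 - pi) * dens[0] + 1e-300)
    return np.column_stack(dens)


def empirical_bayes_combine(lik, n_iter=200):
    """Stage 2 (assumed form): empirical Bayes over joint binding patterns.

    lik: array of shape (n_genes, n_lines, 2) holding stage-1 densities.
    Returns an (n_genes, n_lines) array of marginal posterior P(enriched).
    """
    n_genes, n_lines, _ = lik.shape
    # Enumerate all 2^K on/off patterns across the K cell lines.
    configs = np.array(list(itertools.product([0, 1], repeat=n_lines)))
    L = np.ones((n_genes, len(configs)))
    for k in range(n_lines):
        L *= lik[:, k, :][:, configs[:, k]]
    # Estimate the pattern prior from all genes (EM); correlated cell lines
    # show up as extra prior mass on concordant patterns.
    prior = np.full(len(configs), 1.0 / len(configs))
    for _ in range(n_iter):
        post = L * prior + 1e-300
        post /= post.sum(axis=1, keepdims=True)
        prior = post.mean(axis=0)
    return post @ configs
```

In this sketch, inter-cell line correlation enters only through the estimated prior over joint patterns, which is one simple way to borrow strength across libraries. Enumerating all patterns scales as 2^K in the number of cell lines K, so it is practical only for a handful of libraries.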