Orphan CpG Islands Identify Numerous Conserved Promoters in the Mammalian Genome
CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Their apparent importance is called into question, however, by reports that the number of CGIs varies widely between species and that many do not co-localise with annotated promoters. We set out to determine the number of CGIs in the mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many CGIs are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are “orphans” that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.
In the decade since the sequence of the human genome was announced, efforts have been made to annotate all genes with their regulatory sequences. CpG islands are short regions that contain the sequence CG at high density and that map to the promoters of most human genes, the regions that control gene expression. Using a biochemical method, we have identified and mapped all CpG islands in the human and mouse genomes and find that over half are remote from known gene promoters; these are the so-called “orphans.” Mice, which were thought to possess far fewer CpG islands than humans, turn out to have a very similar number. Surprisingly, orphan CpG islands in both species often mark hitherto unknown promoters. The activity of these novel promoters is particularly dynamic during normal development, as they are often silenced by DNA methylation. In colorectal cancers, however, aberrant DNA methylation affects all CpG islands equally.