How does DNA sequence motif discovery work?
How can we computationally extract an unknown motif from a set of target sequences? What are the principles behind the major motif discovery algorithms? Which of these should we use, and how do we know we've found a 'real' motif?

Extracting regulatory motifs¹ from DNA sequences seems to be all the rage these days. Take your favorite cluster of coexpressed genes, and with some luck you may find a short pattern of nucleotides upstream of the transcription start sites of these genes. Such a shared pattern would point to a common transcription factor binding site responsible for their coordinate regulation.
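To make the task concrete, here is a toy sketch of its simplest form. It counts how many of the target sequences contain each word of a fixed length k (a k-mer) and reports the most widely shared one. The function name `kmer_support` and the four "upstream" sequences are hypothetical, made up for illustration with the word `TGACTCA` deliberately planted in each. The sketch also assumes exact matches, scans the forward strand only, and uses no background model, so it is not any of the statistical algorithms the questions above refer to.

```python
from collections import Counter


def kmer_support(seqs, k):
    """Count, for every k-mer, how many sequences contain it at least once.

    Assumption: forward strand only and exact matches; no reverse
    complement and no comparison against background frequencies.
    """
    support = Counter()
    for s in seqs:
        s = s.upper()
        # A set ensures each k-mer is counted at most once per sequence.
        words = {s[i:i + k] for i in range(len(s) - k + 1)}
        support.update(words)
    return support


# Hypothetical upstream sequences, each with the planted word "TGACTCA".
upstream = [
    "ATTGCTGACTCAGGTA",
    "CCTGACTCATTAGCGA",
    "GGATATGACTCACCTT",
    "TTGACTCAAGCGTACG",
]

print(kmer_support(upstream, 7).most_common(1))  # → [('TGACTCA', 4)]
```

In real promoter sets, binding sites for the same factor usually differ at several positions, and short words can recur by chance in long sequences. Exact word counting like this therefore tends to miss real motifs or report spurious ones unless it allows mismatches and scores words against what background sequence would produce.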