Research Abstracts - 2007

Genome-wide Identification of Fly Regulatory Motifs and their Functional Roles

Alexander Stark, Pouya Kheradpour & Manolis Kellis

The expression of animal genes is tightly regulated in spatial and temporal patterns during development. Regulation occurs before and during transcription, during splicing, and through control of mRNA export, stability, and translation. Each of these processes relies on subtle DNA and RNA sequence signals, which form several diverse classes of regulatory motifs. A systematic understanding of gene regulation requires global knowledge of these motifs, which are often elusive because they are short, contain several degenerate positions, and can act over large distances.

Using whole-genome alignments of 12 Drosophila species, we undertook a systematic discovery of conserved regulatory motifs in the entire fly genome, including intergenic, intronic, and untranslated regions (similar to [1-3]). In addition, we defined regulatory motifs in protein-coding sequence based on their reading-frame-independent conservation, and distinguished RNA motifs by their strand-specific conservation.
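To make the idea of conservation-based motif scoring concrete, the following is a minimal sketch, not the pipeline used in this work. It assumes a gap-free toy alignment in which all sequences have equal length, and it counts a motif instance as conserved only if every aligned species carries the identical k-mer at the same position. The function names, species labels, and sequences are hypothetical. Comparing a motif with its reverse complement gives a crude view of strand-specific conservation.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def conserved_fraction(alignment: Dict[str, str], reference: str, motif: str) -> float:
    """Fraction of motif instances in the reference that are identical
    in every aligned sequence (assumption: gap-free, equal-length rows)."""
    ref = alignment[reference]
    if any(len(seq) != len(ref) for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    k = len(motif)
    total = 0
    conserved = 0
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] != motif:
            continue
        total += 1
        # An instance is conserved only if every species matches exactly.
        if all(seq[i:i + k] == motif for seq in alignment.values()):
            conserved += 1
    return conserved / total if total else 0.0


# Hypothetical three-species toy alignment (not real Drosophila sequence).
toy_alignment = {
    "species_a": "ACGTGACCTTACGTGA",
    "species_b": "ACGTGACCTTACGTGA",
    "species_c": "ACGTGACCTTACTTGA",
}

forward = conserved_fraction(toy_alignment, "species_a", "ACGTG")
reverse = conserved_fraction(toy_alignment, "species_a", reverse_complement("ACGTG"))
```

In this toy case the motif occurs twice in the reference and only the first instance is conserved, so `forward` is 0.5. A real analysis would need to handle alignment gaps, degenerate positions, and a background model for expected conservation, none of which are attempted here.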

We find that the motif composition of promoters, introns, 5'UTRs, and intergenic regions is highly similar and matches that of known enhancers. The highest-scoring motifs correspond to known Drosophila transcription factors such as engrailed, apterous, and Antennapedia. The majority of the novel elements are enriched in groups of genes related by common functions or expression patterns and, like the motifs of known transcription factors, are depleted near ubiquitously expressed genes.
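One common way to quantify enrichment of a motif in a functional gene group is a hypergeometric test. The abstract does not state which statistic was used, so the sketch below is an assumption chosen for illustration, with hypothetical parameter names. It computes the probability of seeing at least `overlap` motif-containing genes in a group by chance.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, genome_size: int,
                           motif_genes: int, group_size: int) -> float:
    """P(X >= overlap) when drawing `group_size` genes without replacement
    from `genome_size` genes, of which `motif_genes` carry the motif."""
    upper = min(motif_genes, group_size)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    favorable = sum(
        comb(motif_genes, i) * comb(genome_size - motif_genes, group_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(genome_size, group_size)


# Toy numbers: 10 genes in total, 5 carry the motif, and a functional
# group of 5 genes happens to contain all 5 motif-carrying genes.
p_value = hypergeom_enrichment_p(overlap=5, genome_size=10,
                                 motif_genes=5, group_size=5)
```

With these toy numbers only one of the 252 possible groups contains all five motif-carrying genes, so `p_value` is 1/252. The same function applied to the lower tail would give a depletion test, as for ubiquitously expressed genes; correction for testing many motifs and groups is not shown.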

A surprisingly large fraction of motifs show strong positional preferences with respect to the transcription start site, splice junctions, translation start and stop sites, and the polyadenylation signal. This suggests diverse functions in regulating the respective processes at each level of gene regulation and in precisely defining these positions. In addition, most motifs cluster in intergenic regions but are depleted in coding sequence and 3'UTRs.

The most highly conserved motifs in 3'UTRs and coding sequence are complementary to known miRNA 5' ends (see also [3]), suggesting that coding exons contain physiologically relevant and evolutionarily selected miRNA target sites.
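The complementarity between a motif and a miRNA 5' end can be checked mechanically. The sketch below assumes the commonly used convention that the "seed" is miRNA nucleotides 2 to 8; the abstract itself speaks only of 5' ends, so this window is an assumption. It derives the DNA 7-mer expected at a target site and locates it in a transcript. The miRNA and transcript sequences are illustrative only, and the function names are hypothetical.

```python
from typing import List

_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def seed_target_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """DNA sequence complementary to miRNA positions start..end
    (1-based, inclusive), written 5'->3' on the transcript strand."""
    seed = mirna[start - 1:end]
    # Complement each RNA base into DNA, then reverse to read 5'->3'.
    return seed.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def find_sites(transcript_dna: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match."""
    positions = []
    pos = transcript_dna.find(site)
    while pos != -1:
        positions.append(pos)
        pos = transcript_dna.find(site, pos + 1)
    return positions


# Illustrative miRNA (RNA alphabet) and a made-up transcript fragment.
example_mirna = "UGAGGUAGUAGGUUGUAUAGU"
site = seed_target_site(example_mirna)
hits = find_sites("AAACTACCTCAGGCTACCTCTT", site)
```

Here the seed `GAGGUAG` yields the target 7-mer `CTACCTC`, which occurs at positions 3 and 13 of the toy fragment. Because the search runs on the transcript strand only, it mirrors the strand-specific nature of RNA motifs.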

The comparative analysis of regulatory motifs in fly species reveals principles of tissue-specific regulation of gene expression and provides a framework for future approaches to understanding tissue formation and development. Finally, it may serve as a model for similar studies in human as dozens of new mammalian genomes become available.


[1] M. Kellis, N. Patterson, B. Birren, B. Berger, E.S. Lander. Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery. In Journal of Computational Biology, Volume 11, pp. 319-355, 2004.

[2] M. Kellis, N. Patterson, M. Endrizzi, B. Birren, E.S. Lander. Sequencing and comparison of yeast species to identify genes and regulatory elements. In Nature, Volume 423, pp. 241-254, May 2003.

[3] X. Xie, J. Lu, E.J. Kulbokas, T.R. Golub, V. Mootha, K. Lindblad-Toh, E.S. Lander, M. Kellis. Systematic discovery of regulatory motifs in human promoters and 3'UTRs by comparison of several mammals. In Nature, Volume 434, pp. 338-345, Mar 2005.



Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu