Research Abstracts - 2007

Discovery and Phylogenetic Analysis of microRNAs in Mammalian Species

Shay Artzi, Adam Kiezun & Noam Shomron


MicroRNAs (miRNAs) are small non-coding RNAs that control gene expression by negatively regulating translation. miRNAs have emerged as a major class of regulatory genes in most metazoans and as important regulators of a diverse range of biological processes. Accurately resolving the branching order of mammalian evolution is of great importance, but even now, when large genomic sequences are known, some relationships within the phylogenetic tree remain controversial. We study one such controversial relationship within the mammalian phylogenetic tree: the three-taxon placement of rodents, primates and carnivores, using miRNA genes.

We have created a fully automated tool, miRNAminer, that identifies candidate miRNAs (precursor and mature sequences) by homology search and alignment. We applied the tool to the set of already known metazoan miRNAs. miRNAminer searches for non-paralogous sets of miRNAs from the Sanger Institute miRNA database.

Using miRNAminer, we complemented the Sanger miRNA database with hundreds of orthologous mammalian miRNA genes. This enabled us to compile a phylogenetic tree based on presence/absence patterns. Our miRNA database thus enables phylogenetic reconstruction by entirely different means, namely the presence or absence of miRNA genes, and yields evidence supporting a primate-rodent clade with the exclusion of carnivores.

Mining miRNAs

To search genome databases, miRNAminer uses BLASTN. In our experiments, we used seven mammalian genomes from ENSEMBL. For each candidate miRNA, miRNAminer searches the genome for the precursor sequences of all known miRNAs of the same name. Sequences with an E-value of at most 0.1 per chromosome are selected for further evaluation. To further filter potential miRNAs, miRNAminer applies the following criteria on conservation of mature miRNA sequences, conservation of precursor miRNA sequences, and precursor miRNA secondary structure, i.e., fold (the first three criteria were proposed in previous work, such as [1, 2]). We estimated the parameters in the following selection criteria from data in the miRNA registry and chose the values that included at least 95% of known miRNA genes:

  1. the fold has a hairpin structure,
  2. in the fold, ΔG < -21 kcal/mol,
  3. in the fold, at least 55% of miRNA precursor nucleotides are paired,
  4. the alignment of mature miRNA sequences has at least 95% identity and the length of the alignment is at most 3 nt shorter than the length of the mature miRNA,
  5. the length of the miRNA precursor sequence is between 70 and 180 nt,
  6. the sequence of 6 nucleotides between positions 2 and 7 on the 5' end of the mature miRNAs has 100% conservation ([3],[4]),
  7. the alignment between miRNA precursor sequences has at least 56% identity and the total length of gaps in the alignment is at most 10,
  8. the mature miRNA and the hairpin loop overlap on no more than 4 nt (i.e., mature miRNA is located almost entirely on the hairpin stem).
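
The eight criteria above can be read as a conjunction of threshold checks. The following Python fragment is a minimal sketch of such a filter, not the miRNAminer implementation: the `Candidate` record, its field names, and the `failed_criteria` function are hypothetical, and the sketch assumes that the fold and alignment statistics have already been computed by other tools and that the candidate mature sequence is aligned to the known one from its first nucleotide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of precomputed measurements for one candidate hit."""
    is_hairpin: bool            # criterion 1: fold is a hairpin
    delta_g: float              # criterion 2: fold free energy in kcal/mol
    paired_fraction: float      # criterion 3: fraction of paired precursor nt
    mature_identity: float      # criterion 4: identity of mature alignment (0..1)
    mature_alignment_len: int   # criterion 4: length of mature alignment in nt
    known_mature: str           # known mature sequence (criteria 4 and 6)
    candidate_mature: str       # candidate mature sequence (criterion 6)
    precursor_len: int          # criterion 5: precursor length in nt
    precursor_identity: float   # criterion 7: identity of precursor alignment
    precursor_gap_len: int      # criterion 7: total gap length in the alignment
    loop_overlap: int           # criterion 8: nt of mature miRNA inside the loop


def failed_criteria(c):
    """Return the numbers of the criteria (1..8) that the candidate fails."""
    checks = {
        1: c.is_hairpin,
        2: c.delta_g < -21.0,
        3: c.paired_fraction >= 0.55,
        4: (c.mature_identity >= 0.95
            and c.mature_alignment_len >= len(c.known_mature) - 3),
        5: 70 <= c.precursor_len <= 180,
        # Positions 2..7 (1-based) from the 5' end are the slice [1:7].
        6: c.known_mature[1:7] == c.candidate_mature[1:7],
        7: c.precursor_identity >= 0.56 and c.precursor_gap_len <= 10,
        8: c.loop_overlap <= 4,
    }
    return [number for number, ok in checks.items() if not ok]
```

A candidate is kept only when `failed_criteria` returns an empty list. Returning the failing criterion numbers rather than a single boolean makes it easier to see which threshold rejects a given hit.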

Phylogenetic Tree Reconstruction

We exploited the rarity [2] of miRNA loss and of convergent evolution to create an algorithm for phylogenetic tree reconstruction that uses information about the presence/absence of miRNAs in a set of species. The algorithm applies bootstrapping to reconstruct trees for large datasets using a subroutine that works under a stronger assumption of no miRNA loss (i.e., a miRNA occurs in every species descended from an ancestral species in which the miRNA appeared).
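
Under the no-loss assumption, the set of species carrying a given miRNA is exactly the set of descendants of some ancestor, so any two such sets must be either nested or disjoint. The sketch below illustrates that consequence only; it is not the subroutine used in our experiments. The function name `clades_under_no_loss` and the miRNA names in the usage note are hypothetical.

```python
from itertools import combinations


def clades_under_no_loss(presence):
    """Derive clades from presence data, assuming no miRNA is ever lost.

    presence maps a miRNA name to the set of species in which it occurs.
    Returns the set of implied clades (as frozensets of species), or None
    if two presence sets overlap without nesting, which contradicts the
    no-loss assumption.
    """
    # A miRNA seen in a single species says nothing about branching order.
    clades = {frozenset(s) for s in presence.values() if len(s) > 1}
    for a, b in combinations(clades, 2):
        overlapping = bool(a & b)
        nested = a <= b or b <= a
        if overlapping and not nested:
            return None
    return clades
```

For example, presence data such as `{"mir-x": {"human", "chimp"}, "mir-y": {"human", "chimp", "mouse"}}` yields two nested clades, whereas adding a miRNA found only in mouse and dog would make the data inconsistent with the assumption and return `None`.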

We also created an algorithm to reconstruct phylogenetic trees by minimizing miRNA loss. That is, given miRNA presence data for a set of species, every phylogenetic tree topology implies a number of miRNAs that are lost (as predicted by the topology). The problem is to find the tree topology that minimizes this number. We implemented a stochastic search for the minimizing tree and compared that tree with the bootstrapped one. After enough iterations (ca. 30,000), bootstrapping and stochastic search reconstructed identical tree topologies for our experimental data.
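
To make the objective concrete, the following sketch counts the losses that one topology implies. It is an illustration under stated assumptions rather than our implementation: it assumes each miRNA is gained exactly once, at the most recent common ancestor of the species that carry it, and it counts one loss for every maximal subtree below that ancestor in which the miRNA is absent. Trees are written as nested tuples of species names; the function names are hypothetical, and the stochastic search itself is not shown.

```python
def leaves(tree):
    """Return the set of species names under a node (a str or a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def _losses_below(node, present):
    """Count maximal subtrees under node that contain no carrier species."""
    if not (leaves(node) & present):
        return 1  # the whole subtree lacks the miRNA: one loss event
    if isinstance(node, str):
        return 0  # a leaf that carries the miRNA
    return sum(_losses_below(child, present) for child in node)


def count_losses(tree, present):
    """Losses implied by tree for one miRNA found in the species in present."""
    present = frozenset(present)
    if not present:
        return 0
    node = tree
    # Descend to the most recent common ancestor of all carrier species.
    while not isinstance(node, str):
        containing = [child for child in node if present <= leaves(child)]
        if not containing:
            break
        node = containing[0]
    return _losses_below(node, present)


def total_losses(tree, presence):
    """Sum of implied losses over all miRNAs in a presence dictionary."""
    return sum(count_losses(tree, species) for species in presence.values())
```

With the tree `((("human", "chimp"), "mouse"), "dog")`, a miRNA found in human, chimp, and dog implies one loss (in mouse), whereas the tree `((("human", "chimp"), "dog"), "mouse")` implies none for the same miRNA. A search procedure would compare `total_losses` across candidate topologies and keep the smallest.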

Results and Discussion

We used miRNAminer to perform a comprehensive homology search for miRNA precursors in the studied species. For the search, we used all 2925 vertebrate miRNAs listed in the Sanger miRNA registry (release 9.0 of October 2006). Figure 1 summarizes the miRNAs listed in the Sanger registry and the new miRNAs identified by our method. Conclusively confirming the presence of the identified candidates in the studied species requires experimental verification. However, the candidates identified by our method are close homologs of known miRNAs and as such are not required to meet as stringent criteria to be annotated as miRNAs [5].

Genome          miRNA registry 9.0   newly identified   sum
H. sapiens      474                  25                 499
P. troglodytes  83                   253                336
M. musculus     373                  34                 407
R. norvegicus   234                  77                 311
C. familiaris   6                    229                235
B. taurus       98                   112                210
M. domestica    107                  56                 163
Total           1375                 768                2163

Figure 1: Known miRNAs and new candidates identified by miRNAminer. The column "miRNA registry 9.0" shows the number of miRNAs listed for the given species in the Sanger miRNA registry release 9.0. The column "newly identified" shows the number of miRNA candidates identified by miRNAminer.

There is an ongoing discussion about the phylogenetic relationship of mammalian orders. A recent study [6] based on full genome sequences supports a primate-carnivore clade with the exclusion of rodents. Other studies support the primate-rodent clade [7]. We used our bootstrapping reconstruction algorithm to evaluate both hypothetical phylogenies according to the miRNA data. miRNA data are conjectured to be nearly homoplasy-free [2], which makes them well suited for reconstructing phylogenies.

Figure 2 shows the confidence scores for the two hypothesized phylogenies. The numbers were computed as the percentage of bootstrapped trees supporting the bifurcation in each intermediate node. The results strongly favor the primate-rodent clade (Figure 2b) over the primate-carnivore clade (Figure 2a): the confidence score of the relevant split is 62% vs. 20%. The phylogenetic tree in Figure 2b was constructed more than 50% of the time (2.5 times more often than any other tree) in 100,000 iterations of the bootstrapping algorithm. We also exhaustively checked all possible 7-species trees and found that this tree also minimizes miRNA loss.
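
As an illustration of how such percentages can be derived, the sketch below tallies, for each clade, the share of bootstrapped trees that contain it. It assumes each bootstrapped tree is represented as a collection of clades (frozensets of taxon names); the function name `clade_support` and the five toy trees in the usage note are hypothetical and are not our data.

```python
from collections import Counter


def clade_support(trees):
    """Return, for each clade, the percentage of trees that contain it.

    trees is a non-empty list; each element is an iterable of clades,
    where a clade is a frozenset of taxon names.
    """
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count each clade once per tree
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}
```

For instance, if three of five toy trees contain the clade `frozenset({"primates", "rodents"})` and one contains `frozenset({"primates", "carnivores"})`, the function reports 60% and 20% support for these two clades, respectively.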

[Figure 2 panels: (a) primate-carnivore clade; (b) primate-rodent clade]

Figure 2: Confidence scores for two hypothesized phylogenies of primates, rodents and carnivores, computed by the bootstrapping algorithm. The scores indicate stronger support for the primate-rodent clade.


[1] X. Xie, J. Lu, E. Kulbokas, T. Golub, V. Mootha, K. Lindblad-Toh, E. Lander, and M. Kellis. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434, 2005.

[2] L. F. Sempere, C. N. Cole, M. A. McPeek, and K. J. Peterson. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J. Exp. Zool., 2006.

[3] B. P. Lewis, C. B. Burge, and D. P. Bartel. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120(1):15-20, January 2005.

[4] B. P. Lewis, I. H. Shih, M. W. Jones-Rhoades, D. P. Bartel, and C. B. Burge. Prediction of mammalian microRNA targets. Cell, 115(7):787-798, December 2003.

[5] V. Ambros, B. Bartel, D. P. Bartel, C. B. Burge, J. C. Carrington, X. Chen, G. Dreyfuss, S. R. Eddy, S. Griffiths-Jones, M. Marshall, M. Matzke, G. Ruvkun, and T. Tuschl. A uniform system for microRNA annotation. RNA, 9(3):277-279, March 2003.

[6] G. Cannarozzi, A. Schneider, and G. Gonnet. A phylogenomic study of human, dog, and mouse. PLoS Comput Biol, 3(1), January 2007.

[7] J. O. Kriegs, G. Churakov, M. Kiefmann, U. Jordan, J. Brosius, and J. Schmitz. Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol, 4(4), March 2006.



Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu