CSAIL Publications and Digital Archive

Research Abstracts - 2006

Sequence Motifs Predictive of Tissue-specific Exon Skipping

Neha Soni, Gene Yeo, Tomaso Poggio & Chris Burge


Alternative splicing (AS) plays a major role in increasing protein diversity and regulating gene expression; more than half of human genes are estimated to be alternatively spliced. Splicing is regulated by the interactions of protein factors with the splicing machinery, where the factors bind to cis-elements in the regulated exons or the flanking introns. Careful studies of the binding affinities of particular splicing trans-factors such as ETR3 [1] and NOVA-1 [2, 17], mutation of cis-regulatory elements, or a combination of the two have identified tissue-specific cis and trans relationships. Systematic larger-scale searches using experimental and computational methods [3,4,5,6] have uncovered additional splicing cis-elements. In addition, motifs that regulate tissue-specific alternative splicing have been identified by groups studying small sets of genes [7,8]. We aim to use microarray and sequence data to find motifs that affect tissue-specific alternative splicing.

Microarray-based prediction of skipped exons

The microarray data are used to assign microarray-based tissue-specific skipping (MTSS) scores to exons, which can then be divided into tissue-specific skipped (TSS) exon sets. The Rosetta splicing-sensitive microarray dataset interrogated ~11,000 RefSeq entries representing ~10,000 human genes in 52 tissues and cell lines [15]. Each gene is characterized by a matrix of probe hybridization intensities, where probes are designed to span spliced exon junctions. After normalisation, these intensity values are separated into bins using a Euclidean k-means approach; the bin with the lowest values contains the intensities that potentially correspond to an exon being skipped in a particular tissue. Each exon is given an MTSS score by adding up the weights of the bins its intensities fall into, where the weight for each bin is learned by training the model on a literature-verified set of skipped exons.
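The binning and scoring steps can be illustrated with a minimal sketch. It assumes scalar intensities (so Euclidean distance reduces to absolute difference), a plain Lloyd-style k-means, and hand-picked bin weights; the function names, the number of bins, the toy intensities and the weights are all hypothetical, since the abstract does not specify them and the actual weights are learned from verified skipped exons.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns the centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def assign_bin(value, centroids):
    """Index of the nearest centroid; bin 0 holds the lowest intensities."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def mtss_score(probe_intensities, centroids, bin_weights):
    """Sum the weights of the bins that an exon's probe intensities fall into."""
    return sum(bin_weights[assign_bin(v, centroids)] for v in probe_intensities)


# Toy normalised intensities (made up for illustration only).
intensities = [0.1, 0.15, 0.2, 0.9, 1.0, 1.1, 1.9, 2.0, 2.1]
centroids = kmeans_1d(intensities, k=3)

# Hypothetical weights: the lowest bin contributes most to the skipping score.
bin_weights = [1.0, 0.2, 0.0]
low_exon = mtss_score([0.12, 0.18], centroids, bin_weights)
high_exon = mtss_score([1.05, 2.2], centroids, bin_weights)
```

In this toy setting the exon whose probes fall in the lowest bin receives the larger score, which mirrors the idea that low junction-probe intensity points to skipping.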

[Figure 1 image: microarray intensities]

Finding tissue-specific motifs that enhance skipping

Given a TSS set of exon sequences, we searched for common sequence patterns that occur more frequently than would be expected by chance given their frequency in constitutive exons and their flanking introns, similar to the method used in [20]. In this way we found statistically significant hexamer and pentamer motifs that are over-represented or under-represented in the exonic and intronic regions of the alternatively spliced sequences for the brain, heart, muscle and testis tissues.
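As a rough illustration of this kind of search, the sketch below counts k-mers in a foreground (TSS) set and a background (constitutive) set and ranks them by a z-score under a binomial approximation. The statistic, the pseudocount and all names are assumptions made for this example; the abstract does not state which test was used, and the toy sequences are invented.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment_z(fg_seqs, bg_seqs, k):
    """z-score of each k-mer's foreground count against a background-derived rate."""
    fg = count_kmers(fg_seqs, k)
    bg = count_kmers(bg_seqs, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    if n_fg == 0:
        raise ValueError("foreground sequences are shorter than k")
    n_kmers = 4 ** k  # size of the DNA k-mer alphabet, used for the pseudocount
    z = {}
    for kmer in set(fg) | set(bg):
        p = (bg[kmer] + 1) / (n_bg + n_kmers)  # add-one smoothed background rate
        expected = n_fg * p
        sd = math.sqrt(n_fg * p * (1 - p))
        z[kmer] = (fg[kmer] - expected) / sd
    return z


# Toy sequences (made up); positive z means over-represented in the foreground.
tss_seqs = ["TGCATGTGCATG", "AATGCATGAA"]
constitutive_seqs = ["ACGTACGTACGT", "GGGGCCCCAAAA"]
z_scores = enrichment_z(tss_seqs, constitutive_seqs, k=6)
top_hexamer = max(z_scores, key=z_scores.get)
```

With real data the background would be far larger than the foreground, and a multiple-testing correction over all 4^k candidates would be needed before calling any motif significant.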

These TSS motifs are used to assign sequence-based skipping scores (STSS) to exons. STSS scores were assigned to a test set of exons that were not used to determine the motifs. The resulting scores were then compared with existing EST evidence of exon skipping in brain tissue, as shown in Figure 2.
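One simple way to turn a motif set into a sequence-based score is to sum a per-motif weight over every occurrence in the sequence. The sketch below does exactly that; the additive form, the function name and the weights are hypothetical, since the abstract does not give the STSS formula.

```python
def stss_score(sequence, motif_weights):
    """Sum the weight of every (possibly overlapping) motif occurrence in the sequence."""
    seq = sequence.upper()
    score = 0.0
    for motif, weight in motif_weights.items():
        k = len(motif)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                score += weight
    return score


# Hypothetical weights: positive for motifs associated with skipping, negative otherwise.
brain_motif_weights = {"TGCATG": 2.0, "CTCTC": 1.0, "GAAGAA": -1.5}

example_score = stss_score("AATGCATGCTCTCTCAA", brain_motif_weights)
empty_score = stss_score("AAAA", brain_motif_weights)
```

Scoring only held-out exons, as the text describes, keeps the motif discovery and the evaluation independent of each other.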

[Figure 2 image: brain-specific motifs]

In Figure 2, the cumulative distribution curves compare the STSS scores, computed with each tissue's motifs, of exons that are skipped in brain tissue with those of exons that are constitutive according to EST data. With the brain motifs, exons skipped in the brain appear to score higher than constitutive exons, whereas the non-brain motifs show no such separation. This suggests that the brain motifs can be used to predict brain-specific skipping while the non-brain motifs cannot, which points to the tissue specificity of these motifs.
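A comparison of this kind can be sketched with empirical cumulative distribution functions. The code below builds the two curves and reports their largest vertical gap (the two-sample Kolmogorov-Smirnov statistic); using that statistic is an assumption of this example rather than something the abstract reports, and the score lists are invented toy values.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function giving the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(a, b):
    """Largest vertical distance between the empirical CDFs of two samples."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


# Toy STSS scores (made up): brain-skipped exons versus constitutive exons.
skipped_scores = [3.0, 4.0, 5.0, 6.0]
constitutive_scores = [0.0, 1.0, 2.0, 3.0]

gap = max_cdf_gap(skipped_scores, constitutive_scores)
```

A curve that lies to the right of the other corresponds to the group with higher scores; a gap near zero would indicate that the motif set does not separate the two groups.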

Figure 1:

(a) Original hybridisation intensities for the EPB41L2 gene. Darker colors refer to lower hybridisation intensities. (b) Normalized hybridisation intensities for the EPB41L2 gene. (c) Binned intensity values for the EPB41L2 gene. The darkest bins correspond to the exons that are predicted to be skipped while the lighter bins correspond to exons predicted to be included.

Figure 2:

The blue curves represent the cumulative STSS scores assigned to exons that have no EST evidence of skipping in any tissue. The red curves represent the cumulative tissue-based STSS scores for exons that are EST-skipped in the brain, using all four different tissue motifs.

Research Support

Research at CBCL is supported by ONR, DARPA, NSF, Kodak, Siemens, Daimler-Chrysler, ATR, ATT, Compaq, Honda, CRIEPI.


[1] Faustino NA, Cooper TA. Identification of Putative New Splicing Targets for ETR-3 Using Sequences Identified by Systematic Evolution of Ligands by Exponential Enrichment. In Molecular and Cellular Biology, Feb 2005.

[2] Buckanovich RJ, Posner JB, Darnell RB. Nova, the paraneoplastic Ri antigen, is homologous to an RNA-binding protein and is specifically expressed in the developing motor system. In Neuron 1993 Oct; 11(4):657-72

[3] Wang Z, Rolish ME, Yeo G, Tung V, Mawson M, Burge CB. Systematic identification and analysis of exonic splicing enhancers. In Cell, 119(6):831-45, December 2004.

[4] Fairbrother WG, Yeh RF, Sharp PA, and Burge CB. Predictive identification of exonic splicing enhancers in human genes. In Science, 297(5583):1007-13, August 2002.

[5] Zhang XH, Kangsamaksin T, Chao MS, Banerjee JK, and Chasin LA. Exon inclusion is dependent on predictable exonic splicing enhancers. In Mol Cell Biol., 25(16):7323-32, August 2005.

[6] Cartegni L, Wang J, Zhu Z, Zhang MQ, and Krainer AR. ESEfinder: A web resource to identify exonic splicing enhancers. In Nucleic Acids Res., 31(13):3568-71, July 2003.

[7] Minovitsky S, Gee SL, Schokrpur S, Dubchak I, Conboy JG. The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons. In Nucleic Acids Res. 2005 Feb 3;33(2):714-24.

[8] Stamm S, Zhu J, Nakai K, Stoilov P, Stoss P, Stoss O, Zhang MQ. An alternative-exon database and its statistical analysis. In DNA Cell Biol. 2000 Dec; 739-56.

[9] J. Johnson et al. Genome-Wide survey of human alternative pre-mRNA splicing with exon junction microarrays. In Science, Vol. 302, December 2003.

[10] Kreiman G. Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. In Nucleic Acids Research, 2004, Vol. 32, No. 9, 2889-2900

[11] Jensen KB, Dredge BK, Stefani G, Zhong R, Buckanovich RJ, Okano HJ, Yang YY, Darnell RB. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. In Neuron. 2000 Feb; 25(2):359-71


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu