Abstracts - 2006
Sequence Motifs Predictive of Tissue-specific Exon Skipping
Neha Soni, Gene Yeo, Tomaso Poggio & Chris Burge
Alternative splicing (AS) plays a major role in increasing protein diversity and regulating gene expression, and more than half of all human genes are estimated to be alternatively spliced. Splicing is regulated by interactions between protein factors and the splicing machinery, where the factors bind to cis-elements in the regulated exons or in the flanking introns. Careful studies of the binding affinities of particular splicing trans-factors such as ETR3 [1] and NOVA-1 [2, 11], of mutations in cis-regulatory elements, or of a combination of the two have identified tissue-specific cis and trans relationships. Systematic larger-scale searches using experimental and computational methods [3, 4, 5, 6] have uncovered additional splicing cis-elements. In addition, motifs that regulate tissue-specific alternative splicing have been identified by groups studying small sets of genes [7, 8]. Our ultimate aim is to use microarray and sequence data to find motifs that affect tissue-specific alternative splicing.
Microarray-based prediction of skipped exons
The microarray data are used to assign microarray-based Tissue-Specific Skipping (MTSS) scores to exons, which can then be divided into Tissue-Specific Skipped (TSS) exon sets. The Rosetta splicing-sensitive microarray dataset interrogated ~11,000 RefSeq entries representing ~10,000 human genes in 52 tissues and cell lines [9]. Each gene is characterized by a matrix of probe hybridisation intensities, where the probes are designed to span spliced exon junctions. After normalisation, these intensity values are separated into bins using Euclidean k-means clustering; the bin with the lowest values contains the intensities that potentially correspond to an exon being skipped in a particular tissue. Each exon is given an MTSS score by adding up the corresponding bin weights, where the weight of each bin is learned by training the model on a literature-verified set of skipped exons.
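The binning and scoring steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the intensities are already normalised, uses plain one-dimensional Lloyd's k-means with evenly spaced starting centroids, and takes the bin weights as given rather than learned. The helper names (`kmeans_1d`, `nearest_bin`, `assign_bins`, `mtss_score`) and the choice to score an exon by summing the weights of the bins its junction probes fall into are hypothetical.

```python
"""Sketch: bin normalised junction-probe intensities with 1-D k-means and
sum bin weights into a skipping score. Bin count, weights and the exon-to-probe
mapping are hypothetical; the abstract learns the weights from known skipped exons."""
from typing import List, Sequence


def nearest_bin(value: float, centroids: Sequence[float]) -> int:
    """Index of the closest centroid by Euclidean (absolute) distance; ties go to the lower index."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's algorithm on scalar intensities; returns centroids sorted from low to high."""
    if k == 1:
        return [sum(values) / len(values)]
    lo, hi = min(values), max(values)
    # Start with centroids spread evenly over the observed intensity range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            clusters[nearest_bin(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return sorted(centroids)


def assign_bins(values: Sequence[float], centroids: Sequence[float]) -> List[int]:
    """Bin index per intensity; with sorted centroids, bin 0 holds the lowest
    intensities, i.e. the candidates for exon skipping."""
    return [nearest_bin(v, centroids) for v in values]


def mtss_score(bins: Sequence[int], weights: Sequence[float]) -> float:
    """Hypothetical MTSS: sum of the weights of the bins an exon's probes fall into."""
    return sum(weights[b] for b in bins)


if __name__ == "__main__":
    intensities = [0.1, 0.15, 0.9, 1.0, 0.5, 0.55]
    cents = kmeans_1d(intensities, k=3)
    print(assign_bins(intensities, cents))            # low, low, high, high, mid, mid
    print(mtss_score([0, 0], weights=[2.0, 0.5, 0.0]))  # illustrative weights only
```

Giving the lowest bin the largest weight makes low junction intensities (likely skipping) push the score up; in the abstract those weights come from training on literature-verified skipped exons, not from hand-picked values as here.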
Finding tissue-specific motifs that enhance skipping
Given a TSS set of exon sequences, we searched the exons and their flanking introns for sequence patterns that occur more or less frequently than expected by chance, using constitutive exons and their flanking introns as the background, with a method similar to that of [10]. In this way, statistically significant pentamer and hexamer motifs that are over- or under-represented in the exonic and intronic regions of the skipped exons were found for brain, heart, muscle and testis.
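A simple way to picture this kind of search is to count, for every k-mer, how many foreground (TSS) sequences contain it and compare that with the fraction of background (constitutive) sequences containing it. The sketch below is an assumption-laden stand-in, not the method of [10]: it uses presence/absence counts, a pseudocount on the background rate, and a binomial z-score, and the function names and the significance threshold are hypothetical. A real hexamer scan tests 4,096 motifs and would need a multiple-testing correction.

```python
"""Sketch: score k-mers for over- or under-representation in a foreground set
of sequences (e.g. TSS exons) against a background set (e.g. constitutive exons).
Presence/absence counting, the binomial z-score and the threshold are assumptions."""
from itertools import product
from math import sqrt
from typing import Dict, List, Set, Tuple


def kmers_present(seq: str, k: int) -> Set[str]:
    """Distinct k-mers occurring anywhere in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_zscores(foreground: List[str], background: List[str], k: int) -> Dict[str, float]:
    """Binomial z-score per k-mer: > 0 over-represented, < 0 under-represented in foreground."""
    fg_sets = [kmers_present(s.upper(), k) for s in foreground]
    bg_sets = [kmers_present(s.upper(), k) for s in background]
    n = len(fg_sets)
    scores: Dict[str, float] = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # Background rate of sequences containing the k-mer; the pseudocount keeps 0 < p < 1.
        p = (sum(kmer in s for s in bg_sets) + 1) / (len(bg_sets) + 2)
        observed = sum(kmer in s for s in fg_sets)
        expected = n * p
        scores[kmer] = (observed - expected) / sqrt(expected * (1 - p))
    return scores


def split_significant(scores: Dict[str, float], threshold: float = 3.0) -> Tuple[List[str], List[str]]:
    """Over- and under-represented k-mers at a hypothetical |z| threshold."""
    over = sorted(m for m, z in scores.items() if z >= threshold)
    under = sorted(m for m, z in scores.items() if z <= -threshold)
    return over, under


if __name__ == "__main__":
    z = kmer_zscores(["ACAC", "acac"], ["GGGG", "TTTT"], k=2)
    print(split_significant(z, threshold=2.0))
```

With k=5 or k=6 and separate exonic and intronic sequence sets, the same routine would yield the four kinds of motif lists described above (exonic/intronic, over/under-represented) for each tissue.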
These TSS motifs are then used to assign sequence-based Tissue-Specific Skipping (STSS) scores to exons. STSS scores were assigned to a test set of exons that had not been used to determine the motifs, and the results were compared with existing EST evidence of exon skipping in brain, as shown in Figure 2.
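The abstract does not say how motifs are combined into an STSS score, so the following is only one plausible sketch: each motif carries a hypothetical weight (positive for over-represented, negative for under-represented), and the score is the weighted count of motif occurrences in a sequence. The function names and example weights are illustrative assumptions; a fuller version would score exonic and intronic regions separately with their own motif sets.

```python
"""Sketch: a sequence-based skipping score from weighted motifs.
The scoring rule (weighted count of motif occurrences) and all weights are hypothetical."""
from typing import Dict


def count_overlapping(seq: str, motif: str) -> int:
    """Number of occurrences of motif in seq, counting overlaps."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def stss_score(seq: str, motif_weights: Dict[str, float]) -> float:
    """Weighted motif count: positive weights for over-represented motifs,
    negative weights for under-represented ones."""
    seq = seq.upper()
    return sum(w * count_overlapping(seq, m.upper()) for m, w in motif_weights.items())


if __name__ == "__main__":
    weights = {"GCATG": 1.0, "AAAAA": -1.0}  # illustrative weights, not fitted motifs
    print(stss_score("tgcatgcatg", weights))
```

Scoring held-out exons this way, rather than the exons used to find the motifs, is what keeps the comparison with EST evidence from being circular.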
In Figure 2, the cumulative distribution curves compare STSS scores, computed with the motifs of each tissue in turn, for exons that EST data show to be skipped in brain and for exons that EST data show to be constitutive. With brain motifs, exons skipped in brain appear to receive higher scores than constitutive exons, indicating that brain motifs can be used to predict brain-specific skipping; the non-brain motifs cannot, which points to the tissue-specificity of these motifs.
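The comparison in Figure 2 rests on empirical cumulative distribution functions. The sketch below computes the points of such a curve and, as a single-number summary of how far apart two curves are, the two-sample Kolmogorov-Smirnov statistic D. The KS statistic is our choice of summary, not something the abstract reports, and the function names are hypothetical.

```python
"""Sketch: empirical CDFs of two score sets and the largest vertical gap between them.
Using the two-sample Kolmogorov-Smirnov statistic as the summary is an assumption."""
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf_points(scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(score, cumulative fraction) pairs for the sorted scores, drawable as a step curve."""
    ordered = sorted(scores)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS D: the maximum |F_a(x) - F_b(x)| over all observed scores x."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


if __name__ == "__main__":
    skipped = [4.0, 5.0, 6.0]       # placeholder scores, not data from the abstract
    constitutive = [1.0, 2.0, 3.0]
    print(ks_statistic(skipped, constitutive))
```

If exons skipped in brain score higher, their curve sits to the right of (below) the constitutive curve and D is large; for motifs with no predictive value the two curves overlap and D stays near zero.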
Figure 1. (a) Original hybridisation intensities for the EPB41L2 gene; darker colors indicate lower hybridisation intensities. (b) Normalised hybridisation intensities for the EPB41L2 gene. (c) Binned intensity values for the EPB41L2 gene. The darkest bins correspond to exons predicted to be skipped, and the lighter bins to exons predicted to be included.
Figure 2. The blue curves show the cumulative distribution of STSS scores assigned to exons that have no EST evidence of skipping in any tissue. The red curves show the cumulative distribution of STSS scores for exons that ESTs show to be skipped in brain. Each pair of curves uses the motifs of one of the four tissues.
Research at CBCL is supported by ONR, DARPA, NSF, Kodak, Siemens, Daimler-Chrysler, ATR, AT&T, Compaq, Honda and CRIEPI.
References
[1] Faustino NA, Cooper TA. Identification of putative new splicing targets for ETR-3 using sequences identified by systematic evolution of ligands by exponential enrichment. Mol Cell Biol. 2005 Feb.
[2] Buckanovich RJ, Posner JB, Darnell RB. Nova, the paraneoplastic Ri antigen, is homologous to an RNA-binding protein and is specifically expressed in the developing motor system. Neuron. 1993 Oct;11(4):657-72.
[3] Wang Z, Rolish ME, Yeo G, Tung V, Mawson M, Burge CB. Systematic identification and analysis of exonic splicing silencers. Cell. 2004 Dec;119(6):831-45.
[4] Fairbrother WG, Yeh RF, Sharp PA, Burge CB. Predictive identification of exonic splicing enhancers in human genes. Science. 2002 Aug;297(5583):1007-13.
[5] Zhang XH, Kangsamaksin T, Chao MS, Banerjee JK, Chasin LA. Exon inclusion is dependent on predictable exonic splicing enhancers. Mol Cell Biol. 2005 Aug;25(16):7323-32.
[6] Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR. ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res. 2003 Jul;31(13):3568-71.
[7] Minovitsky S, Gee SL, Schokrpur S, Dubchak I, Conboy JG. The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons. Nucleic Acids Res. 2005 Feb 3;33(2):714-24.
[8] Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O, Zhang MQ. An alternative-exon database and its statistical analysis. DNA Cell Biol. 2000 Dec:739-56.
[9] Johnson J, et al. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003 Dec;302.
[10] Kreiman G. Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res. 2004;32(9):2889-900.
[11] Jensen KB, Dredge BK, Stefani G, Zhong R, Buckanovich RJ, Okano HJ, Yang YY, Darnell RB. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron. 2000 Feb;25(2):359-71.