CSAIL Research Abstracts - 2005

Comparative Modeling of Mainly-Beta Proteins by Profile Wrapping

Andrew V. McDonnell, Matthew Menke, Nathan Palmer, Jonathan King, Lenore Cowen & Bonnie Berger

Introduction

This work addresses the problem of predicting, from sequence alone, three-dimensional atomic coordinates for mainly-beta protein families of low sequence homology (less than 15%). A method is presented that uses sequence profiles along with empirically-derived pairwise beta-strand interaction probabilities to boost detection of beta-sheet propensity signal in protein sequences. We show that this new profile-based method provides adequate beta-signal amplification to facilitate an accurate alignment of a query sequence onto a super-secondary structural template. From these alignments, we are able to produce putative structures for the aligned regions of the beta-helix and beta-trefoil motifs to an average C-alpha RMSD of 2.0 angstrom and 4.5 angstrom, respectively, in leave-family-out cross-validation. Side-chain positions are also predicted for the structures.

Motivation

The comparative modeling problem is as follows. Given only the target amino acid sequence for a protein and a superfamily or fold class, first predict whether the protein folds into a three-dimensional structure that is a member of that superfamily or fold class (the structural motif recognition problem). If so, give an accurate residue-by-residue alignment of the relevant portions of the query sequence onto a super-secondary structural template, and finally produce a prediction of the structure's atomic coordinates based on this alignment. This work studies the comparative modeling of two motifs, the beta-helix and the beta-trefoil, for which producing the correct sequence-target alignment has been considered an extremely difficult problem.

Algorithm

In this work, we extend our methodology for structural motif recognition of mainly-beta structures via the statistical capture of long-distance pairwise beta-strand interactions [2] to attack the more difficult problem of comparative modeling for protein domains with low sequence similarity across families. First, we modify the algorithms to produce an alignment of a target sequence onto an abstract motif template. We then map the predicted sequence-template alignment to known three-dimensional structures and model the side chains in order to predict the full atomic coordinates. To increase the accuracy of this alignment, we generalize our algorithm to operate on a sequence profile rather than a single sequence. Sequence profiles present information about residue conservation at each position and about the types of substitutions allowed at each location [3]. Our method takes advantage of the fact that beta-strand interactions act as a stabilizing mechanism for mainly-beta structures by considering the potential mutations suggested by the target's profile when evaluating pairwise beta-strand alignments. These techniques, in conjunction with abstract structural templates derived from structural alignments, enable accurate alignment of sequences to these templates. Combining these improvements with a side-chain modeling program based on a backbone-dependent rotamer library [1], we are able to accurately model these motifs. The result is the first algorithm that can produce predicted three-dimensional coordinates for mainly-beta protein families of low (less than 15%) sequence similarity from sequence data alone.
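
The idea of scoring a strand pairing against a profile rather than a single sequence can be illustrated with a minimal sketch. This is not the BetaWrapPro implementation. The function names (build_profile, pair_score, strand_pair_score), the pseudocount scheme, and the pair_logodds table are all hypothetical, chosen only to show how per-column residue frequencies could weight a table of pairwise interaction scores.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one residue-frequency dict per column of an alignment.

    aligned_seqs: equal-length strings; characters outside `alphabet`
    (for example gap symbols) are ignored.
    """
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        if total == 0:
            # All-gap column with no pseudocount: fall back to uniform.
            profile.append({a: 1.0 / len(alphabet) for a in alphabet})
        else:
            profile.append(
                {a: (counts[a] + pseudocount) / total for a in alphabet}
            )
    return profile


def pair_score(profile, i, j, pair_logodds):
    """Expected interaction score between profile columns i and j.

    pair_logodds maps (residue_a, residue_b) to a score; missing pairs
    count as 0.0. The table itself is an assumed input, not real data.
    """
    return sum(
        p_a * p_b * pair_logodds.get((a, b), 0.0)
        for a, p_a in profile[i].items()
        for b, p_b in profile[j].items()
    )


def strand_pair_score(profile, start_a, start_b, length, pair_logodds,
                      antiparallel=False):
    """Sum pair_score over the residue pairs of two aligned strands."""
    total = 0.0
    for k in range(length):
        i = start_a + k
        # Parallel strands pair in register; antiparallel strands pair
        # the first residue of one with the last residue of the other.
        j = start_b + (length - 1 - k if antiparallel else k)
        total += pair_score(profile, i, j, pair_logodds)
    return total
```

With a single-sequence "profile" (one sequence, pseudocount 0) this reduces to looking up the score of the observed residue pair, so the profile version can be read as averaging that lookup over the substitutions seen in the family.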

Results

A program that implements this new algorithm, BetaWrapPro, was developed and is available at http://betawrappro.csail.mit.edu. On the 14 known beta-helices, this program produces accurate sequence-structure alignments for 76% of the predicted residues. For the beta-trefoils, 86% of the alignments are accurate. Given these high-quality sequence-structure alignments, BetaWrapPro is able to generate accurate three-dimensional structure predictions for the target motifs. The accurately aligned regions of the beta-helix template average less than 2.0 angstrom C-alpha RMSD, while those of the beta-trefoils average 4.5 angstrom C-alpha RMSD. We have found that BetaWrapPro is also highly successful at detecting the motifs across these families despite the lack of sequence identity, achieving 100% sensitivity at 99.5% specificity in cross-validation on the beta-helices in our data set, and 100% sensitivity at 92.5% specificity in cross-validation for the beta-trefoils. This is an improvement over the results for BetaWrap [2] (99.0% sensitivity, 95.0% specificity) and Wrap-and-Pack [4] (88.9%, 96.3%).
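
For readers unfamiliar with the metrics quoted above, the following sketch shows how C-alpha RMSD and sensitivity/specificity are conventionally computed. It is an illustration only, not the evaluation code used in this work. The function names are hypothetical, the RMSD routine assumes the two coordinate sets have already been optimally superimposed, and any counts passed to it are made up rather than taken from the experiments.

```python
from math import sqrt


def ca_rmsd(predicted, reference):
    """Root-mean-square deviation between matched C-alpha coordinates.

    Both arguments are equal-length sequences of (x, y, z) tuples in
    angstroms, assumed to be superimposed already.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return sqrt(squared / len(predicted))


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true motif members found.
    specificity = TN / (TN + FP): fraction of non-members rejected.
    """
    return tp / (tp + fn), tn / (tn + fp)
```

In a leave-family-out setting, the counts would be accumulated over held-out families before calling sensitivity_specificity, and ca_rmsd would be applied only to the residues that were aligned to the template.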

References

[1] M.J. Bower, F.E. Cohen, and R.L. Dunbrack Jr. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modeling tool. In J. Mol. Biol., 267: pp. 1268--1282, 1997.

[2] P. Bradley, L. Cowen, M. Menke, J. King, and B. Berger. BETAWRAP: Successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens. In Proc Natl Acad Sci, USA, 98: pp. 14819--14824, 2001.

[3] M. Gribskov, R. Luthy, and D. Eisenberg. Profile analysis. In Methods in Enzymology, 183: pp. 146--159, 1990.

[4] M. Menke, E. Scanlon, J. King, B. Berger, and L. Cowen. Wrap-and-pack: a new paradigm for beta structural motif recognition with application to recognizing beta trefoils. In Proceedings of the eighth annual international conference on Computational molecular biology, pp. 298--307. ACM Press, 2004.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)