Comparative Modeling of Mainly-Beta Proteins by Profile Wrapping

Andrew V. McDonnell, Matthew Menke, Nathan Palmer, Jonathan King, Lenore Cowen & Bonnie Berger

Introduction

This work addresses the problem of predicting, from sequence alone, three-dimensional atomic coordinates for mainly-beta protein families of low sequence homology (less than 15%). We present a method that uses sequence profiles together with empirically derived pairwise beta-strand interaction probabilities to boost detection of the beta-sheet propensity signal in protein sequences. We show that this new profile-based method amplifies the beta signal enough to support an accurate alignment of a query sequence onto a super-secondary structural template. From these alignments, we produce putative structures for the aligned regions of the beta-helix and beta-trefoil motifs to an average C-alpha RMSD of 2.0 angstrom and 4.5 angstrom, respectively, in leave-family-out cross-validation. Side-chain positions are also predicted for the structures.

Motivation

The comparative modeling problem is as follows: given only the target amino acid sequence of a protein and a superfamily or fold class, predict whether the protein folds into a three-dimensional structure that is a member of that superfamily or fold class (i.e., the structural motif recognition problem); if so, give an accurate residue-by-residue alignment of the relevant portions of the query sequence onto a super-secondary structural template; and finally, predict the structure's atomic coordinates based on this alignment. This work studies the comparative modeling of two motifs for which producing the correct sequence-target alignment has been considered an extremely difficult problem.
Algorithm

In this work, we extend our methodology for structural motif recognition of mainly-beta structures, which statistically captures long-distance pairwise beta-strand interactions [2], to attack the more difficult problem of comparative modeling for protein domains with low sequence similarity across families. First, we modify the algorithms to produce an alignment of a target sequence onto an abstract motif template. We then map the predicted sequence-template alignment to known three-dimensional structures and model the side chains in order to predict the full atomic coordinates. To increase the accuracy of this alignment, we generalize our algorithm to operate on a sequence profile rather than a single sequence. Sequence profiles capture residue conservation at each position and the types of substitutions allowed at each location [3]. Our method takes advantage of the fact that beta-strand interactions act as a stabilizing mechanism for mainly-beta structures by considering the potential mutations suggested by the target's profile when evaluating pairwise beta-strand alignments. These techniques, in conjunction with abstract structural templates derived from structural alignments, enable accurate alignment of sequences to these templates. Combining these improvements with a side-chain modeling program based on a backbone-dependent rotamer library [1], we are able to accurately model these motifs. The result is the first algorithm that can produce predicted three-dimensional coordinates for mainly-beta protein families of low (less than 15%) sequence similarity from sequence data alone.

Results

BetaWrapPro, a program that implements this new algorithm, is available at http://betawrappro.csail.mit.edu. On the 14 known beta-helices, this program produces accurate sequence-structure alignments for 76% of the predicted residues. For the beta-trefoils, 86% of the alignments are accurate.
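As an illustration of the profile-based strand-pair evaluation described in the Algorithm section, the following minimal Python sketch scores two candidate strands by averaging a pairwise residue score over the substitutions suggested by each profile column. It is not the BetaWrapPro implementation: the score table `PAIR_SCORE`, its toy values, the in-register parallel pairing, and all function names are assumptions made only for this example.

```python
# Hypothetical pairwise scores for two residues facing each other on
# adjacent beta-strands. Toy values for illustration only; a real table
# would be derived empirically from known structures.
PAIR_SCORE = {
    ("V", "V"): 1.0,
    ("V", "I"): 0.8,
    ("I", "I"): 0.9,
    ("V", "D"): -1.0,
    ("I", "D"): -0.9,
    ("D", "D"): -0.5,
}


def pair_score(a, b):
    """Symmetric lookup; residue pairs missing from the table score 0."""
    return PAIR_SCORE.get((a, b), PAIR_SCORE.get((b, a), 0.0))


def profile_pair_score(col_i, col_j):
    """Expected pairwise score between two profile columns.

    Each column is a dict mapping residue -> frequency (summing to 1),
    so every substitution seen in the profile contributes in proportion
    to how often it occurs.
    """
    return sum(
        p_a * p_b * pair_score(a, b)
        for a, p_a in col_i.items()
        for b, p_b in col_j.items()
    )


def strand_pair_score(profile, start_i, start_j, length):
    """Score two strands of equal length, assuming parallel, in-register
    pairing: position start_i + k faces position start_j + k."""
    return sum(
        profile_pair_score(profile[start_i + k], profile[start_j + k])
        for k in range(length)
    )


# Four profile columns; the second is split between V and D.
profile = [
    {"V": 1.0},
    {"V": 0.5, "D": 0.5},
    {"I": 1.0},
    {"V": 1.0},
]
score = strand_pair_score(profile, 0, 2, 2)
```

In this toy profile the fully conserved V/I pair contributes a favorable score, while the column split between V and D contributes nothing on balance, which shows how a profile can temper the evidence from a single sequence.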
Given these high-quality sequence-structure alignments, BetaWrapPro is able to generate accurate three-dimensional structure predictions for the target motifs. The accurately aligned regions of the beta-helix template average less than 2.0 angstrom C-alpha RMSD, while those of the beta-trefoils average 4.5 angstrom RMSD. BetaWrapPro is also highly successful at detecting the motifs across these families despite the lack of sequence identity, achieving 100% sensitivity at 99.5% specificity in cross-validation on the beta-helices in our data set, and 100% sensitivity at 92.5% specificity in cross-validation on the beta-trefoils. This is an improvement over the results for BetaWrap [2] (99.0% sensitivity, 95.0% specificity) and Wrap-and-Pack [4] (88.9%, 96.3%).

References

[1] M.J. Bower, F.E. Cohen, and R.L. Dunbrack Jr. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modeling tool. J. Mol. Biol., 267: pp. 1268-1282, 1997.

[2] P. Bradley, L. Cowen, M. Menke, J. King, and B. Berger. BETAWRAP: Successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens. Proc Natl Acad Sci USA, 98: pp. 14819-14824, 2001.

[3] M. Gribskov, R. Luthy, and D. Eisenberg. Profile analysis. Methods in Enzymology, 183: pp. 146-159, 1990.

[4] M. Menke, E. Scanlon, J. King, B. Berger, and L. Cowen. Wrap-and-pack: a new paradigm for beta structural motif recognition with application to recognizing beta trefoils. In Proceedings of the eighth annual international conference on Computational molecular biology, pp. 298-307. ACM Press, 2004.