We applied this method to secondary structure prediction for all-alpha proteins. We used a set of 300 non-homologous all-alpha proteins drawn from EVA's largest sequence-unique subset of the PDB as of the end of July 2005.
For each run of our algorithm, we randomly divided the 300 proteins into a 150-protein training set and an independent 150-protein test set. The training set is used to learn the elementary free-energies, and the test set is used to evaluate the result. Our predictor minimizes the free-energy function G' using the Viterbi algorithm on a simple 7-state finite state machine. The following table summarizes our results. The prediction accuracy is competitive with other state-of-the-art predictors that do not rely on sequence alignment data. Moreover, while some techniques require upwards of 10,000 parameters, our predictor uses only 302 parameters, all of them elementary free-energies [2].
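The 7 states and the learned elementary free-energies are not listed here, so the following is only a minimal min-sum Viterbi sketch under stated assumptions: G' is assumed to decompose into a per-residue energy `emission[t][s]`, a transition energy `transition[p][s]`, and a start energy `initial[s]` (all hypothetical names). Because G' is minimized rather than a probability maximized, the recurrence adds energies and takes minima instead of multiplying probabilities. The two-state toy model in the usage lines is illustrative and is not the predictor's 7-state machine.

```python
from typing import Sequence


def viterbi_min_energy(
    emission: Sequence[Sequence[float]],    # emission[t][s]: energy of residue t in state s (hypothetical)
    transition: Sequence[Sequence[float]],  # transition[p][s]: energy of p -> s; float("inf") forbids it
    initial: Sequence[float],               # initial[s]: energy of starting in state s
) -> tuple[float, list[int]]:
    """Return (minimum total energy, lowest-energy state path) by min-sum dynamic programming."""
    n_states = len(initial)
    if not emission:
        return 0.0, []
    # best[s]: lowest energy of any path over residues 0..t that ends in state s
    best = [initial[s] + emission[0][s] for s in range(n_states)]
    back: list[list[int]] = []  # back[t-1][s]: best predecessor of state s at residue t
    for t in range(1, len(emission)):
        new_best, pointers = [], []
        for s in range(n_states):
            # pick the predecessor that minimizes accumulated energy plus transition cost
            p = min(range(n_states), key=lambda q: best[q] + transition[q][s])
            new_best.append(best[p] + transition[p][s] + emission[t][s])
            pointers.append(p)
        best = new_best
        back.append(pointers)
    # trace back from the lowest-energy final state
    last = min(range(n_states), key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical 2-state toy (0 = helix, 1 = non-helix); NOT the paper's 7-state FSM.
energy, path = viterbi_min_energy(
    emission=[[-2.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    transition=[[0.0, 0.5], [0.5, 0.0]],
    initial=[0.0, 0.0],
)
print(energy, path)  # -1.5 [0, 1, 1]
```

In the toy run, leaving the helix state after the first residue costs a 0.5 transition penalty but avoids two unfavorable helix energies, so the minimum-energy path switches states once.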
| Statistic | SOV99α train (%) | SOV99α test (%) | Qα train (%) | Qα test (%) | Training time (s) |
|---|---|---|---|---|---|
| Best run (by SOV99α) | 76.4 | 75.1 | 79.6 | 78.6 | 123 |
| Mean of 20 runs | 75.1 | 73.4 | 79.1 | 77.6 | 162 |
| Standard deviation of 20 runs | 1.0 | 1.4 | 0.6 | 0.9 | 30 |
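SOV99α is the segment-overlap score defined in [1] and is too involved to reproduce here, but Qα is a simple per-residue measure. As a hedged sketch, assuming Qα is the two-state (helix vs. non-helix) per-residue accuracy pooled over all residues of a set, and that helix residues are marked 'H' in hypothetical label strings, it can be computed as below. Some authors instead define Qα as the fraction of observed helix residues that are predicted as helix, so [2] remains the authority on the exact convention.

```python
from typing import Sequence


def q_alpha(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Two-state (helix vs. non-helix) per-residue accuracy, in percent.

    Each string is one protein; 'H' marks a helix residue and any other
    character is treated as non-helix (an assumed labeling convention).
    Residues are pooled across all proteins before dividing.
    """
    if len(observed) != len(predicted):
        raise ValueError("need one prediction per observed protein")
    correct = 0
    total = 0
    for obs, pred in zip(observed, predicted):
        if len(obs) != len(pred):
            raise ValueError("observed and predicted lengths differ")
        for o, p in zip(obs, pred):
            # a residue counts as correct when both or neither label is helix
            correct += (o == "H") == (p == "H")
            total += 1
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


print(q_alpha(["HHHHC", "CCHHH"], ["HHHCC", "CCHHH"]))  # 90.0
```

Pooling residues weights long proteins more heavily than averaging a per-protein score would; either choice changes the reported percentage slightly.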
This work is a promising first pass at using support vector machine (SVM) techniques to learn the elementary free-energies needed to predict protein secondary structure. The method is general and can be extended beyond the all-alpha case described here. In future work, we plan to extend it to super-secondary structure prediction, generating contact maps of individual hydrogen bonds in beta sheets.
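One standard way to cast this learning problem is the margin-rescaling structural SVM of [4], rewritten here for energy minimization. This is a sketch, not necessarily the exact formulation used in [2], and it assumes G' is linear in the vector $\mathbf{w}$ of elementary free-energies, with $\Psi(x,y)$ counting how often each elementary term occurs when sequence $x$ adopts structure $y$:

$$
G'(x,y)=\mathbf{w}^{\top}\Psi(x,y),\qquad
\min_{\mathbf{w},\ \boldsymbol{\xi}\ge 0}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}+\frac{C}{n}\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
G'(x_i,y)-G'(x_i,y_i)\ \ge\ \Delta(y_i,y)-\xi_i\quad\forall i,\ \forall y\neq y_i .
$$

Here $(x_i, y_i)$ are the $n$ training proteins with their known structures, $\Delta$ is a structural loss (for example, the number of mislabeled residues), and $C$ trades margin against training error. Each constraint requires the true structure to have lower free energy than every alternative, by a margin that grows with how wrong the alternative is. When $\Delta$ decomposes over residues, the most violated constraint can be found by minimizing $G'(x_i,y)-\Delta(y_i,y)$ with the same Viterbi machinery used for prediction.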
[1] A. Zemla, C. Venclovas, K. Fidelis, and B. Rost. A Modified Definition of Sov, a Segment-Based Measure for Protein Secondary Structure Prediction Assessment. Proteins, 34(2), 1999.
[2] B. Gassend, C. W. O'Donnell, W. Thies, A. Lee, M. van Dijk, and S. Devadas. Secondary Structure Prediction of All-Helical Proteins Using Hidden Markov Support Vector Machines. MIT CSAIL Technical Report 1003 (MIT-LCS-TR-1003).
[3] B. Gassend, C. W. O'Donnell, W. Thies, A. Lee, M. van Dijk, and S. Devadas. Learning Biophysically-Motivated Parameters for Alpha Helix Prediction. Poster at the 10th Annual International Conference on Research in Computational Molecular Biology, Venice Lido, Italy, 2006.
[4] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Machine Learning for Interdependent and Structured Output Spaces. In Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.
Computer Science and Artificial Intelligence Laboratory (CSAIL) The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA tel:+1-617-253-0073 - publications@csail.mit.edu