CSAIL Research Abstract

DNA-Protein Binding and Games

Luis Perez-Breva, Luis E. Ortiz, Tommi Jaakkola & Arvind Jammalamadaka

The Problem

Formulate DNA-Protein binding as a game-theoretic problem, to model the mechanisms responsible for gene expression and regulation.

Motivation

The physical and chemical mechanisms responsible for DNA-Protein interaction are generally known. These mechanisms include conformational changes induced by polarity and complex formation, formation of hydrogen bonds, and, in general, chemical affinity for certain motifs in DNA. They take place locally within certain regions of the DNA once all the participating proteins are on site. However, gene regulation takes place at both a local and a global level, since many sites compete for the same proteins, and the general framework for gene regulation is still a subject of debate. The set of genes and proteins for which it has been possible to establish a network of interactions is still small, and even when such networks exist, the actual mechanisms of the interaction are only partially known. For most of the networks we have only a coarse causal description of the sequence of events that take place.

For example, in eukaryotes, RNA polymerase II seems to have a very low affinity for binding DNA directly at the initiation sites. Instead, transcription relies on the formation of protein complexes that increase the affinity of RNA polymerase II for those sites, thus allowing DNA transcription. These complexes act as gene activators. Conversely, other proteins may bind DNA at certain (possibly different) regions to prevent the formation of the complexes that would enable transcription, and thus act as gene repressors. The formation of these complexes is specific to each site in the DNA, and so are the mechanisms that activate or repress the genes. All these processes are complemented by others that, after transcription, may prevent or enhance the transit of RNA from the nucleus to the cytosol, where translation finally occurs.

Computational and probabilistic methods have enabled the analysis of complex regulatory networks based on protein-protein interactions that hint at molecular pathways. However, these methods do not always examine DNA-protein interactions in detail. There are no computational models available to make predictions about the mechanisms that take place in DNA-protein interaction. Without such models, causal relationships in regulatory networks are regarded as binary, and it is hard to make quantitative predictions about the regulatory networks.

Our objective is to model DNA-protein interaction as a competition between proteins and between sites, in a way that allows us to make quantitative predictions about the mechanism of interaction between proteins and DNA.

Given a set of sites and a set of proteins, each site competes with other sites to attract copies of certain proteins. From the protein standpoint, each protein competes with the other proteins in the surroundings of a given site to bind to that site. This setting captures both the local nature of the bindings and the global competition for resources. We note that this analysis can be extended to model other binding strategies such as complex formation and protein coordination.

Previous Work

To the best of our knowledge, modeling DNA-Protein interaction as a competition is a new approach to the problem of understanding the mechanisms that give rise to biological processes. Traditionally, research in this field [2-9] has focused on the discovery of networks of interaction. In general this is done using causal Bayesian networks to interpret results from micro-array experiments. The Bayesian networks yield a network of interactions between proteins, and between proteins and DNA, that allows making predictions about molecular pathways. In general no attempt is made to capture the details of such interactions. Instead, the focus is on laying down a network that allows making predictions about the molecular pathways that will be activated given a set of initial conditions, such as the presence or absence of certain proteins.

Our approach addresses a similar question but focuses on the details of the interaction between proteins and DNA. Its ultimate goal is to determine the conditions that are required for the interactions to occur, and to make quantitative predictions about the chemical species that take part in the process. To some extent, this approach complements previous work, as it can incorporate the knowledge from causal Bayesian networks as conditions for the competition between sites and between proteins. However, this model adds fine-grained detail to regulatory networks: it extends previous approaches by introducing the notion of locality of the interactions, and by specifically addressing the mechanisms of the interaction between DNA and proteins.

Approach

At first we consider only DNA-Protein interactions that do not involve the formation of complexes or coordination. Complexes and coordination will be added to the model later, in the form of constraints that link the action of a subset of the proteins at the corresponding sites. We further restrict our attention to the processes that occur in the DNA as part of transcription (gene repression or expression), and we ignore other processes that involve actions on the RNA resulting from transcription.

In this setting, we view proteins and sites as agents in an economy. Sites are viewed as agents with a certain affinity for certain proteins. Over time, sites compete with other sites to attract specific proteins to their surroundings. This introduces locality of the interaction. Proteins are viewed as agents with an affinity for binding a given site, which depends on the chemical equilibrium constant. All the proteins in the surroundings of one site will compete for binding to that site. This model, a double competition between sites and proteins, is based on the theory of competitive equilibrium in abstract economies, which was introduced in economic theory as a generalization of a game in [1].
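As an illustration of the local half of this competition, the sketch below computes how proteins near a single site would share that site under standard mutually exclusive (competitive) binding. It is a minimal sketch and not the model proposed here: the function name `site_occupancy`, the dictionary layout, and the treatment of each equilibrium constant as an association constant multiplied by a local concentration are all assumptions made for the example.

```python
def site_occupancy(assoc_constants, local_conc):
    """Share of time each protein occupies one site (hypothetical helper).

    assoc_constants: {protein: association constant K for this site}
    local_conc:      {protein: free concentration near the site}
    Assumes at most one protein can be bound to the site at a time.
    """
    # Statistical weight of each bound state: K_i * c_i.
    weights = {p: k * local_conc.get(p, 0.0) for p, k in assoc_constants.items()}
    # The empty site has weight 1, so the normalizer is 1 + sum of weights.
    z = 1.0 + sum(weights.values())
    occupancy = {p: w / z for p, w in weights.items()}
    return occupancy, 1.0 / z  # per-protein occupancy, probability site is empty


if __name__ == "__main__":
    occ, empty = site_occupancy({"A": 2.0, "B": 1.0}, {"A": 1.0, "B": 1.0})
    print(occ, empty)  # {'A': 0.5, 'B': 0.25} 0.25
```

With equal local concentrations, the protein with the larger constant takes the larger share of the site, which is the sense in which proteins "compete" locally in this toy version.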

The procedure we follow to define the model is:

First we define a series of utility functions that each agent of the market aims to maximize. Under certain conditions, these utility functions lead to a (possibly unique) competitive equilibrium over time (initially we focus on competitive equilibria that exclude the notion of price). After assessing, from a biological standpoint, the validity of the conditions that guarantee the existence of an equilibrium, we introduce algorithms to find such an equilibrium. These algorithms must be combined with methods to estimate site affinities for proteins, as well as equilibrium constants, which are generally unknown for most of the protein-DNA interactions studied. The final result is a dynamic equilibrium (the expression of a stationary state) that explains the mechanism by which certain genes come to be expressed. The final step is to evaluate the goodness of the model by comparing its predictions with real data.
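The utility functions and equilibrium algorithms are not specified in this abstract, so the following sketch swaps in a much simpler stand-in: a plain mass-action fixed-point iteration that finds a stationary allocation of proteins across several sites. It only illustrates the kind of global stationary state the procedure aims for. The function `solve_binding_equilibrium`, the data layout, the assumption of mutually exclusive binding, and the units (each site counts as one unit of concentration) are hypothetical choices for the example.

```python
def solve_binding_equilibrium(totals, assoc, tol=1e-10, max_iter=10_000):
    """Stationary free/bound amounts under mass action (illustrative only).

    totals: {protein: total amount available}
    assoc:  {site: {protein: association constant K}}; every protein listed
            here must also appear in `totals`.
    """
    def site_denominators(free):
        # 1 + sum_k K_kj * free_k for every site j.
        return {s: 1.0 + sum(k * free[p] for p, k in ks.items())
                for s, ks in assoc.items()}

    free = dict(totals)  # start with every protein unbound
    for _ in range(max_iter):
        denom = site_denominators(free)
        # Mass balance per protein: free + sum over sites of bound = total.
        new_free = {
            p: totals[p] / (1.0 + sum(ks.get(p, 0.0) / denom[s]
                                      for s, ks in assoc.items()))
            for p in totals
        }
        change = max(abs(new_free[p] - free[p]) for p in totals)
        free = new_free
        if change < tol:
            break

    denom = site_denominators(free)
    bound = {s: {p: k * free[p] / denom[s] for p, k in ks.items()}
             for s, ks in assoc.items()}
    return free, bound


if __name__ == "__main__":
    totals = {"activator": 2.0, "repressor": 1.0}
    assoc = {"site1": {"activator": 5.0, "repressor": 1.0},
             "site2": {"activator": 0.5, "repressor": 4.0}}
    free, bound = solve_binding_equilibrium(totals, assoc)
    print(free)
    print(bound)
```

In this stand-in, the global competition appears through the shared free pool: a protein bound at one site is no longer available to the others. A game-theoretic formulation would replace the mass-balance update with each agent's utility maximization.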

Impact

Improving our models of the mechanisms of gene regulation will significantly improve our understanding of protein pathways and the underlying biological processes. This model will complement current models of protein networks by incorporating detail on the causes of the interaction between the proteins and the DNA. In particular, the added level of detail in the predictions about the mechanisms by which DNA and proteins interact will have an impact on the design of experiments, by incorporating quantitative predictions.

Future Work

Natural extensions to this work include (1) studying cases with protein coordination and complex formation, and (2) introducing the notion of an externally fixed price that will be equivalent to setting the chemical and biological conditions (level of glucose, concentration of ATP, temperature, RNA transport, ...) that give rise to different stationary equilibria, by altering the equilibrium constants and/or the affinities of each site.
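As one concrete example of how an external condition can alter an equilibrium constant, the sketch below uses the standard thermodynamic relation K = exp(-ΔG°/(RT)) to show the effect of temperature. This is a textbook relation, not part of the proposed model: the function name `equilibrium_constant` is hypothetical, the free-energy value is an arbitrary illustration, and ΔG° is treated as independent of temperature, which is a simplification.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def equilibrium_constant(delta_g, temperature):
    """Dimensionless binding constant from a standard free energy of binding.

    delta_g:     standard free energy change in J/mol (negative favors binding)
    temperature: absolute temperature in kelvin
    Assumes delta_g does not itself change with temperature.
    """
    return math.exp(-delta_g / (GAS_CONSTANT * temperature))


if __name__ == "__main__":
    delta_g = -40_000.0  # arbitrary illustrative value, J/mol
    for kelvin in (298.15, 310.15):
        print(kelvin, equilibrium_constant(delta_g, kelvin))
```

Under this simplification, a favorable binding interaction weakens as temperature rises, so changing the external condition shifts the constants that enter the competition and hence the resulting stationary state.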

References

[1] K. J. Arrow and G. Debreu. Existence of an equilibrium for a competitive economy. Econometrica, Vol. 22 (3), pp. 265–290 (1954).

[2] Li et al. A Map of the Interactome Network of the Metazoan C. elegans. Science, Vol. 303 (Jan 2004).

[3] Giot et al. A Protein Interaction Map of Drosophila melanogaster. Science, Vol. 302 (Dec 2003).

[4] Gavin et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, Vol. 415 (Jan 2002).

[5] Ho et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, Vol. 415 (Jan 2002).

[6] Tong, Lesage, et al. Global Mapping of the Yeast Genetic Interaction Network. Science, Vol. 303 (Feb 2004).

[7] A. Phizicky, P. Bastiaens, H. Zhu, M. Snyder, and S. Fields. Protein analysis on a proteomic scale. Nature, Vol. 422 (March 2003).

[8] von Mering et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, Vol. 417 (May 2002).

[9] E. Segal, H. Wang and D. Koller. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, Vol. 19 (1), pp. i264–i272 (Feb 2003).

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)