CSAIL Research Abstracts - 2005

Bayesian Binding Detection of Transcription Factors, PolII, and Histones based on CHIP-chip Data

Yuan (Alan) Qi, Kenzie MacIsaac, Alex Rolfe, Robin Dowell, Tommi S. Jaakkola & David K. Gifford


Chromatin Immunoprecipitation (CHIP)-chip experiments enable genome-wide location analysis of DNA-binding proteins in yeast, mouse, and human cells. We propose a novel statistical method to identify the binding locations of transcription factors, PolII, and histones from CHIP-chip data.


CHIP-chip data poses the following challenges for estimating binding locations:

First, the data is noisy. The noise may come from experimental procedures, biological mechanisms, and other unknown sources. The presence of noise and uncertainty necessitates probabilistic modeling.

Second, there is interference between nearby probes used in CHIP-chip experiments. This interference makes analyzing each probe in isolation insufficient.

Third, due to the high experimental cost, only a limited number of replicate experiments have been performed. The small amount of data makes the analysis more difficult.


To address these challenges, we formulate the binding problem as a probabilistic graphical model, which captures not only the noise in the data, but also the interference between different probes.
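To make the idea of probe interference concrete, the following is a minimal toy sketch and not the model used in this work. It assumes, purely for illustration, that a binding event contributes to neighboring probes with an exponentially decaying weight and that the noise is Gaussian. The function name and the parameters `spread` and `noise_sd` are hypothetical.

```python
import numpy as np

def simulate_probe_intensities(bound, strength, spread=1.0, noise_sd=0.3, seed=0):
    """Toy generative model: each probe reading is the sum of contributions
    from binding events at nearby probes, plus Gaussian noise.
    The exponential decay and the noise model are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n_probes = len(bound)
    positions = np.arange(n_probes)
    # influence[i, j]: how strongly a binding event at probe j shows up at probe i
    distance = np.abs(positions[:, None] - positions[None, :])
    influence = np.exp(-distance / spread)
    # Noise-free signal: only bound sites contribute, scaled by their strength
    clean = influence @ (bound * strength)
    observed = clean + rng.normal(0.0, noise_sd, size=n_probes)
    return observed, clean

# Two binding events (probes 2 and 7) among ten probes
bound = np.array([0, 0, 1, 0, 0, 0, 0, 1, 0, 0], dtype=float)
strength = np.array([0, 0, 2.0, 0, 0, 0, 0, 1.0, 0, 0])
observed, clean = simulate_probe_intensities(bound, strength)
```

In this sketch, probes 1 and 3 carry nonzero signal even though nothing is bound there, which is why a per-probe threshold would be misleading and a joint model over neighboring probes is needed.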

Then, we jointly estimate the unknown binding events using Bayesian inference on this graphical model. When only a small amount of data is available, Bayesian inference tends to outperform maximum likelihood estimation.
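The small-sample behavior can be illustrated with a simple simulation. This is a sketch under stated assumptions, not an experiment from this work: a single binding strength is drawn from a Gaussian prior and observed through a few noisy replicates, and the maximum likelihood estimate (the sample mean) is compared with the conjugate posterior mean. The function name and all parameter values are hypothetical, and simulating from the prior itself is the case most favorable to the Bayesian estimate.

```python
import numpy as np

def compare_estimators(n_replicates=2, prior_sd=1.0, noise_sd=1.0,
                       n_trials=20000, seed=0):
    """Return (mse_mle, mse_bayes) for estimating a Gaussian mean
    from n_replicates noisy observations, averaged over n_trials."""
    rng = np.random.default_rng(seed)
    # True strengths drawn from the assumed prior N(0, prior_sd^2)
    true_strength = rng.normal(0.0, prior_sd, size=n_trials)
    noise = rng.normal(0.0, noise_sd, size=(n_trials, n_replicates))
    data = true_strength[:, None] + noise
    # Maximum likelihood estimate: the sample mean of the replicates
    mle = data.mean(axis=1)
    # Conjugate Gaussian posterior mean: shrinks the sample mean toward 0
    posterior_precision = n_replicates / noise_sd**2 + 1.0 / prior_sd**2
    posterior_mean = (data.sum(axis=1) / noise_sd**2) / posterior_precision
    mse_mle = float(np.mean((mle - true_strength) ** 2))
    mse_bayes = float(np.mean((posterior_mean - true_strength) ** 2))
    return mse_mle, mse_bayes

mse_mle_small, mse_bayes_small = compare_estimators(n_replicates=2)
mse_mle_large, mse_bayes_large = compare_estimators(n_replicates=200)
```

With two replicates the posterior mean has a visibly lower mean squared error than the sample mean. With two hundred replicates the two estimates nearly coincide, so the benefit of the prior is concentrated in the small-data regime.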

Exact Bayesian estimation on this graphical model is computationally prohibitive. Therefore, we develop an efficient approximate Bayesian inference algorithm based on the expectation propagation (EP) framework. This new algorithm estimates both discrete binding events (bound or not) and continuous binding strength.
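The cost of exact inference can be seen in a brute-force sketch. The code below is not the EP algorithm. It computes exact posterior marginals by enumerating every binding configuration of a toy model, which is the computation that an approximate method has to avoid. The model is an illustrative assumption: a fixed, known binding strength, an exponentially decaying interference kernel, Gaussian noise, and an independent Bernoulli prior on each site. All names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def interference_kernel(n_probes, spread=1.0):
    """influence[i, j]: contribution of a binding event at probe j to probe i."""
    positions = np.arange(n_probes)
    return np.exp(-np.abs(positions[:, None] - positions[None, :]) / spread)

def exact_binding_marginals(observed, strength=2.0, spread=1.0,
                            noise_sd=0.3, prior_bound=0.1):
    """Exact P(bound_i = 1 | observed) by enumerating all 2^n configurations."""
    n = len(observed)
    influence = interference_kernel(n, spread)
    configs, log_weights = [], []
    for config in itertools.product([0, 1], repeat=n):
        b = np.array(config, dtype=float)
        predicted = influence @ (b * strength)
        # Gaussian log-likelihood (up to a constant) of the observed readings
        log_lik = -0.5 * np.sum((observed - predicted) ** 2) / noise_sd**2
        # Independent Bernoulli prior on each binding indicator
        log_prior = np.sum(b * np.log(prior_bound)
                           + (1.0 - b) * np.log(1.0 - prior_bound))
        configs.append(b)
        log_weights.append(log_lik + log_prior)
    log_weights = np.array(log_weights)
    # Normalize in a numerically stable way
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()
    return weights @ np.array(configs), len(configs)

# Noise-free readings from a single binding event at probe 3 among 8 probes
n_probes = 8
truth = np.zeros(n_probes)
truth[3] = 1.0
observed = interference_kernel(n_probes) @ (truth * 2.0)
marginals, n_configs = exact_binding_marginals(observed)
```

Eight probes already require 256 configurations, and each additional probe doubles the count, so enumeration is out of reach at the scale of a genome-wide array. The sketch also fixes the binding strength, whereas jointly estimating a continuous strength would add integrals on top of the sum.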


We have implemented the algorithm and applied it to GCN4, PolII, and histone binding data. We have also used the resulting binding posterior estimates to guide motif discovery, with encouraging results.


We are currently using the estimated binding posterior distributions to guide motif discovery in higher eukaryotes.

Research Support

This work was supported in part by NIH grant GM69676.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)