Bayesian Binding Detection of Transcription Factors, PolII, and Histones based on CHIP-chip Data

Yuan (Alan) Qi, Kenzie MacIsaac, Alex Rolfe, Robin Dowell, Tommi S. Jaakkola & David K. Gifford

Motivation

Chromatin Immunoprecipitation (CHIP)-chip experiments enable genome-wide location analysis of DNA-binding proteins in yeast, mouse, and human cells. We propose a novel statistical method to identify the binding locations of transcription factors, PolII, and histones from CHIP-chip data.

Challenges

CHIP-chip data poses three challenges for estimating binding locations. First, the data is noisy. The noise may come from experimental procedures, biological mechanisms, and other unknown sources. The presence of noise and uncertainty necessitates probabilistic modeling. Second, there is interference between nearby probes used in CHIP-chip experiments, so analyzing each probe individually is insufficient. Third, because of the high experimental cost, only a limited number of replicate experiments have been done. The small amount of data makes the analysis more difficult.

Approach

To address these challenges, we formulate the binding problem as a probabilistic graphical model, which models not only the noise in the data but also the interference between different probes. We then jointly estimate the unknown binding events using Bayesian inference on this graphical model. When only a small amount of data is available, Bayesian inference tends to outperform maximum likelihood estimation. Exact Bayesian estimation on this graphical model is computationally prohibitive, so we develop an efficient approximate Bayesian inference algorithm based on the expectation propagation (EP) framework. This new algorithm estimates both the discrete binding events (bound or not) and the continuous binding strength.

Progress

We have implemented the algorithm and run it on GCN4, PolII, and histone binding data.
We have also used the resulting posterior binding estimates to guide motif discovery, with encouraging results.

Future

We are currently using the estimated binding posterior distributions to guide motif discovery in higher eukaryotes.

Research Support

This work was supported in part by NIH grant GM69676.