Abstracts - 2006
Inferring Gene Interaction Models of BCR Gene Expression in CLL
Corey Kemper, John Fisher & Laurent Vallat
Currently, the disease process for B cell chronic lymphocytic leukemia (B-CLL) is not well understood at the genomic level. We are attempting to infer the disease-dependent regulatory network for B-CLL given time series gene expression data. This requires identifying the relevant genes and then determining how these genes interact with one another over time. Given our extremely limited amount of patient data, we must assume a level of simplicity and sparsity in our models.
Gene regulatory networks have been studied for simple organisms, such as yeast. Co-expression networks, which relate genes that behave similarly under varying experimental conditions, have been built for a variety of species and cell types. These methods aid in understanding how the entire cell regulatory system works, but they are not disease specific and generally fail to utilize time series data. Genomic research specific to CLL tends to proceed on a single-gene basis, i.e. finding a genetic mutation that is common to a patient subgroup.
CLL results from an acquired injury, of unknown origin, to the DNA of a single cell in the bone marrow, creating an apoptosis defect. This change in the cell's DNA confers a growth and survival advantage on the cell, and these abnormal lymphocytes accumulate in the blood. CLL is treated by chemotherapy, radiation therapy, biological therapy, or bone marrow transplantation, depending on the exact diagnosis and the progression of the disease. A deeper understanding of the underlying molecular mechanisms that make the cells behave abnormally will allow for better treatment decisions. Additionally, depending on the structure of the network, we may eventually be able to target specific genes in order to correct the defect in the cell.
The first step is acquiring the data. The B cell receptor (BCR) is stimulated in isolated B cells. Using Affymetrix microarray chips, we measure the difference in RNA expression levels for 54,613 genes at 4 timepoints after stimulation.
We now identify genes that are biologically relevant to the BCR pathway. Because stimulating the BCR virtually resets the clock of the cell, we are able to classify genes in terms of their expression profiles over time. We define 5 classes of genes. Classes 1 through 4 are composed of the genes that are predominantly expressed at the corresponding timepoint, e.g. Class 1 genes have high expression at the first timepoint and relatively lower expression at the other three timepoints. The final class is the ‘background’, which is composed of genes that are not differentially activated with BCR stimulation. We use the Expectation-Maximization Algorithm to estimate the parameters of the distributions and the class memberships of each gene.
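The classification step can be illustrated with a minimal sketch of Expectation-Maximization for a five-component Gaussian mixture over four-timepoint expression profiles. The diagonal covariances, the initialization (one component peaked at each timepoint plus a flat background component), and the names `em_gene_classes` and `var_floor` are assumptions made for illustration; the abstract does not specify the form of the distributions actually used.

```python
import numpy as np

def em_gene_classes(X, n_iter=50, var_floor=1e-3):
    """Fit a diagonal-covariance Gaussian mixture with EM (illustrative sketch).

    X : (n_genes, T) array of expression changes at T timepoints.
    Returns (resp, labels, means); components 0..T-1 start out peaked at the
    corresponding timepoint and component T starts out as a flat 'background'.
    """
    n, T = X.shape
    K = T + 1
    # Assumed initialization: one peaked mean per timepoint, plus a zero mean.
    means = np.vstack([np.eye(T) * X.std(), np.zeros((1, T))])
    var = np.full((K, T), X.var() + var_floor)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space.
        diff = X[:, None, :] - means[None, :, :]            # (n, K, T)
        log_p = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=2)
        log_p += np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        Nk = resp.sum(axis=0) + 1e-12
        pi = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkt->kt", resp, diff ** 2) / Nk[:, None] + var_floor
    labels = resp.argmax(axis=1)
    return resp, labels, means
```

Here `resp[i, k]` is the estimated probability that gene `i` belongs to class `k`, and `labels` takes the most probable class for each gene.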
Given that we have determined which genes we would like to use to infer the network and their class labels, we compute causal, linear matrices that relate one class to another. Using every possible gene pair in Class 1 and Class 2, we compute a matrix F12 that minimizes the squared error of Class 1 genes predicting Class 2 genes. This is repeated for F13, F14, F23, F24, and F34. At this point, every gene has an associated prediction, depending on the label of the gene it is predicting. Those predictions are then combined to best estimate each output gene, while limiting the number of possible inputs with a sparsity constraint. Three sets of networks, one for each patient subgroup, are shown below.
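As a rough illustration of the two steps in this paragraph, the sketch below fits a least-squares matrix over all gene pairs and then picks a small set of inputs for one output gene. Several details are assumptions, not taken from the abstract: "causal" is read as lower triangular (the output at timepoint t depends only on inputs at timepoints up to t), the sparsity constraint is approximated by greedy forward selection, and the function names `fit_causal_map` and `select_sparse_inputs` are hypothetical.

```python
import numpy as np

def fit_causal_map(X, Y):
    """Least-squares lower-triangular F with y_j ~ F @ x_i over all gene pairs.

    X : (n_in, T) profiles of the input class (e.g. Class 1).
    Y : (n_out, T) profiles of the output class (e.g. Class 2).
    """
    n_in, T = X.shape
    n_out = Y.shape[0]
    Xp = np.repeat(X, n_out, axis=0)   # each input profile paired with ...
    Yp = np.tile(Y, (n_in, 1))         # ... every output profile
    F = np.zeros((T, T))
    for t in range(T):
        # Row t may only use inputs at timepoints 0..t (assumed causality).
        coef, *_ = np.linalg.lstsq(Xp[:, : t + 1], Yp[:, t], rcond=None)
        F[t, : t + 1] = coef
    return F

def select_sparse_inputs(P, y, k):
    """Greedily choose at most k rows of P whose weighted sum best matches y.

    P : (n_inputs, T) predictions of the output gene, one row per input gene.
    y : (T,) observed profile of the output gene.
    Returns (chosen_indices, weights).
    """
    chosen, weights = [], np.zeros(0)
    for _ in range(k):
        best = None
        for i in range(P.shape[0]):
            if i in chosen:
                continue
            A = P[chosen + [i]].T                      # (T, n_selected)
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.sum((A @ w - y) ** 2)
            if best is None or err < best[0]:
                best = (err, i, w)
        if best is None:                               # no inputs left to add
            break
        chosen.append(best[1])
        weights = best[2]
    return chosen, weights
```

In this sketch, an input gene `x` from Class 1 would predict an output gene in Class 2 as `fit_causal_map(X1, X2) @ x`, and the rows of `P` collect such predictions from all candidate inputs.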
For both the clustering and the predictive modeling, we have used permutation statistics to compute the significance of the results. Additionally, by randomly perturbing the input data, we estimate the stability of the edges in the networks. In each of the three sets of networks, we see that there are few genes that have many outgoing edges. We are currently investigating these genes that appear to be ‘hub’ genes (i.e. those that have > 50 outgoing edges) to see what, if any, role they have been found to play in previous biological experiments.
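The significance, stability, and hub checks described above might be computed along the following lines. This is a sketch under stated assumptions: the statistic is taken to be a prediction error (lower is better), a network is represented as a 0/1 adjacency matrix with `adjacency[i, j] = 1` for an edge from gene i to gene j, and the function names are hypothetical. The abstract does not say what is permuted or how the perturbations are generated, so those steps are left to the caller.

```python
import numpy as np

def permutation_pvalue(observed_error, null_errors):
    """Fraction of permuted-data errors at least as good as the observed one.

    Uses the common (1 + count) / (1 + n) correction so the p-value is never 0.
    """
    null_errors = np.asarray(null_errors, dtype=float)
    return (1 + np.sum(null_errors <= observed_error)) / (1 + null_errors.size)

def edge_stability(networks):
    """Per-edge fraction of perturbed-data runs in which the edge appears.

    networks : sequence of (n_genes, n_genes) 0/1 adjacency matrices.
    """
    return np.mean(np.stack(networks).astype(float), axis=0)

def hub_genes(adjacency, min_out_edges=50):
    """Indices of genes with more than min_out_edges outgoing edges."""
    out_degree = np.asarray(adjacency).astype(bool).sum(axis=1)
    return np.flatnonzero(out_degree > min_out_edges)
```

For example, `hub_genes(adjacency)` with the default threshold returns the genes with more than 50 outgoing edges, matching the definition of hub genes used here.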
Acquiring data on the same set of genes when other pathways are active will allow us to further refine what parts of the network are specific to BCR stimulation. Also, further experiments on subsets of the networks (e.g. silencing of particular genes) will allow for biological validation. Finally, the ultimate goal of the project is identifying target genes that relate to the apoptosis defect, predicting the results of altering those genes, and comparing the predictions to the actual experiments.