Lab Introduction and Succinct Description of Research Activities
New technologies allow for comprehensive characterization of the molecular processes that cause a healthy cell to become cancerous. These technologies produce vast amounts of data. We develop computational methods that will help to extract insights and knowledge from such data.
Cancer can be considered a gene regulatory disease. Normal regulation of genes permits the development and maintenance of a healthy human being. Abnormal regulation leads to various diseases. Cancer cells are maintained in a specific pathological state by gene regulatory circuits. Transcription factors are key elements of such circuits in that they control the expression of other genes while being regulated themselves by the products of other genes. Our research aims at an understanding of transcriptional regulatory mechanisms in general, with a particular focus on those affected by the genetic lesions that cause cancer.
We are currently pursuing two main research directions. The first uses epigenetic profiles to classify gene regulatory regions and to infer their activity status during development and across tissues. The second focuses on so-called ultraconserved elements, DNA sequences that are almost 100% identical among all vertebrate species. Both projects share the goal of cracking the still largely enigmatic regulatory code of our genome. For the analysis of epigenetic data, we have developed a probabilistic algorithm to extract prototypical patterns of histone modifications around in vivo transcription factor binding sites. Applying this new method to recently published ChIP-Seq data confirmed earlier speculations that some transcription factors bind to nucleosome-free regions whereas others bind to DNA that is wrapped around nucleosomes. To gain insights into the function of ultraconserved sequence elements, we have studied their fate after the whole-genome duplication event that occurred in teleost fishes. The retention patterns observed in five completely sequenced fish genomes may provide important clues about cis-regulatory interactions between such elements.
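To make the idea of extracting prototypical histone modification patterns more concrete, the following is a minimal sketch of one generic way to do it: a Poisson mixture model fitted by expectation-maximization. It is not the group's published algorithm. The function name `partition_profiles`, the binning of read counts around each binding site, and all parameter values are assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple


def partition_profiles(
    counts: List[List[int]],
    n_classes: int = 2,
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Fit a Poisson mixture to binned read counts around binding sites.

    counts[i][j] is the (hypothetical) number of histone-mark reads in
    bin j around binding site i. Returns (prototypes, responsibilities):
    one average profile per class, and for every site the posterior
    probability of belonging to each class.
    """
    rng = random.Random(seed)
    n_sites, n_bins = len(counts), len(counts[0])

    # Start from random soft assignments of sites to classes.
    resp = []
    for _ in range(n_sites):
        row = [rng.random() + 1e-6 for _ in range(n_classes)]
        total = sum(row)
        resp.append([r / total for r in row])

    prototypes: List[List[float]] = []
    for _ in range(n_iter):
        # M-step: class weights and responsibility-weighted mean profiles.
        # The small pseudocount keeps every Poisson rate strictly positive.
        sizes = [sum(r[k] for r in resp) for k in range(n_classes)]
        weights = [s / n_sites for s in sizes]
        prototypes = [
            [
                (sum(r[k] * x[j] for r, x in zip(resp, counts)) + 1e-3)
                / (sizes[k] + 1e-3)
                for j in range(n_bins)
            ]
            for k in range(n_classes)
        ]

        # E-step: posterior class probabilities under independent Poisson
        # bins (terms that do not depend on the class are dropped).
        new_resp = []
        for x in counts:
            log_post = [
                math.log(weights[k] + 1e-12)
                + sum(
                    x[j] * math.log(prototypes[k][j]) - prototypes[k][j]
                    for j in range(n_bins)
                )
                for k in range(n_classes)
            ]
            top = max(log_post)
            post = [math.exp(v - top) for v in log_post]
            total = sum(post)
            new_resp.append([p / total for p in post])
        resp = new_resp

    return prototypes, resp
```

In such a sketch, a class whose prototype shows low counts in the central bins would correspond to binding sites in nucleosome-free regions, whereas a class with a central peak would correspond to sites on nucleosomal DNA.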
We are also interested in the use of molecular profiling data for medical diagnostic applications. In previous years, we tested several machine learning approaches to predict survival of breast cancer patients from microarray-based gene expression data. We applied the experience gained in this work to the diagnosis of Acute Myeloid Leukaemia from flow cytometry data. The training and test data for this project were provided by the FlowCAP consortium (Flow Cytometry: Critical Assessment of Population Identification Methods) as part of a machine learning challenge organized by the DREAM project (Dialogue for Reverse Engineering Assessments and Methods).
Besides research, our group also develops and maintains bioinformatics databases and web servers. We have introduced a completely redesigned version of the Eukaryotic Promoter Database EPD, with greatly increased coverage of the human and mouse genomes. The new resource is automatically compiled from so-called mass genome annotation data (ChIP-Seq and RNA-Seq) stored in a local repository. A special effort was made to offer powerful visualization tools for exploring the epigenetic environment (nucleosome positions, histone modifications, DNA methylation) of selected promoters (see Figure 1). In parallel to EPD, we have extended the functionality of the ChIP-Seq Web server, a comprehensive online resource for analysing ChIP-Seq data and other types of mass genome annotation data, by providing access to a very large amount of preprocessed ChIP-Seq and RNA-Seq profiles. Moreover, in collaboration with bio-engineers in the USA, we developed a new web server called ZFN-site, which can be used to search genomes for target and off-target sites of zinc finger nucleases. Such enzymes are important genetic engineering tools for site-directed mutagenesis in large genomes.
Figure 1. Epigenetic context of the human MGMT promoter displayed by the new EPD viewer in a UCSC browser window. The picture shows the location of transcription start sites, the abundances of two promoter-specific histone variants, the methylation status of the DNA and the distribution of RNA polymerase II. The DNA methylation status of the MGMT promoter in certain cancer types is currently being evaluated as a predictive marker for therapy choice.
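As an illustration of the kind of search that a tool such as ZFN-site performs, the sketch below scans a DNA sequence for candidate zinc finger nuclease sites. It does not reproduce the ZFN-site implementation. It assumes a simplified model in which a site consists of a left half-site, a spacer of an allowed length, and the reverse complement of a right half-site, with a limited number of mismatches tolerated in each half-site. The function names, the spacer lengths, and the mismatch threshold are hypothetical, and only one orientation on one strand is searched.

```python
from typing import Iterator, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_zfn_sites(
    genome: str,
    left: str,
    right: str,
    spacers: Sequence[int] = (5, 6),
    max_mismatches: int = 1,
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (start, spacer_length, left_mismatches, right_mismatches).

    A hit is a window matching `left`, followed by a spacer of one of the
    allowed lengths, followed by the reverse complement of `right`, with
    at most `max_mismatches` mismatches in each half-site. Hits with zero
    mismatches are candidate targets; the others are candidate off-targets.
    """
    right_rc = reverse_complement(right)
    for start in range(len(genome) - len(left) + 1):
        left_mm = count_mismatches(genome[start:start + len(left)], left)
        if left_mm > max_mismatches:
            continue
        for spacer in spacers:
            r_start = start + len(left) + spacer
            r_end = r_start + len(right_rc)
            if r_end > len(genome):
                continue
            right_mm = count_mismatches(genome[r_start:r_end], right_rc)
            if right_mm <= max_mismatches:
                yield (start, spacer, left_mm, right_mm)
```

A genome-scale search would need an indexed data structure rather than this linear scan, and would also have to consider both strands and both pairings of the two nuclease monomers.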
Part of the work described above was carried out in collaboration with the groups of Nicolas Mermod (University of Lausanne, EPFL), Didier Trono (EPFL), Anne Grapin-Botton (University of Copenhagen), Cedric Notredame (Center for Genomic Regulation, Barcelona), Thomas J. Cradick and Anton P. McCaffrey (University of Iowa School of Medicine).
Promoter structure, gene expression database, DNA sequence computational analysis, ChIP-Seq analysis, transcription factor, consensus sequence-based motifs.