CIS – “Get to know your neighbors” Seminar Series
“DNA and Big Data”
John H. Maddocks, head of the Laboratory for Computation and Visualization in Mathematics and Mechanics (LCVMM)
Monday, July 5, 2021, 3:15 – 4:15 pm (CEST)
It is well understood that the sequence of DNA codes for which genes are expressed, and in which variant. This is the realm of bioinformatics: studying patterns, and in particular local variations, in strings of letters some billions long, and annotating sequence variants with known changes in biological and medical function. In other words, bioinformatics addresses WHAT each part of a genome is responsible for. But there is now a widespread consensus that to understand HOW a genome functions, the sequence-dependence of the physical properties of DNA, such as intrinsic shape and stiffness as expressed in its statistical mechanics, is also crucial. In this talk I will describe two ways in which simple machine learning approaches can be applied to big data sets to address the sequence-dependent statistical mechanics of DNA. First, time series data generated during long-duration, fully atomistic Molecular Dynamics simulations of short DNA fragments can be used to train a local, sequence-dependent, coarse-grain, Gaussian, equilibrium distribution model that we call cgDNA+. This first part includes a description of some special properties of any Gaussian with a banded stiffness (or inverse covariance) matrix, which are apparently not widely known. Second, I will discuss properties of the large ensembles (millions or more elements) of banded Gaussians that are generated by using the cgDNA+ model to scan genomes, thus closing the circle back to bioinformatics. As time permits, I will also give examples of how epigenetic base modifications, such as methylation, strongly affect the statistical mechanical properties of DNA.
John Maddocks obtained his D.Phil. in applied mathematics from the University of Oxford in 1981. After various postdoctoral positions (Stanford, Oxford, Minnesota), he joined the faculty of the University of Maryland in 1985. He assumed the Chair of Applied Analysis at the EPFL in 1997. He currently also holds a Visiting Fellowship funded by the Einstein Research Foundation of Berlin. He has published in a wide range of areas of applied mathematics and mechanics, such as robotics and the mechanics of knots, but since moving to the EPFL the bulk of his research efforts have been directed toward understanding the physics of DNA.