Seminars 2015

Dr. Erwan Koch
ETHZ
Friday, September 18, 2015
Time 15:15 – Room MA10

Title: Space-time Max-Stable Models with Spectral Separability

Abstract

Natural disasters may have a considerable impact on society as well as on the (re)insurance industry. Max-stable processes are ideally suited for modeling the spatial extent of such extreme events, but it is often assumed that there is no temporal dependence. Only a few papers have introduced spatio-temporal max-stable models, extending the Smith, Schlather and Brown-Resnick spatial processes. These models suffer from two major drawbacks: time plays a role similar to that of space, and the temporal dynamics are not explicit. In order to overcome these defects, we introduce spatio-temporal max-stable models where we partly decouple the influence of time and space in their spectral representations. We introduce both continuous-time and discrete-time versions. We then consider particular Markovian cases with a max-autoregressive representation and discuss their properties. Finally, we briefly propose an inference methodology which is tested through a simulation study.

(joint work with Paul Embrechts and Christian Robert)
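
For readers less familiar with these objects, a max-stable process admits a spectral (de Haan) representation, which is the structure whose space and time components the talk proposes to partly decouple; in generic notation (ours, not necessarily the authors'),

$Z(s,t) = \max_{i \ge 1} U_i \, W_i(s,t),$

where the $U_i$ are the points of a Poisson process on $(0,\infty)$ with intensity $u^{-2}\,du$ and the $W_i$ are independent copies of a nonnegative spectral process with $\mathbb{E}[W(s,t)] = 1$. ‘Spectral separability’ then refers, loosely speaking, to restricting how the spatial argument $s$ and the temporal argument $t$ enter the $W_i$.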

Dr. Shahin Tavakoli
University of Cambridge
Friday, September 25, 2015
Time 15:15 – Room CM 010

Title: Tests for separability in nonparametric covariance operators of random surfaces

Abstract

The assumption of separability of the covariance operator for a random image or hypersurface can be of substantial use in applications, especially in situations where the accurate estimation of the full covariance structure is unfeasible, either for computational reasons or due to a small sample size. However, inferential tools to verify this assumption are somewhat lacking in high-dimensional or functional data analysis settings, where this assumption is most relevant. We propose here to test separability by focusing on $K$-dimensional projections of the difference between the covariance operator and a nonparametric separable approximation. The subspace we project onto is one generated by the eigenfunctions of the covariance operator estimated under the separability hypothesis, avoiding the need to ever estimate the full non-separable covariance. We show that the rescaled difference of the sample covariance operator with its separable approximation is asymptotically Gaussian. As a by-product of this result, we derive asymptotically pivotal tests under Gaussian assumptions, and propose bootstrap methods for approximating the distribution of the test statistics when multiple eigendirections are taken into account. We probe the finite-sample performance through simulation studies, and present an application to log-spectrogram images from a phonetic linguistics dataset. Preprint: http://arxiv.org/abs/1505.02023
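
For concreteness, the hypothesis under test states that the covariance of the random surface $X(s,t)$ factorizes into its two marginal covariances (notation ours):

$\mathrm{Cov}\{X(s,t), X(s',t')\} = c_1(s,s')\, c_2(t,t'),$

equivalently $C = C_1 \otimes C_2$ as an operator; the test statistics described above are built from projections of the difference between the sample covariance and a separable approximation of exactly this form.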

Joint SCITAS and STATISTICS seminar
Dr. Diego Kuonen
Statoo Consulting & University of Geneva
Thursday, October 1st, 2015
Time 15:15 – Room CHB330

Title: A Statistician’s ‘Big Tent’ View on Big Data and Data Science

Abstract

There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms ‘big data’ and ‘data science’. This presentation gives a professional statistician’s ‘big tent’ view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.

The slides will be made available at

http://www.statoo.com/BigDataScience

and at

http://www.slideshare.net/kuonen/

Prof. Hans Wackernagel
MINES ParisTech
Friday, October 30, 2015
Time 15:15 – Room MA10

Title: Geostatistical change-of-support models for regular and irregular grids

Abstract

In many spatial applications, data are collected on small volumes and prediction is required on larger volumes. The modelling of the deformation of the statistical distribution when values are averaged over larger volumes is called the change-of-support problem. An application to air pollution data is discussed. Recently, Zaytsev et al. (2015) examined the application of change-of-support models to irregular grids, which appear in many fields where numerical models are initialized with inputs defined on irregular grids, e.g. in petroleum reservoir engineering. These irregular grids consist of a great variety of grid cells of different sizes and shapes and are populated using geostatistical simulation algorithms. Two different generalizations of the discrete Gaussian model are discussed from the point of view of their theoretical assumptions and their practical accuracy.
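
To fix ideas (notation ours): if $Z(x)$ denotes the variable on point support, the block value on a cell $v$ is the average

$Z(v) = \frac{1}{|v|} \int_v Z(x)\, dx,$

whose mean is unchanged but whose variance shrinks to $\frac{1}{|v|^2}\int_v\!\int_v C(x,y)\,dx\,dy$ for covariance function $C$; how the rest of the distribution deforms under this averaging is what change-of-support models such as the discrete Gaussian model describe.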

Prof. Simon N. Wood
University of Bath
Friday, November 13, 2015
Time 15:15 – Room MA10

Title: Additive smooth models for big data

Abstract

Motivated by applications in air-pollution monitoring and electricity grid management, this talk will discuss the development of methods for estimating generalized additive models with on the order of 10^4 coefficients from on the order of 10^8 observations. The strategy combines four elements: (i) the use of rank-reduced smoothers, (ii) fine-scale discretization of covariates, (iii) an efficient approach to marginal likelihood optimization that avoids computation of numerically awkward log-determinant terms, and (iv) marginal likelihood optimization algorithms that make good use of numerical linear algebra methods with reasonable scalability on modern multi-core processors. Speed-ups of 600-fold can be achieved relative to the previous state-of-the-art methods. This enables us to estimate spatio-temporal models for UK ‘black smoke’ air pollution data over the last four decades at a daily resolution, where previously an annual resolution was challenging.
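
As a rough illustration of ingredient (ii), fine-scale covariate discretization: once a covariate is snapped onto, say, a grid of 1,000 values, weighted sums over all observations can be accumulated from per-bin totals, so basis matrices need only be evaluated at the grid values. The sketch below (Python/numpy; the function, the number of bins and the polynomial stand-in basis are ours, not the talk's or any particular package's) checks this idea on a single covariate.

import numpy as np

def discretize(x, n_bins=1000):
    """Snap a covariate onto a fine grid; return bin midpoints and a bin index per observation."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return mids, idx

rng = np.random.default_rng(0)
x = rng.normal(size=10**6)                      # one 'big' covariate
w = rng.random(size=10**6)                      # e.g. working weights in a fitting iteration
mids, idx = discretize(x)
w_per_bin = np.bincount(idx, weights=w, minlength=len(mids))

B_small = np.vander(mids, 4, increasing=True)   # stand-in basis evaluated at 1,000 grid values only
approx = B_small.T @ w_per_bin                  # approximates sum_i w_i * b(x_i)
exact = np.vander(x, 4, increasing=True).T @ w  # brute-force version over all 10^6 observations
print(np.max(np.abs(approx - exact)) / len(x))  # per-observation discrepancy is tiny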

Ms. Giulia Cereda
University of Lausanne and Leiden University
Friday, December 11, 2015
Time 15:15 – Room MA10

Title: Nonparametric Bayesian approach to LR assessment in case of rare type match

Abstract

The evaluation of a match between the DNA profile of a stain found on a crime scene and that of a (previously identified) suspect involves the use of the unknown parameter $\mathbf{p}=(p_1, p_2, \ldots)$ (the ordered vector of frequencies of the different DNA profiles in the population of potential donors) and the names of the different DNA types. We propose a Bayesian nonparametric method which considers $\mathbf{P}$ as a random variable distributed according to the two-parameter Poisson-Dirichlet distribution, and discards the information about the names of the different DNA types. The ultimate goal of this model is to evaluate DNA matches in the rare type case, that is, the situation in which the suspect’s profile, matching the crime stain profile, is not in the database of reference.
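
Not the paper's inference procedure, just an illustration of the prior placed on $\mathbf{P}$: the two-parameter Poisson-Dirichlet distribution can be sampled (approximately, after truncation) through the standard stick-breaking (GEM) construction. The parameter values and truncation level below are arbitrary.

import numpy as np

def sample_pd_frequencies(alpha, theta, n_atoms=10000, rng=None):
    """Approximate draw of the ordered frequency vector from PD(alpha, theta):
    V_i ~ Beta(1 - alpha, theta + i*alpha) independently, p_i = V_i * prod_{j<i}(1 - V_j)
    gives GEM(alpha, theta) weights; their decreasing rearrangement is Poisson-Dirichlet."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - alpha, theta + i * alpha)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    p = v * remaining
    return np.sort(p)[::-1]                     # p_1 >= p_2 >= ... (truncated, so sums to < 1)

p = sample_pd_frequencies(alpha=0.5, theta=1.0, rng=np.random.default_rng(42))
print(p[:5], p.sum())                           # a few leading profile frequencies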

Mr. Stefan Wager
Stanford University
Wednesday, January 14, 2015
Time 15:15 – Room MA12

Title: Bootstrapping Regularizers

Abstract

The success of a high-dimensional statistical procedure often hinges on the quality of its regularization. In this talk I will show how, by using a parametric bootstrap, we can turn noise models into regularizers that are closely tailored to the problem at hand. To give a concrete example, if our estimation procedure takes a noisy matrix $X$ as input, we may have a noise model that specifies whether each cell $X_{ij}$ is perturbed by Gaussian, Poisson, blankout, or some other form of noise. In each case, a different form of regularization is appropriate. Our method can be applied to a wide variety of statistical techniques, including generalized linear models, low-rank matrix estimation, and sequence models used in natural language processing. If we bootstrap from the simplest possible noise model, we usually recover well-known procedures such as ridge penalization for linear regression or singular-value shrinkage for matrix estimation. But, once we are able to specify a more appropriate bootstrap noise model (e.g., Poisson noise for count data), we often get new procedures that can substantially improve on baselines and have surprising theoretical properties. The resulting approach is closely related to James–Stein shrinkage, and to the Efron–Morris theory of empirical Bayes estimation.

This work is in collaboration with William Fithian, Julie Josse, Percy Liang, and Sida Wang.
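
A toy numerical check (our own construction and variable names, not code from the paper) of the simplest case mentioned above: averaging least-squares fits over many Gaussian-perturbed copies of the design matrix converges to ridge regression, here with the penalty calibrated as $n\sigma^2$ for noise standard deviation $\sigma$.

import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 200, 5, 0.3
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

# Parametric bootstrap with a Gaussian input-noise model: accumulate the normal equations
# over many perturbed copies X + E_b of the design and solve the averaged system.
B = 2000
XtX_acc, Xty_acc = np.zeros((p, p)), np.zeros(p)
for _ in range(B):
    Xb = X + rng.normal(scale=sigma, size=X.shape)
    XtX_acc += Xb.T @ Xb
    Xty_acc += Xb.T @ y
beta_boot = np.linalg.solve(XtX_acc / B, Xty_acc / B)

# The regularizer this noise model induces: ridge with penalty n * sigma^2.
beta_ridge = np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(p), X.T @ y)
print(np.max(np.abs(beta_boot - beta_ridge)))   # close for large B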

Prof. Aurore Delaigle
The University of Melbourne
Thursday, January 15, 2015
Time 14:15 – Room MA31

Title: Deconvolution when the Error Distribution is Unknown

Abstract

In nonparametric deconvolution problems, in order to estimate consistently a density or distribution from a sample of data contaminated by additive random noise, it is often assumed that the noise distribution is completely known or that an additional sample of replicated or validation data is available. Methods have also been suggested for estimating the scale of the error distribution, but they require somewhat restrictive smoothness assumptions on the signal distribution, which can be hard to verify in practice. Taking a completely new approach to the problem, we argue that data rarely come from a simple, regular distribution, and that this can be exploited to estimate the signal distribution using a simple procedure, often giving very good performance. Our method can be extended to other problems involving errors-in-variables, such as nonparametric regression estimation. Its performance in practice is remarkably good, often equalling (even unexpectedly) the performance of techniques that use additional data to estimate the unknown error distribution.
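
In symbols (standard deconvolution notation, not specific to this talk): one observes $W = X + U$ with signal $X$ independent of noise $U$, so that $f_W = f_X * f_U$ and $\varphi_W(t) = \varphi_X(t)\,\varphi_U(t)$ in terms of characteristic functions. Classical deconvolution estimators divide an empirical version of $\varphi_W$ by $\varphi_U$, which is why the error distribution is usually assumed known; the approach described above instead exploits the irregularity typical of real signal distributions to recover it from the contaminated data themselves.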

Joint Statistics/Topology Seminar
Ms. Katharine Turner
University of Chicago
Friday, February 20, 2015
Time 15:15 – Room MA10

Title: Functional PCA of persistent homology rank functions

Abstract

Persistent homology provides a method to capture topological and geometrical information in data. I will discuss the topological summary statistics called persistent homology rank functions. Under reasonable assumptions, satisfied in almost all applications, the persistent homology rank functions of interest will lie in an affine subspace. This means we can perform PCA. I will look at examples using point processes and real world data involving colloids and sphere packings. Joint work with Vanessa Robins.

This talk is meant to be an introduction to Topological Data Analysis, and no background in algebraic topology or statistics is required.
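
A bare-bones sketch of the ‘affine subspace, hence PCA’ step (Python/numpy; the persistence rank functions themselves would come from a TDA pipeline that we do not reproduce, so random placeholder curves stand in for them): once each rank function is evaluated on a common grid, functional PCA reduces to an SVD of the centred evaluations.

import numpy as np

# Placeholder data: row i would hold one persistent homology rank function evaluated
# on a fixed grid of (birth, death) values and flattened into a vector.
rng = np.random.default_rng(2)
n_samples, n_grid = 40, 500
rank_fns = rng.random((n_samples, n_grid)).cumsum(axis=1)   # stand-in curves only

mean_fn = rank_fns.mean(axis=0)
centred = rank_fns - mean_fn                  # shift the affine subspace to the origin
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U * s                                # principal component scores of each sample
components = Vt                               # principal directions, i.e. functions on the grid
explained = s**2 / np.sum(s**2)
print(explained[:3])                          # share of variance of the leading components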

Dr. Lloyd Elliott
University of Oxford
Friday, March 6, 2015
Time 14:15 – note unusual time! – Room CM104

Title: Finding genetic effects in the metabolome

Abstract

Many reactions among metabolites are well understood and catalogued. The metabolome summarizes these reactions by placing edges between metabolites that are involved in the same reaction. Recently, it was shown that the edges in the metabolome can arise through the partial correlation matrix of metabolite concentrations observed in serum or plasma [Shin et al. 2014]. In this work, we use a kinship matrix to partition the partial correlations into genetic and non-genetic components. Edges arising from genetic components indicate genetic effects in the metabolome: some aspects of the corresponding reactions must be heritable (for example, they could involve polymorphic enzymes). Our approach generalizes linear mixed models, and has application to any multiple-phenotype inference for samples with a known kinship matrix (for example, to protein expression, or to genome-wide association studies of multiple diseases in the presence of patient histories).
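
In the standard multi-trait mixed-model formulation (notation ours, not necessarily the authors'): for an $n \times q$ matrix $Y$ of metabolite concentrations on $n$ individuals with kinship matrix $K$,

$\operatorname{Cov}\{\operatorname{vec}(Y)\} = \Sigma_g \otimes K + \Sigma_e \otimes I_n,$

so the metabolite-by-metabolite covariance splits into a genetic component $\Sigma_g$ (scaled across individuals by $K$) and a non-genetic component $\Sigma_e$; partial correlations, and hence edges of the metabolome, can then be examined separately in the two components.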

Ms. Maud Thomas [CANCELLED]
Université Paris Diderot
Friday, March 13, 2015
Time 15:15 – Room MA10

Title: Tail index estimation, concentration and adaptivity

Abstract

In the univariate domain, the fundamental theorem of Extreme Value Theory (EVT) asserts that if a distribution function belongs to the max-domain of attraction of a distribution $G_\gamma$, then $G_\gamma$ is necessarily of the type $G_\gamma(x) = \exp\{-(1 + \gamma x)^{-1/\gamma}\}$. One of the most studied estimators of the tail index $\gamma$ was proposed by Hill in 1975 for $\gamma > 0$. Its construction can be split into two steps: first, selection of the largest order statistics, followed by estimation of $\gamma$ from these selected order statistics. The statistician then has to face a bias-variance dilemma: if the number of order statistics is too large, the Hill estimator suffers from a large bias; if it is too small, the variance is large. In this talk, we will combine Talagrand’s concentration inequality for smooth functions of independent exponentially distributed random variables with three major tools of EVT: the quantile transform, Karamata’s representation for regularly varying functions, and Rényi’s characterisation of the joint distribution of order statistics of exponential samples. This will allow us first to establish concentration inequalities for the Hill process, and then to build on these concentration inequalities to analyse the performance of a variant of Lepski’s rule, in order to propose an adaptive version of the Hill estimator.
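
For reference, the estimator in question (standard definition; the sample and the values of $k$ below are arbitrary, and choosing $k$ is precisely the bias-variance trade-off the talk addresses):

import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the tail index gamma > 0 from the k largest observations."""
    xs = np.sort(x)                                      # ascending order statistics
    return np.mean(np.log(xs[-k:]) - np.log(xs[-k - 1]))

rng = np.random.default_rng(3)
x = np.abs(rng.standard_t(df=2, size=10**5))             # heavy-tailed sample, true gamma = 0.5
print([hill_estimator(x, k) for k in (100, 1000, 10000)])
# estimates of gamma at several k: small k is noisy, large k biased for non-Pareto tails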

Prof. Robin Henderson
Newcastle University
Friday, May 8, 2015
Time 15:15 – Room MA10

Title: Adaptive Treatment and Robust Control

Abstract

There has been steadily increasing statistical interest over the last ten years in the data-based development of optimal dynamic treatment rules. Given a sequence of observations the aim is to choose at each decision time the treatment that maximises some target, taking into account subject-specific history. This is the same fundamental problem that underpins control methodology in applications (primarily engineering) or theory (often mathematical analysis). This talk looks at similarities and differences between the problems considered by the two schools and the methods that are used for their solutions. After describing an approach to dynamic treatment based on structural nested mean models, we examine how established control methods might be adapted for statistical adaptive treatment problems, and how statistical thinking might bring fresh ideas to the control literature, especially as control methods are now increasingly being used in biomedical applications.

Dr. Sophie Hautphenne
The University of Melbourne
Friday, May 22, 2015
Time 15:15 – Room MA10

Title: New computational approaches for branching processes in population biology

Abstract

Branching processes are powerful modelling tools in population biology. They describe how individuals live and reproduce according to specific probability laws, and can be used to answer a wide range of population-related questions. This research project focuses on a tractable class of branching processes called Markovian binary trees. I will discuss some computational methods and future directions to answer questions related to the estimation of model parameters and the modelling of random environments. The resulting techniques can be used to study significant problems in evolutionary and conservation biology, and I will briefly discuss one application on an endangered bird population in New Zealand. The project is funded by the Australian Research Council under a Discovery Early Career Researcher Award.

Prof. Clément Dombry
Université de Franche-Comté
Friday, May 29, 2015
Time 15:15 – Room MA10

Title: Exact simulation of max-stable processes

Abstract

Max-stable processes play an important role as models for spatial extreme events. Their complex structure as the pointwise maximum over an infinite number of random functions makes simulation highly nontrivial. Algorithms based on finite approximations that are used in practice are often not exact and computationally inefficient. We will present two algorithms for exact simulation of a max-stable process at a finite number of locations. The first algorithm generalizes the approach by Dieker & Mikosch (2015) for Brown–Resnick processes and is based on simulation from the spectral measure. The second algorithm relies on the idea of simulating only the extremal functions, that is, those functions in the construction of a max-stable process that effectively contribute to the pointwise maximum. We study the complexity of both algorithms and prove that the second procedure is always more efficient. Moreover, we provide closed-form expressions for their implementation that cover the most popular models for max-stable processes and extreme value copulas. For simulation on dense grids, an adaptive design of the second algorithm is proposed, together with a short simulation study.
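
To make the contrast concrete, here is the kind of non-exact simulator the abstract alludes to: a finite truncation of the spectral construction, written for a Schlather-type model on a one-dimensional grid (Python/numpy; the covariance function, grid and truncation level are illustrative choices of ours). The exact algorithms presented in the talk remove the truncation error that this approach necessarily carries.

import numpy as np

rng = np.random.default_rng(4)
s = np.linspace(0.0, 10.0, 200)                         # simulation locations
C = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)      # exponential covariance (range 2)
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(s)))      # factor for sampling Gaussian fields

def schlather_truncated(n_terms=1000):
    """Truncated spectral construction Z(s) = max_i (1/Gamma_i) * sqrt(2*pi) * max(eps_i(s), 0):
    Gamma_i are Poisson arrival times, eps_i independent standard Gaussian fields.
    Exact only as n_terms -> infinity, which is precisely the issue the talk addresses."""
    gamma = np.cumsum(rng.exponential(size=n_terms))     # Gamma_1 < Gamma_2 < ...
    eps = L @ rng.normal(size=(len(s), n_terms))         # columns are independent Gaussian fields
    w = np.sqrt(2.0 * np.pi) * np.maximum(eps, 0.0)      # spectral functions with E[W(s)] = 1
    return np.max(w / gamma[None, :], axis=1)            # pointwise maximum over terms

z = schlather_truncated()
print(z.min(), z.max())                                  # margins approximately unit Frechet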