Past Seminars 2007-2018

Prof. Alexandra Dias
University of Warwick
February 19, 2008

Semi-parametric Estimation of Portfolio Tail Probabilities


In this talk we estimate the probability of occurrence of a large portfolio loss. This amounts to estimating the probability of an event in the far joint tail of the portfolio loss distribution. These are rare extreme events, and we use a semi-parametric procedure from extreme value theory to estimate their probability. We find that the univariate loss distribution is heavy-tailed for three market indices and that there is dependence between large losses in the different indices. We estimate the probability of a large loss in a portfolio composed of these indices and analyze the impact of the portfolio component weights on the probability of a large portfolio loss. With this procedure we are able to estimate the probability of portfolio losses never incurred before without having to assume a parametric dependence model. Furthermore, increasing the number of portfolio components does not complicate the estimation.
Prof. Alireza Nematollahi
Shiraz University
February 29, 2008

Periodically Correlated Time Series and Their Spectra


This talk is divided into two parts. Part I briefly introduces the spectral analysis problem, motivates the definition of power spectral density functions, and reviews some important and new techniques in nonparametric and parametric spectral estimation. We also consider the problem in the context of multivariate time series. In Part II, we consider periodically correlated (cyclostationary) time series, their spectra, and the estimation of these spectra using the techniques introduced in Part I. We use the well-known relation between the spectral density matrix of a periodically correlated time series and that of a stationary vector time series (Gladyshev, 1961). The results we derive here for multivariate time series are of general interest and can be used in the estimation of stationary vector time series. They can also be used for the estimation of vector AR and ARMA models. The method of estimation is illustrated with simulated and real time series.
Mr. Thomas Fournier
University of Fribourg
March 10, 2008

A Self-Regulated Gene Network


In the last few years, the understanding of gene expression and regulation mechanisms has attracted wide interest in the scientific community. A popular approach is to model the system as a continuous-time Markov process with values in a countable or finite space. In this talk, after briefly describing the historical background of the mathematical models of (bio-)chemical reactions and the main differences between deterministic and stochastic models, I will discuss mathematically a class of simple self-regulated genes which are the building blocks for many regulatory gene networks, and present a closed formula for the steady-state distribution of the corresponding Markov process as well as an efficient numerical algorithm. This approach advantageously replaces time-consuming simulation using the Gillespie algorithm. Based on these results, I will present a realistic self-regulated network that works as a potent genetic switch, and show that this approach exhibits the main features observed experimentally.
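For readers unfamiliar with the Gillespie algorithm mentioned in the abstract, the following is a minimal sketch of it for a plain birth-death process (constant production, linear degradation). It is not the self-regulated gene model of the talk; the function name and the rates `k_prod` and `k_deg` are hypothetical and chosen only for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, n0=0, seed=0):
    """Simulate one path of a birth-death process with the Gillespie algorithm.

    Assumed toy model (not from the talk): molecules are produced at
    constant rate k_prod and each molecule degrades at rate k_deg.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth = k_prod        # propensity of the production reaction
        death = k_deg * n     # propensity of the degradation reaction
        total = birth + death
        if total == 0:
            break             # no reaction can fire any more
        t += rng.expovariate(total)   # exponential waiting time to next event
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=50.0)
```

Each event requires its own random draws, which is why long simulations become expensive and why a closed formula for the steady-state distribution is attractive.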
Mr. David Kraus
Charles University Prague
March 14, 2008

Data-Driven Smooth Tests in Survival Analysis


The problem of comparing two samples of possibly right-censored survival data is considered. The aim is to develop tests capable of detecting a wide spectrum of alternatives, which is useful when there is no clear advance idea of the departure from the null hypothesis. A new class of tests based on Neyman’s embedding idea is proposed. The null hypothesis is tested against a model described by several smooth functions. A data-driven approach to the selection of these functions is studied, i.e., the test is performed against an alternative which is selected based on the observed data. These tests are constructed for two situations: first, we compare survival distributions in two samples; second, we compare two samples under competing risks (comparison of cumulative incidence functions). The small-sample performance is explored via simulations, which show that the power of the proposed tests appears to be more stable than the power of some versatile tests previously proposed in the literature. Real data illustrations are given.
Prof. Lutz Duembgen
University of Bern
April 4, 2008

P-Values for Computer-Intensive Classifiers


In the first part I will briefly review traditional approaches to (model-based) classification and discuss some conceptual problems. I will argue that a more convincing approach is to use certain p-values for class memberships. These enable us to quantify the uncertainty when classifying a single future observation. Classical results about (Bayes-) optimal classifiers carry over to this new paradigm. We claim that any reasonable classifier can be modified to yield non-parametric p-values for classification. Some simulated and real data sets will illustrate our approach. (This is joint work with Axel Munk (Goettingen) and Bernd-Wolfgang Igl (Luebeck).)
Prof. Werner Stahel
ETH Zurich
May 23, 2008

Linear Mixing Models: Models, Estimation, and “Target Testing”


Monitoring stations collect data on a number $m$ of chemical compounds automatically in short intervals. We study several sets of one year of hourly data on 17 volatile organic compounds (VOC). Such data can be used to identify and quantify the contributions of several sources of emission, even if they are unknown: Suppose that the pollution is generated by a small number $p SOME TEXT MISSING…

Statistics Seminars – Autumn 2008

Prof. Michael Wolf
Universität Zürich
October 3, 2008

Formalized Data Snooping Based on Generalized Error Rates


It is common in econometric applications that several hypothesis tests are carried out simultaneously. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. The classical approach is to control the familywise error rate (FWE) which is the probability of one or more false rejections. But when the number of hypotheses under consideration is large, control of the FWE can become too demanding. As a result, the number of false hypotheses rejected may be small or even zero. This suggests replacing control of the FWE by a more liberal measure. To this end, we review a number of recent proposals from the statistical literature. We briefly discuss how these procedures apply to the general problem of model selection. A simulation study and two empirical applications illustrate the methods.
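To illustrate why control of the FWE becomes demanding as the number of tests grows, here is a small sketch. It assumes, purely for illustration, that all null hypotheses are true and the tests are independent; the function names are hypothetical and the Bonferroni correction is shown only as the textbook example of FWE control, not as a method from the talk.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false rejection) for m independent true nulls,
    each tested at level alpha (illustrative assumption: independence)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test level that keeps the FWE at or below alpha for m tests."""
    return alpha / m


for m in (1, 10, 100):
    unadjusted = familywise_error_rate(0.05, m)
    adjusted = familywise_error_rate(bonferroni_level(0.05, m), m)
    print(f"m={m:4d}  FWE unadjusted={unadjusted:.3f}  Bonferroni={adjusted:.3f}")
```

With many tests the per-test level needed for FWE control shrinks towards zero, which is the loss of power that motivates the more liberal error rates discussed in the abstract.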
Prof. Anestis Antoniadis
Université Joseph Fourier (Grenoble)
October 10, 2008

The Dantzig Selector in the Regression Model for Right-Censored Data


The Dantzig selector is an approach that has been proposed recently for performing variable selection in high-dimensional linear regression models with a large number of explanatory variables and a relatively small number of observations. As in the least absolute shrinkage and selection operator (LASSO), this approach sets certain regression coefficients to exactly zero, thus performing variable selection. However, such a framework, contrary to the LASSO, has never been used in regression models for survival data with censoring. A key motivation of this work is to study the variable selection problem for Cox’s proportional hazards regression models using a framework that extends the theory, the computational advantages and the optimal asymptotic rate properties of the Dantzig selector to the much larger class of Cox’s proportional hazards models under appropriate sparsity scenarios.
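For orientation, the Dantzig selector in the ordinary linear model $y = X\beta + \varepsilon$ is usually written as the following convex program. The symbols $X$, $y$, $\beta$ and the tuning parameter $\lambda$ are notation assumed here and do not appear in the abstract; the display covers only the linear-model case, not the censored-data extension presented in the talk.

$$
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \|\beta\|_1
\quad \text{subject to} \quad
\big\| X^{\top} (y - X\beta) \big\|_{\infty} \le \lambda .
$$

The constraint bounds the largest correlation between any explanatory variable and the residuals, while the $\ell_1$ objective drives many coefficients to exactly zero.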
Swiss Statistics Seminar
October 24, 2008

Event Programme


Mr. Marcel Baumgartner
Nestlé, Vevey
Prof. Marloes Maathuis
ETH Zürich
Dr. Fabrizio Ruggeri
Istituto di Matematica Applicata e Tecnologie Informatiche, Milano
Prof. Fred Hamprecht
Universität Heidelberg
Prof. Omiros Papaspiliopoulos
Universitat Pompeu Fabra, Barcelona
Prof. Wilfrid Kendall
University of Warwick
October 30, 2008

Short-Length Routes in Low-Cost Networks via Poisson Line Patterns
(Joint work with David Aldous)


How efficiently can one move about in a network linking a configuration of n cities? Here the notion of "efficient" has to balance (a) total network length against (b) short network distances between cities: this is a problem in "frustrated optimization", linked to the notion of geometric spanners in computational geometry. Aldous and I have shown how to use Poisson line processes and methods from mathematical stereology to produce surprising networks which are nearly of shortest total length, and yet which make the average inter-city distance almost Euclidean. I will discuss this work and further developments: (a) describing actual geodesic paths, (b) exploring the distribution of flow statistics through a typical graph line segment.

Aldous, D.J. & Kendall, W.S. (2008). Short-length routes in low-cost networks via Poisson line patterns. Adv. in Appl. Probab., 40 (1), 1-21.
Prof. John Haslett
Trinity College Dublin
November 7, 2008

A Simple Monotone Process with Application to Radiocarbon-Dated Depth Chronologies


We propose a new and simple continuous Markov monotone stochastic process and use it to make inference on a partially observed monotone stochastic process. The process is piecewise linear, based on additive independent gamma increments arriving in a Poisson fashion. An independent increments variation allows very simple conditional simulation of sample paths given known values of the process. We take advantage of a reparameterization involving the Tweedie distribution to provide efficient computation. The motivating problem is the establishment of a chronology for samples taken from lake sediment cores, i.e. the attribution of a set of dates to samples of the core given their depths, knowing that the age–depth relationship is monotone. The chronological information arises from radiocarbon (14C) dating at a subset of depths. We use the process to model the stochastically varying rate of sedimentation.

Haslett, J. & Parnell, A. (2008). A simple monotone process with application to radiocarbon-dated depth chronologies. J. Roy. Statist. Soc. C, 57 (4): 399 – 418.

Haslett, J., Allen, J.R.M., Buck, C.E. & Huntley, B. (2008). A flexible approach to assessing synchroneity of past events using Bayesian reconstructions of sedimentation history. Quaternary Science Reviews 27 (19): 1872-1885.
Prof. Rainer Dahlhaus
Universität Heidelberg
November 11, 2008

Statistical Inference for Locally Stationary Processes


Locally stationary processes are models for nonstationary time series whose behaviour can locally be approximated by a stationary process. In this situation the classical characteristics of the process, such as the covariance function at some lag k, the spectral density at some frequency lambda, or, e.g., the parameters of an AR(p) process, are curves which change slowly over time. The theory of locally stationary processes allows for a rigorous asymptotic treatment of various inference problems for such processes. We present different estimation and testing results for locally stationary processes. In particular we discuss nonparametric maximum likelihood estimation under shape restrictions. Empirical process theory plays a major role in the theoretical treatment of such problems. We define an empirical spectral process indexed by a function class and use this process to derive various results on estimation and testing. As a technical tool we derive an exponential inequality and a functional central limit theorem for the empirical spectral process.

Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. Ann. Statist. 28, 1762-1794.

Dahlhaus, R. and Polonik, W. (2006). Nonparametric quasi maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34, No. 6, 2790-2824.

Dahlhaus, R. and Polonik, W. (2008). Empirical spectral processes for locally stationary time series. Bernoulli, to appear.
Prof. Nanny Wermuth
Chalmers University of Technology & Göteborg University
President of the Institute of Mathematical Statistics (IMS)
November 21, 2008

Consequences of Research Hypotheses Captured by Special Types of Independence Graph


A joint density of several variables may satisfy a possibly large set of independence statements, called its independence structure. Often this structure is fully representable by a graph that consists of nodes representing variables and of edges that couple node pairs. We consider joint densities of this type, generated by a stepwise process in which all variables and dependences of interest are included. Otherwise, there are no constraints on the type of variables or on the form of the distribution generated. For densities that then result after marginalising and conditioning, we derive what we name the summary graph. It is seen to capture precisely the independence structure implied by the generating process, it identifies dependences which remain undistorted due to direct or indirect confounding, and it alerts to possibly severe distortions of these two types in other parametrizations. We use operators for matrix representations of graphs to derive matrix results and translate these into special types of path.
Prof. Thomas Scheike
University of Copenhagen
December 4, 2008

Estimating Haplotype Effects for Survival Data


We describe how simple estimating equations can be used for Cox's regression model in the context of assessing haplotype effects for survival data. The estimating equations are simple to implement and avoid the use of the EM algorithm, which may be slow in the context of the semiparametric Cox model. The estimating equations also lead to direct estimators of standard errors that are easy to compute, and thus overcome some of the difficulty with obtaining variance estimators based on the EM algorithm in this setting. We also develop useful and simple-to-implement goodness-of-fit procedures for Cox's regression model in the context of haplotype models. Finally, we apply the developed procedures to data that investigate the possible haplotype effects of the PAF-receptor on cardiovascular events in patients with coronary artery disease and compare our results to those based on the EM algorithm.

Prof. Angelika May
University of Siegen
March 12, 2007

Copula Functions as a Tool for Modelling Dependent Data


The talk will give an introduction to the concept of copula functions, focussing on the separate statistical treatment of the marginal distribution functions and the dependence concept. We discuss several measures for the strength of dependence between data. In financial and actuarial applications, some data show asymmetric dependence in the tails. If this is the case, a transformed copula within the class of Archimedean copulas seems to be an appropriate choice. Despite nice analytical properties that can easily be deduced, we will show that this approach causes problems in higher dimensions.
Prof. Elizabeth Smith
University of Newcastle
March 23, 2007

A Max-Stable Process Model for Spatial Extremes


The extremes of environmental processes are often of interest due to the damage that can be caused by extreme levels of the processes. These processes are often spatial in nature and modelling the extremes jointly at many locations can be important. A model which enables data from a number of locations to be modelled under a more flexible framework than in previous applications will be discussed. The model is applied to annual maximum rainfall data from five sites in South-West England. A pairwise likelihood is used for estimation and a Bayesian analysis is employed, allowing the incorporation of informative prior information.
Dr. Wolfgang Huber
European Bioinformatics Institute/European Molecular Biology Laboratory
April 4, 2007

Automated Image Analysis for High-Throughput Cell-Based Microscopy Assays with R and Bioconductor


Advances in automated microscopy have made it possible to conduct large scale cell-based assays with image-type phenotypic readouts. In such an assay, cells are grown in the wells of a microtitre plate or on a glass slide under a condition or stimulus of interest. Each well is treated with one of the reagents from the screening library and the response of the cells is monitored, for which in many cases certain proteins of interest are antibody-stained or labeled with a GFP-tag. The resulting data can be in the form of two-dimensional (2D) still images, 3D image stacks or image-based time courses. RNA interference (RNAi) libraries can be used to screen a set of genes (in many cases the whole genome) for the effect of their loss of function in a certain biological process. I will talk about some of the statistical and data analytic issues, and the tools in Bioconductor, in particular, the cellHTS, prada and EBImage packages.
Dr. Robert Gentleman
Fred Hutchinson Cancer Research Center
April 4, 2007

Assessing the Role Played by Multi-Protein Complexes in Determining Phenotype


While proteins are the primary mechanism by which cells carry out the various molecular processes needed for life, it is also true that proteins seldom act alone. Rather, they often form multi-protein complexes that carry out particular functions. Using published, known multi-protein complexes and pathways in yeast, we develop a number of statistical approaches to help elucidate the involvement of different levels of organization in observed changes in phenotype that arise from single-gene manipulation experiments (deletion, mutation, up-regulation, etc.).
Prof. P. R. Parthasarathy
IIT Madras
June 1, 2007

Stochastic Models of Carcinogenesis

Dr. Nadja Leith
University College London
June 8, 2007

Addressing Uncertainty in Numerical Climate Models


It is recognised that projections of future climate can differ widely between climate models and it is therefore necessary to account for climate model uncertainty in any risk assessment exercise. Here we suggest that a hierarchical statistical model, implemented in a Bayesian framework, provides a logically coherent and interpretable way to think about climate model uncertainty in general. The ideas will be illustrated by considering the generation of future daily rainfall sequences at a single location in the UK, based on the outputs of four different climate models under the SRES A2 emissions scenario.
Prof. P. R. Parthasarathy
IIT Madras
June 8, 2007

Applied Birth and Death Models

Prof. P. R. Parthasarathy
IIT Madras
June 22, 2007

Exact Transient Solution of State-Dependent Queues

Statistics Seminars – Autumn 2007

Prof. Jon Forster
University of Southampton
October 5, 2007

Bayesian Inference for Multivariate Ordinal Data


Methods for investigating the structure in contingency tables are typically based on determining appropriate log-linear models for the classifying variables. Where one or more of the variables is ordinal, such models do not take this property into account. In this talk, I describe how the multivariate probit model (Chib and Greenberg, 1998) can be adapted so that ordinal data models can be compared using Bayesian methods. By a suitable choice of parameterisation, the conditional posterior distributions are standard and are easily simulated from, and reversible jump Markov chain Monte Carlo computation can be used to estimate posterior model probabilities for undirected decomposable graphical models. The approach is illustrated with various examples.
Dr. Parthanil Roy
ETH Zurich
November 2, 2007

Ergodic Theory, Abelian Groups, and Point Processes Associated with Stable Random Fields


We consider the point process sequence $ \big\{\sum_{\|t\|_\infty \leq n} \delta_{b_n^{-1}X_t}:\, n \geq 1 \big\}$ induced by a stationary symmetric $\alpha$-stable $(0 < \alpha < 2)$ discrete parameter random field $\{X_t\}_{t \in \mathbb{Z}^d}$ for a suitable choice of scaling sequence $b_n \uparrow \infty$. It is easy to prove, following the arguments in the one-dimensional case in Resnick and Samorodnitsky (2004), that if the random field is generated by a dissipative $\mathbb{Z}^d$-action then $b_n=n^{d/\alpha}$ is appropriate and with this choice the above point process sequence converges weakly to a cluster Poisson process. For the conservative case, no general result is known even when $d=1$. In this talk, we look at a specific class of stable random fields generated by conservative actions for which the effective dimension $p \leq d$ can be computed using the structure theorem of finitely generated abelian groups and some basic counting techniques. For this class of random fields, in order to incorporate the clustering effect of extreme observations due to longer memory, we need to normalize the point process itself in addition to using a scaling sequence $b_n =n^{p/\alpha}$. The weak limit of this normalized point process happens to be a random measure but not a point process. A number of limit theorems for various functionals of the random field can be obtained by continuous mapping arguments from these weak convergence results. (This talk is based on a joint work with Gennady Samorodnitsky.)
Prof. Stephen Stigler
University of Chicago
November 20, 2007

The 350th Anniversary of the Birth of Probability


The first printed work in probability was published in 1657 by Christian Huygens. A part of that work is discussed and its connections with modern ideas of risk analysis brought out. One of the "early adopters" was Isaac Newton, who nonetheless made a subtle and heretofore unnoticed error in applying the work.
Prof. Yanyuan Ma
University of Neuchatel
November 23, 2007

Locally Efficient Estimators for Semiparametric Models With Measurement Error


We derive constructive locally efficient estimators in semiparametric measurement error models. The setting is one where the likelihood function depends on variables measured with and without error, where the variables measured without error can be modelled nonparametrically. The algorithm is based on backfitting. We show that if one adopts a parametric model for the latent variable measured with error and if this model is correct, then the estimator is semiparametric efficient; if the latent variable model is misspecified, our methods lead to a consistent and asymptotically normal estimator. Our method further produces an estimator of the nonparametric function that achieves the standard bias and variance property. We extend the methodology to allow for parameters in the measurement error model to be estimated by additional data in the form of replicates or instrumental variables. The methods are illustrated via a simulation study and a data example, where the putative latent variable distribution is a shifted lognormal, but concerns about the effects of misspecification of this assumption and the linear assumption of another covariate demand a more model-robust approach. A special case of wide interest is the partially linear measurement error model. If one assumes that the model error and the measurement error are both normally distributed, then our estimator has a closed form. When a normal model for the unobservable variable is also posited, our estimator becomes consistent and asymptotically normally distributed for the general partially linear measurement error model, even without any of the normality assumptions under which the estimator is originally derived. We show that the method in fact reduces to the same estimator as in Liang et al. (1999), thus showing a previously unknown optimality property of their method.
Prof. Paul Embrechts
ETH Zurich
November 30, 2007

VaR-based Risk Management: Sense and (non-)Sensibility


Quantitative Risk Management has as one of its aims the calculation/estimation of risk capital for banks and insurance companies. The standard method used is referred to as VaR, Value-at-Risk, and mathematically corresponds to a (typically high) quantile of a so-called P&L, Profit-and-Loss distribution. Over the recent years, we have witnessed several extreme events in financial markets (including the recent subprime crisis) for which VaR-based risk management did not really work. I will critically discuss this issue and point at directions of research in statistics which may be helpful for finding better models for so-called high-risk scenarios. The talk should be accessible to a more general audience.
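The description of VaR as a high quantile of the P&L distribution can be made concrete with a short sketch. The function name, the sign convention (loss = negative P&L) and the simulated standard normal P&L are assumptions made for illustration only; real P&L data are typically heavier-tailed, which is part of the criticism raised in the talk.

```python
import numpy as np


def value_at_risk(pnl, level=0.99):
    """Empirical VaR: the `level`-quantile of the losses, where loss = -P&L.

    Sign convention and plain empirical quantile are illustrative assumptions.
    """
    losses = -np.asarray(pnl, dtype=float)
    return float(np.quantile(losses, level))


rng = np.random.default_rng(0)
pnl = rng.standard_normal(100_000)   # hypothetical P&L sample, standard normal
var_99 = value_at_risk(pnl, 0.99)    # should be close to the normal 99% quantile
```

A single quantile says nothing about how large losses are beyond it, which is one reason extreme events can escape a purely VaR-based assessment.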
Prof. Marloes Maathuis
ETH Zurich
December 7, 2007

Computation of the MLE for Bivariate Interval Censored Data


I will consider the nonparametric maximum likelihood estimator (MLE) for the bivariate distribution of (X,Y), when realizations of (X,Y) cannot be observed exactly, but are only known to lie in certain rectangular regions. Such data arise for example in HIV/AIDS studies. I will discuss the computation of the MLE for this type of data, and will illustrate the approach using the new R-package 'MLEcens'.
Prof. James Carpenter
London School of Hygiene & Tropical Medicine
December 13, 2007

Multilevel Models with Multivariate Mixed Response Types


We build upon the existing literature to propose a class of models for multivariate mixtures of normal, ordered or unordered categorical responses and non-normal continuous distributions, each of which can be defined at any level of a multilevel data hierarchy. We sketch an MCMC algorithm for fitting such models. We show how this unifies a number of disparate problems. The 2-level model is considered in detail, and applied to multiple imputation for missing data. We conclude with a discussion outlining possible extensions and connections in the literature. Beta-software, for Windows, for estimating a two-level version of the model is freely available from under 'software'. Joint work with: Harvey Goldstein (Bristol University) and Mike Kenward (LSHTM).