Prof. Alexander Gluhovsky [Joint Statistics and SIE Seminar]

Purdue University

Monday, January 21, 2013

16:00 – CM 104

Drawing reliable statistical inference from atmospheric and climate data

Abstract

Learning what your data really say can be challenging. Strong assumptions of standard statistical methods (such as the assumption of a normal distribution for observations or that of linear models for observed time series) are rarely met in practice, which may result in misleading inference. Time series analysis of atmospheric and climate data is additionally problematic because records are often prohibitively short, and typically only one record is available. The talk will outline these problems and demonstrate how modern computer-intensive resampling methods may provide reliable inference without making questionable assumptions about the data-generating mechanism. In particular, the talk will address computing confidence intervals for parameters of nonlinear time series as well as the construction of confidence bands for their trends. It will also discuss how resampling and other statistical methods could be made even more efficient for analyses of atmospheric data by involving novel deterministic chaotic models.

Prof. Miguel de Carvalho

Pontificia Universidad Catolica de Chile

Friday, February 15, 2013

15:15 – MA A3 30

Multivariate Extremes: Modelling, Smoothing, and Regression

Abstract

In this talk I discuss smoothing methods for the spectral measure of an extreme-value distribution. A complex issue in conducting inference in this setting is that the spectral measure needs to obey a certain moment condition. A Euclidean likelihood-based estimator for the spectral measure is discussed, and it is shown to have the same limit distribution as the maximum empirical likelihood estimator of Einmahl and Segers, Ann. Statist. 37, 2953–2989 (2009). For real data applications smooth versions of the empirical estimator may be preferred, and these can be readily constructed by suitably convolving the weights of our empirical likelihood-based method with a kernel on the simplex. I use this setup for developing a regression model for the spectral measure itself, and illustrate the methods in an extreme temperature data analysis.

Prof. Graciela Boente

Universidad de Buenos Aires

Friday, March 8, 2013

15:15 – MA A3 30

Robust inference for functional data

Abstract

When working with more than one population, as in the finite-dimensional case, it is common to assume the equality of covariance operators. As in the multivariate setting, assuming equality of covariance operators is not satisfactory since the covariance operators may exhibit some common structure. A natural extension to the functional setting of the common principal components model introduced by Flury (1988) is to assume that the covariance operators have common eigenfunctions but different eigenvalues. This model is known as the functional common principal component (FCPC) model and, as in principal component analysis, it may be used to reduce the dimensionality of the data, retaining as much as possible of the variability present in each of the populations. When dealing with just one population, classical PCA searches for directions with maximal dispersion of the data projected on them. Instead of using the variance as a measure of dispersion, a robust scale estimator may be used in the maximization problem, in order to obtain more resistant procedures. In addition, a well-known property of functional principal components is that they provide the best q-dimensional approximation to random elements over separable Hilbert spaces. A robust approach can also be given based on this property, using robust scale functionals to find the lower-dimensional linear space that provides the best prediction for the data. During this talk, we review some of the proposed approaches to robust functional PCA. We will also discuss an extension that provides a robust approach to estimating the common directions and their size for robust functional principal component analysis.

Dr. Marius Hofert

ETHZ

Wednesday, April 17, 2013

15:15 – CM 106

Modeling dependence in high dimensions: Statistical and computational challenges

Prof. Patricia Solomon [CANCELLED]

The University of Adelaide

Wednesday, May 1, 2013

15:15 – CM 106

Identifying unusual intensive care performance in Australia and New Zealand

Abstract

Critical care is expensive and costs are increasing as Australia’s population ages. It is important therefore that health-care providers, insurers and patients have available accurate measures of hospital performance to provide a basis for planning, for accountability and to inform public debate. However, provider comparisons via league tables have proven to be methodologically challenging as well as politically controversial as witnessed, for example, in the recent inquiry into Queensland’s Bundaberg Base Hospital. Typically, statistical analyses purporting to measure hospital performance suffer from one or more serious deficiencies, ranging from inadequate risk adjustment to no adjustment for multiple comparisons. In this talk, I will discuss the statistical issues involved in modelling hospital outcomes data and describe our current work on using hierarchical generalised linear mixed models to analyse the Australian and New Zealand Intensive Care Society Adult Patient Database, one of the largest binational databases in the world. This is joint work with Associate Professor John Moran of The Queen Elizabeth Hospital, Adelaide, and Dr Jessica Kasza of the School of Mathematical Sciences, University of Adelaide.

Dr. Anthea Monod

The Technion

Friday, May 17, 2013

15:15 – MA A3 30

Estimating Thresholding Levels for Random Fields via Euler Characteristics

Abstract

We introduce Lipschitz-Killing curvature (LKC) regression, a new method to produce (1-\alpha) thresholds for signal detection in random fields that does not require knowledge of the spatial correlation structure. The idea is to fit the observed empirical Euler characteristics to the Gaussian kinematic formula via generalized least squares, which quickly and easily provides statistical estimates of the LKCs — complex topological quantities that are otherwise extremely challenging to compute, both theoretically and numerically. With these estimates, we can then make use of a powerful parametric approximation of Euler characteristics for Gaussian random fields to generate accurate (1-\alpha) thresholds and p-values. Furthermore, LKC regression achieves large gains in speed without loss of accuracy over its main competitor, warping, which we demonstrate in a variety of simulations and applications. This is joint work with Robert Adler (Technion), Kevin Bartz, and Samuel Kou (Harvard), and supported in part by the US-Israel BSF, ISF, NIH/NIGMS, and NSF.

Dr. Yiannis Papastathopoulos

University of Bristol

Wednesday, June 26, 2013

15:15 – MA A1 12

Graphical structures in extreme multivariate events

Abstract

Modelling and interpreting the behaviour of extremes is quite challenging, especially when the dimension of the problem under study is large. Initially, univariate extreme value models are used for marginal tail estimation, and then the inter-relationships between random variables are captured by modelling the dependence of the extremes. Here, we propose graphical structures in extreme multivariate events of a random vector given that one of its components is large. These structures aim to provide better estimates and predictions of extreme quantities of interest as well as to reduce the problems with the curse of dimensionality. The imposition of graphical structures in the estimation of extremes is approached via a simplified parameter structure in a maximum likelihood setting and through Monte Carlo simulation from conditional kernel densities. The increase in efficiency of the estimators and the benefits of the proposed method are illustrated using simulated datasets.

Prof. John Aston

University of Warwick

Friday, October 11, 2013

15:15 – MA 30

Title: Functions, Covariances, and Learning Foreign Languages

Abstract

Functional Data Analysis (FDA) is an area of statistics concerned with analysing statistical objects which are curves or surfaces. This makes FDA particularly applicable in phonetics, the branch of linguistics concerned with speech, in that each sound or phoneme which makes up a syllable can be characterised as a time-frequency spectrogram surface. One question of particular significance in phonetics is how languages are related, and concepts as simple as how close two languages are have proved difficult to quantify. Recent work on FDA and phonetics in Mandarin and Qiang (Chinese dialects) has suggested that the use of covariance functions might facilitate the finding of new measures of closeness of languages. However, working with covariance functions immediately raises the issue that the functions lie on a manifold (of positive definite operators) rather than in a standard Euclidean space. Here, a new metric for covariance functions is introduced which allows valid inference for covariance functions and which also possesses good properties when examining extrapolations, something that can be used to determine phonetic relationships. The theory and methodology of the new distance metrics for covariance functions will be illustrated using some of the Romance languages (the languages which have Latin as a root). [Joint work with Davide Pigoli (Milan), Pantelis Hadjipantelis (Warwick), Ian Dryden (Nottingham), and Piercesare Secchi (Milan)].

Prof. Dana Sylvan

Hunter College of the City University of New York

Friday, October 18, 2013

15:15 – MA 30

Title: Assessment and visualization of threshold exceedance probabilities in complex space-time settings

Abstract

Many studies link exposure to various air pollutants to respiratory illness, making it important to identify regions where such exposure risks are high. One way of addressing this problem is by modeling probabilities of exceeding specific pollution thresholds. We consider particulate matter with diameter less than 10 microns (PM10) in the North-Italian region Piemonte. The problem of interest is to predict and map the daily exceedance probability for the threshold of 50 micrograms per cubic meter of PM10 based on air pollution data, geographic information, as well as exogenous variables. We use a two-step procedure involving nonparametric modeling in the time domain, followed by spatial interpolation. Block bootstrap is employed to evaluate the uncertainty in these predictions. — joint work with M. Cameletti and R. Ignaccolo
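As a rough illustration of how a block bootstrap can quantify uncertainty in an exceedance probability, the sketch below applies a moving block bootstrap to a single daily series and reports a percentile interval. It is not the speakers' two-step procedure: it ignores the spatial component and the nonparametric time-domain model, and the function names, the block length of 7 days, and the synthetic series are all assumptions made for the example.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Return one resampled series assembled from overlapping blocks.

    Resampling whole blocks, rather than single observations, keeps the
    short-range temporal dependence inside each block intact.
    """
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_starts = n - block_length + 1
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n_starts)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]  # trim the last block so the length matches


def exceedance_fraction(series, threshold):
    """Fraction of observations strictly above the threshold."""
    return sum(x > threshold for x in series) / len(series)


def exceedance_interval(series, threshold, block_length=7,
                        n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = sorted(
        exceedance_fraction(
            moving_block_bootstrap(series, block_length, rng), threshold)
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return exceedance_fraction(series, threshold), lower, upper


if __name__ == "__main__":
    # Synthetic autocorrelated series standing in for daily PM10 readings;
    # these numbers are invented and carry no empirical meaning.
    rng = random.Random(42)
    level, pm10 = 40.0, []
    for _ in range(365):
        level = 40.0 + 0.7 * (level - 40.0) + rng.gauss(0.0, 8.0)
        pm10.append(level)
    print(exceedance_interval(pm10, threshold=50.0))
```

The block length controls how much serial dependence each resample preserves; in practice it would be chosen with the data in mind rather than fixed at 7.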

Prof. Bernd Sturmfels

University of California, Berkeley

Wednesday, October 23, 2013

17:15 – CM5

Title: Maximum Likelihood for Matrices with Rank Constraints

Abstract

Maximum likelihood estimation is a fundamental computational task in statistics. We discuss this problem for manifolds of low rank matrices. These represent mixtures of independent distributions of two discrete random variables. This non-convex optimization problem leads to some beautiful geometry, topology, and combinatorics. We explain how numerical algebraic geometry is used to find the global maximum of the likelihood function, and we present a remarkable duality theorem due to Draisma and Rodriguez.

Dr. David Kraus

Universität Bern

Friday, November 15, 2013

15:15 – MA30

Title: Analysis of Incomplete Functional Data

Abstract

Techniques of functional data analysis have been developed for analysing collections of functions, for example curves, surfaces or images. It is customary to assume that all functions are completely (or densely) observed on the same domain. In this work we extend the scope of application of functional data analysis to situations where each functional variable may be observed only on a subset of the domain while no information about the function is available on the complement. For this partial observation regime, we develop the main tools of functional data analysis, such as estimators of the mean function and covariance operator and principal components analysis, and show how individual incomplete functions can be recovered from observed fragments. Our work is motivated by a data set from ambulatory blood pressure monitoring where only parts of temporal profiles are observed. This work was done when the author was employed at the University Hospital Lausanne (CHUV).

Ms Susan Wei [CANCELLED]

University of North Carolina at Chapel Hill

Date: TBA

Title: Latent Supervised Learning

Abstract

Machine learning is a branch of artificial intelligence concerning the construction of systems that can learn from data. Algorithms in machine learning can be placed along a spectrum according to the type of input available during training. The two main classes of machine learning algorithms, unsupervised and supervised learning, occupy opposite ends of this spectrum. In this talk I will give an overview of some of my recent research on machine learning tasks that fall somewhere in the middle of this spectrum. I will primarily focus on a new machine learning task called latent supervised learning, where the goal is to learn a binary classifier from continuous training labels that serve as surrogates for the unobserved class labels. A specific model is investigated where the surrogate variable arises from a two-component Gaussian mixture with unknown means and variances, and the component membership is determined by a hyperplane in the covariate space. A data-driven sieve maximum likelihood estimator for the hyperplane is proposed, which in turn can be used to estimate the parameters of the Gaussian mixture. Extensions of the framework to survival data and applications to estimating treatment effect heterogeneity will also be discussed.

Dr Irina Irincheeva

Nestle Institute of Health Sciences

Friday, December 13, 2013

15:15 – MA10

Title: Exploring for non-random missingness

Abstract

Multiple imputation of data Missing At Random (MAR) has now become routine, and there are automated procedures to perform it. In practice it is difficult to objectively evaluate how appropriate the hypothesis of missingness at random is. Inspired by a real application, we develop a model for data Missing Not At Random (MNAR). We use simulations to assess the performance of our approach and to explore the differences between MAR and MNAR imputations.