Massive data presents a fundamental challenge to learning algorithms, which is captured by the following computational dogma: Running time of a learning algorithm increases with the size of the data. The available computational power, however, is growing slowly relative to data sizes. Hence, large-scale problems of interest require increasingly more time to solve.
Our recent research has led us to show that this dogma is false in general, and supports an emerging perspective in computation: Data should be treated as a resource that can be traded off with other resources, such as running time. A detailed understanding of time-data trade-offs, however, requires the two to be studied jointly, and such interdisciplinary theoretical studies are in their infancy. Existing approaches are too specialized, and do not support rigorous distributed implementations. Perhaps most crucially, they only aim at establishing a trade-off, but not characterizing its optimality.
The ERC Time-Data project confronts these challenges in a unified mathematical framework by developing new convex optimization theory and methods for large-scale optimization, and characterizing the time and data required to reach approximate solutions along with their statistical guarantees. Our mathematical framework, in tandem with the new universal primal-dual algorithms and geometric estimators for non-linear models, are expected to change the way we treat data in statistical sciences, promising substantial computational flexibility for data-driven learning.
Information scalable optimization and data acquisition
Many natural and man-made signals can be described as having a few degrees of freedom relative to their size due to natural parameterizations or constraints; examples include bandlimited signals, collections of signals observed from multiple viewpoints in a network-of-sensors, and per-flow traffic measurements of the Internet. Low-dimensional models (LDM) mathematically capture the inherent structure of such signals via combinatorial and geometric data models, such as sparsity, unions-of-subspaces, manifolds, and mixtures of factor analyzers, and are emerging to revolutionize the way we treat inverse problems (e.g., signal recovery, parameter estimation, or structure learning) from dimensionality-reduced or incomplete data.
One outstanding application of LDM’s is compressive sensing (CS), which exploits sparse representations. While CS recovery algorithms based on convex optimization seek sparse solutions, they—staggeringly—never take advantage of the crucial low-dimensional scaffold upon which the CS problem resides. Rather, considerable effort has been spent on developing sub-optimal, greedy approaches for computational scalability, but has thus far failed to establish a disciplined methodology one must follow when approaching new problems, such as linear programming approaches for approximate dynamic programming (e.g., to solve large scale Markov decision problems prevalent in control systems).
Assuming our problem resides on a sparse set via learning, we investigate how to integrate combinatorial, sparse projections in convex optimization algorithms for massive gains in computational complexity. We believe that such gains will only incur minimal penalties in accuracy or algorithmic convergence rates due to the existence of a supporting sparse set. We study such trade-offs and their implications to develop extremely efficient, unconventional optimization algorithms, and—more importantly—to unify convex and combinatorial optimization that can enable cross-pollination of decades of research in both.
The method of randomized, non-adaptive acquisition is key in CS for providing algorithmic signal recovery guarantees. It is however an open question whether such non-adaptivity is inherently required to acquire sparse signals. In this full generality, the answer to this question is most probably no. For concreteness, it is necessary to further specialize in the design of model adaptive sensing matrices. While the existing CS theory focuses mainly on the stable embedding properties of the sampling matrices, we challenge this view by also casting the sensing operation as a way of transferring information and leveraging new results in theoretical computer science and information theory. We specifically focus on extremal combinatorial objects, such as randomness extractors and expanders, which are emerging to play fundamental roles in coding theory.
Learning theory and methods for low-dimensional signal models
Solutions of inverse problems from compressive or incomplete measurements using low dimensional models (LDM) are predicated upon the knowledge of the appropriate low-dimensional signal model. Unfortunately, the majority of inverse problems are not directly amenable to natural parameterizations that inherently reduce the degrees of freedom to facilitate solutions; hence, we must identify any supporting LDM’s from data. On the positive side, seeking low-dimensional models is theoretically well-grounded: decades of learning research shows that learned low dimensional models have provably better prediction capabilities on new data than complex models.
On the negative side, exploring the space of even the low-dimensional structures is problematic because the number of different LDM’s grows exponentially with the dimension. It is often necessary to resort either to abstract functional spaces that can concisely capture the underlying characteristics of signals, or to heuristics to find good candidates. The mathematical developments on identifying and exploiting proper functional spaces are arduous, time consuming, and are not automatically generalizable.
In sharp contrast to existing work, our contention is that fundamentally new approaches based on explicitly combinatorial theory are needed in this area, for instance, to transfer the theoretical guarantees for sparsifying dictionaries designed via functional space assumptions to the real world applications in a machine learning framework. Our key insight is that geometric properties of sparsifying dictionaries translate into combinatorial concepts, such as (approximately) submodular functions. As a discrete equivalent of convexity, these combinatorial tools based on submodularity are naturally suited to deal with the discrete, structured sparsity patterns present in real world sensor data. This insight opens up a large set of tools from combinatorial optimization that promises to lead to novel, efficient algorithms for sparse representation and inference.
In many learning applications, we are not interested in obtaining a precise signal reconstruction as in compressive sensing, but rather are only interested in making some kind of detection, classification, or clustering. Even in these cases, our society is investing massively in the collection and processing of data of all kinds, while the information we seek occupies a tiny space compared to the ambient data dimension. This suggests a great potential for new learning methods using information operators for targeted dimensionality reduction. Therefore, we aim to extend the utility of linear measurements well beyond signal recovery to inference tasks, such as detection, classification, and parameter learning.