We address the problem of simultaneously learning and combining several Bayesian models in an online, continual learning setting. In particular, we propose online Bayesian stacking (OBS), which combines Bayesian models by optimizing a log-score over predictive distributions. We make a novel connection by phrasing OBS as a portfolio selection problem, which unlocks a rich, well-studied theoretical framework with efficient algorithms and extensive regret analysis. This framework also elucidates a connection to online Bayesian model averaging (BMA), showing that the two methods apply a similar algorithmic approach to different cost functions. We further provide interpretation and analysis from the empirical Bayes perspective, showing that OBS optimizes the evidence of a mixture of estimators. Following the theoretical development of Bayesian stacking, we apply OBS to an illustrative toy problem. Finally, to demonstrate real-world effectiveness, we apply OBS to online basis expansions and Gaussian processes, replacing the online BMA approaches in the literature.
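As a generic illustration of the portfolio-selection view (a sketch under our own assumptions, not the paper's exact algorithm), stacking weights can be updated with an exponentiated-gradient step on the log-score; the function name and learning rate below are hypothetical:

```python
import math

def obs_update(weights, pred_densities, lr=0.1):
    """One exponentiated-gradient step on the log-score of the
    stacked predictive density (a generic portfolio-selection
    sketch; details of the OBS algorithm may differ)."""
    # Stacked (mixture) predictive density at the observed point
    mix = sum(w * p for w, p in zip(weights, pred_densities))
    # d/dw_k log(mix) = p_k / mix; exponentiate and renormalize
    new = [w * math.exp(lr * p / mix) for w, p in zip(weights, pred_densities)]
    z = sum(new)
    return [w / z for w in new]

# Two candidate models; model 0 consistently assigns higher density
w = [0.5, 0.5]
for _ in range(50):
    w = obs_update(w, [0.8, 0.2])
```

The multiplicative update keeps the weights on the probability simplex, the same structure exploited by classical portfolio-selection algorithms.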
NeurIPS ’24
Tangent Space Causal Inference: Leveraging Vector Fields for Causal Discovery in Dynamical Systems
Causal discovery with time series data remains a challenging yet increasingly important task across many scientific domains. Convergent cross mapping (CCM) and related methods have been proposed to study time series generated by dynamical systems, where traditional approaches like Granger causality are unreliable. However, CCM often yields inaccurate results, depending on the quality of the data. We propose the Tangent Space Causal Inference (TSCI) method for detecting causalities in dynamical systems. TSCI works by considering vector fields as explicit representations of the systems’ dynamics and checking the degree of synchronization between the learned vector fields. The TSCI approach is model-agnostic and can be used as a drop-in replacement for CCM and its generalizations. We present both a basic TSCI algorithm, which is lightweight and more effective than the basic CCM algorithm, and augmented versions of TSCI that leverage the expressive power of latent variable models and deep learning. We validate our theory on standard systems and demonstrate improved causal inference performance across a number of benchmarks.
FUSION ’24
A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers
Online prediction of time series under regime switching is a widely studied problem in the literature, with many celebrated approaches. Using the non-parametric flexibility of Gaussian processes, the recently proposed INTEL algorithm provides a product of experts approach to online prediction of time series under possible regime switching, including the special case of outliers. This is achieved by adaptively combining several candidate models, each reporting their predictive distribution at time t. However, the INTEL algorithm uses a finite context window approximation to the predictive distribution, the computation of which scales cubically in the maximum lag; with exact predictive distributions, the cost instead scales quartically. We introduce LINTEL, which uses the exact filtering distribution at time t with constant-time updates, making the time complexity of the streaming algorithm optimal. We additionally note that the weighting mechanism of INTEL is better suited to a mixture of experts approach, and we propose a fusion policy based on arithmetic averaging for LINTEL. We show experimentally that our proposed approach is over five times faster than INTEL under reasonable settings while producing better-quality predictions.
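To illustrate the distinction between product-of-experts and mixture (arithmetic-averaging) fusion of Gaussian predictive distributions, here is a minimal sketch; the function names are ours, and the weighting mechanisms of INTEL and LINTEL are omitted:

```python
def product_of_experts(mus, vars_):
    """Unweighted product of Gaussian experts: precisions add,
    and the mean is precision-weighted (a simplified sketch)."""
    precs = [1.0 / v for v in vars_]
    var = 1.0 / sum(precs)
    mu = var * sum(m * p for m, p in zip(mus, precs))
    return mu, var

def arithmetic_average(mus, vars_, weights):
    """Moment-matched mixture of Gaussian experts (arithmetic
    averaging): mean and variance of the weighted mixture."""
    mu = sum(w * m for w, m in zip(weights, mus))
    var = sum(w * (v + m * m) for w, m, v in zip(weights, mus, vars_)) - mu * mu
    return mu, var
```

When experts disagree, the product rule yields a small, overconfident fused variance, while the mixture inflates the variance to reflect the disagreement; this is one motivation for an arithmetic-averaging fusion policy.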
Practical Bayesian learning often requires (1) online inference, (2) dynamic models, and (3) ensembling over multiple different models. Recent advances have shown how to use random feature approximations to achieve scalable, online ensembling of Gaussian processes with desirable theoretical properties and fruitful applications. One key to these methods’ success is the inclusion of a random walk on the model parameters, which makes models dynamic. We show that these methods can be generalized easily to any basis expansion model and that using alternative basis expansions, such as Hilbert space Gaussian processes, often results in better performance. To simplify the process of choosing a specific basis expansion, our method’s generality also allows the ensembling of several entirely different models, for example, a Gaussian process and polynomial regression. Finally, we propose a novel method to ensemble static and dynamic models together.
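As a toy illustration of the basis-expansion view (names and dimensions are our own assumptions, not the paper's implementation), any feature map can be plugged into the same online linear-Gaussian update, for instance random Fourier features approximating an RBF-kernel Gaussian process, or plain polynomial features:

```python
import math
import random

random.seed(0)
D = 50  # number of random features (hypothetical choice)
omegas = [random.gauss(0.0, 1.0) for _ in range(D)]
phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

def rff_basis(x):
    # Random Fourier features approximating an RBF-kernel GP
    return [math.sqrt(2.0 / D) * math.cos(w * x + b)
            for w, b in zip(omegas, phases)]

def poly_basis(x, degree=3):
    # Polynomial regression as an alternative basis expansion
    return [x ** k for k in range(degree + 1)]

def predict(theta, phi):
    # The online model only ever sees the feature vector phi(x),
    # so the two bases are interchangeable in the same update
    return sum(t * f for t, f in zip(theta, phi))
```

The dynamic behavior in the methods described above comes from a random walk on the weight vector theta, which is independent of which basis produced the features.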
We introduce Dagma-DCE, an interpretable and model-agnostic scheme for differentiable causal discovery. Current non- or over-parametric methods in differentiable causal discovery use opaque proxies of “independence” to justify the inclusion or exclusion of a causal relationship. We show theoretically and empirically that these proxies may be arbitrarily different from the actual causal strength. In contrast to existing differentiable causal discovery algorithms, Dagma-DCE uses an interpretable measure of causal strength to define weighted adjacency matrices. On a number of simulated datasets, we show that our method achieves state-of-the-art performance. We additionally show that Dagma-DCE allows for principled thresholding and sparsity penalties by domain experts. The code for our method is available open-source at https://github.com/DanWaxman/DAGMA-DCE, and can easily be adapted to arbitrary differentiable models.
2023
ACSSC ’23
Fusion of Gaussian Process Predictions With Monte Carlo
Marzieh Ajirak, Daniel Waxman, Fernando Llorente, and Petar M. Djurić
In 2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023
In science and engineering, we often work with models designed for accurate prediction of variables of interest. Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes. In this paper, we operate within the Bayesian paradigm, relying on Gaussian processes as our models. These models generate predictive probability density functions (pdfs), and the objective is to integrate them systematically, employing both linear and log-linear pooling. We introduce novel approaches for log-linear pooling, determining input-dependent weights for the predictive pdfs of the Gaussian processes. The aggregation of the pdfs is realized through Monte Carlo sampling, drawing samples of weights from their posterior. The performance of these methods, as well as those based on linear pooling, is demonstrated using a synthetic dataset.
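A minimal sketch of the two pooling rules for Gaussian predictive pdfs (our own toy functions; the input-dependent weights and the Monte Carlo sampling of weight posteriors described above are omitted):

```python
import math

def gauss_pdf(x, mu, var):
    # Univariate Gaussian density
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def linear_pool(x, mus, vars_, weights):
    # Linear pooling: a weighted sum (mixture) of predictive pdfs
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_))

def log_linear_pool(mus, vars_, weights):
    # Log-linear pooling of Gaussians is again Gaussian:
    # pooled precision = sum_k w_k / v_k, mean = precision-weighted average
    prec = sum(w / v for w, v in zip(weights, vars_))
    var = 1.0 / prec
    mu = var * sum(w * m / v for w, m, v in zip(weights, mus, vars_))
    return mu, var
```

The linear pool is evaluated pointwise because a mixture of Gaussians is not Gaussian, whereas the log-linear pool admits closed-form pooled parameters.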
EUSIPCO ’23
Detecting Confounders in Multivariate Time Series Using Strength of Causation
One of the most important problems in science is understanding causation. This is particularly challenging when one has access to observational data only, and it is further compounded in the presence of latent confounders. In this paper, we propose a method for detecting confounders in multivariate time series using a recently introduced concept referred to as the differential causal effect (DCE). The solution is based on feature-based Gaussian processes that are used for estimating both the DCE of the observed time series and the latent confounders. We demonstrate the performance of the proposed method with several examples. They show that the proposed approach can detect confounders and accurately estimate causal strengths.
talks
“Causal Inference via Quantifying Influences” at the Acoustics Research Institute of the Austrian Academy of Sciences (Institut für Schallforschung der Österreichischen Akademie der Wissenschaften) [abstract link] [slides]
“Bayesian Combination” at the 2023 Bellairs Workshop on Machine Learning and Statistical Signal Processing for Data on Graphs