Bayesian Methods in Cosmology [CEA]

These notes present an overview of Bayesian statistics, covering the underlying concepts and the application methodology that will be useful to astronomers seeking to analyse and interpret a wide variety of data about the Universe. The level starts from elementary notions, without assuming any previous knowledge of statistical methods, and then progresses to more advanced, research-level topics. After an introduction to the importance of statistical inference for the physical sciences, elementary notions of probability theory and inference are introduced and explained. Bayesian methods are then presented, starting from the meaning of Bayes' theorem and its use as an inferential engine, including a discussion of priors and posterior distributions. Numerical methods for generating samples from arbitrary posteriors (including Markov Chain Monte Carlo and Nested Sampling) are then covered. The last section deals with Bayesian model selection, how it is used to assess the performance of models, and how it contrasts with the classical p-value approach. A series of exercises of various levels of difficulty is designed to further the understanding of the theoretical material; fully worked-out solutions are provided for most of them.
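
The numerical-methods material the notes cover can be previewed with a minimal random-walk Metropolis sampler; the toy below (illustrative only, not taken from the notes) samples the posterior of a Gaussian mean under a flat prior:

```python
import numpy as np

# Minimal Metropolis sampler for a 1-D posterior:
# Gaussian likelihood with known unit variance, flat prior on mu.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

def log_posterior(mu):
    # Flat prior, so the log-posterior is the log-likelihood up to a constant.
    return -0.5 * np.sum((data - mu) ** 2)

chain, mu = [], 0.0
for _ in range(20000):
    prop = mu + rng.normal(scale=0.5)            # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(mu):
        mu = prop                                # Metropolis accept step
    chain.append(mu)

posterior_mean = np.mean(chain[5000:])           # discard burn-in
```

With a flat prior the posterior mean equals the sample mean, which the chain recovers to within its Monte Carlo error.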

Read this paper on arXiv…

R. Trotta
Mon, 9 Jan 17

Comments: 86 pages, 16 figures. Lecture notes for the 44th Saas Fee Advanced Course on Astronomy and Astrophysics, “Cosmology with wide-field surveys” (March 2014), to be published by Springer. Comments welcome


Accelerating cross-validation with total variation and its application to super-resolution imaging [CL]

We develop an approximation formula for the cross-validation error (CVE) of a sparse linear regression penalized by $\ell_1$-norm and total variation terms, based on a perturbative expansion that exploits the large dimensionality of both the data and the model. The developed formula allows us to reduce the computational cost of the CVE evaluation significantly. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super-resolution. The results demonstrate that our approximation reproduces the CVE values obtained by explicitly conducting cross-validation with reasonably good precision.
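
For contrast, literal leave-one-out cross-validation, the brute-force baseline whose cost the proposed formula avoids, can be sketched for a plain $\ell_1$-penalized regression (the total-variation term is omitted here, and all settings are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Literal leave-one-out CV for an l1-penalized regression: n separate refits,
# which is exactly the cost an approximation formula would bypass.
rng = np.random.default_rng(1)
n, p = 40, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [1.0, -2.0, 1.5]       # sparse ground truth
y = X @ beta + 0.1 * rng.normal(size=n)

errs = []
for i in range(n):                                     # one refit per held-out point
    mask = np.arange(n) != i
    model = Lasso(alpha=0.05).fit(X[mask], y[mask])
    errs.append((y[i] - model.predict(X[i:i+1])[0]) ** 2)
cve = np.mean(errs)                                    # cross-validation error
```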

Read this paper on arXiv…

T. Obuchi, S. Ikeda, K. Akiyama, et al.
Wed, 23 Nov 16

Comments: 5 pages, 1 figure

Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples [IMA]

We extend the common mixtures-of-Gaussians density estimation approach to account for a known sample incompleteness by simultaneous imputation from the current model. The method, called GMMis, generalizes existing Expectation-Maximization techniques for truncated data to arbitrary truncation geometries and probabilistic rejection. It can incorporate a uniform background distribution as well as independent multivariate normal measurement errors for each of the observed samples, and recovers an estimate of the error-free distribution from which both observed and unobserved samples are drawn. We compare GMMis to the standard Gaussian mixture model for simple test cases with different types of incompleteness, and apply it to observational data from the NASA Chandra X-ray telescope. The python code is capable of performing density estimation with millions of samples and thousands of model components and is released as an open-source package.
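
The bias that motivates such a correction is easy to reproduce: fitting a standard Gaussian mixture (here via scikit-learn, not the GMMis package itself) to truncated samples shifts the recovered mean:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# A standard GMM fit to truncated data illustrates the bias GMMis corrects:
# samples below the truncation boundary are lost, so the fitted mean is shifted.
rng = np.random.default_rng(2)
full = rng.normal(loc=0.0, scale=1.0, size=(5000, 1))
observed = full[full[:, 0] > -0.5]            # truncation: everything below -0.5 is lost

gmm = GaussianMixture(n_components=1, random_state=0).fit(observed)
biased_mean = gmm.means_[0, 0]                # noticeably > 0, though the true mean is 0
```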

Read this paper on arXiv…

P. Melchior and A. Goulding
Fri, 18 Nov 16

Comments: 12 pages, 6 figures, submitted to Computational Statistics & Data Analysis

Bayes Factors via Savage-Dickey Supermodels [IMA]

We outline a new method to compute the Bayes Factor for model selection which bypasses the Bayesian Evidence. Our method combines multiple models into a single, nested, Supermodel using one or more hyperparameters. Since the models are now nested, the Bayes Factors between the models can be efficiently computed using the Savage-Dickey Density Ratio (SDDR). In this way model selection becomes a problem of parameter estimation. We consider two ways of constructing the supermodel in detail: one based on combined models, and a second based on combined likelihoods. We report on these two approaches for a Gaussian linear model, for which the Bayesian evidence can be calculated analytically, and for a toy nonlinear problem. While a standard Markov Chain Monte Carlo (MCMC) sampler struggles with the combined-model approach, the combined-likelihood approach fares much better in providing a reliable estimate of the log-Bayes Factor. This scheme potentially opens the way to computationally efficient ways to compute Bayes Factors in high dimensions that exploit the good scaling properties of MCMC, as compared to methods such as nested sampling that fail in high dimensions.
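
A minimal sketch of the Savage-Dickey idea, for a conjugate Gaussian toy problem where the answer is known analytically (all settings below are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

# Savage-Dickey sketch: for nested models (M0 fixes theta = 0 inside M1),
# the Bayes factor B01 equals the posterior density at theta = 0 divided by
# the prior density there, turning model selection into density estimation.
rng = np.random.default_rng(3)
n, tau = 20, 2.0
data = rng.normal(loc=0.0, scale=1.0, size=n)       # generated under M0

# Conjugate Gaussian posterior for the mean (known unit noise variance).
post_var = 1.0 / (n + 1.0 / tau**2)
post_mean = post_var * n * data.mean()
samples = rng.normal(post_mean, np.sqrt(post_var), size=20000)

# Estimate the posterior density at theta = 0 from samples (as one would
# from an MCMC chain), and compare with the exact conjugate answer.
b01_kde = gaussian_kde(samples)(0.0)[0] / norm.pdf(0.0, 0.0, tau)
b01_exact = norm.pdf(0.0, post_mean, np.sqrt(post_var)) / norm.pdf(0.0, 0.0, tau)
```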

Read this paper on arXiv…

A. Mootoovaloo, B. Bassett and M. Kunz
Fri, 9 Sep 16

Comments: 24 pages, 11 Figures

Generalisations of Fisher Matrices [CEA]

Fisher matrices play an important role in experimental design and in data analysis. Their primary role is to make predictions for the inference of model parameters – both their errors and covariances. In this short review, I outline a number of extensions to the simple Fisher matrix formalism, covering a number of recent developments in the field. These are: (a) situations where the data (in the form of (x,y) pairs) have errors in both x and y; (b) modifications to parameter inference in the presence of systematic errors, or through fixing the values of some model parameters; (c) Derivative Approximation for LIkelihoods (DALI) – higher-order expansions of the likelihood surface, going beyond the Gaussian shape approximation; (d) extensions of the Fisher-like formalism, to treat model selection problems with Bayesian evidence.
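
The basic Fisher-matrix forecast that these extensions build upon can be sketched for a straight-line model with known Gaussian errors:

```python
import numpy as np

# Fisher forecast for y = a*x + b with known Gaussian errors sigma:
# F_ij = sum_k (dmu_k/dtheta_i)(dmu_k/dtheta_j) / sigma^2,
# and the inverse Fisher matrix is the forecast parameter covariance.
x = np.linspace(0.0, 1.0, 30)
sigma = 0.1
dmu_da, dmu_db = x, np.ones_like(x)            # model derivatives w.r.t. (a, b)

F = np.empty((2, 2))
F[0, 0] = np.sum(dmu_da * dmu_da) / sigma**2
F[0, 1] = F[1, 0] = np.sum(dmu_da * dmu_db) / sigma**2
F[1, 1] = np.sum(dmu_db * dmu_db) / sigma**2

param_cov = np.linalg.inv(F)                   # forecast errors: sqrt of the diagonal
```

For this linear model the forecast slope variance reduces to the familiar least-squares result $\sigma^2 / \sum_k (x_k - \bar{x})^2$.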

Read this paper on arXiv…

A. Heavens
Wed, 22 Jun 16

Comments: Invited review article for Entropy special issue on ‘Applications of Fisher Information in Sciences’. Accepted version

Looking for a Needle in a Haystack? Look Elsewhere! A statistical comparison of approximate global p-values [CL]

The search for new significant peaks over an energy spectrum often involves a statistical multiple hypothesis testing problem. Separate hypothesis tests are conducted at different locations, producing an ensemble of local p-values, the smallest of which is reported as evidence for the new resonance. Unfortunately, controlling the false detection rate (type I error rate) of such procedures may lead to excessively stringent acceptance criteria. In the recent physics literature, two promising statistical tools have been proposed to overcome these limitations. In 2005, a method to “find needles in haystacks” was introduced by Pilla et al. [1], and a second method was later proposed by Gross and Vitells [2] in the context of the “look elsewhere effect” and trial factors. We show that, for relatively small sample sizes, the former leads to an artificial inflation of statistical power that stems from an increase in the false detection rate, whereas the two methods exhibit similar performance for large sample sizes. Finally, we provide general guidelines for choosing between statistical procedures for signal detection with respect to the specifics of the physics problem under investigation.
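
The trial-factor inflation at the heart of the look-elsewhere effect can be illustrated by brute-force simulation (a toy with independent search locations, not either of the two methods compared in the paper):

```python
import numpy as np
from scipy.stats import norm

# Toy look-elsewhere correction by direct simulation: scanning many locations
# inflates the chance of a small local p-value, so the global p-value must
# account for the effective number of trials.
rng = np.random.default_rng(4)
n_locations, n_sims = 100, 20000

z_obs = 3.0                                   # local significance at the best bin
p_local = norm.sf(z_obs)                      # one-sided local p-value

# Global p-value: probability that the maximum of n_locations independent
# null z-scores exceeds z_obs, estimated from n_sims null pseudo-experiments.
max_z = rng.normal(size=(n_sims, n_locations)).max(axis=1)
p_global = np.mean(max_z >= z_obs)
```

For independent locations the analytic answer is $1 - (1 - p_{\rm local})^{100}$, roughly a hundredfold inflation here.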

Read this paper on arXiv…

S. Algeri, J. Conrad, D. Dyk, et al.
Fri, 12 Feb 16

Comments: Submitted to EPJ C

Preprocessing Solar Images while Preserving their Latent Structure [IMA]

Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of $\sim$10 segments rather than on each of $\sim$10$^7$ pixels, reducing computing time by a factor of $\sim$10$^6$. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.

Read this paper on arXiv…

N. Stein, D. Dyk and V. Kashyap
Tue, 15 Dec 15

Comments: N/A

Estimating sparse precision matrices [IMA]

We apply a method recently introduced in the statistical literature to directly estimate the precision matrix from an ensemble of samples drawn from a corresponding Gaussian distribution. Motivated by the observation that cosmological precision matrices are often approximately sparse, the method exploits this sparsity of the precision matrix to converge more quickly to an asymptotic $1/\sqrt{N_{\rm sim}}$ rate while simultaneously providing an error model for all of the terms. Such an estimate can be used as the starting point for further regularization efforts which can improve upon the $1/\sqrt{N_{\rm sim}}$ limit above, and incorporating such additional steps is straightforward within this framework. We demonstrate the technique with toy models and with an example motivated by large-scale structure two-point analysis, showing significant improvements in the rate of convergence. For the large-scale structure example we find errors on the precision matrix which are factors of 5 smaller than for the sample precision matrix for thousands of simulations or, alternatively, convergence to the same error level with more than an order of magnitude fewer simulations.
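
A related, though different, sparsity-exploiting estimator, the graphical lasso, which penalizes off-diagonal precision entries, illustrates the general idea (this is not the method used in the paper):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sparsity-exploiting precision estimation with the graphical lasso:
# an l1 penalty on off-diagonal precision entries improves on the raw
# sample estimate when only a limited number of simulations is available.
rng = np.random.default_rng(5)
p = 10
# Sparse (tridiagonal) ground-truth precision matrix.
true_prec = np.eye(p) + 0.3 * np.eye(p, k=1) + 0.3 * np.eye(p, k=-1)
cov = np.linalg.inv(true_prec)
samples = rng.multivariate_normal(np.zeros(p), cov, size=200)

model = GraphicalLasso(alpha=0.05).fit(samples)
est_prec = model.precision_        # off-diagonal noise is shrunk toward zero
```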

Read this paper on arXiv…

N. Padmanabhan, M. White, H. Zhou, et al.
Mon, 7 Dec 15

Comments: 11 pages, 14 figures, submitted to MNRAS

Parameter inference with estimated covariance matrices [CEA]

When inferring parameters from a Gaussian-distributed data set by computing a likelihood, a covariance matrix is needed that describes the data errors and their correlations. If the covariance matrix is not known a priori, it may be estimated and thereby becomes a random object with some intrinsic uncertainty itself. We show how to infer parameters in the presence of such an estimated covariance matrix, by marginalising over the true covariance matrix, conditioned on its estimated value. This leads to a likelihood function that is no longer Gaussian, but rather an adapted version of a multivariate $t$-distribution, which has the same numerical complexity as the multivariate Gaussian. As expected, marginalisation over the true covariance matrix improves inference when compared with Hartlap et al.’s method, which uses an unbiased estimate of the inverse covariance matrix but still assumes that the likelihood is Gaussian.
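
The resulting likelihood is straightforward to implement; the sketch below uses the multivariate-t-like form described in the abstract, up to normalization (the exact expression and its derivation are given in the paper), and checks that it recovers the Gaussian limit when many simulations are available:

```python
import numpy as np

# Modified likelihood with a covariance matrix estimated from n_sims
# simulations: up to normalization,
#   log L = -(n_sims / 2) * log(1 + chi2 / (n_sims - 1)),
# which tends to the Gaussian -chi2/2 as n_sims grows.
def log_likelihood_t(residual, cov_hat, n_sims):
    chi2 = residual @ np.linalg.solve(cov_hat, residual)
    return -0.5 * n_sims * np.log1p(chi2 / (n_sims - 1.0))

residual = np.array([0.5, -0.3])                    # data minus model
cov_hat = np.array([[1.0, 0.2], [0.2, 1.0]])        # estimated covariance
gauss = -0.5 * residual @ np.linalg.solve(cov_hat, residual)
```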

Read this paper on arXiv…

E. Sellentin and A. Heavens
Fri, 20 Nov 15

Comments: To be published in MNRAS letters

Frequentist tests for Bayesian models [IMA]

Analogues of the frequentist chi-square and $F$ tests are proposed for testing goodness-of-fit and consistency for Bayesian models. Simple examples exhibit these tests’ detection of inconsistency between consecutive experiments with identical parameters, when the first experiment provides the prior for the second. In a related analysis, a quantitative measure is derived for judging the degree of tension between two different experiments with partially overlapping parameter vectors.

Read this paper on arXiv…

L. Lucy
Tue, 10 Nov 15

Comments: 8 pages, 4 figures

Detecting Unspecified Structure in Low-Count Images [IMA]

Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tail probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.

Read this paper on arXiv…

N. Stein, D. Dyk, V. Kashyap, et al.
Fri, 16 Oct 15

Comments: N/A

Comparing non-nested models in the search for new physics [CL]

Searches for unknown physics and deciding between competing physical models to explain data rely on statistical hypothesis testing. A common approach, used for example in the discovery of the Brout-Englert-Higgs boson, is based on the statistical Likelihood Ratio Test (LRT) and its asymptotic properties. In the common situation where neither of the two models under comparison is a special case of the other, i.e., when the hypotheses are non-nested, this test is not applicable, and so far no efficient solution exists. In physics, this problem occurs when two models that reside in different parameter spaces are to be compared. An important example is the recently reported excess emission in astrophysical $\gamma$-rays and the question whether its origin is known astrophysics or dark matter. We develop and study a new, generally applicable, frequentist method and validate its statistical properties using a suite of simulation studies. We exemplify it on realistic simulated data of the Fermi-LAT $\gamma$-ray satellite, where non-nested hypothesis testing arises in the search for particle dark matter.

Read this paper on arXiv…

S. Algeri, J. Conrad and D. Dyk
Fri, 4 Sep 15

Comments: We welcome examples of non-nested model testing problems

A Gibbs Sampler for Multivariate Linear Regression [IMA]

Kelly (2007, hereafter K07) described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modeled by a flexible mixture of Gaussians rather than assumed to be uniform. Here I extend the K07 algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Second, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.
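
The alternating conditional draws at the heart of any Gibbs sampler can be sketched for ordinary linear regression with a single response and no measurement errors (far simpler than the K07/LRGS algorithm, but it shows the structure of the method):

```python
import numpy as np

# Stripped-down Gibbs sampler for linear regression with a flat prior on beta
# and a 1/sigma^2 prior on the noise variance: alternate the two conditionals.
rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                            # least-squares solution
sigma2, draws = 1.0, []
for _ in range(3000):
    # beta | sigma2, y: Gaussian centered on the least-squares solution.
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # sigma2 | beta, y: scaled inverse-chi-square draw from the residuals.
    resid = y - X @ beta
    sigma2 = (resid @ resid) / rng.chisquare(n)
    draws.append(beta)
posterior_beta = np.mean(draws[500:], axis=0)           # discard burn-in
```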

Read this paper on arXiv…

A. Mantz
Fri, 4 Sep 15

Comments: 9 pages, 5 figures, 2 tables

Stochastic determination of matrix determinants [CL]

Matrix determinants play an important role in data analysis, in particular when Gaussian processes are involved. Due to currently exploding data volumes, linear operations – matrices – acting on the data are often not accessible directly, but are represented only indirectly in the form of a computer routine. Such a routine implements the transformation a data vector undergoes under matrix multiplication. While efficient probing routines to estimate a matrix’s diagonal or trace, based solely on such computationally affordable matrix-vector multiplications, are well known and frequently used in signal inference, a stochastic estimate of its determinant has so far been lacking. In this work a probing method for the logarithm of the determinant of a linear operator is introduced. This method rests upon a reformulation of the log-determinant as an integral representation and the transformation of the involved terms into stochastic expressions. This stochastic determinant determination enables large-size applications in Bayesian inference, in particular evidence calculations, model comparison, and posterior determination.
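
One concrete way to realize such a scheme (a sketch under simplifying assumptions, not necessarily the paper's exact construction) combines the integral representation $\log\det A = \int_0^1 \mathrm{tr}\left[(A-I)(I+t(A-I))^{-1}\right]dt$ with Hutchinson-style trace probing:

```python
import numpy as np

# Stochastic log-determinant of an SPD operator A: integrate stochastic trace
# estimates of (A - I)(I + t(A - I))^{-1} over t in [0, 1], using Rademacher
# probe vectors z for which E[z' B z] = tr(B).
rng = np.random.default_rng(7)
p = 40
M = rng.normal(size=(p, p)) / np.sqrt(p)
A = np.eye(p) + 0.5 * (M @ M.T)                  # SPD and well conditioned

t_grid = np.linspace(0.0, 1.0, 21)
trace_est = []
for t in t_grid:
    z = rng.choice([-1.0, 1.0], size=(p, 200))   # 200 probe vectors per node
    # In a truly matrix-free setting this solve would be a CG iteration
    # built from matrix-vector products alone.
    x = np.linalg.solve(np.eye(p) + t * (A - np.eye(p)), (A - np.eye(p)) @ z)
    trace_est.append(np.mean(np.sum(z * x, axis=0)))

te = np.array(trace_est)                         # trapezoidal quadrature over t
logdet_est = np.sum(0.5 * (te[:-1] + te[1:]) * np.diff(t_grid))
logdet_true = np.linalg.slogdet(A)[1]            # dense reference value
```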

Read this paper on arXiv…

S. Dorn and T. Ensslin
Mon, 13 Apr 15

Comments: 8 pages, 5 figures

Weighted principal component analysis: a weighted covariance eigendecomposition approach [IMA]

We present a new straightforward principal component analysis (PCA) method based on the diagonalization of the weighted variance-covariance matrix through two spectral decomposition methods: power iteration and Rayleigh quotient iteration. This method allows one to retrieve a given number of orthogonal principal components amongst the most meaningful ones for the case of problems with weighted and/or missing data. Principal coefficients are then retrieved by fitting principal components to the data while providing the final decomposition. Tests performed on real and simulated cases show that our method is optimal in the identification of the most significant patterns within data sets. We illustrate the usefulness of this method by assessing its quality on the extrapolation of Sloan Digital Sky Survey quasar spectra from measured wavelengths to shorter and longer wavelengths. Our new algorithm also benefits from a fast and flexible implementation.
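
The weighted-covariance-plus-power-iteration idea can be sketched as follows (zero weights encode missing data; the data here are zero-mean by construction, so no centering step is shown):

```python
import numpy as np

# Power iteration on a weighted covariance matrix: missing entries get
# weight 0 and simply drop out of the covariance sums.
rng = np.random.default_rng(8)
n, p = 200, 5
X = rng.normal(size=(n, p))
X[:, 0] = X[:, 1] + 0.1 * rng.normal(size=n)            # strong correlation -> leading PC
W = rng.choice([0.0, 1.0], size=(n, p), p=[0.1, 0.9])   # ~10% of entries missing

# Weighted covariance: C_jk = sum_i W_ij W_ik x_ij x_ik / sum_i W_ij W_ik.
num = (W * X).T @ (W * X)
den = W.T @ W
C = num / np.maximum(den, 1.0)

v = rng.normal(size=p)
for _ in range(100):                                     # power iteration
    v = C @ v
    v /= np.linalg.norm(v)
leading_pc = v            # dominated by the two correlated columns
```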

Read this paper on arXiv…

L. Delchambre
Tue, 16 Dec 14

Comments: 12 pages, 9 figures

Monte Carlo error analyses of Spearman's rank test [IMA]

Spearman’s rank correlation test is commonly used in astronomy to discern whether two variables are correlated. Unlike most other quantities quoted in the astronomical literature, the Spearman’s rank correlation coefficient is generally quoted with no attempt to estimate the error on its value. This is a practice that would not be accepted for those other quantities, as an estimate of a quantity without an estimate of its associated uncertainties is often regarded as meaningless. This manuscript describes a number of easily implemented, Monte Carlo-based methods to estimate the uncertainty on the Spearman’s rank correlation coefficient, or more precisely to estimate its probability distribution.
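
One such easily implemented Monte Carlo approach is a pairwise bootstrap (a sketch; the manuscript describes several variants):

```python
import numpy as np
from scipy.stats import spearmanr

# Bootstrap resampling of the (x, y) pairs gives a distribution, and hence
# an uncertainty, for Spearman's rho instead of a single point value.
rng = np.random.default_rng(9)
n = 50
x = rng.normal(size=n)
y = x + rng.normal(size=n)                       # correlated with scatter

rho_obs = spearmanr(x, y)[0]
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)             # resample pairs with replacement
    boot.append(spearmanr(x[idx], y[idx])[0])
rho_err = np.std(boot)                           # uncertainty on rho_obs
```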

Read this paper on arXiv…

P. Curran
Mon, 17 Nov 14

Comments: Unsubmitted manuscript (comments welcome); 5 pages; Code available at this https URL

Bayesian Evidence and Model Selection [CL]

In this paper we review the concept of the Bayesian evidence and its application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Applications to several practical examples within the context of signal processing are discussed.

Read this paper on arXiv…

K. Knuth, M. Habeck, N. Malakar, et al.
Thu, 13 Nov 14

Comments: 39 pages, 8 figures. Submitted to DSP. Features theory, numerical methods and four applications

Finding the Most Distant Quasars Using Bayesian Selection Methods [IMA]

Quasars, the brightly glowing disks of material that can form around the super-massive black holes at the centres of large galaxies, are amongst the most luminous astronomical objects known and so can be seen at great distances. The most distant known quasars are seen as they were when the Universe was less than a billion years old (i.e., $\sim\!7\%$ of its current age). Such distant quasars are, however, very rare, and so are difficult to distinguish from the billions of other comparably bright sources in the night sky. In searching for the most distant quasars in a recent astronomical sky survey (the UKIRT Infrared Deep Sky Survey, UKIDSS), there were $\sim\!10^3$ apparently plausible candidates for each expected quasar, far too many to reobserve with other telescopes. The solution to this problem was to apply Bayesian model comparison, making models of the quasar population and the dominant contaminating population (Galactic stars) to utilise the information content in the survey measurements. The result was an extremely efficient selection procedure that was used to quickly identify the most promising UKIDSS candidates, one of which was subsequently confirmed as the most distant quasar known to date.

Read this paper on arXiv…

D. Mortlock
Tue, 20 May 14

Comments: Published in Statistical Science (this http URL) by the Institute of Mathematical Statistics (this http URL)

Functional Regression for Quasar Spectra [CL]

The Lyman-alpha forest is a portion of the observed light spectrum of distant galactic nuclei which allows us to probe remote regions of the Universe that are otherwise inaccessible. The observed Lyman-alpha forest of a quasar light spectrum can be modeled as a noisy realization of a smooth curve that is affected by a ‘damping effect’ which occurs whenever the light emitted by the quasar travels through regions of the Universe with higher matter concentration. To decode the information conveyed by the Lyman-alpha forest about the matter distribution, we must be able to separate the smooth ‘continuum’ from the noise and the contribution of the damping effect in the quasar light spectra. To predict the continuum in the Lyman-alpha forest, we use a nonparametric functional regression model in which both the response and the predictor variable (the smooth part of the damping-free portion of the spectrum) are function-valued random variables. We demonstrate that the proposed method accurately predicts the unobservable continuum in the Lyman-alpha forest both on simulated spectra and real spectra. Also, we introduce distribution-free prediction bands for the nonparametric functional regression model that have finite sample guarantees. These prediction bands, together with bootstrap-based confidence bands for the projection of the mean continuum on a fixed number of principal components, allow us to assess the degree of uncertainty in the model predictions.

Read this paper on arXiv…

M. Ciollaro, J. Cisewski, P. Freeman, et al.
Mon, 14 Apr 14

Inverse Bayesian Estimation of Gravitational Mass Density in Galaxies from Missing Kinematic Data [CL]

In this paper we focus on a type of inverse problem in which the data are expressed as an unknown function of the sought and unknown model function (or its discretised representation as a model parameter vector). In particular, we deal with situations in which training data are not available. Then we cannot model the unknown functional relationship between data and the unknown model function (or parameter vector) with a Gaussian Process of appropriate dimensionality. A Bayesian method based on state space modelling is advanced instead. Within this framework, the likelihood is expressed in terms of the probability density function ($pdf$) of the state space variable, and the sought model parameter vector is embedded within the domain of this $pdf$. As the measurable vector lives only inside an identified sub-volume of the system state space, the $pdf$ of the state space variable is projected onto the space of the measurables, and it is in terms of the projected state space density that the likelihood is written; the final form of the likelihood is achieved after convolution with the distribution of measurement errors. Application-motivated vague priors are invoked, and the posterior probability density of the model parameter vector, given the data, is computed. Inference is performed by taking posterior samples with adaptive MCMC. The method is illustrated on synthetic as well as real galactic data.

Read this paper on arXiv…

Wed, 8 Jan 14

A Generalized Savage-Dickey Ratio [CL]

In this brief research note I present a generalized version of the Savage-Dickey Density Ratio for representation of the Bayes factor (or marginal likelihood ratio) of nested statistical models; the new version takes the form of a Radon-Nikodym derivative and is thus applicable to a wider family of probability spaces than the original (restricted to those admitting an ordinary Lebesgue density). A derivation is given following the measure-theoretic construction of Marin & Robert (2010), and the equivalent estimator is demonstrated in application to a distributional modeling problem.

Read this paper on arXiv…

Thu, 7 Nov 13