DNest4: Diffusive Nested Sampling in C++ and Python [CL]


In probabilistic (Bayesian) inference, we typically want to compute properties of the posterior distribution, describing knowledge of unknown quantities in the context of a particular dataset and the assumed prior information. The marginal likelihood, also known as the “evidence”, is a key quantity in Bayesian model selection. The Diffusive Nested Sampling algorithm, a variant of Nested Sampling, is a powerful tool for generating posterior samples and estimating marginal likelihoods. It is effective at solving complex problems, including many where the posterior distribution is multimodal or has strong dependencies between variables. DNest4 is an open source (MIT licensed), multi-threaded implementation of this algorithm in C++11, along with associated utilities including: (i) RJObject, a class template for finite mixture models; (ii) a Python package allowing basic use without C++ coding; and (iii) experimental support for models implemented in Julia. In this paper we demonstrate DNest4 usage through examples including simple Bayesian data analysis, finite mixture models, and Approximate Bayesian Computation.

Read this paper on arXiv…

B. Brewer and D. Foreman-Mackey
Tue, 14 Jun 16

Comments: Submitted. 31 pages, 9 figures


Numerical methods for solution of the stochastic differential equations equivalent to the non-stationary Parker's transport equation [SSA]


We derive numerical schemes for the strong-order integration of the set of stochastic differential equations (SDEs) corresponding to the non-stationary Parker transport equation (PTE). The PTE is a 5-dimensional (3 spatial coordinates, particle energy and time) Fokker-Planck-type equation describing the non-stationary transport of galactic cosmic ray (GCR) particles in the heliosphere. We present formulas for the numerical solution of the obtained set of SDEs driven by a Wiener process in the case of the full three-dimensional diffusion tensor. We introduce the solution applying the strong-order Euler-Maruyama, Milstein and stochastic Runge-Kutta methods. We discuss the advantages and disadvantages of the presented numerical methods in the context of increasing the accuracy of the solution of the PTE.
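
The two lowest-order schemes named above can be illustrated on a toy problem. The sketch below, assuming a 1-D geometric Brownian motion dX = aX dt + bX dW in place of the full 5-D transport problem (all parameter values are illustrative, not from the paper), compares the Euler-Maruyama and Milstein updates against the exact solution driven by the same Wiener path:

```python
# Toy comparison of Euler-Maruyama (strong order 0.5) and Milstein
# (strong order 1.0) on a 1-D SDE with a known exact solution:
# geometric Brownian motion dX = a*X dt + b*X dW.
import math
import random

def simulate(a=0.5, b=0.3, x0=1.0, T=1.0, n_steps=200, n_paths=400, seed=1):
    rng = random.Random(seed)
    dt = T / n_steps
    em_err = mil_err = 0.0
    for _ in range(n_paths):
        x_em = x_mil = x0
        w = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            w += dw
            # Euler-Maruyama update
            x_em += a * x_em * dt + b * x_em * dw
            # Milstein adds the 0.5*b*b'*(dW^2 - dt) correction term
            x_mil += a * x_mil * dt + b * x_mil * dw \
                     + 0.5 * b * b * x_mil * (dw * dw - dt)
        # Exact solution driven by the same Wiener increments
        x_exact = x0 * math.exp((a - 0.5 * b * b) * T + b * w)
        em_err += abs(x_em - x_exact)
        mil_err += abs(x_mil - x_exact)
    return em_err / n_paths, mil_err / n_paths

em_err, mil_err = simulate()
```

With the same step size, the Milstein endpoint error is markedly smaller than the Euler-Maruyama one, which is the accuracy trade-off the abstract refers to.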

Read this paper on arXiv…

A. Wawrzynczak, R. Modzelewska and M. Kluczek
Thu, 24 Sep 15

Comments: 4 pages, 2 figures, presented on 4th International Conference on Mathematical Modeling in Physical Sciences, 2015

Stochastic approach to the numerical solution of the non-stationary Parker's transport equation [SSA]


We present a newly developed stochastic model of galactic cosmic ray (GCR) particle transport in the heliosphere. Mathematically, the Parker transport equation (PTE), describing the non-stationary transport of charged particles in a turbulent medium, is of Fokker-Planck type: a second-order parabolic, time-dependent, 4-dimensional (3 spatial coordinates and particle energy/rigidity) partial differential equation. It is worth mentioning that in the stationary case it remains a 3-D parabolic problem with respect to the particle rigidity R, while if we fix the energy it remains a 3-D parabolic problem with respect to time. The proposed method of numerical solution is based on solving the system of stochastic differential equations (SDEs) equivalent to Parker’s transport equation. We present the method of deriving from the PTE the equivalent SDEs in the heliocentric spherical coordinate system for the backward approach. The obtained stochastic model of the Forbush decrease of the GCR intensity is in agreement with the experimental data. The advantages and disadvantages of the forward and backward solutions of the PTE are discussed.
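
The backward approach can be sketched on a stripped-down stand-in for the PTE: the 1-D heat equation u_t = D·u_xx. Pseudo-particles are traced backward in time from the observation point and the initial condition is averaged over their endpoints (the Feynman-Kac representation); the real model is 4-D and formulated in heliocentric spherical coordinates, and all values below are illustrative:

```python
# Backward Monte Carlo solution of u_t = D*u_xx at a single point,
# with initial condition u(x, 0) = exp(-x^2), checked against the
# analytic spreading-Gaussian solution.
import math
import random

def u_backward(x, t, D=0.25, n_paths=20000, seed=2):
    """Estimate u(x, t) by averaging the initial condition over
    backward-trajectory endpoints x + sqrt(2*D*t)*N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x_end = x + math.sqrt(2.0 * D * t) * rng.gauss(0.0, 1.0)
        total += math.exp(-x_end * x_end)   # evaluate the initial condition
    return total / n_paths

def u_exact(x, t, D=0.25):
    """Analytic solution: the initial Gaussian broadened by diffusion."""
    s = 1.0 + 4.0 * D * t
    return math.exp(-x * x / s) / math.sqrt(s)

est = u_backward(0.5, 1.0)
ref = u_exact(0.5, 1.0)
```

A practical advantage of the backward formulation, visible even here, is that the solution at one phase-space point needs only trajectories launched from that point.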

Read this paper on arXiv…

A. Wawrzynczak, R. Modzelewska and A. Gil
Wed, 23 Sep 15

Comments: 4 pages, 2 figures, presented on International Conference on Mathematical Modeling in Physical Sciences, 2014

A stochastic method of solution of the Parker transport equation [SSA]


We present a stochastic model of galactic cosmic ray (GCR) particle transport in the heliosphere. Based on the solution of the Parker transport equation, we developed models of short-time variations of the GCR intensity, i.e. the Forbush decrease (Fd) and the 27-day variation. The Parker transport equation, being a Fokker-Planck-type equation, describes the non-stationary transport of charged particles in a turbulent medium. The presented numerical approach is based on solving the set of equivalent stochastic differential equations (SDEs). We demonstrate the method of deriving from the Parker transport equation the corresponding SDEs in the heliocentric spherical coordinate system for the backward approach. Features indicating the advantage of the backward approach over the forward one are stressed. We compare the outcomes of the stochastic models of the Fd and the 27-day variation of the GCR intensity with our former models established by the finite difference method. Both models are in agreement with the experimental data.

Read this paper on arXiv…

A. Wawrzynczak, R. Modzelewska and A. Gil
Wed, 23 Sep 15

Comments: 8 pages, 7 figures, presented on 24th European Cosmic Ray Symposium 2014

Uncertainty for calculating transport on Titan: a probabilistic description of bimolecular diffusion parameters [EPA]


Bimolecular diffusion coefficients are important parameters used by atmospheric models to calculate altitude profiles of minor constituents in an atmosphere. Unfortunately, laboratory measurements of these coefficients were never conducted at temperature conditions relevant to the atmosphere of Titan. Here we conduct a detailed uncertainty analysis of the bimolecular diffusion coefficient parameters as applied to Titan’s upper atmosphere to provide a better understanding of the impact of uncertainty in this parameter on models. Because Titan’s temperature and pressure conditions are much lower than the laboratory conditions in which bimolecular diffusion parameters were measured, we apply a problem-agnostic Bayesian framework to determine parameter estimates and associated uncertainties. We solve the Bayesian calibration problem using the open-source QUESO library, which also performs a propagation of uncertainties in the calibrated parameters to the temperature and pressure conditions observed in Titan’s upper atmosphere. Our results show that, after propagating uncertainty through the Massman model, the uncertainty in molecular diffusion is highly correlated with temperature, and we observe no noticeable correlation with pressure. We propagate the calibrated molecular diffusion estimate and associated uncertainty to obtain an estimate, with uncertainty due to bimolecular diffusion, of the methane molar fraction as a function of altitude. Results show that the uncertainty in methane abundance due to molecular diffusion is in general small compared to eddy diffusion and the chemical kinetics description. However, methane abundance is most sensitive to uncertainty in molecular diffusion above 1200 km, where the errors are nontrivial and could have important implications for scientific research based on diffusion models in this altitude range.
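
The calibrate-then-extrapolate workflow described above can be sketched schematically. The paper uses the QUESO library and the Massman formulation; here a hypothetical toy power law D(T) = D0·(T/T0)^s, a flat-prior grid posterior over the exponent s, and synthetic "laboratory" data stand in for all of them:

```python
# Toy Bayesian calibration of a diffusion-law exponent from warm-
# temperature "lab" data, followed by uncertainty propagation to a
# colder, Titan-like temperature outside the measured range.
import math
import random

T0, D0, s_true, sigma = 300.0, 1.0, 1.75, 0.02   # illustrative values
rng = random.Random(3)
# Synthetic laboratory measurements at warm temperatures only
lab_T = [280.0, 300.0, 320.0, 340.0]
lab_D = [D0 * (T / T0) ** s_true + rng.gauss(0.0, sigma) for T in lab_T]

# Grid posterior over the exponent s (flat prior, Gaussian likelihood)
grid = [1.0 + 0.002 * i for i in range(1000)]    # s in [1.0, 3.0)
def log_like(s):
    return sum(-0.5 * ((d - D0 * (T / T0) ** s) / sigma) ** 2
               for T, d in zip(lab_T, lab_D))
w = [math.exp(log_like(s)) for s in grid]
Z = sum(w)
post = [wi / Z for wi in w]
s_mean = sum(p * s for p, s in zip(post, grid))

# Propagate the calibrated uncertainty to a colder temperature,
# where the extrapolation inflates the spread in D
T_cold = 150.0
vals = [D0 * (T_cold / T0) ** s for s in grid]
mean_D = sum(p * v for p, v in zip(post, vals))
var_D = sum(p * (v - mean_D) ** 2 for p, v in zip(post, vals))
```

The propagated variance `var_D` grows the further `T_cold` sits from the calibration range, which is the mechanism behind the temperature sensitivity reported above.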

Read this paper on arXiv…

S. Plessis, D. McDougall, K. Mandt, et al.
Thu, 13 Aug 15

Comments: N/A

Approximate Bayesian Computation for Forward Modeling in Cosmology [CEA]


Bayesian inference is often used in cosmology and astrophysics to derive constraints on model parameters from observations. This approach relies on the ability to compute the likelihood of the data given a choice of model parameters. In many practical situations, however, the likelihood function may be unavailable or intractable due to non-Gaussian errors, non-linear measurement processes, or complex data formats such as catalogs and maps. In these cases, mock data sets can often be simulated through forward modeling. We discuss how Approximate Bayesian Computation (ABC) can be used in these cases to derive an approximation to the posterior constraints using simulated data sets. This technique relies on the sampling of the parameter space, a distance metric to quantify the difference between the observation and the simulations, and summary statistics to compress the information in the data. We first review the principles of ABC and discuss its implementation using a Population Monte-Carlo (PMC) algorithm. We test the performance of the implementation using a Gaussian toy model. We then apply the ABC technique to the practical case of the calibration of image simulations for wide-field cosmological surveys. We find that the ABC analysis is able to provide reliable parameter constraints for this problem and is therefore a promising technique for other applications in cosmology and astrophysics. Our implementation of the ABC PMC method is made available via a public code release.
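
The three ingredients named above (sampling the parameters, a distance metric, a summary statistic) are easiest to see in plain rejection ABC; the paper's implementation uses the more efficient PMC scheme, but the structure is the same. A minimal sketch for a Gaussian toy model, with the sample mean as summary statistic:

```python
# Rejection ABC for inferring the mean of Gaussian data without
# ever evaluating a likelihood: simulate mock data from the forward
# model and keep parameter draws whose summary lands close to the
# observed one.
import random

rng = random.Random(4)
mu_true, sigma, n_obs = 2.0, 1.0, 100
observed = [rng.gauss(mu_true, sigma) for _ in range(n_obs)]
obs_summary = sum(observed) / n_obs          # summary statistic

def forward_model(mu):
    """Simulate a mock data set for a proposed parameter value."""
    return [rng.gauss(mu, sigma) for _ in range(n_obs)]

accepted = []
eps = 0.05                                   # distance threshold
for _ in range(20000):
    mu = rng.uniform(0.0, 4.0)               # draw from the prior
    sim_summary = sum(forward_model(mu)) / n_obs
    if abs(sim_summary - obs_summary) < eps: # distance metric
        accepted.append(mu)

post_mean = sum(accepted) / len(accepted)    # approximate posterior mean
```

PMC improves on this by re-weighting and perturbing the accepted population while shrinking `eps` over iterations, instead of discarding almost every draw.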

Read this paper on arXiv…

J. Akeret, A. Refregier, A. Amara, et al.
Wed, 29 Apr 15

Comments: Submitted to Journal of Cosmology and Astroparticle Physics. 16 pages, 5 figures, 1 algorithm. The code is available at this https URL

Stochastic determination of matrix determinants [CL]


Matrix determinants play an important role in data analysis, in particular when Gaussian processes are involved. Due to currently exploding data volumes, linear operations – matrices – acting on the data are often not accessible directly, but are only represented indirectly in the form of a computer routine. Such a routine implements the transformation a data vector undergoes under matrix multiplication. While efficient probing routines to estimate a matrix’s diagonal or trace, based solely on such computationally affordable matrix-vector multiplications, are well known and frequently used in signal inference, a stochastic estimate of its determinant has been lacking. In this work a probing method for the logarithm of the determinant of a linear operator is introduced. This method rests upon a reformulation of the log-determinant by an integral representation and the transformation of the involved terms into stochastic expressions. This stochastic determinant determination enables large-size applications in Bayesian inference, in particular evidence calculations, model comparison, and posterior determination.
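
The core idea, estimating ln det A from matrix-vector products alone, can be sketched compactly. The paper develops an integral representation; here a truncated power series ln A = -Σ_k (I-A)^k / k (valid when the spectrum of A lies in (0, 2)) combined with Hutchinson's stochastic trace estimator tr(M) ≈ E[zᵀMz] conveys the mechanism on a small test matrix:

```python
# Stochastic log-determinant via ln det A = tr ln A, with each trace
# probed by random Rademacher vectors and only matvecs with B = I - A.
import math
import random

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def stochastic_logdet(A, n_terms=60, n_probes=1000, seed=5):
    rng = random.Random(5 if seed is None else seed)
    n = len(A)
    B = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher probe
        v, acc = z[:], 0.0
        for k in range(1, n_terms + 1):
            v = matvec(B, v)                 # v = B^k z
            acc -= sum(zi * vi for zi, vi in zip(z, v)) / k
        total += acc
    return total / n_probes

# Small SPD test matrix with eigenvalues 0.5 and 1.5 -> ln det = ln 0.75
A = [[1.0, 0.5],
     [0.5, 1.0]]
est = stochastic_logdet(A)
exact = math.log(0.75)
```

The point of the construction is that `A` never needs to be stored explicitly: only the routine `matvec` is required, matching the implicit-operator setting described above.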

Read this paper on arXiv…

S. Dorn and T. Ensslin
Mon, 13 Apr 15

Comments: 8 pages, 5 figures

Inference for Trans-dimensional Bayesian Models with Diffusive Nested Sampling [CL]


Many inference problems involve inferring the number $N$ of objects in some region, along with their properties $\{\mathbf{x}_i\}_{i=1}^N$, from a dataset $\mathcal{D}$. A common statistical example is finite mixture modelling. In the Bayesian framework, these problems are typically solved using one of the following two methods: i) by executing a Monte Carlo algorithm (such as Nested Sampling) once for each possible value of $N$, and calculating the marginal likelihood or evidence as a function of $N$; or ii) by doing a single run that allows the model dimension $N$ to change (such as Markov Chain Monte Carlo with birth/death moves), and obtaining the posterior for $N$ directly. In this paper we present a general approach to this problem that uses trans-dimensional MCMC embedded {\it within} a Nested Sampling algorithm, allowing us to explore the posterior distribution and calculate the marginal likelihood (summed over $N$) even if the problem contains a phase transition or other difficult features such as multimodality. We present two example problems, finding sinusoidal signals in noisy data, and finding and measuring galaxies in a noisy astronomical image. Both of the examples demonstrate phase transitions in the relationship between the likelihood and the cumulative prior mass.
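
Method (ii) above, a single chain whose dimension N changes via birth/death moves, can be sketched on a hypothetical toy problem. Each "object" contributes a known unit flux, so the object properties x_i drop out and only the dimension moves; a full trans-dimensional treatment would also propose the x_i, and nothing here is the paper's sampler:

```python
# Toy Metropolis over the model dimension N with birth (N -> N+1)
# and death (N -> N-1) moves, inferring how many unit-flux objects
# produced one noisy total-flux measurement.
import math
import random

rng = random.Random(6)
N_true, sigma, N_max = 4, 0.8, 10
y = N_true + rng.gauss(0.0, sigma)           # noisy total flux

def log_post(N):
    # Uniform prior over {0, ..., N_max}; Gaussian likelihood on the flux
    return -0.5 * ((y - N) / sigma) ** 2

N, counts = 5, [0] * (N_max + 1)
for step in range(50000):
    N_prop = N + rng.choice((-1, 1))         # birth or death proposal
    if 0 <= N_prop <= N_max and \
       math.log(rng.random()) < log_post(N_prop) - log_post(N):
        N = N_prop
    counts[N] += 1

post = [c / sum(counts) for c in counts]     # posterior over dimension N
mode = max(range(N_max + 1), key=lambda n: post[n])
```

Embedding moves like these inside Nested Sampling, as the paper does, is what makes the summed-over-N marginal likelihood accessible even when the posterior over N has phase transitions.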

Read this paper on arXiv…

B. Brewer
Mon, 17 Nov 14

Comments: Submitted. Comments welcome. 14 pages, 7 figures. Software available at this https URL

Bayesian Evidence and Model Selection [CL]


In this paper we review the concept of the Bayesian evidence and its application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Application to several practical examples within the context of signal processing are discussed.
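
As a small worked instance of the analytic case: for Gaussian data with known variance and a conjugate Gaussian prior on the mean, the evidence Z = ∫ p(D|μ) p(μ) dμ has a closed form, which the sketch below checks against brute-force numerical integration (illustrative numbers, not taken from the paper's examples):

```python
# Closed-form Bayesian evidence for a Gaussian model with a conjugate
# Gaussian prior on the mean, verified by trapezoidal integration.
import math
import random

rng = random.Random(7)
sigma, tau, n = 1.0, 2.0, 20                 # noise std, prior std, #data
data = [rng.gauss(1.5, sigma) for _ in range(n)]
S1, S2 = sum(data), sum(y * y for y in data)

def log_evidence_analytic():
    a = n / sigma**2 + 1.0 / tau**2          # posterior precision of mu
    b = S1 / sigma**2
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - 0.5 * math.log(2 * math.pi * tau**2)
            + 0.5 * math.log(2 * math.pi / a)
            + b * b / (2 * a)
            - S2 / (2 * sigma**2))

def log_evidence_numeric(lo=-10.0, hi=10.0, m=40001):
    # Trapezoidal integration of prior * likelihood over a wide mu grid
    h = (hi - lo) / (m - 1)
    total = 0.0
    for i in range(m):
        mu = lo + i * h
        ll = (-0.5 * (S2 - 2 * mu * S1 + n * mu * mu) / sigma**2
              - 0.5 * n * math.log(2 * math.pi * sigma**2))
        lp = -0.5 * (mu / tau) ** 2 - 0.5 * math.log(2 * math.pi * tau**2)
        weight = 0.5 if i in (0, m - 1) else 1.0
        total += weight * math.exp(ll + lp)
    return math.log(total * h)

logZ_a = log_evidence_analytic()
logZ_n = log_evidence_numeric()
```

Models where no such conjugate structure exists are exactly where the approximate and numerical techniques reviewed in the paper take over.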

Read this paper on arXiv…

K. Knuth, M. Habeck, N. Malakar, et al.
Thu, 13 Nov 14

Comments: 39 pages, 8 figures. Submitted to DSP. Features theory, numerical methods and four applications

Efficient Exploration of Multi-Modal Posterior Distributions [IMA]


The Markov Chain Monte Carlo (MCMC) algorithm is widely recognised as an efficient method for sampling a specified posterior distribution. However, when the posterior is multi-modal, conventional MCMC algorithms either tend to become stuck in one local mode, become non-Markovian, or require an excessively long time to explore the global properties of the distribution. We propose a novel variant of MCMC, mixed MCMC, which exploits a specially designed proposal density to allow the generation of candidate points from any of a number of different modes. This new method is efficient by design, and is strictly Markovian. We present our method and apply it to a toy model inference problem to demonstrate its validity.
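
A sketch in the spirit of such a proposal (an illustrative construction, not the paper's exact proposal density): a Metropolis sampler whose proposal mixes small local moves with occasional jumps between known mode locations. Because each proposal component is symmetric, the standard Metropolis ratio applies and the chain stays strictly Markovian:

```python
# Metropolis sampling of a two-mode 1-D target (modes at -5 and +5)
# with a mixture proposal: 90% local Gaussian steps, 10% symmetric
# mode-to-mode jumps of fixed size 10.
import math
import random

rng = random.Random(8)

def log_p(x):
    # Equal-weight mixture of two unit-variance Gaussian modes
    return math.log(math.exp(-0.5 * (x + 5.0) ** 2)
                    + math.exp(-0.5 * (x - 5.0) ** 2))

x, n_right, n_total = -5.0, 0, 100000
for step in range(n_total):
    if rng.random() < 0.1:
        x_prop = x + rng.choice((-10.0, 10.0))   # symmetric mode jump
    else:
        x_prop = x + rng.gauss(0.0, 1.0)         # symmetric local move
    if math.log(rng.random()) < log_p(x_prop) - log_p(x):
        x = x_prop
    if x > 0:
        n_right += 1

frac_right = n_right / n_total   # should be near 0.5 for equal modes
```

Without the jump component, a chain started at -5 would almost never visit the mode at +5, which is the stuck-in-one-mode failure described above.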

Read this paper on arXiv…

Y. Hu, M. Hendry and I. Heng
Tue, 19 Aug 14

Comments: 6 pages, 1 figure

Estimating the distribution of Galaxy Morphologies on a continuous space [GA]


The incredible variety of galaxy shapes cannot be summarized by human defined discrete classes of shapes without causing a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high dimensional space of shapes into a manageable low dimensional continuous vector space. Statistical inference can be done in the reduced space via probability distribution estimation and manifold estimation.

Read this paper on arXiv…

G. Vinci, P. Freeman, J. Newman, et al.
Tue, 1 Jul 14

Comments: 4 pages, 3 figures, Statistical Challenges in 21st Century Cosmology, Proceedings IAU Symposium No. 306, 2014

Exploring Multi-Modal Distributions with Nested Sampling [IMA]


In performing a Bayesian analysis, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multi-modal or exhibit pronounced (curving) degeneracies. Secondly, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive using existing methods such as thermodynamic integration. Nested Sampling is a Monte Carlo method targeted at the efficient calculation of the evidence, but it also produces posterior inferences as a by-product and therefore provides a means to carry out parameter estimation as well as model selection. The main challenge in implementing Nested Sampling is to sample from a constrained probability distribution. One possible solution to this problem is provided by the Galilean Monte Carlo (GMC) algorithm. We show results of applying Nested Sampling with GMC to some problems which have proven very difficult for standard Markov Chain Monte Carlo (MCMC) and down-hill methods, due to the presence of a large number of local minima and/or pronounced (curving) degeneracies between the parameters. We also discuss the use of Nested Sampling with GMC in Bayesian object detection problems, which are inherently multi-modal and require the evaluation of Bayesian evidence for distinguishing between true and spurious detections.
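
The Nested Sampling loop itself fits in a few lines. The sketch below, for a unimodal toy problem with a uniform prior on [-1,1]² and a Gaussian likelihood (so the evidence is known analytically), uses naive rejection sampling for the constrained prior draws; the Galilean Monte Carlo step discussed above is what replaces this in hard, high-dimensional problems. All settings are illustrative:

```python
# Minimal Nested Sampling evidence estimate: repeatedly discard the
# lowest-likelihood live point, credit it with a shrinking slab of
# prior mass X, and replace it by a draw from the constrained prior.
import math
import random

rng = random.Random(9)
SIG = 0.2

def log_L(theta):
    x, y = theta
    return -0.5 * (x * x + y * y) / SIG**2

def prior_draw():
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

n_live, n_iter = 200, 1400
live = [prior_draw() for _ in range(n_live)]
live_logL = [log_L(t) for t in live]

Z, X_prev = 0.0, 1.0
for i in range(1, n_iter + 1):
    worst = min(range(n_live), key=lambda k: live_logL[k])
    L_min = live_logL[worst]
    X = math.exp(-i / n_live)            # deterministic shrinkage estimate
    Z += math.exp(L_min) * (X_prev - X)  # weight of the discarded point
    X_prev = X
    while True:                          # rejection sampling above L_min
        t = prior_draw()
        if log_L(t) > L_min:
            break
    live[worst], live_logL[worst] = t, log_L(t)

# Final live-point contribution, then compare with the analytic evidence
Z += X_prev * sum(math.exp(l) for l in live_logL) / n_live
log_Z = math.log(Z)
log_Z_exact = math.log(2 * math.pi * SIG**2 / 4)  # Gaussian mass / prior area
```

The rejection step's cost grows like 1/X as the constraint tightens, which is precisely why guided schemes such as GMC are needed beyond toy problems.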

Read this paper on arXiv…

Fri, 20 Dec 13

D3PO – Denoising, Deconvolving, and Decomposing Photon Observations [IMA]


The analysis of astronomical images is a non-trivial task. The D3PO algorithm addresses the inference problem of denoising, deconvolving, and decomposing photon observations. The primary goal is the simultaneous reconstruction of the diffuse and point-like photon flux from a given photon count image. In order to discriminate between these morphologically different signal components, a probabilistic algorithm is derived in the language of information field theory based on a hierarchical Bayesian parameter model. The signal inference exploits prior information on the spatial correlation structure of the diffuse component and the brightness distribution of the spatially uncorrelated point-like sources. A maximum a posteriori solution and a solution minimizing the Gibbs free energy of the inference problem using variational Bayesian methods are discussed. Since the derivation of the solution does not depend on the underlying position space, the implementation of the D3PO algorithm uses the NIFTY package to ensure operationality on various spatial grids and at any resolution. The fidelity of the algorithm is validated by the analysis of simulated data, including a realistic high energy photon count image showing a 32 x 32 arcmin^2 observation with a spatial resolution of 0.1 arcmin. In all tests the D3PO algorithm successfully denoised, deconvolved, and decomposed the data into a diffuse and a point-like signal estimate for the respective photon flux components.

Read this paper on arXiv…

Mon, 11 Nov 13