Mathematical Foundations of the GraphBLAS [CL]

The GraphBLAS standard is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze, while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
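The connection between incidence and adjacency matrices mentioned in the abstract can be sketched in a few lines of plain Python (not the GraphBLAS API; matrix names and the toy graph are illustrative). For a directed graph with out-incidence matrix E_out and in-incidence matrix E_in (rows = edges, columns = vertices), the product E_out^T E_in counts the edges from vertex i to vertex j:

```python
# Toy directed graph with edges 0->1, 1->2, 0->2.
E_out = [[1, 0, 0],   # E_out[k][j] = 1 if edge k starts at vertex j
         [0, 1, 0],
         [1, 0, 0]]
E_in  = [[0, 1, 0],   # E_in[k][j] = 1 if edge k ends at vertex j
         [0, 0, 1],
         [0, 0, 1]]

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    """Naive matrix product over the usual (+, *) semiring."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# A[i][j] counts the edges from vertex i to vertex j.
A = matmul(transpose(E_out), E_in)
print(A)  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

The same product over other semirings (min-plus, boolean) yields other graph computations, which is the composability the abstract refers to.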

Read this paper on arXiv…

J. Kepner, P. Aaltonen, D. Bader, et al.
Tue, 21 Jun 16

Comments: 9 pages; 11 figures; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2016

BEANS – a software package for distributed Big Data analysis [IMA]

BEANS is a new web-based tool, easy to install and maintain, for storing and analysing massive amounts of data in a distributed way. It provides a clear interface for querying, filtering, aggregating, and plotting data from an arbitrary number of datasets. Its main purpose is to simplify the process of storing, examining, and finding new relations in so-called Big Data.
BEANS was created in answer to the growing need of the astronomical community for a versatile tool to store, analyse, and compare complex astrophysical numerical simulations with observations (e.g. simulations of the Galaxy or star clusters with the Gaia archive). However, the software was built in a general form and is ready for use in any other research field or open source software.

Read this paper on arXiv…

A. Hypki
Fri, 25 Mar 16

Comments: 14 pages, 6 figures, submitted to MNRAS, comments are welcome

Sapporo2: A versatile direct $N$-body library [IMA]

Astrophysical direct $N$-body methods have been one of the first production algorithms to be implemented using NVIDIA’s CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features and versatility in numerical precision and support for higher order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher order integrators. We show how to tune these codes for different GPU architectures and how to continue utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL.
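The computational core that libraries like Sapporo2 offload to the GPU is the $O(N^2)$ direct-summation force loop. A minimal CPU sketch in plain Python (with $G = 1$ and an illustrative softening length `eps`; not Sapporo2's actual interface) looks like this:

```python
def accelerations(pos, mass, eps=1e-4):
    """Softened gravitational acceleration on each particle from all others."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0]**2 + dx[1]**2 + dx[2]**2 + eps**2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# Two equal unit masses one unit apart attract each other symmetrically.
a = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
```

Because every pair interaction is independent, this double loop maps naturally onto thousands of GPU threads, which is why direct $N$-body codes were among the first CUDA production applications.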

Read this paper on arXiv…

J. Bedorf, E. Gaburov and S. Zwart
Thu, 15 Oct 15

Comments: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology

GenASiS Basics: Object-oriented utilitarian functionality for large-scale physics simulations [IMA]

Aside from numerical algorithms and problem setup, large-scale physics simulations on distributed-memory supercomputers require more basic utilitarian functionality, such as physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. Here we describe and make available Fortran 2003 classes furnishing extensible object-oriented implementations of this sort of rudimentary functionality, along with individual 'unit test' programs and larger example problems demonstrating their use. These classes compose the Basics division of our developing astrophysics simulation code GenASiS (General Astrophysical Simulation System), but their fundamental nature makes them useful for physics simulations in many fields.
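The "physical units" utility the abstract mentions can be illustrated with a tiny sketch, here in Python rather than Fortran 2003 (purely illustrative; GenASiS's actual class interfaces differ): a quantity carries its unit and refuses dimensionally inconsistent arithmetic.

```python
class Quantity:
    """A value tagged with a unit; addition checks unit consistency."""
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return f"{self.value} {self.unit}"

d = Quantity(3.0, "km") + Quantity(2.0, "km")
print(d)  # 5.0 km
```

Catching unit mismatches at the point of arithmetic, rather than in post-mortem debugging, is the practical payoff of carrying units through a simulation code.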

Read this paper on arXiv…

C. Cardall and R. Budiardja
Fri, 10 Jul 15

Comments: Computer Physics Communications in press

Remark on "Algorithm 916: Computing the Faddeyeva and Voigt functions": Efficiency Improvements and Fortran Translation [IMA]

This remark describes efficiency improvements to Algorithm 916 [Zaghloul and Ali 2011]. It is shown that the execution time required by the algorithm, when run at its highest accuracy, may be improved by more than a factor of two. A better accuracy-vs-efficiency trade-off scheme is also implemented; this requires the user to supply the number of significant figures desired in the computed values as an extra input argument to the function. Using this trade-off, it is shown that the efficiency of the algorithm may be further improved significantly while maintaining reasonably accurate and safe results that are free of the pitfalls and complete loss of accuracy seen in other competitive techniques. The current version of the code is provided in Matlab and Scilab in addition to a Fortran translation prepared to meet the needs of real-world problems where very large numbers of function evaluations would require the use of a compiled language. To fulfill this last requirement, a recently proposed reformed version of Humlicek’s w4 routine, shown to maintain the claimed accuracy of the algorithm over a wide and fine grid, is implemented in the present Fortran translation for the case of 4 significant figures. This latter modification assures the reliability of the code to be employed in the solution of practical problems requiring numerous evaluations of the function for applications tolerating low accuracy computations ($<10^{-4}$).
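For context, the Faddeyeva function is defined as $w(z) = e^{-z^2}\,\mathrm{erfc}(-iz)$. On the positive imaginary axis $z = iy$ it reduces to the real scaled complementary error function, $w(iy) = e^{y^2}\,\mathrm{erfc}(y)$, which gives a cheap stdlib-only spot check for any implementation of Algorithm 916 (this check is our illustration, not part of the paper's code):

```python
import math

def w_imag_axis(y):
    """Faddeyeva function w(z) evaluated at z = i*y, for real y >= 0."""
    return math.exp(y * y) * math.erfc(y)

print(w_imag_axis(0.0))  # 1.0, since w(0) = erfc(0) = 1
print(w_imag_axis(1.0))  # ~0.42758
```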

Read this paper on arXiv…

M. Zaghloul
Wed, 27 May 15

Comments: 11 pages, 5 tables, Under review

The NIFTY way of Bayesian signal inference [IMA]

We introduce NIFTY, “Numerical Information Field Theory”, a software package for the development of Bayesian signal inference algorithms that operate independently of any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms prototyped in 1D can be applied to real-world problems in higher-dimensional settings. As a versatile library, NIFTY is applicable to, and has already been applied in, 1D, 2D, 3D, and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

Read this paper on arXiv…

M. Selig
Wed, 24 Dec 14

Comments: 6 pages, 2 figures, refereed proceeding of the 33rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2013), software available at this http URL and this http URL

External Use of TOPCAT's Plotting Library [IMA]

The table analysis application TOPCAT uses a custom Java plotting library for highly configurable high-performance interactive or exported visualisations in two and three dimensions. We present here a variety of ways for end users or application developers to make use of this library outside of the TOPCAT application: via the command-line suite STILTS or its Jython variant JyStilts, via a traditional Java API, or by programmatically assigning values to a set of parameters in Java code or using some form of inter-process communication. The library has been built with large datasets in mind; interactive plots scale well up to several million points, and static output to standard graphics formats is possible for input data of unlimited size.

Read this paper on arXiv…

M. Taylor
Fri, 31 Oct 14

Comments: 4 pages, 1 figure

HOPE: A Python Just-In-Time compiler for astrophysical computations [IMA]

The Python programming language is becoming increasingly popular for scientific applications due to its simplicity, versatility, and the broad range of its libraries. A drawback of this dynamic language, however, is its low runtime performance which limits its applicability for large simulations and for the analysis of large data sets, as is common in astrophysics and cosmology. While various frameworks have been developed to address this limitation, most focus on covering the complete language set, and either force the user to alter the code or are not able to reach the full speed of an optimised native compiled language. In order to combine the ease of Python and the speed of C++, we developed HOPE, a specialised Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimisation on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. We assess the performance of HOPE by performing a series of benchmarks and compare its execution speed with that of plain Python, C++ and the other existing frameworks. We find that HOPE improves the performance compared to plain Python by a factor of 2 to 120, achieves speeds comparable to that of C++, and often exceeds the speed of the existing solutions. We discuss the differences between HOPE and the other frameworks, as well as future extensions of its capabilities. The fully documented HOPE package is available at this http URL and is published under the GPLv3 license on PyPI and GitHub.
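The decorator-based usage pattern the abstract describes can be sketched with a stand-in: `jit` below is a no-op placeholder, not the real HOPE API (the actual decorator name and semantics are documented with the package). A real JIT would translate the decorated function to C++, compile it, and return the compiled version; the calling code stays unchanged either way.

```python
import functools

def jit(fn):
    """Placeholder JIT decorator: just wraps the function unchanged."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@jit
def polynomial(x):
    # Numerical kernels like this are what a JIT can optimise at runtime.
    return 3.0 * x**2 + 2.0 * x + 1.0

print(polynomial(2.0))  # 17.0
```

This single-decorator interface is what lets a JIT target "a subset of the language": only the decorated numerical kernels need to be translatable, while the rest of the program remains ordinary Python.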

Read this paper on arXiv…

J. Akeret, L. Gamper, A. Amara, et al.
Fri, 17 Oct 14

Comments: Submitted to Astronomy and Computing. 13 pages, 1 figure. The code is available at this http URL

CosmoMC Installation and Running Guidelines [IMA]

CosmoMC is a Fortran 95 Markov-Chain Monte-Carlo (MCMC) engine to explore the cosmological parameter space, plus a Python suite for plotting and presenting results (see this http URL). This document describes the installation of CosmoMC on a Linux system (Ubuntu 14.04.1 LTS 64-bit version). It is written for those who want to use it in their scientific research but have little prior training in Linux or the program. Besides a step-by-step installation guide, we also give a brief introduction to running the program on both a desktop and a cluster. We also share our method for generating the plots commonly used in the cosmology literature. For more information, one can refer to the CosmoCoffee forum (this http URL) or contact the authors of this document. Questions and comments would be much appreciated.
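CosmoMC itself is a Fortran code; purely to illustrate what an MCMC engine does, here is a minimal Metropolis sampler for a one-dimensional Gaussian "posterior" (all names and numbers are illustrative, not CosmoMC's):

```python
import math
import random

def log_posterior(theta):
    # Standard normal target, up to an additive constant.
    return -0.5 * theta**2

def metropolis(n_steps, step=1.0, seed=42):
    """Random-walk Metropolis: propose, then accept or reject."""
    rng = random.Random(seed)
    theta, logp = 0.0, log_posterior(0.0)
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        logp_prop = log_posterior(prop)
        if math.log(rng.random()) < logp_prop - logp:  # accept
            theta, logp = prop, logp_prop
        chain.append(theta)
    return chain

chain = metropolis(20000)
mean = sum(chain) / len(chain)  # should be near 0 for this target
```

CosmoMC applies the same accept/reject principle to a multi-dimensional cosmological parameter space, with the expensive step being the likelihood evaluation against data.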

Read this paper on arXiv…

M. Li and P. Wang
Fri, 5 Sep 14

Comments: The aim of this article is to help undergraduate and postgraduate students get into the field of cosmology. Thus, it was not submitted to any particular journal and is publicly available. 10 pages in total, 0 figures

Achieving 100,000,000 database inserts per second using Accumulo and D4M [CL]

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.
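The domain-decomposition idea behind the benchmark can be sketched in a few lines: split the full set of rows to be inserted into contiguous, near-equal blocks across P ingest clients, each of which would then write its block to the database in parallel (the function name and scheme below are illustrative, not the D4M implementation):

```python
def decompose(rows, n_clients):
    """Assign rows to clients in contiguous, near-equal blocks."""
    n = len(rows)
    base, extra = divmod(n, n_clients)
    blocks, start = [], 0
    for c in range(n_clients):
        size = base + (1 if c < extra else 0)  # spread the remainder
        blocks.append(rows[start:start + size])
        start += size
    return blocks

blocks = decompose(list(range(10)), 3)
print([len(b) for b in blocks])  # [4, 3, 3]
```

Because each client's block is independent, ingest throughput scales linearly with the number of clients until the database servers saturate, which matches the linear scaling the paper reports.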

Read this paper on arXiv…

J. Kepner, W. Arcand, D. Bestor, et al.
Fri, 20 Jun 14

Comments: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 2014