# Multi-GPU maximum entropy image synthesis for radio astronomy [IMA]

The maximum entropy method (MEM) is a well-known deconvolution technique in radio interferometry. This method solves a non-linear optimization problem with an entropy regularization term. Other heuristics such as CLEAN are faster but highly user dependent. Nevertheless, MEM has the following advantages: it is unsupervised, it has a statistical basis, and it offers better resolution and better image quality under certain conditions. This work presents a high-performance GPU version of non-gridded MEM, which is tested using interferometric and simulated data. We propose a single-GPU and a multi-GPU implementation for single- and multi-spectral data, respectively. We also make use of the Peer-to-Peer and Unified Virtual Addressing features of newer GPUs, which allow multiple GPUs to be exploited transparently and efficiently. Several ALMA data sets are used to demonstrate the effectiveness in imaging and to evaluate GPU performance. The results show that a speedup of 1000 to 5000 times over a sequential version can be achieved, depending on data and image size. This has allowed us to reconstruct the HD142527 CO(6-5) short baseline data set in 2.1 minutes, instead of the 2.5 days it takes on a CPU.

M. Carcamo, P. Roman, S. Casassus, et al.
Thu, 9 Mar 17
36/54
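
The MEM abstract above describes minimizing a data-fidelity term plus an entropy regularizer, with model visibilities evaluated without gridding. A minimal sketch of such an objective for a tiny image follows. The function names, the prior `G`, the weight `lam`, and the specific entropy form are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math

def model_visibility(image, coords, u, v):
    """Direct (non-gridded) Fourier sum of pixel intensities at one (u, v) sample."""
    return sum(
        I * cmath.exp(-2j * math.pi * (u * x + v * y))
        for I, (x, y) in zip(image, coords)
    )

def mem_objective(image, coords, samples, lam=0.1, G=1.0):
    """Return chi^2 / 2 - lam * S, with S = -sum(I * ln(I / G)) (assumed entropy form)."""
    chi2 = 0.0
    for u, v, vis_obs, sigma in samples:
        residual = vis_obs - model_visibility(image, coords, u, v)
        chi2 += abs(residual) ** 2 / sigma ** 2
    entropy = -sum(I * math.log(I / G) for I in image)
    return 0.5 * chi2 - lam * entropy

# Hypothetical two-pixel image and three (u, v) samples with unit noise.
coords = [(0.0, 0.0), (1.0, 0.0)]
truth = [1.0, 1.0]
uv = [(0.0, 0.0), (0.25, 0.0), (0.1, 0.3)]
samples = [(u, v, model_visibility(truth, coords, u, v), 1.0) for u, v in uv]
print(mem_objective(truth, coords, samples))
```

Every (u, v) sample requires a sum over all pixels, and the samples are independent of one another, which is the kind of workload that maps naturally onto many GPU threads.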

# Acceleration of low-latency gravitational wave searches using Maxwell-microarchitecture GPUs [IMA]

Low-latency detections of gravitational waves (GWs) are crucial to enable prompt follow-up observations of astrophysical transients by conventional telescopes. We have developed a low-latency pipeline using a technique called Summed Parallel Infinite Impulse Response (SPIIR) filtering, realized on a Graphics Processing Unit (GPU). In this paper, we exploit the new *Maxwell* memory access architecture in NVIDIA GPUs, namely the read-only data cache, warp-shuffle, and cross-warp atomic techniques. We report a 3-fold speed-up over our previous implementation of this filtering technique. To tackle SPIIR with relatively few filters, we develop a new GPU thread configuration with a nearly 10-fold speed-up. In addition, we implement a multi-rate scheme of SPIIR filtering using Maxwell GPUs. We achieve more than 100-fold speed-up over a single-core CPU for the multi-rate filtering scheme. This results in an overall 21-fold reduction in CPU usage for the entire SPIIR pipeline.

X. Guo, Q. Chu, S. Chung, et al.
Thu, 9 Feb 17
38/67
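
To make the filtering idea in the abstract above concrete, here is a small CPU-side sketch. It assumes that each SPIIR filter is a first-order IIR recursion with coefficients `a` and `b` and an integer input delay, and that the filter outputs are summed sample by sample. The function name and the tuple layout are hypothetical, and this is not the pipeline's GPU code.

```python
def spiir_filter(x, filters):
    """Sum first-order IIR filters: y_k[n] = a_k * y_k[n-1] + b_k * x[n - d_k]."""
    out = [0j] * len(x)
    for a, b, delay in filters:
        y = 0j  # state of this filter; each filter runs independently
        for n in range(len(x)):
            x_in = x[n - delay] if n >= delay else 0.0
            y = a * y + b * x_in
            out[n] += y  # summation across the parallel filters
    return out

# Impulse response of two hypothetical filters (a, b, delay).
impulse = [1.0, 0.0, 0.0, 0.0]
response = spiir_filter(impulse, [(0.5, 1.0, 0), (0.25, 2.0, 1)])
print(response)
```

Because each filter's recursion depends only on its own state, the outer loop over filters can be distributed across threads, leaving the final per-sample sum as a reduction step.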

# OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing [IMA]

The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures and frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework that supports the rapid development of high-performance processing pipelines for astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs provided by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems in developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system for astronomical telescopes and for significantly reducing software development expenses.

S. Wei, F. Wang, H. Deng, et al.
Thu, 19 Jan 17
3/42

# Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures [CL]

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon ($2.6 \times$ on Ivy Bridge) and Xeon Phi ($13.7 \times$ on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimisation solutions to upcoming architectures.

F. Baruffa, L. Iapichino, N. Hammer, et al.
Tue, 20 Dec 16
85/88

Comments: 18 pages, 5 figures, submitted
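
The abstract above mentions changing the data layout to Structure of Arrays (SoA). The sketch below illustrates that layout change in general terms. The field names and helper functions are hypothetical, and Python is used only for illustration; it is not the Gadget code.

```python
from array import array

FIELDS = ("x", "y", "z", "mass")  # hypothetical particle fields

def aos_to_soa(particles):
    """Turn a list of per-particle records (AoS) into one contiguous array per field (SoA)."""
    return {f: array("d", (p[f] for p in particles)) for f in FIELDS}

def mass_weighted_x(soa):
    """Kernel-like loop that reads only the two fields it needs."""
    return sum(m * x for m, x in zip(soa["mass"], soa["x"]))

particles = [
    {"x": 1.0, "y": 0.0, "z": 0.0, "mass": 2.0},
    {"x": 3.0, "y": 1.0, "z": 2.0, "mass": 0.5},
]
soa = aos_to_soa(particles)
print(mass_weighted_x(soa))
```

With SoA, a loop over one field walks through contiguous memory, which is the access pattern that compiler auto-vectorisation generally favours in compiled languages.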

# Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian Inference [CL]

Celeste is a procedure for inferring astronomical catalogs that attains state-of-the-art scientific results. To date, Celeste has been scaled to at most hundreds of megabytes of astronomical images: Bayesian posterior inference is notoriously demanding computationally. In this paper, we report on a scalable, parallel version of Celeste, suitable for learning catalogs from modern large-scale astronomical datasets. Our algorithmic innovations include a fast numerical optimization routine for Bayesian posterior inference and a statistically efficient scheme for decomposing astronomical optimization problems into subproblems.
Our scalable implementation is written entirely in Julia, a new high-level dynamic programming language designed for scientific and numerical computing. We use Julia’s high-level constructs for shared and distributed memory parallelism, and demonstrate effective load balancing and efficient scaling on up to 8192 Xeon cores on the NERSC Cori supercomputer.

J. Regier, K. Pamnany, R. Giordano, et al.
Fri, 11 Nov 16
11/40

# A Survey of High Level Frameworks in Block-Structured Adaptive Mesh Refinement Packages [CL]

Over the last decade, block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, while others have pursued more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah.

A. Dubey, A. Almgren, J. Bell, et al.
Fri, 28 Oct 16
37/73