Implementing Ideas for Improving Software Citation and Credit [IMA]

Improving software citation and credit continues to be a topic of interest across and within many disciplines, with numerous efforts underway. In this Birds of a Feather (BoF) session, we started with a list of actionable ideas from last year’s BoF and other similar efforts and worked alone or in small groups to begin implementing them. Work was captured in a common Google document; the session organizers will disseminate or otherwise put this information to use in or for the community in collaboration with those who contributed.

Read this paper on arXiv…

P. Teuben, A. Allen, G. Berriman, et al.
Tue, 22 Nov 16

Comments: 4 pages; to be published in ADASS XXVI (held Oct 16-20, 2016) proceedings


The Durability and Fragility of Knowledge Infrastructures: Lessons Learned from Astronomy [CL]

Infrastructures are not inherently durable or fragile, yet all are fragile over the long term. Durability requires care and maintenance of individual components and the links between them. Astronomy is an ideal domain in which to study knowledge infrastructures, due to its long history, transparency, and accumulation of observational data over a period of centuries. Research reported here draws upon a long-term study of scientific data practices to ask questions about the durability and fragility of infrastructures for data in astronomy. Methods include interviews, ethnography, and document analysis. As astronomy has become a digital science, the community has invested in shared instruments, data standards, digital archives, metadata and discovery services, and other relatively durable infrastructure components. Several features of data practices in astronomy contribute to the fragility of that infrastructure. These include different archiving practices between ground- and space-based missions, between sky surveys and investigator-led projects, and between observational and simulated data. Infrastructure components are tightly coupled, based on international agreements. However, the durability of these infrastructures relies on much invisible work – cataloging, metadata, and other labor conducted by information professionals. Continual investments in care and maintenance of the human and technical components of these infrastructures are necessary for sustainability.

Read this paper on arXiv…

C. Borgman, P. Darch, A. Sands, et al.
Wed, 2 Nov 16

Comments: Paper presented at the 2016 Annual Meeting of the Association for Information Science and Technology, October 14-18, 2016, Copenhagen, Denmark. 10 pages; this https URL

Quantitative Evaluation of Gender Bias in Astronomical Publications from Citation Counts [IMA]

We analyze the role of first (leading) author gender on the number of citations that a paper receives, on the publishing frequency and on the self-citing tendency. We consider a complete sample of over 200,000 publications from 1950 to 2015 from five major astronomy journals. We determine the gender of the first author for over 70% of all publications. The fraction of papers which have a female first author has increased from less than 5% in the 1960s to about 25% today. We find that the increase of the fraction of papers authored by females is slowest in the most prestigious journals such as Science and Nature. Furthermore, female authors write 19$\pm$7% fewer papers in seven years following their first paper than their male colleagues. At all times papers with male first authors receive more citations than papers with female first authors. This difference has been decreasing with time and amounts to $\sim$6% measured over the last 30 years. To account for the fact that the properties of female and male first author papers differ intrinsically, we use a random forest algorithm to control for the non-gender specific properties of these papers which include seniority of the first author, number of references, total number of authors, year of publication, publication journal, field of study and region of the first author’s institution. We show that papers authored by females receive 10.4$\pm$0.9% fewer citations than what would be expected if the papers with the same non-gender specific properties were written by the male authors. Finally, we also find that female authors in our sample tend to self-cite more, but that this effect disappears when controlled for non-gender specific variables.

Read this paper on arXiv…

N. Caplar, S. Tacchella and S. Birrer
Mon, 31 Oct 16

Comments: Abridged version to be submitted to Nature Astronomy. Comments welcome. For readers with very little time, the central result of the paper is covered by Figure 6 (Section 5)

Instruments on large optical telescopes — A case study [IMA]

In the distant past, telescopes were known, first and foremost, for the sizes of their apertures. Advances in technology (not merely those related to astronomical detectors) are now enabling astronomers to build extremely powerful instruments, to the extent that instruments have now achieved importance comparable to or even exceeding the usual importance accorded to the apertures of the telescopes. However, the cost of successive generations of instruments has risen at a rate far above the rate of inflation. Here, given the vast sums of money now being expended on optical telescopes and their instrumentation, I argue that astronomers must undertake “cost-benefit” analysis for future planning. I use the scientific output of the first two decades of the W. M. Keck Observatory as a laboratory for this purpose. I find, in the absence of upgrades, that the time to reach peak paper production for an instrument is about six years. The prime lifetime of instruments (sans upgrades), as measured by citation returns, is about a decade. I investigate how well instrument builders are rewarded (via citations by users of their instruments) and find acknowledgements ranging from 60% to 100%. Next, given the increasing cost of operating optical telescopes, the management of existing observatories continues to seek new partnerships. This naturally raises the question “What is the cost of a single night of telescope time?” I provide a rational basis to compute this quantity. I then end the paper with some thoughts on the future of large ground-based optical telescopes, bearing in mind the explosion of synoptic precision photometric, astrometric and imaging surveys across the electromagnetic spectrum, the increasing cost of instrumentation and the rise of mega instruments.

Read this paper on arXiv…

S. Kulkarni
Wed, 22 Jun 16

Comments: 29 pages, 16 figures, destination: PASP

Aggregation and Linking of Observational Metadata in the ADS [IMA]

We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them.

Read this paper on arXiv…

A. Accomazzi, M. Kurtz, E. Henneken, et al.
Fri, 29 Jan 16

Comments: 4 pages, Proceedings of the ADASS XXV conference

Improving Software Citation and Credit [CL]

The past year has seen movement on several fronts for improving software citation, including the Center for Open Science’s Transparency and Openness Promotion (TOP) Guidelines, the Software Publishing Special Interest Group that was started at January’s AAS meeting in Seattle at the request of that organization’s Working Group on Astronomical Software, a Sloan-sponsored meeting at GitHub in San Francisco to begin work on a cohesive research software citation-enabling platform, the work of Force11 to “transform and improve” research communication, and WSSSPE’s ongoing efforts that include software publication, citation, credit, and sustainability.
Brief reports on these efforts were shared at the BoF, after which participants discussed ideas for improving software citation, generating a list of recommendations to the community of software authors, journal publishers, ADS, and research authors. The discussion, recommendations, and feedback will help form recommendations for software citation to those publishers represented in the Software Publishing Special Interest Group and the broader community.

Read this paper on arXiv…

A. Allen, G. Berriman, K. DuPrie, et al.
Tue, 29 Dec 15

Comments: Birds of a Feather session organized by the Astrophysics Source Code Library (ASCL, this http URL ); to be published in Proceedings of ADASS XXV (Sydney, Australia; October, 2015). 4 pages

The data sharing advantage in astrophysics [IMA]

We present here evidence for the existence of a citation advantage within astrophysics for papers that link to data. Using simple measures based on publication data from the NASA Astrophysics Data System, we find that papers with links to data receive, on average, significantly more citations per paper than papers without links to data. Furthermore, using the INSPEC and Web of Science databases, we investigate whether papers of an experimental or theoretical nature display different citation behavior.

Read this paper on arXiv…

S. Dorch, T. Drachen and O. Ellegaard
Tue, 10 Nov 15

Comments: 4 pages, 2 figures, Conference proceedings of Focus Meeting 3 on Scholarly Publication in Astronomy, IAU GA 2015, Honolulu

Quantifying the Cognitive Extent of Science [CL]

While modern science is characterized by an exponential growth in scientific literature, the increase in publication volume clearly does not reflect the expansion of the cognitive boundaries of science. Nevertheless, most of the metrics for assessing the vitality of science or for making funding and policy decisions are based on productivity. Similarly, the increasing level of knowledge production by large science teams, whose results often enjoy greater visibility, does not necessarily mean that “big science” leads to cognitive expansion. Here we present a novel, big-data method to quantify the extents of cognitive domains of different bodies of scientific literature independently from publication volume, and apply it to 20 million articles published over 60-130 years in physics, astronomy, and biomedicine. The method is based on the lexical diversity of titles of fixed quotas of research articles. Owing to the large size of the quotas, the method overcomes the inherent stochasticity of article titles to achieve <1% precision. We show that the periods of cognitive growth do not necessarily coincide with the trends in publication volume. Furthermore, we show that the articles produced by larger teams cover significantly smaller cognitive territory than (the same quota of) articles from smaller teams. Our findings provide a new perspective on the role of small teams and individual researchers in expanding the cognitive boundaries of science. The proposed method of quantifying the extent of the cognitive territory can also be applied to study many other aspects of “science of science.”
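
The idea of measuring lexical diversity over a fixed quota of titles can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `lexical_diversity`, the word-level tokenization, and the choice of simply taking the first `quota` titles are all assumptions made for illustration (the abstract does not say how titles are tokenized or sampled).

```python
import re


def lexical_diversity(titles, quota):
    """Count distinct lowercase word tokens in a fixed quota of titles.

    Hypothetical helper: word-level tokens and "first `quota` titles"
    are illustrative assumptions, not the method from the paper.
    """
    if len(titles) < quota:
        raise ValueError("not enough titles to fill the quota")
    vocabulary = set()
    for title in titles[:quota]:
        # Split on anything that is not a letter or digit.
        vocabulary.update(re.findall(r"[a-z0-9]+", title.lower()))
    return len(vocabulary)


# Two bodies of literature compared at the same quota, so that the
# result does not depend on how many articles each body contains.
small_teams = ["Dark matter halos", "Stellar winds in binaries"]
large_teams = ["Dark matter halos", "Dark matter surveys"]
print(lexical_diversity(small_teams, quota=2))
print(lexical_diversity(large_teams, quota=2))
```

Holding the quota fixed is what decouples the measure from publication volume: a larger corpus does not score higher merely because it contains more titles.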

Read this paper on arXiv…

S. Milojevic
Tue, 3 Nov 15

Comments: Accepted for publication in Journal of Informetrics

Measuring Metrics – A forty year longitudinal cross-validation of citations, downloads, and peer review in Astrophysics [CL]

Citation measures, and newer altmetric measures such as downloads, are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current, or future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 individuals who received a U.S. PhD in astronomy in the period 1972-1976. By examining the same and different measures at the same and different times for the same individuals we are able to show the capabilities and limitations of each measure. Because the distributions are lognormal, measurement uncertainties are multiplicative; we show that in order to state with 95% confidence that one person’s citations and/or downloads are significantly higher than another person’s, the log difference in the ratio of counts must be at least 0.3 dex, which corresponds to a multiplicative factor of two.
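
The correspondence between 0.3 dex and a factor of two follows from the definition of a dex as one unit of base-10 logarithm: 10^0.3 ≈ 1.995. A short sketch of the arithmetic is below; the helper names `dex_to_factor` and `log_difference` are hypothetical and chosen only for this illustration.

```python
import math


def dex_to_factor(dex):
    """Convert a difference in dex (base-10 log units) to a multiplicative factor."""
    return 10.0 ** dex


def log_difference(count_a, count_b):
    """Absolute base-10 log of the ratio of two positive counts, in dex."""
    return abs(math.log10(count_a / count_b))


# 0.3 dex is very nearly a factor of two.
print(round(dex_to_factor(0.3), 3))

# Under the 0.3 dex threshold quoted in the abstract, 200 vs. 100
# citations clears the bar, while 150 vs. 100 does not.
print(log_difference(200, 100) >= 0.3)
print(log_difference(150, 100) >= 0.3)
```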

Read this paper on arXiv…

M. Kurtz and E. Henneken
Mon, 2 Nov 15

Comments: Author’s version of manuscript accepted for publication in the Journal of the Association for Information Science and Technology (JASIST); 35 pages 16 figures

A New Ranking Scheme for the Institutional Scientific Performance [IMA]

We propose a new performance indicator to evaluate the productivity of research institutions by their disseminated scientific papers. The new quality measure includes two principal components: the normalized impact factor of the journal in which the paper was published, and the number of citations received per year since it was published. In both components, the scientific impacts are weighted by the contribution of authors from the evaluated institution. As a whole, our new metric, namely the institutional performance score, takes into account both journal-based impact and article-specific impact. We apply this new scheme to evaluate the research output performance of Turkish institutions specialized in astronomy and astrophysics in the period 1998-2012. We discuss the implications of the new metric, and emphasize its benefits in comparison with other proposed institutional performance indicators.
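
As a rough illustration of how two such components might be combined, here is a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption: the `Paper` record, the use of the fraction of authors from the institution as the contribution weight, and the simple addition of the two components.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # All fields are hypothetical inputs chosen for this sketch.
    normalized_impact_factor: float
    citations: int
    years_since_publication: int
    institution_authors: int
    total_authors: int


def institutional_score(papers):
    """Sum per-paper impacts weighted by the institution's author share.

    Assumption: the journal-based and article-based components are
    simply added; the paper's actual combination may differ.
    """
    score = 0.0
    for paper in papers:
        weight = paper.institution_authors / paper.total_authors
        # Guard against division by zero for papers published this year.
        years = max(paper.years_since_publication, 1)
        citations_per_year = paper.citations / years
        score += weight * (paper.normalized_impact_factor + citations_per_year)
    return score


papers = [
    Paper(1.2, citations=30, years_since_publication=10,
          institution_authors=1, total_authors=2),
    Paper(0.8, citations=4, years_since_publication=2,
          institution_authors=3, total_authors=3),
]
print(institutional_score(papers))
```

Weighting by author share means a paper written mostly by external collaborators contributes proportionally less to the institution's total.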

Read this paper on arXiv…

S. Bilir, E. Gogus, O. Tas, et al.
Tue, 18 Aug 15

Comments: 12 pages, 3 figures and 2 tables, accepted for publication in Journal of Scientometric Research

Greek Astronomy PhDs: The last 200 years [CL]

We have recently compiled a database with all doctoral dissertations (PhDs) completed in modern Greece (1837-2014), in the general area of astronomy and astrophysics, as well as in space and ionospheric physics. A preliminary statistical analysis of the data is presented, along with a discussion of the general trends observed.

Read this paper on arXiv…

V. Charmandaris
Fri, 10 Jul 15

Comments: 8 pages, 7 figures, (original file also available at this http URL )

Astrophysics Source Code Library Enhancements [IMA]

The Astrophysics Source Code Library (ASCL) is a free online registry of codes used in astronomy research; it currently contains over 900 codes and is indexed by ADS. The ASCL has recently moved a new infrastructure into production. The new site provides a true database for the code entries and integrates the WordPress news and information pages and the discussion forum into one site. Previous capabilities are retained and permalinks continue to work. This improvement offers more functionality and flexibility than the previous site, is easier to maintain, and offers new possibilities for collaboration. This presentation covers these recent changes to the ASCL.

Read this paper on arXiv…

R. Hanisch, A. Allen, G. Berriman, et al.
Tue, 11 Nov 14

Comments: 4 pages; to be published in ADASS XXIV Proceedings. ASCL can be accessed at this http URL

Data engineering for archive evolution [IMA]

From the moment astronomical observations are made the resulting data products begin to grow stale. Even if perfect binary copies are preserved through repeated timely migration to more robust storage media, data standards evolve and new tools are created that require different kinds of data or metadata. The expectations of the astronomical community change even if the data do not. We discuss data engineering to mitigate the ensuing risks with examples from a recent project to refactor seven million archival images to new standards of nomenclature, metadata, format, and compression.

Read this paper on arXiv…

R. Seaman
Wed, 15 Oct 14

Comments: 11 pages, this is a longer version of a poster paper submitted to the proceedings of ADASS XXIV

Two years of ALMA bibliography – lessons learned [IMA]

Telescope bibliographies are integral parts of observing facilities. They are used to associate the published literature with archived observational data, to measure an observatory’s scientific output through publication and citation statistics, and to define guidelines for future observing strategies.
The ESO and NRAO librarians as well as NAOJ jointly maintain the ALMA (Atacama Large Millimeter/submillimeter Array) bibliography, a database of refereed papers that use ALMA data.
In this paper, we illustrate how relevant articles are identified, which procedures are used to tag entries in the database and link them to the correct observations, and how results are communicated to ALMA stakeholders and the wider community. Efforts made to streamline the process will be explained and evaluated, and a first analysis of ALMA papers published after two years of observations will be given.

Read this paper on arXiv…

S. Meakins, U. Grothkopf, M. Bishop, et al.
Mon, 28 Jul 14

Comments: 7 pages; to be published in the Proceedings of SPIE, vol. 9149, 9149-81 (2014)

The recent Italian regulations about the open-access availability of publicly-funded research publications, and the documentation landscape in astrophysics [CL]

In October 2013 Italy enacted a law containing the first national regulations about the open-access availability of publicly-funded research results (publications). This contribution examines how these new regulations match the specific situation of astrophysics, a discipline that pioneered open access.

Read this paper on arXiv…

M. Marra
Thu, 24 Jul 14

Comments: To be published in the proceedings of LISA VII Conference, Naples, Italy, 18-20.6.2014

Looking before leaping: Creating a software registry [IMA]

What lessons can be learned from examining numerous efforts to create a repository or directory of scientist-written software for a discipline? Astronomy has seen a number of efforts to build a repository or directory of scientist-written software, one of which is the Astrophysics Source Code Library (ASCL). The ASCL was founded in 1999, had a period of dormancy, and was restarted in 2010. When taking over responsibility for the ASCL in 2010, Allen sought to answer the opening question, hoping this would better inform her work. We also provide specific steps the ASCL is taking to try to improve code sharing and discovery in astronomy and share recent improvements to the resource.

Read this paper on arXiv…

A. Allen and J. Schmidt
Tue, 22 Jul 14

Comments: 3 pages; submission for WSSSPE2

The Virtual Observatory Registry [IMA]

In the Virtual Observatory (VO), the Registry provides the mechanism with which users and applications discover and select resources — typically, data and services — that are relevant for a particular scientific problem. Even though the VO adopted technologies in particular from the bibliographic community where available, building the Registry system involved a major standardisation effort, involving about a dozen interdependent standard texts. This paper discusses the server-side aspects of the standards and their application, as regards the functional components (registries), the resource records in both format and content, the exchange of resource records between registries (harvesting), as well as the creation and management of the identifiers used in the system based on the notion of authorities. Registry record authors, registry operators or even advanced users thus receive a big picture serving as a guideline through the body of relevant standard texts. To complete this picture, we also mention common usage patterns and open issues as appropriate.

Read this paper on arXiv…

M. Demleitner, G. Greene, P. Sidaner, et al.
Mon, 14 Jul 14

Comments: N/A

Computing and Using Metrics in the ADS [CL]

Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.

Read this paper on arXiv…

E. Henneken, A. Accomazzi, M. Kurtz, et al.
Thu, 19 Jun 14

Comments: to appear in proceedings of LISA VII conference, Naples, Italy

Bibliometric Indicators of Young Authors in Astrophysics: Can Later Stars be Predicted? [CL]

We test 16 bibliometric indicators with respect to their validity at the level of the individual researcher by estimating their power to predict later successful researchers. We compare the indicators of a sample of astrophysics researchers who later co-authored highly cited papers before their first landmark paper with the distributions of these indicators over a random control group of young authors in astronomy and astrophysics. We find that field and citation-window normalisation substantially improves the predicting power of citation indicators. The two indicators of total influence based on citation numbers normalised with expected citation numbers are the only indicators which show differences between later stars and random authors significant on a 1% level. Indicators of paper output are not very useful to predict later stars. The famous $h$-index makes no difference at all between later stars and the random control group.

Read this paper on arXiv…

F. Havemann and B. Larsen
Mon, 14 Apr 14

The Unified Astronomy Thesaurus [IMA]

The Unified Astronomy Thesaurus (UAT) is an open, interoperable and community-supported thesaurus which unifies the existing divergent and isolated Astronomy & Astrophysics vocabularies into a single high-quality, freely-available open thesaurus formalizing astronomical concepts and their inter-relationships. The UAT builds upon the existing IAU Thesaurus with major contributions from the astronomy portions of the thesauri developed by the Institute of Physics Publishing, the American Institute of Physics, and SPIE. We describe the effort behind the creation of the UAT and the process through which we plan to keep the document updated through broad community participation.

Read this paper on arXiv…

A. Accomazzi, N. Gray, C. Erdmann, et al.
Thu, 27 Mar 14

Principles of scientific research team formation and evolution [CL]

Research teams are the fundamental social unit of science, and yet there is currently no model that describes their basic property: size. In most fields teams have grown significantly in recent decades. We show that this is partly due to the change in the character of team-size distribution. We explain these changes with a comprehensive yet straightforward model of how teams of different sizes emerge and grow. This model accurately reproduces the evolution of empirical team-size distribution over the period of 50 years. The modeling reveals that there are two modes of knowledge production. The first and more fundamental mode employs relatively small, core teams. Core teams form by a Poisson process and produce a Poisson distribution of team sizes in which larger teams are exceedingly rare. The second mode employs extended teams, which started as core teams, but subsequently accumulated new members proportional to the past productivity of their members. Given time, this mode gives rise to a power-law tail of large teams (10-1000 members), which features in many fields today. Based on this model we construct an analytical functional form that allows the contribution of different modes of authorship to be determined directly from the data and is applicable to any field. The model also offers a solid foundation for studying other social aspects of science, such as productivity and collaboration.

Read this paper on arXiv…

S. Milojevic
Thu, 13 Mar 14

10 Simple Rules for the Care and Feeding of Scientific Data [CL]

This article offers a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized. In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more, but our goal here is not to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to “care for and feed” data, with some practical advice on how to do that.

Read this paper on arXiv…

Fri, 10 Jan 14

Ideas for Advancing Code Sharing (A Different Kind of Hack Day) [IMA]

How do we as a community encourage the reuse of software for telescope operations, data processing, and calibration? How can we support making codes used in research available for others to examine? Continuing the discussion from last year’s “Bring out your codes!” BoF session, participants separated into groups to brainstorm ideas to mitigate factors which inhibit code sharing and nurture those which encourage code sharing. The BoF concluded with the sharing of ideas that arose from the brainstorming sessions and a brief summary by the moderator.

Read this paper on arXiv…

Tue, 31 Dec 13

Astrophysics Source Code Library: Incite to Cite! [IMA]

The Astrophysics Source Code Library (ASCL, this http URL) is an online registry of over 700 source codes that are of interest to astrophysicists, with more being added regularly. The ASCL actively seeks out codes as well as accepting submissions from the code authors, and all entries are citable and indexed by ADS. All codes have been used to generate results published in or submitted to a refereed journal and are available either via a download site or from an identified source. In addition to being the largest directory of scientist-written astrophysics programs available, the ASCL is also an active participant in the reproducible research movement with presentations at various conferences, numerous blog posts and a journal article. This poster provides a description of the ASCL and the changes that we are starting to see in the astrophysics community as a result of the work we are doing.

Read this paper on arXiv…

Wed, 25 Dec 13