PESummary: the code agnostic Parameter Estimation Summary page builder

PESummary is a Python software package for processing and visualising data from any parameter estimation code. Its easy-to-use Python executables and extensive online documentation have made PESummary a key component of the international gravitational-wave analysis toolkit. PESummary has been developed to be more than just a post-processing tool: all outputs are fully self-contained, and PESummary has become central to making gravitational-wave inference analyses open and easily reproducible.


Motivation and significance
Bayesian inference is central to many areas of science [e.g. 1, 2, 3, 4] as it allows for the identification of the model parameters which best describe the collected data [e.g. 5, 6]; see Section 2.1.1 for more details. Since the first detection of gravitational-waves (GWs) [7], cosmic ripples predicted by Einstein's theory of General Relativity [8,9], Bayesian inference has been used to infer the astrophysical source properties from GW measurements [see e.g. 10,11,12], to understand the properties of noise within the GW detectors [13] and to characterise the astrophysical population of GW sources [14]. For a detailed review of how Bayesian inference is used within the GW community see [15]. In addition, with more areas of science entering the 'open data era', there is a need for robust, easy-to-use and code-agnostic software to interpret and distribute the output from all Bayesian inference codes.
PESummary, the Parameter Estimation Summary page builder, is a Python package that provides an intuitive, object-oriented user interface for displaying, interpreting, comparing and distributing posterior samples from any parameter estimation code. PESummary's unified input/output (I/O) system enables the comparison of samples from codes that previously stored data in incompatible formats. Once a gravitational-wave candidate is observed and uploaded to GraceDB, numerous Bayesian inference analyses are launched to extract the astrophysical source properties from the gravitational-wave measurement. The output data from all analyses are then passed to PESummary for post-processing. Alongside webpages for displaying, interpreting and comparing the Bayesian inference data, PESummary produces skymaps and source classifications, and a single universal 'metafile' containing information about each Bayesian inference analysis undertaken. This file can then be distributed to the wider community. For the LVK workflow, skymaps and source classification probabilities are uploaded to GraceDB for circulation.
Since its first use in 2019, PESummary has grown to be a key component in the workflow of the LIGO Scientific [16], Virgo [17] and KAGRA [18] collaborations (LVK), as shown in Figure 1. It is used to analyse and compare the outputs from the popular GW Bayesian inference software packages: LALInference [19], RIFT [20,21,22], PyCBC [23] and Bilby [24,25,26] (both PyCBC and Bilby provide an interface for a number of popular sampling packages [e.g. 27, 28]), as well as circulating both skymaps through the GW candidate event database GraceDB [29] and Bayesian inference data through the LIGO Document Control Center. PESummary was critical in the parameter estimation analysis of GW190412 [30], GW190425 [31] and the re-analysis of the first gravitational-wave transient catalog (GWTC-1) [12] using Bilby. The released data were also in the PESummary data format [32,33]. This article does not present a complete record of all the capabilities of PESummary; for more information, see https://lscsoft.docs.ligo.org/pesummary/stable_docs/index.html.
PESummary can be used for more than simply post-processing parameter estimation results. The library includes, but is not limited to: new Kernel Density Estimators [46] to estimate the PDFs of random variables within specified domains (in one and two dimensions); a unified infrastructure for reading data in different formats; a release-ready HDF5 [36] file for storing one or more Bayesian inference analyses; a comprehensive plotting suite and webpage module for analysing and comparing Bayesian inference data; and checkpointing, a requirement for cloud based computing.
PESummary is designed to be as modular and adaptable as possible, ensuring the code will age gracefully with advances in Bayesian inference software. Where possible, the code is kept general, meaning additional post-processing techniques can easily be implemented within the PESummary framework.

Software Architecture
PESummary is structured around a small number of class objects. Each object provides class and instance methods that give the user a complete interface for all operations.

Tabular Data
Bayesian inference returns the best fit model parameters through a posterior probability distribution function (PDF) calculated through Bayes' theorem [47], p(θ|d) = p(d|θ) p(θ) / p(d), where p(θ|d) is the posterior probability of the model having parameters θ given the data d, p(θ) is the prior probability of the model having parameters θ, p(d|θ) is the likelihood of observing the data given a model with parameters θ and p(d) is the evidence of the data, a model independent quantity. Often the likelihood and the posterior probability distribution function are not known analytically. Consequently, most Bayesian inference packages are designed to draw samples from the unknown posterior PDF [48,49]. Samples approximating the posterior PDF for a single parameter, otherwise known as a marginalized posterior PDF, are obtained by integrating the posterior PDF over all other parameters. Often, these posterior samples are output in the form of a table saved in a 'result file'. PESummary stores posterior samples in a custom high level table structure, pesummary.utils.samples_dict.SamplesDict, a subclass of the Python builtin dict object. This class offers numerous methods for analysing the stored data, including the .plot method. Each marginalized posterior distribution is stored as a pesummary.utils.samples_dict.Array object, an inherited class of numpy.ndarray [50]. This structure provides direct access to the optimised array functions from NumPy [35] and the usability and familiarity of dictionaries.
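The dict-of-Array design described above can be sketched in plain NumPy. The class names mirror PESummary's, but the implementations below are illustrative stand-ins, not the package's own code; only the .average(type="median") signature is taken from the paper's examples.

```python
import numpy as np

class Array(np.ndarray):
    """Sketch of an ndarray subclass: all of NumPy's optimised array
    functions remain available, plus posterior-specific helpers."""

    def __new__(cls, input_array):
        return np.asarray(input_array, dtype=float).view(cls)

    def average(self, type="mean"):
        # Summarise the marginalized posterior with the requested statistic
        if type == "mean":
            return float(np.mean(self))
        if type == "median":
            return float(np.median(self))
        raise ValueError(f"Unknown average type: {type}")

class SamplesDict(dict):
    """Sketch of a dict subclass mapping parameter names to Array objects."""

    def __init__(self, data):
        super().__init__({key: Array(value) for key, value in data.items()})

    @property
    def parameters(self):
        return list(self.keys())

samples = SamplesDict({"m": [1.0, 2.0, 3.0, 4.0], "c": [0.1, 0.2, 0.3, 0.4]})
print(samples.parameters)                   # ['m', 'c']
print(samples["m"].average(type="median"))  # 2.5
```

Because Array is a view on the underlying NumPy buffer, no data is copied when a marginalized posterior is extracted from the table.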
A popular algorithm for sampling from the posterior distribution is Markov-Chain Monte-Carlo (MCMC) [48]. In this case, multiple Markov chains are often run in parallel to test convergence through the Gelman-Rubin statistic [51]. PESummary stores multiple Markov chains in the pesummary.utils.samples_dict.MCMCSamplesDict class, a dictionary where each chain's posterior samples are represented by a pesummary.utils.samples_dict.SamplesDict object. Algorithms for removing burn-in are available via the .burnin() method. Convergence between chains can be measured via the .gelman_rubin() method.
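As a reminder of what the convergence check computes, here is a simplified sketch of the Gelman-Rubin potential scale reduction factor for a single parameter; PESummary's own implementation may differ in detail (e.g. rank-normalisation or split chains).

```python
import numpy as np

def gelman_rubin(chains):
    """Simplified R-hat for one parameter.

    chains: array of shape (n_chains, n_samples), post burn-in.
    Values close to 1 indicate the chains have converged to the
    same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n_samples = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance (scaled by the chain length)
    B = n_samples * chain_means.var(ddof=1)
    # Mean within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance
    var_hat = (n_samples - 1) / n_samples * W + B / n_samples
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(42)
converged = rng.normal(0.0, 1.0, size=(4, 2000))  # four well-mixed chains
print(gelman_rubin(converged))  # close to 1
```

Chains that have not yet mixed (e.g. stuck in different modes) inflate the between-chain variance B, pushing R-hat well above 1.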
Often, multiple Bayesian inference analyses are performed to identify how the PDFs vary for different settings. Consequently, PESummary provides the pesummary.utils.samples_dict.MultiAnalysisSamplesDict class, inherited from the Python builtin dict object, for storing multiple pesummary.utils.samples_dict.SamplesDict tables, each with an assigned label.

Packages
PESummary has followed the same methodology as Bilby by separating the top level code into two packages: core and gw. The core package provides all of the necessary code for analysing, displaying and comparing data files from general inference problems. The gw specific package contains GW functionality, including converting posterior distributions, deriving event classifications and GW specific plots; see Sec. 2.2.2.

The core package
The pesummary.core package provides all of the code required for post-processing general inference result files. It is designed to be generic and therefore work with the output from any Bayesian inference software. It provides a unified interface for producing plots, calculating useful statistics and generating webpages.
Plots are generated via the pesummary.core.plots.plot module. One dimensional marginalized posterior distributions are visualised as histograms, kernel density estimates (scipy.stats.gaussian_kde or pesummary.core.plots.bounded_1d_kde.Bounded_1d_kde) or cumulative distributions. Diagnostic plots displaying the marginalized trace or autocorrelation are also available. Multiple result files can be compared through comparison one dimensional marginalized posterior distributions, cumulative distributions, box plots, violin plots, two dimensional contour and scatter plots, or Jensen-Shannon [52] and Kolmogorov-Smirnov [53,54] statistics.
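A bounded KDE matters because a plain Gaussian KDE leaks probability outside physical domains (e.g. mass ratio on [0, 1]). One common implementation is the reflection method sketched below; this is an illustrative sketch of the technique, not PESummary's Bounded_1d_kde source.

```python
import numpy as np
from scipy.stats import gaussian_kde

def bounded_1d_kde(samples, x, xlow=None, xhigh=None):
    """Reflection-based bounded KDE sketch: mirror the sample density
    about each boundary so no probability leaks outside [xlow, xhigh]."""
    kde = gaussian_kde(samples)
    density = kde(x)
    if xlow is not None:
        density = density + kde(2 * xlow - x)   # reflect about lower bound
    if xhigh is not None:
        density = density + kde(2 * xhigh - x)  # reflect about upper bound
    return density

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, 5000)  # mass-ratio-like variable on [0, 1]
x = np.linspace(0.0, 1.0, 101)
density = bounded_1d_kde(samples, x, xlow=0.0, xhigh=1.0)
# The naive KDE would drop to ~0.5 at the edges; the reflected
# estimate stays near the true flat density of 1 across the domain.
print(density[0], density[50])
```

A naive gaussian_kde halves the density at a hard boundary; adding the reflected contribution restores it, at the cost of forcing a zero derivative at the edge.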
The pesummary.core.webpage.webpage module is a Python wrapper for writing HTML. The pesummary.core.webpage.webpage.page class provides functionality for generating multi-level navigation bars, tables of images and/or data, modal carousels and more. The design and functionality of the webpages are controlled through the pesummary.core.css and pesummary.core.js modules; each containing custom style sheets (CSS) or JavaScript files (JS) respectively.
The combine_corner.js script contains functionality to generate custom corner plots by manipulating a pre-made figure [made with e.g. 38]. By providing an interface for the user to specify the parameters they wish to compare, the PESummary webpages allow the PDF, and its correlations, to be explored interactively.

The gw package
The gw package provides the functionality for parameter estimation specific to GW astronomy. Building on the core package, the gw module provides additional methods and classes for handling GW specific data files. Although the gw package is tailored for compact binary coalescence data files, we provide GW specific methods which can be applied to the Bayesian inference data from any transient GW source.
The gw package provides a comprehensive conversion module (pesummary.gw.file.conversions) for deriving alternative posterior distributions from the input, e.g. the primary mass and secondary mass from chirp mass and mass ratio. Assuming a binary black hole merger, the conversion class provides multiple methods for estimating the properties of the final black hole: final mass (or equivalently the energy radiated in gravitational-waves), final spin, peak luminosity and final kick velocity [55,56,57,58,59,60,61,62,63,64,65,66,67]. All fits are calibrated to numerical relativity and are interchangeable via keyword arguments provided to the conversion class.
On 17 August 2017, alerts [68] were sent to more than 60 international groups of astronomers notifying them that a GW had just been detected: GW170817 [69]. Each group began observing the night sky to independently identify the source of the GW. GW170817 opened the long-awaited window of multi-messenger astronomy [70]. Through interfacing with the ligo.skymap [71] package, PESummary provides an intuitive method for generating skymaps: data files showing the most likely source location of the GWs. These skymaps are distributed to astronomers through GraceDB, see Figure 1. In the last GW observing period, 26 alerts were sent out regarding possible GW candidates, each containing skymaps [72]. PESummary also interfaces with the PEPredicates [73] and P-Astro [74] packages to provide source classification probabilities for the type of compact binary merger observed, such as the probability that the system is a binary black hole or a binary neutron star. Both the skymaps and source classification probabilities are vital for electromagnetic follow-up campaigns.
PESummary takes advantage of the plotly [43] interactive plotting package to produce interactive corner plots for extrinsic and intrinsic parameters. The pesummary.gw.plots.publication module also allows for 'publication' quality plots to be produced, for instance those in [12].

Software Functionalities

Unified input/output
Bayesian inference packages output their posterior samples in varying formats. In the GW community alone, LALInference, Bilby and RIFT output their data in three different formats: HDF5 [36], JSON [75] and dat. Although Bilby is able to output its data in HDF5, its posterior samples are stored differently to LALInference's: LALInference stores posterior samples as a numpy.recarray while Bilby writes each marginalized posterior distribution as a separate dataset, so different code is required to read each file. Each data file also stores different metadata in different locations. This incompatibility makes it difficult to compare the contents of files from different Bayesian inference samplers both in and out of the GW community.
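The essence of a format-agnostic reader is mapping every supported format onto one common in-memory structure. The sketch below illustrates the idea for JSON and dat inputs using only the standard library; it is a simplified stand-in for pesummary.io.read, which also handles HDF5 and sampler-specific layouts.

```python
import io
import json

def read_posterior_samples(file_object, fmt):
    """Return a common {parameter: [samples]} mapping regardless of the
    on-disk format (illustrative sketch of unified I/O)."""
    if fmt == "json":
        # JSON files already store {parameter: [samples]}
        return json.load(file_object)
    if fmt == "dat":
        # Whitespace-delimited table: header row of parameter names,
        # then one row of values per posterior sample
        lines = file_object.read().splitlines()
        parameters = lines[0].split()
        columns = zip(*(map(float, line.split()) for line in lines[1:]))
        return {p: list(c) for p, c in zip(parameters, columns)}
    raise ValueError(f"Unsupported format: {fmt}")

dat = io.StringIO("m c\n1.0 0.1\n2.0 0.2\n")
js = io.StringIO('{"m": [1.0, 2.0], "c": [0.1, 0.2]}')
# Both formats map to the same common structure
print(read_posterior_samples(dat, "dat") == read_posterior_samples(js, "json"))
```

Once every input is normalised to the same mapping, comparison plots and statistics can be written once rather than once per sampler.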
PESummary provides a unified infrastructure for reading data in different formats via the pesummary.io module. The universal pesummary.io.read() function reads data stored in all common file formats via the pesummary.core.file.formats module. The GW specific package extends the allowed file formats via the pesummary.gw.file.formats module; see Table 2. Samples from different data files can be compared through the common .samples_dict property. All read result files can be written to a specified file format via the common .write method. This method calls the universal pesummary.io.write() function. PESummary offers the pesummary.core.file.meta_file module for producing a universal file containing posterior samples and metadata for one or more analyses. This single 'metafile' aims to store all information regarding the Bayesian inference analysis. This includes prior samples, configuration files, injection information, environment information for reproducibility, etc. For GW analyses, the pesummary.gw.file.meta_file module allows for the PSD, skymap data and calibration information to also be saved.
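The layout idea behind the metafile, one self-contained file bundling every analysis under its own label together with its metadata, can be sketched as follows. This is a simplified illustration only: JSON stands in for HDF5, and the schema (key names, helper function) is hypothetical, not PESummary's actual file format.

```python
import json

def write_metafile(path, analyses):
    """Write a single self-contained 'metafile' bundling every analysis
    (posterior samples plus metadata) under its own label."""
    payload = {
        label: {
            "posterior_samples": analysis["posterior_samples"],
            "meta_data": analysis.get("meta_data", {}),
        }
        for label, analysis in analyses.items()
    }
    with open(path, "w") as f:
        json.dump(payload, f)

analyses = {
    "analysis_A": {
        "posterior_samples": {"m": [1.0, 2.0], "c": [0.1, 0.2]},
        "meta_data": {"sampler": "emcee", "config": {"nwalkers": 32}},
    },
    "analysis_B": {
        "posterior_samples": {"m": [1.5, 2.5], "c": [0.2, 0.3]},
    },
}
write_metafile("metafile_sketch.json", analyses)

# A single download now reproduces every analysis and its settings
with open("metafile_sketch.json") as f:
    print(sorted(json.load(f).keys()))  # ['analysis_A', 'analysis_B']
```

Bundling configuration and environment information next to the samples is what makes a released analysis reproducible from one file.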

Executables
PESummary provides multiple executable Python scripts to act as an intermediary between the command line and the core functionality; see Table 3. summarypages is the main executable and combines summaryclassification, summaryclean, summarycombine, summarydetchar, summaryplots and summarypublication. It is the most general executable provided by PESummary. Its GW workflow is described in Fig. 2. The core workflow is similar, except GW specific plots and webpages are not generated and the core classes are used.
The summarypages executable takes one or more result files as input via the --samples command line argument. A plethora of optional command line arguments are also available to allow for complete customisation. All options are then passed to the pesummary.gw.inputs.GWInput class (inherited from the pesummary.core.inputs.Input class) which checks the inputs. Amongst these checks, the samples are extracted from all result files (see Section 2.3.1), converted to generate additional PDFs, and stored in a pesummary.utils.samples_dict.MultiAnalysisSamplesDict object with an assigned label.
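A minimal sketch of a summarypages-style interface is shown below. Only --samples is taken from the text (and --multi_process from the later examples); the --labels option and defaults are illustrative, not a reproduction of the executable's full argument list.

```python
import argparse

def build_parser():
    """Sketch of a summarypages-style command line interface."""
    parser = argparse.ArgumentParser(prog="summarypages")
    parser.add_argument("--samples", nargs="+", required=True,
                        help="one or more result files to post-process")
    parser.add_argument("--labels", nargs="+", default=None,
                        help="label to assign to each result file")
    parser.add_argument("--multi_process", type=int, default=1,
                        help="number of CPUs to use when generating plots")
    return parser

args = build_parser().parse_args(
    ["--samples", "run1.dat", "run2.json", "--labels", "A", "B"]
)
print(args.samples)  # ['run1.dat', 'run2.json']
print(args.labels)   # ['A', 'B']
```

Pairing each result file with a label is what lets the downstream MultiAnalysisSamplesDict keep multiple analyses distinct.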
All properties of the GWInput class are then used to initiate the pesummary.cli.summaryplots.PlotGeneration class. Here, all plots are generated, on multiple CPUs if specified, with the .generate_plots method. This includes source classification plots, marginalized posteriors, interactive plots and more. Initial skymaps are generated based on a two-dimensional histogram of the right ascension and declination samples. A subprocess is then launched to generate the more accurate ligo.skymap skymap. This implementation allows the rest of the pipeline to continue without having to wait for the often time-consuming ligo.skymap skymap to be produced. Once complete, the initial skymap is overwritten.
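The quick-look skymap step described above reduces to a normalised two-dimensional histogram of the sky-position samples, as sketched here with mock data; the real pipeline's binning and plotting details are simplified away.

```python
import numpy as np

def initial_skymap(ra, dec, nbins=50):
    """Quick-look skymap: a 2D histogram of right ascension and
    declination samples, normalised to a probability per bin. The
    accurate ligo.skymap map later replaces this placeholder."""
    hist, ra_edges, dec_edges = np.histogram2d(ra, dec, bins=nbins)
    prob = hist / hist.sum()
    return prob, ra_edges, dec_edges

rng = np.random.default_rng(1)
ra = rng.normal(3.0, 0.1, 10_000) % (2 * np.pi)  # mock localisation samples
dec = rng.normal(-0.5, 0.1, 10_000)
prob, ra_edges, dec_edges = initial_skymap(ra, dec)
print(prob.sum())  # ~1: a valid probability map over the sky patch
```

Because this histogram is effectively free to compute, the webpages can display a localisation immediately while the slower, more accurate map is produced in the background.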
All webpages are then produced by running the .generate_webpages method once the pesummary.cli.summarypages.WebpageGeneration class is initialised with all GWInput properties. This includes single webpages for each parameter in each result file, interactive corner pages, comparison pages for all common parameters, a version page providing version and environment information, and a downloads page. Finally, a single metafile containing all information from the run is produced. This includes environment information, posterior samples, command line arguments and more. This file is available for download via the downloads page.
The output from summarypages is completely self-contained, allowing it to be distributed if required. An example of the summarypages output can be seen here: https://pesummary.github.io/GW190412/home.html. This output was produced from Listing 4.
Amidst the collection of optional command line arguments, the pesummary.gw module also provides a dynamic argument parser. This allows for dependent arguments to be passed from the command line (see Section 3.5).
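One way to support label-dependent flags like --{label}_psd is to let the standard parser collect its known arguments and then post-process the leftovers. The sketch below illustrates this strategy with argparse's parse_known_args; it is an illustrative approach, not pesummary.gw's actual implementation, and the detector:file token format is assumed for the example.

```python
import argparse

def parse_dynamic_psd_args(command_line):
    """Collect any --<label>_psd flags into a {label: {detector: file}}
    mapping, alongside the normally parsed arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--samples", nargs="+")
    known, unknown = parser.parse_known_args(command_line)
    psds = {}
    label = None
    for token in unknown:
        if token.startswith("--") and token.endswith("_psd"):
            # A label-dependent flag: strip '--' prefix and '_psd' suffix
            label = token[2:-len("_psd")]
            psds[label] = {}
        elif label is not None:
            detector, psd_file = token.split(":")
            psds[label][detector] = psd_file
    return known, psds

known, psds = parse_dynamic_psd_args(
    ["--samples", "run1.dat", "run2.dat",
     "--run1_psd", "H1:H1_psd.dat", "L1:L1_psd.dat",
     "--run2_psd", "H1:H1_psd.dat"]
)
print(psds["run1"]["L1"])  # L1_psd.dat
```

The key point is that the flag names themselves carry data (the analysis label), so they cannot be registered with the parser ahead of time.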

Illustrative Examples
Below we provide a limited set of examples to demonstrate some of the features of PESummary. All data and scripts used as part of this section can be downloaded from https://github.com/pesummary/pesummary-paper-data. More examples can be found in the PESummary repository: https://git.ligo.org/lscsoft/pesummary/-/tree/master/examples. The following examples assume that you have cloned the pesummary-paper-data repository and are in its base directory.

Example 1: Running with emcee
This example shows the flexibility of the PESummary post-processing package. We run the 'Fitting a model to data' tutorial provided as part of the emcee [76] sampling package, save the posterior samples to a '.dat' file and run with 8 chains. All post-processing is then handled with PESummary; see Figure 3 for example output. The emcee code snippet, as well as the posterior samples, can be downloaded from the pesummary-paper-data repository.
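For readers without emcee installed, the flavour of the 'fitting a model to data' problem can be reproduced with a minimal Metropolis-Hastings sampler. This is a hedged stand-in: the mock data, priors and sampler settings below are illustrative, not the emcee tutorial's, and a single chain replaces the 8 chains used in the paper's example.

```python
import numpy as np

# Mock data for the linear model y = m*x + c with known Gaussian noise
rng = np.random.default_rng(123)
x = np.linspace(0, 10, 50)
m_true, c_true, sigma = 0.5, 1.0, 0.5
y = m_true * x + c_true + rng.normal(0, sigma, x.size)

def log_posterior(theta):
    # Flat priors; Gaussian likelihood with known noise sigma
    m, c = theta
    residual = y - (m * x + c)
    return -0.5 * np.sum(residual ** 2) / sigma ** 2

def metropolis_hastings(n_steps, start, step_size=0.02):
    chain = [np.asarray(start, dtype=float)]
    log_p = log_posterior(chain[-1])
    for _ in range(n_steps):
        proposal = chain[-1] + rng.normal(0, step_size, 2)
        log_p_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < log_p_new - log_p:
            chain.append(proposal)
            log_p = log_p_new
        else:
            chain.append(chain[-1].copy())
    return np.array(chain)

chain = metropolis_hastings(10_000, start=[0.0, 0.0])
burned_in = chain[2000:]  # discard burn-in before summarising
m_est, c_est = burned_in.mean(axis=0)
print(m_est, c_est)  # close to the injected 0.5 and 1.0
```

Saving `burned_in` as a two-column '.dat' table with a header row would produce exactly the kind of result file PESummary's post-processing consumes.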

Example 2: Reading a result file
This example demonstrates how to read in a PESummary 'metafile' using the pesummary.io module. We use the posterior_samples.h5 file produced as part of Listing 1 (running Listing 1 is not necessary for this example; the metafile is available in the pesummary-paper-data repository) and specify that we would like to use the core package, meaning only the core file formats are allowed. We show how to extract the posterior samples for a specified analysis, how to access the marginalized posterior samples, and how to extract the prior samples and any stored configuration files. Some of the attributes of each object are also shown. For full documentation, run the inbuilt Python help function [77].

from pesummary.io import read

f = read("posterior_samples.h5", package="core")
multi_analysis_samples = f.samples_dict
print(type(multi_analysis_samples))
analysis = multi_analysis_samples.labels[0]
posterior_samples = multi_analysis_samples[analysis]
print(type(posterior_samples))
print(posterior_samples.keys())
marginalized = posterior_samples["m"]
print(type(marginalized))
print(marginalized.average(type="median"))
priors = f.priors["samples"]
config = f.config

(GW) Example 3: analysing public LIGO and Virgo posterior samples
This example demonstrates how to extract and analyse public LIGO and Virgo posterior samples. It includes demonstrations of how to produce 'standard' plots for the 'combined' analysis stored in the PESummary 'metafile' through the .plot method; see Figure 4. We use the publicly available https://dcc.ligo.org/DocDB/0163/P190412/009/posterior_samples.h5 file, which has been copied to the pesummary-paper-data repository. This specific file was chosen since it includes additional information such as the PSD, calibration envelopes, priors etc. Owing to GitHub's strict 100MB file limit, the file located in the pesummary-paper-data repository is stored under git-lfs [78]. For instructions on how to successfully download this file, see the pesummary-paper-data README.

(GW) Example 4: producing summary pages from public LIGO and Virgo posterior samples

This example demonstrates how a summary page can be generated from publicly available LIGO and Virgo data files. We use the same publicly available data file as Listing 3. This 'metafile' contains a total of 10 Bayesian inference analyses, each using different standard models used by the LVK. In this example, we choose to compare only a select subset: an analysis conducted with the IMRPhenomPv3HM model [79] and the SEOBNRv4PHM model [80,81,55] (see [82] for details). We specify that we would like to use the gw package (--gw), run on 15 CPUs (--multi_process 15), and generate 'publication' quality plots (--publication). The output page can be found here: https://pesummary.github.io/GW190412/home.html.

(GW) Example 5: passing dependent command line arguments

A standard argument parser [83] requires a known list of arguments. This means that they cannot depend on another variable. Through the pesummary.gw.commandline module, PESummary allows the user to specify command line arguments which change depending on the provided label. Below we show how PSDs can be provided to summarypages dynamically through the --{}_psd command line argument. Result files and PSD data were created using the 'make_data_for_listing5.py' script made available in the pesummary-paper-data repository.

(GW) Example 6: recreating figures from GWTC-1 [12]
We use the publicly available posterior samples released as part of GWTC-1 (ignoring the GW170809 [12] prior choices file): https://dcc.ligo.org/public/0157/P1800370/005/GWTC-1_sample_release.tar.gz, which have been copied to the pesummary-paper-data repository. Figure 5 shows an example of the output. When running this Listing, PESummary will print multiple warnings to stdout. These warnings are expected and show that PESummary is robust to potential failures: it will continue to produce an output while still warning the reader appropriately. One such example is a message warning the user that './GWTC-1_sample_release/GW170817_GWTC-1.hdf5' cannot be read in and its data will not be added to the plot, as this file contains multiple posterior sample tables (IMRPhenomPv2NRT_highSpin_posterior and IMRPhenomPv2NRT_lowSpin_posterior) and we have not specified which we wish to load from the command line.

Impact
PESummary is now a widely used library in the LVK. In just over one year, PESummary has become the post-processing software for the main LVK GW parameter estimation codes [24,25,19,84,85,86,87] and is relied upon for the open data release of the LVK parameter estimation data products [88]. Rather than releasing multiple data products in different formats spread across numerous URLs, PESummary has greatly simplified this to releasing a single file [32,33]. This can propel future research in the field, as researchers no longer have to create custom scripts for combining, retrieving, and recreating the initial analysis.

[Figure caption: recreation of Figure 4 in Ref. [12]; Right: recreation of Figure 5 in Ref. [12].]
The flexibility and intuitive Python executables provided by PESummary have significantly improved the efficiency of researchers (often early-career or graduate scientists) by reducing the need for repetitive tasks. In particular, PESummary's interactive corner plots have improved researchers' understanding of the degeneracies between parameters within the LVK.
PESummary is also fully incorporated into the automated low-latency alert workflow [89]. Posterior based classifications are therefore automatically produced and posted to the online GW Candidate Event Database [29] from the automatic parameter estimation follow-up. This information will greatly aid electromagnetic follow-up campaigns.

Conclusions
In this paper we have described PESummary, the Parameter Estimation Summary page builder. This Python package provides a modern interface for displaying, interpreting, comparing and distributing Bayesian inference data from any parameter estimation code. PESummary has been used extensively by the international GW community, and has been crucial for Advanced LIGO's [90] and Advanced Virgo's [91] third gravitational-wave observing run.
Although PESummary is primarily used for post-processing GW Bayesian inference data from compact binary coalescences, looking forward, we plan to incorporate other GW fields, e.g. Tests of General Relativity [see e.g. 92]. This will enable PESummary to be the driving force behind the post-processing and distribution of all Bayesian inference data output from the LVK. We will also continue to expand our already comprehensive plotting suite to improve its versatility for any Bayesian inference analysis.

Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.