SWIGLAL: Python and Octave interfaces to the LALSuite gravitational-wave data analysis libraries

The LALSuite data analysis libraries, written in C, implement key routines critical to the successful detection of gravitational waves, such as the template waveforms describing the merger of two black holes or two neutron stars. SWIGLAL is a component of LALSuite which provides interfaces for Python and Octave, making LALSuite routines accessible directly from scripts written in those languages. It has enabled modern gravitational-wave data analysis software, used in the first detection of gravitational waves, to be written in Python, thereby benefiting from its ease of development and rich feature set, while still having access to the computational speed and scientific trustworthiness of the routines provided by LALSuite.


Motivation and Significance
The choice of programming language is a critical decision in the design of scientific software. Languages such as C provide a low level of abstraction between the programmer and the machine architecture, and are compiled to machine code for best performance. The lack of abstraction, however, places a higher burden on the developer to manually handle low-level tasks, such as memory management, which detracts from the scientific problem at hand. High-level scripting languages, of which Python [1] and Octave [2] are two examples, provide a higher level of abstraction from the machine architecture, freeing the developer to focus on the algorithm, reducing development time, and facilitating the rapid prototyping of new ideas. They also provide a richer set of features, either built into the language or else available through easy-to-install packages downloaded from a central repository. They are generally not compiled to machine code, however, and therefore performance may not match that provided by low-level machine code.
Often, a new software package will want to make use of existing libraries which provide routines which are particularly efficient, well-tested and trusted by the wider scientific community, and/or difficult to re-implement. In such cases, the developer may be constrained to use a particular language -the same language as the existing library -and therefore be forced to accept the costs and benefits of that particular language. A solution to this problem is to write a software wrapper around the existing library, which then exposes its routines so that it can be used from the programming language of choice. For example, software wrappers can enable the developer to make use of libraries written in C, while also benefiting from the ease of development and rich feature set provided by high-level languages such as Python.
The first detections of gravitational waves, from the merger of two black holes [3] and from two neutron stars [4], were made possible through, amid many other advances, the careful implementation and rigorous testing of data analysis software. LALSuite (LSC Algorithm Library Suite; [5]) is a collection of software routines for gravitational-wave data analysis, written in C, and developed since 2000. As of version 6.67 [6], LALSuite provides, along with ∼ 230 executables, 9 libraries which collectively export a large number of symbols, and represents a significant code base of hundreds of thousands of lines of C code (Table 1). It provides atomic data types for fixed-width integer, floating-point, and complex numbers; and compound data types called "structs", accompanied by functions which create, destroy, and manipulate them. (Structs in C are conceptually equivalent to classes in Python and other high-level languages; this paper will hereafter use the term "class" to refer to both low-level structs and high-level classes.) The LALSuite libraries provide extensive, well-tested routines for gravitationalwave data analysis, in particular for searches for binary black holes and binary neutron stars, which have been carefully vetted by members of the LIGO Scientific Collaboration and Virgo Collaboration. These include the template signal waveforms for such events, as predicted by general relativity, which tend to be complicated mathematical expressions [e.g. 7] which are time-consuming to implement and verify. More recent gravitational-wave data analysis software has sought to take advantage of the ease of development and extensive package library of Python; without access to LALSuite routines, however, developers

Software Description and Illustrative Examples
Generation of the SWIGLAL interface uses SWIG (Simplified Wrapper and Interface Generator; [8]), a software development tool. SWIG parses the header files of a C/C++ library and identifies the symbols the library exports. It then generates the wrapper code required to interface the library with a variety of high-level languages, including Python and Octave. Because it takes C/C++ header files directly as input, SWIG does not require additional code to be written specifically for each exported symbol. Given the large number of symbols exported by LALSuite (Table 1), the automation provided by SWIG relieves LALSuite developers of a significant maintenance burden. SWIG wrapper code can be further customised by adding directives which modify the SWIGgenerated wrapper code. For example, specific directives can be applied to every class in order to add constructors and destructors. SWIG does not, however, provide a general framework for automating the application of many directives to arbitrary classes of symbols. To fully automate interface generation, SWIG-LAL runs SWIG twice: first as a simple C/C++ header parser, then as an wrapper code generator. The workflow is as follows: 1. For each LALSuite library, SWIGLAL generates a basic SWIG interface which simply incorporates all the C header files provided by that library. 2. The basic SWIG interface is input to SWIG with its -xml option, which generates an XML file containing a syntax tree of all symbols exported by the LALSuite library headers.  Example usage of the SWIGLAL() macro in the wrapping of the LAL class REAL4Vector. The ARRAY STRUCT 1D() macro exposes the "data" field of the REAL4Vector class as a native scripting-language array of length "length".
3. The XML syntax tree is input to a custom Python script, generate swigiface.py. It parses the XML syntax tree, gathers information about the exported symbols, and generates the full SWIG interface, which augments the basic interface with additional SWIG directives to implement desired functionalities. 4. The full SWIG interface is input to SWIG with its -python or -octave options to generate wrapper code for Python or Octave respectively, which are then compiled into dynamically loadable modules. Python modules are loaded using the import directive; Octave modules are loaded by simply calling the name of the library, e.g. " lal ;" for the LAL library.
The workflow is implemented as a collection of macros and build rules in the GNU Autoconf/Automake build system used by LALSuite. Autoconf macros perform configuration tasks, e.g. finding a compatible version of the swig binary, and determining the C/C++ preprocessor/compiler/linker flags needed to build the Python/Octave modules. Automake macros implement the workflow to build the basic and full SWIG interfaces, and the Python/Octave modules, as described above.
A key design objective of SWIGLAL is that the interfaces should resemble and behave, in the supported high-level language, as close to native code written in that language as possible. To that end, SWIGLAL provides a library of custom SWIG directives which modify the wrapper code to mediate between the expected behaviour of native Python/Octave code and the semantics of the Clanguage LALSuite libraries. The interface file SWIGCommon.i provides common directives used in all languages, while SWIGPython.i and SWIGOctave.i provide directives specific to the Python and Octave interfaces respectively. Each LALSuite library may also provide library-specific directives.
It is also sometimes necessary to add SWIG directives directly to the C header files, in order to further modify the wrapper code for particular functions or classes. SWIGLAL provides numerous macros, defined in SWIGCommon.i which are then added to the C header files within #ifdef SWIG . . . #endif blocks and wrapped in a common macro, SWIGLAL(); Figure 1 shows an example usage. This approach keeps SWIG-related code added to the C header files as succinct as possible. Figure 1 provides an example: the extensive code required to expose the LAL REAL4Vector class as a native scripting-language array is hidden within the ARRAY STRUCT 1D() macro. Example expansion of the % swiglal struct extend () macro for the LAL class LIGOTimeGPS. This class contains only static fields, and so SWIGLAL provides a constructor, copy constructor, and destructor for this class. The SWIG %extend directive adds methods to an existing class; methods named after the class are interpreted as constructors, while methods named after the class with the prefix "∼" are interpreted as destructors. The %swiglal new instance() macro allocates a new LIGOTimeGPS instance using XLALCalloc(); the %swiglal new copy() macro creates a copy of an existing LIGOTimeGPS instance; and the % swiglal struct call dtor () macro calls the destructor function XLALFree(). Example expansion of the % swiglal struct extend () macro for the LAL REAL4Vector class. Since this class points in dynamically-allocated memory in its "data" field ( Figure 1), only the destructor is provided, which calls the destructor function XLALDestroyREAL4Vector().
The remainder of this section describes some of the issues encountered in fulfilling the objective of the SWIGLAL interface to closely resemble native Python/Octave code, and how those issues are addressed.

Class constructors and destructors
LALSuite classes can be separated into two groups, based on their memory requirements. Classes which contain only static fields, and do not point to dynamically-allocated memory, can be straightforwardly allocated and freed with XLALMalloc()/XLALCalloc() and XLALFree() 1 . For classes which point to dynamically-allocated memory, custom constructor and destructor functions are provided; they are generally named after the class prefixed with "XLALCreate. . . " and "XLALDestroy. . . ". The SWIGLAL generate swig iface.py script determines to which group each LALSuite class belongs, by using the XML parse Illustration of memory ownership tracking in SWIGLAL: Definition of the LAL REAL4TimeSeries class. The "data" field of this class points to an instance of the REAL4Vector class. The XLALCreateREAL4TimeSeries() function allocates memory for a new REAL4TimeSeries instance, and for a new REAL4Vector instance which is pointed to by the "data" field. The XLALDestroyREAL4TimeSeries() function destroys both the REAL4TimeSeries instance and the pointed-to REAL4Vector instance.
tree to determine if a destructor "XLALDestroy. . . " exists for a particular class. The script then outputs calls to the macro % swiglal struct extend () as part of the full SWIG interface. Figures 2a and 2b show two examples of the expansion of % swiglal struct extend () for a class with only static fields (LIGOTimeGPS) and a class with dynamically-allocated memory (REAL4Vector). The provision of correct destructors is necessary to free the user from manual memory management, which high-level languages are expected to handle. The provision of constructors for classes with static fields provides methods for creating new classes from high-level languages without access to low-level memory functions like XLALMalloc().

Memory ownership paradigms
LALSuite assumes that all class instances are referred to exactly once. When a class instance is destroyed, all dynamically-allocated memory associated with the instance is freed, including any instances of other classes that are pointed to by the parent instance; put another way, the parent instance "owns" the memory of the child instances it points to. High-level languages, however, allow multiple references to be taken to a particular class instance, and memory is only freed once no references to that instance remain. Class instances are responsible for freeing their own memory, but do not "own" the memory of any instances of other classes they point to. Figures 3a and 3b illustrate how the tension between these different paradigms of memory ownership could potentially cause problems. The LAL REAL4TimeSeries class contains a pointer to an instance of the REAL4Vector class 2 , and its constructor and destructor functions create and destroy all dynamic memory associated with a REAL4TimeSeries instance, including the REAL4Vector pointer  Illustration of memory ownership tracking in SWIGLAL: Example usage in Python. The user creates a new REAL4TimeSeries instance at line 2, and assigns values to the data array pointed to by the REAL4Vector instance in lines 3 and 4. The user stores a reference to the "data" member of the REAL4TimeSeries instance in line 5, and attempts to delete the REAL4TimeSeries instance in line 6 using the Python del operator. This does not, however, trigger an immediate call to XLALDestroyREAL4TimeSeries(), since SWIGLAL knows that the user retains a reference to the REAL4Vector instance in the variable " ts data ". The data contained in the REAL4Vector instance therefore remains accessible (line 8), and XLALDestroyREAL4TimeSeries() is called only when " ts data " is destroyed (line 9). ure 3a). In Python, however, the REAL4TimeSeries and REAL4Vector instances have no parent-child relationship; the user is free to create a REAL4TimeSeries instance (Figure 3b, line 2), store a reference to the REAL4Vector instance it points to [line 6], then delete the REAL4TimeSeries instance [line 7] and assume the REAL4Vector instance will continue to be valid [line 8]. This is incompatible with the LALSuite memory ownership model, which would destroy the REAL4Vector instance along with the REAL4TimeSeries that pointed to it, corrupting the reference stored to the REAL4Vector by the user.
To resolve this tension, the SWIGLAL interface implements a system which tracks the memory ownership relationship between instances. In Figure 3b, line 6, SWIGLAL modifies the wrapper code for the "data" field of "ts" to record that the REAL4Vector instance "ts .data", assigned to " ts data ", is owned by the REAL4TimeSeries instance. This record is stored in an associative array called the parent map. The parent map also records a reference count of the number of times "ts .data" has been accessed. Then, in line 7, the Python del operator is called on "ts", which would normally immediately call the REAL4TimeSeries destructor; here SWIGLAL intervenes to check whether "ts" exists in the parent map, i.e. whether it owns the memory of another class instance. Since "ts" owns the memory of "ts .data", the destructor function XLALDestroyREAL4TimeSeries() is not called, and so the memory allocated for the REAL4Vector instance stored by " ts data " is not destroyed. Finally, in line 10, the Python del operator is called on " ts data "; here SWIGLAL checks who owns the memory of " ts data " (i.e. the original "ts" object) and whether there are any outstanding references to that memory. Since both "ts" and " ts data " have been destroyed, it is safe for SWIGLAL to call the now call the destructor function XLALDestroyREAL4TimeSeries() for the REAL4TimeSeries instance created in line 2.
The SWIGLAL memory ownership tracking system, combined with the na-tive reference counting of objects in Python and Octave, completely frees the user from any manual memory management, as is appropriate for a high-level language, while respecting the LALSuite memory management paradigm. Memory allocated by LALSuite functions is only freed once it is no longer used, and conversely is retained only as long as needed, thus minimising memory usage.

Fixed-length and dynamically-sized arrays
Gravitational-wave data analysis frequently involves operations on large timeand/or frequency-domain data series, and LALSuite provides many functions and classes to represent such data, such the REAL4Vector (Figure 1) and REAL4TimeSeries (Figure 3a) classes. Such data should be accessible from within the SWIGLAL interface as native array objects, and in an efficient manner without copying of data between the C class instance and its high-level language representation. SWIGLAL provides several typemaps for converting numerical arrays to/from native array objects; for Python, NumPy [9] arrays are used, while for Octave the native matrix type (or subclasses thereof) are used. For fixed-length C arrays, SWIGLAL supports both one-and two-dimensional arrays; typemaps are provided for both function arguments and C structure fields. Dynamicallyallocated arrays are typically implemented as specific classes in LALSuite; SWIG-LAL provides directives which are added to those classes to provide the type conversion. For the REAL4Vector class, for example (Figure 1), the ARRAY STRUCT 1D() macro modifies the wrapper code for the "data" field, so that e.g. in Python it accepts any valid sequence of floating-point numbers on assignment, and exposes the "data" field as a NumPy array [10] view which directly accesses the underlying C memory.
Some LALSuite array classes store only array data, and nothing else: REAL4Vector (Figure 1) is such a class, while REAL4TimeSeries (Figure 3a) contains additional fields. SWIGLAL provides additional typemaps for pure-array classes such as REAL4Vector so that functions can accept both class instances and native array objects as arguments. For example, the Python interface to a function which takes a REAL4Vector instance as an argument will also accept a NumPy array of the appropriate type. Figure 4 shows the output of an example Python script, listed in the Appendix, which extracts the strain data at the time of the first detected gravitational wave event GW 150914 [cf. Figure 1 of 3]. The script reads in strain data from the LIGO Hanford detector [11] at the time of the event, available from [12]; whitens and band-pass-filters the data so that the event is clearly visible; and plots the processed strain data in the vicinity of the event. The script is not intended as an example of best-practise signal processing for gravitationalwave data analysis, but as an illustration of what may be accomplished in 35 lines of Python code, by harnessing the power of LALSuite routines through the SWIGLAL interface.

Sources Constants Variables Functions (LOC) Classes
LALSuite, version 6.67 [5]. LOC Table 2 show the usage of the SWIGLAL interfaces by Python code within LALSuite itself, and by seven other gravitational-wave data analysis packages. The table gives, for each LALSuite library: the number of source files (out of the package total) which reference the SWIGLAL interface for that library (e.g. by importing the interface in Python using "import"), and the number of distinct symbols referred to by the package. The table also lists an estimate of the total lines of C code represented by the LALSuite functions referenced from each package; the estimates include any nested calls to other LALSuite functions. Python code within LALSuite is a substantial user of SWIGLAL, in terms of source files (∼ 3-30%), and lines of C code utilised (∼ 180k).
The PyCBC [20,21,22,23] and GstLAL [24,25] data analysis packages were used in the first detections of gravitational waves [3,4]. PyCBC makes use of the LAL library in over 10% of its source files, mostly for manipulating timeand frequency-domain data series. It uses 25 functions from LALFrame to read and write gravitational wave data in the standard Frame format [26] produced by gravitational-wave observatories. It uses 50 functions from LALSimulation to generate template waveforms for matched filtering of the gravitational-wave data. It uses a few functions from LALPulsar for template bank generation [27]. The total lines of LALSuite code utilised by PyCBC is ∼ 72k. While primarily written in C, GstLAL uses 16 functions from the LAL library to manipulate time-and frequency-domain data, and compute the geocentric time delay to the gravitational-wave observatories, from Python scripts. It uses 7 functions from LALSimulation to generate template waveforms in Python. The total lines of LALSuite code utilised by GstLAL from Python is ∼ 61k.
Bilby [28] aims to be a user-friendly package for Bayesian inference for use in gravitational-wave data analysis [e.g . 29]. It accesses LALSuite through the SWIGLAL interfaces in ∼ 5-10% of its source files. The LAL library is used to handle time-and-frequency domain gravitational-wave data, and LALSimulation is used to generate template waveforms for computing the Bayesian likelihood function. A total of ∼ 60k lines of LALSuite code are utilised.
GWpy [30] is a general package for easily accessing, visualising, and studying gravitational-wave data. It makes use of the LAL and LALFrame libraries, primarily for manipulating gravitational-wave data in the Frame format, in about ∼ 5% of its source files. It uses ∼ 4k lines of LALSuite code in total.
PyFstat [31], CWInPy [32], and OctApps [33] are data analysis packages focused on the search for continuous gravitational waves from rapidly rotating neutron stars; this class of gravitational wave signals has not yet been detected. PyFstat uses LAL and LALPulsar (in ∼ 20% of its source files) to compute the F-statistic [34], a standard data analysis routine for continuous gravitational wave searches. CWInPy uses a few LALSuite routines to e.g. handle frequency-domain data, convert between time standards, and access properties of the gravitational-wave observatories. OctApps uses routines, predominately from LALPulsar, to compute the F-statistic and its associated parameter space metric [35] for designing continuous gravitational wave searches. Both PyFstat and OctApps use ∼ 14-15k lines of LALSuite code, which CWInPy uses ∼ 3k.

Conclusions
LALSuite is an important, well-tested component of the gravitational-wave data analysis software stack. SWIGLAL makes innovative use of SWIG to provide automatically-generated interfaces to LALSuite for Python and Octave, with an emphasis on modelling native code behaviour in those languages. The interfaces have facilitated the development of modern gravitational-wave data analysis software written in Python, in particular PyCBC which was used in the first discovery of gravitational waves. The extensive use of the interfaces by a wide variety of Python and Octave packages for gravitational-wave data analysis demonstrates the impact and usefulness of SWIGLAL.

Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. g w p s d a t f o u r i e r f = np . i n t e r p ( g w f o u r i e r f , g w p s d f ,