ExoData: A python package to handle large exoplanet catalogue data

Exoplanet science often involves using the system parameters of real exoplanets for tasks such as simulations, fitting routines and target selection for proposals. Several exoplanet catalogues are already well established but often lack a version history and code-friendly interfaces. Software that bridges the gap between the catalogues and code improves the repeatability of results by making it possible to retrieve the exact system parameters used in an article, and by unifying the equations and software used. As exoplanet science moves towards large data, gone are the days when researchers could recall the current population from memory. An interface able to query the population now becomes invaluable for target selection and population analysis. ExoData is a Python interface and exploratory analysis tool for the Open Exoplanet Catalogue. It allows exoplanet systems to be loaded into Python as objects (Planet, Star, Binary etc.), from which measured parameters can be retrieved and common orbital and system equations calculated. This allows researchers to use tested code for the common equations they require (with units) and provides a large science input catalogue of planets for easy plotting and use in research. Advanced querying of targets is possible using the database and the Python programming language. ExoData is also able to parse spectral types and fill in missing parameters according to programmable specifications and equations. Example use cases include integrating equations into data reduction pipelines, selecting planets for observing proposals and providing an input catalogue for large scale simulation and analysis of planets.


Introduction
The field of exoplanets is rapidly expanding, with the current population now in the thousands and new measurements published increasingly frequently. A catalogue of exoplanets is necessary to keep track of these systems and their parameters, ideally one that is open and editable by all, with a version history that enables researchers to reproduce results using exactly the same values. This is especially important in large scale simulations and other work on multiple targets, where quoting a catalogue version is far easier than listing individual values and sources.
A second obstacle lies in the interface between the catalogue and the code. Such an interface can add value through its ability to calculate values using published equations, easily generate plots and estimate parameters, while keeping the catalogue up to date. It can also handle the edge cases that can trip up a standard loading code (such as certain missing values and planets without a host star). The ability to replicate results is further enhanced by quoting both a catalogue and an interface version. The exact parameters used for every target in the catalogue can then be easily obtained, including any parameters calculated or estimated by ExoData. This gives instant access to the exact parameters used in a paper.

Open Exoplanet Catalogue
The Open Exoplanet Catalogue (OEC, Rein, 2012) is an open-source, version controlled catalogue of exoplanets and their system parameters. It is hierarchical, preserving the structure of each system (including binary star layouts). OEC uses git for version control and the catalogue exists as a series of XML files (one per system). This format makes OEC much more flexible than most other exoplanet catalogues by letting users create their own 'forks' of the catalogue, where they can make their own changes and adaptations whilst still being able to receive updates from the original. The advantages of version control include being able to see the exact changes made in each version, easily roll back the catalogue to any previous state and report the exact version of the catalogue you are using to other researchers (using the commit SHA-1 hash, footnote 2).
The open nature of the catalogue means that anyone can download the full database and history of the catalogue, contribute their own changes and set up their own branched versions.

Exoplanet Background Theory
An exoplanet is a planet that orbits a star other than the Sun. We can describe a simple exoplanet system as a planet orbiting a star of stellar radius $R_\star$ and stellar mass $M_\star$, with planetary radius $R_p$, planetary mass $M_p$ and orbital semi-major axis $a$.

Describing the star
The stellar luminosity $L_\star$ is given by the Stefan-Boltzmann law applied to the surface area of a sphere,

$$L_\star = 4\pi R_\star^2 \sigma T_\star^4 \qquad (1)$$

where $\sigma$ is the Stefan-Boltzmann constant and $T_\star$ is the stellar temperature. Stellar radius can be estimated from the stellar mass using the mass-radius relation described in Cox (2000),

$$R_\star = k M_\star^x$$

where $k$ and $x$ are constants for each stellar sequence. Stellar temperature can also be estimated using a main sequence relationship. The star's luminosity distance $d$ (assuming negligible extinction) is given by rearranging the absolute magnitude relation $m - M = 5\log_{10} d - 5$,

$$d = 10^{(m - M + 5)/5}\ \mathrm{pc}$$

where $m$ is the apparent magnitude and $M$ is the absolute magnitude. Whilst $m$ is easily measured and commonly known, $M$ is often unknown. We can estimate $M$ for a star using an absolute magnitude lookup table (footnote 3) based on spectral type. Where we do not have a measured stellar magnitude in the required band but do have one in another band, we can use the conversion factors given in table A5 of Kenyon and Hartmann (1995) to convert between magnitudes based on the stellar spectral type.
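As a quick numerical check of Eqn. 1 and of the rearranged distance relation, the sketch below evaluates both in SI units. The function names are hypothetical (they are not ExoData's equation classes) and the solar values are IAU nominal constants used only for illustration.

```python
import math

SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def stellar_luminosity(radius_m, temperature_k):
    """Eqn. 1: L = 4 pi R^2 sigma T^4, returned in watts."""
    return 4.0 * math.pi * radius_m ** 2 * SIGMA_SB * temperature_k ** 4


def distance_from_magnitudes(apparent_mag, absolute_mag):
    """Rearranged m - M = 5 log10(d) - 5, returning d in parsecs.

    Assumes negligible extinction, as in the text.
    """
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)


R_SUN = 6.957e8  # m, IAU nominal solar radius
T_SUN = 5772.0   # K, IAU nominal solar effective temperature
print(f"L_sun ~ {stellar_luminosity(R_SUN, T_SUN):.3e} W")
# A star with m = 10 and M = 5 lies at 10**2 = 100 pc.
print(distance_from_magnitudes(10.0, 5.0))
```

The luminosity recovered for the Sun is close to the nominal 3.828e26 W, which is a useful sanity check on the units.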

Describing the planet
We can infer the orbital period $P$ using Kepler's third law,

$$P = 2\pi\sqrt{\frac{a^3}{G(M_\star + M_p)}}$$

where $G$ is the gravitational constant. We can rewrite this in terms of the semi-major axis given the stellar mass and period,

$$a = \left(\frac{G(M_\star + M_p)P^2}{4\pi^2}\right)^{1/3}$$

Note that the $M_p$ term is often excluded as $M_\star \gg M_p$.

Footnote 2: In git, each commit (snapshot) of the code has a SHA-1 hash of the source, which is functionally unique and is used to reference that version of the code.
Footnote 3: http://xoomer.virgilio.it/hrtrace/Sk.htm_SK3, from Schmidt-Kaler (1982).
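Kepler's third law and its rearrangement for $a$ can be sketched directly. The function names are hypothetical and all quantities are plain SI floats rather than unit-carrying values, so this is only an illustration of the arithmetic, not ExoData's implementation.

```python
import math

G = 6.67430e-11  # gravitational constant, m^3 kg^-1 s^-2


def orbital_period(a_m, m_star_kg, m_planet_kg=0.0):
    """Kepler's third law: P = 2 pi sqrt(a^3 / (G (M_star + M_p))), in seconds."""
    return 2.0 * math.pi * math.sqrt(a_m ** 3 / (G * (m_star_kg + m_planet_kg)))


def semi_major_axis(period_s, m_star_kg, m_planet_kg=0.0):
    """The same law solved for a: (G (M_star + M_p) P^2 / (4 pi^2))^(1/3), in metres."""
    return (G * (m_star_kg + m_planet_kg) * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)


AU = 1.495978707e11  # m
M_SUN = 1.98847e30   # kg
p_earth = orbital_period(AU, M_SUN)
print(p_earth / 86400.0)  # roughly one year, in days
```

Leaving `m_planet_kg` at its default of zero corresponds to dropping the $M_p$ term when $M_\star \gg M_p$.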
The semi-major axis can also be inferred from the temperature of the planet $T_p$ and of the host star $T_\star$, given a planetary albedo $A_p$ and a greenhouse constant (Tessenyi et al., 2012). The mean planetary effective temperature can then be expressed, by balancing the radiation into and out of the planet with a greenhouse contribution, using a rearranged version of Eqn. 7. From Newton's law of gravitation we can calculate the surface gravity of the planet, $g = GM_p/R_p^2$, which can also be expressed as $\log g$, the base-10 logarithm of $g$ in CGS units. If the mass of a planet is unknown, mass-radius relationships of known exoplanets (e.g. Grasset et al., 2009) allow it to be crudely estimated from the planetary radius by assuming the density of the planetary class the planet is likely to belong to. For this purpose we assume super-Earths have Earth's density, and Neptune- and Jupiter-like planets have the densities of Neptune and Jupiter respectively. The assumed values are given later in Table 2 but are easily programmable (see the assumptions module, section 4.4).
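The temperature relation is not written out in this text, so the sketch below assumes the standard energy-balance form $T_p = T_\star (R_\star/2a)^{1/2}((1 - A_p)/\epsilon)^{1/4}$, where $\epsilon$ is a greenhouse term ($\epsilon = 1$ means no greenhouse warming). The exact expression and default greenhouse value used by Tessenyi et al. (2012) and by ExoData may differ. The block also covers surface gravity, $\log g$ in CGS units and a uniform-density mass estimate. All function names are hypothetical.

```python
import math

G = 6.67430e-11  # m^3 kg^-1 s^-2


def planet_temperature(t_star_k, r_star_m, a_m, albedo, epsilon=1.0):
    """Assumed form: T_p = T_s sqrt(R_s / 2a) ((1 - A_p) / epsilon)^(1/4)."""
    return t_star_k * math.sqrt(r_star_m / (2.0 * a_m)) * ((1.0 - albedo) / epsilon) ** 0.25


def sma_from_temperature(t_star_k, r_star_m, t_planet_k, albedo, epsilon=1.0):
    """Same assumed relation solved for a: (R_s / 2) (T_s / T_p)^2 sqrt((1 - A_p) / epsilon)."""
    return 0.5 * r_star_m * (t_star_k / t_planet_k) ** 2 * math.sqrt((1.0 - albedo) / epsilon)


def surface_gravity(m_planet_kg, r_planet_m):
    """Newtonian surface gravity g = G M_p / R_p^2, in m s^-2."""
    return G * m_planet_kg / r_planet_m ** 2


def log_g(g_si):
    """Base-10 log of g in CGS units (cm s^-2)."""
    return math.log10(g_si * 100.0)


def mass_from_density(r_planet_m, density_kg_m3):
    """Crude mass estimate: a uniform sphere of the assumed bulk density."""
    return 4.0 / 3.0 * math.pi * r_planet_m ** 3 * density_kg_m3
```

For an Earth analogue ($5.97 \times 10^{24}$ kg, 6371 km) `surface_gravity` gives about 9.8 m s$^{-2}$, i.e. $\log g \approx 2.99$.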
The scale height $H$ is the increase in altitude over which the atmospheric pressure decreases by a factor of $e$,

$$H = \frac{k_B T_p}{\mu g}$$

where $k_B$ is the Boltzmann constant, $\mu$ is the mean molecular weight of the planetary atmosphere and $g$ is the planetary surface gravity.
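A minimal sketch of the scale height, with the mean molecular weight given in atomic mass units. The function names are hypothetical, and the isothermal pressure ratio is included only to show what "decreases by a factor of $e$" means numerically.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J K^-1
AMU = 1.66053906660e-27  # atomic mass unit, kg


def scale_height(t_planet_k, mu_amu, g_si):
    """Atmospheric scale height H = k_B T / (mu g), in metres (mu in amu)."""
    return K_B * t_planet_k / (mu_amu * AMU * g_si)


def pressure_ratio(delta_z_m, h_m):
    """Isothermal atmosphere: P(z + dz) / P(z) = exp(-dz / H)."""
    return math.exp(-delta_z_m / h_m)


# Earth-like example: T = 288 K, mu = 28.97 amu, g = 9.81 m s^-2 gives H of roughly 8.4 km.
h_earth = scale_height(288.0, 28.97, 9.81)
print(h_earth)
```

Climbing one scale height reduces the pressure to $1/e$ of its starting value, whatever the value of $H$.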

Transiting Exoplanets
An exoplanet transits if it can be observed passing in front of its host star. During a transit the planet occults light from the star with the change in flux at mid transit known as the transit depth. The transit depth can be estimated as the square of the ratio of the planetary radius to the stellar radius (however, it is possible for a planet to graze the limb creating a partial transit).
The impact parameter $b$ is the projected distance between the planet and star centres at mid-transit, in units of the stellar radius. For a circular orbit it is (Seager and Mallen-Ornelas, 2003)

$$b = \frac{a}{R_\star}\cos i$$

where $i$ is the orbital inclination. If the orbit is circular, we can calculate the duration of the transit $T_{14}$ from first to last contact using Eqn. 3 of Seager and Mallen-Ornelas (2003):

$$T_{14} = \frac{P}{\pi}\arcsin\left(\frac{R_\star}{a}\,\frac{\sqrt{(1+k)^2 - b^2}}{\sin i}\right)$$

where $k = R_p/R_\star$. If the planet has an eccentric orbit, for which the exact solution is more complex, we adopt the approximation described by Kipping (2011). It is written in terms of $a_R = a/R_\star$, the planet-star separation at the moment of mid-transit $\varrho_c$ in units of stellar radii, the argument of pericentre $\omega$ and the adjusted impact parameter $b_{P,T} = (a_P/R_\star)\,\varrho_{P,T}\cos i$.
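The circular-orbit transit quantities above can be sketched as follows, with the inclination in degrees and the period setting the time unit of $T_{14}$. The function names are hypothetical, and raising an error for non-transiting geometry ($b \ge 1 + k$) is a choice made for this sketch.

```python
import math


def transit_depth(r_planet, r_star):
    """Approximate depth (R_p / R_s)^2; ignores limb darkening and grazing geometry."""
    return (r_planet / r_star) ** 2


def impact_parameter(a_over_rs, inclination_deg):
    """Circular-orbit impact parameter b = (a / R_s) cos i."""
    return a_over_rs * math.cos(math.radians(inclination_deg))


def transit_duration_circular(period, a_over_rs, k, inclination_deg):
    """T14 for a circular orbit (Seager and Mallen-Ornelas, 2003, Eqn. 3); k = R_p / R_s."""
    inc = math.radians(inclination_deg)
    b = a_over_rs * math.cos(inc)
    if b >= 1.0 + k:
        raise ValueError("planet does not transit for this geometry")
    arg = math.sqrt((1.0 + k) ** 2 - b ** 2) / (a_over_rs * math.sin(inc))
    return period / math.pi * math.asin(arg)


# Edge-on example: a/R_s = 10, k = 0.1, so T14 = (P / pi) * asin(0.11).
print(transit_duration_circular(1.0, 10.0, 0.1, 90.0))
```

Tilting the orbit away from edge-on raises $b$ and shortens the transit, until it disappears once $b \ge 1 + k$.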

Installation
ExoData is a Python package tested on Python versions 2.7, 3.4 and 3.5. The easiest way to get the package is through the Python packaging tool pip,

pip install exodata
Alternatively, the source code can be downloaded from https://github.com/ryanvarley/exodata. The package can then be installed by running the following from the command line in the package directory.

python setup.py install
The package has a test suite which can be used to verify that the package is working as expected. To run the tests use

python setup.py test

You can choose to use the package without downloading your own copy of the catalogue, instead fetching the latest version automatically each time. If you want to use your own version you will need to download a copy of the Open Exoplanet Catalogue (OEC) to your machine. The easiest way to do this, and to keep the catalogue up to date, is through git. Move to the folder you want to store the catalogue in and then run

git clone https://github.com/OpenExoplanetCatalogue/open_exoplanet_catalogue.git

You can then import exodata in Python and set up the catalogue:

import exodata
databaseLocation = "/open_exoplanet_catalogue/systems/"
exocat = exodata.OECDatabase(databaseLocation)

where databaseLocation should be the full path to the systems folder in the OEC directory, or to any other folder that contains OEC style XML files you want to use.
To update the catalogue, move to the folder where you downloaded it and run

git pull origin master

When using ExoData in publications you should give the commit SHA-1 of the OEC version used and the ExoData version number. This document was created with the Open Exoplanet Catalogue (dc8c08a4ba0c64dd039e96c801d12f17c82a7ff3, 1st May 2016) using ExoData Version 2.1.5.
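Recording both versions can be scripted. The sketch below assumes git is on the PATH and reads the checked-out commit with `git rev-parse HEAD` in the catalogue clone. The helper names and the wording of the sentence are hypothetical, not part of ExoData.

```python
import subprocess


def catalogue_commit(catalogue_dir):
    """Return the SHA-1 of the checked-out commit of a git clone (requires git)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=catalogue_dir,
        check=True,           # raise CalledProcessError if this is not a git clone
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def provenance_note(oec_sha, exodata_version, date_str):
    """Format a sentence quoting both the catalogue commit and the ExoData version."""
    return (
        f"This work used the Open Exoplanet Catalogue ({oec_sha}, {date_str}) "
        f"with ExoData version {exodata_version}."
    )


print(provenance_note("dc8c08a4ba0c64dd039e96c801d12f17c82a7ff3", "2.1.5", "1st May 2016"))
```

If ExoData was installed with pip, one way to read its version number is `importlib.metadata.version("exodata")` (Python 3.8+).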

Dependencies
The following dependencies are required to run ExoData. They are installed automatically by setuptools if required when following the above installation procedure.

Usage
ExoData is split into a series of modules dealing with the exoplanet database, equations, plots and units (see Table 1 for a list with descriptions).
The code contains five main object types: the database, which holds all objects and provides functionality such as searching, and the astronomical objects (Systems, Binaries, Stars and Planets). Moons and other types can be easily added when needed.
Like the Open Exoplanet Catalogue (OEC), the full structure of a system is preserved. Planets are children of the star (or binary) they orbit, stars are grouped in binaries where present and binaries can exist within a binary. This offers a significant structural advantage over most linear formats.
Note that the examples that follow show the raw output from the console, so most numbers are displayed to 11 decimal places, including any floating point errors. These digits should not be taken as the true uncertainty in the measurement.

Table 1: ExoData modules.

Module | Description
Assumptions | Holds classification assumptions, such as at what mass or radius a planet is defined as a super-Earth.
Astroclasses | Classes for the System, Binary, Star and Planet object types.
Astroquantities | Expands the Product Quantities Python package with astronomical units like Solar Radius and compound units such as g/cm^3.
Database | Holds the database class and the various search methods.
Equations | Implementation of exoplanet related equations, including orbital equations, planet and star characterisations and estimations.
Example | Generates example systems for testing code.
Flags | Each object has a flag object attached which records which assumptions have been made, such as 'calculated temperature'.
Plots | Plot functions for common plot types that can be used to easily display data from the catalogue.

As a catalogue interface
The database is initialised by providing the OECDatabase object with the catalogue location on your machine.

import exodata
# Either using your own catalogue version
databaseLocation = "/open_exoplanet_catalogue/systems/"
exocat = exodata.OECDatabase(databaseLocation)
# OR downloading the latest version each time
exocat = exodata.load_db_from_url()

The exocat variable now contains the initialised database, which can be accessed in several ways. There are lists for each type of object (e.g. exocat.planets, exocat.stars), a dictionary for each object type (e.g. exocat.planetDict) and a list of transiting planets (exocat.transitingPlanets).
You can retrieve a particular planet either by searching or giving the exact name.
We can then traverse the hierarchy for the planet.
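ExoData's own astroclasses are not reproduced here. The following is a minimal hypothetical sketch of the parent/child structure described above: each object links to its parent and children, and a parameter lookup such as ra falls through to the enclosing system, as noted in Appendix B. The names Node, get_param and the Example objects are illustrative only and are not ExoData's API.

```python
class Node:
    """Hypothetical base: every object links to its parent and its children."""

    def __init__(self, name, parent=None, **params):
        self.name = name
        self.parent = parent
        self.children = []
        self.params = params  # catalogue values held directly by this object
        if parent is not None:
            parent.children.append(self)

    def get_param(self, key):
        """Return a value from this object or, failing that, its nearest ancestor."""
        node = self
        while node is not None:
            if key in node.params:
                return node.params[key]
            node = node.parent
        raise KeyError(key)

    @property
    def system(self):
        """Walk up the hierarchy to the enclosing System (None if detached)."""
        node = self
        while node is not None and not isinstance(node, System):
            node = node.parent
        return node


class System(Node):
    pass


class Binary(Node):
    pass


class Star(Node):
    pass


class Planet(Node):
    @property
    def star(self):
        """Host star, or None for a circumbinary planet (its parent is a Binary)."""
        return self.parent if isinstance(self.parent, Star) else None
```

With this layout a planet in a binary can reach its host star, the binary and the system, and a planet orbiting the whole binary simply has no single host star.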

Retrieving Values
With an object selected we can query information from the database. In addition to retrieving values straight from the database we can also directly call some equations from the exodata.equations module with the variables pre-filled (see Appendix B for a full list).

Querying the Database
Querying can be performed on the object lists using Python for loops or list comprehensions. ExoData returns numpy.nan if a value is absent, so that loops and comparisons are not broken by exodata.MissingValue exceptions. MissingValue exceptions are still recorded in the logfile and so can be examined if needed. An example of such a query is returning the planets discovered using the radial velocity method. The same method can be used to get a list of every planet's radius. Note that some values will be returned as numpy.nan; we can filter these out in the initial query so that a normal numpy.mean will work.
For complicated queries a Python for loop may be clearer. Here we show all planets with a radius between 1.5 and 1.7 Jupiter radii that were discovered by the transit method and have an orbital eccentricity greater than 0.1.
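The query patterns described above can be sketched on hypothetical stand-in records. The PlanetRecord class, its field names and the "RV" and "transit" labels are assumptions for illustration, not ExoData's planet attributes. The sketch shows why returning NaN is convenient: any comparison with NaN is False, so missing values drop out of filters without raising.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanetRecord:
    """Hypothetical stand-in for a planet object; missing values are NaN."""
    name: str
    discovery_method: str
    radius_rj: float = math.nan   # radius in Jupiter radii
    eccentricity: float = math.nan


planets = [
    PlanetRecord("Example b", "RV", 1.1, 0.05),
    PlanetRecord("Example c", "transit", 1.6, 0.15),
    PlanetRecord("Example d", "transit", 1.65, 0.02),
    PlanetRecord("Example e", "transit"),  # radius and eccentricity unknown
    PlanetRecord("Example f", "RV"),
]

# List comprehension: planets discovered by radial velocity.
rv_planets = [p for p in planets if p.discovery_method == "RV"]

# Filter NaNs in the query itself so a plain mean works.
radii = [p.radius_rj for p in planets if not math.isnan(p.radius_rj)]
mean_radius = sum(radii) / len(radii)

# A more involved query reads better as a for loop; NaN comparisons are False.
selected = []
for p in planets:
    if p.discovery_method != "transit":
        continue
    if 1.5 <= p.radius_rj <= 1.7 and p.eccentricity > 0.1:
        selected.append(p.name)

print(selected)
```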

Units
In ExoData, all equations use and require units. This means values can easily be rescaled and all equations are dimensionally checked. Units are provided through the Product Quantities package, which provides all the standard units and constants provided by the National Institute of Standards and Technology (NIST). Some astronomical units (see Table C.9 for the list of values and sources) are added through the internal exodata.astroquantities module, which imports all Product Quantities units for ease of use. We can then import the units from both modules using:

>>> import exodata.astroquantities as aq

Product Quantities adds units to numbers as a special numpy array type. A unit is added to a value by multiplying the value by the unit, e.g.

>>> planetRadius = 10 * aq.R_j

Product Quantities provides several methods for dealing with values with a unit. Rescaling can be done using the .rescale method, e.g. to rescale a solar mass to units of Earth's mass:

>>> (1.99e+30 * aq.kg).rescale(aq.M_e)
array(333211.10011570295) * M_e

Units can be converted to their unit-less counterparts (e.g. for passing to code without unit support) by wrapping them in float() or np.array(). You should always rescale a value to the unit you expect before removing its units.
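Product Quantities does this properly. As a dependency-free illustration of the two ideas in this paragraph (rescaling, and refusing dimensionally inconsistent conversions), here is a hypothetical minimal sketch built on a table of SI conversion factors. It is not the Quantities API: the unit names only mirror aq.M_e and friends for readability, and the constants are approximate nominal values (R_e is the mean Earth radius).

```python
# Conversion factors to SI base units, grouped by dimension (approximate nominal values).
UNITS = {
    "kg": ("mass", 1.0),
    "g": ("mass", 1e-3),
    "M_e": ("mass", 5.9722e24),
    "M_j": ("mass", 1.89813e27),
    "M_s": ("mass", 1.98847e30),
    "m": ("length", 1.0),
    "cm": ("length", 1e-2),
    "R_e": ("length", 6.371e6),
    "R_j": ("length", 7.1492e7),
    "R_s": ("length", 6.957e8),
    "au": ("length", 1.495978707e11),
}


def rescale(value, from_unit, to_unit):
    """Convert value between two units of the same dimension."""
    dim_from, factor_from = UNITS[from_unit]
    dim_to, factor_to = UNITS[to_unit]
    if dim_from != dim_to:
        raise ValueError(f"cannot rescale {dim_from} ({from_unit}) to {dim_to} ({to_unit})")
    return value * factor_from / factor_to


print(rescale(1.99e30, "kg", "M_e"))
```

The result agrees with the console output above to within the small difference in the adopted Earth mass. Stripping units by hand is only safe after a rescale like this, which is the point of the advice above.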

Equations
The equations module implements the equations described in section 2 as Python classes. Each class takes as input every variable but one and outputs the variable left out. The classes therefore implement all permutations of the equation with respect to its variables.
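The "every variable but one" design can be sketched for Kepler's third law. The class KeplersThirdLaw and its solve method are hypothetical names, not ExoData's equation classes, and the planet mass is neglected ($M_\star \gg M_p$) with plain SI floats instead of units.

```python
import math

G = 6.67430e-11  # m^3 kg^-1 s^-2


class KeplersThirdLaw:
    """Hypothetical sketch: pass all variables but one and solve() returns the missing one.

    Variables (SI): a [m], P [s], M_s [kg]; planet mass neglected.
    """

    def __init__(self, a=None, P=None, M_s=None):
        self.a, self.P, self.M_s = a, P, M_s
        missing = [name for name, v in (("a", a), ("P", P), ("M_s", M_s)) if v is None]
        if len(missing) != 1:
            raise ValueError("leave exactly one variable as None")
        self.missing = missing[0]

    def solve(self):
        if self.missing == "P":
            return 2 * math.pi * math.sqrt(self.a ** 3 / (G * self.M_s))
        if self.missing == "a":
            return (G * self.M_s * self.P ** 2 / (4 * math.pi ** 2)) ** (1 / 3)
        # Remaining case: the stellar mass is the unknown.
        return 4 * math.pi ** 2 * self.a ** 3 / (G * self.P ** 2)


AU, M_SUN = 1.495978707e11, 1.98847e30
year = KeplersThirdLaw(a=AU, M_s=M_SUN).solve()
print(year)
```

One permutation is written per unknown, so each result is an exact algebraic inverse of the others rather than a numerical root-find.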

Assumptions
The assumptions module handles how parameters without a universal definition are set, along with how some missing values may be filled. This includes how planets are categorised (e.g. as super-Earths), and it allows researchers to use their own definitions for these planet classes as well as to add new classes.
By default ExoData sets these to commonly assumed values (Table 2), including the mean molecular weight and density where required. For the mass limits we use the classification boundaries adopted by Tinetti et al. (2013).
Assumptions are implemented to make it easy to change these limits and also add more by editing the assumptions dictionary located at exodata.assumptions.planetAssumptions.
This dictionary has the keys 'massType', 'radiusType' and 'tempType', which define how a planet is classified. Each key holds a list of rules, each defining a separate type of planet in the format (upperlimit, name). For example, we have defined the following for 'massType':

[(10 * aq.M_e, "Super-Earth"), (20 * aq.M_e, "Neptune"), (float("inf"), "Jupiter")]

Setting the last limit to infinity gives the last class no upper limit, and the first rule is assumed to have no lower limit. The rules should therefore be listed in ascending order of their upper limits and can be added to, modified or removed as necessary.
The dictionary also has keys defining value assumptions based upon the classifications set previously: 'mu', 'albedo' and 'density'. These are fed into the planet classes and can be used in calculations if needed. Instead of a list, each of these contains a dictionary which maps a classification name defined previously to a value for that class. Both the 'mu' and 'density' keys are defined by the mass and radius types; for example, 'mu' is set as

{"Super-Earth": 18 * aq.atomic_mass_unit, "Neptune": 2.3 * aq.atomic_mass_unit, "Jupiter": 2 * aq.atomic_mass_unit}

Again, these values can be changed as needed, and new values added for any new classes added to the previous set of rules.
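A minimal sketch of how (upperlimit, name) rules and a per-class value dictionary can be applied. It uses plain numbers in Earth masses instead of aq units, and the strict `value < upper_limit` boundary is an assumption of this sketch (ExoData's handling of a value exactly on a limit may differ). The names MASS_RULES, MU_BY_CLASS, classify and "Earth-like" are hypothetical.

```python
import math

# Hypothetical plain-number version of the 'massType' rules (Earth masses).
MASS_RULES = [(10.0, "Super-Earth"), (20.0, "Neptune"), (math.inf, "Jupiter")]
MU_BY_CLASS = {"Super-Earth": 18.0, "Neptune": 2.3, "Jupiter": 2.0}  # amu


def classify(value, rules):
    """Return the name of the first rule whose upper limit exceeds value.

    Rules are (upper_limit, name) in ascending order: the first has no lower
    limit and an infinite final limit leaves the last class unbounded.
    """
    if math.isnan(value):
        return None  # a missing value cannot be classified
    for upper_limit, name in rules:
        if value < upper_limit:
            return name
    return None


planet_class = classify(15.0, MASS_RULES)
print(planet_class, MU_BY_CLASS[planet_class])

# Adding a class: sorting keeps the rules in ascending order of upper limit.
extended_rules = sorted(MASS_RULES + [(2.0, "Earth-like")])
```

Keeping the rules sorted is what makes a single upper limit per class sufficient, since each class implicitly starts where the previous one ends.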

Flags
When a catalogue value is missing it is calculated or estimated, where possible, using an appropriate equation in exodata.equations. While this is in many cases desirable, some assumptions are more accurate than others and it is useful to know when a parameter has been filled. To this end we raise a flag on an object when certain actions have occurred. The list of flags and the functions that raise them is shown in Table 3.
For example, when we ask for the semi-major axis of a planet (planet.a) and a value is not recorded in the catalogue, the interface will attempt to calculate it using Kepler's third law. If it is successful the value will be returned and the flag 'Calculated SMA' will be raised. A list of flags that have been raised in an object can be seen by looking at the flags object of the planet or star class (e.g. planet.flags).

Table 3: List of flags and the functions that raise them when a parameter is missing and is calculated or estimated instead.

Flag | Calculated using
Estimated Mass | estimateDistance()
Calculated SMA | calcSMA()
Fake | Any planet that is in the 'Fake Planets' list in the XML files
Estimated Distance | estimateDistance()
Estimated magV | magV (V magnitude)
Calculated Period | calcPeriod()
Calculated Temperature | calcTemperature()
If instead you want only raw catalogue values, with no attempt to fill in missing values, you can turn off this behaviour by setting params.estimateMissingValues = False.

Plotting with ExoData
ExoData includes some graphing functions for easily displaying catalogue data. The main types of plots available are parameter against parameter and the number of planets per parameter bin. The general approach is to set up the plot class with a list of objects to plot and then call a plot function of the type you want, with any other visual arguments. The plot style can be changed using standard matplotlib styles; in our examples we used the 'whitegrid' style from the Seaborn module. For plots to be visible when working from the command line you may need to import matplotlib.pyplot as plt and call plt.show() after calling a plot function.

Number of planets by discovery method
This plots the number of planets discovered by each discovery method per year as a stacked bar chart. The format is exodata.plots.DiscoveryMethodByYear(planet_list, methods_to_plot).plot(method_labels), where planet_list is a list of planet objects to plot, methods_to_plot lists the discovery methods to include (in OEC syntax) and method_labels are the labels to use in the legend for each discovery method. Fig. 1 shows such a plot for the radial velocity and transit discovery methods.
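The counting step behind a stacked bar chart of this kind can be sketched with the standard library. The (year, method) pairs and the counts_by_year helper are hypothetical stand-ins, and the method strings are illustrative labels rather than a confirmed list of OEC values.

```python
from collections import Counter

# Hypothetical (discovery_year, discovery_method) pairs standing in for planet objects.
planets = [
    (2009, "RV"), (2009, "transit"), (2010, "transit"),
    (2010, "transit"), (2010, "RV"), (2011, "imaging"),
]


def counts_by_year(planets, methods_to_plot):
    """Count planets per (year, method), keeping only the requested methods.

    Returns {method: {year: count}}, the table a stacked bar chart is drawn from.
    Years with none of the requested methods are omitted.
    """
    counts = Counter((year, method) for year, method in planets if method in methods_to_plot)
    years = sorted({year for year, _ in counts})
    return {m: {y: counts.get((y, m), 0) for y in years} for m in methods_to_plot}


table = counts_by_year(planets, ["RV", "transit"])
print(table)
```

Each method's row then becomes one bar layer, stacked by passing the running total of the earlier layers as the bar bottoms (for example via the bottom argument of matplotlib's bar).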

Number of planets per parameter bin
These plots take as input the object list, the parameter to plot and the bin limits, and produce a histogram-like plot or a pie chart. The format is exodata.plots.DataPerParameterBin(list_of_objects, parameter, bin_limit).plotBarChart(*plot_arguments). Less than / greater than bins can be made by setting the first or last limit to float('-inf') or float('inf'); Fig. 3A was generated in this way.
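A sketch of the binning behind such a plot, including open-ended bins from -inf and inf limits. The counts_per_bin helper is hypothetical, and the half-open [lower, upper) convention and skipping of NaN values are assumptions of this sketch, not necessarily ExoData's behaviour.

```python
import bisect
import math


def counts_per_bin(values, bin_limits):
    """Count values in [limit_i, limit_{i+1}) for consecutive, ascending bin limits.

    Use -inf / inf as the first / last limit for open-ended bins; NaNs are skipped.
    """
    counts = [0] * (len(bin_limits) - 1)
    for v in values:
        if math.isnan(v):
            continue
        i = bisect.bisect_right(bin_limits, v) - 1  # index of the bin whose lower limit <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


limits = [-math.inf, 1.0, 10.0, math.inf]  # "< 1", "1 to 10", ">= 10"
print(counts_per_bin([0.5, 5.0, 10.0, 50.0, float("nan")], limits))
```

`bisect_right` places a value equal to a limit in the bin above it, which is what gives the half-open bins.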

Conclusion
ExoData provides a well-rounded tool for researchers wishing to use the parameters of an exoplanet or exoplanetary system in code. Uses of the module include querying the exoplanet database for observing proposal targets, running planets through a simulator and easily accessing the catalogue for population statistics.
By using a version controlled catalogue like the Open Exoplanet Catalogue together with the ExoData interface, researchers can more easily verify published results, since quoting both the catalogue and interface versions allows exactly the same planet and system parameters to be obtained. This differs from most catalogues, which only show the latest version, in which an unknown number of values may have changed. The interface builds upon the catalogue by making it easier to access parameters, calculate values and query the database.

Acknowledgements
This work is supported by a UCL IMPACT studentship and STFC grant code ST/P000282/1.

Appendix A. List of equations

Appendix B. List of methods provided per object in the astroclasses module
The following tables (Table B.5 to B.8) list the methods (functions) and attributes (variables) accessible from each object. Note that objects are nested, so while the planet object has an 'ra' method this is actually passed to the system object.