Nauyaca: a new tool to determine planetary masses and orbital elements through transit timing analysis

Transit timing variations (TTVs) are currently the most successful method to determine dynamical masses and orbital elements for Earth-sized transiting planets. Precise mass determination is fundamental to constrain planetary densities and thus infer planetary compositions. In this work, we present Nauyaca, a Python package dedicated to finding planetary masses and orbital elements by fitting observed mid-transit times with an N-body approach. The fitting strategy consists of a sequence of minimization algorithms (optimizers) that identify high-probability regions in the parameter space. The optimizer results are used to initialize a Markov chain Monte Carlo (MCMC) method based on an adaptive parallel-tempering algorithm. A set of runs is performed in order to obtain posterior distributions of planetary masses and orbital elements. To test the tool, we created a mock catalog of synthetic planetary systems with different numbers of planets, all of which transit. We calculated their mid-transit times and supplied them as input to Nauyaca, statistically testing its efficiency in recovering the planetary parameters from the catalog. For the recovered planets, we find typical dispersions around the real values of $\sim$1-14 M$_{\oplus}$ for masses, between 10-110 seconds for periods, and between $\sim$0.01-0.03 for eccentricities. We also investigate the effects of the signal-to-noise ratio and the number of transits on the correct determination of the planetary parameters. Finally, we suggest choices of the parameters that govern the tool for usage with real planets, according to the complexity of the problem and the available computational facilities.


INTRODUCTION
Transit timing variations (TTVs; Holman & Murray 2005; Agol et al. 2005) are to date the most successful method to measure precise masses of Earth-sized transiting planets harbored in multiplanet systems that could not be characterized by other means, for instance, radial velocities (Steffen 2016; Mills & Mazeh 2017). TTVs contain valuable information that allows us to determine orbital properties, which are useful to predict transit ephemerides, orbital stability and dynamical evolution (for a review of TTVs see Agol & Fabrycky 2018, and references therein).
Two main approaches have been developed to invert TTVs, based on analytical and numerical approximations. Analytical approaches take advantage of a low computational cost at the expense of being limited to specific scenarios, such as planets near mean motion resonances, low-eccentricity orbits or specific two-planet systems (Lithwick et al. 2012; Linial et al. 2018; Nesvorný & Morbidelli 2008; Nesvorný 2009; Nesvorný & Beaugé 2010; Agol & Deck 2016). Numerical models including N-body integrations seem to be an unavoidable route to study more diverse scenarios than those considered by the analytical techniques, but also to complement and double-check results from these analytical methods.
The inversion problem requires methods to fully explore the parameter space of the planets, independently of the type of model mentioned above. Many works have used combinations of techniques and models to deal with the TTVs inversion problem. For example, minimization routines and analytical models (Nesvorný & Beaugé 2010); a genetic algorithm and numerical N-body models (Carpintero & Melita 2018); a coupled simulated annealing algorithm plus N-body (Meschiari & Laughlin 2010); Markov chain Monte Carlo (MCMC) methods and analytic models (Tuchow et al. 2019); MCMC and N-body (Becker et al. 2015; Jontof-Hutter et al. 2016; Agol et al. 2020); minimization plus MCMC using N-body simulations (Masuda 2014); a multimodal nested sampling algorithm combined with either N-body (e.g., Nesvorný et al. 2013; Saad-Olivera et al. 2017), or with both N-body and analytic models (e.g., Masuda 2017; Yoffe et al. 2021); a combination of MCMC plus analytical and numerical models (Lithwick et al. 2012; Hadden & Lithwick 2016); and MCMC and minimization plus analytical tools (Gajdoš & Parimucha 2019). Despite the numerous works that employ computational tools, there is a scarcity of ready-to-use packages that make it possible to deal with the TTVs inversion problem in an intuitive, easy and reliable way.
In this work, we introduce Nauyaca 1 , an easy-to-use Python tool dedicated to the TTVs inversion problem from the N-body approach. The numerical approach, even though computationally expensive compared to analytical approximations, is more adaptable to general situations regarding, for example, the number of planets, prograde or retrograde orbits, and planets out of resonance with eccentric and non-coplanar orbits. The only required data are the transit ephemerides per planet and the stellar mass and radius. Additionally, any prior knowledge about any planetary parameter can be supplied in order to better constrain the parameter space.
This paper is structured as follows: in Section 2, we describe the main features of the tool and the functionality of its modules. Section 3 describes the creation of a mock catalog of synthetic planetary systems and mid-transit times to test the tool. In Section 4, we discuss the choice of the parameter space and of the parameters for the minimization and MCMC algorithms. In Section 5, we apply Nauyaca to the whole simulated catalog and discuss the consistency between the input planetary parameters and those found by the tool. Section 6 briefly highlights caveats of our tool, and finally, in Section 7, we summarize our findings and make suggestions about the procedure and parameters to deal with real data.
1 Bothrops asper, a kind of pit viper from Central America (Jolliff 2005).

METHODS
We implemented a Python package, named Nauyaca, focused on the determination of planet masses and orbital elements through mid-transit time fitting for planets around single parent stars. Our tool is equipped with minimization routines and a Markov chain Monte Carlo method specifically adapted to fit transit time series with an N-body approach. Nauyaca manages the exploration of the parameter space with the main goal of finding, for each planet, solutions of planetary parameters that produce mid-transit times consistent with the observations. The tool can work in a parallelized scheme, so multi-core machines are preferred for best performance.
We incorporated TTVFast 2 (Deck et al. 2014), an optimized fast code (5-20 times faster than standard methods) to generate transit timing models. In short, TTVFast receives a set of initial conditions for the planets (masses and orbits) and the stellar mass, and performs an N-body simulation. At the same time, an incorporated Keplerian interpolator calculates mid-transit times by interpolating the orbits when a planet is detected crossing the star. Even though N-body simulations can be time consuming and computationally expensive, we decided not to implement analytical or semi-analytical approximations, because many of those models are only valid for planets with low eccentricities, near first-order orbital resonances, or for a fixed number of planets. We opted for a general-purpose approach that works with more diverse planetary configurations, and thus we decided to use TTVFast.
We define Θ j ≡ {mass, P, ecc, inc, ω, M, Ω} j as a set of planet parameters for the j-th planet, where the parameters are, respectively: the mass, period, eccentricity, inclination, argument of periastron, mean anomaly and longitude of ascending node. The orbital elements in Θ j correspond to the instantaneous elements at a specific reference time t 0 which is specified by the user or automatically selected by Nauyaca. Orbital elements are defined in a fixed astrocentric coordinate system with the X-Y plane spanning the plane of the sky and the observer located orthogonally at +Z.
Given a constant stellar mass M * and radius R * , the algorithms incorporated in Nauyaca propose an initial condition (Θ) for all the planets in order to run TTVFast, which results in a model of transit times per planet. Then, to perform the TTVs fitting, we assume that transit errors are independent and normally distributed, and thus we set the log-likelihood function in the form
$$\log \mathcal{L}_j = -\frac{1}{2}\sum_{i=1}^{N_{\rm tran}} \ln\left(2\pi\,\sigma_j(i)^2\right) - \frac{\chi_j^2}{2}, \quad (1)$$
where
$$\chi_j^2 = \sum_{i=1}^{N_{\rm tran}} \left(\frac{t_j(i) - t_{{\rm sim},j}(i)}{\sigma_j(i)}\right)^2, \quad (2)$$
with j denoting numbered planets and i denoting their respective transit epochs over the total number of transits N tran .
Here t corresponds to the observed transit time at a given epoch, t sim is the simulated transit time calculated by the model, and σ corresponds to the uncertainty in the central time. We take the errors σ j (i) as the mean of the upper and lower errors of the i-th transit time. The total log-likelihood is calculated by adding the individual log-likelihoods log L j of the planets. The fitting procedure is performed over the available transit times, but it is possible to include planets without transit times that interact with the transiting planets. The explored space then also includes the parameters of the non-transiting (or undetected) planets, and thus we can obtain information about them. However, considering planets without transit data is left for future work.

Modules
Nauyaca incorporates several modules with techniques adapted specifically for the TTVs inversion problem. We describe the functionalities of these modules below.

Setup
Here we describe the preparation of the data before running the simulations. In the context of object-oriented programming, we define a Planetary System object by specifying the stellar mass and radius and the harbored planets. Transit time fitting operates on Planetary Systems. We then define the planets by establishing their transit time ephemerides and the allowed parameter space. The transit ephemeris per planet should include the epoch number, time of transit, and lower and upper timing uncertainties. In the simulations, the transit epochs (integer transit numbers) of all the planets are counted starting from 0 after t 0 . Thus, the user must ensure that the epoch numbers are properly labeled and referenced to the same reference time t 0 .
The time span of the simulations is automatically chosen to encompass the full time of observations in the ephemeris data. If no time step is defined for the simulations, it is automatically set to 30 steps per orbit of the innermost planet (∼3.33% of the innermost planet period). With this time step, Deck et al. (2014) demonstrated the effectiveness of determining transit times with an accuracy within 10 seconds for a planet with a ∼22.3 day period. A time step < 5% of the innermost planet period is recommended to reach that accuracy.
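The default step-size rule is simple enough to state in code. A minimal sketch (the function name is ours, not Nauyaca's):

```python
def default_time_step(p_inner_days, steps_per_orbit=30):
    """Default integration step: 30 steps per orbit of the innermost
    planet (~3.33% of its period). A step below 5% of that period is
    the recommended regime for ~10-second transit-time accuracy."""
    return p_inner_days / steps_per_orbit
```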
Regarding the parameter space, the tool requires specific boundaries for each parameter in order to perform an effective sampling, avoiding non-physical regions or redundant information. Each planet in the Planetary System must define its own parameter space. If no constraint is given for some or all of the parameters in Θ j , a set of default boundaries is established. By default, masses are restricted to the range from 1 lunar mass to 80 Jupiter masses. However, when the stellar mass is supplied, the upper planetary mass limit is recalculated to be at most 1% of the stellar mass. This is done to keep valid the N-body Hamiltonian internally solved by TTVFast. In Figure 1, we show the currently measured planet masses M p or M p sin i from the Exoplanet Archive 3 , together with the planet mass limit corresponding to 1% of the stellar mass. We find that ∼95% of the currently known measured planet masses have mass ratios with the parent star lower than 1%, and thus the selected cutoff of 1% is statistically valid for most of the currently known planets. The default period space is limited to be between 0.1 and 1000 days. The eccentricity is limited between 1e-6 and 0.9, where the lower limit differs from exactly 0 to avoid undefined orbital angles. Inclination is defined between 0° and 180°. In the case of periodic boundaries for the argument of periastron, ω, and mean anomaly, M, fixing boundaries in the range 0° to 360° could lead to an improper sampling process when the solution is near these borders. This problem is solved internally by parameterizing the angles by means of ω + M and ω - M. The boundaries in the parameterized space encompass 720°. At the end of the sampling process, the results in the parameterized space are mapped back to values between 0° and 360° for the individual ω and M. This is an internal process, so the user only defines these angles between 0° and 360°. The ascending node must also be defined between 0° and 360°. These default ranges can be modified by the user to better constrain any parameter or to keep it fixed. Parameters assumed constant are not part of the sampling process.
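The angle parameterization described above is just a change of variables with a wrap at the end. A minimal sketch of the forward and inverse maps (our own helper names, not Nauyaca's internals):

```python
def to_param_space(w, M):
    """Map (omega, M), each in [0, 360) deg, to the sampled pair
    (omega + M, omega - M); the sum spans 720 deg in total."""
    return w + M, w - M

def from_param_space(s, d):
    """Recover the individual angles from (s, d) = (omega + M, omega - M),
    wrapped back into [0, 360) deg."""
    return ((s + d) / 2.0) % 360.0, ((s - d) / 2.0) % 360.0
```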
After setting the parameter space, Nauyaca normalizes the boundaries of all the planets to dimensionless boundaries between 0 and 1. The whole sampling process (both the optimization and the MCMC) is performed with the normalized boundaries, and the values are mapped back to physical units at the end of the runs. This normalization removes the differences in orders of magnitude between parameters, which enhances the sampling performance.
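The unit-cube normalization is a standard affine rescaling; a minimal sketch (helper names are ours):

```python
import numpy as np

def normalize(theta, lower, upper):
    """Map physical parameters to the unit cube given per-parameter bounds."""
    theta, lower, upper = map(np.asarray, (theta, lower, upper))
    return (theta - lower) / (upper - lower)

def denormalize(u, lower, upper):
    """Map unit-cube values back to physical units at the end of a run."""
    u, lower, upper = map(np.asarray, (u, lower, upper))
    return lower + u * (upper - lower)
```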

Optimization module
This module combines minimization algorithms ordered sequentially to reach solutions Θ j that best explain the transit times. These results can be used as initial guesses for the MCMC. We tested many algorithms and their calling order and found that the sequence Differential Evolution (DE; Storn & Price 1997), Powell (PW; Powell 1964) and Nelder-Mead (NM; Gao & Han 2012) progressively minimizes the total χ 2 (Eq. 2). This sequence is not arbitrary but reflects the exploration nature of each algorithm. First, DE is capable of exploring a large parameter space without requiring an initial guess. The algorithm is fully stochastic, and no information about the smoothness of the space is required. The only mandatory input is the parameter space to explore, which has been set in the Setup module. A drawback of this algorithm is its slow convergence rate, and thus its outputs could correspond to non-converged solutions. Even so, these solutions are better than any random proposal in the parameter space. Therefore, they are used as an initial guess to run PW, which is suitable for searching for a minimum, assuming that the space around the starting point is continuous although complex. Finally, NM takes the solution previously found by PW and performs a downhill simplex method, assuming that locally the parameter space is smooth and unimodal.
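The DE → PW → NM sequence can be reproduced with SciPy's general-purpose optimizers. The sketch below is our own illustration of the idea with a generic objective, not Nauyaca's implementation; the loose DE settings mimic the "fast but possibly non-converged" first stage.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def optimizer_sequence(chi2, bounds, seed=None):
    """One realization of the DE -> Powell -> Nelder-Mead sequence.

    chi2: objective over the parameter vector; bounds: [(lo, hi), ...].
    Returns the final parameter vector and its chi^2 value.
    """
    # 1) Differential Evolution: stochastic global search, no initial guess.
    de = differential_evolution(chi2, bounds, seed=seed, maxiter=100, tol=1e-2)
    # 2) Powell: local refinement starting from the (possibly rough) DE solution.
    pw = minimize(chi2, de.x, method="Powell")
    # 3) Nelder-Mead: downhill simplex, assuming local smoothness/unimodality.
    nm = minimize(chi2, pw.x, method="Nelder-Mead")
    return nm.x, nm.fun
```

In Nauyaca this sequence is run many times from independent random starts, so the collection of end points maps out the most probable regions of the space.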
We show in Figure 2 an example of the progressive χ 2 minimization for the transit time fitting of a two-planet system with 190 transit times in total. For this example we performed 320 realizations. Background lines connect solutions of the same run, where a gradual descent of the χ 2 throughout the sequence is observed. The box plots show the intervals of χ 2 achieved with the different algorithms, where the χ 2 is typically reduced by around 4-5 orders of magnitude between the first and last methods.
Performing multiple realizations of this process enhances the chance of finding the global minimum, and also provides a global view of the most probable regions in the parameter space (addressed in detail in subsection 4.3). It is a fast way of finding a starting region with valuable information that helps delimit the search radius that the MCMC routine will explore.

MCMC module
We adapted a Markov chain Monte Carlo method to explore the parameter space. We chose the parallel-tempering sampler ptemcee (Swendsen & Wang 1986; Earl & Deem 2005; Vousden et al. 2015; Foreman-Mackey et al. 2013), which is well suited for exploring multi-modal and highly dimensional parameter spaces, such as those of planetary systems, whose dimensionality grows as ∼7 times the number of planets (one mass and six orbital elements per planet). The main idea behind the technique is to use armies of walkers belonging to a "temperature ladder" that explore the parameter space at different levels of detail. The posterior distribution π is modified according to the temperature T as π T ∝ L(Θ) 1/T p(Θ), with L and p the likelihood and prior distributions, respectively. Walkers belonging to hotter temperatures sample more efficiently from the prior, while those at colder temperatures better sample regions of high probability. Walkers at different temperatures have a probability of swapping positions in the parameter space that depends on their current positions and temperatures, such that those at colder temperatures can also explore from the prior and vice versa. This technique prevents the walkers from getting stuck in local solutions and explores the whole parameter space more efficiently where other standard samplers could fail (see Vousden et al. 2015 for more details about this technique).
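The tempered posterior π T ∝ L(Θ) 1/T p(Θ) is easy to write down explicitly. The sketch below shows the tempered log-posterior and a simple geometric temperature ladder; the ladder here is a stand-in for the adaptive one ptemcee builds internally, and the function names are ours.

```python
import numpy as np

def tempered_log_posterior(log_like, log_prior, theta, beta):
    """log pi_T = beta * logL + log p, with beta = 1/T.
    beta = 1 recovers the target posterior; beta -> 0 approaches the prior."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf            # outside the boundaries: reject at any T
    return beta * log_like(theta) + lp

def temperature_ladder(n_temps, t_max):
    """Geometric ladder from T = 1 (the target chain) up to T = t_max."""
    return np.geomspace(1.0, t_max, n_temps)
```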

MOCK CATALOG
We created a catalog of synthetic planetary systems, defining stellar masses and radii, and planetary masses and orbital elements. We calculated transit times for these planets and applied Nauyaca to this synthetic catalog in order to test its efficiency in recovering the planetary parameters (catalog entries) that gave rise to the synthetic transit times of each planet. Throughout the text, we will refer to the parameter values reported in our mock catalog as trues.

Catalog creation
In order to set up planetary systems as realistic as possible, we selected masses and radii for the stellar hosts, and masses, radii and orbital periods for the planets, from the available data on confirmed planetary systems in the Exoplanet Archive 4 . This was done to create synthetic planetary systems with stellar and planetary parameters grounded in observations. The catalog includes different planetary multiplicities (number of planets in the system), ranging from two to five planets. Below we describe the procedure followed to create the catalog.
First, we selected systems with planets discovered by the transit method. Second, we selected systems with reported stellar masses and radii. For those with unavailable data but with reported effective temperature, surface gravity and metallicity, we derived the stellar mass or radius from the work of Torres et al. (2010). Third, we selected those systems where all of the planets have reported masses, radii and periods (systems where planets have any of these parameters unreported were discarded). This procedure significantly reduces the number of planets, dropping to 2.5% (105 planets) of the originally observed planets. In order to increase the number of synthetic planetary systems, we applied an oversampling technique (SMOTE; Chawla et al. 2011). We restricted the intervals of the inclination inc and ascending node Ω to obtain nearly coplanar and prograde orbits. These restrictions in the construction of the catalog are independent of the tool test. In practice, we can expand the boundaries of any parameter to be explored, as long as they have a physical meaning (see section 4.1).
Once the complete set of planetary parameters is established, an N-body integration is performed over 10 6 orbits of the innermost planet using REBOUND (Rein & Liu 2012; Rein & Tamayo 2015). We used the chaos indicator MEGNO (Cincotta et al. 2003; Maffione et al. 2011; Rein & Tamayo 2015) to test the dynamical stability of the proposed planetary system. A MEGNO value around ∼2 indicates quasiperiodic motion (regular stable orbits). If the set of parameters of the proposed planetary system turns out to be stable (without planetary ejection and with 1.7 < MEGNO < 2.3), then we append the final state of the N-body simulation as an entry in the catalog. It should be pointed out that the orbital elements of these entries correspond to the osculating elements at the end of the stable N-body runs, and they are not the same as the random values taken from the intervals of parameters indicated above. Finally, we use these entries as initial conditions to run TTV simulations encompassing 130 transits of the innermost planet using TTVFast. We selected that number to mimic the number of transits for planets with periods < 10 days during ∼3 years of observation. We use a fixed number of transits rather than a fixed time span since it is more relevant for the inversion problem (Nesvorný & Morbidelli 2008). As a result, we get 130 transit time ephemerides for all the innermost planets, and a few less for the remaining planets according to their periods and orbital configurations.
We assigned synthetic errors to these transit times according to the analytical approximation of the timing precision (Agol & Fabrycky 2018),
$$\sigma_t \approx \frac{1}{\delta}\sqrt{\frac{\tau}{2\dot{N}}}, \quad (3)$$
where $\tau \approx 2.2\,{\rm min}\,(R_p/R_\oplus)(M_*/M_\odot)^{-1/3}(P/10\,{\rm d})^{1/3}$ is the approximate duration of the transit ingress, which is a function of planetary radius R p , orbital period P and stellar mass M * ; δ = (R p /R * ) 2 is the depth of the transit; and Ṅ is related to the Poisson noise due to the count rate of the star. We assumed Ṅ to be constant and equal to 1×10 7 e − /min to mimic the value in the Kepler CCDs for a star of magnitude ∼12 in the Kepler band (Gilliland et al. 2011). Uncertainties from Eq. 3 take planetary parameters derived from the Kepler photometry, and therefore our study is centered on the characterization of Kepler-like transits. Finally, we added white noise to the transit time models, assuming a normal distribution with mean equal to the true transit times and standard deviation equal to the typical uncertainties given by Eq. 3. The full mock catalog comprises twelve two-planet systems, eight three-planet systems, eight four-planet systems and four five-planet systems, giving a total of 32 systems harboring 100 planets. We remark that all the planets in the catalog transit, and thus our current study does not include the characterization of non-transiting planets. Tables in Appendix A include the parameters of the simulated mock catalog. Figure 3 shows the distributions of the parameters in the catalog according to the planetary multiplicity.
Figure 3. Histograms of the stellar and planetary parameters that compose the full mock catalog. Different colored lines represent the properties grouped by planet multiplicity, for systems with two (solid blue), three (dashed orange), four (dotted cyan) and five (dash-dotted red) planets. Remaining orbital elements (not shown here) are taken from uniform distributions. Appendix A contains the data tables of these histograms.
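The timing-precision approximation of Agol & Fabrycky (2018) used to assign the synthetic errors can be sketched numerically. We assume here the photon-limited form σ_t = (1/δ)·[τ/(2Ṅ)]^(1/2) together with the ingress-duration scaling quoted in the text; the function name and unit conventions are ours.

```python
import math

def transit_timing_sigma(rp_earth, mstar_sun, p_days, rstar_sun, ndot=1e7):
    """Approximate mid-transit timing uncertainty, in minutes.

    rp_earth: planet radius [R_earth]; mstar_sun: stellar mass [M_sun];
    p_days: orbital period [days]; rstar_sun: stellar radius [R_sun];
    ndot: photon count rate [e-/min] (1e7 mimics Kepler for a mag ~12 star).
    """
    # ingress duration in minutes: tau ~ 2.2 min (Rp/Re)(M*/Msun)^(-1/3)(P/10d)^(1/3)
    tau = 2.2 * rp_earth * mstar_sun ** (-1.0 / 3.0) * (p_days / 10.0) ** (1.0 / 3.0)
    # transit depth delta = (Rp/R*)^2, with R_sun/R_earth ~ 109.2
    delta = (rp_earth / (109.2 * rstar_sun)) ** 2
    # photon-limited timing precision (assumed form of Eq. 3)
    return math.sqrt(tau / (2.0 * ndot)) / delta
```

Note the scaling: σ_t ∝ R_p^(-3/2), so larger (deeper-transiting) planets get sharper timing, as expected.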
The mock catalog of planetary parameters produces a variety of TTVs with different properties, namely amplitudes, periodicities and time spans. TTV signals in our catalog have amplitudes ranging from nearly zero up to 180 minutes, with an overall mean of ∼18 minutes. Mean TTV amplitudes grouped by multiplicity reach 21, 21, 10 and 23 minutes for two-, three-, four- and five-planet systems, respectively. This translates into a wide range of signal-to-noise ratios, from ∼1 up to ∼400, with a mean of ∼23 (defining the signal-to-noise ratio as the ratio of the TTV amplitude to the timing uncertainty; Nesvorný 2019). There is also variety in the observation time span, ranging from ∼230 to ∼5800 days with a mean of 1200 days.

SETTINGS
In this section we outline the choice of the parameters used to run the tool, including the restrictions of the parameter space, the choice of fine-tuning parameters for the MCMC, and the usage of the optimizer results to determine a reasonable starting point for the sampling process.

Planetary boundaries
As described in section 2.1.1, Nauyaca requires specific boundaries for each planetary parameter in Θ. These boundaries delimit the parameter space that will be explored. Here, we discuss how to delimit the parameter space (including the establishment of constant parameters) before carrying out the recovery test over the mock catalog.
From fitting real transit observations, it is possible to determine the planetary radius ratio, the orbital period from consecutive transits, and the impact parameter. For the last two, we measure from current data of observed exoplanets (from the Exoplanet Archive) that orbital periods are determined, on average, with uncertainties of ∼10 −4 days. Orbital inclinations are, on average, determined to better than 1°, also according to the Exoplanet Archive. Considering nearly coplanar orbits implies that the lines of nodes are nearly aligned and mutual inclinations are close to 0°, which can help to reduce the dimensionality of the problem. Inclination boundaries were restricted to [70, 110] deg. Here, the lower and upper boundaries for masses correspond to approximately a Moon mass and 10 times the true planetary masses, respectively. The boundaries in period P are around the true period of the planet, with a width δt corresponding to a typical observed period error given the orbital period itself. We estimated δt by doing a linear regression of observed period errors as a function of the orbital period (data from the Exoplanet Archive), such that δt = mP + b [days], where we determined m = 2.11×10 −5 and b = 4.4019×10 −5 . The lower limit in eccentricity (ecc) was set small but different from 0 in order to avoid an undefined argument of periastron, whilst the upper limit was allowed to reach values up to 0.3.
Figure 4. Comparison between three strategies to initialize walkers for the MCMC with the information from the optimizers. This example considers the walker initialization over the two-dimensional space of the masses, for a system with two planets (ID=pl2_id52; Table 4). The first column of panels shows the solutions found by the optimizers. The position of the true masses of these planets is shown by the cyan star. The color bar shows the logarithm of the χ 2 for these solutions. Remaining columns show the walker initialization using different strategies (see text for a full description).
This chosen range of eccentricities is compatible with 80% of the currently observed planet eccentricities. The argument of periastron (ω) and mean anomaly (M) take the usual definition between 0° and 360°. The boundaries of the ascending node (Ω) for all the planets except the innermost were restricted to a search radius of ≈ Ω 1,True ± 20 deg, thus considering only prograde orbits for simplicity.
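The period window built from the empirical regression δt = mP + b can be sketched as follows. We assume here that the boundaries are taken as P ± δt; the function name and the symmetric window are our reading of the text.

```python
def period_bounds(p_days, width_factor=1.0):
    """Period search boundaries around a measured period.

    Uses the error-vs-period regression dt = m*P + b fitted to
    Exoplanet Archive data (m = 2.11e-5, b = 4.4019e-5, in days),
    and returns (P - dt, P + dt) scaled by width_factor.
    """
    m, b = 2.11e-5, 4.4019e-5
    dt = m * p_days + b
    return p_days - width_factor * dt, p_days + width_factor * dt
```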

Parameters for the MCMC
We inspected the dependence on the parameters that govern the MCMC. The chosen parallel-tempering MCMC method (Vousden et al. 2015) is fine-tuned by two main parameters, namely the number of temperatures (N temps ) and the maximum temperature (T max ) used to build the temperature ladder. The sampling performance also depends on the number of walkers per temperature (N w ) and on the number of iterations (N iter ) per chain. We carried out many tests with combinations of these parameters and find that N temps ≳ 15 with N w ≳ 150 in a run over N iter ∼ 5 × 10 5 are enough in most cases to recover the true parameters Θ j,True from the mock catalog within 1σ. We also note that T max ∼ 10 2 − 10 3 is a good choice of maximum temperature. A T max = ∞, as suggested in Vousden et al. (2015), is not adequate for our purposes, since walkers at this temperature would propose steps outside our predefined boundaries; we confirmed this behavior on several occasions during our tests. Other parameters controlling the dynamics of the temperature adjustment are internally defined within Nauyaca, following the suggestions of Vousden et al. (2015).
For the recovery test presented in this work, we impose uniform log-priors for simplicity, with the functional form
$$\log p(\Theta_k) = \begin{cases} -\ln\left(b_{\rm upp} - b_{\rm low}\right), & b_{\rm low} \le \Theta_k \le b_{\rm upp} \\ -\infty, & \text{otherwise,} \end{cases} \quad (4)$$
where b low and b upp correspond to the lower and upper boundaries for the k-th planetary parameter. Table 1 shows a list of the model parameters for each planet as well as the adopted range of uniform priors. The choice of these validity ranges has been described in subsection 4.1. Of these parameters, inc was set constant to the true inclination, inc True , taken from the mock catalog. For each system, Ω takes uniform priors except for the innermost planet (Planet 1 in the catalog entries), which is set to the true value Ω 1,True . Thereby, the posterior probability (up to a constant) is given by the sum of the log-likelihood (Eq. 1) and the log-prior (Eq. 4). In practice, any other prior functions can be supplied to Nauyaca to calculate the posterior probability.
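A uniform box log-prior of this kind is a few lines of code. This is our own minimal sketch, not Nauyaca's prior interface:

```python
import math

def log_uniform_prior(theta, lower, upper):
    """Uniform log-prior over box boundaries: for each parameter k,
    contributes -log(b_upp - b_low) inside [b_low, b_upp], and the
    total is -inf if any parameter falls outside its box."""
    lp = 0.0
    for x, lo, hi in zip(theta, lower, upper):
        if not (lo <= x <= hi):
            return -math.inf
        lp -= math.log(hi - lo)
    return lp
```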

MCMC initialization from Optimizer results
Solving the inversion problem for a number of interacting planets, N pla , implies exploring a parameter space of dimension ∼7 N pla . The problem is computationally demanding, since an N-body integration must be done at each iteration per walker. The wall-clock time also increases with the observational time span. Thereby, choosing a strategic initial guess for the walkers, adapted to the parallel-tempering MCMC, is crucial to mitigate these costs.
Starting from random points drawn from the prior function would be a reasonable choice if we had informed previous knowledge about the parameters. In general, this is not the case for planets characterized for the first time. This motivates us to use optimizer results to make an educated initial guess about the planetary parameters. Optimizers take advantage of a low computing time compared with a full MCMC run (between 1 and 3% of the total time), and also of finding many modes in the parameter space, since realizations are independent among themselves. Although these solutions could be just rough approximations of more detailed solutions, they can help to identify high-probability regions suitable for initialization. Initializing walkers near a sensible optimum is better than at any random point in the parameter space (Hogg & Foreman-Mackey 2018). Moreover, initialization using multi-start local optimization results has been found to enhance the exploration quality and to be more suitable for multi-chain methods (e.g., Hug et al. 2013; Ballnus et al. 2017).
We present three strategies implemented in Nauyaca to initialize walkers using the information from the optimizers, namely gaussian, picked and ladder. In order to set initial values with these strategies, we first filter the total number of optimizer solutions (N opt ) by sorting them according to their χ 2 . Then, we take a fraction ( f best ; defined between 0 and 1) of that ordered list, which includes the uppermost solution. That subset of solutions Θ opt is then used to initialize walkers.
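The f best filtering step is a sort-and-truncate operation; a minimal sketch (our own helper, not Nauyaca's API):

```python
import numpy as np

def filter_best(solutions, chi2_values, f_best):
    """Keep the best fraction f_best (0 < f_best <= 1) of optimizer
    solutions, sorted by chi^2; the uppermost (lowest-chi^2) solution
    is always included."""
    order = np.argsort(chi2_values)
    n_keep = max(1, int(round(f_best * len(order))))
    keep = order[:n_keep]
    return np.asarray(solutions)[keep], np.asarray(chi2_values)[keep]
```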
In Figure 4, we show examples of initialization using these strategies for several values of f best . For this example, we focus on the mass space of the two planets identified with ID=pl2_id52 in Table 4. We performed 320 realizations of the optimizers to draw an initial walker population for a ladder of N temps = 10 temperatures, considering different values of the parameter f best . The individual solutions from Θ opt are shown with squares in the first column, and the cyan star shows the position of the true masses. Colored dots are the initial walkers belonging to different temperatures. As f best is reduced, solutions with high χ 2 are discarded. We detail these initialization strategies, adapted to the parallel-tempering MCMC, below: 1. Gaussian. Walkers are drawn from a Gaussian centered on the mean of Θ opt with a 1σ value corresponding to the data dispersion, for each dimension. This is the simplest initialization method and possibly the most frequently used. The main difference is that here the mean and standard deviation are based on the independent random realizations and not on previous knowledge of the planetary parameters (for example, masses from radial velocity measurements). Thus, if the optimizers find a well-restricted solution for any dimension, the MCMC will be able to find the global high-probability region faster than with a uniform random initialization. From Figure 4, it is seen that using this strategy while reducing f best can help to identify the zone in the parameter space where the global minimum may exist. Thus, with this strategy most of the local modes are covered, but no special attention is paid to the individual modes or to possible correlations between parameters.
2. Picked. Solutions from Θ opt are randomly picked and the initial walkers are drawn in the vicinity of those solutions. If the optimizers find many modes, this strategy ensures that walkers will be drawn around all of them, whether they correspond to local minima or the global minimum. From Figure 4, it is seen that this strategy confines the initial population of walkers as f best is reduced, while keeping the dependency between parameters. Here, walkers are equally distributed around any mode independently of their temperature, and therefore all the modes are initially sampled with the same frequency. This can be suitable for problems where there is apparently no unique region where optimizer results agglomerate.
3. Ladder. Solutions from Θ opt are divided into an integer number of chunks equal to the number of temperatures. Walkers belonging to temperature 1 (the main temperature; blue dots) are drawn from the first chunk, which includes the uppermost solution (i.e., the one with the lowest χ²). Walkers belonging to the second temperature are drawn from the first and second chunks. The same rule is followed for the rest of the temperatures until, finally, walkers for the hottest temperature (yellow dots) are drawn around all the solutions in Θ opt. From Figure 4 it is seen that with this strategy, the modes with the lowest χ² are highlighted as f best is reduced. Unlike the picked strategy, ladder assigns the outstanding modes to colder walkers while hotter walkers sample the more disperse ones. This allows the exploration of other modes while avoiding getting stuck in local minima.
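As an illustration, the ladder strategy described above can be sketched as follows. This is a minimal reimplementation based on the description in the text, not Nauyaca's code; the jitter `scale` and the chunking rule are assumptions for the sketch:

```python
import numpy as np

def ladder_init(theta_opt, n_temps, n_walkers, scale=1e-3, seed=0):
    """Sketch of the 'ladder' strategy: walkers at temperature t are drawn
    around the best (t+1) chunks of the chi^2-sorted solutions, so colder
    temperatures sample only the outstanding modes while hotter ones sample
    all of Theta_opt. `theta_opt` must be sorted by increasing chi^2.
    """
    rng = np.random.default_rng(seed)
    n_sol, ndim = theta_opt.shape
    chunk = max(1, n_sol // n_temps)
    walkers = np.empty((n_temps, n_walkers, ndim))
    for t in range(n_temps):
        # temperature t picks solutions from the first (t+1) chunks
        pool = theta_opt[: min(n_sol, (t + 1) * chunk)]
        picks = pool[rng.integers(0, len(pool), size=n_walkers)]
        # small jitter so walkers are drawn in the vicinity of the solutions
        walkers[t] = picks + scale * rng.normal(size=(n_walkers, ndim))
    return walkers

# Mock chi^2-sorted solutions: 40 solutions in a 3-D parameter space
theta_opt = np.sort(np.random.default_rng(1).random((40, 3)), axis=0)
w = ladder_init(theta_opt, n_temps=5, n_walkers=16)
print(w.shape)  # (5, 16, 3)
```

The coldest temperature (index 0) only ever sees the first chunk, while the hottest sees the full Θ opt, mirroring the behavior shown in Figure 4.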
The choice of the best strategy and of the parameter f best can vary according to the problem. Ideally, a visual inspection of the optimizer solutions (as in Figure 4) can help to identify modes in the parameter space, allowing one to decide how to initialize walkers. In practice, a statistical indicator (e.g., standard deviation) or a clustering method can be helpful to choose the initialization: gaussian for highly dispersed data, picked for highly multi-modal parameter spaces, and ladder for multi-modal spaces with a main mode (as in the example of Figure 4).
Note, however, that the usage of the optimization module and the proposed initialization strategies is not a mandatory step in Nauyaca prior to the implementation of the MCMC. Any initial walker population can be provided by the user, as long as the proposals lie inside the physical boundaries described in section 2.1.1. Nonetheless, we empirically find that following this heuristic procedure notably enhances the MCMC performance at a low computational cost.

RESULTS AND DISCUSSION
We applied Nauyaca with the same fitting procedure to the mock catalog, with the aim of inverting the process: going from the synthetic transit times to the planetary parameters and then comparing with the original values in the mock catalog, which we will refer to as the true values. For each run, we kept fixed the inclinations and the ascending node of the first planet, as described in section 4.1. All the solutions are determined at the synthetic reference epoch t 0 = 0 days, to match the reference time of the catalog construction.
The procedure consists of the following steps: 1) providing the true stellar mass and radius, transit ephemeris per planet (mid-transit times), and initializing the parameter space; 2) running the optimizers and choosing the best ∼5%-10% of the solutions to initialize walkers using the ladder strategy; and 3) taking the data from step 2 as the initial walker population for running the MCMC over a fixed number of steps, using the Gelman-Rubin statistic (< 1.01; Gelman & Rubin 1992) and a Geweke test (Z-score < 1; Geweke 1991) to assess the convergence and stationarity of the chains. We performed the same procedure with the parameters summarized in Table 2, according to the number of planets in the system (N pla). Since the dimensionality of the parameter space scales with the number of planets, we increase the number of optimizer realizations (N opt) and MCMC steps accordingly. Along the MCMC runs we applied a thinning by saving the current state of the chains at a predefined number of steps (shown in Table 2), which also allows us to diminish the memory requirements. At the end of the runs we measured the mean autocorrelation time of the averaged chains, determining typical values between 30 and 90, with a mean of 70 steps. With these values we determined the effective sample size, obtaining typical values between 2,400 and 6,800, with a mean of 3,800 independent samples. For a pair of systems with 3 (ID=pl3_id4) and 4 (ID=pl4_id3) planets, we repeated the MCMC process with the same parameters in Table 2 but changing the initialization strategy from ladder to gaussian. By doing this, we initialize the MCMC with less informative points, and we find results consistent within 1-σ with the initial ones.
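The Gelman-Rubin convergence check used in step 3 can be illustrated with a minimal textbook implementation of the potential scale reduction factor (this is a sketch, not Nauyaca's internal code):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` has shape (n_chains, n_steps); values near 1 (e.g. < 1.01)
    indicate convergence. Minimal textbook version of the statistic.
    """
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * means.var(ddof=1)               # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

# Mock converged chains: 8 walkers sampling the same normal target
rng = np.random.default_rng(2)
converged = rng.normal(0.0, 1.0, size=(8, 5000))
rhat = gelman_rubin(converged)
print(round(float(rhat), 3))  # close to 1 for converged chains
```

Chains sampling different regions would give R-hat well above 1, flagging non-convergence.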
Using 16 cores per job (i.e., per planetary system), Nauyaca was able to fit two-planet systems in ∼15 hours on average. The time increased with the complexity of the planetary system, reaching up to ∼5.6 days for five-planet systems. Most of the time is spent running the MCMC; in comparison, the optimizers are quite fast, completing the N opt solutions (specified in Table 2) within 10 minutes to 2.5 hours, depending on the number of optimizer runs. Note, however, that the wall-clock time depends on many factors, such as the number of planets, the time span of observations and the number of transits to fit, which in our case exceeds ∼150 transits for two-planet systems (see the number of transits per planet in Appendix A). Thus, lower computational requirements would suffice for systems simpler than those considered in this work. All the simulations were performed with the supercomputer Miztli at the Universidad Nacional Autónoma de México.

Optimizer results
We ran the optimizers to explore the parameter space, aiming to minimize the differences between the observed and modeled transit times (Eq. 2). Since these runs are independent from each other, increasing the number of realizations enhances the chance of finding more minima, which may correspond to local minima or the global minimum. We take a fraction (namely f best) of the whole set of solutions with the best χ² to build a subset of solutions Θ opt, which will be used to initialize walkers.
Setting an optimum composition of Θ opt is an interplay between N opt, f best and the number of temperatures; thus, there is no unique way of defining these parameters. However, we noticed that in most cases f best < 10% is an appropriate fraction when using N opt ∼ 300-500. We used the values of f best shown in Table 2, according to planet multiplicity. That selection translates into 21, 25, 26 and 48 solutions composing Θ opt for systems with two, three, four and five planets, respectively. Taking higher values of f best means taking more optimizer solutions with increasing χ². Solutions with high χ² are usually located far from the meaningful modes, resembling random proposals (as can be noticed from Figure 4). Thus, selecting almost all the optimizer solutions (f best ∼ 1) can reduce the contribution of the optimizers to locating optimal initial regions in which to draw walkers.
In Figure 5, we show the results of the optimizers by measuring the differences between the solutions found and the true values. The solutions considered in this figure are part of the subset denoted previously as Θ opt; therefore, most of the optimizer solutions with higher χ² values were discarded. The color map indicates the percentage of solutions falling inside each bin. The darkest bins near zero depict planets where 50% of the solutions closely approximate the true solutions. Sub-panels with almost uniform yellow/white colors correspond to parameters less constrained by the optimizers.
For the case of the planet masses, 90% of the solutions in Θ opt are typically determined within a factor ∼2-6 of the true masses. Even though the dispersion tends to increase with increasing mass and number of planets, the solutions approach the correct mass with a relatively low dispersion. In the case of the periods, the solutions tend to span all the allowed space. We find this behavior is due to the narrowness of our selected lower and upper boundaries, which scale according to the period (see section 4.1). We noticed that when the boundaries are enlarged (for example, to a width of ±0.01 days) the solutions agglomerate around the true periods. Eccentricities are in general well restricted only for two-planet systems, where the dispersion reaches up to ∼0.06 around the true values. For higher multiplicities the solutions scatter out, with a typical dispersion of ∼0.08. We find that optimizers tend to overestimate eccentricities as the number of planets increases. For the argument of periastron (ω) and mean anomaly (M), a similar behavior is found: they are well restricted only for two-planet systems, since for higher multiplicities the dispersion increases, tending to be evenly distributed over the parameter space. Ascending nodes are the least restricted angles, given our established boundaries for this parameter. Note, however, that for the whole recovery test we considered only limited prograde solutions (as described in section 4.1). Although the initial search region explored by the MCMC is narrowed down as a result of the optimization process, the chains are initialized in these high density regions and still explore all the allowed parameter space, since the adopted uniform priors are not modified.

Figure 5. Optimizer results shown as the differences between the solutions found and the true values, grouped by the number of planets (columns) and for each planetary parameter (rows). Individual sub-panels are binned along the horizontal axis by the true planetary parameters. Note that these labels are arranged in increasing order but are not equally spaced. The color code shows the percentage of the solutions found by the optimizers falling inside each bin. The total number of solutions for two, three, four and five-planet systems is 21, 25, 26 and 48, respectively, corresponding to N opt times f best, as reported in Table 2.

Figure 6. The recovery test was done over the transit times originated by these planets. Columns show the results according to the number of planets in each system. Horizontal axes denote the input data (true planetary parameters) and vertical axes denote the differences between the posterior medians recovered by the MCMC and the true parameters. Dark color circles show data consistent within 1-σ and light color squares those consistent within 2-σ. Vertical error bars are colored with dark or light colors for 1-σ and 2-σ, respectively. Gray crosses denote the planets whose parameters were inconsistent with the true input data. The percentage of planets consistent with the input data is summarized in Table 3.

MCMC results
In the work carried out by Nesvorný & Beaugé (2010), the authors test their tool TTVIM (which combines a minimization algorithm with an analytic approximation for inverting TTVs) using synthetic transit observations, with the aim of recovering the planet mass and orbital elements of a non-transiting planet. Nesvorný & Beaugé (2010) used upper limits on the relative and absolute errors in the planet parameters to decide whether the solutions were correctly recovered or not. Unlike Nesvorný & Beaugé (2010), we considered as recovered those solutions from the posterior distributions consistent with the true values within the 68% (1-σ) and 95% (2-σ) credible intervals of the total posteriors, assuming that they follow a normal distribution. These solutions are marked in Figure 6 as circles and squares. The comparison is made for the planet parameters: mass, period (P), eccentricity (ecc), periastron longitude (ϖ; defined as the sum of the ascending node and the argument of periastron, ϖ = Ω + ω) and the true longitude (λ; defined as λ = ν + ϖ, where ν is the true anomaly, obtained from a Fourier expansion in terms of the mean anomaly and eccentricity). Some of our posteriors exhibit more than one peak, manifesting the nature of the selected sampler. Median values and errors are usually centered on the prominent peaks, so the data in Figure 6 are representative of our results. In practice, a dynamical stability test could help to distinguish the physically possible solutions. This study is underway but out of the scope of the present work.
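The recovery criterion described above (consistency with the true value within the 68% or 95% credible interval, under a normality assumption) can be sketched as follows; the function name and the percentile-based σ estimate are illustrative choices, not Nauyaca's exact implementation:

```python
import numpy as np

def recovery_level(posterior, true_value):
    """Classify whether `true_value` is consistent with the posterior samples
    within the 68% (1-sigma) or 95% (2-sigma) credible interval, assuming an
    approximately normal posterior."""
    med = np.median(posterior)
    # 1-sigma estimated from the 15.87th-84.13th percentile half-width
    sigma = 0.5 * (np.percentile(posterior, 84.13) - np.percentile(posterior, 15.87))
    dev = abs(true_value - med)
    if dev <= sigma:
        return "1-sigma"
    if dev <= 2 * sigma:
        return "2-sigma"
    return "not recovered"

# Mock mass posterior centered on 5.0 Earth masses with sigma = 0.4
post = np.random.default_rng(3).normal(5.0, 0.4, size=10000)
print(recovery_level(post, 5.1))  # "1-sigma"
print(recovery_level(post, 7.0))  # "not recovered"
```

Planets classified "1-sigma" or "2-sigma" correspond to the circles and squares of Figure 6; "not recovered" corresponds to the gray crosses.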
From Figure 6, we take the discrete values of the differences between recovered and true data (excluding the non-recovered data marked as gray crosses), assuming that the discrete medians are representative of the MCMC results. This was done for each planet multiplicity and planet parameter. From the resulting distributions, we measure the standard deviation, which represents, on average, the typical precision achieved in the recovery test for the different parameters and planet multiplicities. For masses, we found a minimum dispersion of 0.7 M⊕ for two-planet systems and a maximum of 14 M⊕ for five-planet systems. For P, a minimum dispersion of 9 seconds for two-planet and a maximum of 110 seconds for five-planet systems. For ecc, a minimum dispersion of 0.007 for two-planet and a maximum of 0.03 for three-planet systems. For ϖ, a minimum of 35° for five-planet and a maximum of 50° for three-planet systems. For λ, a minimum dispersion of 3° for two-planet and a maximum of 7.5° for four-planet systems. Intermediate dispersions were found for the multiplicities not mentioned in these ranges.
We also calculated the Kolmogorov-Smirnov (KS) statistic between the whole distribution of true values and that of the posterior medians, for all the parameters and for planets in the full catalog. We found, for masses, a KS-statistic = 0.07 (p-value = 0.96); for P, KS-statistic = 0.01 (p-value = 1.0); for ecc, KS-statistic = 0.25 (p-value = 3×10⁻³); for ϖ, KS-statistic = 0.17 (p-value = 0.11); and for λ, KS-statistic = 0.03 (p-value = 0.99). Thus, we cannot reject the hypothesis that the distributions of masses, periods (P) and true longitudes (λ) are statistically drawn from the same input distributions. Figure 7 shows the percentage of planets consistent with the true parameters within 1-σ and 2-σ, when grouping planets with the same multiplicity. In Table 3 we summarize these results, also considering the global percentages for the full catalog (independently of multiplicity). For masses, periods and true longitudes, the global recovery percentages are consistent with the expected statistical values, i.e., around ∼68% of the planet parameters recovered within 1-σ and ∼95% within 2-σ. These parameters also exhibit consistency with the input catalog according to the KS test. However, we note that ecc and ϖ have similar recovery percentages but far below these statistically expected values: we found, respectively, 52% and 55% within 1-σ and 80% and 76% within 2-σ (see Table 3). We investigated the possible causes of both the similarity and the low recovery rates by inspecting the dependence between the two parameters. We found that only ∼45% of planets with ecc ≲ 0.05 recover both ecc and ϖ at the same time; the remaining planets fail to recover at least one of these parameters. In contrast, planets with ecc ≳ 0.05 recover both parameters ∼90% of the time, showing that the low recovery rates occur mainly for low-eccentricity orbits. In our catalog, most of the planets belonging to systems with three or more planets have eccentricities below ∼0.05, as shown in Figure 3. Moreover, from Figure 7 it is seen that eccentricity is the only parameter whose recovery rate diminishes as the number of planets increases (teal lines with stars).

Table 3. Percentage of planets consistent with the true parameters within the 68% (1-σ) and 95% (2-σ) credible intervals, for each parameter: mass, period, eccentricity, periastron longitude (ϖ) and true longitude (λ). Percentages are given for specific planet multiplicities and for the full catalog.
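The two-sample KS comparison between the true and recovered parameter distributions can be sketched with a numpy-only implementation of the statistic (in practice, scipy.stats.ks_2samp also returns the p-value; the mock samples here are purely illustrative):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical CDFs of samples `a` and `b`."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(4)
# Same underlying distribution -> small statistic (high p-value)
same = ks_statistic(rng.normal(size=500), rng.normal(size=500))
# Shifted distribution -> large statistic (low p-value)
diff = ks_statistic(rng.normal(size=500), rng.normal(1.0, 1.0, size=500))
print(round(float(same), 2), round(float(diff), 2))  # small vs. large distance
```

A small statistic (large p-value), as found here for masses, periods and true longitudes, means the hypothesis of a common parent distribution cannot be rejected.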
We find that the cause of both the similarity and the globally low recovery rate for ecc and ϖ is the difficulty of finding a well-defined argument of periastron as the orbits tend to be more circular, which in our case occurs mainly for planets belonging to high multiplicity systems. We will show in section 5.4 that the trend of the eccentricity shown in Figure 7, and the low recovery rate of ecc (and consequently ϖ), can be partially explained by the decreasing number of transits available for external planets belonging to high multiplicity systems.

TTVs
In Figure 8 we illustrate the TTVs of two systems with simulated data of different quality, along with their orbits. The top figure corresponds to the system with ID=pl2_id52 in Table 4, for which we estimate a signal-to-noise ratio (S/N) of 8.4 and 5.0 for planets 1 and 2, respectively. The bottom figure corresponds to the system with ID=pl3_id1 in Table 5, for which we estimate a S/N of 1.6, 2.08 and 2.06 for planets 1, 2 and 3, respectively. Left panels show 100 orbital configurations (thin colored orbits) reconstructed from planetary parameters taken randomly from our MCMC posteriors. For comparison, the true orbits are plotted as solid black lines. Transits are detected each time the planets cross the +Z axis, which corresponds to the line of sight. Right panels show the 100 TTV signals (colored solid lines) computed with the same planet parameters as the orbits. Synthetic observations are shown by empty circles with error bars. For these two systems, the TTV models are consistent with the synthetic data, but with a better planetary determination for the top system, which has a better S/N. Hence, the quality of the transit times directly affects the determination of the orbital configurations. For example, we identify that in the case of the three-planet system, the argument of periastron shows a wide range of possible solutions with respect to the true position (dashed black lines).
We statistically measured the fitted residuals over the whole catalog by taking 100 random samples per planet from the MCMC posteriors and calculating their corresponding TTVs. We took the differences between the data and the fitted transit times, grouping these residuals according to multiplicity. The resulting distributions are shown in Figure 9. We find typical standard deviations for these residuals of 2.5 minutes for two-planet, 2.3 minutes for three-planet, 3 minutes for four-planet and 5.3 minutes for five-planet systems.

The impact of the data quality and quantity
The TTV inversion problem challenges not only the methods and algorithms, but also the observational data requirements. In this subsection we address this issue by considering the quality (measured by the signal-to-noise ratio, S/N) and quantity (N tran) of the data required to correctly determine planetary parameters. We focus on the determination of planetary mass and eccentricity.
Previous works have proposed different answers regarding the required number of transits and data quality to invert TTVs. Saad-Olivera et al. (2019) suggest that a large number of TTVs combined with a high S/N can be enough to robustly estimate planetary parameters without the need for radial velocity measurements. A detailed analysis regarding the characterization of non-transiting planets was conducted by Veras et al. (2011), finding that at least 50 transits are needed to invert the problem. Nesvorný & Morbidelli (2008) developed a method based on perturbation theory to characterize two-planet systems, and found that a S/N ∼15-30 and about N tran ≳ 20 are typically required to uniquely characterize these systems. We carried out an analysis of the parameters involved in the fitting procedure, looking for possible correlations between the data quality and quantity, the intrinsic planetary parameters (including the number of planets, N pla) and the results we found using Nauyaca. A first inspection suggested a trend with the number of transits, and thus it is convenient to define the total signal-to-noise as (S/N) T = S/N × √N tran. Figure 10 shows the fractional error in the determination of the planet mass and eccentricity, and its dependency on (S/N) T, N tran and N pla. Here, 2-σ mass and 2-σ ecc correspond to our derived uncertainties taken from the posteriors encompassing the 95% credible interval. These uncertainties are divided by the true mass and eccentricity; thus, Figure 10 can work as a guide to put some constraints on the determination of these parameters given the quality and quantity of the data.

Figure 8. Examples of TTV fitting for a two-planet system (ID=pl2_id52; Table 4) and a three-planet system (ID=pl3_id1; Table 5). In the left panels, dark solid lines correspond to the true orbits and the lighter color orbits correspond to 100 orbital solutions randomly selected from the MCMC posteriors. Black dotted lines indicate the position of the argument of periastron. In this reference frame, the observer is located at +Z. These orbit plots were made with an adapted version of the OrbitPlot in REBOUND (Rein & Liu 2012). Right panels show the simulated TTVs (circles with error bars), while colored lines show the 100 random fits with the same planet parameters as the orbits. In both systems, the internal planets are labeled as Planet 1 and the order increases outwards.
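The total signal-to-noise defined above is a simple product. As a sketch (the per-transit S/N is taken here as the TTV amplitude over the typical timing uncertainty, which is an illustrative assumption; the paper's exact S/N estimate may differ):

```python
import numpy as np

def total_snr(ttv_amplitude, timing_error, n_tran):
    """Total signal-to-noise (S/N)_T = (S/N) * sqrt(N_tran), with the
    per-transit S/N approximated as TTV amplitude / timing uncertainty.
    """
    snr = ttv_amplitude / timing_error
    return snr * np.sqrt(n_tran)

# Example: a 5-minute TTV amplitude, 1-minute timing errors, 150 transits
print(round(float(total_snr(5.0, 1.0, 150)), 1))  # 61.2
```

The √N tran factor is what allows a large number of transits to partially compensate for a low per-transit S/N, as discussed below.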
In general, it is seen that the fractional error (related to the adequate determination of the parameters) correlates primarily with the quality of the data and secondarily with the quantity. Moreover, a deficiency in one of these quantities is in some cases compensated by the other. We observed that planets with an individually high (S/N) T can constrain the planetary mass and eccentricity even with a relatively low N tran. Conversely, a large N tran can compensate for a low (S/N) T. Planets with a low (S/N) T exhibit a larger dispersion of the fractional errors of their masses and eccentricities, although those with a large N tran usually have a better determination (i.e., a lower fractional error). Comparing the top and bottom panels, it can be seen that the majority of the non-recovered planets are those with fewer than 60-80 transits (white/yellow crosses), corresponding to systems with three, four and five planets. However, planets of these multiplicities with low (S/N) T can still find the correct solution, though with a poor constraint (white/yellow circles in the upper left part of the top diagrams). Also notice from the bottom panels that the trend between the fractional errors and the total (S/N) T seems to hold when grouped by planet multiplicity.
From these results it is seen that (S/N) T and N tran have a significant impact on the determination of these parameters. The low recovery rate for ecc discussed in section 5.2 is then explained by the low (S/N) T of our synthetic data and the low number of transits per planet (≲ 20-40) for systems with many planets. A low (S/N) T can be due to noisy measurements of the transit timing or to low amplitudes of the TTV signals. In the first case, the limitation comes partly from the instruments and the observational strategy. In the second case, low amplitudes correspond to planets in less perturbed (almost circular) orbits; in this situation, inverting the TTV signals results in less constrained orbits.

CONSIDERATIONS AND CAVEATS
Here we highlight several of the considerations made in this work that could be relevant for users of Nauyaca. We also point out some of the limits and caveats of the tool.
• In this work we assume continuous monitoring of the planet transits; thus, there are no data gaps in our ephemeris. In practice this could be unrealistic, especially for ground-based observations where telescope time is limited. However, as discussed by Nesvorný & Morbidelli (2008), it is the number of transits, rather than their time distribution, that is determinant for the planetary characterization.
• We remind the reader that there is no unique way of choosing the fine-tuning parameters for running the algorithms in Nauyaca, since they depend on the problem to solve. Nevertheless, the parameters chosen here for synthetic systems can work as a guide for real planets.
• Regarding the usage of Nauyaca with real systems, the user must verify that the stellar mass and radius are well enough constrained to be treated as constants. In cases where one or both of these parameters exhibit large uncertainties, we suggest running several simulations over a grid of stellar parameters that includes the lower and upper uncertainty values.
• We remind the reader that TTVFast, and therefore Nauyaca, is designed for planets around single parent stars; hence, planets in circumbinary systems or other configurations are not supported.
• We also note that the fitting is performed over the mid-transit times instead of the TTVs themselves. Since TTVs are the transit time signals in an O-C diagram, the calculated data depend on the number of epochs and on the method used to make the linear regression over the transit times. Fitting the TTVs instead of the transit times alone can result in an imprecise period determination of about 40 minutes for a ∼70-day period planet, as shown by Carpintero & Melita (2018). For that reason, we perform the fitting over the raw mid-transit times.
• The simulations are performed over a pre-fixed time span with a pre-selected timestep, which by default is set to P 1/30, where P 1 is the period of the innermost planet.
• When computing the mid-transit times, the planet radius is not taken into account to assess whether a transit occurs; only the coordinates of the planet center are considered. Thus, grazing transits (as for example in WASP-67b; Mancini et al. 2014) are currently not considered.
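The O-C construction mentioned in the caveat on fitting mid-transit times can be sketched as follows; it shows how the derived TTVs (and the fitted mean period) depend on the linear regression over the transit times, which is the reason raw mid-transit times are preferred here:

```python
import numpy as np

def ttvs_from_transit_times(epochs, times):
    """Derive TTVs (O-C residuals) from raw mid-transit times by fitting
    the linear ephemeris t = t0 + P * epoch; the fitted P and t0 depend
    on which epochs are available."""
    P, t0 = np.polyfit(epochs, times, 1)  # slope = mean period, intercept = t0
    return times - (t0 + P * epochs), P

# Mock 50 transits of a 10-day planet with a small sinusoidal TTV signal
epochs = np.arange(50)
times = 10.0 * epochs + 0.01 * np.sin(2 * np.pi * epochs / 12)
oc, P_fit = ttvs_from_transit_times(epochs, times)
print(round(float(P_fit), 3))  # close to the injected 10-day period
```

With gappy or short baselines, the fitted P (and hence the O-C residuals) shifts, whereas the raw mid-transit times are independent of any such regression.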

SUMMARY AND CONCLUSIONS
In this work we present Nauyaca, a Python package that encompasses minimization routines (Optimization module) and a Markov chain Monte Carlo method (MCMC module) exclusively adapted to find the planet parameters (masses and orbital elements) that best reproduce the transit times, based on numerical simulations. Even though the numerical method is computationally more expensive than analytical approximations, it is more suitable for general situations, such as systems with many planets in varied orbital configurations. Nauyaca requires transit ephemeris per planet, plus the stellar mass and radius. Additionally, any previous knowledge about validity ranges can be supplied in order to better constrain the parameter space.

Figure 10. Influence of the data quality and quantity on the proper determination of the planet mass (left panels) and eccentricity (right panels). Horizontal axes denote the total signal-to-noise (S/N)T = S/N × √Ntran and vertical axes are the fractional errors derived from our results. The 2-σ corresponds to the 95% credible interval from the posteriors, and MassTrue and eccTrue are the true values in the catalog. The recovered planets are marked with circles and the non-recovered planets with crosses. In the top panels, colors correspond to the number of transits per planet (Ntran) binned by 20 transits. In the bottom panels, planets are colored according to the number of planets in their systems.
Previous studies of transit-timing analysis have used synthetic data to test new techniques or to quantify the relation between TTV properties and planetary parameters (e.g., Nesvorný & Morbidelli 2008; Meschiari & Laughlin 2010; Veras et al. 2011). However, in most cases these studies have been limited to two-planet systems where one of the planets transits and the other acts as the perturber. Here, we analyze a large sample of synthetic transiting planets with planetary parameters based on current planet data from the Exoplanet Archive (see section 3.1). For these planets we calculate the mid-transit times to use as input to Nauyaca (section 5). This allows us to characterize the performance of Nauyaca by measuring the consistency rate between the catalog entries and the parameters determined by the tool.
For all the systems, we ran the optimization algorithms to test their performance for different planet multiplicities (section 5.1). The optimizers take advantage of their low time consumption, in contrast with a full MCMC run, and provide an overall outlook of the regions of the parameter space with higher probability (the planet boundaries were defined in section 4.1). We find that the best performance is achieved for the two-planet systems in every dimension, while for higher multiplicities the results vary. The optimizers can define high density regions for the planet masses within a factor ∼2-6, allowing the initialization of the chains around 20% of the real masses. Eccentricities are in general well restricted only for two-planet systems. For higher multiplicities, the optimizers are not well suited to defining the high density regions where the MCMC chains would be initialized. Orbital angles remain in general loosely constrained, except for two-planet systems.
We drew the initial walker population for the parallel-tempering MCMC using the best <10% of the solutions from the optimizer runs. We find a good agreement, at 2-σ, between the input parameters in the catalog and those found in the recovery test with the MCMC (section 5.2). The global recovery percentages for masses, periods and true longitudes are in concordance with the statistically expected values of ∼68% of the planet parameters recovered within 1-σ and ∼95% within 2-σ. These parameters also exhibit consistency with the input catalog according to a Kolmogorov-Smirnov test. In contrast, eccentricities and periastron longitudes have similarly low recovery rates. Moreover, the recovery rate for eccentricity diminishes as the number of planets increases. We found that the cause of the similarity and the globally low recovery rate of ecc and ϖ is a combination of the difficulty of determining the argument of periastron as the orbits tend to be circular and the reduction of the number of transits for external planets, mainly in high multiplicity systems. In these scenarios, the signal-to-noise ratio of the data plays a determinant role in correctly determining these parameters.
Depending on the planet multiplicity, the typical mass precisions accomplished in the recovery test range from ∼1-14 M⊕. Periods achieve a precision between 10 seconds and 2 minutes. Eccentricities reach a precision between ∼0.01-0.03. Periastron longitudes and true longitudes have typical precisions of ∼40° and 6°, respectively.
We investigated the dependence between the data quality and quantity (section 5.4) and the proper determination of the planetary mass and eccentricity. We find that, in general, quality matters more than quantity, although in many cases one can compensate for a deficiency in the other. However, we warn that the results in this part of our study should be taken as a guide, since the simulated planet sample in our catalog is limited.
Finally, we make suggestions about the fine-tuning parameters involved in the procedure. Depending on the computational facilities, the parameters in Table 2 may be a reasonable starting point for optimization and MCMC runs. Note that in our mock catalog, the "observed" timespan and the number of transits in the systems are varied; therefore, the fine-tuning parameters can be scaled and adapted to specific problems. We suggest performing at least 100-150 optimizer runs (N opt) times the number of planets in the system, and taking <10-30% of the best solutions to initialize walkers. The choice of the initialization strategy presented in this work (section 4.3) depends on the parameter space itself, which is unknown by nature. We suggest using the ladder or picked strategy if nothing is known about the parameter space (as in the majority of situations), and the gaussian strategy if the parameter space is somehow constrained, for example when considering fixed angles, fixed periods, circular orbits, etc. Of course, the methodology used in this work is not mandatory when using Nauyaca; the optimization routines and the proposed initialization strategies, for example, are optional. The user has the freedom to select the tools best suited to their TTV inversion problem. However, we note an improvement in the MCMC performance and results when following the method proposed in this work.
In a forthcoming work, beyond the scope of this paper, we will show the application of the tool to more specific situations, for example systems with non-transiting planets, with high mutual inclinations and with missing transit data. We will also show the application to real systems, in order to revisit the parameters of previously characterized planets as well as to characterize new ones.
We provide the data used throughout this work in electronic format at 10.5281/zenodo.5218498. Nauyaca first release can be found at 10.5281/zenodo.5230451.

ACKNOWLEDGMENTS
The authors gratefully acknowledge the computing time granted by DGTIC-UNAM for access to the Supercomputer Miztli in the HTC group, under the project with code LANCAD-UNAM-DGTIC-361. We thank Gabriel Perren and Luis M. Pavón for useful discussions. We also thank the anonymous referees for their useful comments and suggestions made to improve the quality of the manuscript. EFCC also acknowledges the PhD grant awarded through the CONACYT Graduate Fellowship program. This work was supported by UNAM-PAPIIT IN-107518 and BG-101321. HV was supported by the project UNAM-PAPIIT IN-101918. This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.

APPENDIX A. CATALOG TABLES
Here we collect the data tables of the mock catalog that show the planet properties of the synthetic planetary systems. Tables are grouped by planet multiplicity. System IDs follow the syntax pl{N pla }_id{ID}, where N pla is the number of planets in the system and ID is an internal identifier number. Orbital elements correspond to the osculating elements at the simulated reference time of t 0 = 0 days. These tables can be found in electronic format at: 10.5281/zenodo.5218498.