Refining the transit timing and photometric analysis of TRAPPIST-1: Masses, radii, densities, dynamics, and ephemerides

We have collected transit times for the TRAPPIST-1 system with the Spitzer Space Telescope over four years. We add to these ground-based, HST and K2 transit time measurements, and revisit an N-body dynamical analysis of the seven-planet system using our complete set of times from which we refine the mass ratios of the planets to the star. We next carry out a photodynamical analysis of the Spitzer light curves to derive the density of the host star and the planet densities. We find that all seven planets' densities may be described with a single rocky mass-radius relation which is depleted in iron relative to Earth, with Fe 21 wt% versus 32 wt% for Earth, and otherwise Earth-like in composition. Alternatively, the planets may have an Earth-like composition, but enhanced in light elements, such as a surface water layer or a core-free structure with oxidized iron in the mantle. We measure planet masses to a precision of 3-5%, equivalent to a radial-velocity (RV) precision of 2.5 cm/sec, or two orders of magnitude more precise than current RV capabilities. We find the eccentricities of the planets are very small; the orbits are extremely coplanar; and the system is stable on 10 Myr timescales. We find evidence of infrequent timing outliers which we cannot explain with an eighth planet; we instead account for the outliers using a robust likelihood function. We forecast JWST timing observations, and speculate on possible implications of the planet densities for the formation, migration and evolution of the planet system.


INTRODUCTION
The TRAPPIST-1 planetary system took the exoplanet community by surprise thanks to the high multiplicity of small transiting planets orbiting a very-lowmass star (∼ 0.09M ; Gillon et al. 2016Gillon et al. , 2017Luger et al. 2017b;Van Grootel et al. 2018). The unexpected nature stems from the fact that this system was found through a survey of only 50 nearby ultracool dwarf stars (Jehin et al. 2011;Gillon et al. 2013), suggesting either a high-frequency of such systems around the latest of the M-dwarfs (He et al. 2016), or perhaps this discovery was fortuitous (Sestovic & Demory 2020;Sagear et al. 2020). The proximity of the host star (∼12pc) makes it brighter in the infrared (K = 10.3) than most ultracool dwarfs. Its small size (∼ 0.12R ) means that its planets' masses and radii are large relative to those of the star, which enables precise characterization of the planets' properties. The system provides the first opportunity for a detailed study of potentially rocky, Earthsized exoplanets with incident fluxes spanning the range of the terrestrial planets in our Solar System. As such, it has galvanized the exoplanet community to study this system in detail, both observationally and theoretically, and has fueled hopes that atmospheric signatures (or even biosignatures) might be detected with the James Webb Space Telescope (Barstow & Irwin 2016;Morley et al. 2017;Batalha et al. 2018;Krissansen-Totton et al. 2018;Wunderlich et al. 2019;Fauchez et al. 2019;Lustig-Yaeger et al. 2019).
More conservatively, the system provides a potential laboratory for comparative planetology of terrestrial planets, and may provide insight and constraints on the formation and evolution of terrestrial planets around the lowest-mass stars. In particular, transiting multiplanet systems afford an opportunity to constrain the interior compositions of exoplanets. Sizes from transit depths combined with masses from transit-timing variations yield the densities of the planets (e.g. Agol & Fabrycky 2017). In the case of rocky planets with a thin atmosphere, the bulk density can constrain the core-mass fraction and/or Mg/Fe mass-ratio (Valencia et al. 2007), although for any given planet there is still a degeneracy between a larger core-mass fraction and a volatile enve-lope (Dorn et al. 2018). In a multi-planet system, the bulk density as a function of planet orbital distance may be used to partly break the compositional degeneracy by assuming a common refractory composition and a water composition which increases with orbital distance (Unterborn et al. 2018;Lichtenberg et al. 2019).
The TRAPPIST-1 system was initially found with a ground-based pilot survey using a 60-cm telescope, revealing two short-period transiting planets, and two additional orphan transits (Gillon et al. 2016;Burdanov et al. 2018). Subsequent ground-based study of the system revealed several additional orphan transits, leading to an incomplete picture of the number of planets and the architecture of the system. A 20-day observation run with the Spitzer Space Telescope (Werner et al. 2004) resolved the confusion, revealing the periods of six of the seven transiting planets (Gillon et al. 2017), but only a single transit observed of the outermost planet left its orbit in question. A subsequent observation campaign of the system with the K2 mission included four additional transits of the outer planet, identifying its period, and revealed a series of generalized three-body Laplace relations (GLRs) 1 between adjacent triplets of planets (Luger et al. 2017b). Additional observations with Spitzer continued to monitor the transit times of the seven planets at higher precision than afforded by ground-based observations. An initial analysis of the Spitzer data to determine planetary radii and masses was presented in Delrez et al. (2018a) and Grimm et al. (2018). In total, Spitzer observed TRAPPIST-1 for more than 1000hrs, and the resulting time-series photometry includes 188 transits (Ducrot et al. 2020).
Although the planets in the TRAPPIST-1 system have short orbital periods, ranging from 1.5 to 19 days, the dynamical interactions accumulate gradually with time, which requires longer-timescale monitoring to accurately constrain the orbital model. The GLRs also cause adjacent pairs of planets to reside near mean-motion resonances, for which jP −1 i ≈ (j + k)P −1 i+1 for integers j and k for the ith and i + 1th planets. This proximity causes a resonant timescale for k = 1 given by (1) (Lithwick et al. 2012) which is the characteristic timescale of the transit timing variations (TTVs) of the outer five planets. The period of the resonant terms for each of these pairs of planets is P T T V ≈ 491±5 days (ranging from 485 to 500 days for each pair). This timescale has two consequences for measuring the masses of the planets from transit-timing variations: 1) the transit times for each planet need to be sampled on this timescale, preferable covering two cycles so that the amplitude and phase of the cycles may be distinguished from the planets' orbital periods; 2) this resonant period also sets the timescale for the amplitude variability of "chopping" (short-timescale transit-timing variations), which can help to break a degeneracy between mass and eccentricity for the resonant terms (Lithwick et al. 2012;Deck & Agol 2015). As a consequence, we expect the measurements of the masses of the system to require sampling on a timescale of t min ≈ 2P T T V ≈ 2.7 years. Consequently, the current paper is the first with a survey time, t survey = 4.114 yr, such that t survey >t min for the TRAPPIST-1 system. Prior studies used the data available at the time (Delrez et al. 2018a), with t survey <t min , causing ample degeneracy in the dynamical model, and hence larger uncertainties in the masses of the planets (Gillon et al. 2017;Grimm et al. 2018). Even so, these papers were ground-breaking as they enable the first density determinations of temperate, Earth-sized planets exterior to the Solar System. Both papers indicated densities for the planets which were lower than the value expected for an Earth-like composition (with the exception of planet e), indicating that these planets might have a significant volatile content. However, these conclusions were subject to significant uncertainty in the planet masses, making the determination of the compositions less definitive as the uncertainties were still consistent with rocky bodies at the 1 − 2σ level. In addition, the masses of all of the planets are highly correlated due to the fact that the dynamical state of all of the planets needs to be solved together and their masses and radii are measured relative to the star, so model comparisons with individual planets are not independent.
In this paper we revisit a transit-timing and photometric analysis with the completed Spitzer program using the more extensive transit dataset we now have in hand. The goal of this program is to provide a more precise understanding of the masses, radii, and densities of the planets. These measurements may be used for planetary science with the extrasolar planets in the TRAPPIST-1 system, whose similarity to the sizes, masses and effective insolation range of the terrestrial planets in our Solar System is the closest match known. In addition, we refine the dynamical state of the system, revisiting some of the questions explored in Grimm et al. (2018). Our final goal is to prepare for upcoming observations with the James Webb Space Telescope (JWST; Gardner et al. 2006). More precise constraints on the parameters of the planets will not only improve the precision with which we can schedule observations, but also provide the best possible predictions of potential environmental characteristics that could be discriminated observationally. This work will therefore help to optimize both the acquisition and interpretation of observations of the TRAPPIST-1 system with JWST.
In §2 we summarize the observational data which are analyzed in this paper. In §3 we discuss the nature of transit timing outliers, and the robust likelihood function we use for characterizing the system. This is followed by a description of our N-body transit-timing analysis in §4. With the improved N-body model, we revisit the photometric fit to the Spitzer data using a photodynamical model in §5. The results of these two analyses are combined to obtain the planet bulk properties in §6. In §7 we derive revised parameters for the host star. In §8 we search for an eighth planet with transit-timing. In §9 we interpret the mass-radius measurements for the planets in terms of interior and atmospheric structure models. Discussion and conclusions are given in §10 and §11.
We provide Julia and python code for running the Markov chains, creating the figures, and creating the paper PDF in https://github.com/ericagol/TRAPPIST1_Spitzer. The 3.5 GB data/ directory in the repository may be found as a zip file, data.zip, at https://drive.google.com/file/d/14iEW6jupY8dnGlYGXJIrWlFftdwRcfLo and https://doi.org/10.5281/zenodo.4060252. In each figure we embed links to the code (</>) which produced that figure.
2. NEW TRAPPIST-1 OBSERVATIONS Since the work described in Grimm et al. (2018) based on 284 transits, we have added an additional 163 transit times from a combination of Spitzer ( §2.1) and ground-based observations ( §2.2) for a total of 447 transits. With preliminary transit-timing fits, we found evidence for outliers amongst the measured times ( §3), which we account for with a robust likelihood model.
Each transit time is measured as a Barycentric Julian Date (BJD TDB ), correcting for the location of the Earth/spacecraft relative to the Solar System barycenter (Eastman et al. 2010) at the time of each transit observation. We next describe our data.

Spitzer Observations
The dataset used in this work includes the entire photometry database of TRAPPIST-1 observations with Spitzer Space Telescope's Infrared Array Camera (IRAC; Carey et al. 2004) since the discovery of its planetary system. This represents all time series observations gathered within the DDT programs 12126 (PI: M. Gillon), 13175 (PI: L. Delrez) and 14223 (PI: E. Agol). These cover a total of 188 transits observed from Feb 2016 to Oct 2019 and include 64, 47, 23, 18, 16, 13, and 7 transits of planets b, c, d, e, f, g, and h, respectively (Ducrot et al. 2020). All of these data can be accessed through the online Spitzer Heritage Archive database 2 . Spitzer IRAC Channels 1 (3.6 µm, 0.75 µm wide) and 2 (4.5 µm, 1.015 µm wide) were used during the Spitzer Warm Mission (Fazio et al. 2004;Storrie-Lombardi & Dodd 2010) with 61 and 127 transits observed in each band, respectively. Observations were obtained with IRAC in subarray mode (32×32 pixel windowing of the detector) with an exposure time of 1.92 s and a cadence of 2.02 s. In order to minimize the pixel-phase effect (Knutson et al. 2008) the peak-up mode was used (Ingalls et al. 2016) to fine-tune the positioning of the target on the detector following the IRAC Instrument Handbook. 3 Finally, calibration was performed using Spitzer pipeline S19.2.0 to output data as cubes of 64 subarray images of 32×32 pixels (the pixel scale being 1.2 arcsec). Each set of exposures was summed over a 2.15 minute cadence to allow for a tractable data volume for carrying out the photometric analysis, which is described in detail in Delrez et al. (2018a) and Ducrot et al. (2020).
These observations were carried out in an I+z filter with exposure times 23s, 50s and 50s, respectively; character-istics of this filter are described in Murray et al. (2020). Observations were also performed with the Liverpool Telescope (LT; Steele et al. 2004) and the William-Herschel Telescope (WHT), both installed at the Roque de los Muchachos Observatory, La Palma. Only one transit of planet b and one of d were targeted with the WHT whereas 15 transits of several planets were targeted with LT. For LT observations, the IO:O optical wide field camera was used in Sloan z' band with 20s exposure time. One transit of b was observed with the Himalayan Chandra Telescope (HCT). Finally, a total of 26 transits were observed in the near-IR (1.2 -2.1 µm) with the WFCAM near-IR imager of the the United Kingdom Infra-Red Telescope (UKIRT; Casali et al. 2007), the IRIS2IR-imager installed on the the Anglo-Australian Telescope (AAT; Tinney et al. 2004), and the HAWK-I cryogenic wide-field imager installed on Unit Telescope 4 (Yepun) of the ESO Very Large Telescope (VLT; Siebenmorgen et al. 2011). These observations are summarized in Table 1 : 504 transit observations were collected with 57 duplicate (or triplicate) transits which were observed by a second (or third) observatory simultaneously, for a total of 447 unique planetary transit times which are used in our analysis. Additional information may be found in Gillon et al. (2016) for WHT and TRAPPIST, in Ducrot et al. (2018) for SSO and LT, and in Gillon et al. (2017) and Burdanov et al. (2019) for AAT, UKIRT and VLT.
For all ground-based observations, a standard calibration (bias, dark and flat-field correction) was applied to each image, and fluxes were measured for the stars in the field with the DAOPHOT aperture photometry software (Stetson 1987). Differential photometry was then performed using an algorithm developed by Murray et al. (2020) to automatically choose and combine multiple comparison stars, optimized to use as many stars as possible, weighted appropriately (accounting for variability, color and distance to target star), to reduce the noise levels in the final differential lightcurves. This reduction and photometry was followed by an MCMC analysis to retrieve transit parameters.

Transit time measurements and analysis
Gathering together the heterogeneous sample of transits obtained from a variety of ground-and spacebased telescopes, we transformed the time stamps to the BJD TDB time standard prior to photometric analysis. We analyzed the datasets together with a global photometric analysis of all single-planet transits, as described in Ducrot et al. (2020), with a separate analysis of the overlapping transits once the single-transit analysis was completed.
For each planet a fixed time of transit for epoch zero (T 0 ) and fixed period (P ) were used, but with timing offset ("TTV") as a fitted parameter for each transit as described by Ducrot et al. (2020). To derive T 0 and P , a linear regression of the timings as a function of their epochs was performed for each planet to derive an updated mean transit ephemeris; their exact values can be found in Table 4 of Ducrot et al. (2020). The timing offsets are then added back to the ephemeris to obtain the measured transit times and uncertainties.
The final observed dataset for the transit-timing analysis is given by: y = ({t obs,ij , σ ij ; j = 1, ..., N i }; i = 1, ..., 7), where i labels each of the seven planets, N i is the number of transits for the ith planet (Table 1), and j labels each transit for the ith planet, so that t obs,ij is the jth observation of the ith planet, and σ ij is the corresponding measurement error. The total number of transits is N trans = Np i=1 N i = 447 where N p is the number of transiting planets. Table 14 lists the complete set of transit times and uncertainties which were utilized in the present analysis.
With this sample of transit times collected, we proceed to describe our dynamical analysis, starting with the likelihood function and evidence for outliers.

EXCESS OF OUTLIERS AND ROBUST LIKELIHOOD MODEL
We first carried out a preliminary 7-planet, planeparallel N-body model fit to the transit times using a χ 2 log likelihood function, i.e. assuming a Gaussian uncertainty for each transit time given by the derived timing uncertainty, which we optimized using the Levenberg-Marquardt algorithm. We found that the residuals of the fit have many more outliers than is probable assuming a Gaussian distribution for the timing uncertainties. Figure 1 shows the cumulative distribution function (CDF) and a histogram of the normalized residuals versus a single Gaussian probability distribution function (PDF) with unit variance (orange line). This CDF distribution function disagrees with the Gaussian CDF in the wings for P (>z) 0.1 and P (>z) 0.9, where z=(t obs,ij −t ij (x dyn ))/σ ij are the normalized residuals, with the model time, t ij (x dyn ), as a function of the dynamical model parameters, x dyn , described below. This indicates that there is a significant excess of outliers with large values of |z| relative to a Gaussian distribution. The histogram in Figure 1 also demonstrates this clearly: there are 8 data points with z< − 3 and 7 with z>3. With 447 transit time measurements, we would only expect ≈ 1.2 data points with |z|>3 if the distribution were Gaussian with accurately estimated uncertainties. This excess is even more apparent at |z|>4.
We have examined the individual transits that show these discrepancies, and there is nothing unusual about their light curves, such as flares, overlapping transits, or other anomalies. The outliers appear for each of the planets (save h), in both ground-and space-based data, and for measurements with different sizes of uncertainties. We do not think that our N-body model is in error (and we have tried to fit with an extra planet, without a significant improvement in the number of outliers; see below). Consequently, we believe that these outliers are due to variations in the measured times of transits which are not associated with dynamics of the system.
We suspect instead that these outliers are a result of some systematic error(s) present in the data. There are a variety of possibilities: uncorrected instrumental/observational systematics; time-correlated noise due to stellar variability; stellar flares (which may be too weak to be visible by eye, but might still affect the times of transit); or stellar spots (Oshagh et al. 2013;Ioannidis et al. 2015). Again, our examination of the light curves did not point to a single culprit, so we are unable to model and/or correct for any of these effects. Our data are not unique in this respect: similar outliers have been seen in other transit-timing analyses, as described in Jontof-Hutter et al. (2016).
Our transit-timing model will be affected by these timing outliers, which make an excessive contribution to the χ 2 of the model, and thus can affect the inference of the model parameters. This can cause both the parameters and the uncertainties to be mis-estimated. To make progress, we have modified the likelihood model to account for outliers.
We used a heavy-tailed likelihood function which better describes the residual distribution: a Student's tdistribution (Jontof-Hutter et al. 2016). We fit the normalized residuals to a model in which the width of the distribution was allowed to vary, which we parameterize with an additional factor multiplying the variance, which we refer to below as V 1 . For the Student's tdistribution there is only one additional free parameter: the number of degrees of freedom, ν, which we treat as a continuous parameter. Figure 1 shows a histogram of the outliers of the bestfit transit-timing model (described below), and shows that the Student's t-distribution gives a much higher probability for outliers.
With the description of the dataset complete, we next describe our efforts to model the data.

TRANSIT-TIMING ANALYSIS
In this section we describe our transit-timing analysis in detail, starting first with a description of our dynamical model.

N-body integration
We integrate the N-body dynamics in Cartesian coordinates with a novel symplectic integrator, NbodyGra-dient, which is based on the algorithm originally described in Hernandez & Bertschinger (2015), derived from the non-symplectic operator of Gonçalves Ferrari et al. (2014). 4 The time-evolution operator of the integrator is a succession of Kepler 2-body problems and simple "kick" and "drift" operators. The advantage over traditional symplectic methods (Wisdom & Holman 1991) is that the dominant error is due to three-body interactions, while in the standard methods, the dominant error is due to two-body interactions, meaning close encounters between non-stellar bodies are treated poorly (Hernandez & Dehnen 2017). The Kepler problem for each pair is solved with an efficient universal Kepler solver (Wisdom & Hernandez 2015). The symplectic integrator is made to be time-symmetric to yield secondorder accuracy (Hernandez & Bertschinger 2015). Then, a simple operator is introduced to double the order of the method (Dehnen & Hernandez 2017). We have found that numerical cancellations occur between Kepler steps and negative drift operators, and so we have introduced an analytic cancellation of these terms to yield an algorithm which is numerically stable, which converges for small time steps (Agol & Hernandez 2020).
The initial conditions are specified with Jacobi coordinates (Hamers & Zwart 2016) and we use a set of orbital elements for each planet given by where N p is the number of planets for a total of 5N p dynamical parameters. In addition we take the star to have a mass, m 0 = M * /M , which we fix to one. The units of time for the code are days, while the length scale of the code is taken to be m 1/3 0 AU. 5 The initial orbital ephemeris, (P i , t 0,i ), consists of the period and initial time of transit which each planet would have if it orbited a single body with a mass of the sum of the star and the interior planets, unperturbed by the exterior planets. We use these variables (in lieu of initial semi-major axis and mean longitude) as they are well constrained by the observed times of transits. We convert these analytically to the time of periastron passage, once the Kepler equation is solved , to yield the initial eccentric anomaly for each initial Keplerian. Finally, the eccentricity, e i , and longitude of periastron, ω i , for each Keplerian we parameterize in terms of e i cos ω i and e i sin ω i to avoid the wrapping of the angle ω i . We transform from Ja-4 The code may be found at https://github.com/ericagol/NbodyGradient 5 Note that as we take m 0 = 1 in our simulations, we need to multiply the output of positions and velocities from the code by (M * /M ) 1/3 to scale to a stellar mass M * . cobi coordinates to Cartesian coordinates to complete the initial conditions. For our transit-timing analysis, we assume that the planets are plane-parallel and edge-on in their orbits, allowing us to neglect the inclination and longitude of nodes for each planet.
A symplectic integration time step, h, is selected to be small, <5%, compared with the orbital period of the innermost planet (Wisdom & Holman 1991). For most of our integrations we use a time step of h = 0.06 days, or about 4% of the orbital period of planet b.
The model transit times are found by tracking the positions of each planet relative to the star across a time step. Then, when the dot product of the relative velocity of the planet and star with their relative position goes from negative to positive, and the planet is between the star and observer, we flag a routine which iterates with Newton's method to find the model transit time, which is taken to be when this dot product equals zero (Fabrycky 2010), corresponding to the mid-point of the transit if acceleration is negligible over the duration of the transit. The resulting model we obtain is for the jth transit of the ith planet, giving each model transit time as a function of the initial conditions, t ij (x dyn ), which can then be compared to the observed times, t obs,ij .
Once the model transit times have been found for every planet over the duration of the time integration, these are then matched with the observed transit times to compute the likelihood using the Student's t probability distribution. The log likelihood function for each data point is given by where Γ(x) is the Gamma function (Fisher 1925). The total log likelihood function which we optimize is given by where N p is the number of planets; we use N p = 7 for most of our analysis.
Note that we assume that the timing errors are uncorrelated. Most transits are well separated in time, and thus this is an accurate assumption as the noise should be uncorrelated on these timescales. There are a small number of transits (about 6%) that overlap in time, and thus may have correlated uncertainties; we do not account for this in the likelihood function.

Uncertainty analysis
We carried out the uncertainty analysis on the model parameters with three different approaches: 1. Laplace approximation.

Markov-chain Monte Carlo.
First, in our Laplace approximation analysis, we assume a uniform prior on the model parameters and expand the likelihood as a multi-dimensional normal distribution. We maximize the likelihood model using the Levenberg-Marquardt algorithm, which requires the gradient and Hessian of the negative log likelihood. Once the maximum likelihood is found, we compute an approximate Hessian at the maximum likelihood (see Appendix A). The inverse of the Hessian matrix yields an estimate of the covariance amongst the parameters at the maximum likelihood, whose diagonal components provide an initial estimate for the parameter uncertainties; we will also use the Hessian for more efficient sampling of the Markov chain.
The second approach we use is to compute the likelihood profile for each model parameter. In this case each parameter is varied over a grid of values over a range given by ±3σ xi , where σ xi equals the square root of the diagonal component for the ith model parameter from the covariance matrix. At each value along the grid for each parameter we optimize the likelihood with a constraint which keeps the parameter pinned at the grid point. This results in a profile of the maximum likelihood of each parameter, optimized with respect to all other parameters, which yields a second estimate for the uncertainties on the parameters. The likelihood profile approach does not assume a normal distribution and is useful for checking for a multi-modal probability distribution which can trip up Markov-chain analysis.
However, both of these error estimates are incomplete as they do not account for non-linear correlations between parameters, for the non-Gaussian shape of the posterior probability, nor for the prior probability distribution. 6 Nevertheless, the agreement between the two estimates gives a starting point for evaluating our Markov chain analysis, and for gauging the convergence of the chains, which we describe below.
In our initial Markov chain sampling, we found that the parameters of the Student's t-distribution, ν and V 1 , were strongly non-linearly correlated and displayed a likelihood profile which was non-Gaussian. After experimenting with reparameterization, we found that log ν and V 1 e 1/(2ν) gave a parameterization which showed a nearly Gaussian likelihood profile in each parameter, and also showed more linear correlations between these two parameters. Accordingly we chose to sample in these transformed parameters so that our set of model parameters is x = (x dyn , log ν, V 1 e 1/2ν ).
In appendix B we define the prior function Π(x) which multiplies the likelihood to give the posterior probability distribution, so that we can proceed to discussing the Markov chain sampling of the posterior probability of the model parameters given the data.

Markov chain sampler
We sample our posterior probability, P (x), with a Markov chain sampler. There are 37 free parametersfour orbital elements and one mass-ratio for each planet, and two parameters for the Student's t-distribution. Given the high dimensionality of our model we chose to use a Markov chain sampler which efficiently samples in high dimensions: Hamiltonian Monte Carlo (HMC; Duane et al. 1987;Neal 2011;Betancourt 2017;Monnahan et al. 2016). 7 This sampler requires the gradient of the likelihood function with respect to the model parameters. The gradient of the likelihood requires the gradient of each model transit time with respect to the initial conditions of the N-body integrator.
We have written a module for our N-body integrator which computes the gradient of each model transit time by propagating a Jacobian for the positions and 6 In principle we could include a prior in the Laplace and likelihood profile analyses. 7 aka "Hybrid Monte Carlo." Note that the "Hamiltonian" referred to in HMC is not a physical Hamiltonian, but an artificial one used for treating the negative log probability as a potential energy function, and adding a kinetic energy term, with an artificial momentum conjugate to each model parameter ("coordinate"). For a description of HMC and a discussion of applications to cosmology, including N-body, see Leclercq et al. (2014) and Jasche & Kitaura (2010) and references therein.
velocities of all bodies across every time step throughout the N-body integration (Agol & Hernandez 2020), which is multiplied by the Jacobian of the coordinates at the initial timestep computed with respect to the initial Keplerian elements and masses, which specify the initial conditions and comprise the N-body model parameters. When a transit time is found during the N-body integration with NbodyGradient, we compute the derivative of each transit time with respect to the coordinates at the preceding time step, which we multiply times the Jacobian at that step to obtain the derivatives of each transit time with respect to the initial conditions. The gradient of the prior with respect to the model parameters, and the gradient of the likelihood with respect to the model times and the Student's t-distribution parameters, are each computed with automatic differentiation, using forward-mode derivatives (Revels et al. 2016). The gradient of the likelihood with respect to the dynamical model parameters is found by applying the chain rule to the automatic derivatives of the likelihood with respect to the model times with the derivatives computed in the N-body model (from NbodyGradient).
For our HMC analysis, we augment the simulation parameters with a set of conjugate momenta, p, with the same dimension. We sample from the probability distribution, e −H(x,p) , where H is a Hamiltonian given by the negative log posterior, where p is defined from Hamilton's equations, We take the mass matrix, M, to be the approximate Hessian matrix evaluated at the maximum likelihood, M = H(x 0 ) (eqn. A5). Similarly, the Hamiltonian can be used to compute the evolution of the parameter "coordinates,"ẋ = + ∂H ∂p .
The dot represents the derivative with respect to an artificial "time" coordinate which can be used to find a trajectory through the (x, p) phase space which conserves the "energy" defined by this Hamiltonian. We carry out a Markov chain using the standard approach for HMC. First, we draw the initial momentum from the multi-variate Gaussian distribution defined by the kinetic energy term in the Hamiltonian, where Z n ∼ N (0, 1) is an element of a vector of random normal deviates for n = 1, ..., N param . We then carry out a leapfrog integration of Hamilton's equations for N leap steps from the starting point with a "time" step to obtain a proposal set of parameters (x prop , p prop ). Since energy is not conserved precisely due to the finite differencing of the leapfrog integration, we then apply a Metropolis rejection step to accept the proposal step with probability to determine whether to accept the proposed step and add it to the Markov chain, or to reject it and copy the prior step to the chain.
We carried out some trial integrations to tune two free parameters: 0 and N leap,0 . We draw the "time"-step, , for each integration from the absolute value of a Normal distribution with width 0 , i.e. ∼ |N (0, 0 )|. The number of leapfrog steps for each integration we draw from a uniform probability, N leap ∼ round(N leap,0 U(0.8, 1.0)). We found that a choice of 0 = 0.1 and N leap,0 = 20 results in a proposal for which the Metropolis rejection gives a high average acceptance rate of 70%.
We ran 112 HMC chains for 2000 steps each (i.e. 2000 leapfrog integrations). Each leapfrog integration averaged about 7 minutes, and so the chains took nine days and four hours to complete. 8 We found a minimum mean effective sample size of 57 over all chains, for a total number of independent samples of 6409.

Results
The transit-timing variations are shown in Figure 2, along with our best-fit model. The model is a very good description of the data, although a few outliers are clearly visible by eye. As advertised, the outer five planets show large-amplitude oscillations with the timescale P T T V . We have created a second figure in which a polynomial with an order between 5-30 is fit and removed from the data, and the resulting differences are shown in Fig. 3. The result shows high-frequency variations which are associated with the synodic periods of pairs of adjacent planets, typically referred to as "chopping." The chopping TTVs encode the mass-ratios of the companion planets to the star without the influence of the eccentricities, and thus provide a constraint on the planet-star mass ratios which is less influenced by degeneracies with the orbital elements (Deck & Agol 2015). The chopping variations are clearly detected for each planet (except planet d), which contributes to the higher precision of the measurements of the planet masses in this paper.
The results of the posterior distribution of our transittiming analysis are summarized in Table 2 with the mean and ±34.1% confidence intervals (1σ) computed from the standard deviation of the Markov chains. The correlations between parameters are depicted in Figure  29. There are 35 parameters which describe the planets, in addition to two parameters for the Student's t-distribution, log ν = 1.3609±0.2337 and V 1 e 1/(2ν) = 0.9688±0.1166 ( Figure 4). The posterior mass-ratios and ephemerides are consistent with nearly Gaussian distributions. The eccentricity vectors show deviations from a Gaussian distribution for the inner two planets b and c, as shown in Figure 5. The Laplace approximation covariance uncertainty estimates are overplotted as Gaussian distributions very closely match the likelihood profile for each parameter. This agreement is reassuring: it indicates that the likelihood distribution is closely approximated by a multi-dimensional normal distribution near the maximum likelihood. In the eccentricity-vector coordinates, the prior probability distribution is peaked at zero to ensure that the volume of phase-space at larger eccentricities does not dominate the probability distribution, as shown in the lower right panel of Figure 5. For the planets which have a likelihood distribution which overlaps strongly with zero, the prior distribution causes the Markov chain posterior to have a significantly different distribution from the likelihood profile. This is not due to the prior favoring small eccentricities; rather it is simply a correction for the bias which results by using e i cos ω i and e i sin ω i as Markov chain parameters which favors higher eccentricities (Ford 2006).
The marginalized posterior distributions of the ratio of the planet masses to the star, scaled to a stellar mass of 0.09 M , are given in Table 2 and shown in Figure 6. The likelihood profile of the planet-to-star mass ratios is also plotted in Figure 6 and appears to be well-behaved. These likelihood profiles are also approximately Gaussian in shape, and track the inverse Hessian evaluated at the maximum likelihood to estimate the covariance (also plotted). Compared with the mass estimates from Grimm et al. (2018), the masses of each planet have increased with the exception of planet e which has decreased and planet h which remains the same (Table 3). The mass ratios of the posterior distribution from the Markov chain are slightly shifted to smaller values than the likelihood profile and Laplace approximation probabilities for all planets save b and g.
The Student's t-distribution parameters show a posterior distribution which is shifted from the likelihood profile/Laplace probability distribution (Fig. 4). This bias is due to the fact that the likelihood distribution of these parameters shifts upwards whenever the transittiming model parameters deviate from their maximumlikelihood values. The peak of the posterior distribution of these parameters corresponds to ν = 3.9 and V 1/2 1 = 0.87, which indicates that the core of the distribution is narrower than the transit-timing uncertainties indicate, while the wings of the distribution are close to ν = 4, which was the value used by Jontof-Hutter et al. (2016).

Independent N-body TTV analysis
In addition to the N-body code described above, we use the GPU hybrid symplectic N-body code GENGA The mass of the star is taken to be M = 0.09M , and the time step of the N-body integration is set to h = 0.05 days. The likelihood is assumed to be a normal distribution with the timing errors derived from the timing analyses. For comparison, we have rerun the likelihood-profile computation described above using a normal distribution in place of a Student's tdistribution. The derived masses from the two different analyses agree well with a maximal deviation of the median masses of better than 0.4%, while the mass-ratio uncertainties agree to better than 13%. The eccentricities and longitudes of periastron at the initial time agree as well. We interpret this as a validation of the numerical techniques being employed in this paper.
With the transit-timing analysis completed, we now use the N-body model to improve the estimate of the stellar density and the planet-to-star radius ratios. To do so we create a photodynamic model, described next.

PHOTODYNAMICAL ANALYSIS
With the mass-ratios and orbital parameters derived from the transit-timing analysis, we wish to improve our derivation of the planet and stellar parameters from the Spitzer photometry. The transit depth, transit duration, and ingress/egress duration combined with orbital period constrain the impact parameters and density of the star (Seager & Mallen-Ornelas 2003). Combining these constraints for each of the planets enables a more precise constraint upon the density of the star (Kipping et al. 2012). The transit durations are affected by the (small) eccentricities, but to a lesser extent. We account for the dynamical constraints on the transit-timing model to improve the photometric constraints upon these pa-  Agol (2015) due to adjacent planets, also with a low-order polynomial removed. For the inner four planets we have only plotted data with uncertainties smaller than the chopping semi-amplitude (many observations have large uncertainties which would obscure the plot). ƭ Table 2. Parameters of the TRAPPIST-1 system from transit-timing analysis and their 1σ uncertainties. Note that the mass ratios, µ= Mp/M * , of the planets are computed relative to a star, which is assumed to have a mass of 0.09 M (this is later combined with the estimate of stellar mass to give our estimates of the planet masses). We also report µ in units of 10 −5 , and the fractional precision on the measurement of µ, σµ/µ. The parameters P , t0, e cos ω, and e sin ω describe the osculating Jacobi elements at the start of the simulation, on date BJDTDB −2, 450, 000 = 7257.93115525 days.   rameters, albeit with the dynamical parameters fixed at the maximum likelihood. We fit a "photo-dynamical" model (Carter et al. 2012) to the data with the following procedure. From the bestfit plane-parallel, edge-on transit time model, we compute the sky velocity, v sky , at each of the mid-transit times, t 0 , from the model (in N-body code units). We then convert the code units to physical units using the density of the star, obtaining the sky velocity in units of R * /day. We account for quadratic limb-darkening of the star with parameters (q 1,Ch1 , q 2,Ch1 , q 1,Ch2 , q 2,Ch2 ) in the two Spitzer channels, and for each planet we specify a planet-to-star radius ratio (R p /R * ) and we assume mid-transit impact parameter (b 0 ), which is constant for all transits of a given planet. We assume that the limbdarkening parameters are a function of wavelength for the two Spitzer channels, while we treat the planet radius ratios as identical in both wavebands based on their consistency across all planets in Ducrot et al. (2020), giving a total of 19 free parameters for the photodynamical model.
We ignore acceleration during the transits, treating the impact parameters as a function of time as in units of the stellar radius, R * . Although this expression ignores the curvature and inclination of the orbits, as well as the acceleration of the planet, the star is so small compared with the orbital radii that this approximation is extremely accurate. The transit model is integrated with an adaptive Simpson rule over each Spitzer exposure time (which has a uniform duration binned to 2.15 minutes), as described in Agol  We compute a photometric model for all seven planets for all of the Spitzer data in selected windows around each of the observed transits. Starting with Spitzer photometric data, which were already corrected for systematic variations based on the analysis by Ducrot et al.
(2020), we fit each transit window with the transit model multiplied by a cubic polynomial, whose coefficients are solved for via regression at each step in the Markov chain. We transform the q 1 , q 2 limb-darkening parameters to u 1 , u 2 in each band using the formalism of Kipping (2013) for computing the transit model from Agol et al. (2019). After carrying out an initial optimization of the model, we take the photometric error to be the scatter in each observation window to yield a reduced chi-square of unity in that window. With this photometric scatter, we compute a χ 2 of the model with respect to the Spitzer photometric data, and we optimize the model using a Nelder-Mead algorithm.

Photodynamic Results
To compute the uncertainties on the photodynamical model parameters, we use an affine-invariant Markov chain Monte Carlo algorithm (Goodman & Weare 2010). 9 We used a prior which places bounds on each parameter given in Table 4  of the results of the fit are given in Table 5, while the correlations between parameters are shown in Figure 30. We utilized 100 walkers run for 50,000 generations, discarding the first 1500 generations for burn-in. We computed the effective sample size using the integrated autocorrelation length, finding a minimum effective sample size of 6000 over all 19 parameters 10 .
To help visualize the model, a photodynamical model with the best-fit parameters is shown in Figure 7 computed over 1600 days. Planets b and c have short periods, and are far from a j:j+1 period ratio. Hence both of these planets show weak TTVs, and straighter, but still slightly meandering, riverplots. The outer five planets are pairwise close to a series of j:j+1 resonances, showing strong transit-timing variations on the timescale of the TTV period of ≈ 490 days. The other prominent feature for the outer 4 planets is the slight zig-zag of transits due to chopping (shown in Figure 3). Table 3 shows the radius-ratios from Delrez et al. (2018a) alongside those from the present analysis. The precision of the measurements did not improve significantly, while the radius-ratios shifted by 1-2σ. Figure 8 shows the posterior probability distribution of impact parameters in units of the stellar radius, b 0 , derived from the photodynamical model. Figure 9 shows the probability distribution of stellar density. The density correlates with the impact parameters of each planet, reaching a tail of lower values for higher impact parameters of each planet. The tail of the density probability distribution has an approximately exponential scaling with the density below the peak, and cuts off as a normal distribution above. In table 5 we report the median and 68.3% confidence interval of the stellar density. The inferred density is both slightly larger and more precise than prior analyses (Delrez et al. 2018a), which we discuss below.
Combining the measured density with the measured orbital periods of the planets, we derive the semi-major 10 Using https://github.com/tpapp/MCMCDiagnostics.jl axis of each planet in units of the stellar radius, With the measured impact parameters, we compute the inclinations of the planets from (Winn 2010) where we have neglected the eccentricity in this formula due to the extremely small values of the eccentricities of the planets from the transit-timing analysis (cf Table 2). The resulting inclination posterior distribution is displayed in Figure 10. Although the inclination is derived from the impact parameters, which we constrain to be positive, in practice the photodynamical model cannot distinguish between inclinations of I and 180 − I (Fig. 10), and so we created a histogram of these two options with equal probability.

Mutual inclinations and stellar density
The outer four planets, e through h, have inclinations which are more precisely determined, and, remarkably, their peak probabilities are aligned very closely, to less than 0.1 • , save for the degeneracy of I vs. 180 − I. The inner three planets have poorer constraints upon their inclinations due to the larger uncertainty of their impact parameters (as seen in Figure 8). Yet, their inclination posteriors have significant overlap with the outer four planets.
As just mentioned, since each inclination may only be inferred relative to the center of the star, the derived distribution is reflected through 180 − I. However, if some of the planets orbited above and some below the plane of disk of the star, it would be very improbable for the outer four planets to show such a precise alignment. We conclude that it may be likely that all of the planets transit the same hemisphere of the star, and as shown in Luger et al. (2017a): the planets' 3D orbital inclinations are likely precisely aligned. This also implies that their longitudes of ascending node are likely aligned as well, and so in principle we can place a prior on the scatter of the mutual inclinations of the planets. We have re-run a photodynamic Markov chain with an inclination prior such that the planets' inclinations are drawn from a Gaussian about their mean value, with a standard deviation of σ θ which is allowed to freely vary in the chain. We find a very tightly aligned distribution of inclinations under this assumption, shown in Figure 11. We also find that very small values of σ θ are preferred, with σ θ = 0.041 •+0. The inclination prior also enables a more precise and symmetric estimate of the density of the star, ρ * /ρ =53.22±0.53. Why is this? Well, the inclination prior tightens the distribution of the impact parameters of planets b and c (as can be seen by comparing Figures  10 and 11). These inner two planets have deep and frequent transits and the sharpest ingress and egress, and hence they provide the tightest constraint upon the density of the star of all seven planets (Ducrot et al. 2020). Thus, given that the inclination prior tightens the distributions of inclinations of these two planets, the stellar density posterior is correspondingly tighter, and the low stellar density tail of the posterior is eliminated (see Figure 9). Despite this tighter constraint upon the stellar density, we decide to forego its use in computing the densities of the planets given the assumptions inherent in the inclination prior.
The coplanarity of the planets may be used to constrain the presence of a more distant, inclined planet given the scatter in their mutual inclinations induced by gravitational perturbations (Jontof-Hutter et al. 2018). Such an analysis should be carried out, but we leave this to future work.

PLANET DENSITIES AND MASS-RADIUS RELATION
With the completion of the transit-timing analysis and photodynamic analysis, we are now ready to revisit the mass-radius relation of the TRAPPIST-1 planets.  Table 5. Parameters derived from the photodynamic model. Top: Stellar density (in units of solar density), limb-darkening parameters (q1/q2) in Spitzer Channel 1 and 2, and stellar density in cgs units and limb-darkening parameters u1 and u2. Bottom: Planet-to-star radius ratio, Rp/R * ; transit depth, (Rp/R * ) 2 ; transit duration, T (from first to fourth contact); ingress/ egress duration, τ (from first to second contact or third to fourth contact); impact parameter in units of stellar radius, b/R * ; ratio of semi-major axis to stellar radius, a/R * ; and inclination I in degrees. To derive the masses of the planets, we draw planetto-star mass ratios from the posterior distribution of the transit-timing analysis ( §4), which we multiply by the 11 Note that "M" is being used in three ways here: spectral category . ƭ mass of the star drawn from a normal distribution with M * = 0.0898±0.0023M . We then draw the planet-tostar radius ratios and stellar density from the posterior distribution from the photodynamic analysis ( §5). With the same mass draw, we compute the stellar radius as which we multiply by each of the radius ratios drawn from the same sample to obtain the planet radii. We carry this out for a large number of samples to derive the probability distribution of the masses and radii of the entire posterior probability sample of the planets. The probability distribution for the masses and radii of the seven planets are shown in Figure 12. The maximum likelihood values and the posterior distributions (for 1-and 2-σ confidence) are both plotted in this figure. We postpone to §9 a detailed analysis of the densities and resulting constraints on the bulk compositions of the planets.
In addition to masses and radii, we also derive other planetary properties, given in Table 6. Each of the planets has a density intermediate between Mars (ρ ♂ = 3.9335 g/cm 3 = 0.713 ρ ⊕ ) and Earth (ρ ⊕ = 5.514 g/cm 3 ). The surface gravities span a range from 57% of Earth (planet h) to 110% of Earth (planet b).  Burgasser & Mamajek (2017) found an older age for the host star, 7.6±2.2 Gyr, which implies an inflated radius for the star compared with evolutionary models.
Our analysis differs slightly from our prior Spitzer analyses (Delrez et al. 2018a;Ducrot et al. 2020) in that we do not place a prior upon the quadratic limbdarkening coefficients of the TRAPPIST-1 host star. This is motivated by the fact that late M dwarf atmospheres are very complex to model and have yet to match observed spectra precisely (Allard et al. 2011(Allard et al. , 2012Juncher et al. 2017), and thus it is possible that limb-darkening predictions may not be reliable. We investigated using a higher-order quartic limb-darkening law, and found that this was disfavored by the Bayesian Information Criterion, and that the best-fit model differed negligibly in the model parameters. We also simulated more realistic limb-darkening models based on 3D stellar atmospheres (Claret 2018) and found that a quadratic law was sufficient to recover the correct model parameters with negligible systematic errors.
The TRAPPIST-1 system has the advantage that the planets sample different chords of the stellar disk (Figure 8 ; also see Delrez et al. 2018a), and given the large number of transiting planets, we are afforded multiple constraints upon the stellar limb-darkening parameters. Figure 13 shows our posterior constraints upon the limbdarkening parameters of the star based on our photodynamical model, which are reported in Table 5.
Based on the updated stellar density, we have updated the physical parameters of the star. We adopt the luminosity from Ducrot et al. (2020) and the mass from Mann et al. (2019) given the complete and careful analysis from both of those papers. With our updated constraint upon the density of the star, we re-derive the other parameters of the star, which are summarized in Table 7. In this table the stellar effective temperature was computed from the stellar luminosity and radius, with errors computed via Monte Carlo.

SEARCH FOR AN EIGHTH PLANET
With the detection of multiple transits of the six inner planets in TRAPPIST-1, and a single transit of planet h, a clue as to the orbital period of planet h was the series of GLRs found between adjacent triplets of planets (Papaloizou 2014). This relation was then used to predict candidate periods of planet h, based on different integer pairs for its commensurability with planets f and g, and a search through the prior data eliminated all but one possibility at 18.766 days. A subsequent observation of the TRAPPIST-1 system with the K2 spacecraft revealed four more transits of planet h occurring at precisely the period that was predicted (Luger et al. 2017b). The existence of the GLRs amongst the known seven planets has been used to forecast the possible existence of an eighth planet interior (Pletser & Basano 2017) and exterior (Kipping 2018) to the seven known transiting planets. There is yet to be a definitive detection of an eighth transiting planet based upon the currently available data (Ducrot et al. 2020).
It may be possible to detect an exterior eighth planet via transit-timing variations induced on the inner seven planets. Planet h should experience the strongest perturbations by an exterior eighth planet due to the fact that transit-timing variations are a very strong function of the proximity of planets to one another, and also to Figure 12. Mass-radius relation for the seven TRAPPIST-1 planets based on our transit-timing and photodynamic analysis. Each planet's posterior probability is colored by the equilibrium temperature (see colorbar), with the intensity proportional to probability, while the 1 and 2σ confidence levels from the Markov chain posterior are plotted with solid lines. Theoretical mass-radius relations are overplotted using the model in Dorn et al. (2016) for an Earth-like Fe/Mg=0.83 ratio with a core ( black dashed) and core-free ( red), and a range of cored models with Fe/Mg = 0.75±0.2 ( grey). U18 refers to Unterborn et al.
(see text). The solid black line was calculated for a 5% water composition, for irradiation low enough (i.e. for planets e, f, g and h) that water is condensed on the surface (assuming a surface pressure of 1 bar and a surface temperature of 300 K). The umber dashed and solid lines were calculated for a 0.01% and a 5% water composition, respectively, for irradiation high enough (i.e. for planets b, c and d) that water has fully evaporated in the atmosphere, with the U18 interior model (Turbet et al. 2020). The Earth, Venus and Mars are plotted as single points, also colored by their equilibrium temperatures. ƭ resonance. Table 8 shows predictions for the period of planet "i", P i , assuming a GLR configuration with planets g and h given by for a range of 1 ≤ p, q ≤ 3, which is the same range of integers for the GLRs amongst the inner seven planets. Interestingly these cases are all close to a j:j+1 period ratio with planet h, and thus should strongly perturb planet h due to forcing at this frequency.
We carried out a transit-timing search for an eighth planet by placing planets with mass ratios between 2×10 −6 −5×10 −5 at these four trial orbital periods in a coplanar configuration with the other seven planets drawn from a random orbital phase at the initial time, and with eccentricity vector elements drawn from a random normal of width 0.005. We placed a Gaussian prior on the eccentricity vector elements of the eighth planet with a standard deviation of 0.14 to avoid unstable configurations. We then optimized the likelihood with the We then carried out a search for evidence of perturbations by planet i by determining if the optimized likelihood of the transiting planets was improved by adding an eighth planet to the transit timing model, using the Bayesian Information Criterion (BIC) to penalize the additional degrees of freedom of the eight-planet model (Wit et al. 2012). We searched for a change to BIC for the eight-planet model over the seven-planet model with a difference of better than 5 log N trans = 30.5. Given that the inner seven planets show orbital eccentricities with values 0.01, we only considered an eighth planet candidate plausible if it shows an eccentricity less than this cutoff.
In all 11,200 trial optimization cases we found that only two of the eight-planet models did exceed the BIC criterion, but both significantly exceed an eccentricity of 0.01. Figure 14 shows the change in BIC versus orbital period and mass for planet "i", assuming a mass of the star of M * = 0.09M . These two cases with ∆BIC>0 do not appear to be plausible planet candidates: they only just exceed the BIC criterion; they both have large eccentricities; and they are not in close proximity to a GLR with planets g and h (even though the initial parameters of the optimization were started near a GLR). We also carried out a search for an eighth planet interior to planet b, and found even smaller improvements in the log likelihood than in the exterior case.
We have not carried out an exhaustive search for eightplanet models at other orbital periods due to the significant volume of parameter space to search. However, it is still possible that an exterior eighth planet is perturbing planet h, and may modify its transit times to a point that affects the posterior masses we infer from our seven planet model. In principle one could include the effect of an eighth planet on the mass inference by adding it to the Markov chain modeling; in practice this would be a challenging model to sample due to the multi-modal nature of the parameter space. We defer such analysis to future work.

INTERIOR COMPOSITIONS
In this section we present theoretical interpretation of the planets' interior properties based upon the massradius relation we inferred in §6. As there is significant degeneracy in the possible interior compositions, we present a menu of different possibilities in §9.2. However, we start with an approach which is less dependent upon the assumption of interior composition, which we term the "normalized density."

Initial analysis of planet densities across the system
The probability distribution for the masses and radii of the seven planets are shown in Figure 12 alongside several theoretical mass-radius relationships added for comparison. We have added three rocky mass-radius relationships with different molar bulk Fe/Mg compositions: (1) 2007) for silicates. We have also added the theoretical massradius relationships for planets endowed with a water layer, both for planets which are irradiated less ( black line; water) and more ( umber lines; steam) than the runaway greenhouse irradiation threshold (Turbet et al. 2020).
The comparison of measured masses and radii with theoretical mass-radius relationships reveals several striking results. First, all seven TRAPPIST-1 planets appear to be consistent with a line of interior isocomposition at the 1σ level. There are multiple theoretical mass-radius curves that overlap with all seven planets' mass-radius probability distributions (Fig. 12), which may be a good indication that the composition varies little from planet to planet. Secondly, all of the TRAPPIST-1 planets have lower uncompressed densities than Solar System terrestrial planets. This likely means that the TRAPPIST-1 planets either have a lighter interior (e.g. lower iron content) or are enriched with volatiles (e.g. water).
We next searched for variations of density across the planets. For this, we took each planetary density calculated from 10 4 samples and divided by the density of the closest pair of mass and radius of a fully differentiated 20 wt% iron, 80 wt% silicate (MgSiO 3 ) interior planet, which is less iron rich than Earth. A planet with a normalized density of 1 has exactly the same density as a 20 wt% iron, 80 wt% MgSiO 3 planet, while a normalized density >1 (or <1, resp.) is denser (or lighter, resp.) than a 20 wt% iron, 80 wt% MgSiO 3 planet. Fig. 15 shows the resulting histograms of the posterior probability of the normalized TRAPPIST-1 planet densities. We then plot in Fig. 16  . Probability density function of the normalized density of all seven planets in the system. ƭ orbital periods of the planets. The normalized planet density appears very uniform across the seven planets, with perhaps a slight decrease with the increase of the orbital period (or the distance to the host star). We fit a line to the normalized density, y, versus orbital period, P , for 10 4 posterior samples, and found a relation of y = (1.042±0.034)−(0.0043±0.0036)P where the coefficients are the 68.3% confidence interval. There is only weak evidence for a declining trend of normalized density with orbital period: 88% of the fits to the 10 4 posterior samples have slopes with a negative value, while 12% of the slopes fit have a positive value. If in the future more precise data strengthen this trend, then this may indicate that either (i) the outer planets are depleted in heavy elements (e.g. iron) compared to the inner ones, or (ii) the outer planets are enriched in volatiles (e.g. water) compared to the inner ones. However, based on the current data we suggest that the planets' compositions could be rather uniform in nature. The interpretation of these observations in terms of internal compositions is discussed in more detail next.

Range of possible interior compositions and volatile contents
In this subsection, we discuss a range of possible compositions of the planets based on their measured densities, starting with a volatile-poor model in which the densities are fit by varying the core-mass fraction ( §9.2.1), and followed by an analysis in which the solid planets are taken to have an Earth-like composition, to which is added a water fraction needed to create the observed densities ( §9.2.2). Alternatively, the planets might be explained with an enhanced oxygen content

Core Mass Fraction
If we assume that the planets' atmospheres contribute a negligible amount to their total radius, and that the planets are fully differentiated, composed of rocky mantles (MgSiO 3 ) and iron cores only, then the densities may be used to constrain the portion of the planets' mass which is contained within their cores.
We evaluated the core mass fractions (CMF) of the TRAPPIST-1 planets as follows. For each mass/radius pair in our posterior distribution we have estimated the core-mass fraction by linearly interpolating between precalculated mass-radius relationships with our employed interior model. We arbitrarily set each mass/radius pair lighter than a pure silicate (MgSiO 3 ) planet to a CMF of 0. Alternatively, we repeated the same procedure but discarding all CMF values lower or equal to 0. However, we found that the estimate of the core mass fraction is only marginally changed (and only for planets g and h).
Our core mass fraction estimates are provided in Fig 17 and Table 9. Estimates range from 16.1 +3.5 mean of all planets of 21±4 wt% (taking into account the correlations between the planets' core-mass fractions). There may be a slight trend of the inferred CMF, which decreases with increasing orbital period. The trend is qualitatively similar to that reported on the normalized density (see Fig. 16), with similarly weak support: only 88% of the linear fits to the 10 4 posterior CMF values have a slope with orbital period which is negative, whilst 12% are positive.

Surface water content
The observed (weak) variation in the planet densities among all seven planets may instead be due to their differing volatile (e.g. water) inventories.
If we assume a rocky Earth-like interior (CMF=32.5%, fully-differentiated) and only allow an additional condensed 12 water layer to contribute to the total radius, we can estimate the water mass fractions of the seven planets (b: 2.8 +2.1 −1.9 wt%, c: 2.3 +1.8 −1.7 wt%, d: 4.4 +2.0 −1.5 wt%, e: 2.9 +1.7 −1.5 wt%, f: 4.5 +1.8 −1.2 wt%, g: 6.4 +2.0 −1.6 wt%, h: 5.5 +4.5 −3.1 wt%). The lower densities of planets d, f, g, and h can allow for two to three times as much water than for planets b, c, and e. For this simple estimate we assumed a 12 Note that it is likely unwarranted to assume condensed surface water for the inner three planets given their location within the runaway greenhouse zone (Turbet et al. 2020).
water layer with a surface temperature of 300 K at 1 bar. Actual surface conditions and assumed iron content can, however, lead to much larger differences in the estimated water budgets between the inner three and outer four planets. This stems from the fact that the inner three planets are more irradiated than the runaway greenhouse irradiation limit (Kopparapu et al. 2013;Wolf 2017;Turbet et al. 2018) for which all water is vaporized, forming a thick H 2 O-dominated steam atmosphere. Taking into account the expectation that water should be vaporized for the three inner TRAPPIST-1 planets (Turbet et al. 2019(Turbet et al. , 2020, their water mass fractions drop drastically to less than 0.01 wt%, i.e. more than several times lower than the water ocean mass fraction of the Earth. Figure 18 shows the expected water mass fractions for each of the TRAPPIST-1 planets, and for four distinct interior compositions (18, 25, 32.5 and 50 wt% iron content). It shows that the same qualitative trend of water versus orbital period is relatively robust across a large range of assumptions on the interior composition thanks to the transition from runaway greenhouse for planets bd to surface liquid water for planets e-h.
Higher estimated water budgets for the outer three or four planets could be a clue that they formed beyond the water condensation line at ≈0.025 AU (Unterborn et al. 2018). This could also be due to the significant differences in water loss (through atmospheric escape) arising from variations of irradiation and gravity among the TRAPPIST-1 planets (Lissauer 2007;Bolmont et al. 2017;Bourrier et al. 2017). However, again, we caution again that trends in the planetary volatile content are only weakly supported by the current data.

Core-free planets
Given that the data may be consistent with an isocomposition mass-radius relation, we next consider another intriguing possibility: that the interiors of the planets are fully oxidized. For example, if, instead of forming a core, all of the iron is oxidized and remains in the mantle, the size of a planet may increase by a few percent (Elkins-Tanton & Seager 2008). This turns out to be about the amount of radius inflation necessary to match the TRAPPIST-1 planets when compared with our Solar system planets.
If we assume that the refractory ratios match a Solar composition, and that all seven planets lack an atmosphere, then it turns out that all seven planets are consistent with a core-free, oxidized composition ( Fig.  12; Table 9. Core mass fractions and water mass fractions inferred for each TRAPPIST-1 planet, as well as the weighted means. Hence, although this hypothesis efficiently explains the TRAPPIST-1 data, it remains to be seen whether a geochemical model can be constructed which results in high oxidation of iron throughout the processes of planet formation and evolution (Kite et al. 2020).

DISCUSSION
Here we discuss some of the implications of the results in the foregoing sections.

Timing uncertainties
As reported in §3, the transit timing measurements we have made show an excess of outliers with respect to the measurement uncertainties of each transit. We were un-able to identify a culprit (or culprits) for these discrepancies, but wish to speculate on what may be the origin of these outliers. The cumulative distribution of these outliers (Fig. 1) indicates that about 10% of transits are affected at some level. It is also interesting to note that the core of the distribution has a slightly smaller width of about 87% of the measurement errors, indicating that for about 90% of the transits, the uncertainties may be overestimated. This may be a consequence of inflating the uncertainties to account for correlated noise rather than modeling the data with, for example, a Gaussian process; further re-analysis of the data will be needed to check this hypothesis.
Could the timing outliers be due to stellar flares? In Vida et al. (2017) and Ducrot et al. (2020), the frequency distribution of stellar flares is shown to be rising towards smaller flare energies. This could mean that the more frequent, but lower energy flares, occur at a level that is swamped by the photon noise, and thus not visible to an observer. We used the spectrum and energy calibration of Spitzer flares measured by Ducrot et al. (2020) to extrapolate the frequency of lower energy flares (which are not detected in Spitzer due to photon noise). As an example, for planet h the transit time can be affected by a flare which occurs at ingress or egress (duration 2τ ≈ 10 min). We estimate that a flare of energy 10 31 erg could cause a 1.5σ timing outlier if it occurs during ingress or egress. This has a probability of only ≈ 0.3% to occur during the 10 minutes of ingress or egress, and thus cannot be responsible for 10% of outliers for planet h. We carried out a similar estimate for the other planets, and we conclude that low-level flaring activity cannot be the cause of the timing outliers.
Other possible causes of the timing outliers are correlated stellar variability, star spot crossings, or instrumental systematics. We don't yet have an estimate of the magnitudes of these effects, and so cannot reach a conclusion about where the origin of the timing outliers lies.

Possible systematic errors
In this section we consider possible factors which might affect our inference of the densities of the planets. Simulated planetary densities predict core-massfractions which are similar to Earth, with a very small scatter (Scora et al. 2020). Hence, the fact that the TRAPPIST-1 planets have inferred planetary densities which are less than this could be due to systematic uncertainties which are not captured by our modeling.
The transit depths determine the planet-to-star radius ratios, but these measurements are affected by the nonuniform surface brightness of the star. Fortunately the multiple impact parameters of the planets yield a constraint upon the infrared limb-darkening, which is fairly weak compared with optical bands. However, star-spots can also affect the inferred transit depths (Czesla et al. 2009;Oshagh et al. 2013Oshagh et al. , 2014McCullough et al. 2014;Rackham et al. 2018;Kipping 2012). If spots are present on an active latitude which is not on the same hemisphere as the planetary transit chords, this can cause all of the planet radii to be mis-inferred by a similar factor.
TRAPPIST-1 may have complex surface inhomogeneities, including regions brighter or darker than the mean photosphere Zhang et al. 2018;Wakeford et al. 2019). It is possible that bright or dark regions could bias the apparent transit depths towards larger or smaller measurements, depending on which type of inhomogeneity dominates. Time-variable contamination should average out with many observations, while time-steady inhomogeneity will not, such as active latitudes, polar spots, or even hemispheric asymmetry (Yadav et al. 2015;Brown et al. 2020). We modeled the transit-transmission in the K2, SPECULOOS, LT, nearinfrared, and Spitzer bands from Ducrot et al. (2020) for all seven planets using the contamination formula from Rackham et al. (2018) with a time-steady, threetemperature model with the temperatures of the three components ranging from 2000-2980 K and the covering fraction varying from 0 to 1. The mean effective temperature is constrained by our stellar model parameters (Table 7). We assumed that all seven planets transit the region with the larger covering fraction, and that their transit depths are achromatic. We ran a Markov chain fit to the transmission spectra, interpolating the fluxes in the bands between the effective temperature grid points which were spaced by 20 K; we find that the posterior parameters with maximum likelihood are temperatures of (2980, 2331, 2071) K with covering fractions of (0.8, 82.1, 17.1)%. We then computed the expected impact on the transit depths in the two IRAC channels. The constraints are tight: we find that the observed radii should only change by a factor of 1.0072±0.0097 in Channel 1 and 1.0071±0.0108 in Channel 2 (these are the ratios of the observed radii to the actual radii). These factors are consistent with unity at better than 1σ, and have uncertainties which are comparable to or smaller than the uncertainties on the absolute planetary radii. We conclude that this form of self-contamination does not greatly influence our results, but should lead to caution in the interpretation. This constraint is much stronger than the analysis of .
Our mass precisions are predicated on a complete model of the dynamics of the system. We neglect tides and general relativity, which are too small in amplitude to affect our results at the current survey duration and timing precision (Bolmont et al. 2020). Should an eighth planet be lurking at longer orbital periods, which has yet to reveal itself via significant transit-timing variations or transits, this may modify our timing solution and shift the masses slightly. In our timing search for an additional planet, however, we found that such a planet might only cause shifts at the ≈1σ level. This possibility begs for caution in interpreting the potential variation of iron fraction with orbital period: should an eighth planet be present beyond planet h, its timing impact would likely affect the masses of the exterior planets more significantly than the interior planets. Drawing stronger conclusions about the variation of planet iron/core mass fractions will likely require longer-term monitoring, especially of planet h, and/or higher precision timing measurements such as are expected with JWST, to place tighter constraints on an eighth planet.

Planet masses and radii in context
In our current analysis of the transit-timing data for TRAPPIST-1, we have found larger mass ratios for all planets save planet e compared with our most recent analysis in Grimm et al. (2018). Even though most of the planets have shifted by 1σ or more, this does not indicate that the prior analysis was in error. In fact, the masses of all of the planets are strongly correlated, and thus when one planet shifts in the transit-timing solution, they all shift. With the more extensive dataset analyzed here, we provide a better constraint over the transit-timing timescale, and can also better account for outliers thanks to some redundancy in our measurements. Given the high precision of the Spitzer timing measurements, we expect that our current analysis may remain the most reliable constraint upon the masses of the planets until the transit times can be measured with JWST.
In Figure 19 we compare our measurements for the seven TRAPPIST-1 planets with our Solar System planets and with exoplanets with radii <1.7R ⊕ and masses measured to >5σ retrieved from the NexSci database on 26 Feb 2020 (Akeson et al. 2013;Christiansen 2018), as well as planet parameters reported in Dai et al. (2019) and Kepler-93b from Dressing et al. (2015). 13 The uncertainties on the other planets' masses are the best available to date from radial-velocity measurements, and yet they are much larger than the uncertainties for the TRAPPIST-1 planets, whether considered in a relative or absolute sense. The larger uncertainties of the RV planets makes the core-mass fractions difficult to constrain for these more massive planets -core-free and cored models are consistent with most of these planets' parameters at the 1σ level (Fig. 19). Nevertheless, it is notable that the rocky planets for which we currently have data seem to be similar in composition to the Earth (Dressing et al. 2015); however, the actual range of bulk rock compositions of rocky exoplanets relative to their host stars is currently debated. This also appears consistent with the observation that the evaporation valley requires rocky planets and their gaseous brethren to have a composition which is a mix of silicates and iron (Owen & Wu 2017).

Comparison with radial velocities
Given the measurements of the masses we have made with transit-timing, this brings up the question: what radial-velocity uncertainties would be required to make mass measurements of similar precision?
The precision of the mass measurements may be placed in context by comparing with current radialvelocity capabilities. The predicted semi-amplitudes for the seven planets are given in Table 10. The predicted radial-velocity variation of the star induced by the TRAPPIST-1 planets is plotted in Figure 20, also based upon our mass measurements from transit timing. The sums of the semi-amplitudes of the planets equals ≈12.7 m/sec, which is close to the peak amplitude when the planets are all orbiting on the same side of the star (near 218 days in the plotted figure). How does this compare with current RV measurements?
Recently Hirano et al. (2020) were able to make high precision measurements of the radial velocity (RV) of the TRAPPIST-1 host star, achieving a constraint on the linear variation of the star to a precision of 2.5 m/sec which they ascribe to to stellar variability. To compare this with our transit-timing results, the semi-amplitude precision which would be needed to achieve the same mass error bars that we have achieved with transittiming ranges from 2.4-19 cm/sec, up to 100 times more precise than the radial-velocity measurements. Future observations may be able to achieve higher precision radial velocity measurements of TRAPPIST-1, but will continue to contend with stellar variability (Klein & Donati 2019).
Were these planets orbiting a Sun-like star, the semiamplitude RV error would need to be even smaller to achieve the same mass precision we have achieved with transit timing. Table 10 lists what semi-amplitudes precisions would be required if each one of these planets was placed around a Solar twin at one astronomical unit. The required precision ranges from 1-6 millimeters/second. This is nearly two orders of magnitude more precise than the highest precision RV measurements for short-period exoplanets reported to date, such as Tau Ceti g, which has a reported RV semi-amplitude precision of 11 cm/sec (Feng et al. 2017). We conclude that the mass precisions of Earth-sized, Earth-insolation planets based on radial velocity must be improved by two orders of magnitude to match our TTV precision for the TRAPPIST-1 system.

Planetary dynamics
In this section we discuss some of the dynamical aspects of the planetary system: the eccentricities, the longitudes of periastron, and the GLR angles.

Eccentricities
The posterior distribution of the initial eccentricities of the planets is shown in Figure 21. In prior analyses of the transit-timing variations of the TRAPPIST-1 system we found that the inner two planets, b and c, had significant eccentricities (Grimm et al. 2018). In contrast, with the current analysis we find that the eccentricity probability distributions of these two planets are . Planets with smaller mass uncertainty are shown in a darker red color. Also plotted is a mass-radius relation with a core-mass fraction compatible with Earth (blue), and a core-free model in which the refractory elements retain the Solar abundance ratios (purple  Table 10. RV semi-amplitudes, Kp, for the TRAPPIST-1 planets predicted from our measured masses. Equivalent RV precision required to measure the masses to the same precision as measured with TTVs around TRAPPIST-1. Also, equivalent RV precision required if each planet were placed around a Solar twin at one astronomical unit.  Figure 21. Probability distribution of the eccentricities of the planets at the initial time based upon the transit-timing model. ƭ significant near zero eccentricity. This is consistent with N-body models which include tidal damping of the orbits, which predict that the planets b and c should have low eccentricities, 10 −3 (Luger et al. 2017b;Turbet et al. 2018). The other planets are all consistent with the predictions of the tidal evolution model (Luger et al. 2017a). Figure 22 shows the posterior probability distribution for the eccentricity vectors of each planet. The only two planets consistent with zero eccentricity at 1σ confidence are planets b and c (blue and orange contours). The other five planets have non-zero eccentricities. Now, the eccentricity vectors plotted in Figure 22 show the values at the initial time. However, over time, the eccentricity vector of each planet can be decomposed into two components: the mean eccentricity vector (over some timescale) and the variable component (which is time variable, with multiple oscillation timescales driven by the mutual planetary perturbations). Figure 23 shows the eccentricity over a single oscillation for all seven planets. The outer five planets are close to first-order resonances with adjacent planets, and the super-period for each of these planets is close to P T T V ≈490 days thanks to the near-GLR commensurability for all triplets of planets. This leads to a nearly circular oscillation over this timescale due to circulation of the first-order resonances driving oscillations in the eccentricity vectors of each of these planets. The inner two planets are close to second and third order resonances with adjacent planets (b and c are close to 8:5, which is third order, while c and d are close to 5:3, which is second order). Since the strength of these in- . Posterior probability distribution for the eccentricity vectors at the initial time for each of the planets. Contours are 1 and 2σ confidence limits. The maximum likelihood parameters are shown as solid points. ƭ teractions scales as a higher power of eccentricity, these planets show much smaller variation in the time-variable components of their eccentricity vectors. Since planets b and c are close to a third order resonance, their eccentricity vectors show a three-fold symmetry. On longer timescales these patterns precess, filling a circular pattern over time. The time-variable eccentricity vector patterns are very similar over the range of posterior values, indicating that it is primarily this component which is constrained by the transit timing variations of the planets.
The total eccentricity vectors show a wider range of behavior, thanks to a wider variation of the mean eccentricity, as shown in Figure 24. It is clear from this figure that each planet executes an eccentricity-vector oscillation about a mean value (which was subtracted off for figure 23). Unfortunately the mean eccentricity is less constrained by the transit-timing variations (Linial et al. 2018), and so there is a much wider range of eccentricity vectors which is allowed which manifests as strong correlations amongst the eccentricity vectors of pairs of planets ( Figure 29).

Laplace angles
A remarkable property of the TRAPPIST-1 system is the near-commensurability of adjacent triplets of planets (Luger et al. 2017a), akin to Laplace resonances, with GLR angles given by where λ i is the mean longitude of the ith planet, and p and q are small integers. In the case of an isolated triplet of planets, a stable configuration takes on φ = 180 • , but when planets are captured into a series of GLR commensurabilities, their mutual torques displace the stable configuration (Delisle 2017). Long-term dynamical simulations show that these GLR angles can take on stable values for extended durations, and sometimes can quickly jump in value, flipping symmetrically about 180 degrees (Mah 2018;Brasser et al. 2019), resulting in two possible angles for each triplet of stars, φ and 360 − φ. Based on the prior measured planet-to-star mass ratios, Mah (2018) predicted the value of the three-body resonance angles resulting from the values at the end of the simulation.
In Figure 25 we show the GLR angles for the following triples: Differences between the predicted and observed angles agree within 0.5-10 degrees, where the predicted values for φ are taken from Mah (2018), but allowing φ bcd and φ cde to be flipped about 180 degrees. It is possible with the updated mass-ratios from our analysis that the predictions will be more accurate, which awaits further simulation.

Long-term stability
Prior studies of the TRAPPIST-1 system by Tamayo et al. (2017) found long-lived configurations for systems which had formed via migration. Quarles et al. (2017) examined the stability of the TRAPPIST-1 system, refining the large uncertainties from prior measurement (Gillon et al. 2017) to further constrain the masses of the system. Given the much tighter constraints we have placed upon the masses of the planets and the orbital eccentricities, here we re-examine the long-term stability of our posterior distribution.
We have used the GPU N-body integrator GENGA (Grimm & Stadel 2014) to carry out long-term simulations of a set of 10 4 posterior samples from the timing analysis. These simulations were carried out for 10 7 years, which corresponds to 2.4 billion orbital periods of planet b, and 195 million orbital periods of planet h. We used a time step of 0.06 days, which gives a total number of 6.1·10 10 integration steps. We find that 100% of these posterior samples are stable over this entire timescale.
To check the stability of the samples, we analyzed the evolution of the semi-major axis, a, and eccentricity, e,  Table 11 gives the average over all samples, and the maximum differences between the first and the last Myr. In all cases the variations are small, ≤0.002. These results suggest that the simulations could be stable even on a much longer time scale. In addition, we have carried out long-term (50 Myr) integrations with tidal damping for two posterior samples, one with low and one with high values of the eccentricity of planet b. Using a range of values of tidal damping (from 1/10 to 100 times Earth's), we find in all cases that the system remained stable (using Posidonius; Bolmont et al. 2020). More interesting is the evolution of the five GLR resonant angles, shown in Figure 26. In order to describe the evolution of the GLR angles, we define three categories: • Category I: remaining in GLR for 10 Myr, with a maximum difference to the initial value of less than 45 • • Category II: remaining in GLR for 10 Myr, with a maximum difference to the initial value of more than 45 • . In this category, the GLR angles can jump between different states.
• Category III: not remaining in GLR for 10Myr.   Table 11. Evolution of the semi-major axes, a, and eccentricities, e, from 10 4 samples over 10Myr. For each sample and planet, the difference of the average of a and e over the first and last Myr are compute as ∆ā and ∆ē; we report the maximum over all samples. These numbers show that all samples remain stable over 10Myr.
The threshold of 45 • is chosen arbitrarily, but is found to be practical to distinguish simulations where the GLR angles jump between different states (Category II), or remain in the same state (Category I). Figure 26 shows the three different categories in different colors, as well as a histogram of all 10,000 simulations over 10 Myr for all five GLR angles. The exact number of simulations in the three categories are given in With our transit timing model we can forecast the probabilities of future transit times, and hence better help to plan transit observations with JWST. This is important for both optimizing the efficient use of the telescope, and for determining when transits might overlap (i.e. two or more planets crossing the face of the star at the same time). This is especially important for transit transmission spectroscopy as the signal will be small, and hence many transits may need to be observed. With observation of initial transits with JWST the ephemerides can be refined/updated; however, our current forecasts provide the starting point for planning JWST observations. Table 15 gives our forecast for upcoming times of transit through October, 2023 to cover the first 2 years of the JWST mission (six months after the end of Cycle 1, given the present launch date of October 2021).

Simulated JWST TTV analysis
Based on the measured properties of TRAPPIST-1, we have carried out a preliminary analysis forecasting future transit observations with the James Webb Space Telescope. Already there are several JWST Guaranteed Time Observation (GTO) programs which plan to observe the TRAPPIST-1 planetary system, primarily for the purposes of spectroscopic characterization (GTO programs 1177(GTO programs , 1201(GTO programs , 1279(GTO programs and 1331. 14 It is very likely that additional observations will be scheduled during guest observing time throughout the duration of the JWST mission as the detection of spectroscopic features requires observations of multiple transits for each of the planets (Morley et al. 2017;Barstow & Irwin 2016;Lustig-Yaeger et al. 2019;Fauchez et al. 2020). An effort to coordinate these observations amongst the exoplanet and planetary science communities is underway via the TRAPPIST-1 JWST Community Initiative (Gillon et al. 2020). All to say, long-term studies of 14 For specifications of these programs, see https://www.stsci.edu/jwst/observing-programs/ approved-gto-programs.
TRAPPIST-1 for spectroscopy will also yield transit times for each transit observed, enabling a transit-timing analysis of the results.
To estimate the maximum possible precision of observations with JWST, we have simulated a five-year program in which every transit of every planet in TRAPPIST-1 is observed with NIRSPEC (Birkmann et al. 2016). The NIRSPEC instrument was chosen as its prism mode covers 0.5-5 microns, covering the peak of the SED of the star, and thus maximizing the number of photons detected, which is about two orders of magnitude per transit greater than collected by Spitzer. Although such a complete set of transits will be impossible to collect (thanks to limits due to scheduling and time-allocation), this analysis yields an estimate of the most optimistic results we might expect from JWST.
We have carried out simulations of transits of each of the planets as observed by NIRSPEC. We include realistic estimates of photon noise and correlated stellar variability based on the pattern of variations detected with the Spitzer Space Telescope, using a Gaussian Process model created with celerite (Foreman-Mackey et al. 2017). We do not include instrumental systematics under the assumption that over the timescales of ingress/egress, which are what limit the timing precision, that the noise contribution will be dominated by photon noise and stellar variations. From these simulations, we found that the posterior timing precision ranges from 0.6-1.7 second per transit, much more precise than the measurements reported in the present paper.
Next, we created a simulated set of transit-timing observations at the two windows each year when the TRAPPIST-1 system is observable with JWST ( Figure  27). For each transit time, we drew the time from the distribution of uncertainties from the posteriors of the simulated transit data.
Finally, we utilized our code for transit-timing analysis to optimize a plane-parallel model with seven planets. At the maximum likelihood of the fit, we computed the Hessian to estimate the uncertainties on the model parameters. Figure 27 shows the simulated transit-timing observations with JWST. This includes about 600 transits observed with the telescope (again, the maximum possible over the nominal 5-year JWST mission). Figure 28 shows the results of the mass measurements in the simulations. We find that the masses can be recovered to better than 0.02% for planets d-h, and to 0.1% for planets b and c.
Of course, it will be impossible to arrange such a large number of transit observations of this system. But, even if the number of observations is an order of magni-tude smaller, we expect that the signal-to-noise should scale with the square root of the number of measurements made, and thus the outer planets will still have mass measurements precise to the order of a part-perthousand.

Stellar parameters
The stellar density we derive using the photodynamic model, ρ * = 53.17 +0.72 −1.18 ρ , is in 1σ agreement with prior analyses. Most recently, Delrez et al. (2018a) found a density of ρ * = (52.3±2.2)ρ , twice as uncertain as our analysis. Our approach yields a density of superior precision due to several factors. The transit times in the Spitzer data are constrained by all of the measured transits in the photodynamic model so that fewer degrees of freedom are needed to fit the times (37 free parameters in the N-body model versus 447 transit times fit to each transit).
The stellar mass we take from the analysis by Mann et al. (2019), M * = 0.0898±0.0023M . 15 This mass has a precision of 2.6%, which limits the mass precision for several of the planets. We are at the point that to improve the mass measurements of the planets we will need to improve the measurements of the star.
We used the luminosity estimate from Ducrot et al. (2020), which is slightly lower than that estimated by Gonzales et al. (2019) due to a difference in the measured bolometric flux. We are consistent with Gonzales et al. (2019) for the reported value of R 2 T 4 ef f at 1σ, while our T ef f is more precise (28 K vs 42 K), R is 2.5 times more precise, and our log g is more precise by an order of magnitude.

CONCLUSIONS
The Spitzer discovery of seven transiting planets orbiting the TRAPPIST-1 star by Gillon et al. (2017) promised the determination of the interior compositions of these planets via dynamical analysis. We have now analyzed the complete set of transit time measurements of the TRAPPIST-1 planets from Spitzer, augmented by additional transits from the ground, K2, and HST. Our primary conclusions are: 1. We have measured the masses, radii and densities to high fractional precision, 1-8%, based on an N-body model and a photodynamical model with seven planets. This improves upon RV current precision by up to two orders of magnitude.
2. The pattern of masses and radii may be consistent with a uniform planetary composition for all seven 15 https://github.com/awmann/M_-M_K-planets which have lower uncompressed densities than the Earth, Mars or Venus, with weaker evidence for a declining normalized density with orbital period ( 88% confidence). The planet properties may either be consistent with a core mass fraction of 21±4 wt%, or an Earth-like core and mantle with a surface water content which varies from <0.001% for the inner three planets to ≈5% for the outer four, or core-free planets with highly oxidized iron in the mantle which elevates the interior light element content. These are not unique explanations.
3. The planets appear to be dynamically-cold, with eccentricities less than ≈1%, and inclinations which may be coplanar to a few hundredths of a degree. 4. The system is stable on long timescales, and shows a pattern of generalized Laplace resonances with angles which match predictions from migration simulations of Mah (2018).
5. We provide a forecast of the future times of transit for the planets (Table 15) to help in planning observations with JWST, which may yield more precise constraints upon the planets' masses.

6.
We have yet to find strong evidence for an eighth planet.
Based upon these properties, we next speculate on some possible scenarios for the formation and evolution of the system. 11.1. Expectations for the compositions of the TRAPPIST-1 planets from formation scenarios As mentioned, our analysis suggests that the TRAPPIST-1 planets have somewhat lower uncompressed bulk densities than Earth (see Table 6 and Fig.  12). It is possible that these lower densities result from a deficit of high-density material (e.g., less iron) relative to Earth, or an excess of low-density material (e.g., having more water), or both; in this section we speculate about formation scenarios which may be consistent with these planets' bulk densities.
In general, planets which formed within the same proto-planetary disk are expected to have similar budgets in relative refractory elements (Bond et al. 2010;Elser et al. 2012) but can have very different volatile element budgets (Öberg & Bergin 2016). Similar relative refractory elements (Fe, Mg, Si) implies similar core mass fractions for all seven planets, assuming full differentiation. As suggested by Dorn et al. (2018), the refractory composition may best be described by studying  T1b  T1c  T1d  T1e T1f T1g T1h Figure 28. Simulated planet massses based on 5 years of JWST observations of every TRAPPIST-1 transit with NIR-SPEC. The recovered mass (Mout) minus the input mass versus the input mass (Min). The masses relative to the star can be recovered to better than 0.1% precision. ƭ the densest planet of the system, planet c with 22-31% CMF. Thus, with this assumption, all of the planets may likely have a 22-31% CMF but different light element mass fractions (that may increase slightly with orbital period, Fig. 19).
Is an overall CMF of 22-31% realistic for terrestrial planet interiors? This range of CMF implies lower Fe/Mg and Fe/Si values compared to Earth (and the Sun). Elemental abundances of rocky interiors are expected to be reflected in the photospheric abundance of the host star as argued by Unterborn et al. (2018) and Dorn et al. (2018). Unfortunately, measuring the photospheric abundances of this cool and active host star remains very challenging. However, Unterborn et al.
(2018) estimated the stellar Fe/Mg number ratio to be 0.75±0.2 by analysing Sun-like stars of similar metallicity to TRAPPIST-1, which may be slightly lower than the Solar value. The corresponding mass-radius curve for a rocky interior of this range of Fe/Mg value is plotted in Figure 12 (brown curve and shaded region). It overlaps well with the densest planets c and b. This means that the expected range of stellar abundances supports a possible overall CMF value of 22-31%, assuming full differentiation.
Could there be a variation of Fe/Mg ratios among the planets? Rocky planet accretion should preserve the integrated iron/rock ratio. Consider a population of planetary embryos and planetesimals that accrete into a system of rocky planets. Giant collisions between growing planetary embryos can change the iron/rock ratios of individual objects by preferentially stripping the outer, rock-dominated layers from differentiated embryos (e.g. Benz et al. 1988;Marcus et al. 2010;Asphaug & Reufer 2014). But from a system-wide perspective, it is a zero-sum game unless rock or iron is preferentially lost from all of the planets. Rock is the major component of loosely-bound impact debris and more likely to be lost either by differential aerodynamic drag (Weidenschilling 1977) or solar wind drag (Spalding & Adams 2020), and so the integrated iron/rock ratio should only increase. Hypothetical variations in Fe/Mg can otherwise be caused if large portions of planetary building blocks condense at different high temperatures (>1200 K). During planet formation, such temperatures are only reached in a tiny region very close to the ultracool dwarf star. Consequently, both Unterborn et al. (2018) and Dorn et al. (2018) have assumed that all seven planets have similar refractory element ratios (i.e., Fe/Si, Fe/Mg). Whether rocky planets can have a wider compositional distribution than that of stars remains to be seen.
Alternatively, the lower measured bulk densities of the TRAPPIST-1 planets relative to Earth-like composition might be explained by core-free interiors (Elkins-Tanton & Seager 2008) in which the oxygen content is high enough such that all iron is oxidized. If the refractory elements (Mg,Fe,Si) follow Solar abundances, a fully oxidized interior would contain about 38.2 wt% of oxygen, which lies between the value for Earth (29.7 wt%) and CI chondrites (45.9 wt%). Such an interior scenario can easily describe the observed bulk densities (purple line in Fig. 12). And this may bolster the long-range migration scenario in which the planets formed in a highly oxidizing environment which enabled the iron to remain in the mantle even after migration. Based on the elemental composition, these models have an oxygen fugacity of ∆IW = −0.91, 16 which is more oxidized than Earth or even Mars, but is comparable to the oxidation state of small bodies, both in our solar system and accreted by white dwarfs (Doyle et al. 2019). 16 Oxygen fugacity is stated relative to the Iron-Wüstite equilibrium reaction Fe+0.5O 2 =FeO (Wüstite) such that However, the evidence for a core-free planet rests on knowing the refractory abundances of the TRAPPIST-1 host star, which have yet to be constrained. Alas, our interpretation of the planets' compositions may be limited by our imprecise knowledge of the host star: its radius, its mass, its photospheric inhomogeneity, and its refractory abundances all affect our measurement and interpretation of the masses, radii, and compositions of the TRAPPIST-1 planets. In this paper our measurements of the relative planetary radii and masses have reached such a precision that the fault may now lie in the star.

Future work
We conclude by pointing out directions for building upon the work described in this paper: We have yet to identify the origin of timing outliers which show an excess relative to a normal distribution. This may be addressed with higher precision measurements which may be able to identify a source of noise responsible for these outliers.
2. Our analysis assumes a plane-parallel system with seven planets, and does not yet couple the dynamical and photometric analysis (our photodynamics held the dynamical model fixed). Future analysis with a fully-coupled photodynamical model with 3D orbits and more than seven planets may be warranted.
3. We need more transits measured for planets d and h, in order to better measure the amplitude and phase on the transit-timing variation timescale, as well as to better constrain the presence of planets beyond h. 6. More detailed spectral analysis of the stellar photosphere to ascertain the impact of an inhomogeneous stellar atmosphere on the radius ratios would be warranted.
We anticipate that once JWST launches, we will obtain higher precision constraints upon the dynamics of the system, yielding much improved constraints upon the planets' bulk densities, which will further improve the interpretation of their interior compositions.

ACKNOWLEDGEMENTS
This work is based in part on observations made with the Spitzer Space Telescope, which is operated by the Jet Propulsion Laboratory, California Institute of Technology under a contract with NASA. Support for this work was provided by NASA through an award issued by JPL/Caltech. EA was supported by a Here we approximate the posterior probability distribution as a multi-dimensional Gaussian, assuming a uniform prior. The log likelihood for each data point with indices i and j may be written as a function of the observed transit times and uncertainties, the modeled transit times, and the Student's t-distribution model parameters, such that where all of the dependence on the dynamical model parameters enters through t ij (x dyn ). The maximum posterior probability also corresponds to the maximum likelihood in this limit, in which case we expand the log likelihood for the ith planet and jth transit as a Taylor series: where we have used the fact that the gradient of the log likelihood vanishes at the maximum likelihood value of the model parameters, x 0 , and the indices k, l = 1, ..., 5N p + 2 for x k and x l , where the first 5N p parameters are the dynamical parameters, x dyn , and the last two parameters are the Student's t-distribution likelihood parameters, log ν and V 1 e 1/2ν . Now, the width of the Gaussian distribution at the maximum likelihood is governed by the Hessian matrix, with elements given by which involves second derivatives of the negative log likelihood with respect to the model parameters. The derivatives of t ij with respect to x dyn we compute with the NbodyGradient code; however, the second derivatives of the transit times with respect to the dynamical model parameters are not computed with our N-body code. We drop these transit time second derivative terms, which we justify as follows.
For the Hessian matrix elements which involve second derivatives with respect to both dynamical model parameters, 1 ≤ k, l ≤ 5N p , we can write: where t ij = t ij (x dyn ) is implied in this and subsequent equations. Now, at the maximum likelihood there is a balance of residuals which are both positive and negative, such that the second component of this equation has terms with positive and negative signs for different values of i and j. This causes the second term in this equation to average to a small value compared with the first term when the sum is carried out over i and j (the planet and transit indices). So, we drop the second term in this equation.
Adding in the cases of the Hessian matrix elements which involve the likelihood parameters, (x 5Np+1 , x 5Np+2 ) = (log ν, V 1 e 1/2nu ), we compute the Hessian as where the partial derivatives with respect to t ij (x dyn ), x 5Np+1 = log ν, and x 5Np+2 = V 1 e 1/2ν are computed with automatic differentiation.  Table 13. Prior probability boundary limits for the TRAPPIST-1 planet parameters. The bounds are chosen so as to not affect the parameters as much as possible.
The inverse of the Hessian matrix is used in the Levenberg-Marquardt optimization, and when evaluated at the maximum likelihood, is used to estimate the covariance matrix, from which the square root of the diagonal components are used to estimate the widths of the posterior distribution for each model parameter, x = (x dyn , log ν, V 1 e 1/2ν ), which are plotted in Figures 6, 5, and 4. This approximated Hessian is also used to define the mass matrix for the HMC simulations.

B. TRANSIT TIMING PRIOR
We use a uniform prior for each mass and orbital element, with smooth bounds on each, with the exception of the initial eccentricity vectors. Since we sample in the eccentricity vector of each planet, e i = (e i cos ω i , e i sin ω i ), the volume of parameter space scales ∝ e i , and so an 1/e i prior is needed to yield a posterior which has a uniform probability with eccentricity, e i , for the ith planet (Eastman et al. 2013).
In addition to the eccentricity prior, we place smooth bounds on the parameters. For each bound we choose upper and lower limits for which the prior starts to transition from 1 to 0 with a cubic dependence. For the bound on a function of our parameters of value ξ we specify ξ≤ξ 1 3y 2 − 2y 3 ; y = ξ−ξ1 ξ2−ξ1 ξ 1 <ξ<ξ 2 1 ξ 2 ≤ξ≤ξ 3 3y 2 − 2y 3 ; y = ξ4−ξ ξ4−ξ3 so that the total prior is given by where the values of ξ 1 −ξ 4 and each transformation of parameters, f ={f j (x); j=1, ..., N bound }, are given in Table 13, where N bound = 4N p + 2. The prior probability, then, is given by Π(x), which we multiply by the likelihood function before sampling.   (Table 14), and the forecast times (Table 15) are given in this appendix.  Table 14. Observed transit times with uncertainties, along with the mean, tpost, and standard deviation, σpost of the times from the posterior sample. Times are in BJDTDB − 2, 450, 000 while uncertainties are in days.