Dynamical tomography of gravitationally bound systems

We study the inverse problem of deducing the dynamical characteristics (such as the potential field) of large systems from kinematic observations. We show that, for a class of steady-state systems, the solution is unique even with fragmentary data, dark matter, or selection (bias) functions. Using spherically symmetric models for simulations, we investigate solution convergence and the roles of data noise and regularization in the inverse problem. We also present a method, analogous to tomography, for comparing the observed data with a model probability distribution such that the latter can be determined.


Introduction
Matter in the universe is usually contained in systems bound together by gravitation: planets and their moons, planetary systems, star clusters, galaxies, and groups of galaxies. Indeed, the modern concepts of gravitation and gravitational potential were brought about by the realization that a mathematically well definable universal force field must keep celestial bodies in their observed orbits. Newton's solution to the inverse problem of "What kind of a force keeps two pointlike bodies in an elliptic orbit around each other?" was the inverse-square law of attraction 1 . In such a force field, the equations of motion for a body can be written as where t ∈ R is the time, x ∈ R 3 describes the position of the body, and Φ : R 3 → R is the gravitational potential; here Φ(x) ∼ x −1 .
Outside isolated two-body configurations, the two-body solution only serves as a useful first approximation for systems dominated by one massive body; for such a class of systems, the perturbations from this solution can be examined analytically to a relatively high precision. The behaviour of a system consisting of a moderate number of bodies of more or less equal weight must be numerically integrated assuming the Newtonian potential for each pointlike mass. However, if the system consists of many (> 10 6 or so) gravitationally interacting bodies collectively forming the potential, we can assume it to obey some principles of statistical mechanics. Accordingly, we no longer analyze the motion of separate particles, but want to define the dynamical characteristics of the system as a whole, such as its global-scale potential field Φ(x), matter density ρ(x), etc. We define dynamical tomography as the inverse problem of deducing these smoothly distributed characteristics from kinematic observations. A more detailed definition of this large-scale generalization of Newton's dynamical inverse problem is given in Section 2.1. The term tomography can be used here in both intuitive and technical senses: we determine density/distribution functions inside a domain, and this can be done by employing the principles of tomographic analysis.
Large gravitationally bound systems have been studied for more than a century, and the corresponding theoretical framework is described in several textbooks, of which [2] is the most up-to-date one. However, the studies have mostly concentrated on the direct problem of creating numerically or analytically well tractable dynamical models, or of designing approximate system models to explain various observed phenomena, while the general inverse problem (as posed in sections 2 and 3 here) has received less attention. This is mostly due to the lack of data sufficient for a comprehensive analysis, and the scarcity of efficient methods of dealing with such data. New sky surveys such as LSST (Large Synoptic Survey Telescope, USA) and Gaia (EU), both starting at the beginning of the next decade, will provide a vast amount of kinematic data of the stars in our galaxy, making the construction of a robust machinery for inverse problem analysis essential. We describe here some theoretical and practical aspects of solving problems in dynamical tomography.
The structure of the paper is as follows: in Section 2, we review as well as define some basic concepts of large gravitational systems, and pose the problem in corresponding terminology. In Section 3, we examine the fundamental aspects and uniqueness properties of the inverse problem, and in Section 4 present examples in the simplified case of spherically symmetric systems. In Section 5, we redefine the problem in the sense of probability distributions, which is usually necessary in practice. Finally, in Section 6, we present conclusions and discuss future work and applications.
2. Defining large systems 2.1. Distribution functions. In this paper, we assume the system to be bound, stable, collisionless (d ≫ D, where d is the shortest distance between two bodies of size D) and in a steady-state equilibrium (for a discussion on these standard assumptions and their applicability regimes, see [2]). In practice, this is a good approximation for large stellar systems, much in the same way as the two-body solution is a good starting point for analyzing the solar system. The dynamics of such a system is completely described by the smooth phase-space distribution function f (x, v, t) of its constituent particles (stars), f : where v ∈ R 3 is the velocity. The function f pertains to the limits d → 0 (infinitely many particles), D → 0, and D/d → 0. In practice, f is interpreted as the quantity giving the expected number of particles in a phase-space volume via f d 3 x d 3 v, or as the probability density of finding a particle at (x, v).
The distribution function f essentially corresponds to fluid in six dimensions; the equation of motion for f is the collisionless Boltzmann (or Vlasov) equation (a special case of Liouville's theorem; see [2]). In terms of Poisson brackets, this reads ∂f /∂t = [H, f ], where H is the Hamiltonian. This implies that the flow of the fluid (the smooth motion of particle phase points through phase space) is incompressible. In this paper, we study the steady-state case where matter units move in their orbits in the potential field Φ while leaving the collective system configuration completely unchanged: ∂f /∂t = 0 = ∂Φ/∂t, i.e., f = f (x, v) and Φ = Φ(x). Moreover, the observational data, measurements of (x, v) for stars or equivalent units of matter, are effectively for one epoch: though they are measured over several years, the orbital periods are millions of years, so the data do not provide orbital information outside the linearized regime v = dx/dt. We also define a set Q including all the other characteristics than x, v that can be assigned to a star from observations, such as luminosity, temperature, chemical composition, etc. These may be direct observables or implicit quantities such as mass and age that can be expressed in terms of direct ones. They can be continuous or even discrete (taxonomy), in which case an integral over such a variable is understood in a suitable corresponding sense. Thus we have a full distribution function F (x, v; Q) of N +6 dimensions, where N is the number of the quantities in the set Q. At first sight, one might imagine that only mass could be a dynamically interesting quantity in Q, but, e.g., luminosity directly affects the observability and bias factors. Also, as will be shown below, the possibility to partition the observed particles into different populations with the aid of Q adds to the information content.
We can distribute f (x, v) further among the members of Q at each (x, v) with a function ϕ(x, v; Q): Definition 2.1. The full distribution function of number density in R 3 x × R 3 v and in the set Q of N quantities q (generic notation for a member of Q) is The distribution function f i (x, v) of an object population P i is given by where the population filter P i (x, v) is where Λ i sets the (integration) limits for each observable of Q according to the given characteristics of P i .
An important distribution function is that of mass per Definition 2.3. The mass distribution function is given by where the mass factor M(x, v) is For example, the case of all particles having the same mass M 0 is given by where δ is the Dirac delta distribution. Thus M = M 0 . The amount of matter in an infinitesimal volume of phase space where the velocity domain V + includes all velocities v that can exist at x in a bound system; alternatively, for v / ∈ V + we define f (x, v) = 0 and can integrate over all v. The matter density and the potential are related through Poisson's equation: where G is the universal gravitation constant. Note that iff is the distribution of all matter, equations (1), (8) and (9) must be simultaneously fulfilled. The search for such self-consistent solution pairs off , Φ is the fundamental problem of stellar dynamics. However, as we will show in Section 3, f (orf ) does not need to cover all matter in our problem; it can describe a selected population, or there may be unobservable matter such that the full f cannot be determined while Φ can.
The distribution function f is a statistical tool, meant to describe the average distribution of matter in large enough bins. As mentioned above, another useful practical interpretation of f is the probability density of observing a star (or, for F (x, v; Q), a star with properties Q) at (x, v); this circumvents the local-scale distribution of matter in space, and the size of our averaging bins -the collisionless approximation means (and observations show) that most of space is void of luminous matter. In section 5, we will use this interpretation to redefine the observables of our inverse problem.
We can now state our problem of dynamical tomography as follows: Problem. Given a large number of observed (x, v) (for any motion markers such as stars or other matter and possibly in different populations P i ) in a domain Ω ⊂ R 3 × R 3 in a gravitationally bound steady-state system, deduce the potential Φ(x) of the system, and the distribution function(s) f (x, v) of the observed matter in Ω.
Obviously, if the number of observations is extremely large and all matter in the system is observed, the problem is trivial. The number of observations in arbitrarily small phase-space volumes is now nonzero, so an accurate estimate of the distribution f (x, v) is obtained by simply counting the objects, assuming that their masses are known. From this, Φ(x) follows; one could, as a shortcut to Φ(x), directly count objects in R 3 x only. What makes the problem an inverse one is that neither of the requisites is fulfilled in reality: the number of observations, though high, is not sufficient for direct counting in small enough phase-space (or even R 3 x ) bins, and, above all, not all of the system matter is observed because of bias factors or non-luminous matter. This is why we need to consider R 3 x even if we only want to obtain Φ(x): the fraction of unobserved matter can only be inferred by using a priori information furnished by the expected dynamical properties of the system. Our goal is now to find a parametrization in which the dynamics and a priori constraints of the system can be expressed in a practical and preferably analytically tractable way. Here we neglect effects such as spatial correlation between stars, changing mass (shedding or accretion) due to stellar evolution, etc. [2].
Some basic properties of the system can already be deduced from a very small number of observations. For example, if we observe matter at distance r from the centre of the system, moving at speed v r relative to us, and assume it to be bound by the system, we can roughly estimate the mass M inside r, e.g., by assuming a spherical mass distribution and a circular orbit around the centre: M = v 2 r r/G. In this and similar ways it has been deduced from several observations of galaxies and their clusters that most of the mass in the universe is contained in dark matter: non-luminous and extremely hard if not impossible to observe directly.

2.2.
Jeans theorems and integrable systems. As is well known, a natural and compact way of parametrizing the desired dynamical properties of the system can be achieved via its dynamical invariants. A function I(x, v), evaluated at [x(t), v(t)] of any orbit in the potential Φ of the system, is an integral of motion if for all orbits at all t. Evaluating the time derivative by the chain rule and substituting the equation of motion (1), we obtain an equation exactly of the form (1) (I corresponding to f ) with the steady-state condition ∂f /∂t = 0. Thus we arrive at the Jeans theorem (see, e.g., [2]) that states that any steady-state solution f of (1) is of the form f [I i (x, v)], where I i are integrals of motion. For example, the energy E(x, v): is always an integral in gravitationally interacting systems.
To have a complete set of integrals which we can, in principle, reconstruct and thus properly use in our formulation of the problem, we further restrict ourselves to regular orbits and integrable systems (defined below), a subset of the steady-state solutions of (1). This class of solutions, though restricted, has two great advantages: i) it facilitates an analytical or semianalytical investigation of the problem as well as the derivation of uniqueness results; and ii) it can be readily viewed as a further modifiable approximate representation of the full class of solutions. An orbit is regular (quasiperiodic) if it is confined to a 3-torus S 1 × S 1 × S 1 in phase space [1]. Then it has three integrals (also called isolating integrals) I i , i = 1, 2, 3 that can be used to define the torus: The isolating integrals can be given in an arbitrary basis as functions of I i are isolating integrals as well. A proper basis set of isolating integrals maps each torus T as T ↔ I i , i = 1, 2, 3, i.e., any such bases can be mapped one-to-one into each other: If a potential Φ(x), x ∈ R 3 creates an integrable system, all orbits are confined to 3-tori in R 3 x × R 3 v that are each defined by three action (Poincaré) integrals J i , i = 1, 2, 3, a class of isolating integrals I: where p ∈ R 3 and q ∈ R 3 are any canonically conjugate momenta and coordinates, and P i is a path that cannot be continuously deformed into a point. For other paths, the integral vanishes. There are three such sets of possible paths each producing one J i ; see, e.g., [1,2]. J ∈ R 3 J plays the role of canonical momentum in Hamilton's equations, and its canonically conjugate coordinate pair is the angle variable θ ∈ R 3 . (J, θ) are related to phase-space coordinates (x, v) by a canonical transformation, and energy (Hamiltonian) depends on J only: E = E(J).
Action-angle formalism yields the time averages theorem [2] that states that the density of the orbit on the torus, i.e., the average time it spends on different parts of the torus, is evenly distributed in θ on its surface. This holds for non-resonant, i.e., not closed orbits that make up almost all of phase space. This, in turn, leads to the strong Jeans theorem that essentially states that, in an integrable system, a steady-state solution of (1) must be of the form (12) f where I i are isolating integrals. If we want a steady-state solution for the luminous matter in the system, the distribution function of our choice is the mass distributionf (x, v), i.e., we now assume thatf =f [I(x, v)]. Thus, if we want f andf to be interchangeable in our steady-state equations and theorems to be derived in section 3, we must assume the mass factor M to be of the form M = M[I(x, v)] as well, i.e., ϕ(x, v; M ) = ϕ[I(x, v); M ]. Leaving the masses unknown is the advantage in using f , but in practice one should solve the inverse problem withf as well by weighting the count of each observed point (x i , v i ) by the corresponding mass M i , and then compare the results (e.g., the tori obtained).
The fact that the system as a whole is a steady-state and integrable one does not generally imply that a population filter . But if such filters (i.e., steady-state populations) can be assumed to be identifiable at least in some domain of Q and R 3 x × R 3 v , they are very useful in the inverse problem, as will be shown in section 3.

Inverse problem
3.1. Uniqueness. By virtue of the strong Jeans theorem, we now seek integrable systems that reproduce the observations. Since the orbits of all particles, observed or not, are on tori, and f (x, v) is of the form (12), we know that each isosurface , regardless of the group of objects it represents, entirely consists of 3-tori on which I(x, v) = const. To show that these isosurfaces determine the potential Φ(x) of the system, let us first present the following lemma: Lemma 3.1. Let the potential Φ(x) generate an integrable system, and let T denote the corresponding set of 3-tori in R 3 ×R 3 . Then Φ(x) is the only integrable potential (up to an additive constant) that creates any chosen subset T of arbitrarily small patches Γ on any tori of T such that T covers all of R 3 x (accessible to the system) in a connected manner.
Proof. The value E of the energy (10) is constant on each torus. Let us choose a where E 0 is the energy on the corresponding torus. Thus Φ(x) is defined up to an arbitrary constant for all x of the torus. If we have another patch Γ 1 with a point x = x 0 common with Γ 0 , and the corresponding phase-space points on the two patches are, respectively, (x 0 , v 0 ) and (x 0 , v ′ ), the value E 1 of the energy on Γ 1 must be This chain of patches can be continued to cover Φ(x) everywhere (up to the arbitrary constant chosen for E 0 at the beginning).
Remark 1. The connected manner thus means that Γ on different tori can be arranged in a sequence by common values of x as above. Actually, any patch on a torus can even be replaced by a set of two points that do not have to be close to each other: to establish the connected chain for Φ(x), we only require two points with different x on one torus, and one of the x shared with a point on another torus so that all x are covered.
Remark 2. The lemma can also be expanded to concern all potentials (not just integrable systems) by defining Γ to be sections of orbits having common points x. In fact, just one chaotic orbit is sufficient as it eventually defines Φ(x) at all x ∈ R 3 . More generally, Γ can denote parts of any structures on which E is constant, or parts of isosurfaces of any functions of the form f (E). Note that an isosurface need not be connected, i.e., f (E) need not be monotonous. One value of f can correspond to more than one value of E as long as the branches of different E can be identified: Γ ↔ E.
The lemma states that even highly fragmentary information on the shape of the tori in phase space is well sufficient to determine the integrable potential Φ(x) uniquely. Now, let independent distribution functions . The 3-surfaces formed by the intersection of three 5-surfaces f i (I) are 3-tori (defined by I as well). Then, by the lemma, any collection of parts of surfaces f i = const sufficient to determine a connected chain of torus patches uniquely determines Φ(x). We can now state the following uniqueness theorem: of matter in an integrable system be defined in some common domains of R 3 × R 3 such that a set of parts of the surfaces f i (x, v) = const forms a succession of torus patches connected in x. Then the f i (x, v) uniquely determine the potential Φ(x). Remark 3. We can pose this in the integral space R 3 I as well: we assume that the f i (I), I ∈ R 3 , are such that the set of equations {f i (I) = C i } has a nonzero and finite number of solutions I for sufficiently many {C i } such that a connected sequence of patches can be constructed (both I and f i (I) are, of course, unknown prior to the determination of Φ(x)). Note that the number can be larger than one, i.e., f i (x, v) need not form a proper basis of isolating integrals: if a set {f i = C i } corresponds to more than one torus (more than one set of , it is sufficient to be able to distinguish the different tori in the sense of the lemma (cf. remark 2 of the lemma). Remark 5. Most of all the feasible ways of filling R 3 × R 3 with invariant tori, each with a constant J of (11), are not derivable from any potential (cf. [7]). Thus there usually is no integrable potential φ other than Φ that, with its tori I φ and some distribution function The theorem is constructive for three f 's from which we directly get Φ(x) with the procedure of the lemma, while the case of one f (remark 5) is non-constructive. Finding the integrable Φ(x) corresponding to one f is not obvious since there are no general procedures for finding and exploring integrable potentials [10]. The set of potentials giving rise to integrable systems is known to be a vanishingly small subset of all potentials [10], and there is no such ǫ = 0 that, for an integrable φ(x), φ(x) + ǫϕ(x) is still integrable for an arbitrary ϕ(x). In practice, we can circumvent this difficulty by allowing the use of non-integrable potentials and approximate tori, i.e., we construct a set of approximate integrals I(x, v) for a potential Φ(x) [6]; this approach will be discussed in section 6.
The importance of isosurfaces in an integrable system is that they relieve us from having to record the time evolution of orbits. If the orbital tori are definable, we only need to probe phase space at one moment. It is important to note that f (x, v) does not need to cover all matter in the system (then the single-f uniqueness would be trivial for all systems): it can represent any fraction of the matter, even virtually massless test particles, as long as it is defined on all tori and it has settled to a steady-state equilibrium. This underlines the fact that we are interested more in the geometric structures in R 3 × R 3 (in practice, in a sufficiently large domain Ω of observations) rather than the actual distribution function f itself.
The fact that we do not need to observe all of the matter in the system to determine Φ(x) is of crucial importance. Because of the abundance of dark matter, a large part of all matter will not be seen even in ideal conditions. If, for example, f represents all luminous matter, we immediately obtain the mass density ρ D (x) of dark matter by subtracting (8) from (9) : Inverse Problems and Imaging Volume X, No. X (200X), X-XX Another cause of unobservability is the inevitable fact that our instruments are not sensitive enough, or that, e.g., interstellar dust prevents us from seeing some regions of space properly. Such effects can be described by the bias or selection function γ(x, v), 0 < γ ≤ 1, that represents the fraction of observable matter of The potential Φ(x) can be determined uniquely even in the presence of γ(x, v) as we immediately can see if we suppose that we have independent f i (x, v) > 0 available that share the same γ(x, v), since now the ratios of f i can be viewed as a new basis of isolating integrals: . . , 4, of a bias function 0 < γ ≤ 1 and four independent steady-state distribution functions f i be defined in R 3 × R 3 . Then the three ratios uniquely determine the potential Φ(x) as in theorem 3.2.
Remark 6. As above, this emphasizes the combined information content of different distribution functions. Analogously with remark 5, we can stipulate for two f i that in most cases two products Remark 7. The bias function γ(x) may depend on x only; γ : R 3 x → R. In such cases the additional factor γ(x) does not really increase the possibility of the existence of another integrable potential φ(x) and bias function ξ(x), 0 < ξ ≤ 1, such that ξf ′ = γf everywhere. This would require there to be an isolating integral of φ(x) that everywhere matches an isolating integral of another potential (Φ(x)) when multiplied with a function of x only (ξ(x)/γ(x)), while general isolating integrals are mixed functions of x, v, and individual potentials. No examples of such transformations are known. Thus, usually the single product γ(x)f (x, v) determines the potential Φ(x) of the system.
The above results mean that dark matter, selection bias, or fragmentary data are no fundamental obstacles to dynamical tomography. It should be noted that, for near-integrable systems in the sense of Kolmogorov-Arnold-Moser (KAM) theorem [1], most of our results for integrable potentials still hold to a high precision. This is because they are based on the toroidal topology of motion in phase space, and almost all motion in near-integrable systems still occurs on tori that can be viewed as perturbed versions of the invariant tori of an integrable system. Even though the geometric structure of orbits in phase space changes radically as we move away from integrability, allowing chaotic orbits and substructures of foliated tori around closed orbits, the transition is smooth dynamically, i.e., motion is still mostly quasitoroidal within a given resolution [10].
The solution of the inverse problem is based on the large number N of observed (x, v), from which the distribution function f (x, v) can be estimated. With smaller effective number of phase-space dimensions than six, i.e., in the case of potential symmetries, N may be sufficient for actual binning in sufficiently small volumes ∆x∆v in the reduced phase space. Thus we sample f (x, v) directly by counting and can take it to be our observable, as in section 4. For six dimensions, we can, e.g., take random samples of marginal distributions to be our observables, as will be discussed in section 5.
3.2. Self-consistency regularization. If we assumef to represent all matter in the system, the self-consistency requirement off and Φ can be used as regularization. For example, we can minimize the discrepancy between (8) and (9) at each space volume with the function where λ is a suitable weight factor, and ρ D (x) is given by (13). In practice, this can be replaced by (14) λ for a large number of test locations x i , written in a manifest χ 2 -form. As will be discussed below, it is sometimes better to compare potential values rather than densities at test points due to numerical instabilities in evaluating Poisson's equation directly.
If any amount of dark matter is allowed, no regularization such as (14) can be used if there is no information on the masses of the stars, i.e., observations correspond to the number density, and the estimated mass would only be based on the regularization. This is easy to see by considering the case of constant density fraction of luminous matter: ρ lum (x)/(ρ D + ρ lum ) = const < 1 everywhere in R 3 x . Regularization of the form (14) would then always yield exactly ρ D = 0, regardless of the true density ratio. If masses are included in observations, i.e., we would solve forf even without regularization, (14) can be employed, though minimizing the amount of dark matter is not necessarily an appropriate principle then.

Examples in spherical symmetry
Spherically symmetric systems have the advantage of being integrable, as now the energy E(x, v) as well as the three components of the angular momentum P (x, v) = x ∧ v are isolating integrals; the fact that there are four integrals instead of three is reflected in that each orbit is constrained to a plane (e.g., [2]). The original Jeans' theorem applies only to systems without such degeneracies, but it can be extended to the spherical case [11], stating that any steady-state distribution function in a potential Φ(r), where r = x is the radius, must be of the form f (E, P ). Further, since the system is spherically symmetric, the direction of P is uniformly distributed over the unit sphere S 2 , so we are only interested in functions of the form f (E, L), where It is customary (e.g., [2]) to use the concepts of relative potential and energy Ψ and E such that where Φ 0 is chosen such that f > 0 for E > 0 and f = 0 for E ≤ 0 (usually, Φ 0 = 0). We now study potentials Ψ(r) and distribution functions of the form f (E, L). In this section, we simplify the notation by using f (E, L) directly to denote the mass distribution function (i.e.,f ) as it is the form the analytical treatment here always refers to. For simplicity, let us also assume all the stars to have the same mass m. The systems as well as inversion methods used in this section are simple ones, as the goal is to investigate the basic properties of the inverse problem rather than develop a full-fledged analysis machinery. Our phase space is now essentially reduced to radius r and velocity v in a plane. Because of this reduction in variables, spherically symmetric cases offer analytical tractability to a much larger extent than other systems. Of course spherical symmetry rarely occurs in actual gravitational systems, but it is highly useful for exploring the basic properties of the inverse problem (uniqueness, stability, the effect of sampling density, convergence of solution, etc.) in practical computation. Indeed, because of their analytical prospects, spherical systems and distribution functions of the form f (E, L) have enjoyed ever-continuing popularity in theoretical dynamics for over a century.
Since now we know the exact functional forms of the isolating integrals, we can immediately see that the γ(x)-uniqueness of remark 7 holds exactly for any nontrivial f (E, L) and Ψ(r). All distribution functions f (x, v) now have to be of the everywhere (note that the bias function γ does not have to be spherically symmetric). Thus we can state the following uniqueness theorem:   [4]. Examining the forms of the isolating Stäckel integrals, we can immediately see that corresponding uniqueness results hold for all steady-state systems governed by Stäckel potentials Υ(x).
The simplest nontrivial distribution functions are obtained by just removing Ldependence and using isotropic f (E). The solution for a spherically symmetric system whose f (E) is self-consistent with its density ρ(r) and the corresponding potential Ψ(r) is now simple to obtain. By writing (8) in spherical symmetry and changing the integration variable v → E, we obtain an Abel integral equation for f (E), the solution of which is the well-known Eddington's formula [2] (17) where Ψ is used as the argument of ρ via Ψ(r) → r(Ψ); see [2] for the conditions for f (E) to be non-negative, etc. Note that (17) can naturally be used for nonself-consistent distribution functions (systems containing dark matter) as well. For example, we can solve for the distribution function f lum (E) of luminous matter from the luminous fraction of the total density: ρ lum (r) = ̺(r)ρ(r), with a given 0 < ̺(r) < 1, thus replacing ρ → ̺ρ in (17) while retaining the Ψ(r) corresponding to ρ(r).
With (17), we can simulate observations of the distribution function of a spherically symmetric system, and then numerically check how well its potential can be recovered. If self-consistency is used for regularization, Poisson's equation (9) is simple to integrate, so the regularizing function corresponding to (14), but in terms of potential rather than density, can be given as where u := v . When using gradient-based methods in optimization, integrals of the form (18) must be differentiated algorithmically for consistency, i.e., by taking the derivatives of the numerical algorithm for evaluating (18) rather than first taking the analytical derivative of (18) and then evaluating the integral numerically. The lemma (remark 2) and theorem 4.1 ensure that there is a unique solution for both the potential Ψ(r) and the bias function γ(x). The χ 2 -function of our inverse problem can now be represented as where N i is the number of stars observed in the region Ω i , and m i = m. To ensure positivity, let us model our potential and distribution function as We could include an explicit 1/E-term in the exponential sum for f (E) to make sure that f → 0 as E → 0, but this turns out to be neither necessary nor useful in practice. Because of the adopted form of Ψ(r), it is especially useful to employ (18) rather than density regularization as Poisson's equation would introduce a singularity for one term. Simple generalizations of the r −1 -type Kepler potential of a point mass already yield interesting examples of systems of continuous mass distribution. In fact, dynamical systems whose potentials decrease much faster than this exhibit instability problems [2]. This is another reason why a number of variations of the r −1 -theme have been developed. For example, the isochrone potential where M is the mass of the system and b > 0 a scaling constant, with its density pair ρ IC (r) given by Poisson's equation, yields the distribution function [2] f whereẼ = Eb/(GM ). We will use this in our numerical examples below. We simulated observed N i by creating 400 randomly distributed Ω i in (r, u)space (the extents of each Ω i about 1% of the maximum widths in r and u, with a suitable cutoff value r max ≫ b) and computing the amount of mass in them from f IC . Any distribution function corresponding to an r −1 -type potential drops fast as r increases; thus those Ω i close to r = 0 contain most of the mass and dominate χ 2 if the volumes of Ω i are equal. A more sophisticated observation sampling could use increased volumes away from r = 0, but in any case this sampling is artificial (as is the symmetry setup) and designed rather to probe the numerical properties of the problem than to simulate realistic conditions.
The objective function (19) was minimized with the Levenberg-Marquardt gradient-based procedure [13]. The initial guess for Ψ(r) was based on the top velocities at each r, and the initial f (E) simply had a correspondingly scaled zeroth-order term and a first-order term anticipating f → 0 as E → 0. Even with this crude initial values, the procedure converged well towards the correct minimum, i.e., χ 2 does not have many local minima far away from each other in the parameter space at least with reasonably low truncation degrees of (20). The minimum is not particularly sharp, i.e., there are many virtually as good solutions (χ 2 -level varying a few percent) close to each other even in the noiseless case, as expected because of the insufficiency of the model ("model noise"), in particular near the centre r = 0.
The solution was stable to typical error levels of data: adding 5% or even 10% noise to the data values resulted in virtually the same Ψ(r) and f (E). An exponential bias function γ(x) = exp(− x − x 0 /R 0 ) with the centre at different values of x 0 was well solved for by the procedure, together with Ψ(r) and f (E). Fig. 1 a  and b show the solution Ψ(r) and f (E) (dashed lines), obtained with coefficients up to degrees 4 and 5, respectively, against the actual Ψ IC and f IC (solid lines) with GM = 1 and b = 0.5 (r max = 10), a noise level of 5%, and a bias γ(x) with R 0 = 5 and x 0 = 2. With the bias, the sample cells in R 3 x were defined by r and the polar angle θ (x 0 was placed at r = 2 and θ = 0).
Including the regularization term (18) for a self-consistent case improved the Ψ(r) fit at large r; as a counterexample, Fig. 1 a shows the case without regularization, corresponding to allowing dark matter; with strong regularization, the lines essentially coincide at all r < r max . The slight "overshooting" of Ψ(r)-and f (E)-models at the centre of the system (at r = 0 and E = Ψ(0)) is partly an intrinsic limitation of the polynomial model.
The form f (E, L) for the distribution function describes the anisotropy of velocities; there are now infinitely many f (E, L) corresponding to a given density ρ(r). Creating general functions of this form even for spherical symmetries is not as straightforward as Eddington's formula for f (E). The Osipkov-Merritt procedure [2] is a standard technique for creating a family of self-consistent distribution functions with an adjustable anisotropy parameter, but it is heavily restricted as it still essentially uses Eddington's formula by fixing a combination U = E − cL 2 and stating f = f (U).
In our case we can (and must) allow the existence of dark matter, so we can carry out the simulations without the self-consistency requirement. Thus we can simply use, e.g., where f 0 (E) is any suitable isotropic function, and the function 0 ≤ h(E, L) ≤ 1 can be chosen to represent any desired dynamical characteristics of the observed matter (anisotropy of angular momentum distribution in various regions, etc.) via some adjustable parameters. The density ρ(r) from this is always ρ(r) ≤ ρ 0 (r), and the difference ρ 0 − ρ is attributed to dark matter; whether or not this interpretation is realistic is not relevant to this study. Suppose we want to describe a system with central orbits mostly radial and those reaching high radii mostly tangential. Then we can choose, for example, where 0 ≤ d ≤ 1 is an amplitude factor, E m is the maximal E, and the maximal angular momentum L m = r u(r, E) occurs for given E atr for which dL m (r)/dr = 0, i.e., L m (E) is obtained from where Ψ 0 (r) corresponds to ρ 0 (r).
Using the isochrone for f 0 (E) = f IC (E) and setting G = 1, we have  Fig. 1, and with the same computational procedure for solving the inverse problem, is shown in Fig. 2b. The (r, v)-space is here given by the dimensions (r, v r , v φ ), where v r = dr/dt is the radial velocity and the tangential velocity is v φ = u 2 − v 2 r , i.e., L = rv φ . With the dark-matter scenario, we have no self-consistency regularization (and m = 1 in the f (E, L)-version of (19)). The model for f (E, L) was taken to be with the largest degree of 5 for both E i and L j . The two-dimensional polynomial series may not be the best general representation for f (E, L) (cf. the intrinsic limitations mentioned with Fig. 1), but in this case it at least appears to lead essentially to as good a convergence and solution as in the one-dimensional polynomial case.

Distribution function as probability distribution
To be able to solve the inverse problem in terms of probability distributions and without binned sampling, we need to define how to compare a model distribution and observations, i.e., a sampled realization of some probability distribution. Direct comparison methods of distributions are based on cumulative distributions as the latter are well-defined non-binned observables. However, cumulative distributions are uniquely defined only in one dimension though there have been some attempts at two and three dimensions; see [13] and references therein. This dilemma is solved by noting that the one-dimensional marginal distributions of our f (x, v) exactly correspond to the usual line projections in the projection-slice theorem, Radon transform and related tomographic analysis; see, e.g., [3,9] and references therein.
In general terms, let f (x), x ∈ R N , be a probability distribution, and let R, an N × N orthogonal rotation matrix, det R = 1, describe a linear transformation to a new basis in R N : w = Rx. The marginal distribution function h(z) along z, any one of the new coordinate axes denoted by the set W (R), is and its cumulative distribution function C(z) is usually with z min → −∞. Since h(z) = dC(z)/dz uniquely defines h(z) from a given C(z), we know from tomographic theory (hence this inverse problem can be called dynamical tomography) that: The probability distribution f (x), x ∈ R N , is uniquely determined by the cumulative distributions C(z) of its marginal distributions h(z) along all line directions in R N that define the coordinate axis directions of z. For example, in R 2 the lines are defined by all possible values of the rotation angle θ ∈ S 1 , while in R 3 they are given by all the directions ω ∈ S 2 .
Two one-dimensional distributions can be compared via their cumulative distribution functions. In the case of observations and a model, we denote the observational distribution of K observations at z i (arranged in ascending order by z) by for number density, or, if mass is included in our problem, with S K (z) = 0 for z < z 1 . Now we are using cumulative distributions S K as our observables. The comparison between S K (z) and a model C(z) can be done with a number of norms; with the usual L 2 we have χ 2 -comparison. Another choice often used is the L ∞ -norm giving simply with which one can use the Kolmogorov-Smirnov (K-S) probability 0 ≤ P ≤ 1 of matching marginal distributions [14,13], when both S K (z) and C(z) are normalized to unity by usually with z max → ∞, and one normalizes the obtained f afterwards such that f d N x = K. This, however, distorts the comparison as the high-z ends are now always matched at unity at the comparison stage. Also, K-S probability P is defined by a series with each term a power of which means that, for a large number of observations K, the model C(z) must be very close to the observed S K (z) (KD 2 < 10) for the formal probability P (d KS ) to have any computationally useful level above zero. Even if the observations were perfect, the model cannot reach such a fit, so usually P (d KS ) → 0 regardless of the model.
It is thus practicable to use the χ 2 from the comparison pairs at each z i ; if K is very large, the comparison can be done for a smaller set of points in each z via pruning or interpolation as the curves of S K (z) are now so smooth that this loses virtually no information. Thus we can measure the χ 2 -sum of marginal distribution comparisons at each chosen coordinate line direction i and its points z j : In principle, we could use an arbitrarily large number of projection lines z (i) as, in contrast to standard tomography, there are no limiting factors for choosing them (except for computation time). However, we can expect that even a very limited number of projection lines is compensated for by the strong a priori information (or assumptions) on f (x, v). As is known from, e.g., limited-angle tomography, and its marginal distributions in r and u are shown as solid lines in Fig. 3 a and  b, suitably normalized to scale between 0 and 1. The cumulative distributions S K in r and u from the random samples are shown as dashed lines (again suitably normalized). The best-fit model solution C to (32), with Ψ(r) and f (E) of (20) as earlier, matches these well (dot-dash); its marginal distributions are shown as dotted lines. The solution is virtually the same as in Fig. 1, so the tomography of the probability distribution is a practical approach. In particular, just the two distributions S K (r) and S K (u) were already sufficient for obtaining the solution: due to the highly restricted form of f (x, v), no more projection lines were needed.

Conclusions and discussion
We have defined the inverse problem of dynamical tomography and shown that, with suitable assumptions and mathematical tools, the problem is well-posed and solvable. The uniqueness theorems are central to dynamical tomography: they demonstrate that the steady-state assumption allows a unique solution even with fragmentary data, unobservable mass, and observational bias functions.
Another important concept is the possibility to approximate general (or at least near-integrable) systems with integrable ones. Near-integrable systems that are not very old, but old enough to have settled to a quasistable state, appear closer to integrable than old ones as phase-space diffusion (Arnold diffusion) is not extensive yet. As Nekhoroshev's theorem as well as semianalytical approximations and numerical estimates show [10], the time scale for such diffusion grows fast in inverse proportion to the distance of the initial phase-space point from a KAM torus. For example, in [8] it was shown that even in chaotic zones it is possible to construct approximate tori such that orbits of the system resemble perturbed motion on these tori for small enough time intervals.
Torus construction [6] is a well-defined concept for near-integrable potentials [12]: if a Hamiltonian system is near-integrable in the sense of the KAM theorem, it is integrable on a Cantor set consisting of the surviving KAM tori of the system, i.e., it is possible to construct tori such that their defining action integrals indeed parametrize a global, ordered set. Via torus construction, we find a potential Φ(x) for which we can create an approximate set of tori (and determine expressions for their action integrals J and thus for distribution functions f [J(Φ(x); x, v)]) defining an integrable system such that the steady-state integrability assumption used here agrees with the observations as well as possible. Note that Φ(x) itself does not have to be integrable or even near-integrable; the torus set constructed defines an integrable Hamiltonian that is not usually derivable from a potential, so we never get an approximate integrable potential in the first place. This gives the derived Φ(x) some additional flexibility; indeed, any integrable potential approximating a real galaxy is probably not a very good representation. The key principle allowing the flexibility is the same as in the uniqueness theorems: we look for isosurfaces in phase space best explaining the observations. The goodness of our solution Φ(x) is not directly measured by its closeness to an integrable system.
Our Φ(x) should thus be able to mimic the real potential quite well in cases of near-integrable or even somewhat chaotic but not very old systems. The feasible potentials Φ(x) that can reproduce, via the torus-construction principle, the approximate isosurface structure of the observed matter distribution can all be expected to be close to each other. In fact, even if Φ(x) were integrable, we would have to use torus construction to find it even if we started with an integrable initial Φ 0 (x), since the iteration procedure (or equivalent) for fitting a model to observations will generally explore non-integrable potentials Φ(x).
Let us denote by H 0 the integrable Hamiltonian corresponding to the tori constructed for Φ(x), and by H the Hamiltonian of Φ(x). For optimal H 0 , the difference (in some chosen norm) δH = H − H 0 is as small as possible over the whole phase space; we call the corresponding tori the optimal tori of Φ, and H 0 the optimal Hamiltonian of Φ. If Φ is integrable, then its optimal tori are its invariant tori: δH vanishes everywhere. For our purposes, δH does not have to be small (although the smaller it is, the better). We surmise that: i) The optimal tori constructed for Φ(x) form a map Φ → H 0 . With the same construction scheme, a potential Φ(x) + ǫφ(x) yields an optimal Hamiltonian H 0 + ǫh 0 . ii) The isosurfaces of the distribution function of the system can be approximated by constructing them from a set of 3-tori describable by an integrable Hamiltonian H 0 (but not necesssarily by an integrable potential). Then our Φ can be expected to be a good approximation of the real potential of the system (as a regularizing constraint, we can choose, e.g., the smallness of δH ).
The possibility of constructing optimal tori and Hamiltonians holds another great advantage: a multitude of deviations from integrability can readily be modelled with a suitably tailored version of Hamiltonian perturbation theory [7]. This includes both structural detail (resonant orbit families and other details) and timelike irregularities (deviations from steady state).
In a forthcoming study, we will study more general and realistic systems such as axisymmetric and fully three-dimensional Stäckel potentials. We will also introduce more general observational biases and other factors, and investigate their influence. The final goal is the combination of a general torus-construction machinery and an analysis procedure for dynamical tomography, so that we can work with any potentials. With such tools, we can analyze the data from large-scale surveys and construct a consistent mathematical model of the dynamics of our galaxy. The real galactic problem can be expected to suffer from considerable model noise, so we should employ various forms of modelling the distribution functions and potential. In addition to mapping the density distribution of the dark matter in our galaxy, we should also be able to test whether distributions with alternative theories of gravity are distinguishable from standard models with dark matter.