1 Introduction

Einstein's general relativity (GR) and the standard cosmological model \(\Lambda CDM\) are very successful despite our present lack of understanding of important issues such as the cosmological constant, dark energy, dark matter, and the discrepancy between different measurements of the Hubble constant; see [1] for a recent review with more details. One possible modification of GR, which would be relevant for large scale physics and might eventually contribute to our understanding of the late time acceleration and the cosmological constant, is the existence of a graviton mass (\(m_{gr}\)). One of the striking successes of GR is the detection of gravitational waves coming from black-hole and neutron-star mergers [2]. Those experiments set an upper bound of about \(10^{-23}\) eV for \(m_{gr}\), which is very small but does not exclude a tinier graviton mass that could still be relevant for the accelerated expansion of the Universe in a pure massive gravity scenario; see [3, 4] for reviews on massive gravity.

Moreover, in order to figure out why the graviton is eventually massless we must, of course, start with some non zero mass and search for its consequences. In the early 1970s it was noticed [5, 6] that, no matter how small \(m_{gr}\) is, its consequences for solar system tests of the gravitational interaction would be disastrous. This is the known vDVZ mass discontinuity problem. It was soon realized [7] that the graviton mass introduces another scale in the theory which spoils the linearized approximation used in [5, 6] as \(m_{gr}\rightarrow 0\). Therefore, non linear terms must be considered; however, they lead to ghosts [8]. That problem remained for decades until graviton potentials were finally found which eliminate the ghost and solve the vDVZ problem altogether, see [9, 10] and also the review works [3, 4]. A general fiducial (fixed) metric was introduced in [11], and later the addition of a kinetic term for such metric gave rise to the bimetric model [12], which has a massive plus a massless graviton coupled to each other, see also the review work [13]. The bimetric model does not suffer from an unstable Friedmann–Lemaitre–Robertson–Walker (FLRW) solution, a problem present in the model of [9], see [14]. Moreover, in the bimetric model [12] the graviton mass does not need to be small, see [15] for recent detailed studies on observational constraints on the parameter space of the bimetric model which includes the Fierz–Pauli mass.

In earlier calculations [5,6,7,8], in the ghost free massive gravities [9, 11] and also in the bimetric model [12], the massive Fierz–Pauli (FP) model [16] is the paradigmatic starting point. It is a free theory for massive spin-2 particles where the linearized diffeomorphism (Diff) symmetry of the massless sector, linearized Einstein–Hilbert (LEH), is broken by the mass terms. Since the minimal symmetry for massless spin-2 particles is transverse diffeomorphisms (TDiff) instead of Diff, see [17], it is natural to search for the minimal way of describing massive spin-2 particles by adding mass terms to a massless TDiff tensor model. The question is: Is TDiff the minimal symmetry of the massless sector of a massive spin-2 theory? This has been investigated in [18, 19]. The TDiff tensor model describes in general massless spin-2 and spin-0 particles. The authors of [18, 19] have concluded that, although one could add a consistent mass term for the spin-0 sector without problems, there is no mass term in the spin-2 sector that might avoid the presence of ghosts. They have shown that such negative result holds also in the special case of the linearized unimodular gravity where the TDiff symmetry is enhanced to WTDiff (Weyl plus TDiff).

Instead of trying to obtain mass terms via an Ansatz one can follow another route and automatically generate them by means of a Kaluza–Klein [20, 21] dimensional reduction from \(D+1\) to D dimensions, restricted to one massive mode. Such procedure allows us to go, for instance, from Maxwell to Maxwell–Proca, from LEH to massive FP and holds also for higher spins [22, 23]. This has been done in [24] in the special case of the WTDiff model. We comment on the results of [24] and compare to ours whenever possible. In particular, we end up with simpler compact formulae for the reduced massive theory and clarify the issue of gauge fixing, which is subtle now due to the transverse condition on the vector symmetry of the massless theory.

Here we perform the KK dimensional reduction of an arbitrary TDiff tensor model in order to generate possible mass terms and look for generalizations of the resulting models in curved backgrounds, see Sects. 3 and 5 respectively. In Sect. 4 we analyse its content and present a covariant proof of unitarity. We find an interesting parametrization of the non conserved source in the massless TDiff tensor case. In Sect. 6 we draw our conclusions, stressing that the massless limit must have Diff symmetry instead of TDiff. If we insist on TDiff symmetry at \(m=0\) we have no possible mass term. The appendix shows, for the reader’s convenience, some technical details about spin projection and transition operators used to write down propagators.

2 The model and the notation

Here we closely follow the notation of [18, 19] but we use \(\eta _{\mu \nu }=diag(-1,1,\ldots ,1)\). In terms of a rank-2 symmetric tensor \(h_{\mu \nu }\) we start from the second order Lagrangian in \(D \ge 3\) dimensions

$$\begin{aligned} \mathcal {L}(a,b)= & {} -\frac{1}{4}\partial _\mu h^{\alpha \beta }\partial ^\mu h_{\alpha \beta }+\frac{1}{2}\partial ^\mu h^{\alpha \beta } \partial _{\alpha }h_{\mu \beta }-\frac{a}{2}\;\partial ^\mu h \partial ^\nu h_{\mu \nu }\nonumber \\&\quad +\frac{b}{4}\partial _\mu h \partial ^\mu h \quad . \end{aligned}$$
(1)

The first term is required for spin-2 propagation and the second one must be as it stands for the elimination of spin-1 ghosts [18, 19]. For any \(a,b\in \mathbb {R}\), \(\mathcal {L}(a,b)\) is invariant under TDiff (\(\partial ^\mu \xi ^T_\mu =0 \)):

$$\begin{aligned} \delta h_{\mu \nu } = \partial _\mu \xi ^{T}_{\nu } + \partial _\nu \xi ^{T}_{\mu }, \end{aligned}$$
(2)

It is convenient to split the discussion into three cases. The TDiff symmetry can be enhanced either to WTDiff (Weyl plus TDiff) or to arbitrary diffeomorphisms (Diff) according to a fine tuning of the coefficients \((a,b)\). Regarding the first case, the reader can check that the Weyl transformation \(\delta h_{\mu \nu } = \Lambda \eta _{\mu \nu }\) will be a symmetry of (1) only if \((a,b)=\left( 2/D,(D+2)/D^2\right) \). The corresponding WTDiff model \(\mathcal {L}\left( 2/D,(D+2)/D^2\right) \) is the linearized version of unimodular gravity [18, 19, 24]. The second case corresponds to \(a=1=b\). Now the symmetry is enhanced to arbitrary linearized diffeomorphisms (Diff): \(\delta h_{\mu \nu } = \partial _\mu \xi _\nu + \partial _\nu \xi _\mu \). In this case \(\mathcal {L}(1,1)=\mathcal {L}_{LEH}\) is the linearized version of the Einstein–Hilbert theory \(\kappa \sqrt{-g}R\) about flat background, \(g_{\mu \nu } = \eta _{\mu \nu } + h_{\mu \nu }/\sqrt{\kappa }\). It turns out that \(\mathcal {L}(1,1)\) is just one example of a one parameter family of equivalent Lagrangians. Through the invertible redefinition (\(\lambda \)-shift) \(h_{\mu \nu }\rightarrow h_{\mu \nu }+\lambda \; h \eta _{\mu \nu }\), with \(\lambda \ne -1/D\), we generate a family of models \(\mathcal {L}(A(\lambda ),B(\lambda ))\) of the same form (1), where

$$\begin{aligned} A(\lambda )&= a+\lambda (D\;a-2), \end{aligned}$$
(3)
$$\begin{aligned} B(\lambda )&= b+2\lambda (D\;b-a-1)+\lambda ^2 [D^2 b-D(2a+1)+2]. \end{aligned}$$
(4)

In particular, if we start with the linearized Einstein–Hilbert theory \((a,b)=(1,1)\), we generate a Diff family invariant under \(\delta h_{\mu \nu } = \partial _\mu \xi _\nu + \partial _\nu \xi _\mu -2\lambda \,\eta _{\mu \nu }(\partial ^{\rho }\xi _{\rho })/(\lambda \, D +1)\). Notice that there is no WTDiff family since \((a,b)=\left( 2/D,(D+2)/D^2\right) \) is a fixed point of the \(\lambda \)-shift. It turns out that for both the WTDiff model and the Diff family, the Lagrangian \(\mathcal {L}(a,b)\) describes physical massless spin-2 particles. Moreover, only in those cases do we have \(f_D(a,b)=0\), where we have defined a function of the parameters of the TDiff model in D dimensions:

$$\begin{aligned} f_{D}(a,b) \equiv 1-2a+a^2(D-1)-b(D-2). \end{aligned}$$
(5)

The above quantity shows up after eliminating \(\lambda \) from (3) and plugging it in (4), with \((a,b)=(1,1)\); accordingly we arrive at \(f_D(A(\lambda ),B(\lambda ))=0\).
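The two statements above, namely that the \(\lambda \)-shifted Diff family satisfies \(f_D=0\) and that the WTDiff point is a fixed point of the \(\lambda \)-shift, can be spot-checked with exact rational arithmetic. The following is only a minimal sketch: the helper names `f`, `shift` and `wtdiff_point` are hypothetical, introduced here for illustration, and a handful of sample values of D and \(\lambda \) is of course not a proof.

```python
from fractions import Fraction as Fr


def f(D, a, b):
    """Eq. (5): f_D(a, b) = 1 - 2a + a^2 (D - 1) - b (D - 2)."""
    return 1 - 2 * a + a * a * (D - 1) - b * (D - 2)


def shift(D, a, b, lam):
    """Eqs. (3)-(4): coefficients (A, B) after h -> h + lam * h * eta."""
    A = a + lam * (D * a - 2)
    B = b + 2 * lam * (D * b - a - 1) + lam * lam * (D * D * b - D * (2 * a + 1) + 2)
    return A, B


def wtdiff_point(D):
    """WTDiff coefficients (a, b) = (2/D, (D+2)/D^2) quoted in the text."""
    return Fr(2, D), Fr(D + 2, D * D)


if __name__ == "__main__":
    # Sample values only; lam = -1/D is excluded because the shift is not invertible there.
    for D in (3, 4, 5, 10):
        for lam in (Fr(1, 3), Fr(-2), Fr(7, 5)):
            A, B = shift(D, Fr(1), Fr(1), lam)
            print(D, lam, "f_D on Diff family:", f(D, A, B))
        a, b = wtdiff_point(D)
        print(D, "WTDiff point unchanged:", shift(D, a, b, Fr(5, 7)) == (a, b))
```

Exact fractions are used instead of floating point so that the vanishing of \(f_D\) is tested as an identity at each sample point rather than up to rounding.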

The third case is the main concern here and corresponds to pure TDiff models without enhanced symmetry. They describe massless spin-2 and massless spin-0 particles, the former are always physical while the latter ones are physical (definite positive Hamiltonian) whenever, see [18, 19], we have

$$\begin{aligned} f_{D}(a,b) > 0 \quad . \end{aligned}$$
(6)

This subset of models is what we call TDiff henceforth. For arbitrary \(a,b\in \mathbb {R}\), the model (1) may be identified with the linearized version of the following Lagrangian

$$\begin{aligned} \mathcal {L}= \sqrt{-g} \left[P(-g)R(g_{\mu \nu }) + Q(-g) g^{\mu \nu }\partial _{\mu }g\partial _{\nu }g \right]\quad . \end{aligned}$$
(7)

where \(P(-g)\) and \(Q(-g) \) are analytic functions at \(g=-1\) such that \(P(1)>0\), they are otherwise arbitrary. The linearization is obtained via \( g_{\mu \nu } = \eta _{\mu \nu } + h_{\mu \nu }/\sqrt{P(1)}\) with the identification \(a=1+2\, P^{\prime }(1)/P(1)\), \(b=1+4\, P^{\prime }(1)/P(1)+4\, Q(1)/P(1)\). Here we are mainly interested in massive theories obtained from (1) via dimensional reduction so we are not going to explore (7) anymore, see e.g. [25] for further developments and phenomenological applications when compared to general relativity.

It is known that TDiff is the minimal symmetry to have a Lorentz invariant S-matrix describing scattering of massless spin-2 particles [17, 26]. It is also the maximal subgroup of the diffeomorphisms. We point out that we can always treat TDiff as a one-parameter Lagrangian since, without loss of generality, we can use the \(\lambda \)-shift and shift \(a\ne \frac{2}{D}\rightarrow a=1\) while keeping b the only free parameter. Similarly, we can fix \(a=\frac{2}{D}\) and leave b free, always bearing in mind that \(f_{D}(a,b) > 0\) in both cases. The case \(a=1\) may not be shifted into \(a=2/D\) since this would require a non invertible \(\lambda \)-shift (\(\lambda = -1/D\)).

The Kaluza–Klein (KK) dimensional reductions of both Diff and, more recently, WTDiff models, see [24], have already been discussed in the literature. Their results will be compared to ours whenever possible, as will be explained in the next section.

3 Dimensional reduction

3.1 Particle content and unitary gauges

In this subsection, we present the Kaluza–Klein (KK) dimensional reduction of the TDiff model and comment on different ways of gauging the Stückelberg fields away. First, we write the TDiff Lagrangian in \(D+1\) dimensions, below \(A,M,N=0,1,\ldots ,D\),

$$\begin{aligned} \mathcal {L}_{D+1}= & {} -\frac{1}{4}\partial _A H^{MN}\partial ^A H_{MN}+\frac{1}{2}\partial _A H^{AM}\partial _N H^{N}{}_{M}\nonumber \\&\quad -\frac{a}{2}\partial _M H^{MN}\partial _N H+\frac{b}{4}\partial _A H \partial ^A H , \end{aligned}$$
(8)

It is invariant under \(\delta H_{MN}=\partial _M \xi _N^T+\partial _N \xi _M^T\) with

$$\begin{aligned} \partial ^M \xi _M^T=0 \quad . \end{aligned}$$
(9)

Notice that the requirement of physical particles in \(D+1\) dimensions means

$$\begin{aligned} f_{D+1}(a,b) = 1-2a-b(D-1)+a^2 D \ge 0 .\end{aligned}$$
(10)

Defining the cyclic coordinate \(x_{D}=y\), we suitably decompose \(H_{MN}(x,y)\):

$$\begin{aligned}&H_{\mu \nu }(x,y)= \sqrt{\frac{m}{\pi }} h_{\mu \nu }(x)\cos {my}, \end{aligned}$$
(11)
$$\begin{aligned}&H_{y\mu }(x,y)=\sqrt{\frac{m}{\pi }}A_\mu (x)\sin {my}, \end{aligned}$$
(12)
$$\begin{aligned}&H_{yy}(x,y)=\sqrt{\frac{m}{\pi }}\varphi (x)\cos {my}. \end{aligned}$$
(13)

The action in \(D+1\) dimensions is

$$\begin{aligned} S = \int d^{D+1} x \; \mathcal {L}_{D+1} = \int d^D x \; \int _0^{\frac{2\pi }{m}} dy \, \mathcal {L}_{D+1}. \end{aligned}$$
(14)

Integrating over y, we obtain the massive theory in D dimensions:

$$\begin{aligned} \mathcal {L}_D= & {} \mathcal {L}(a,b)-\frac{1}{4}F_{\mu \nu }^2 [A_\mu ]-\frac{m^2}{4}(h_{\mu \nu }h^{\mu \nu }-b\;h^2)\nonumber \\&+\frac{a}{2}h_{\mu \nu }\partial ^\mu \partial ^\nu \varphi -m\;h_{\mu \nu }\partial ^\mu A^\nu +m\;a\;h\partial _\mu A^\mu \nonumber \\&-m(a-1)A_\mu \partial ^\mu \varphi -\frac{1}{2}h[b\;\square +m^2(a-b)]\varphi \nonumber \\&-\frac{1}{4}\varphi [(b-1)\square +m^2(2a-b-1)]\varphi , \end{aligned}$$
(15)

where \(\mathcal {L}(a,b)\) is the massless TDiff model in D dimensions given in (1). Once again we split the discussion in three cases.
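As an illustration of the integration over y leading to (15), a sketch using only the mode functions (11)–(13), the elementary integrals needed are

$$\begin{aligned} \frac{m}{\pi }\int _0^{\frac{2\pi }{m}} dy\, \cos ^2 (my) = \frac{m}{\pi }\int _0^{\frac{2\pi }{m}} dy\, \sin ^2 (my) = 1 , \qquad \int _0^{\frac{2\pi }{m}} dy\, \sin (my)\cos (my) = 0 . \end{aligned}$$

The normalization \(\sqrt{m/\pi }\) is therefore what makes the D-dimensional kinetic terms canonically normalized. For instance, the y-derivative part of the first term of (8) gives

$$\begin{aligned} -\frac{1}{4}\int _0^{\frac{2\pi }{m}} dy\, \partial _y H^{\mu \nu }\partial _y H_{\mu \nu } = -\frac{m^2}{4}\, h_{\mu \nu }h^{\mu \nu }\, \frac{m}{\pi }\int _0^{\frac{2\pi }{m}} dy\, \sin ^2 (my) = -\frac{m^2}{4}\, h_{\mu \nu }h^{\mu \nu } , \end{aligned}$$

which is the first mass term displayed in (15).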

3.1.1 Massive Fierz–Pauli

If \((a,b)=(1,1)\), the \(D+1\) theory (8) is invariant under arbitrary diffeomorphisms without the restriction (9). Consequently, (15) becomes invariant under full Diff and U(1) transformations. We redefine the vector field in order to write down the gauge transformations in a diagonal form, i.e.,

$$\begin{aligned} \delta h_{\mu \nu } (x)= & {} \partial _\mu \psi _\nu + \partial _\nu \psi _\mu , \end{aligned}$$
(16)
$$\begin{aligned} \delta a_\mu (x)\equiv & {} \delta [A_{\mu } - \partial _\mu \varphi /(2\,m)] = -m\;\psi _\mu , \end{aligned}$$
(17)
$$\begin{aligned} \delta \varphi (x)= & {} 2m\;\Lambda . \end{aligned}$$
(18)

where \(\psi _{\mu }(x)\) and \(\Lambda (x)\) are \(D+1\) independent gauge parameters which stem from

$$\begin{aligned}&\xi _\mu (x,y)=\sqrt{\frac{m}{\pi }}\psi _\mu (x) \cos {my}, \end{aligned}$$
(19)
$$\begin{aligned}&\xi _y(x,y)=\sqrt{\frac{m}{\pi }}\Lambda (x) \sin {my}. \end{aligned}$$
(20)

The Lagrangian (15) becomes the usual massive spin-2 FP model in D dimensions written in terms of a gauge invariant field \(H_{\mu \nu }^{Diff}\),

$$\begin{aligned} \mathcal {L}_D(H_{\mu \nu }^{Diff})= & {} \mathcal {L}_{EH}(H_{\mu \nu }^{Diff}) -\frac{m^2}{4} \left[(H_{\mu \nu }^{Diff})^2-(H^{Diff})^2\right]\nonumber \\= & {} \mathcal {L}_{FP}(H_{\mu \nu }^{Diff}) \end{aligned}$$
(21)
$$\begin{aligned} H_{\mu \nu }^{Diff}\equiv & {} h_{\mu \nu } + \frac{ \partial _{\mu }a_{\nu }+ \partial _{\nu }a_{\mu }}{m} = h_{\mu \nu } + \frac{ \partial _{\mu }A_{\nu }+ \partial _{\nu }A_{\mu }}{m} - \frac{\partial _{\mu }\partial _{\nu }\varphi }{m^2}\nonumber \\ \end{aligned}$$
(22)

where \(\mathcal {L}_{EH}\) is the linearized Einstein–Hilbert Lagrangian.

It is clear from (16)–(18) that \(a_{\mu }\) and \(\varphi \) are pure gauge. The massive spin-2 content of (15), assured by the corresponding FP conditions \(\partial ^{\mu }h_{\mu \nu }=0=h\) and the Klein–Gordon (KG) equation \((\Box - m^2)h_{\mu \nu }=0\), follows from the unitary gauge \(a_{\mu }=0=\varphi \) which can be set at action level without affecting the particle content of the model. The gauge completely (uniquely) determines the gauge parameters \(\psi _{\mu }\) and \(\Lambda \), thus satisfying the “completeness” criterion of [27, 28] for a good gauge to be fixed at action level. This is a key issue for the elimination of the Stückelberg fields as we will see in the next subsection.
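As a short check, using only (16)–(18) and the definition (22), the gauge invariance of \(H_{\mu \nu }^{Diff}\) and the completeness of the unitary gauge can be made explicit:

$$\begin{aligned} \delta H_{\mu \nu }^{Diff}&= \delta h_{\mu \nu } + \frac{\partial _{\mu }\delta a_{\nu }+ \partial _{\nu }\delta a_{\mu }}{m} = \partial _\mu \psi _\nu + \partial _\nu \psi _\mu - \left( \partial _\mu \psi _\nu + \partial _\nu \psi _\mu \right) = 0 , \nonumber \\ \delta a_{\mu }&= 0 \;\Rightarrow \; \psi _{\mu }=0 , \qquad \delta \varphi = 0 \;\Rightarrow \; \Lambda =0 . \end{aligned}$$

Since both conditions are algebraic in the parameters, preserving \(a_{\mu }=0=\varphi \) leaves no residual gauge freedom.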

3.1.2 Massive WTDiff

In the case \([a,b]=[2/(D+1),(D+3)/(D+1)^2]\), the model (8) becomes the massless WTDiff model in \(D+1\). Such dimensional reduction has been investigated in [24]. Our Lagrangian (15) coincides with the corresponding one of [24] in the above case. Although it stems from a WTDiff model in \(D+1\), (15) is invariant under full Diff and Weyl transformations. In order to single out the pure gauge degrees of freedom we find it convenient to redefine the vector and scalar fields

$$\begin{aligned} a_{\mu }^W&= A_{\mu } + \frac{\partial _{\mu }(h-\, D\, \varphi )}{2(D+1)\, m}, \end{aligned}$$
(23)
$$\begin{aligned} \Phi&= \varphi + h. \end{aligned}$$
(24)

Accordingly, we have the following Diff and Weyl transformations:

$$\begin{aligned} \delta h_{\mu \nu }= & {} \partial _\mu \psi _\nu + \partial _\nu \psi _\mu + \eta _{\mu \nu } \, \chi \end{aligned}$$
(25)
$$\begin{aligned} \delta a_\mu ^W= & {} -m\;\psi _\mu , \end{aligned}$$
(26)
$$\begin{aligned} \delta \Phi= & {} (D+1)\chi , \end{aligned}$$
(27)

Notice that the U(1) parameter \(\Lambda \) has been eliminated via the constraint

$$\begin{aligned} \partial ^\mu \psi _\mu +m\,\Lambda =0. \end{aligned}$$
(28)

which follows from the higher dimensional transverse condition (9). The previous gauge transformations suggest the definition of a gauge invariant tensor as in the Diff case, namely,

$$\begin{aligned} H_{\mu \nu }^{WT}= h_{\mu \nu } + \frac{\partial _{\mu }a_{\nu }^W + \partial _{\nu }a_{\mu }^W}{m} - \frac{\eta _{\mu \nu }\Phi }{D+1} \end{aligned}$$
(29)

It turns out that the whole Lagrangian (15) reduces to the FP model:

$$\begin{aligned} \mathcal {L}_D = \mathcal {L}_{FP}(H_{\mu \nu }^{WT})= \mathcal {L}_{FP}\left( h_{\mu \nu } + \frac{\partial _{\mu }a_{\nu }^W + \partial _{\nu }a_{\mu }^W}{m} - \frac{\eta _{\mu \nu }\Phi }{D+1} \right) \end{aligned}$$
(30)

Notice that the replacement \(h_{\mu \nu } \rightarrow H_{\mu \nu }^{WT}\) is non trivial since it involves double derivatives of the tensor field itself. However, although the U(1) symmetry disappears, we can use the Weyl symmetry together with the Diff symmetry in order to eliminate pure gauge degrees of freedom and fix the unitary gauge \(a_{\mu }^W=0=\Phi \). Therefore, we recover spin-2 massive particles in D dimensions which is the expected result for a dimensional reduction of spin-2 massless particles in \(D+1\). The authors of [24] have chosen the partial gauge fixing \(a_{\mu }^W=0\) which leaves us with a massive model with Weyl symmetry. It amounts to the FP model with the replacement \(h_{\mu \nu } \rightarrow h_{\mu \nu } -\eta _{\mu \nu }h/D + \eta _{\mu \nu }\phi \) which comes from (30) after an invertible redefinition of the scalar field \(\Phi = (D+1)(h/D-\phi )\) where \(\phi \) is now Weyl invariant. We stress that such partial gauge does satisfy the completeness criterion of [27, 28], since \(a_{\mu }^W=0\) completely (uniquely) fixes the D parameters \(\psi _{\mu }\); consequently it can be fixed at action level without problems even though the Weyl symmetry is left unbroken.
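The transformations (26) and (27) can be recovered in a few lines. The sketch below assumes that, before imposing (28), the vector and scalar transform as \(\delta A_{\mu } = -m\,\psi _{\mu } + \partial _{\mu }\Lambda \) and \(\delta \varphi = 2\, m\,\Lambda + \chi \), i.e. as in (17)–(18) with the Weyl parameter entering only through the yy component; after using (28) this agrees with (32). With \(m\,\Lambda = -\partial \cdot \psi \) and \(\delta h = 2\,\partial \cdot \psi + D\,\chi \) from (25),

$$\begin{aligned} \delta (h - D\,\varphi )&= 2\,\partial \cdot \psi + D\,\chi - D\left( \chi - 2\,\partial \cdot \psi \right) = 2(D+1)\,\partial \cdot \psi , \nonumber \\ \delta a_{\mu }^W&= -m\,\psi _{\mu } - \frac{\partial _{\mu }(\partial \cdot \psi )}{m} + \frac{\partial _{\mu }(\partial \cdot \psi )}{m} = -m\,\psi _{\mu } , \nonumber \\ \delta \Phi&= \left( \chi - 2\,\partial \cdot \psi \right) + \left( 2\,\partial \cdot \psi + D\,\chi \right) = (D+1)\,\chi . \end{aligned}$$

Consequently \(\delta H_{\mu \nu }^{WT} = \partial _\mu \psi _\nu + \partial _\nu \psi _\mu + \eta _{\mu \nu }\chi - (\partial _\mu \psi _\nu + \partial _\nu \psi _\mu ) - \eta _{\mu \nu }\chi = 0\), as required for (29).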

We have found it interesting, from the point of view of the completeness criterion of [27, 28], to discuss a third gauge condition:

$$\begin{aligned} f_{\mu }\equiv A_{\mu }- \frac{c}{m} \partial _{\mu }h =0 \quad ; \quad \varphi =0 \quad . \end{aligned}$$
(31)

where c is so far an arbitrary real constant. Are we allowed to fix such gauge at action level? After this gauge is fixed, the action (30) only depends upon the tensor field. We know that the only viable choice for massive spin-2 particles is the FP model or at most a \(\lambda \)-shifted (\(h_{\mu \nu }\rightarrow h_{\mu \nu } + \lambda \eta _{\mu \nu }\, h\)) version thereof. Indeed, under gauge transformations the gauge conditions change as

$$\begin{aligned} \delta \varphi= & {} \chi -2\,\partial _{\mu }\psi ^{\mu } \end{aligned}$$
(32)
$$\begin{aligned} \delta f_{\mu }= & {} -m\, \psi _{\mu }-\frac{(1+2\, c)}{m} \partial _{\mu }(\partial \cdot \psi )- \frac{c\, D}{m}\partial _{\mu }\chi \end{aligned}$$
(33)

Under the residual transformations

$$\begin{aligned} \delta _{\gamma } \psi _{\mu } = \partial _{\mu }\gamma ; \quad \delta _{\gamma }\chi =2\,\Box \gamma \end{aligned}$$
(34)

we have

$$\begin{aligned} \delta _{\gamma }\delta \varphi = 0; \quad \delta _{\gamma }\delta f_{\mu } = - \partial _{\mu }\left[m\gamma + [1+ 2\, c(1+D)]\frac{\Box \gamma }{m}\right]\,\,. \end{aligned}$$
(35)

Therefore, if \(c\ne -1/(2(1+D))\) the above transformations tell us that the gauge conditions (31) do not completely (uniquely) fix the gauge parameters \(\psi _{\mu }\) and \(\chi \), since we can always add the \(\gamma \)-transformations (34) such that \(\gamma \) satisfies the Klein–Gordon equation \(\Box \gamma = - m^2\,\gamma /\left( 1+ 2\, c(1+D)\right) \). On the other hand, if \(c = -\frac{1}{2(1+D)}\) the gauge conditions will completely fix the gauge parameters and consequently we are allowed, according to [27, 28], to fix the third gauge at action level. In fact, it is only at that particular point that the gauge fixed version of (30) becomes a \(\lambda \)-shifted version of the FP massive theory, namely, \(\mathcal {L}_{FP}[h_{\mu \nu }-\eta _{\mu \nu }h/(D+1)]\). For any \(c\ne -\frac{1}{2(1+D)}\) the action (30) at the gauge (31) is not unitary [16, 29].
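The special role of \(c=-1/[2(1+D)]\) can also be seen directly from the definitions (23), (24) and (29); the following is a sketch of that substitution. On the gauge surface (31) we have \(\Phi = h\) and \(a_{\mu }^W = \left[ c + \frac{1}{2(D+1)}\right] \partial _{\mu }h/m\), so that

$$\begin{aligned} H_{\mu \nu }^{WT}\Big |_{(31)} = h_{\mu \nu } - \frac{\eta _{\mu \nu }\, h}{D+1} + \left[ 2\, c + \frac{1}{D+1}\right] \frac{\partial _{\mu }\partial _{\nu }h}{m^2} . \end{aligned}$$

The last term, which carries two extra derivatives of the trace, drops out only at \(c=-1/[2(1+D)]\), leaving the \(\lambda \)-shift with \(\lambda =-1/(D+1)\), which is invertible since \(\lambda \ne -1/D\).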

3.1.3 Massive TDiff model

This is the most involved case, the constants \((a,b)\) are such that \(f_{D+1}(a,b)>0\). Differently from the previous two cases, the particle content of (8) consists of a massless spin-2 and a massless spin-0 particle. Thus, we expect massive spin-2 and massive spin-0 fields in D dimensions as will turn out to be the case. Once again the U(1) symmetry is eliminated via (28). So we have one less symmetry than in the previous two cases. We are left only with full Diff. Redefining the vector field as in the Diff case and the scalar field as in the WTDiff case we have

$$\begin{aligned} \delta h_{\mu \nu } (x)= & {} \partial _\mu \psi _\nu + \partial _\nu \psi _\mu , \end{aligned}$$
(36)
$$\begin{aligned} \delta a_\mu (x)\equiv & {} \delta [A_{\mu } - \partial _{\mu } \varphi /(2\,m)] = -m\;\psi _\mu , \end{aligned}$$
(37)
$$\begin{aligned} \delta \Phi= & {} \delta (\varphi +h)= 0. \end{aligned}$$
(38)

After trying an Ansatz for \(H_{\mu \nu }^{TD}\) and for a massive scalar–tensor theory, we have been able to show that (15) can be written as

$$\begin{aligned} \mathcal {L}_D = \mathcal {L}_{FP}(H_{\mu \nu }^{TD}) + \frac{f_{D+1}(a,b)}{4(D-1)} \,\Phi \left( \Box - m^2 \right) \Phi \end{aligned}$$
(39)

where we have defined the gauge invariant tensor

$$\begin{aligned} H_{\mu \nu }^{TD}= h_{\mu \nu } + \frac{\partial _{\mu }a_{\nu } + \partial _{\nu }a_{\mu }}{m} + \frac{a-1}{D-1}\left( \eta _{\mu \nu } - \frac{\partial _{\mu }\partial _{\nu }}{m^2}\right) \Phi \end{aligned}$$
(40)

Notice that (39) and (40) include the two previous cases which satisfy \(f_{D+1}(a,b)=0\). The unitary gauge now corresponds to \( a_{\mu } + (1-a)\partial _{\mu }\Phi /[ 2\, m\, (D-1)]=0 \), after which we redefine \(h_{\mu \nu } \rightarrow h_{\mu \nu } -\frac{a-1}{D-1}\eta _{\mu \nu }\Phi \) and decouple the massive spin-2 field from the scalar one. Remarkably, unitarity, \(f_{D+1}(a,b)>0\), of the \(D+1\) model (8) is strictly preserved by the dimensional reduction. In Sect. 4 we find it interesting to choose the gauge \(a_{\mu }=0\). This and the unitary gauge can be fixed at action level without changing the content of the theory. Before we leave this section, we look at the massless limit of (15).
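The statement that (40) contains the two previous cases can be verified explicitly; what follows is a sketch based only on the definitions (22), (23), (24), (29) and (40). At \(a=1\) the last term of (40) vanishes and \(H_{\mu \nu }^{TD}=H_{\mu \nu }^{Diff}\). At \(a=2/(D+1)\) we have \((a-1)/(D-1)=-1/(D+1)\) and, using \(a_{\mu }=A_{\mu }-\partial _{\mu }\varphi /(2\, m)\) and \(\Phi =\varphi +h\),

$$\begin{aligned} H_{\mu \nu }^{TD}&= h_{\mu \nu } + \frac{\partial _{\mu }A_{\nu } + \partial _{\nu }A_{\mu }}{m} - \frac{\partial _{\mu }\partial _{\nu }\varphi }{m^2} + \frac{\partial _{\mu }\partial _{\nu }(\varphi + h)}{(D+1)\, m^2} - \frac{\eta _{\mu \nu }\Phi }{D+1} \nonumber \\&= h_{\mu \nu } + \frac{\partial _{\mu }A_{\nu } + \partial _{\nu }A_{\mu }}{m} + \frac{\partial _{\mu }\partial _{\nu }(h - D\,\varphi )}{(D+1)\, m^2} - \frac{\eta _{\mu \nu }\Phi }{D+1} = H_{\mu \nu }^{WT} . \end{aligned}$$

In the same way, in the unitary gauge \(a_{\mu } = (a-1)\partial _{\mu }\Phi /[2\, m\, (D-1)]\) the double-derivative terms in (40) cancel and \(H_{\mu \nu }^{TD} = h_{\mu \nu } + \frac{a-1}{D-1}\eta _{\mu \nu }\Phi \), which is why the subsequent shift of \(h_{\mu \nu }\) decouples the two fields in (39).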

3.2 Smooth massless limit

The reduced theory (15) at \(m=0\) becomes

$$\begin{aligned} \mathcal {L}_D^{m=0}&= \mathcal {L}(a,b)+\frac{a}{2}h_{\mu \nu }\partial ^\mu \partial ^\nu \varphi -\frac{b}{2}h\, \square \varphi +\frac{1-b}{4}\varphi \square \,\varphi \nonumber \\&\qquad -\frac{1}{4}F_{\mu \nu }^2 [A_\mu ] \quad . \end{aligned}$$
(41)

The Lagrangian (41) is invariant under TDiff \(\delta h_{\mu \nu } = \partial _{\mu }\psi _{\nu }^T + \partial _{\nu }\psi _{\mu }^T\) and U(1), \(\delta A_{\mu } = \partial _{\mu }\Lambda \). If \(a=1=b\), TDiff is enlarged to full Diff, while at \([a,b]=[2/(D+1),(D+3)/(D+1)^2]\) it is enlarged to WTDiff where the scalar field must contribute to the Weyl symmetry: \((\delta h_{\mu \nu },\delta \varphi ) = (\eta _{\mu \nu }\chi , \chi )\). In this sense the model (15) is a massive deformation of a Diff, WTDiff and TDiff theory in the corresponding three cases respectively. Regarding the particle content of (41), we have to split the analysis into two cases. First, if \(a\ne 2/D\), after a field redefinition \(h_{\mu \nu } \rightarrow h_{\mu \nu } + \frac{a}{2-a\, D}\eta _{\mu \nu }\varphi \), we have:

$$\begin{aligned} \mathcal {L}_D^{m=0}= & {} \mathcal {L}(a,b) + Y \, h\, \Box \, \varphi + \frac{Z}{2}\varphi \, \Box \, \varphi -\frac{1}{4}F_{\mu \nu }^2 \quad . \end{aligned}$$
(42)
$$\begin{aligned} Y= & {} \frac{a+a^2-2\, b}{2(2-a\, D)} \quad ; \nonumber \\ Z= & {} \frac{(D-2)\left[a(D+1)-2\right]^2+4\, f_{D+1}(a,b)}{2(D-1)(2-a\, D)^2} \quad . \end{aligned}$$
(43)

In the case of the reduction of the \(D+1\) WTDiff model \([a,b]=[2/(D+1),(D+3)/(D+1)^2]\) we have \(Y=0=Z\). So we end up with a continuous massless limit with (in \(D=4\)) \(3+2= 2s+1\) physical degrees of freedom, since the reduced Lagrangian and the unitarity condition for the TDiff model in D dimensions become respectively

$$\begin{aligned}&\mathcal {L}_D^{m=0}= \mathcal {L}\left( \frac{2}{D+1},\frac{D+3}{(D+1)^2}\right) -\frac{1}{4}F_{\mu \nu }^2 \end{aligned}$$
(44)
$$\begin{aligned}&f_{D}\left( \frac{2}{D+1},\frac{D+3}{(D+1)^2}\right) =\frac{D-1}{(D+1)^2} > 0 \quad . \end{aligned}$$
(45)
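The value quoted in (45) follows from the definition (5) by direct substitution; as a worked check,

$$\begin{aligned} f_{D}\left( \frac{2}{D+1},\frac{D+3}{(D+1)^2}\right)&= 1-\frac{4}{D+1}+\frac{4(D-1)}{(D+1)^2}-\frac{(D+3)(D-2)}{(D+1)^2} \nonumber \\&= \frac{(D^2+2D+1)-(4D+4)+(4D-4)-(D^2+D-6)}{(D+1)^2} = \frac{D-1}{(D+1)^2} , \end{aligned}$$

which is positive for any \(D\ge 3\), so the D-dimensional model in (44) is a genuine TDiff model rather than a WTDiff one.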

In the Diff case \(a=1=b\), since \(Y=0\), \(Z>0\) and \(\mathcal {L}(1,1)\) is the usual EH Lagrangian, we have once again a smooth massless limit with \(2+2+1= 2s+1\) physical degrees of freedom in \(D=4\).

In the pure TDiff case, still assuming \(a\ne 2/D\), after the redefinition \(\varphi \rightarrow \varphi - Y\, h/Z\) in (42) we have a TDiff model plus Maxwell and a decoupled scalar field:

$$\begin{aligned} \mathcal {L}_D^{m=0}= \mathcal {L}(a,\tilde{b}) -\frac{1}{4}F_{\mu \nu }^2 + \frac{Z}{2} \varphi \Box \,\varphi \end{aligned}$$
(46)

where Z is given in (43) and

$$\begin{aligned} \tilde{b}&= b + \frac{(D-1)(a^2+a-2b)^2}{(D-2)[a(D+1)-2]^2 + 4 f_{D+1}(a,b)}, \end{aligned}$$
(47)
$$\begin{aligned} f_D(a,\tilde{b})&= \frac{(D-1)(2-a\, D)^2f_{D+1}(a,b)}{(D-2)[a(D+1)-2]^2 + 4 f_{D+1}(a,b)} \quad . \end{aligned}$$
(48)

Due to \(f_{D+1}(a,b)>0\) it follows that Z and \(f_D(a,\tilde{b})\) are both positive and we have again a smooth massless limit with \(3+2+1=2s+1 +1\) physical degrees of freedom in \(D=4\). Finally, if \(a=2/D\), after \(\varphi \rightarrow \varphi - h\) in (41) followed by \(h_{\mu \nu }\rightarrow h_{\mu \nu } +\eta _{\mu \nu }\varphi /D\) and another change \(\varphi \rightarrow \varphi -h/(2\, D\, \tilde{Z})\) we obtain,

$$\begin{aligned} \mathcal {L}_D^{m=0}= \mathcal {L}(0,\tilde{B}) -\frac{1}{4}F_{\mu \nu }^2 + \frac{\tilde{Z}}{2} \varphi \Box \,\varphi \end{aligned}$$
(49)

where

$$\begin{aligned} \tilde{B} = \frac{b\, D^2-D-1}{2\, D^2\, \tilde{Z}}; \quad \tilde{Z} = \frac{D-2+D^2f_{D+1}(2/D,b)}{2\, D^2(D-1)}. \end{aligned}$$
(50)

The unitarity condition of the \(D+1\) TDiff model \(f_{D+1}(2/D,b) >0\) assures \(\tilde{Z}>0\) and unitarity of the TDiff model, since

$$\begin{aligned} f_{D}(0,\tilde{B}) = \frac{D^2(D-1)f_{D+1}(2/D,b)}{D-2+D^2 f_{D+1}(2/D,b) } > 0. \end{aligned}$$
(51)

Thus, we have a smooth massless limit in all cases.
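The algebra behind (47) and (48) is easy to get wrong by hand, so a numerical spot-check with exact fractions may be useful. The sketch below evaluates \(f_D(a,\tilde{b})\) from the definition (5) with \(\tilde{b}\) taken from (47) and compares it with the right-hand side of (48); the function names `f`, `denominator`, `b_tilde` and `f_reduced` are hypothetical helpers introduced here, and agreement at sample points is only a consistency test, not a derivation.

```python
from fractions import Fraction as Fr


def f(D, a, b):
    """Eq. (5): f_D(a, b) = 1 - 2a + a^2 (D - 1) - b (D - 2)."""
    return 1 - 2 * a + a * a * (D - 1) - b * (D - 2)


def denominator(D, a, b):
    """Common denominator of eqs. (47)-(48): (D-2)[a(D+1)-2]^2 + 4 f_{D+1}(a, b)."""
    return (D - 2) * (a * (D + 1) - 2) ** 2 + 4 * f(D + 1, a, b)


def b_tilde(D, a, b):
    """Eq. (47): shifted coefficient of the massless D-dimensional TDiff model."""
    return b + (D - 1) * (a * a + a - 2 * b) ** 2 / denominator(D, a, b)


def f_reduced(D, a, b):
    """Right-hand side of eq. (48)."""
    return (D - 1) * (2 - a * D) ** 2 * f(D + 1, a, b) / denominator(D, a, b)


if __name__ == "__main__":
    # Sample points with f_{D+1}(a, b) > 0 and a != 2/D (assumed, not exhaustive).
    for D, a, b in [(4, Fr(1), Fr(0)), (3, Fr(2), Fr(1)), (5, Fr(1, 3), Fr(-2))]:
        lhs = f(D, a, b_tilde(D, a, b))
        rhs = f_reduced(D, a, b)
        print(D, a, b, lhs, rhs, lhs == rhs)
```

For the first sample point, \(D=4\) and \((a,b)=(1,0)\), both sides evaluate to 6/5.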

4 Massive reduced model at \(A_{\mu } = \partial _{\mu } \varphi /(2\,m)\)

4.1 General remarks

The model (39) at the gauge \( a_\mu = A_{\mu } - \frac{\partial _{\mu } \varphi }{2\,m}=0\) becomes the massive scalar tensor theory:

$$\begin{aligned} \mathcal {L}_m^{ST}= & {} \mathcal {L}(a,b)-\frac{m^2}{4}(h_{\mu \nu }h^{\mu \nu }-b\;h^2)+\frac{(a-1)}{2}h_{\mu \nu }\partial ^\mu \partial ^\nu \varphi \nonumber \\&\quad +\frac{(a-b)}{2}h(\square -m^2)\varphi \nonumber \\&\quad +\frac{(2a-b-1)}{4}\varphi (\square -m^2)\varphi \quad . \end{aligned}$$
(52)
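The passage from (15) to (52) amounts to the following bookkeeping, shown here as a sketch in which total derivatives are dropped. With \(A_{\mu }=\partial _{\mu }\varphi /(2\, m)\) the field strength \(F_{\mu \nu }\) vanishes and

$$\begin{aligned} -m\, h_{\mu \nu }\partial ^\mu A^\nu&= -\frac{1}{2}h_{\mu \nu }\partial ^\mu \partial ^\nu \varphi , \qquad m\, a\, h\,\partial _\mu A^\mu = \frac{a}{2}\, h\,\square \varphi , \nonumber \\ -m(a-1)A_\mu \partial ^\mu \varphi&= -\frac{(a-1)}{2}\partial _\mu \varphi \partial ^\mu \varphi \simeq \frac{(a-1)}{2}\varphi \,\square \varphi . \end{aligned}$$

Adding these to the remaining scalar terms of (15) gives the coefficients \(\frac{a}{2}-\frac{1}{2}=\frac{a-1}{2}\) for \(h_{\mu \nu }\partial ^\mu \partial ^\nu \varphi \), \(\frac{a}{2}-\frac{b}{2}=\frac{a-b}{2}\) for \(h\,\square \varphi \), and \(\frac{a-1}{2}-\frac{b-1}{4}=\frac{2a-b-1}{4}\) for \(\varphi \,\square \varphi \), while the mass terms \(-\frac{m^2}{2}(a-b)h\varphi \) and \(-\frac{m^2}{4}(2a-b-1)\varphi ^2\) complete the combinations \((\square -m^2)\) appearing in (52).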

The first remark we make is that in both cases of Diff and TDiff, see (17) and (37), we have \(\delta a_{\mu } = -m\,\psi _{\mu }\). Thus, the gauge \(a_{\mu }=0\) completely determines the parameters \(\psi _{\mu }\). However, from (23)–(27) we see that in the WTDiff case \(\delta a_{\mu } = -m\, \psi _{\mu }- \partial _{\mu }\chi /(2m)\) which has the residual symmetry \((\delta \psi _{\mu },\delta \chi )= (- \partial _{\mu }\gamma ,2\, m^2 \gamma )\). Therefore, the gauge \(a_{\mu }=0\) at action level is not allowed at the WTDiff point \((a,b)=\left( \frac{2}{D+1},\frac{D+3}{(D+1)^2}\right) \). So henceforth our results do not apply in this special case and may not be compared with [24] any more.
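The residual symmetry quoted for the WTDiff point can be checked in one line. The sketch assumes \(\delta A_{\mu } = -m\,\psi _{\mu } - \partial _{\mu }(\partial \cdot \psi )/m\), which is (33) at \(c=0\), together with \(\delta \varphi \) from (32):

$$\begin{aligned} \delta a_{\mu }&= \delta A_{\mu } - \frac{\partial _{\mu }\delta \varphi }{2\, m} = -m\,\psi _{\mu } - \frac{\partial _{\mu }(\partial \cdot \psi )}{m} - \frac{\partial _{\mu }\chi }{2\, m} + \frac{\partial _{\mu }(\partial \cdot \psi )}{m} = -m\,\psi _{\mu } - \frac{\partial _{\mu }\chi }{2\, m} , \nonumber \\ \delta a_{\mu }\Big |_{(\psi _{\mu },\chi )= (- \partial _{\mu }\gamma ,\, 2\, m^2 \gamma )}&= m\,\partial _{\mu }\gamma - m\,\partial _{\mu }\gamma = 0 . \end{aligned}$$

Since \(\gamma (x)\) is arbitrary, the condition \(a_{\mu }=0\) does not fix the parameters uniquely at that point, in contrast with the Diff and TDiff cases.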

Our second remark concerns the massless limit of (52), i.e.,

$$\begin{aligned} \mathcal {L}_0^{ST}&=\mathcal {L}(a,b)+\frac{a-1}{2}\varphi \partial ^\mu \partial ^\nu h_{\mu \nu }+\frac{a-b}{2}\varphi \square h\nonumber \\&\qquad +\frac{2a-b-1}{4}\varphi \square \varphi , \end{aligned}$$
(53)

Although a scalar–tensor Lagrangian of the type \(\mathcal {L}(a,b)+c_1 \varphi \partial _\mu \partial _\nu h^{\mu \nu } + c_2 \varphi \square h + c_3 \varphi \square \varphi \) is in general only invariant under TDiff, the coefficients in (53) are such that we have full diffeomorphism invariance:

$$\begin{aligned} \delta h_{\mu \nu }=\partial _\mu \psi _\nu +\partial _\nu \psi _\mu ,\qquad \delta \varphi =-2\,\partial \cdot \psi . \end{aligned}$$
(54)

Thus, (52) is a massive deformation of Diff instead of TDiff! The same situation occurred in [24], where the dimensional reduction of a WTDiff model has given rise to a Weyl Stückelberg version \((h_{\mu \nu }\rightarrow h_{\mu \nu }-\eta _{\mu \nu }h/D + \varphi \eta _{\mu \nu })\) of the usual massive Fierz–Pauli (FP) model whose mass terms break Diff instead of TDiff. It seems that massive spin-2 particles require the breakdown of full diffeomorphisms in order to produce the correct number of constraints. In [18, 19] it has been shown that there is no Lorentz covariant mass term that could be added to the TDiff pure tensor model (1) that might generate a stable theory of massive spin-2 particles. It seems that the addition of a scalar field does not help either. In Sect. 5 we show that the Diff symmetry is required in the flat limit in order to have a vector constraint in a massive scalar tensor theory of second order in derivatives. The reader may find our conclusion doubtful from the point of view of the massless limit of (28), but notice that our gauge, and the vector gauge used in [24], is singular at \(m\rightarrow 0\).

Notwithstanding, it is not fully inappropriate to call (52) a massive TDiff model since the particle content of (53) is exactly the same as that of a TDiff model in D dimensions, see (77), namely a physical massless spin-2 particle plus a massless spin-0 particle which is unitary whenever \(f_D(a,b)>0\). If we compare (53) to (1) we have one more field but one more symmetry, longitudinal diffeomorphisms (LDiff): \(\delta h_{\mu \nu } = \partial _{\mu }\partial _{\nu }\lambda \). However, the equivalence between (53) and (1) is not complete. It depends on the boundary conditions. In fact, (53) stands to (1) as LEH stands to the WTDiff model. Namely, if we first derive the equations of motion \(E_{\mu \nu }=\delta S^{ST}_0/\delta h^{\mu \nu } = 0\) and \(\delta S^{ST}_0/\delta \varphi = 0\) and then fix the gauge condition \(\varphi =0\) we have \(R(a,b) \equiv (a-1)\partial ^{\mu }\partial ^{\nu }h_{\mu \nu } +(a-b) \Box \, h = 0\). On the other hand, if we first fix \(\varphi =0\) at action level, we lose its equation of motion and we can only derive from \(\partial ^{\mu }E_{\mu \nu }=0\) that \(R(a,b)=c\) where c is a real constant not necessarily zero. We have lost information since the gauge \(\varphi =0\) does not completely determine the gauge parameters \(\psi _{\mu }\) due to the residual symmetry \(\delta \psi _{\mu } = \partial _{\mu } \gamma \) with \(\Box \gamma =0\). This is similar to the fact that fixing \(h=0\) at action level in the LEH action makes us lose the trace of the linearized Einstein–Hilbert equation, which becomes an integration constant related to the cosmological constant in the WTDiff model. So TDiff and WTDiff are obtained respectively from \(\mathcal {L}_0^{ST}\) and \(\mathcal {L}_{EH}\) via an “illegal” gauge condition at action level. A stronger claim at nonlinear level, see [30], is that transverse-Diff gravity (7) is to scalar tensor as unimodular gravity is to general relativity.

4.2 Helicity variables and equations of motion

In this subsection we use helicity variables, see e.g. [31], in order to identify the different helicity modes of the massive spin-2 particle and split them from the scalar field in (52) without introducing field redefinitions involving time derivatives, which might spoil the canonical structure of the theory. The helicity decomposition is sometimes also called “the cosmological decomposition” [18, 19, 32]. The symmetric tensor \(h_{\mu \nu }\) is decomposed into its scalar, vector and purely tensor modes, similarly to [18, 19],

$$\begin{aligned}&h_{00}=A, \qquad h_{0i}=\partial _i B+V^T_i, \end{aligned}$$
(55)
$$\begin{aligned}&h_{ij}=\psi \delta _{ij}+\omega _{ij}E+2\partial _{(i}F^T_{j)}+h^{TT}_{ij}, \end{aligned}$$
(56)

where \(\omega _{ij}=\partial _i\partial _j/\nabla ^2 \), while \(V^T_i\) and \(F^T_i\) are transverse vectors, e.g. \(\partial _jV^T_j=0\). The tensor \(h^{TT}_{ij}\) is traceless and transverse. Applying this decomposition to the Lagrangian (52) and eliminating the fields \(A\), \(B\), \(E\) and \(V^T_i\) via their equations of motion, sparing the reader many details, especially in the messy scalar sector, we end up with

$$\begin{aligned} \mathcal {L}_m= \mathcal {L}_m^t+\mathcal {L}_m^v+\mathcal {L}_m^s, \end{aligned}$$
(57)

where

$$\begin{aligned} \mathcal {L}_m^t&=\frac{1}{4}h^{TT}_{ij}(\square -m^2)h^{TT}_{ij} \quad ; \quad \mathcal {L}_m^v=\frac{1}{2}\tilde{F}^T_i(\square -m^2)\tilde{F}^T_i \quad , \end{aligned}$$
(58)
$$\begin{aligned} \mathcal {L}_m^s&=\frac{(D-1)(D-2)}{4}\theta (\square -m^2)\theta +\frac{f_{D+1}(a,b)}{4(D-1)}\Phi (\square -m^2)\Phi , \end{aligned}$$
(59)

Therefore, \(h^{TT}_{ij}\), \(\tilde{F}^T_i\equiv \sqrt{\frac{-\nabla ^2}{m^2-\nabla ^2}}m F_i^T\) and \(\theta \equiv \psi +\frac{a-1}{(D-1)}\Phi \), represent the \(\pm 2,\pm 1\) and 0 helicity modes of the spin-2 sector respectively, while \(\Phi = \varphi + h \) stands for the spin-0 scalar particle whose propagation is physical if \(f_{D+1}(a,b)\ge 0 \). This is exactly the unitarity condition for the massless TDiff theory in \(D+1\) dimensions which is the origin of (52). This completes the analysis of the particle content of (52) which coincides with the content of (39) at the unitary gauge mentioned in subsection 2.1. It is thus established that the gauge \(A_{\mu }=\partial _{\mu }\varphi /(2m)\) can be fixed at action level without spoiling the physical content of the reduced model.
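
As a consistency check of the decomposition (55)–(56), one may count components. The sketch below assumes only that a transverse vector in \(D-1\) spatial dimensions carries \(D-2\) components and that a symmetric, traceless and transverse spatial tensor carries \(D(D-3)/2\) components. The four scalars \(A,B,\psi ,E\), the two transverse vectors and \(h^{TT}_{ij}\) then account for all components of \(h_{\mu \nu }\), while the modes surviving in (58)–(59) reproduce the counting of a massive spin-2 particle:

$$\begin{aligned} 4+2(D-2)+\frac{D(D-3)}{2}&=\frac{D(D+1)}{2}, \nonumber \\ \underbrace{\frac{D(D-3)}{2}}_{h^{TT}_{ij}}+\underbrace{(D-2)}_{\tilde{F}^T_i}+\underbrace{1}_{\theta }&=\frac{(D+1)(D-2)}{2}. \end{aligned}$$

In \(D=4\) this gives \(2+2+1=5\) spin-2 modes, plus one more mode for \(\Phi \).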

For future generalizations to curved backgrounds we find it instructive to look at the equations of motion of (52). Before doing so, however, let us make the redefinition inspired by (39), \(\varphi = \Phi - h \). The tensor and scalar equations of motion become respectively,

$$\begin{aligned} E_{\mu \nu }&\equiv (\square -m^2) h_{\mu \nu }-\partial ^\alpha (\partial _\mu h_{\nu \alpha }+\partial _\nu h_{\mu \alpha }) + \partial _\mu \partial _\nu h \nonumber \\&+ \eta _{\mu \nu } \partial ^{\alpha }\partial ^\beta h_{\alpha \beta }-\eta _{\mu \nu }(\square -m^2)h\nonumber \\&+(a-1)\partial _\mu \partial _\nu \Phi +(1-a)(\square -m^2)\eta _{\mu \nu }\Phi =0, \end{aligned}$$
(60)
$$\begin{aligned} \Psi&\equiv (a-1)\partial ^\mu \partial ^\nu h_{\mu \nu }+(1-a)(\square -m^2)h\nonumber \\&+(2a-b-1)(\square -m^2)\Phi =0. \end{aligned}$$
(61)

From (60) we have the vector constraint:

$$\begin{aligned} \partial ^\mu E_{\mu \nu }=-m^2\partial _\mu h^{\mu \nu }+m^2\partial _\nu h+(a-1)m^2\partial _\nu \Phi =0. \end{aligned}$$
(62)

In the Diff case \(a=1=b\) we get rid of the scalar field, and the following combination provides a scalar constraint: \(m^2\eta ^{\mu \nu }E_{\mu \nu }+(D-2)\partial ^\mu \partial ^\nu E_{\mu \nu }=m^2(D-1)h=0\). Back in (60) and (62) we obtain the usual FP conditions \(h=0=\partial ^{\mu }h_{\mu \nu }\) and \((\Box -m^2)h_{\mu \nu }=0\). Henceforth we assume the pure TDiff case \(f_{D+1}(a,b)>0\). The following combination supplies a scalar constraint

$$\begin{aligned} \Omega= & {} (a^2-b)m^2\eta ^{\mu \nu }E_{\mu \nu }+f_{D}(a,b)\partial ^\mu \partial ^\nu E_{\mu \nu }\nonumber \\&\quad +(a-1)m^2\Psi =0 \nonumber \\= & {} m^2f_{D}(a,b)\left[ h+(a-1)\Phi \right] = 0 \Rightarrow \boxed {h+(a-1)\Phi = 0},\nonumber \\ \end{aligned}$$
(63)

provided that \(f_{D}(a,b)>0\), which follows from \(f_{D+1}(a,b)>0\) via the identity \((D-1)f_{D}(a,b)=(a-1)^2 +(D-2)f_{D+1}(a,b)\). From (63) in (62) we have \(\partial ^\mu h_{\mu \nu }=0\). Additionally, (60) reads simply \((\square -m^2)h_{\mu \nu }=0\), and back in (61) we have the spin-0 Klein–Gordon equation

$$\begin{aligned} (\square -m^2)\Phi =0. \end{aligned}$$
(64)
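
The chain of substitutions leading to (64) can be spelled out as follows. This is only a sketch based on (60)–(63) as written, with no further input. Inserting the boxed constraint of (63) first in (62), then in (60), and finally in (61) and in the trace of (60), we get

$$\begin{aligned} \partial ^\mu h_{\mu \nu }&=\partial _\nu \left[ h+(a-1)\Phi \right] =0, \nonumber \\ E_{\mu \nu }&=(\square -m^2)h_{\mu \nu }+\left[ \partial _\mu \partial _\nu -\eta _{\mu \nu }(\square -m^2)\right] \left[ h+(a-1)\Phi \right] =(\square -m^2)h_{\mu \nu }=0, \nonumber \\ \Psi&=(a^2-b)(\square -m^2)\Phi =0, \qquad \eta ^{\mu \nu }E_{\mu \nu }=(1-a)(\square -m^2)\Phi =0. \end{aligned}$$

The two coefficients in the last line vanish simultaneously only at the Diff point \(a=1=b\), which is excluded here, so at least one of those equations yields (64).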

A traceless and transverse tensor can be easily built

$$\begin{aligned} H_{\mu \nu }=h_{\mu \nu }-\frac{1}{D-1}\left( \eta _{\mu \nu }-\frac{\partial _\mu \partial _\nu }{m^2}\right) h \quad , \end{aligned}$$
(65)

and shown to satisfy the conditions for massive spin-2 particles,

$$\begin{aligned} {\left\{ \begin{array}{ll} (\square -m^2)H_{\mu \nu }=0,\\ \partial ^\mu H_{\mu \nu }=0,\\ \eta ^{\mu \nu }H_{\mu \nu }=0, \end{array}\right. } \end{aligned}$$
(66)

Notice that, due to (63), (65) is nothing but \(H_{\mu \nu }^{TD}\), see (40), at the gauge \(a_{\mu }=0\).

4.3 Massive scalar–tensor coupled to sources

Now we couple (52) to arbitrary tensor and scalar sources and investigate the influence of the scalar field on the vDVZ mass discontinuity [5, 6]. The arbitrariness of the sources allows us to carry out an off-shell Lorentz covariant proof of unitarity. Motivated by simplicity and by the previous discussion of the separation of the spin-2 and spin-0 degrees of freedom, let us redefine \(\varphi \rightarrow \Phi - h\) and \(h_{\mu \nu }\rightarrow h_{\mu \nu }+\frac{1-a}{D-1}\eta _{\mu \nu }\Phi \) before adding sources. This cancels out the scalar tensor coupling in the mass term (\(m^2 h\Phi \)) and leads to the FP Lagrangian in the tensor sector. Adding sources we have

$$\begin{aligned} \mathcal {L}_m (T,J)&=\mathcal {L}_{FP}+\frac{a-1}{2(D-1)}\Phi (\partial ^\mu \partial ^\nu h_{\mu \nu }-\square h)\nonumber \\&\quad +\frac{f_{D+1}}{4(D-1)}\Phi (\square -m^2)\Phi + h_{\mu \nu }T^{\mu \nu }+\Phi J. \end{aligned}$$
(67)

Since there is no local symmetry any more, the sources \(T_{\mu \nu }\) and J are totally arbitrary. Integrating over \(h_{\mu \nu }\) and \(\Phi \) in the path integral we derive

(68)

where, suppressing the indices,

$$\begin{aligned} G^{-1}_{FP}= & {} \frac{P_{ss}^{(2)}}{(\square -m^2)}-\frac{P_{ss}^{(1)}}{m^2}+\frac{(2-D)(\square -m^2)P_{ww}^{(0)}}{m^4(1-D)}\nonumber \\&+\frac{P_{sw}^{(0)}+P_{ws}^{(0)}}{m^2\sqrt{D-1}}, \end{aligned}$$
(69)
$$\begin{aligned} G^{-1}_{S}= & {} \frac{D-1}{f_{D+1}}\frac{1}{(\square -m^2)}, \end{aligned}$$
(70)
$$\begin{aligned} \mathcal {J}= & {} J+\frac{(a-1)}{(D-1)m^2}\partial ^{\mu }\partial ^{\nu }T_{\mu \nu } \end{aligned}$$
(71)

The projection and transition operators \(P_{IJ}^{(s)}\), where s stands for the spin of the projected subspace, are given in the appendix. The two-point amplitude in momentum space is given by

(72)

The imaginary part of the residue of \(\mathcal {A}_{ST}(p)\) at the massive pole is given by

$$\begin{aligned} Im Res(\mathcal {A}_{ST})_{|p^2\rightarrow -m^2}= T^{*} P_{ss}^{(2)} T (p) + \frac{D-1}{f_{D+1}(a,b)} |\mathcal {J}|^2, \end{aligned}$$
(73)

which is positive if \(f_{D+1}>0\), since the first term is nothing but the usual FP result, known to be positive. Consequently we have an off-shell Lorentz covariant proof of unitarity for the scalar tensor model (52). Notice also that after replacing J in (67) following (71) and taking into account the field redefinition before (67), we see that the tensor field effectively coupled to \(T^{\mu \nu }\) is \(H_{\mu \nu }^{TD}\) given in (40) at the gauge \(a_{\mu }=0\).

Regarding the vDVZ mass discontinuity, if we assume that the scalar source is proportional to the trace of the spin-2 source, \({\mathcal {J}}=c\, T\), with some real constant c, from (69) and (70) we have the following tensor structure at \(m\rightarrow 0\) in \(D=4\):

$$\begin{aligned} G^{-1}_{\mu \nu \alpha \beta }= & {} \frac{1}{\Box }\left\{ \frac{\eta _{\mu \alpha }\eta _{\nu \beta }+\eta _{\mu \beta }\eta _{\nu \alpha }}{2} - \frac{\eta _{\mu \nu }\eta _{\alpha \beta }}{2} \right. \nonumber \\&\left. + \eta _{\mu \nu }\eta _{\alpha \beta }\left[\frac{1}{6} + \frac{3\, c^2}{f_5(a,b)}\right]\right\} + \cdots \end{aligned}$$
(74)

where dots stand for analytic contributions which only lead to contact terms in the two-point amplitude. If we neglect the two terms inside the brackets we have the Einstein–Hilbert result. The 1/6 factor is the well known vDVZ mass discontinuity. Since \(f_5(a,b)>0\), we see that the massive scalar field contribution is no cure for the mass discontinuity which is pushed even farther away.
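
The numerical factors in (74) can be traced back to (69) and (70). The following is a sketch for general \(D\), assuming the standard trace coefficients \(-1/(D-1)\) and \(-1/(D-2)\) of the massive FP and massless LEH propagators respectively, which are not written explicitly above:

$$\begin{aligned} -\frac{1}{D-1}&=-\frac{1}{D-2}+\frac{1}{(D-1)(D-2)} \quad \Rightarrow \quad -\frac{1}{3}=-\frac{1}{2}+\frac{1}{6} \quad (D=4), \nonumber \\ \frac{D-1}{f_{D+1}(a,b)}\, c^2&=\frac{3\, c^2}{f_5(a,b)} \quad (D=4). \end{aligned}$$

Both contributions inside the brackets of (74) are positive, so the scalar exchange adds to the discontinuity instead of cancelling it.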

4.4 TDiff coupled to sources

Now we return to the massless case and study the spin-0 and spin-2 contributions to scattering amplitudes in the pure tensor TDiff model (1). We couple the massless TDiff model \(\mathcal {L}(a,b)\) to sources, but from a slightly different perspective from [18, 19]. First we remark that invariance of the source term \(\int d^D x T_{\mu \nu }h^{\mu \nu }\) under TDiff does not require conserved sources; it only demands (see footnote 2) that

$$\begin{aligned} \partial _\mu T^{\mu \nu }=\partial ^\nu J, \end{aligned}$$
(75)

where J(x) is some local scalar quantity, which simply measures the non-conservation of \(T_{\mu \nu }\).
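
That (75) suffices can be checked in one line. The sketch below assumes the TDiff transformation \(\delta h_{\mu \nu }=\partial _\mu \xi _\nu +\partial _\nu \xi _\mu \) with \(\partial \cdot \xi =0\), a symmetric source, and vanishing boundary terms:

$$\begin{aligned} \delta \int d^Dx\, T^{\mu \nu }h_{\mu \nu }=-2\int d^Dx\, \xi _\nu \partial _\mu T^{\mu \nu }=-2\int d^Dx\, \xi _\nu \partial ^\nu J=2\int d^Dx\, J\, \partial \cdot \xi =0. \end{aligned}$$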

First of all, we find it interesting to rewrite the source term following a curious non local correspondence that we have found between \(\mathcal {L}(a,b)\) and the massless theory \(\mathcal {L}_0^{ST}\) given in (53). Notice that after the invertible redefinition:

$$\begin{aligned} h_{\mu \nu }=H_{\mu \nu }-\frac{(a-1)}{D-2}\eta _{\mu \nu }\Phi \quad ; \quad \varphi =-H+\frac{(a D-2)}{D-2}\Phi , \end{aligned}$$
(76)

the Lagrangian (53) decouples

$$\begin{aligned} \mathcal {L}_0^{ST}=\mathcal {L}_{EH}(H_{\mu \nu })+\frac{f_{D}}{4(D-2)}\Phi \square \Phi , \end{aligned}$$
(77)

The fact that (77) has exactly the same particle content as \(\mathcal {L}(a,b)\) in D dimensions has inspired us to find a closer connection between those models. If we substitute \(h_{\mu \nu }\rightarrow h_{\mu \nu }+\frac{\partial _\mu \partial _\nu }{\square }\varphi \) in (1), we exactly reproduce (53). The field \(\varphi \) works like a Stückelberg field for longitudinal diffeomorphisms (LDiff). It leads to the symmetry \(\delta h_{\mu \nu } = \partial _\mu \partial _\nu \Lambda \), \(\delta \varphi = - \square \Lambda \), thus enlarging TDiff to Diff. By further taking into account (76), we can go from (1) directly to the decoupled scalar–tensor model (77) through the singular non invertible redefinition (see footnote 3)

$$\begin{aligned} h_{\mu \nu }=H_{\mu \nu }-\frac{\partial _\mu \partial _\nu }{\square }H+\frac{(a D-2)}{D-2}\frac{\partial _\mu \partial _\nu }{\square }\Phi +\frac{(1-a)}{D-2}\eta _{\mu \nu }\Phi . \end{aligned}$$
(78)
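
Expression (78) is just the composition of the two steps described above, as the following short check shows. It uses only the shift \(h_{\mu \nu }\rightarrow h_{\mu \nu }+\frac{\partial _\mu \partial _\nu }{\square }\varphi \), the redefinition (76) and the LDiff transformations quoted before (78):

$$\begin{aligned} h_{\mu \nu }&\rightarrow \left[ H_{\mu \nu }-\frac{(a-1)}{D-2}\eta _{\mu \nu }\Phi \right] +\frac{\partial _\mu \partial _\nu }{\square }\left[ -H+\frac{(a D-2)}{D-2}\Phi \right] , \nonumber \\ \delta \left( h_{\mu \nu }+\frac{\partial _\mu \partial _\nu }{\square }\varphi \right)&=\partial _\mu \partial _\nu \Lambda -\frac{\partial _\mu \partial _\nu }{\square }\square \Lambda =0. \end{aligned}$$

The first line reproduces (78) term by term, while the second one makes the LDiff invariance manifest.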

In particular, at the D dimensional WTDiff point \((a,b)=\left( \frac{2}{D},\frac{D+2}{D^2}\right) \), where \(f_{D}=0\), the non invertible redefinition (see footnote 4) \(h_{\mu \nu }=H_{\mu \nu }-\frac{\partial _\mu \partial _\nu }{\square }H\) takes us from the WTDiff model to the linearized EH theory. The Weyl symmetry is broken and replaced by LDiff, such that WTDiff\(\rightarrow \)Diff.
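
The statement \(f_D=0\) at the WTDiff point can be verified directly. The check below assumes \(f_D(a,b)=(a^2-b)(D-2)+(a-1)^2\), which is not displayed in this section but is the form consistent with the numerator of \(A_2\) in (96) and with the identity quoted after (63):

$$\begin{aligned} a^2-b&=\frac{4}{D^2}-\frac{D+2}{D^2}=-\frac{D-2}{D^2}, \qquad (a-1)^2=\frac{(D-2)^2}{D^2}, \nonumber \\ f_D&=-\frac{(D-2)^2}{D^2}+\frac{(D-2)^2}{D^2}=0, \qquad \frac{aD-2}{D-2}=0, \qquad \frac{1-a}{D-2}=\frac{1}{D}. \end{aligned}$$

With these values the only \(\Phi \) dependence left in (78) is \(\eta _{\mu \nu }\Phi /D\), a pure Weyl shift which drops out of the WTDiff model, in agreement with the redefinition \(h_{\mu \nu }=H_{\mu \nu }-\frac{\partial _\mu \partial _\nu }{\square }H\).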

Now we are ready to go back to \(\mathcal {L}(a,b)\) with sources. If we substitute (78) in the source term and use (75), we have (up to total derivatives)

$$\begin{aligned} h_{\mu \nu }T^{\mu \nu }=H_{\mu \nu }\tilde{T}^{\mu \nu }+\Phi \tilde{J}, \end{aligned}$$
(79)

where

$$\begin{aligned} \tilde{T}_{\mu \nu }=T_{\mu \nu }-\eta _{\mu \nu }J \quad ; \quad \tilde{J}=\frac{1-a}{D-2}T+\frac{aD-2}{D-2}J \end{aligned}$$
(80)

The conservation of \(\tilde{T}_{\mu \nu }\) follows from (75):

$$\begin{aligned} \partial ^\mu \tilde{T}_{\mu \nu }=0. \end{aligned}$$
(81)

Expressions (77) and (79) suggest that \(\tilde{T}_{\mu \nu }\) and \(\tilde{J}\) are the right source combinations which couple to the spin-2 and spin-0 modes of the TDiff theory without mixing. Indeed, we can invert (80) and obtain

$$\begin{aligned} T_{\mu \nu }=\tilde{T}_{\mu \nu }+\eta _{\mu \nu }\left[ \tilde{J}+\frac{(a-1)}{D-2}\tilde{T}\right] . \end{aligned}$$
(82)
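
The inversion leading to (82) takes two steps, sketched here using only (80). Taking the trace of \(\tilde{T}_{\mu \nu }\) and eliminating \(T\) from \(\tilde{J}\),

$$\begin{aligned} \tilde{T}&=T-DJ, \nonumber \\ \tilde{J}&=\frac{1-a}{D-2}\left( \tilde{T}+DJ\right) +\frac{aD-2}{D-2}J=\frac{1-a}{D-2}\tilde{T}+J \quad \Rightarrow \quad J=\tilde{J}+\frac{(a-1)}{D-2}\tilde{T}, \end{aligned}$$

so that \(T_{\mu \nu }=\tilde{T}_{\mu \nu }+\eta _{\mu \nu }J\) becomes (82), while (81) follows from (75) via \(\partial ^\mu \tilde{T}_{\mu \nu }=\partial _\nu J-\partial _\nu J=0\).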

Since (80) is a fully invertible source redefinition, we interpret (82) as a convenient way, without loss of generality, of rewriting the original non conserved source in terms of a conserved one. Adding the source term and a gauge fixing one to \(\mathcal {L}(a,b)\), we have

$$\begin{aligned} \mathcal {L}(a,b,T)=\frac{1}{2}h_{\mu \nu }\mathcal {O}^{\mu \nu ,\alpha \beta }h_{\alpha \beta }+ h_{\mu \nu }T^{\mu \nu }. \end{aligned}$$
(83)

where, suppressing indices,

$$\begin{aligned} \mathcal {O}= & {} \frac{\square }{2}P_{ss}^{(2)}+\frac{(1-b(D-1))\square }{2}P_{ss}^{(0)}+\frac{(2a-b-1)\square }{2}P_{ww}^{(0)}\nonumber \\&\quad +\frac{\sqrt{D-1}(a-b)\square }{2}(P_{sw}^{(0)}+P_{ws}^{(0)})\nonumber \\&\quad - \lambda \frac{\square ^3}{2}P_{ss}^{(1)}. \end{aligned}$$
(84)

The last term stands for the transverse gauge \(\lambda (\partial _\alpha \partial _\mu \partial _\nu h^{\mu \nu }-\square \partial ^\mu h_{\alpha \mu })^2/2\) as in [18, 19]. Computing the two point amplitude in momentum space, the gauge parameter disappears as expected and we have (see footnote 5)

$$\begin{aligned} \mathcal {A}(p)= & {} -i T^{*\mu \nu }\mathcal {O}^{-1}_{\mu \nu ,\alpha \beta }T^{\alpha \beta }=\frac{2i}{p^2}\left[ \tilde{T}^{*\mu \nu }\tilde{T}_{\mu \nu }\right. \nonumber \\&\quad \left. -\frac{|\tilde{T}|^2}{D-2}+\frac{D-2}{f_{D}(a,b)}|\tilde{J}|^2 \right] . \end{aligned}$$
(85)

The first two terms inside the brackets are precisely the same as those of the linearized Einstein–Hilbert (LEH) theory. Consequently, TDiff is consistent with any experimental results from LEH as long as the scalar contribution \(\frac{D-2}{f_{D}}|\tilde{J}|^2\) is small enough. Notice that only positive deviations from LEH could be explained by a scalar contribution since \(f_{D}>0\). In particular, if we take \(\tilde{J}=c\, \tilde{T}\) we would have from (85), in \(D=4\), the following enhanced deflection angle of starlight by the Sun: \(\theta _{TDiff} = [1+4\, c^2/f_4(a,b)]\theta _{LEH}\).

Moreover, we can calculate the imaginary part of the residue of \(\mathcal {A}(p)\) at \(p^2\rightarrow 0\) and easily check that it is positive if \(f_{D}(a,b)>0\). Since the source parametrization (82) is completely general, we have an explicitly covariant off-shell proof of unitarity of \(\mathcal {L}(a,b)\).

5 Massive scalar–tensor models in curved backgrounds

This section is a preliminary step towards the construction of possible generalizations of the massive scalar tensor theory discussed before, beyond flat backgrounds. Following the approach of [37, 38], we investigate under which conditions we have the correct number of degrees of freedom in a curved background. See also [3] and references therein and more recently [24, 39].

5.1 The constraints

In \(D=4\), our scalar–tensor models contain a set of 11 independent variables, \(h_{(\mu \nu )}\) and \(\varphi \). This number must be reduced to 6, since we need 5 for the massive spin-2 particle and 1 for the massive scalar. So we have to find 5 constraints from the equations of motion, i.e., a vector and a scalar constraint. The same requirement of one vector plus one scalar constraint holds for any dimension \(D\ge 3\).
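
The same counting can be written for arbitrary \(D\). This is a sketch of the bookkeeping only, assuming the usual \((D+1)(D-2)/2\) degrees of freedom of a massive spin-2 particle:

$$\begin{aligned} \underbrace{\frac{D(D+1)}{2}+1}_{h_{\mu \nu },\, \varphi }-\underbrace{\left[ \frac{(D+1)(D-2)}{2}+1\right] }_{\textrm{spin-2}\, +\, \textrm{spin-0}}=D+1, \end{aligned}$$

i.e., \(D\) conditions from a vector constraint plus one from a scalar constraint. For \(D=4\) this is \(11-6=5\).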

We start by writing the most general massive scalar–tensor model, up to second order in derivatives and quadratic in the fields, coupled to an external gravitational field via covariant derivatives and the addition of non-minimal coupling terms linear in curvatures. We begin with 14 arbitrary coefficients, including different mass terms for the tensor and scalar fields.

$$\begin{aligned} \mathcal {L}&=\mathcal {L}_{min}+ \dfrac{a_1}{2} R h_{\mu \nu }h^{\mu \nu }+\dfrac{a_2}{2} R h^2+\dfrac{a_3}{2} C^{\mu \alpha \nu \beta }h_{\mu \nu }h_{\alpha \beta }\nonumber \\&\quad +\dfrac{a_4}{2} \tilde{R}^{\mu \beta }h_{\mu \nu }h^{\nu }{}_{\beta }+\dfrac{a_5}{2} \tilde{R}^{\mu \nu }h_{\mu \nu } h + \nonumber \\&\quad + \dfrac{b_1}{2} R\varphi ^2 + \dfrac{b_2}{2} R h \varphi + \dfrac{b_3}{2} \tilde{R}^{\mu \nu }h_{\mu \nu }\varphi -\dfrac{m^2}{4}h^{\mu \nu }h_{\mu \nu }\nonumber \\&\quad +c_1 \dfrac{m^2}{4}h^2+c_2 \dfrac{m^2}{2}h\varphi +c_3 \dfrac{m^2}{4}\varphi ^2. \end{aligned}$$
(86)

where we define the traceless part of the Ricci curvature \(\tilde{R}_{\mu \nu } = R_{\mu \nu }-g_{\mu \nu }\frac{R}{D}\) and the minimally coupled TDiff-like scalar tensor theory,

$$\begin{aligned} \mathcal {L}_{min}= & {} \mathcal {L}_{a,b}^{\nabla } + \dfrac{x}{2}\varphi \nabla _\mu \nabla _\nu h^{\mu \nu } + \dfrac{y}{2}h\square \varphi + \dfrac{z}{4}\varphi \square \varphi , \end{aligned}$$
(87)
$$\begin{aligned} \mathcal {L}_{a,b}^{\nabla }= & {} -\dfrac{1}{4}\nabla _\mu h^{\alpha \beta }\nabla ^\mu h_{\alpha \beta }+\dfrac{1}{2}\nabla ^\mu h^{\alpha \beta } \nabla _{\alpha }h_{\mu \beta }\nonumber \\&\quad -\dfrac{a}{2}\;\nabla ^\mu h \nabla ^\nu h_{\mu \nu }+\dfrac{b}{4}\nabla _\mu h \nabla ^\mu h. \end{aligned}$$
(88)

The equations of motion of (86) with respect to \(h_{\mu \nu }\) are given by

$$\begin{aligned} E_{\mu \nu }&=\square h_{\mu \nu }-\nabla ^\alpha (\nabla _\nu h_{\mu \alpha }+\nabla _\mu h_{\nu \alpha })\nonumber \\&\quad +a \nabla _\mu \nabla _\nu h+a g_{\mu \nu } \nabla ^\alpha \nabla ^\beta h_{\alpha \beta }-b g_{\mu \nu }\square h + \nonumber \\&\quad +x\nabla _\mu \nabla _\nu \varphi + y \; g_{\mu \nu }\square \varphi + \mathbf {o}(h,\varphi )=0, \end{aligned}$$
(89)

where \( \mathbf {o}(h,\varphi ) \) denotes terms with less than two derivatives. Explicitly, we have

$$\begin{aligned} \mathbf {o}(h,\varphi )&= 2 a_1 R h_{\mu \nu }+ 2 a_2 R g_{\mu \nu }h+2a_3 C_{\mu \alpha \nu \beta }h^{\alpha \beta }\nonumber \\&\quad +a_4 (\tilde{R}_{\mu \beta }h_\nu {}^{\beta }+\tilde{R}_{\nu \beta }h_\mu {}^{\beta })\nonumber \\&\quad +a_5(\tilde{R}_{\mu \nu }h+g_{\mu \nu }\tilde{R}^{\alpha \beta }h_{\alpha \beta }) + b_2 g_{\mu \nu }R \varphi \nonumber \\&\quad +b_3 \tilde{R}_{\mu \nu }\varphi -m^2 h_{\mu \nu }+c_1 \, m^2 g_{\mu \nu }h +c_2 \, m^2 g_{\mu \nu }\varphi . \end{aligned}$$
(90)

The equation of motion of (86) with respect to \(\varphi \) is

$$\begin{aligned} \Psi =x\;\nabla ^\mu \nabla ^\nu h_{\mu \nu }+y\;\square h+z\;\square \varphi + \mathbf {o}(h,\varphi )=0, \end{aligned}$$
(91)

with

$$\begin{aligned} \mathbf {o}(h,\varphi )= 2 b_1 R \varphi + b_2 R h+b_3 \tilde{R}_{\mu \nu }h^{\mu \nu }+c_3\;m^2\varphi +c_2\;m^2 h. \end{aligned}$$
(92)

Similarly to what we have done in the flat space in Sect. 3.1 we try to build a vector constraint via the linear combination:

$$\begin{aligned} \phi _\nu =\nabla ^\mu E_{\mu \nu }+A_1\nabla _\nu g^{\alpha \beta }E_{\alpha \beta }+A_2\nabla _\nu \Psi =0. \end{aligned}$$
(93)

One can show that

$$\begin{aligned} \phi _\nu =&\nabla _\nu \square h\left[ (a-b)+A_1(1+a-b D)+A_2 y\right] \nonumber \\&\quad + \nabla _\nu \nabla ^{\mu }\nabla ^{\alpha }h_{\mu \alpha }\left[ (a-1)+A_1(a D-2)+A_2 x\right] \nonumber \\&\quad + \nabla _\nu \square \varphi \left[ (x+y)+A_1(x+y D)+A_2 z \right] +\cdots , \end{aligned}$$
(94)

where the dots denote terms that contain fewer than two covariant derivatives. We can guarantee the existence of a vector constraint if we get rid of the second covariant derivatives (see footnote 6), namely,

$$\begin{aligned} {\left\{ \begin{array}{ll}(a-b)+A_1(1+a-b D)+A_2 y=0,\\ (a-1)+A_1(a D-2)+A_2 x=0, \\ (x+y)+A_1(x+y D)+A_2 z =0. \end{array}\right. } \end{aligned}$$
(95)

If \(f_D\equiv f_{D}(a,b)\ne 0\), we obtain,

$$\begin{aligned} A_1&=\dfrac{(b-a)x+(a-1)y}{x(1+a-bD)+y(2-aD)}, \nonumber \\ A_2&=\dfrac{f_{D}}{x(1+a-bD)+y(2-aD)}, \end{aligned}$$
(96)
$$\begin{aligned} z=&\dfrac{[b(D-1)-1]x^2+(D-2)y^2+2[(D-1)a-1]xy}{f_{D}}.\end{aligned}$$
(97)

The reader may worry about the denominators in (96), but it can be shown that if they vanish we necessarily have \(f_{D}=0\). Notice that no non minimal terms are required for the existence of the vector constraint.
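
The solution (96)–(97) follows from elementary elimination in (95). The steps are sketched below, with \(\Delta \equiv x(1+a-bD)+y(2-aD)\) a shorthand introduced here for the common denominator, and with \(f_D=(a^2-b)(D-2)+(a-1)^2\) assumed as the explicit form of \(f_D(a,b)\). Multiplying the first and second equations of (95) by \(x\) and \(y\), and then by \((aD-2)\) and \((1+a-bD)\) respectively, and subtracting in each case, we have

$$\begin{aligned} (a-b)x-(a-1)y+A_1\Delta&=0, \nonumber \\ (a-b)(aD-2)-(a-1)(1+a-bD)-A_2\Delta&=f_D-A_2\Delta =0, \nonumber \\ z&=-\frac{(x+y)+A_1(x+yD)}{A_2}. \end{aligned}$$

The second line also shows that \(\Delta =0\) forces \(f_D=0\) for finite \(A_2\). As a spot check, \(D=4\), \((a,b)=(2,1)\) and \((x,y)=(a-1,a-b)=(1,1)\) give \(f_4=7\), \(\Delta =-7\), \(A_1=0\), \(A_2=-1\) and \(z=2=2a-b-1\), in agreement with (97).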

Now we look for the scalar constraint via the Ansatz:

$$\begin{aligned} \Omega= & {} \nabla ^\nu \phi _\nu + (A_3 m^2+A_4 R)g^{\mu \nu }E_{\mu \nu }\nonumber \\&\quad + (A_5 m^2+ A_6 R)\Psi + A_7 \tilde{R}^{\mu \nu }E_{\mu \nu }=0. \end{aligned}$$
(98)

The combination \( \nabla ^\nu \phi _\nu \) is required in order to cancel the four-derivative terms. Collecting terms with two derivatives,

$$\begin{aligned} \Omega&= 2a_3 C_{\mu \alpha \nu \beta }\nabla ^\mu \nabla ^\nu h^{\alpha \beta }+\left[ a_5+b_3 A_2+(2a_4+a_5 D)A_1+A_7\right] \nonumber \\&\times \tilde{R}_{\mu \nu }\square h^{\mu \nu }+ \left( a+a_5+a A_7\right) \!\tilde{R}_{\mu \nu }\nabla ^\mu \nabla ^\nu h+ \tilde{R}_{\mu \nu }\nabla ^\mu \nabla ^\nu \varphi \nonumber \\&\times \left[ x(1+A_7)+b_3\right] + \tilde{R}_{\mu \nu }\nabla ^\mu \nabla _\alpha h^{\alpha \nu } \left[ 2(a_4-1)-2 A_7\right] \nonumber \\&+ R\nabla ^\mu \nabla ^\nu h_{\mu \nu }\left[ 2a_1-\dfrac{2}{D}+A_4(a D-2)+A_6 x \right] \nonumber \\&+ R\square \varphi \Bigg [b_2+\dfrac{x}{D}+A_1b_2 D+2A_2 b_1+A_4(x+Dy)+A_6 z\Bigg ] \nonumber \\&+ R\square h\Bigg [2a_2+\dfrac{a}{D}+A_1(2a_2D+2a_1)+A_2 b_2 + A_4(1+a-bD)\nonumber \\&+A_6 y\Bigg ] + m^2\nabla ^\mu \nabla ^\nu h_{\mu \nu }\left[ -1+A_3(aD-2)+A_5x \right] \nonumber \\&+ m^2\square h\left[ c_1+A_1(c_1 D-1)+A_2 c_2+A_3(1+a-bD)+A_5y\right] \nonumber \\&+ m^2\square \varphi \left[ c_2+A_1Dc_2+A_2c_3+A_3(x+Dy)+A_5z\right] +\dots = 0 \, . \end{aligned}$$
(99)

Once again dots stand for terms with fewer than two derivatives. The first term of (99) implies (see footnote 7) \(a_3=0\). By using (96) in the four terms containing \(\tilde{R}_{\mu \nu }\), i.e. the second to fifth terms in (99), we can show that it is impossible to get rid of these four terms simultaneously. Consequently, we assume Einstein spaces:

$$\begin{aligned} \tilde{R}_{\mu \nu }=0 . \end{aligned}$$
(100)

The other six terms allow us to solve for the coefficients \(A_3,A_4,A_5,A_6\), fixing the scalar constraint and determining two out of the seven coefficients \(a_1,a_2,b_1,b_2,c_1,c_2\) and \(c_3\). The conclusion is that regardless of the coefficients in the Lagrangian, see (86), Einstein spaces are always required. This is also the case when \(f_{D}=0\), not considered in (96), which contains the massive Fierz–Pauli [37, 38] and massive WTDiff [24] theories. The latter corresponds to the FP case with the replacement \(h_{\mu \nu } \rightarrow h_{\mu \nu } - (h/D + \phi ) g_{\mu \nu }\), see [24]. It is not difficult to see that equation (95) is satisfied. Explicitly, for FP, \(\varphi =0=A_1\Rightarrow \phi _\nu =\nabla ^\mu E_{\mu \nu }=0\), whereas for WTDiff, \(A_2=1/D\) and \(A_1\) is redundant since \(g^{\mu \nu }E_{\mu \nu }=0\) due to the Weyl symmetry, so that \(\phi _\nu = \nabla ^\mu E_{\mu \nu } - 1/D \nabla _\nu \Phi =0\). The scalar constraint requires Einstein spaces and non minimal couplings. If one adds extra non minimal terms including higher powers in the curvature and negative powers of the mass, e.g. \(R^4/m^2\), it turns out [37, 38] that one can lift the requirement of Einstein spaces. Moreover, it is remarkable [40, 41] that the non minimal coefficients can be obtained from the ghost free massive gravity theory of [9, 11] by eliminating the fiducial metric \(f_{\mu \nu }\) order by order in \(1/m^2\) in terms of the background metric \(g_{\mu \nu }^{(0)}\) around which one expands the dynamic metric \(g_{\mu \nu }=g_{\mu \nu }^{(0)}+h_{\mu \nu }\), or by the expansion, see [42], of the bimetric model of [12] in the massive gravity limit. Here we only work with non singular powers of the mass and stick to Einstein spaces.

Back to our scalar–tensor models (\(f_{D}\ne 0\)), that describe 6 degrees of freedom in \(D=3+1\), let us find out an explicit expression for the curved version of the massive extension (52). Fixing \(x=a-1\), \(y=a-b\), \(z=2a-b-1\), \(c_1=b\), \(c_2=-y\), \(c_3=-z\), and solving (95) and (99) for Einstein spaces, we obtain a consistent family of curved background generalizations of (52),

$$\begin{aligned} \mathcal {L}_m= & {} \mathcal {L}^{\nabla }_{TDiff} -\dfrac{m^2}{4}(h_{\mu \nu }h^{\mu \nu }-b\;h^2)\nonumber \\&+\dfrac{a-1}{2}\varphi \nabla _\mu \nabla _\nu h^{\mu \nu } + \dfrac{a-b}{2}h(\square -m^2)\varphi + \nonumber \\&+\dfrac{2a-b-1}{4}\varphi (\square -m^2)\varphi + \dfrac{a_1}{2} R h_{\mu \nu }h^{\mu \nu } + \dfrac{a_2}{2}R h^2 \nonumber \\&+ \dfrac{b_1}{2}R \varphi ^2 + \dfrac{a_1+a_2+b_1-1/(2D)}{2}R h \varphi \end{aligned}$$
(101)

where \(\mathcal {L}^{\nabla }_{TDiff}\) is given in (88). Although \(a_1,a_2\) and \(b_1\) are free, there is no way of avoiding non minimal couplings just like in the Diff and WTDiff cases.

5.2 Massless symmetries

It is expected that local symmetries of the massless theory play a key role in the derivation of constraints in the massive theory, as in the usual flat space Maxwell–Proca and Fierz–Pauli models. So now we investigate the interesting interplay between the massless symmetries and the constraints.

From (86) with \(m=0\), we require its invariance under linearized generalized Diff

$$\begin{aligned} \delta h_{\mu \nu }= \nabla _\mu \xi _\nu +\nabla _\nu \xi _\mu + k_1\; g_{\mu \nu } \nabla \cdot \xi , \qquad \delta \varphi =k_2\;\nabla \cdot \xi , \qquad (k_1,k_2)\in \mathbb {R}. \end{aligned}$$
(102)

where \(k_1,k_2\) must be determined. By explicit computation, we reach the same conclusions as before: Einstein spaces are required, and \(a_3=0\). Moreover, we still have:

$$\begin{aligned} \delta \mathcal {L}&=\dfrac{R}{D}h_{\mu \nu }\nabla ^\mu \xi ^\nu [2 (a_1 D - 1) ]+\dfrac{R}{D}h\nabla \nonumber \\&\quad \cdot \xi \left[ a+a_1 k_1 D+a_2(k_1 D+2)D+\dfrac{b_2}{2}k_2 D\right] \nonumber \\&\quad + \dfrac{R}{D}\varphi \nabla \cdot \xi \left[ x+b_1 k_2 D+\dfrac{b_2}{2}(k_1 D+2)D\right] \nonumber \\&\quad + h \square \nabla \cdot \xi \left[ \dfrac{a}{2}(k_1+2) - \dfrac{b}{2}(2+k_1 D)+\dfrac{k_1}{2}+\dfrac{y}{2}k_2\right] \nonumber \\&\quad +\nabla _\mu \nabla \cdot \xi \nabla _\nu h^{\mu \nu }\left[ 1+k_1-\dfrac{a}{2}(k_1 D+2)-\dfrac{x}{2}k_2\right] \nonumber \\&\quad + \varphi \square \nabla \cdot \xi \left[ \dfrac{x}{2}(2+k_1)+\dfrac{y}{2}(2+k_1 D)+\dfrac{z}{2}k_2\right] . \end{aligned}$$
(103)

So we do have TDiff symmetry (\(\nabla \cdot \xi =0\)) in the massless version of (86) on Einstein spaces if \((a_1,a_3)=(1/D,0)\). Notice, however, that the vanishing of the last three brackets supplies us with the same three equations (95) necessary for the vector constraint by identifying \((A_1,A_2)=(k_1/2,k_2/2)\). Thus, Diff symmetry at \(m=0\) implies the vector constraint. Conversely, and more interestingly, the existence of the vector constraint implies Diff symmetry at \(m=0\) in the flat limit. In particular, there is no hope of a consistent massive scalar tensor model whose massless limit is only TDiff invariant in flat space. Although TDiff is the minimal symmetry for massless spin-2 particles, the massive case seems to require full Diff in the massless sector. Finally, we stress that, as in the \(f_D=0\) case [37, 38], within the truncations made here we can have a massive scalar tensor model in Einstein spaces with a scalar and a vector constraint without any local symmetry at \(m=0\). The requirement of those symmetries imposes further constraints on the non minimal coefficients. For instance, in order that we have Diff symmetry in the massless limit of (101) we must have,

$$\begin{aligned} a_1= \frac{1}{D}; \quad b_1=a_2+\frac{a}{D}-\frac{1}{2D} \end{aligned}$$
(104)
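
Condition (104) can be recovered from (103) with little effort. The following sketch uses the parameters of (101), namely \(x=a-1\), \(y=a-b\), \(z=2a-b-1\) and \(b_2=a_1+a_2+b_1-1/(2D)\), together with the identification \((A_1,A_2)=(k_1/2,k_2/2)\). With these values (95) is solved by \(A_1=0\) and \(A_2=-1\), i.e. \((k_1,k_2)=(0,-2)\), and the vanishing of the first three brackets of (103) becomes

$$\begin{aligned} 2(a_1D-1)&=0, \nonumber \\ a+2a_2D-b_2D&=0, \nonumber \\ (a-1)-2b_1D+b_2D&=0. \end{aligned}$$

The first equation gives \(a_1=1/D\), so that \(b_2=a_2+b_1+1/(2D)\). The second and third ones then both reduce to \(b_1=a_2+a/D-1/(2D)\), which is (104). In the flat limit, \(k_2=-2\) with \(\xi _\mu =\partial _\mu \Lambda /2\) reproduces the LDiff transformation \(\delta \varphi =-\square \Lambda \) of Sect. 4.4.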

We remind the reader that in Einstein spaces the scalar curvature R is constant and plays a role similar to that of \(m^2\). There are special values of \(m^2/R\) for which vector and scalar symmetries show up; they are under investigation.

Finally, we point out that the existence of constraints is equivalent to a simple counting of degrees of freedom, a necessary but not sufficient condition for the propagation of physical particles in a curved background; unitarity and causality should be further investigated even within the truncations made here.

6 Conclusion

We have made the Kaluza–Klein dimensional reduction from the \(D+1\) TDiff model (8) to D dimensions and have obtained, for arbitrary values of \((a,b)\), the simple expression (39) in terms of gauge invariant fields. The connection between the original tensor field \(h_{\mu \nu }\) and \(H_{\mu \nu }^{TD}\) is nontrivial in general but we have shown the existence of unitary gauges in all cases. The case \(f_{D+1}(a,b)=0\) contains the Diff model (linearized general relativity) and the WTDiff model (linearized unimodular gravity). Both cases lead to massive spin-2 particles. All remaining cases correspond to a massive scalar tensor model. We point out that there is no guarantee, in general, that one will be able to write down the reduced model in terms of field combinations involving the Stückelberg fields invariant under all local symmetries. It may happen, see [43], that there is no combination of fields large enough to accommodate all symmetries, so part of them must be realized dynamically by combining the different terms in the Lagrangian.

In Sect. 3.1.2 we have investigated the nontrivial issue of gauging away Stückelberg fields at action level and showed that the completeness (uniqueness) criterion of [27, 28] fits well with the unitarity requirements for a massive spin-2 theory. The message is that we have to be careful and avoid gauge conditions with residual symmetries.

In Sect. 3.2 we have checked, in all cases, that the massless limit of the reduced model (39) contains the same number of degrees of freedom as the corresponding massive theory, thus providing a smooth massless limit.

In Sect. 4 we obtain the massive scalar tensor model (52) via gauge fixing of the reduced model. The scalar and tensor degrees of freedom cannot be decoupled by any local field redefinition. By means of a helicity decomposition we have identified the different helicity modes. We have also worked out the equations of motion and the necessary vector and scalar constraints for a massive scalar tensor theory. The coupling to arbitrary sources has allowed us to prove unitarity in a covariant way and to show that the scalar field can only worsen the mass discontinuity problem [5, 6].

The massless version of (52), i.e., (53), is invariant under Diff and not TDiff; similarly, the massless limit of the WTDiff reduced model investigated in [24] is invariant under WDiff and not WTDiff. In Sect. 5 we show, starting from a more general Ansatz, that in order to have the vector and the scalar constraints in a second order scalar tensor model we do need Diff symmetry in the massless sector in flat space. So the absence of a consistent Lorentz covariant mass term for spin-2 particles in the pure tensor TDiff model (1) seems to be more general. Apparently, although TDiff is the minimal symmetry [17] for massless spin-2 particles, the massless sector must be Diff invariant in order that the mass terms produce the necessary number of constraints for the elimination of non physical degrees of freedom.

We have found a non local correspondence (78) between the pure tensor model (1) and (41) leading to a parametrization (82) of the non conserved source allowing an explicitly covariant proof of unitarity of (1) and a clear separation of the spin-2 and spin-0 contributions to the two point amplitude of the pure tensor TDiff model, see (85). The relationship (78) takes us also from the WTDiff model to the Einstein–Hilbert theory. We wonder whether there would be any nonlinear generalization connecting unimodular gravity to general relativity.

Finally, in Sect. 5 we investigate linearized massive scalar tensor theories in curved backgrounds. We go beyond the scalar tensor theory obtained via dimensional reduction and start from a rather general Ansatz, of second order in derivatives, with 14 parameters and non minimal terms linear in the curvature. The requirement of a vector plus a scalar constraint leads to the Eqs. (95) and (99). Instead of a detailed analysis of a complicated system of equations (under investigation) we show that the specific flat space massive model (52) does admit a family of curved background extensions on Einstein spaces with three free non minimal parameters. If we demand Diff symmetry at \(m=0\), only one parameter, see (104), remains arbitrary. The natural way of lifting the Einstein spaces demand is to include non minimal terms with higher powers in the curvature and negative powers in \(m^2\) as in [37, 38]. We may even go beyond that perturbative approach and try to formulate a massive scalar tensor gravity along the lines of [9] or even a scalar bi-tensor theory generalizing [12]. This is beyond the scope of the present work.