Detection of γ -ray emission from the Sagittarius Dwarf Spheroidal galaxy

The Fermi Bubbles are giant, γ -ray emitting lobes emanating from the nucleus of the Milky Way[1, 2] discovered in ∼ 1-100 GeV data collected by the Large Area Telescope on board the Fermi Gamma-Ray Space Telescope[3]. Previous work[4] has revealed substructure within the Fermi Bubbles that has been interpreted as a signature of collimated outﬂows from the Galaxy’s super-massive black hole. Here we show that much of the γ -ray emission associated to the brightest region of substructure – the so-called cocoon – is actually due to the Sagittarius dwarf spheroidal (Sgr dSph) galaxy. This large Milky Way satellite is viewed through the Fermi Bubbles from the position of the Solar System. As a tidally and ram-pressure stripped remnant, the Sgr dSph has no on-going star formation, but we demonstrate that its γ -ray signal is naturally explained by inverse Compton scattering of cosmic microwave background photons by high-energy electron-positron pairs injected by the dwarf’s millisecond pulsar (MSP) population, combined with these objects’ magnetospheric emission. This ﬁnding suggests that MSPs likely produce signiﬁcant γ -ray emission amongst old stellar populations, potentially confounding indirect dark matter searches in regions such as the Galactic Centre, the Andromeda galaxy, and other massive Milky Way dwarf spheroidals. Early of from the Fermi Large

origin. However, the cocoon is also spatially coincident with the core of the Sagittarius dwarf spheroidal galaxy (Sgr dSph [6]; Figure 1b), a satellite of the Milky Way that is in the process of being accreted and destroyed, as tidal forces gradually strip stars out of its core into elongated streams [7]. The chance probability of such an alignment is low, ∼ 1% (see the Supplementary Information; S.I. sec. 1), even before accounting for the fact that the cocoon and the Sgr dSph have similar shapes and orientations, and that the Sgr dSph is both one of the nearest and most massive (d = 26.5 kpc, M ∼ 10 8 M ; [8,9]) Milky Way satellites and has the largest mass divided by distance squared of any astronomical object not yet detected in γ-rays.
We therefore consider emission from the Sgr dSph as an alternative origin for the cocoon. In order to test this possibility, we fit the γ-ray emission observed by Fermi-LAT over a region of interest (ROI) containing the cocoon via template analysis. In our baseline model these templates include only known point sources and sources of Galactic diffuse γ-ray emission. We contrast the baseline with a baseline + Sgr dSph model that invokes these same templates plus an additional template constructed to be spatially coincident with the bright stars of the Sgr dSph (Extended Data (E.D.) Figure 1 and S.I. Figure 1); full details of the fitting procedure are provided in Methods and S.I. sec. 3. Using the best motivated choice of templates, we find that the baseline + Sgr dSph model is preferred at 8.1σ significance over the baseline model. We also repeat the analysis for a wide range of alternative templates for both Galactic diffuse emission and for the Sgr dSph (Table 1) and obtain > 5σ detections for all combinations but one. Moreover, even this is an extremely conservative estimate, because our baseline model uses a structured template for the FBs that absorbs some of the signal that is spatially coincident with the Sgr dSph into a structure of unknown origin. If we follow the method recommended by the Fermi collaboration [2] and use a flat FB template in our analysis, the significance of our detection of the Sgr dSph is always > 14σ . Despite this, for the remainder of our analysis we follow the most conservative choice by using the structured template in our baseline model. In Methods, we also show that our analysis passes a series of validation tests: the residuals between our best-fitting model and the data are consistent with photon counting statistics (E.D. Figure 2 and Figure 3), our pipeline reliably recovers synthetic signals superimposed on a realistic background (E.D. Figure 4), fits using a template tracing the stars of the Sgr dSph yield significantly better results than fits using purely geometric templates (S.I. Table 1), and if we artificially rotate the Sgr dSph template on the sky, the best-fitting position angle is very close to the actual one (E.D. Figure 5). By contrast, if we displace the Sgr dSph template, we find moderate (4.5σ significance) evidence that the best-fitting position is ∼ 4 • from the true position, in a direction very closely aligned with the dwarf galaxy's direction of travel (E.D. Figure 5); this plausibly represents a small, but real and expected (as explained below) physical offset between the stars and the γ-ray emission.
The directly-measured flux from the Sgr dSph, derived from our fiducial choice of templates, corresponds to a luminosity of (3.8 ± 0.6) × 10 36 erg/s (1σ error) for γ-ray photons in the range from 0.5 to 150 GeV (equivalently ∼ 4 × 10 28 erg/s/M ). Over this range the spectrum is approximately described by a hard power law dF γ /dE γ ∝ ∼ E −2.1 γ (Figure 2). There is no evidence for a cut-off at high energies. We show in E.D. Figure 6 that this spectral shape is qualitatively insensitive to the choice of foreground templates, and E.D. Figure 7 demonstrates that the spectra we recover for the various foregrounds within the ROI remain physically plausible when we introduce a Sgr dSph template.
Since our template fits strongly suggest that there is a real γ-ray emission component tracing the Sgr dSph, a natural next question is what mechanism could be responsible for producing it. The core of the Sgr dSph is the remnant of a once much more massive galaxy. Tidal and ram pressure stripping removed its gas and caused it to cease forming stars 2 − 3 Gyr ago [10], though it did experience punctuated bursts of star formation [11] -triggered by its crossings through the Galactic plane [12]up to that time. In the MW, the dominant source of diffuse γ-ray emission is collisions between (hadronic) cosmic rays (CRs) and ambient instellar medium (ISM) gas nuclei [13], but this mechanism cannot operate in the Sgr dSph, which lacks both 'target' gas with which CRs could interact, and supernova explosions from young, massive stars to accelerate hadronic CRs in the first place. Stellar γ-ray emission is also ruled out: while our Sun is a source of ∼ 100 GeV γ-rays, this emission is again dominantly due to collisions between hadronic CRs from the wider Galaxy and Solar gas; γ-ray emission from non-thermal particles accelerated by the Sun itself only extends to 4 GeV [14]. This leaves two possibilities for the Sgr dSph γ-ray signal: it is created from the self-annihilation of dark matter (DM) particles in the dwarf's DM halo, or by millisecond pulsars (MSPs) deriving from the stars of the Sgr dSph. The former is unlikely because the γ-ray signal largely traces the stars of the dwarf, while N-body simulations [12] show that the Milky Way's tidal field will have overwhelmingly dispersed the progenitor galaxy's original DM halo into the stream over its orbital history.
MSPs, by contrast, should follow the same spatial distribution as the rest of the stellar population, have a spin-down timescale O(Gyr), long enough to be compatible with the most recent episodes of Sgr dSph star formation, and radiate some part of their magnetic dipole luminosity into γ-rays. However, there are two significant challenges to this scenario: First, the inferred γ-ray luminosity per unit stellar mass is much larger ( 10×) for the Sgr dSph than for some other systems whose detected γ-ray emission is plausibly dominated by MSPs including the Galactic Bulge [15][16][17][18][19] and Andromeda [20,21] (M31), the giant spiral galaxy nearest to the Milky Way (although it is smaller than that observed for globular clusters -see Figure 3). Second, the hard, ∝ ∼ E −2.1 γ spectrum of the Sgr dSph ( Figure 2) does not resemble the classic ∼ few GeV bump (in the spectral energy distribution) of the magnetospheric γ-ray signal detected from individual MSPs or the globular clusters (GCs) that host 2/32 populations of MSPs: e.g. [22].
However, both of these challenges can be overcome by considering how the stellar population and interstellar environment of the Sgr dSph differ from other systems. With regard to stars, those in the Sgr dSph are both younger and more metal-poor than those of M31 or the Galactic Bulge; metal-poor stellar systems are expected to produce more MSPs per stellar mass [23], and ∼ 7 − 8 Gyr-old MSPs (the rough age of the Sgr dSph population) are expected to be significantly brighter than 10 − 12 Gyr-old ones (the ages of stellar populations in the Bulge and the core of M31) [19]. In S.I. sec. 5 we show that the best-fit value for the γ-ray luminosity of the Sgr dSph is fully consistent with both theoretical predictions and with observations of other γ-ray emitting old stellar populations once age and metallicity are taken into account. On the basis of stellar population synthesis models, we estimate that the γ-ray luminosity of the Sgr dSph is produced by ∼ 650 MSPs.
With regard to environment, note that while the spectrum of the Sgr dSph does not resemble an MSP magnetospheric signal, it does resemble inverse Compton (IC) emission from the up-scattering (by a CR electron-positron population; e ± ) of ambient light which, for the Sgr dSph, is dominated by the Cosmic Microwave Background (CMB). We also know that MSPs produce e ± with energies of at least a few TeV, since these are the particles that ultimately drive the observed GeV MSP γ-ray photospheric emission. Some of these e ± will give up all their energy within the MSP magnetosphere. However, given the expected absence of wind nebulae or supernova remnants surrounding these old, low luminosity objects [24], many will freely escape both magnetosphere and MSP environs into the larger Sgr dSph environment [25,26] where they can IC up-scatter CMB photons. In an environment like Andromeda or the Galactic Bulge, this IC signal will be weak (albeit detectable in the case of the Bulge according to [19]), because much of the escaping e ± energy will be lost to synchrotron rather than IC radiation. In an ultra-gas poor system like the Sgr dSph, however, we expect the ISM magnetic field to be far weaker than in a gas-rich galaxy ([27]; also see Methods) with an energy density significantly smaller than that in the CMB; thus radiative losses from MSP-escaping e ± are overwhelmingly into hard-spectrum IC γ-rays rather than (radio to X-ray) synchrotron radiation. Consistent with this explanation, globular clusters -which are also gas-poor and weakly-magnetized -represent another environment where MSP-driven γ-ray emission seems to sometimes include a significant IC component [22]. We formalize this intuitive argument in Methods, where we show that the spectrum of the Sgr dSph is extremely well fit as a combination of IC and magnetospheric radiation with self-consistently related spectral parameters. This scenario also explains why the γ-ray signal is displaced ∼ 4 • , or about 1.9 kpc (E.D. Figure 5, right), from the center of the Sgr dSph; the dwarf's Northward proper motion [28] means this displacement is backwards along its path. As the Sgr dSph plunges through the Milky Way halo, the magnetic field around it will be elongated into a magnetotail oriented backwards along its trajectory, and e ± emitted into the dwarf will be trapped by these magnetic field lines, leading them to accumulate and emit in a position that trails the Sgr dSph, exactly as we observe. We offer a more quantitative evaluation of this scenario in S.I. sec. 4.
There are a number of implications of the discovery of γ-rays from the Sgr dSph. Firstly, this study largely removes any residual motivation for the idea that the cocoon sub-structure of the Fermi Bubbles be interpreted as evidence for γ-ray jets launched from the Galactic nucleus. Secondly, our results motivate the introduction of stellar templates into the analysis of data from all γ-ray resolved galaxies (M31, Large and Small Magellanic Clouds) to probe the contribution of MSPs. In general, our study lends support to the argument [24] that MSPs contribute significantly to the energy budget of CR e ± in galaxies with low specific star-formation rates. Third and finally, we show in the SI that naive extrapolation of the Sgr dSph MSP γ-ray luminosity per unit mass to other nearby dSph galaxies suggests that they likely have considerably larger astrophysical γ-ray signatures than previous estimates; we report our revised estimates in S.I. Table 2 for the sample of ref. [29]. These signals are large enough that some are potentially detectable via careful analysis of Pass8 (15-year) Fermi-LAT data. Conversely, these brighter astrophysical signatures represent a larger-than-expected background with which searches for DM annihilation signals must contend, and potentially swamp DM signals in some nearby dwarfs. This finding strongly suggests that DM searches focus their efforts on the dSph galaxies where the MSP signal is expected to be weakest relative to the DM signal, and motivates further efforts to model the MSP contribution and its dependence on the age and metallicity of the stellar population. Such modelling will be critical to identifying the most promising targets for future DM annihilation searches.
Supplementary Information is available for this paper. Correspondence and requests for materials should be addressed to RMC or OM. . These data are as obtained by us in our Fermi-LAT data analysis as described in Methods. We have converted luminosities to surface brightnesses adopting source solid angles of Ω Sgr dSph = 9.6 × 10 −3 sr, and Ω FB = 0.49 sr, with the latter set by the 40 • × 40 • region of interest (ROI), not the intrinsic sizes of the Bubbles (which are larger than the ROI). Error bars show 1σ errors; for the Sgr dSph, the error bars incorporate both statistical and systematic errors added in quadrature. The smooth blue curves show (solid) the best fit combined (magnetospheric + IC) and (dashed) the best fit magnetospheric spectra. . γ-ray luminosity normalised to stellar mass for various structures whose emission is plausibly dominated by MSPs. The 'Sgr magneto.' datum shows our best-fit magnetospheric luminosity per stellar mass (the spectrum shown as the dashed blue curve in Figure 2) while the 'Sgr tot' datum is the total, directly-measured luminosity.

Data and template selection
We use eight years of LAT data, selecting Pass 8 UltraCleanVeto class events in the energy range from 500 MeV to 177.4 GeV. We choose the limit at low energy to mitigate both the impact of γ-ray leakage from the Earth's limb and the increasing width of the point-spread function at lower energies. We spatially bin the data to a resolution of 0.2 • , and divide it into 15 energy bins; the 13 lowest-energy of these are equally spaced in log energy, while the 2 highest-energy are twice that width in order to improve the signal to noise. We select data obtained over the same observation period as that used in the construction of the Fourth Fermi Catalogue (4FGL) [30] (August 4, 2008 to August 2, 2016). The region of interest (ROI) of our analysis is a square region defined by −45 • ≤ b ≤ −5 • , and 30 • ≥ ≥ −10 • (Figure 1). This sky region fully contains the Fermi cocoon substructure but avoids the Galactic plane (|b| ≤ 5 • ) where uncertainties are largest. Because the ROI is of modest size, we allow the Galactic diffuse emission (GDE) templates greater freedom to reproduce potential features in the data. We carry out all data reduction and analysis using the standard FERMITOOLS V1.0.1 1 software package. We model the performance of the LAT with the P8R3 ULTRACLEANVETO V2 Instrument Response Functions (IRFs). We fit the spatial distribution of the ROI data as the sum of a series of templates for different components of the emission. For all the templates we consider, we define a "baseline" model that includes only known point and diffuse emission sources, to which we compare a "baseline + Sgr dSph" model that includes those templates plus the Sgr dSph. Our baseline models, following the approach of Ref. [31], contain the following templates: (1) diffuse isotropic emission, (2) point sources, (3) emission from the Sun and Moon, (4) Loop I, (5) the Galactic Centre Excess, (6) Galactic cosmic ray-driven hadronic and bremsstrahlung emission, (7) inverse Compton emission, and (8) the Fermi Bubbles; baseline + Sgr dSph models also include a Sgr dSph template.
Our templates for the first five emission sources are straightforward, and we adopt a single template for each of them throughout our analysis. Since our data selection is identical to that used to construct the 4FGL, we adopt the standard isotropic background and point source models provided as part of the catalogue [30], iso − P8R3 − ULTRACLEANVETO − V2 − v1.txt, and gll psc v20.fit, respectively; the latter includes 177 γ ray point sources within our ROI. We similarly adopt the standard Sun and Moon templates provided. For the foreground structure Loop I, we adopt the model of Ref. [32]. Finally, given that the low-latitude boundary of our ROI overlaps with the spatial tail of the Galactic Centre Excess (GCE), we include the 'Boxy Bulge' template of Ref. [33], which has been shown [16-18] to provide a good description of the observed GCE away from the nuclear Bulge region (which is outside our ROI). The inclusion of this template in our ROI model has only a small impact on our results.
The remaining templates require more care. The dominant source of γ-rays within the ROI is hadronic and bremsstrahlung emission resulting from the interaction of Milky Way cosmic ray (CR) protons and electrons with interstellar gas; the emission rate is proportional to the product of the gas density and the CR flux. We model this distribution using three alternative approaches. Our preferred approach follows that described in Ref. [16]. We assume that the spatial distribution of γ-ray emission traces the gas distribution from the hydrodynamical model of Ref. [34], which gives a more realistic description of the inner Galaxy than alternatives. To normalise the emission, we divide the Galaxy into four rings spanning the radial ranges 0 − 3.5 kpc, 3.5 − 8.0 kpc, 8.0 − 10.0 kpc, and 10.0 − 50.0 kpc, within which we treat the emission per unit gas mass in each of our 15 energy bins as a constant to be fit. We refer to the template produced in this way as the "HD" model. Our first alternative is to use the same procedure of dividing the Galaxy into rings, but describe the gas distribution within those rings using a template constructed from interpolated maps of Galactic H I and H 2 , following the approach described in Appendix B of Ref. [35]; we refer to this as the "Interpolated" approach. Our third alternative, the "GALPROP" model, is the SA50 model described by Ref. [36], which prescribes the full-sky hadronic CR emission distribution.
We similarly need a model for diffuse, Galactic IC emission -the second largest source of background -which is a product of the CR electron flux and the interstellar radiation field (ISRF). As with hadronic emission, we consider four alternative distributions. Our default choice is the SA50 model described by Ref. [36], which includes 3D models for the ISRF [37]. We therefore refer to this as the "3D" model. However, unlike in Ref. [36], we use this model only to obtain the spatial distribution of the emission, not its normalisation or energy dependence. Instead, we obtain these in the same way as for our baseline hadronic emission model, i.e., we divide the Galaxy into four rings and leave the total amount of emission in each ring at each energy as a free parameter to be fit to the data; this approach reduces the sensitivity of our results to uncertainties in the electron injection spectrum and ISRF normalisation. Our three alternatives to this are models "2D A", "2D B", and "2D C", corresponding to models A, B, and C as described by Ref. [38], which model IC emission over the full sky under a variety of assumptions about CR injection and propagation, but rely on a 2D model for the ISRF.
The final component of our baseline template is a model for the Fermi Bubbles themselves, which are one of the strongest sources of foreground emission in high latitude regions of the ROI. The FBs are themselves defined as highly statisticallysignificant and spatially-coherent residuals in the inner Galaxy that remain once other sources are modelled out in all-sky γ-ray analyses. The FBs are not reliably traced by emission at any other wavelength, so we do not have an a priori model with which to guide the construction of a spatial template of these structures. However, one characteristic that renders the FBs distinct from other large angular scale diffuse γ-ray structures is their hard γ-ray spectrum. Indeed, the state-of-the-art, structured spatial template for them generated by the Fermi Collaboration[2] -the templates one would normally employ in large ROI, inner Galaxy Fermi-LAT analyses -were constructed using a spectral component analysis. That study recovered a number of regions of apparent substructure within the solid angle of the FBs, most notably substructure overlapping the previously-discovered[4, 5] "cocoon" which, as we have discussed here, is largely coincident with the Sgr dSph. Of course, a potential issue with constructing a phenomenological, spectrally-defined model for the FBs is that, if there happens to be an extended, spectrally-similar source coincident with the FBs, it will tend to be incorporated into the template. For this reason Ref.
[2] suggest using a flat FB template when searching for new structures. Despite this proposal, our default analysis uses the more conservative choice of a structured FB template. However, we also run tests using an unstructured template for comparison, and to understand the systematic uncertainties associated with the choice of template. We refer to these two cases as the "U" (Unstructured) and "S" (Structured) FB templates, respectively.
Finally, our baseline + Sgr dSph models require a template for the Sgr dSph. Our templates trace the distribution of bright stars in the dwarf, which we construct from five alternative stellar catalogues, all based on different selections from Gaia Data Release 2; we refer to the resulting templates as models I -V, and show them in E.D. Figure 1. Full details on how we construct each of these templates are provided in S.I. sec. 2. Model I, our default choice, comes from the catalogue of 2.26 × 10 5 Sgr dSph candidate member stars from Ref.
[8]; the majority of the catalogue consists of red clump stars. Model II uses the catalogue of RR Lyrae stars in the Sagittarius Stream from Ref. [39], which we have down-selected to a sample of 2369 stars whose kinematics are consistent with being members of the Sgr dSph itself. Model III uses the catalogue of 1.31 × 10 4 RR Lyrae stars belonging to the Sgr dSph provided by Ref. [40]. Finally, models IV and V come from the nGC3 and Strip catalogues of RR Lyrae stars from Ref. [41]; the former contains 675 stars with higher purity but lower completeness, while the latter contains 4812 stars of higher completeness but lower purity.

Fitting procedure
Our fitting method follows that introduced in Refs. [16,18], and treats each of the 15 energy bins as independent, thereby removing the need to assume any particular spectral shape for each component and allowing the spectra to be determined solely by the data. Our data to be fit consist of the observed γ-ray photon counts in each spatial pixel i and energy bin n, which we denote Φ n,i,obs , where n goes from 1 to 15, and the index i runs over the positions ( i , b i ) of all spatial pixels within the ROI. For a given choice of template, we write the corresponding model-predicted γ-ray counts as Φ n,i,mod = ∑ c N n,c R n,i Φ c,i , where R n,i is the instrument response for each pixel and energy bin (computed assuming an E −2 spectrum within the bin), and Φ c,i is the value of template component c evaluated at pixel i; for baseline models, we have a total of 8 components, while for baseline + Sgr dSph models we have 9. Note that Φ c,i is a function of i but not of n, i.e., we assume that the spatial distribution of each template component is the same at all energies, except for the IC templates, for which an energy-dependent morphology is predicted by our GALPROP simulations. Without loss of generality we further normalise each template component as ∑ i Φ c,i = 1, in which case N n,c is simply the total number of photons contributed by component c in energy bin n, integrated over the full ROI; the values of N n,c are the parameters to be fit. We find the best fit by maximising the usual Poisson likelihood function using the pylikelihood routine, the standard maximum-likelihood method in FermiTools. Note that, since each energy bin n is independent, we carry out the likelihood maximisation bin-by-bin. We perform all fits in pairs, one for a baseline model containing only known emission sources, and one for a baseline + Sgr dSph model containing the same known sources plus a component tracing the Sgr dSph. The set of paired fits we perform in this manner is shown in Table 1. We compare the quality of these baseline and baseline + Sgr dSph fits by defining the test statistic TS n = −2 ln(L n,base /L n,base+Sgr ); the total test statistic for all energy bins is simply TS = ∑ n TS n . We can assign a p-value to a particular value of the TS by noting that baseline + Sgr dSph models have 15 additional degrees of freedom compared to baseline models: the value of N n,c for the component c corresponding to the Sgr dSph, evaluated at each of the 15 energy bins. In this case, the mixture distribution formula gives[16]

11/32
where N = 15 is the difference in number of degrees of freedom, N n is the binomial coefficient, δ is the Dirac delta function, and χ 2 n is the usual χ 2 distribution with n degrees of freedom. The corresponding statistical significance (in σ units) is [16]: where (InverseCDF) CDF is the (inverse) cumulative distribution function and the first argument of each of these functions is the distribution function, the second is the value at which the CDF is evaluated, and the total TS is denoted byTS. For 15 extra degrees of freedom, a 5σ detection corresponds to TS = 46.1. (Additional details of these formulae are given in S.I. Sec. 2 of Ref. [16].) We report values of L base , L base+Sgr , TS, and the significance level for all the templates we try in Table 1. A final step in our fitting chain is to assess the uncertainties. For our default choice of baseline + Sgr dSph model (first row in Table 1), our maximum likelihood analysis returns the central value N def n on the total γ-ray flux in the nth energy bin attributed to the Sgr dSph, and also yields an uncertainty σ def N ,n on this quantity. This represents the statistical error arising from measurement uncertainties. However, there are also systematic uncertainties stemming from our imperfect knowledge of the templates characterising the other emission sources. To estimate these, we examine the five alternative models listed in Table 1 as "Alternative background templates", where we use different templates for the hadronic plus bremsstrahlung and inverse Compton backgrounds. Each of these models m also returns a central value N m n and an uncertainty σ m N ,n on the Sgr dSph flux. We use the uncertainty-weighted dispersion of these models as an estimate of the systematic uncertainty (e.g.,ref. [42]): where the sums run over the m = 6 − 1 alternative models. We take the total uncertainty on the Sgr dSph flux in each energy bin to be a quadrature sum of the systematic and statistical uncertainties, i.e., (σ def,tot We plot the central values and uncertainties of the fluxes for the default model derived in this manner in Figure 2. We have carried out several validation tests of this pipeline, which we describe in the Supplementary Information (SI).

Spectral modelling
We model the observed Sgr dSph γ-ray spectrum as a combination of prompt magnetospheric MSP emission and IC emission from e ± escaping MSP magnetospheres. We construct this model as follows. The prompt component is due to curvature radiation from e ± in within MSP magnetospheres. The e ± energy distribution can be approximated as an exponentially-truncated power law [22,43] and curvature radiation from these particles has a rate of photon emission per unit energy per unit time where E γ is the photon energy, N (L γ,prompt ) is a normalisation factor chosen so that the prompt component has total luminosity L γ,prompt , the index α is related to that of the e ± distribution by α = (γ MSP − 1)/3, and the photon cutoff energy is related to the e ± cutoff energy by [25] where m e is the electron mass, ρ c is the radius of curvature of the magnetic field lines, and the other symbols have the usual meanings. Given the rather small magnetospheres, we expect ρ c to be a small multiple of the ∼ 10 km neutron star characteristic radius; henceforth we set ρ c = 30 km. Empirically, L γ,prompt is ∼ 10% of the total MSP spin-down power [43].
A larger proportion of the spin-down power goes into a wind of e ± escaping the magnetosphere. In the ultra-low density environment of the Sgr dSph, ionization and bremmstrahlung losses for this population, which occur at a rate proportional to the gas density, are negligible. Synchrotron losses, which scale as the magnetic energy density, will also be negligible; as noted in the main text, observed magnetic fields in dwarf galaxies are very weak [27], and we can also set a firm upper limit on the Sgr dSph magnetic field strength simply by noting that the magnetic pressure cannot exceed the gravitational pressure provided by the stars since, if it did, that magnetic field, and the gas to which it is attached, would blow out of the galaxy in a dynamical time. The gravitational pressure is P ≈ (π/2)GΣ 2 , where Σ = M/πR 2 is the surface density, and using our fiducial numbers M = 10 8 M and R = 2.6 kpc gives an upper limit on the magnetic energy density 0.06 eV / cm 3 ; non-zero gas or cosmic ray pressure would lower this estimate even further. This is a factor of four smaller than the energy density of the CMB, implying that synchrotron losses are at most a 20% effect, and can therefore be neglected.
This analysis implies that the only significant loss mechanism for these e ± is IC emission, resulting in a steady-state e ± energy distribution where γ = γ MSP − 1. We compute the IC photon distribution produced by these particles following ref. [44], assuming that ISRF of the Sgr dSph is the sum of the CMB and two subdominant contributions, one consisting of light escaping from the Milky Way and the other a dilute stellar blackbody radiation field due to the stars of the dwarf. We estimate the Milky Way contribution to the photon field at position of the dwarf using GalProp [37], which predicts a total energy density of 0.095 eV/cm 3 (compared to 0.26 eV/cm −3 for the CMB), comprised of 5 dilute black bodies with colour temperatures and dilution factors {T rad , κ} as follows: We characterise the intrinsic light field of the dwarf as having a colour temperature 3500 K and dilution factor of 7.0 × 10 −15 (giving energy density 0.005 eV cm −3 ; these choices are those expected for a spherical region of radius 2.6 kpc and stellar luminosity 2 × 10 8 L , the approximate parameters of the Sgr dSph). This yields an IC spectrum where N L γ,IC is again a normalisation chosen to ensure that the total IC luminosity is L γ,IC , and F γ, E cut,e ± is the functional form given by equation 14 of ref. [44], which depends on the e ± spectral index γ and cutoff energy E cut,e ± . Combining the prompt and IC components, we may therefore write the complete emission spectrum as This model is characterised by four free parameters: the total prompt plus IC luminosity L γ,tot = L γ,prompt + L γ,IC , the ratio of the prompt and IC luminosities f = L γ,prompt /L γ,IC , the spectral index α of the prompt component (which in turn fixes the other two spectral indices γ MSP and γ), and the cutoff energy for the prompt component E cut,prompt (which then fixes the e ± cutoff energy E cut,e ± ). Note that we make the simplest assumption that α and E cut,prompt are uniform across the MSP population. In reality, there may be a distribution of these properties but the parameteric form of Equation 5 provides a good description, in general, of both individual MSP spectra and the aggregate spectra of GC MSP populations [22]. We fit the observed Sgr dSph spectrum to this model using a standard χ 2 minimisation, using the combined statistical plus systematic uncertainty. We obtain an excellent fit: the minimum χ 2 is 7.7 for 15 (data points) -4 (fit parameters) = 11 (degrees of freedom, dof) or a reduced χ 2 of 0.70. We report the best-fitting parameters in E.D. Table 1, and plot the result best-fit spectra over the data in Figure 2; we show the best-fit estimate (with ±1σ confidence region) for the magnetospheric luminosity per stellar mass of the Sgr dSph MSPs in Figure 3.
We also carry out an additional consistency check, by comparing our best-fit parameters describing the prompt emissionα and E cut,prompt -to direct measurements of the prompt component from nearby, resolved MSPs [22,43], and to measurements of GCs, whose emission is likely dominated by unresolved MSPs [22]. We carry out this comparison in E.D. Figure 8. In this figure, we show joint confidence intervals on α and E cut,prompt from our fit. For comparison, we construct confidence intervals for α and E cut,prompt from observations using the sample of ref.
[22], who fit the prompt emission from 40 GCs and 110 individually-resolved MSPs. We draw 100,000 Monte Carlo samples from these fits, treating the stated uncertainties as Gaussian, and construct contours in the (E cut,prompt , α) plane containing 68%, 95%, and 99% of the sample points. As the plot shows, the confidence region from our fit is fully consistent with the confidence regions from the observations, indicating that our best-fit parameters are fully consistent with those typically observed for MSPs and GCs.

Data availability
All data analysed for this study are publicly available. In particular, Fermi-LAT data are available from https://fermi. gsfc.nasa.gov/ssc/data/ and Gaia data are available from https://gea.esac.esa.int/archive/. The statistical pipeline, astrophysical templates, and gamma-ray observations necessary to reproduce our main results are publicly available in the following zenodo repository: 10.5281/zenodo.6210967.

Code availability
Fermi-LAT data used in our study were reduced and analysed using the standard FERMITOOLS V1.0.1 software package available from https://github.com/fermi-lat/Fermitools-conda/wiki. The performance of the Fermi-LAT was modelled with the P8R3 ULTRACLEANVETO V2 Instrument Response Functions (IRFs). Spectral analysis and fitting was performed using custom MATHEMATICA code created by the authors which is available upon reasonable request.

Additional information
To include, in this order: Accession codes (where applicable); .

Competing interests
The authors declare no competing interests. Extended Data Table 1. Best fit spectral parameters with ±1σ confidence regions as determined from χ 2 fitting to the measured γ-ray spectrum of the Sgr dSph. The parameter l 0 is calculated using a stellar mass M = 10 8 M [9] for the Sgr dSph. See also E.D. Figure 8 16/32  Table 1). In each of the 15 panels, one for each of the energy bins in our analysis pipeline, the blue histograms show the distribution of − ln L values produced in 100 Monte Carlo trials where we use our pipeline to fit a mock data set produced by drawing photons from the same set of templates used in the fit; orange dashed vertical lines show the 68% confidence range of this distribution, and black dashed vertical lines show the mean. Under the null hypothesis that our best-fitting model for the real Fermi observations is a true representation of the data, and that disagreements between the model and the data are solely the result of photon counting statistics, the log-likelihood values for our best-fitting model should be drawn from the distributions shown by the blue histograms. For comparison, the red vertical line shows the actual measured log likelihoods for our best fit. The fact that these measured values are well within the range spanned by the Monte Carlo trials indicates that we cannot rule out the null hypothesis, indicating that our model is as good a fit to the data as could be expected given the finite number of photons that Fermi has observed.

Injected flux Flat FB FB not included
Extended Data Figure 4. Results from our template mismatch tests. Each of the coloured lines shows the results of a test where we generate synthetic data with one set of templates, and attempt to recover the Sgr dSph in those data using a different set. In the upper two panels, the horizontal axis shows the true, energy-integrated Sgr dSph photon flux in the synthetic data, while the vertical axis shows the value (with 1σ statistical error bars) retrieved by our pipeline; the black dashed lines indicate perfect recovery of the input, and the vertical bands show the photon flux we measure for the Sgr dSph in the real Fermi data. In the bottom two panels we plot the recovered energy flux in each energy bin (with 1σ statistical error bars), for the case where the injected photon flux most closely matches the real Sgr dSph flux; the black dashed line again shows perfect recovery of the injected signal. The left panels show experiments where we mismatch the Galactic hadronic and IC templates, while the right panels show experiments where we mismatch the FB templates; see Methods for details.   Figure 6. Sgr dSph spectra derived from template analysis using different Galactic diffuse emission models; in all cases the spectrum shown is the flux averaged over the entire ROI, not the flux within the footprint of the Sgr dSph template. The fiducial model is our default choice (first entry in Table 1), while other lines correspond to alternate foregroundsmodels 2D A (red), 2D B (black), and 2D C (blue) for the Galactic IC foreground, and models Interpolated (dark green) and GALPROP 3D-gas (light green) for the Galactic hadronic + bremsstrahlung foreground. The error bars display 1σ statistical errors. See Table 1 and text for details.  Figure 9. Data: γ-ray luminosity per stellar mass for a number of stellar systems (cf. fig. 3 main text) versus mean stellar age of those systems. The mean stellar ages have been determined from empirically-determined star formation histories for all these objects (data sources as follows: Sgr dSph [11], M31 [45], Galactic Bulge [46], NB [47] and ref. [48] for the LMC). The globular cluster datum ('GCs') is plotted at the mean measured γ-ray luminosity for the 27 systems analysed in ref.

19/32
[22] divided by their stellar masses, and the age is the luminosity-weighted mean age for the 31 systems analysed in ref. [49] (while the error bars for this datum show the standard deviations of these measurements for each population). The purple datum shows the secondary electron plus positron luminosity of the Milky Way ('disk e ± ') as inferred in ref.
[13] and adopting a disk stellar mass of 5.2 × 10 10 M [50]. Model curve: The solid blue curve shows the evolution with time (since the initial, burst-like star formation event) of the total spin-down power generated by a population of MSPs (normalised to the stellar mass expected to host that same population) according to the recent binary population synthesis modelling presented in ref.
[19] (with the blue band indicating the estimated the ±1σ error on this quantity dominated by the uncertainties in the overall stellar binarity fraction). The dashed red line is an approximate fit to the solid blue line described by 5.0 × 10 28 exp(−t/t decay ) erg/s/M with t decay = 3 Gyr. The dashed blue curve shows 10% of the mass-normalised spin-down power (with the error band suppressed for clarity). The brown, dashed, horizontal line shows the total power (per unit stellar mass) from MSP spin-down we infer from the study by Sudoh et al.
[24] of radio continuum emission from massive, quiescent galaxies (with expected mean stellar ages > 8 − 10 Gyr; see the main text for more details).

Supplementary Information 1 Chance overlap calculation
In the main text we estimate the probability of a chance overlap between cocoon γ-ray structure and Sgr dSph to be ≈ 1%. This follows simply from noting that the solid angle of the Bubbles is around 0.7 sr[2] and the cocoon covers 20% of this solid angle, so the chance probability for an overlap if these objects were placed randomly on the sky is 0.2 × 0.7/(4π) ∼ 0.012. However, this is a generous upper limit; it does not take into account that, as revealed by the template analysis, there is a much more detailed correspondence between the γ-ray substructure and the stellar distribution not accounted for here. Moreover, the naive 1% estimate does include a 'look-elsewhere' correction: the Milky Way is surrounded by satellite galaxies and there are apparently other regions of sub-structure within the Fermi Bubbles. However, not only is the cocoon the brightest and first-discovered region of sub-structure [4], it is also the only region that has been reliably detected by independent analyses [2, 5], and is visibly-evident in independently-produced γ-ray maps [51,52]. The Sgr dSph is also a special object: it is the brightest MW satellite not yet (prior to this work) detected in γ-rays. (In fact, not only is the Sgr dSph the brightest satellite undiscovered in γ-rays, it is substantially brighter than the next brightest galaxy 2 .) Overall, we have a spatial overlap (and detailed morphological correspondence as argued elsewhere) between the brightest region of substructure within the Fermi Bubbles and the Sgr dSph, the second closest, third-most massive, third brightest, and third most angularly extended satellite galaxy of the MW.

Construction of the Sgr dSph templates
Here we provide detailed descriptions of how we construct the Sgr dSph templates shown in E.D. Figure 1.

Model I:
We extract this template from the stellar catalogue constructed in Ref. [8], which was derived using photometric and astrometric data from Gaia Data Release 2 (DR2), and kinematic measurements from various other surveys. The catalogue consists of a list of 2.26 × 10 5 candidate member stars of the Sgr dSph remnant, which are reliably separated from the field stars. Every object in the catalogue has an extinction-corrected G-band magnitude larger than 18, and more than half of the objects in this catalogue are classified as red clump stars. Note that Ref.
[8] adapted their procedure to reproduce the observed properties of the Sgr dSph remnant, not the stream, which is why the first panel of E.D. Figure 1 only shows the remnant. We show profiles of stellar number count along the long and short axes of the dwarf for this template in S.I. Figure 1.

Model II:
Our second template comes from ref. [39]. Instead of red clump stars, this study selected a sample of RR Lyrae stars from Gaia DR2 data, for which distances are accurately measured. Also, rather than focusing on member stars of the Sgr dSph remnant, Ref. [39] used the STREAMFINDER algorithm to single out stars with high probability of belonging to the Sagittarius Stream. By using the kinematic properties of the stars in that study, we constructed a template containing 2369 RR Lyrae stars (cf. Figure 1) in our ROI. Note that the stellar number count in this map is approximately two orders of magnitude smaller than that in Model I.

Model III:
Ref. [40] performed an all-sky analysis of RR Lyrae stars (in Gaia DR2 data) belonging to globular clusters, dwarf spheroidal galaxies, streams, and the Magellanic Clouds. Our Model III template is a subset of their data identified as belonging to the Sgr dSph, selected to reproduce their Fig. 1 (bottom-right). It includes 1.31 × 10 4 RR Lyrae stars in our ROI.

Model IV and Model V:
Ref. [41] developed two empirical catalogues of RR Lyrae stars in Gaia DR2 data, which form the basis for our final two templates. The first (Model IV), corresponds to the nGC3 sample, which is characterized for its lower-completeness and higherpurity. This template contains 675 stars in our ROI. The second (Model V), is the Strip sample, containing higher-completeness, but lower purity. The total number of stars in our ROI for this model is 4812.  Figure 1), we mark the gravitational centre of the dwarf with a cyan circle, and show the long and short axes along which we measure the profiles as white bands.
First, we check whether the residuals between the baseline + Sgr dSph source model and the Fermi data from our ROI are consistent with the level expected simply as a result of photon counting statistics, using a method similar to that of Ref. [53]. Under the null hypothesis that the Fermi data are a Poisson draw from our best-fit baseline + Sgr dSph model (i.e., that our model is correct, and any differences between it and the actual data are simply due to shot noise), we can determine the expected distribution of ln L n values via Monte Carlo. For each Monte Carlo trial, we draw a set of mock photon counts Φ n,i,mock in each pixel and energy bin from our best-fitting model (multiplied by the instrument response function), and then compute the energy-dependent log-likelihood for this mock data set using the same pipeline we use on the real data. We repeat this procedure 100 times, and plot the distribution of log-likelihood values it produces as the blue histograms in E.D. Figure 2. These histograms represent the expected log likelihood in each energy bin under the null hypothesis. We then compare this to the actual value of ln L n we measure for our model as compared to the real Fermi data. The plot shows that our measured log-likelihood falls squarely within the range expected under the null hypothesis, and we therefore conclude that the residuals between our model and the real data are consistent with being solely the result of photon counting statistics.
In addition to testing whether the residuals between model and data are consistent with simply being shot noise when we sum over all pixels (which is what the likelihood measures), we can also examine the residuals as a function of position. We do so in E.D. Figure 3, which shows the measured Fermi counts in our ROI (summed in three energy bins) in the first column, our best-fitting baseline + Sgr dSph model in the second column, and fractional residuals [(Data − Model)/Model] in the third column. The images are smoothed with a 0.5 • Gaussian kernel, since this is roughly the resolution of our interstellar gas maps [16,18]. The plot shows that, on a point-by-point basis, our models reproduce the data within ∼ 10% over most of the ROI. There are, however, a few small patches of correlated residuals, which are only at the ∼ 30% level, and are far from the Sgr dSph region. This points to the existence of real structure in the Fermi Bubbles that is not yet perfectly modelled, but given the small level of the residuals and the distance between them and the signal in which we are interested, this modelling imperfection has little impact on our results.
As our second validation test, we evaluate the sensitivity of our pipeline to uncertainties in our templates for Galactic diffuse emission, and we verify that our pipeline can recover synthetic signals similar to the Sgr dSph even when our templates are imperfect. Recall that we have three components of Galactic diffuse emission for which the templates are at least somewhat uncertain: hadronic + bremsstrahlung emission (for which our template can be HD, Interpolated, or GALPROP), Galactic IC emission (for which the template can be 3D, 2D A, 2D B, or 2D C), and the Fermi Bubbles (for which the template can be S, structured, or U, unstructured). We test the sensitivity of our fits to these template choices as follows. First, we generate a set of mock background data by drawing a random realisation of photon counts from one combination of these templates, and on top of this we add a synthetic Sgr dSph signal; the Sgr dSph photons follow the spatial morphology of our Sgr dSph model I template, have a spectral shape dN γ /dE γ ∝ E −2 γ , and have a normalisation that we vary systematically from ≈ 10 −11 ph cm −2 s −1 (integrated over all energies) to ≈ 10 −5 ph cm −2 s −1 ; our best-fit Sgr dSph photon flux falls in the middle of this range, ≈ 2 × 10 −8 ph cm −2 s −1 . Then we use our pipeline to recover the flux of the Sgr dSph from the synthetic map, but using a different set of templates for Galactic diffuse emission to the ones used to generate the synthetic data. Comparing the recovered Sgr dSph spectrum to the injected one reveals how well our pipeline performs when the input diffuse emission templates are not exactly correct. We carry out this experiment with four diffuse emission template combinations: (1) synthetic data generated from GALPROP + 3D + S, analysed using HD + 3D + S; (2) synthetic data generated from HD + 2D A + S, analysed using HD + 3D + S; (3) synthetic data generated from HD + 3D + S, analysed using HD + 3D + U; (4) synthetic data generated using HD + 3D + S, analysed using HD + 3D but no template for the FBs at all.
We show the results for the first two of these experiments in the two left panels of E. D. Figure 4; the top left panel shows the recovered energy-integrated photon flux compared to the injected flux, while the bottom left shows the recovered spectra when the input flux is ≈ 2 × 10 −8 ph cm −2 s −1 . The plot shows that our pipeline yields excellent agreement between the injected and recovered signals for both the integrated flux and the spectrum unless the Sgr dSph signal is ∼ 1 order of magnitude weaker than our estimate. In no circumstance does our pipeline produce a false signal comparable in magnitude to our observed one. The two right panels of E. D. Figure 4 show the third and fourth tests, where we mismatch the FB template. Here the effects are somewhat larger, but still relatively minor: if we create synthetic data with the S Fermi Bubble template (so that there is structure corresponding to the cocoon), and then analyse it using either the U template or no FB template at all, then we make a factor of ∼ 2 − 3 level error in the absolute flux, but no substantial error in the spectral shape. This test suggests that our detection of the Sgr dSph is very robust, but that we have a factor of ∼ 2 − 3 uncertainty in its absolute flux, stemming from our imperfect knowledge of the foreground FBs.
The third validation test we perform is to check whether a fit using the observed stellar distribution of the Sgr dSph as a template performs better than one using a purely geometric template placed at the same position; if the emission really is tracing the stars of the dwarf, and is not merely a chance overlap, a template matching the shape of the dwarf should perform better than a purely geometric distribution. For this purpose we consider disc-shaped templates of varying radii, centred at Galactic coordinates ( , b) = (5.61 • , −14.09 • ) -the dynamical centre of the Sgr dSph -and repeat our standard procedure of comparing baseline models to baseline + Sgr dSph models, using these geometric templates in place of the Sgr dSph stellar templates. We use our fiducial choices for all other templates (hadronic and bremmstrahlung emission, galactic IC emission, and the Fermi Bubbles).
We show the results of this experiment in S.I. Table 1. We find that geometric templates do perform better than baseline models with no Sgr dSph component, but, as expected, even the best geometric template (for a disc of radius r = 2.0 • ) yields significantly less fit improvement (T S = 63.8) than our fiducial stellar template (T S = 95.2); this difference in test statistic, ∆ T S = 31.4, corresponds to the Sgr dSph template being preferred at 3.7σ significance. Moreover, this result becomes even stronger if we notice two additional points. First, because we tried a wide range of radii for the geometric models, the geometric templates effectively provide an extra degree of freedom that the Sgr dSph template, which is fixed by observations, lacks. Because we fix the template radius while performing each fit, we do not treat the varying radius as an extra degree of freedom when computing the test statistic, but if we did so, then the difference in performance between the geometric and stellar templates would be even larger. Second, the geometric model that gives the best fit to the data is in fact the one whose radius most closely approximates the actual size of the core of the Sgr dSph. Indeed, Fig. 1 (bottom) shows that, in the direction of the short axis, the Sgr dSph stellar profile falls off steeply ∼ 2 • − 3 • away from the Sgr dSph centre. Thus the geometric template that gives the best match to the observations happens to be the one that most closely approximates the actual distribution of stars in the Sgr dSph.
Our if the signal we are detecting really does come from the Sgr dSph, the best fit should be for a template that traces its actual orientation and position, while rotated or shifted templates should produce progressively worse fits. This check is significant in part because Ref.
[2] performed similar rotation analysis for the hypothesis that the cocoon is tracing a jet from Sgr A * , and found that there was no preference for a jet oriented toward Sgr A * over one oriented in some other way; they took this as evidence against the jet hypothesis. To check if the Sgr dSph template performs better on this test, we first rerun our analysis pipeline for our default set of templates (first line in Table 1), but with the Sgr dSph template rotated about its core. For each rotation angle we compute the TS, and compare to the TS of the original, unrotated model. We plot the result of this experiment in the left panel of E.D. Figure 5. It is clear that, as expected, the fit is best when we use the actual orientation of the Sgr dSph, and degrades as we increase the rotation. Next, we carry out a similar procedure, but this time rather than rotating the Sgr dSph template about its core, we rotate around the centre of the Galaxy, thereby both translating and rotating the template. (This latter test was motivated by the particular alignment of the Sgr Stream with the previously claimed collimated jets from the Galaxy's supermassive black hole [4].) We show the results in the middle panel of E.D. Figure 5, and, again as expected, the TS strongly favours the true location and orientation of the Sgr dSph. Finally, we translate the Sgr dSph while leaving its orientation unchanged. We show the TS for displaced Sgr dSph in the right panel of E.D. Figure 5. In this case the fit improves if we do displace the Sgr dSph from its true position by ≈ 4 • south. The amount by which the shift is favoured is fairly significant -the TS improved by 40.8, which corresponds to 4.5σ significance. Interestingly, the direction of the displacement is within a few degrees of the direction anti-parallel to the Sgr dSph's proper motion, suggesting that the dwarf's γ-ray signal trails it slightly on its orbit. If IC-emitting CR e ± are largely responsible for the observed Sgr dSph γ-ray signal as suggested by our spectral modelling, a systematic displacement of this signal southward by ∼ 4 • from the stars of Sgr dSph is quite reasonable as we have explained elsewhere (and see section 4).

Transport of IC-emitting CR e ±
We have seen that, while our pipeline detects a signal from the Sgr dSph at very high statistical significance, the fit improves even more (by ≈ 4.5σ ) if we displace the Sgr dSph template ≈ 4 • from its actual position (corresponding to 1.9 kpc at the distance of the Sgr dSph), in a direction very close to anti-parallel to the dwarf's proper motion. Here we demonstrate that a displacement of this type is expected in a model where the γ-ray signal from the Sgr dSph is powered by MSPs. Part of the MSP signal emerges directly from the MSP magnetospheres, and thus traces the stellar component of the Sgr dSph. However, the majority of the observed signal is, in our model, IC emission powered by e ± escaping MSP magnetospheres and interacting with the CMB. The time between when e ± leave MSPs and when they IC scatter to produce γ-ray photons is non-negligible: the CMB is dominated by photons with energies ∼ k B T CMB (with T CMB = 2.7 K), so IC photons with energies of ∼ 1 − 100

27/32
GeV must be produced by e ± with energies E e ± ∼ 0.6 − 6 TeV. The characteristic IC loss time for such particles is where m e is the electron mass, c is the speed of light, σ T is the Thomson cross section, and U CMB = a R T 4 CMB = 0.25 eV cm −3 is the energy density of the CMB.
During this time, the e ± will have the opportunity to move a significant distance prior to producing γ-rays, due to both bulk gas motion and CR flow relative to the gas. With regard to bulk advection, we note that the proper speed of the Sgr dSph is ≈ 260 km s −1 , and we therefore expect an effective wind of Galactic halo gas to be blowing through (or, at least, around) the dwarf at approximately this speed. This wind would advect the IC-radiating e ± southward. Quantitatively, the extent of the angular displacement of an IC γ-ray signal at E γ where v prop is the proper motion on the sky. Thus advection is expected to generate a southward displacement of ∼ 1 • . This is less than the displacement we observe, but advection is also likely less important than CR transport through the gas. While the diffusion coefficient for CRs in the galactic halo is very poorly known, we can make an order of magnitude estimate by adopting the functional form for the diffusion coefficient given in ref [54] which is normalised to 3 × 10 27 cm 2 s −1 for a 1 GeV CR in a 3 µG field. Then the expected diffusive displacement of the IC-radiating e ± is While this is roughly the correct amount of displacement to reproduce what we observe, if the diffusion were isotropic then we would still not have explained the systematic offset between the dwarf and the displaced location picked out by our template analysis. However, we do not expect isotropic diffusion in the environment of the Sgr dSph. Simulations of objects plunging through diffuse halo gas indicate that a generic outcome of such interactions is the development of a coherent magneto-tail back along the objects' direction of motion [55]. Such a structure formed by the Sgr dSph plunging through the Milky Way's halo would naturally explain why, rather than being isotropic, the diffusive transport is primarily backwards along the dwarf's trajectory.

Energetics of the Sgr dSph MSP population
As discussed in the main text, the γ-ray luminosity per stellar mass we measure for the Sgr dSph is substantially brighter than we measure for the Galactic Bulge, Galactic disk, or M31, but is substantially dimmer than is observed for globular clusters. Indeed, in Figure 3, Sgr dSph appears as a transition object between gas-poor, low metallicity, low star formation rate, and relatively low stellar mass systems on the left side and relatively gas rich and massive systems (some with appreciable star formation) on the right side. In order to investigate more deeply how the γ-ray luminosity of the Sgr dSph compares to that of other observed systems, and to theoretical expectations, in Figure 9 we collect measurements of γ-ray luminosity per unit stellar mass versus approximate age for a range of observed systems, and compare these measurements to model predictions. For the observed systems we include M31, the Milky Way bulge and nuclear bulge, the mean of Milky Way globular clusters, and the Milky Way disc; for the latter we have included both the γ-ray emission directly measured from MSPs, and the observed e ± luminosity of the disc, which may include a significant MSP contribution. As in Figure 3, we see that the Sgr dSph is intermediate between the metal-rich galactic systems -M31, the Milky Way disc and bulge -and the globular clusters (GCs). However, the figure also reveals a clear trend that galactic systems dim as a function of age, with Sgr dSph as both the youngest and the most luminous of the galactic systems. The trend with age is consistent with theoretical expectations, indicated by the blue band in Figure 9 which shows the prediction of a binary population synthesis (BPS) model [19] for the total spin-down power per unit stellar mass liberated by magnetic braking of MSPs. Some of this power should emerge as prompt emission, and some as e ± injected into the ISM; the lower dashed blue line shows 10% of the total spin-down power, a rough estimate for the prompt component. In this particular calculation, the MSPs derive from Accretion Induced Collapse, the population is assumed to be of Solar metallicity, and each binary evolves independently (i.e., the 'field star' limit is assumed). Based on the predictions of this model, and the estimated age of the Sgr dSph, we estimate that the γ-ray signal we have detected can be explained by the presence of ≈ 650 MSPs in the galaxy. Given that the overall γ-ray luminosity of an MSP population is bounded by the spin-down power, it is evident from the figure that the expected energetics appear to be elegantly sufficient to power the signal from Sgr dSph given the (relatively 28/32 young) mean age of its stars; this age difference naturally explains why the Sgr dSph should be more luminous per unit mass than M31 or components of the Milky Way.
It is also noteworthy that the GCs are considerably more luminous per unit mass than both the BPS model and the Sgr dSph. The extremely high brightness of GCs is plausibly explained by some combination of dynamical effects, which lead to dynamical hardening of binaries and thence a higher production rate of MSPs, and metallicity effects, which lead to higher MSP production because metal-poor stars have weaker winds and thus experience less mass loss during their main sequence lifetimes than Solar-metallicity stars [23]. The former effect would not occur in the Sgr dSph, but the latter would, since the Sgr dSph has a metallicity log 10 (Z Sgr /Z ) −0.9 [8], where Z is the solar metallicity, which is comparable to typical GC metallicities.
The final comparison we show in Figure 9 is with the MSP power inferred by Sudoh et al. [24] in massive, quiescent galaxies (M * > 10 9.5 M , star formation rate < 0.1 M yr −1 ). Such galaxies typically have stellar population ages 8 − 10 Gyr [24, 56,57], and Sudoh et al. show that they produce anomalously-large synchrotron emission, which they attribute to radiation from e ± injected by MSPs; they infer an injection power 1.8 × 10 28 erg/s/M , which we show as the brown dashed line in Figure 9. We see that this estimate is consistent both with the BPS model and comparable to the luminosity we infer for the Sgr dSph.
Our overall conclusion is that the MSP luminosity we have derived for the Sgr dSph is fully consistent with both theoretical expectations and with a wide variety of observed systems. The Sgr dSph is more luminous per unit mass than the Milky Way or M31, but this is easily explained by its youth and low metallicity, and it is comparably-or less-luminous than other observed systems that are of comparable age or metallicity.

Astrophysical γ-ray emission from other dSphs
On the basis of the normalisation (L γ /M ) supplied by the Sgr dSph γ-ray detection, we can make rough predictions for the astrophysical γ-ray fluxes from a number of other dSph systems, simply by assuming this normalisation applies to them as well; future work should be based on full theoretical models including metallicity and age effects, but the simple calculation we present here can serve as a guide to the system for which such investigations are likely to be fruitful. Our predictions can, in turn, be compared to i) actual observational upper limits to the γ-ray fluxes from these dSphs and ii) (model-dependent) predictions for the (WIMP) dark-matter-driven γ-ray fluxes from the same satellite galaxies. For this purpose we use the data assembled in Winter et al. [29] for the distances, stellar masses, and MSP-and DM-driven fluxes for a population of 30 dSphs satellites of the Milky Way. These authors derive their MSP fluxes by extrapolating the γ-ray luminosity function of resolved Milky Way MSPs; their result implies that at energies above 500 GeV, galaxies should produce an MSP photon flux per unit stellar mass of ≈ 6.3 × 10 29 s −1 M −1 , roughly a factor of 40 smaller than the ≈ 2.5 × 10 31 s −1 M −1 we detect for the Sgr dSph. We report our revised estimates dwarf spheroidals' MSP luminosity in S.I. Table 2. This finding has two implications, which we explore below: first, for some dwarfs this brings the predicted γ-ray flux close to current observational upper limits, suggesting that a more detailed analysis of Fermi-LAT data might yield a detection. Second, in some dSph galaxies, the predicted MSP flux is comparable to or exceeds the γ-ray fluxes that might be expected from dark matter annihilation.

omparison with existing upper bounds
To estimate whether other dwarf spheroidals might be detectable, we compare our differential flux predictions (incorporating both prompt and IC emission where, for simplicity, we make the approximation that the CMB-dominated ISRF of the Sgr dSph also pertains in each other dSph under consideration) against the results from ref. [58] 3 . On the basis of this comparison, we do not predict γ-ray emission from any dSph that surpasses the upper limits from ref. [58]. However two dSphs reach a significant fraction of the relevant upper limit in at least one energy bin (of width log 10 (∆E/GeV) = 0.5): Fornax (which reaches 0.24 of the upper limit for the energy bin centred at 1.36 GeV) and Sculptor (which reaches 0.09 of the upper limit for the energy bin centred at 2.46 GeV). Furthermore, the results of ref. [58] were obtained using Pass7 Fermi-LAT data accumulated over only the first 3 years of Fermi-LAT operation. On the basis of, e.g., the results of ref. [59] we expect that updated upper limits (Pass8, 15 years data) should be at least a factor of 4 more stringent. This makes Fornax and Sculptor both very interesting targets for a future study, though we remind the reader that our predictions are predicated on a normalisation obtained from the Sgr dSph detection that may be somewhat over-optimistic because it ignores the stellar age effects evidenced in Figure 9 4 . After these two, the brightest expected dSphs are Sextans, Ursa Minor, Leo I, and Draco. These may also be interesting targets, though we note that we expect that they are almost one order of magnitude dimmer than Fornax and Sculptor. . A clear implication of these, albeit preliminary, results is that these targets should be avoided in the quest to better constrain putative WIMP DM self-annihilation cross-sections. By contrast, there remain other dSphs where the expected DM signal remains comfortably much larger than the MSP signal; these are more promising targets.