Astrophysical interpretation of the medium scale clustering in the ultra-high energy sky

We compare the clustering properties of the combined dataset of ultra-high energy cosmic rays events, reported by the AGASA, HiRes, Yakutsk and Sugar collaborations, with a catalogue of galaxies of the local universe (redshift z<~0.06). We find that the data reproduce particularly well the clustering properties of the nearby universe within z<~0.02. There is no statistically significant cross-correlation between data and structures, although intriguingly the nominal cross-correlation chance probability drops from ~50% to ~10% using the catalogue with a smaller horizon. Also, we discuss the impact on the robustness of the results of deflections in some galactic magnetic field models used in the literature. These results suggest a relevant role of magnetic fields (possibly extragalactic ones, too) and/or possibly some heavy nuclei fraction in the UHECRs. The importance of a confirmation of these hints by Auger data is emphasized.


I. INTRODUCTION
One of the keys towards the solution of the mysterious origin of ultra-high energy cosmic rays (UHECRs) is the study of their anisotropy pattern. The chances to perform (some kind of) UHECR astronomy increase significantly at extremely high energy, in particular due to the decreasing of deflections in the galactic and possibly extragalactic magnetic fields. Moreover, at E > ∼ (4 − 5) × 10 19 eV the opacity of the interstellar space to protons drastically grows due to the kinematically allowed photo-pion production on Cosmic Microwave Background (CMB) photons, known as the Greisen-Zatsepin-Kuzmin or GZK effect [1,2]. A similar phenomenon at slightly different energies occurs for heavier primaries via photo-disintegration energy losses. Recently, an observational evidence for a flux suppression consistent with the GZK feature has been reported by the HiRes collaboration [3]. Within reasonable astrophysical assumptions, these energy-losses phenomena impose a conservative upper limit to the distance from which the bulk of UHECRs is emitted, of the order of a few hundreds Mpc at most, which may enhance the chances of identifying structures. In Ref. [4] a forecast analysis for the Pierre Auger Observatory [5,6] was performed to derive the minimum statistics needed to test the "zeroth order" hypothesis that UHECRs trace the baryonic distribution in the universe. Assuming proton primaries, it was found that a few hundred events at E > ∼ 5 × 10 19 eV are necessary at Auger to have reasonably high chances to identify the signature. On the other hand, available catalogues from the experiments of the previous generation contain O(100) events above E > ∼ (4 − 5) × 10 19 eV, thus motivating a search for possible angular patterns already in the present data [7,8]. In particular, after renormalizing the energy scales of the different experiments to the HiRes one at 4 × 10 19 eV, the authors of [8] found some evidence of a broad maximum of the two-point autocorrelation function of UHECR arrival directions around 25 degrees. Since the search was made a posteriori, the assessment of its significance is a delicate issue and involves the determination of a penalty factor critically dependent on the performed number of trials. The previous claim of small scale clustering in the AGASA data and the associated debate on its significance (see e.g. [9,10,11]) would suggest to take a cautionary attitude towards a posteriori claims. However, a similar feature has been found in the Auger data alone as well, as recently reported in [12]. We shall thus proceed in the following under the assumption that the signal is real, exploring some astrophysical implications.
In [13], the present authors already tested the qualitative interpretation of the result (as reflecting the largescale structure (LSS) of UHECR sources) given in [8] on the light of our previous map templates obtained from the IRAS PSCz galaxy catalogue [14]. The observed data and the Monte Carlo events from the catalogue share several features, which are even more prominent if a quadratic correlation with LSS is assumed. On the other hand, no relevant cross-correlation has been found, which would be the smoking gun to test such scenarios. However, this is not particularly surprising: apart for the sake of simplicity, there is no a-priori reason to expect that cosmic rays are 100% made of protons, that the effects of magnetic fields are negligible above 4 × 10 19 eV for the angular scales considered, and that the sources trace in an unbiased way the LSS. If some or all these assumptions are relaxed, the possibility of a consistent scenario emerges: At "low" energy, both clustering and cross-correlations in UHECRs are absent, since magnetic deflections and a very large energy-loss horizon destroy them. With sufficient statistics and at sufficiently "high" energy, cross-correlations should eventually emerge, both because magnetic deflections scale like 1/Energy and be- cause of the expected shrinking of the horizon. Before this stage is reached experimentally, it is likely that the first hint will appear in the clustering, but not in the cross-correlation. The reason being that the former is much more robust versus magnetic deflections than the second one, as we shall argue.
In this paper, we extend our previous analysis in two ways: (i) we assume a smaller horizon, i.e. biasing the correlation with LSS towards closer sources; (ii) we study the impact of the galactic magnetic field (GMF) on the autocorrelation signature and on the cross-correlation signal. We anticipate that the data reproduce particularly well the clustering properties of the nearby universe within z < ∼ 0.02 and they are also quite robust with respect to deflections in galactic magnetic fields. We summarize our assumptions and techniques in Sec. II, while devoting Sec. III to present our results and attempting some interpretations of them. In Sec. IV we briefly discuss our findings and conclude.

A. The data
In our analysis, we closely follow the approach reported in [8,13], using a similar dataset extracted from available publications or talks of the AGASA [15], Yakutsk [16], SUGAR [17], and HiRes collaborations [18,19]. Note that we rescale a priori the energies of the experiments to the HiRes one and consider events above E ≥ 4 × 10 19 eV in this renormalized sample. This approach is applied hereafter in the analysis and we address the reader to [8] for further details. In Fig. 1 we show the points used in this analysis in galactic coordinates, while Fig. 2 reports the single and combined exposure for the various experiments as a function of the declination, in the limit of saturated acceptance and mediated over the right ascension (see e.g. [20]). In Fig. 3 we show the derived UHECR excess map (flux over average expected flux, minus one) properly smoothed by a gaussian filter of 10 • . Such a choice for the width amplitude (which has only illustrative purposes) represents an acceptable compromise between the few degrees of the experimental uncertainty on the arrival direction of UHECR, and the typical angular length of the nearby astrophysical structures of several tens of degrees. Of course, the data have been properly weighted by the exposure.
The smoothed map in Fig. 3 clearly shows that the most apparent visual feature in the data is the medium scale clustering, with the data clustered in few spots of 20 • -30 • degrees each, that in their turn are distributed almost uniformly in the sky. This is of course the reason of the signal found in [8] with a proper statistical analysis. The clustering seen in the southern hemisphere is due to the Sugar data only and has thus a weak statistical evidence. However, the clustering signal is indeed present and statistically significant also considering the data from the northern hemisphere only [8]. Indeed, hints of this clustering in the northern hemisphere were recognized already some years ago [21]. Finally, lowering the energy threshold the signature disappears, excluding the possibility that the signal is only a systematic feature coming from an incorrect modeling of the exposures [8].
Concerning possible interpretations of the signal, here we report a few general considerations, while more quantitative analyses are reported in the following. The absence of a correlation with the Galactic Plane or the absence of excess toward the Galactic Center disfavor respectively Galactic astrophysical sources and heavy relic decays in our Galactic Halo as origin of these events. Qualitatively, extragalactic sources of astrophysical nature appear the most likely accelerators. In these scenarios, unless only a handful of sources dominate the emission, the pattern of the arrival directions of the events should reflect to some extent the one of large scale structures in the nearby universe. Actually the degree of clustering observed is quite pronounced. It exceeds also the anisotropy expected in the minimal case of proton primaries (with a GZK horizon z ≃ 0.06), which is in marginal agreement with the data (see [13] and section III). Thus, the few prominent structures visible naturally suggest either a scenario where the UHE sky is dominated by few nearby powerful sources or one where UHECRs are produced by a relatively larger number of sources significantly biased with overdensities in the local universe (within z ≃ 0.02). Both scenarios require an important role of magnetic fields, either galactic or extragalactic, to accommodate deflections of the order > ∼ 10 • . In the former case such deflections are necessary to explain the large smearing of the point sources emission and in the latter to justify the lack of a significant correspondence between the data and the nearby galaxy clusters.

B. The models
The previous discussion motivates an extension of the previous analysis reported in [13] along two directions: (I) assuming a smaller horizon, i.e. biasing the correlation with LSS towards closer sources; (II) studying the impact of the GMF on the autocorrelation signature and on the cross-correlation signal.
Both extensions should be regarded as "first order" refinements of our previous study. The point (II) does not need much justification: it is important to establish the robustness of the previous results with respect to the effects of astrophysical magnetic fields. Even if extragalactic magnetic fields may have a major role in shaping or preserving the UHECR anisotropies, very little is known about them (see [22] and [23]). On the other hand we know for sure that a regular GMF exists-although we have only rough ideas on its magnitude and structureand it is in principle relevant for UHECR deflections, even in the case of pure proton composition. In the following, we shall then consider how our results change when data are corrected for the effects of a few GMF models available in the literature. In particular, we shall use the three models HMR, TT, and PS employed in [24], which we refer to for details. To account for the deflections in the GMF in our analyses we shall follow the back-tracking technique described in [25,26]. The technique consists in mapping the arrival CRs directions on the Earth backward outside the GMF to obtain a map of the GMF deflections. We then apply the mapping to the extragalactic expected CR map and correct it for the GMF displacements. The extragalactic CRs map F (E cut ,Ω) expected at an energy greater than E cut at the directionΩ is obtained as described in [4]. The map is then convolved with the GMF deflections to have F (E cut ,Ω(Ω ′ )) whereΩ(Ω ′ ) is the mapping produced by the back-tracking technique. This method is fully suited for the cases in which energy losses along the particle track are negligible and when the particle energy is large enough to exclude loops and/or trapped regions during the propagation. Both these conditions are satisfied for the UHECRs and for the GMFs we considered. Also note that an isotropic sky remains isotropic under the GMF transformation, in agreement with the expectation from the Liouville theorem. We refer to [24,25,26] for details. For simplicity we only consider the mapping produced for a fixed rigidity corresponding to the energy E cut (with the choice E cut =40 EeV in the present case). This should be a reasonable approximation, equivalent to replace the steep (∝ E −3 ) UHECRs spectrum above E cut with a delta-function at E cut . Beside the steepness of the UHECR spectrum, a further motivation for this approximation is that we are not considering the shift of single objects but of an overall map, already smoothed at a scale of order ∼ 5 • (we are not interested to the very small scales, indeed). Finally, we shall show in the next section that the auto-correlation analysis is quite insensitive to the details of the GMFs or the assumed rigidity as long as the magnetic deflections effects remains moderate, so that the approximation is also justified a posteriori, at least for autocorrelations studies. The effects are potentially larger for cross-correlation analyses, which however are already less robust for other reasons.
A further possible problem is given by the fact that the mask region present in the catalogue and excluded from the analysis is distorted by the effect of the GMF, so that in principle one should exclude, case by case, the regions which the mask is mapped into by the GMF. We neglect this effect assuming the the mask is approximately mapped in itself by the GMF transformation. This is a quite good approximation for the region near the galactic plane while it is not satisfied by the two narrow stripes. However, the the stripes amount to about 10% of the total mask and only roughly 2% of the whole sky, which is a very small bias for our purposes in this work.
From a qualitative point of view, the point (I) is reasonable, too. In many scenarios, only relatively nearby sources (if any) may be identifiable in cosmic ray maps. There are several plausible reasons for that. Even in absence of magnetic fields, an heavier composition implies a different energy-loss horizon for UHECRs [27,28]. For the energy threshold considered here (E > ∼ 4 × 10 19 eV), this is smaller than the proton one. In presence of extragalactic magnetic fields, the propagation of a UHECR may greatly differ from a straight line and in principle may even happen in a diffusive regime [29,30,31,32]. Although it is unlikely that the propagation is truly diffusive, Gpc-scale pathlengths for protons injected within a few hundreds Mpc may be common even above 10 EeV [33]. A non-negligible role of magnetic fields would have two consequences: for a given energy-loss mechanism, it is clear that the true horizon may be significantly shorter than the expected one. Thus, UHECRs above a given E th may be largely collected within a region smaller than the linear energy-loss horizon. More important, apart for energy losses, the longer the propagation time, the smaller the chance that intrinsic anisotropies may survive (in some form). Finally, since UHECR source likely have to meet special accelerator requirements, it is reasonable to conceive a relatively rare population of sources, possibly strongly biased with respect to LSS.
However, how to implement in practice point (I) is admittedly not model independent. One possibility may be to cut arbitrarily a LSS catalogue to some redshift z cut , and consider only correlations with structures within this distance, assuming for the rest that UHECRs are unbiased tracers of LSS (i.e. neglecting otherwise energy loss effects). Another possibility is to create anisotropy map templates of specific scenarios for UHECR composition, sources, and extragalactic magnetic fields, comparing them with the observed configurations of data in order to infer the best model. Although this will be the way to proceed when high statistics will be achieved, at the moment it could just dilute the basic consequence of our assumption (I) under a large number of unknown parameters. To keep some physical-inspired input in a toy model, we shall compare the distribution of data as in Fig. 1 with the LSS maps obtained by convolution of the PSCz catalogue with an energy-loss window function corresponding to protons twice more energetic, i.e. E = 8 × 10 19 eV, implying an effective horizon z ≃ 0.02 [4]. We shall denote this scenario as "small horizon" (SH), as opposed to the "proton horizon" (PH) as treated in [13] and corresponding to the minimal assumption of protons primaries with E ≥ 4 × 10 19 eV propagating in a negligible EGMF (usual GZK horizon z ≃ 0.06). In the top two panels of Fig. 4 we report the PH and SH maps. The smoothing is variable and it is related to the adaptive smoothing applied to the PSCz catalogue to minimize the effect of the shot noise. We emphasize that this should be considered a toy model, and not a realistic scenario for UHECR sources or composition. However, our toy model may be indicative of a plausible situation where at least for anisotropy searches the useful UHECR horizon is relatively short.

C. Statistical tools
For the statistical analysis we define the (cumulative) autocorrelation function w as a function of the separation angle δ as where Θ is the step function, N d the number of CRs considered and δ ij = arccos(cos ρ i cos ρ j + sin ρ i sin ρ j cos(φ i − φ j )) is the angular distance between the two cosmic rays i and j with coordinates (ρ, φ) on the sphere. Analogously, one can define the correlation function ξ(δ) as where δ ia is the angular distance between the CR i and the candidate source a and N s is the number of source objects considered. We perform a large number M ≃ 10 5 of Monte Carlo simulations of N data sampled from a distribution on the sky corresponding to the hypothesis H (e.g., uniform, LSS, etc.) and for each realization j we calculate the autocorrelation function w H j (δ). The sets of random data match the number of data for the different experiments passing the cuts after rescaling, and are spatially distributed according to the exposures of the experiments. The formal probability P H (δ) to observe an equal or larger value of the autocorrelation function by chance is where w ⋆ (δ) is the observed value for the cosmic ray dataset and the convention Θ(0) = 1 is being used. Relatively high values of P and 1 − P indicate that the data are consistent with the null hypothesis being used to generate the comparison samples, while low values of P or 1 − P indicate that the model is inappropriate to explain the data. That is, in the following we shall plot the function P (δ) × [1 − P (δ)], which vanishes if any of P or 1 − P vanishes and has the theoretical maximum value of 1/4. Thus, the higher its value is the more consistent the data are with the underlying hypothesis. Note also that by construction the values at different δ of the function P (δ) are not independent. To calculate the cross-correlation probability, we perform a large number M (≃ 10 5 ) Monte Carlo realization of N events is sampled according to the LSS probability distribution, and for each realization i we calculate the function ξ LSS i (δ). We generate analogously M random datasets from an uniform distribution, and calculate ξ uni j (δ). We have thus M 2 independent couples of functions (i, j). The fraction of the M 2 simulations where the condition ξ uni (δ) ≥ ξ LSS (δ) is fulfilled is the probability A technical detail of the analysis is related to the presence of the catalogue mask. This includes a zone centered on the galactic plane and caused by the galactic extinction and a few, narrow stripes which were not observed with enough sensitivity by the IRAS satellite. These regions are excluded from our analysis with the use of the binary mask available with the PSCz catalogue itself. This reduces the available sample by about 10%.

III. RESULTS
By repeating our analysis in [13] following the SH model and without considering for the moment the effects of the GMF, we obtain the results shown in Fig. 5. The SH model seems to explain extremely well the clustering properties of the data, with the related P × (1 − P ) curve almost coincident with the ideal P = 0.5 expectation. Not surprisingly, this can be understood after a visual inspection of the maps in Fig. 3 (data) and Fig. 4 (models). While the map from protons with E ≥ 4 × 10 19 eV (the PH model) is still too much isotropic with respect to the data, in the SH map the number of clusters and their distribution resemble much more the data, in that it leaves typical "voids" between clustered hot-spots observed.
Our next step is to investigate the effects of the galactic magnetic field. In the two bottom panels of Fig. 4 we show the effective modification of LSS structures happening for the PH and the SH scenario, assuming as example the HMR model in [24]. In general, besides the shifting of the positions of the structures, as expected the GMF introduces in the deflected maps also other peculiar lensing phenomena like shearing and (de)magnification [26]. More quantitatively, the effects of the GMF are studied in the following through the modifications induced in the auto and cross-correlation functions. In Fig. 6 we investigate the effect of the GMF on the signature in the autocorrelation function. We also show the effect of changing the rigidity of the particles. Actually, the equations of motion for CRs in the GMF only depend on the parameter C = B × Z/E where B, Z, E are respectively the GMF normalization, the particle atomic number (electric charge of the nucleus) and the particle energy. A combination of parameters that leaves unchanged C is thus completely degenerate from the point of view of propagation in the GMF (though, of course, not for the energy losses in the propagation in the extragalactic sky.) As a general consideration, we see that at least for the baseline cases considered, the correction for the GMF does not destroy the pattern in P (1 − P ), but can improve or worsen it at most by a factor of a few. On the other hand, extreme changes in C may significantly alter the pattern of the function.
Finally, in Fig. 7 we report the results of the cross correlation analysis. Note that, while in the previous case we were repeating the same test of Ref. [8], here we perform a different test, and to assess the confidence level of any statistically significant signal we might find one should carefully evaluate the penalty factor. Unfortunately, in no case we find a statistically significant signal (namely, not even at the nominal level). Yet, qualitatively all the SH maps show some improvement with a nominal P ξ ∼ 10% (with respect to P ξ ∼ 50% for the PH case), even without use of the GMF correction. This is understood since we have about the same number of clusters in the data and in the map and typically one can find a correspondence between the two within a radius of roughly 50 • . A more significant signal of cross-correlation should eventually peak within ∼ 25 • (the typical size of the clusters) signaling a superposition of the data and map clusters. Once again, no significant difference arises when the data are corrected for the GMF. Although the minimum of the probability can change by up to a factor of a few, it does not move towards δ ≃ 0 • , as it should be if the GMF were correctly shifting the hot-spots.
An important point to stress is that while an evidence of cross-correlation would be tantalizing signature of a discovery, the lack of it can not be easily used as an argument against the hypothesis. The cross-correlation signal is indeed much more sensitive than the autocorrelation one to magnetic fields deflection and, importantly, to unknown experimental systematic effects. For the case at hand, the main responsables of the displacement up to 50 degrees between clusters in the data and overdensities in the LSS are the hot-spots from the SUGAR data, which is the experiment among the ones considered which mostly suffers for a poor angular resolution, beside not well-understood systematics in the energy scale determination. Indeed, limiting the analysis to the northern hemisphere experiments only, the data show quite a good cross-correlation with the local over-density of matter, especially within z ∼ 0.02. More noticeably, they fall relatively close to the so-called Super-Galactic plane, as can be appreciated also from a comparison between Fig. 1 and Fig. 4. This correspondence was already noticed by the authors of ref. [21] and further assessed in ref. [34]. So, while the features in the cross-correlation function vary quite a bit excluding e.g. one dataset, the autocorrelation ones do not, as discussed more extensively in [8].
Aware of this caveat, it is still worth exploring the consequences of assuming that displacements up to 50 • with respect to the true sources are effective. Under this hypothesis, and, if the signal corresponds to extragalactic structures, we would be brought to conclude that: (i) if UHECRs are dominated by protons, then there are significant deflections by extragalactic magnetic fields. Indeed, although the GMF may be not well reproduced by current models, even changing within reasonable ranges the GMF geometry and intensity no appreciable crosscorrelation at small angles appears. (ii) If there is a significant fraction of heavy nuclei in the UHECR flux, results may be also explained with a negligible role of extragalactic magnetic fields, attributing to GMF deflections the significant (∼ 30 • -50 • ) displacement between the observed clusters in the data and the real galaxy clusters. Note that, if in any case overall deflections as large as ∼ 50 • are effective, peculiar manifestations of regular deflections in the magnetic field-like elongations directed towards structures with a proper ordering of energies-may not be observable due to non-negligible non-linearities, especially in the case of a chemically inhomogeneous sample of UHECRs. Yet, if this interpretation would turn out to be the correct one, we expect that once higher statistics will be available at higher energies, these features will eventually show up in the data, a prediction that can eventually be confirmed by Auger.

IV. DISCUSSION AND CONCLUSION
The anisotropy pattern of the combined UHECRs data (re-scaled energy E = 4 × 10 19 eV in HiRes scale), although compatible with isotropy at very large angular scales, shows a peculiar medium scale clustering corresponding to 6-7 spots of roughly 20 • -30 • degrees of extension distributed uniformly in the sky. If confirmed, this would have a wealth of consequences for the long-awaited astronomy of UHECRs. At a general level, the absence of a correlation with the Galactic Plane or the absence of an excess toward the Galactic Center disfavor respectively Galactic astrophysical sources and heavy relic decays in our Galactic Halo as origin of these events. Extragalactic sources of astrophysical nature appear instead the most likely accelerators consistent with these features. Should forthcoming data show similar features, the first realistic quantities one could extract from the clustering are constrains the number density of the sources as well as their type, from their bias with respect to LSS (see [35] and references therein). As next goal, one should be able to establish explicit cross-correlation with extragalactic structures and/or hints in that direction, as elongations of events directed toward the sources, whose distance should scale inversely to the UHECR energy. More "conventional" astronomy, determining the locations of single UHECR sources and perhaps of their spectrum is likely demanded to a subsequent phase when sufficient statis-tics will be available.
Although the clustering properties in the data are intriguing, the interpretation of the signature is puzzling, especially in absence of a significant statistics at higher energy and of chemical composition constraints. The comparison between significant sets of data at different energy cuts may reveal the importance of magnetic deflection effects. At the moment, we can only speculate on the possible implications of the signal-assuming it is not a statistical fluke-under some simplifying hypotheses.
One possibility is that these excesses may trace LSS overdensities in the near universe (within the GZKsphere). The autocorrelation analyses reported in this paper show that this interpretation is indeed favored in particular if the effective horizon is smaller than the GZK one for protons of the assumed energy. Both a significant fraction of heavier nuclei and a significant role of extragalatic magnetic fields may cause this effect (the former might be favored by recent Auger data [36].) Although not statistically significant, this interpretation may be supported by a weak hint of a broad minimum in the cross correlation function (at the level of nominal chance probability of 10%-15%) around 50 • if a small horizon (z < ∼ 0.02) is assumed. Both signatures are relatively robust with respect to deflections in typical GMF models, although some marginal improvement or worsening may arise for some choices of the GMF model and effective rigidities. In this case, the size of the hot-spots would be due partly to the one of the largest overdensities in the local LSS and partly to magnetic smearing needed to explain the overall deflection with respect to the LSS. The latter effect would be in general subleading but for the SUGAR hotspots in the Southern Emisphere, which are the most distant ones from overdensities. This may be physically associated to the more intense magnetic fields towards the central regions of our Galaxy to which SUGAR is pointing. An alternative interpretation of the data is that they are due to very few (O(5-6)) powerful sources. Yet, the smearing of a point-like emission to the level of the observed spots of O(20 • ) would require a quite extreme magnetized environment [29,30].
In any case, the hints for some structures in the data are very exciting, and we urge an independent cross-check with the nowadays large statistics collected by Auger. If confirmed, together with the indication for the presence of a GZK-like feature in the energy spectrum of HiRes data [3], this likely implies that UHECR are dominated by astrophysical sources (as opposed to exotic scenarios). However, far from being the end of the UHECR saga, the combined use of spectral information, chemical composition constraints, and anisotropy maps at different energies would offer the tools for the long-awaited hunt for the UHECR accelerators.