SModelS database update v1.2.3

We present an update of the SModelS database with simplified model results from 13 ATLAS and 10 CMS searches for supersymmetry at Run 2. This includes 5 ATLAS and 1 CMS analyses for full Run 2 luminosity, i.e. close to 140/fb of data. In total, 76 official upper limit and efficiency map results have been added. Moreover, 21 efficiency map results have been produced by us using MadAnalysis5, to improve the coverage of gluino-squark production. The constraining power of the new database, v1.2.3, is compared to that of the previous release, v1.2.2. SModelS v1.2.3 is publicly available and can readily be employed for physics studies.


INTRODUCTION
SModelS [1,2,3] is a public software tool that enables the fast interpretation of simplified model results from ATLAS and CMS searches for supersymmetry (SUSY) in an automatised way. It can be used for evaluating the collider signals of any Beyond the Standard Model (BSM) scenario with a $Z_2$-like symmetry, for which the signal acceptance of the SUSY searches applies [1]. The working principle of SModelS is to decompose all signatures occurring in a given model or scenario into simplified model topologies, also referred to as simplified model spectra (SMS), by means of a generic procedure in which each topology is defined by the vertex structure and the Standard Model (SM) and BSM final states; intermediate $Z_2$-odd BSM particles are characterized only by their masses, production cross sections and branching ratios.
The signal weights, determined in terms of cross sections times branching ratios, σ × BR, are then matched against a database of LHC results. This is easier and much faster than reproducing analyses with Monte Carlo event simulation, and it allows for reinterpreting searches which are not just cut and count, e.g. analyses which rely on BDT (boosted decision tree) variables. The downside is that the applicability is limited by the simplified model results available in the database. Moreover, whenever the tested signal splits up into many different channels, as is often the case in complex models with many new particles, the derived limits tend to be highly conservative. SModelS is thus particularly useful for evaluating constraints and generally characterizing collider signatures in large scans and model surveys.
SModelS makes use of two types of experimental results: upper limit (UL) results and efficiency map (EM) results. Upper limit results provide 95% confidence level (CL) upper limits on σ × BR as a function of the respective parameter space of the simplified model (usually BSM masses or slices over mass planes). Their advantage is that the statistical evaluation (i.e. combination of signal regions when relevant, limit setting procedure, etc.) is done directly by the experimental collaboration.
Furthermore, limits obtained from analyses that are not cut and count can also be used. However, their statistical interpretation is limited, only allowing for an excluded-or-not statement, on a purely topology-per-topology basis. Only if the expected UL maps are also available does it become possible to select the most sensitive result and/or to compute an approximate likelihood as a truncated Gaussian [4]. Efficiency maps correspond to grids of simulated acceptance times efficiency (A × ε) values (simply called 'efficiencies' in the following) for the various signal regions of an analysis. Their advantage is that they allow for combining contributions from different simplified model topologies to the same signal region, and for computing the likelihood [2,3].
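The benefit of combining topologies in one signal region can be illustrated with a minimal sketch. It assumes that the predicted visible cross section in a signal region is the sum of σ × BR × (A × ε) over the contributing topologies, and that it is compared to an upper limit on the visible cross section. All numbers and names below (`contributions`, `sigma_vis_ul_fb`) are hypothetical and chosen only for illustration; this is not the SModelS implementation.

```python
# Hypothetical illustration of combining topologies in a single signal region.
# Each entry maps a topology name to (sigma x BR in fb, efficiency A x eps).
# All values are made up for this sketch.
contributions = {
    "T1": (20.0, 0.010),
    "T2": (25.0, 0.008),
    "TGQ": (45.0, 0.012),
}

# Assumed 95% CL upper limit on the visible cross section (fb) in this region.
sigma_vis_ul_fb = 0.8


def visible_xsec(sigma_br_fb, efficiency):
    """Visible cross section contributed by one topology."""
    return sigma_br_fb * efficiency


# Ratio of prediction to upper limit, per topology and for their sum.
per_topology_ratio = {
    name: visible_xsec(sigma_br, eff) / sigma_vis_ul_fb
    for name, (sigma_br, eff) in contributions.items()
}
combined_ratio = sum(per_topology_ratio.values())

for name, ratio in per_topology_ratio.items():
    print(f"{name}: ratio = {ratio:.3f}")
print(f"combined: ratio = {combined_ratio:.3f}")
```

In this made-up case no single topology reaches the limit on its own, while the summed signal does, which is the situation that EM results are meant to capture.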
Besides speed, the power of SModelS comes from its large database of results, which is regularly updated. For Run 1, not counting superseded results, the SModelS database contains 93 official UL and 72 EM results from 17 ATLAS and 18 CMS analyses. In [5] we presented the implementation of the Run 2 SUSY search results from CMS with 36 fb$^{-1}$ from the Moriond and the summer (LHCP and EPS) conferences of 2017; this amounted to 84 new UL maps from 19 different analyses. This was further augmented by CMS long-lived particle (HSCP and R-hadron) constraints [6] and a first set of 12 UL maps from six ATLAS SUSY analyses with 36 fb$^{-1}$ in [3]. Moreover, we included 30 "home-grown" EM results relevant for constraining gluino-squark production at 8 TeV [3].
In the new v1.2.3 presented in this Letter, this is extended by the simplified model results from 13 new ATLAS and 10 CMS searches for SUSY at Run 2, including a first set of results for full Run 2 luminosity. In total, 76 official UL and EM results have been added. Moreover, 21 EM results (consisting of 351 individual EMs when counting each map in each signal region separately) for 13 TeV were produced by us for better covering scenarios where gluino-squark associated production is important. In the following, we discuss in detail which results have been added and compare the constraining power of the new database, v1.2.3, to that of the previous release, v1.2.2.
Table caption (partially recovered): [...] in the final state. The 5th column lists the specific SMS results included, using the shorthand "txname" notation (see text for details). For brevity, only the on-shell results are listed, although the off-shell ones are always also included (e.g., T2tt in the table effectively means T2tt and T2ttoff; see also [5]).

NEW RESULTS IN THE DATABASE
The new (official) ATLAS and CMS results included in the v1.2.3 database are detailed in Tables 1 and 2. They concern all applicable new SUSY search results since SModelS v1.2.2, for which simplified model results are available in digital form on HEPData or the analysis' twiki page (status end of March 2020). Note that six analyses, five from ATLAS and one from CMS, are for full Run 2 luminosity of about 140 fb$^{-1}$. The home-grown EM results produced by us are listed in Table 3; we will come back to these later.
Inside SModelS, individual SMS results are identified by the analysis ID and the "txname", which describes in a shorthand notation the hypothesised SUSY process (largely following [30]). Due to lack of space we do not elaborate on this naming scheme here, but refer the reader to our "SMS Dictionary" at https://smodels.github.io/docs/SmsDictionary123, which provides a complete list of txnames together with the corresponding diagrams. Each included map is thoroughly validated to make sure it reproduces the limits reported in the experimental publication. Detailed validation plots for each result are available online at https://smodels.github.io/docs/Validation123.
A couple of comments are in order regarding the SMS (UL and EM) results in Tables 1 and 2. First, with the exception of ATLAS-SUSY-2018-06 [16], the UL maps provided by ATLAS contain only the observed limits but not the expected ones. This makes it impossible to deduce the statistically most sensitive analysis, or to estimate a CL for a given hypothesised signal. On the CMS side, SUS-17-003 and SUS-17-004 do not report expected limits.
Second, while we very much appreciate the provision of EMs by ATLAS, in many cases the A × ε values are for the most sensitive (a.k.a. "best") signal region only. This is not optimal because the best signal region (at a given mass point) can depend on the tested signal and hence vary for different BSM scenarios. We therefore want to encourage ATLAS and CMS to provide EMs for all signal regions over the full parameter space of the considered simplified model. In case the number of signal regions is too large, as is often the case for CMS searches, this might be done for a set of appropriately aggregated regions.
FIGURE 1: Sample of SMS topologies with 2-5 jets relevant for gluino/squark production, for which "home-made" EM results have been produced with the MadAnalysis 5 recast codes [31,32] for the ATLAS and CMS 13 TeV multi-jet + $E_T^{\mathrm{miss}}$ searches [7,29]. From left to right: T2, TGQ, T3GQ, T1, and T5GQ.
Third, some EMs provided by ATLAS were not included in the v1.2.3 database because they apply to a sum over SMS topologies instead of a single topology. This is in particular the case when mixed decays are assumed, e.g. $\tilde\chi^0_2 \to \tilde\chi^0_1 + Z$ or $h^0$ with fixed BRs, or $\tilde\chi^\pm$ decays via sleptons, with the contributions of charged sleptons and sneutrinos summed over. Such EM results currently cannot be used in SModelS because the relative contributions of each topology cannot be disentangled.
A general discussion of simplified model results and recommendations regarding their presentation can be found in section II.A.5 of the recent report of the LHC Reinterpretation Forum [33].
Related to the second point above, we note that ATLAS has started to release full likelihoods in the form of a json serialisation [34], which should describe background correlations at the same fidelity as the likelihood model used in the experiment. So far this is available for two analyses, the sbottom multi-bottom search [17] (SUSY-2018-31) and the search for direct stau production [15] (SUSY-2018-04), both for full Run 2 luminosity. Both analyses also provide detailed EMs for the simplified models they consider. This is a pioneering step forward in the presentation of BSM searches that can greatly improve the preservation and re-use of the experimental results. We are currently working on an interface to pyhf [35] to make full use of these data (see contribution no. 15 in [36]). In the meantime, the TStauStau and T6bbHH EMs from [15,17] are not included in the new SModelS v1.2.3 database, because using only the best signal regions for limit setting leads to over- or under-exclusions compared to the UL results. We think that this is due to background fluctuations and thus postpone usage of the TStauStau and T6bbHH EMs until the full json/pyhf implementation is available in SModelS.
Let us now turn to the home-grown EMs in Table 3. For a good coverage of complex models with several new particles, it is crucial that all major contributions to the total signal considered by a particular analysis can be taken into account. In the SUSY context this means that a large set of EM results is required in particular for the generic gluino/squark searches, see [37,38]. Concretely, it is important to cover topologies arising from gluino-squark associated production in addition to the usual gluino-pair and squark-pair productions [37].
We therefore turned to recasting the ATLAS and CMS multi-jet + $E_T^{\mathrm{miss}}$ searches [7,29] with MadAnalysis 5 [39,40,41] in order to produce such a set of EMs for 13 TeV, which were then incorporated into the SModelS v1.2.3 database. (Similar home-grown EMs for 8 TeV were already included in v1.2.2 [3].) We considered topologies with 2-5 jets in the final state as shown in Figure 1, to allow for the combination of gluino-pair, squark-pair and gluino-squark associated production. This was augmented with EMs for additional topologies considered in the experimental searches (T5WW, T5ZZ and T6WW for the ATLAS analysis; T1bbbb, T1tttt, T2bb and T2tt for the CMS one), giving the sets listed in Table 3.
For each signal topology, we simulated 10,000 events per parameter point, with the number of parameter points per topology ranging between 251 (CMS-SUS-16-033, T1bbbb) and 2635 (CMS-SUS-16-033 and ATLAS-SUSY-2016-07, TGQ). The total number of parameter points amounts to 22,829. We used MadGraph5_aMC@NLO [42] to simulate the hard scattering (with one additional hard jet) processes and Pythia 8 [43,44] for the decays and parton shower, employing the MLM scheme for matching and merging. The events were then subjected to the MadAnalysis 5 framework, which uses Delphes 3 [45] for emulation of the detector response. The concrete recast codes used were [31] and [32] for the ATLAS and CMS searches, respectively, each with its specific Delphes 3 configuration. Note that [32] employs the aggregate regions of [29], which (as also mentioned in the CMS paper) gives somewhat weaker limits than the full analysis. In a final step, the efficiencies and their relative errors were read from the MadAnalysis 5 output and adapted for SModelS to form a total of 351 individual EMs (22 signal regions × 10 topologies for ATLAS-SUSY-2016-07 and 12 aggregate regions × 11 topologies for CMS-SUS-16-033, minus one which has only zero efficiencies for one topology in one region).
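The bookkeeping behind the quoted number of individual EMs can be checked with a few lines of Python. The region and topology counts are taken from the text; the total event count is a derived figure that assumes every parameter point was indeed simulated with 10,000 events.

```python
# Counts quoted in the text for the home-grown efficiency maps.
atlas_maps = 22 * 10  # signal regions x topologies, ATLAS-SUSY-2016-07
cms_maps = 12 * 11    # aggregate regions x topologies, CMS-SUS-16-033
empty_maps = 1        # one map with only zero efficiencies is dropped

total_maps = atlas_maps + cms_maps - empty_maps
print("individual EMs:", total_maps)

# Derived figure (assumption: exactly 10,000 events at each of the points).
parameter_points = 22829
events_per_point = 10_000
total_events = parameter_points * events_per_point
print("simulated events:", total_events)
```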

PHYSICS IMPACT
We demonstrate the increase in constraining power of SModelS owing to this database update using the minimal supersymmetric standard model (MSSM) with 19 free parameters defined at the weak scale, the so-called phenomenological MSSM (pMSSM). To this end we make use of the extensive dataset from the ATLAS pMSSM study [46] available at [47]. Concretely, we use the ATLAS pMSSM scan points with a bino-like neutralino as the lightest supersymmetric particle (LSP), which were not excluded by ATLAS at 8 TeV. This amounts to about 61.4K points with sparticle masses up to 4 TeV, of which 28.4K (46%) are excluded by SModelS v1.2.2. This increases to 34.4K excluded points (56%) with v1.2.3. Of the 6K newly excluded points, 27% are excluded only by the home-grown EMs.
SModelS reports its results in the form of r-values, defined as the ratio of the theory prediction over the observed upper limit, for each experimental constraint that is matched in the database. All points for which at least one r-value equals or exceeds unity ($r_{\max} \geq 1$) are considered as excluded. It is instructive to see which analyses/results thus turn out [...]
The importance of combining the contributions from the topologies shown in Figure 1 is best illustrated by an explicit example. To this end we choose 424686577.slha from the ATLAS pMSSM dataset [47]. This point features heavy squarks ($m_{\tilde q} \sim 2.5$-3.5 TeV), a light gluino ($m_{\tilde g} \simeq 976$ GeV) and an LSP which is close in mass to the gluino ($m_{\tilde\chi^0_1} \simeq 908$ GeV). The gluino decays to the LSP via two competing modes, 3-body decays into the LSP plus two quarks (BR $\simeq$ 40% for $q\bar q\tilde\chi^0_1$, 16% for $b\bar b\tilde\chi^0_1$) or loop decays into the LSP plus a gluon (BR $\simeq$ 44%). As a result, the signal of gluino-pair production consists of 16% T1, 19% T2 and 35% TGQ. The heavy squarks, on the other hand, decay into the gluino plus a quark with BRs of 90-99%.
Thus, gluino-squark associated production generates the T3GQ and T5GQ topologies from Figure 1 (the contribution from pair production of squarks is subdominant). Table 4 lists the individual contributions of each topology to the total signal for the best signal region of the CMS-SUS-16-033 analysis. As we can see, each individual topology contributes with an r-value less than 0.3 and only the combination of all contributions allows SModelS to exclude this particular point. This is only possible due to the efficiency maps listed in Table 3.
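The topology fractions quoted for gluino-pair production in this example follow directly from the gluino branching ratios. The short sketch below reproduces them, assuming the two gluinos decay independently and that a gluon jet and a light-quark jet are treated alike when assigning the T2 and TGQ topologies (this is our reading of the example, not a statement about the internal SModelS decomposition code).

```python
# Gluino branching ratios quoted in the text for the example pMSSM point.
br_qq = 0.40     # gluino -> q qbar + LSP (light quarks)
br_bb = 0.16     # gluino -> b bbar + LSP
br_gluon = 0.44  # gluino -> gluon + LSP (loop decay)

# Fractions of gluino-pair events per topology, assuming independent decays.
frac_t1 = br_qq ** 2             # both gluinos decay to two quarks + LSP
frac_t2 = br_gluon ** 2          # both gluinos decay to one jet (gluon) + LSP
frac_tgq = 2 * br_qq * br_gluon  # one decay of each kind, two orderings

for name, frac in [("T1", frac_t1), ("T2", frac_t2), ("TGQ", frac_tgq)]:
    print(f"{name}: {100 * frac:.0f}%")
```

The remaining fraction of gluino-pair events involves at least one decay into $b\bar b\tilde\chi^0_1$ and is not covered by these three topologies.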

USAGE
The new database presented here is shipped with the SModelS v1.2.3 package, but can also be used with the earlier v1.2.x code versions. The easiest way is by specifying the path to the database URL. When using runSModelS.py, this means setting path = http://smodels.github.io/database/official123 in the parameters file. In SModelS v1.2.3, the shorthand notation path = official can also be used. When writing one's own python main program, one has to set database = Database("...") where the dots stand for the same database URL as above. The official, pre-compiled pickle file official123.pcl (680 MB) is then downloaded upon first execution. Note that this download is often faster than parsing the text database oneself.
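As a small illustration of the parameters-file setting, the sketch below uses Python's standard configparser module to generate and re-read an ini fragment containing the database path. The URL is the one quoted above; the section name `[database]` is an assumption made for this sketch and should be checked against the parameters.ini shipped with SModelS.

```python
import configparser
import io

# Database location quoted in the text.
DATABASE_URL = "http://smodels.github.io/database/official123"


def render_database_section(path):
    """Return an ini fragment that sets the database path.

    The section name "database" is an assumption of this sketch.
    """
    parser = configparser.ConfigParser()
    parser["database"] = {"path": path}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


fragment = render_database_section(DATABASE_URL)
print(fragment)

# Read the fragment back to confirm that the path survives a round trip.
check = configparser.ConfigParser()
check.read_string(fragment)
```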
Users who want to update the text database in an existing SModelS v1.2.x installation can download the .zip or .tar.gz file from https://github.com/SModelS/smodels-database-release/releases. It suffices to put this tarball into the main smodels folder and unpack it there. That is, the following steps need to be performed:
    mv smodels-database-v1.2.3.tar.gz <smodels folder>
    cd <smodels folder>
    tar -xzvf smodels-database-v1.2.3.tar.gz
    rm smodels-database-v1.2.3.tar.gz
The new database will be unpacked into the smodels-database directory, replacing the previous version, and the pickle file will then be automatically rebuilt on the next run of SModelS. For a clean installation, it is recommended to first remove the previous database version. If the tarball is unpacked to another location, one has to correctly set the SModelS database path when running SModelS. If using runSModelS.py, this is done in the parameters.ini file.

CONCLUSIONS
We presented the update of the SModelS database with the simplified model results from 13 ATLAS and 10 CMS SUSY analyses from Run 2 with 36-139 fb$^{-1}$ of data. This comprises 76 new official UL and EM results from ATLAS and CMS, supplemented by 21 EM results produced by us using MadAnalysis5. These results significantly improve previously available constraints.
In total, the SModelS v1.2.3 database now contains 170 UL and 42 EM results from 23 ATLAS and 29 CMS analyses at 13 TeV, plus a large number of SMS results for 8 TeV. The SModelS package is publicly available and can readily be used to constrain arbitrary BSM models which have a $Z_2$ symmetry, provided the SMS assumptions [1,2] apply. We will continue to update it as new ATLAS and CMS search results become available.