SciPost Submission Page
Signal region combination with full and simplified likelihoods in MadAnalysis 5
by Gaël Alguero, Jack Y. Araz, Benjamin Fuks, Sabine Kraml
This is not the latest submitted version.
As Contributors: Jack Araz · Benjamin Fuks · Sabine Kraml
Arxiv Link: https://arxiv.org/abs/2206.14870v1
Date submitted: 2022-07-01 20:09
Submitted by: Kraml, Sabine
Submitted to: SciPost Physics
Approaches: Theoretical, Computational, Phenomenological
The statistical combination of disjoint signal regions in reinterpretation studies uses more of the data of an analysis and gives more robust results than the single signal region approach. We present the implementation and usage of signal region combination in MadAnalysis 5 through two methods: an interface to the pyhf package making use of statistical models in JSON-serialised format provided by the ATLAS collaboration, and a simplified likelihood calculation making use of covariance matrices provided by the CMS collaboration. The gain in physics reach is demonstrated 1.) by comparison with official mass limits for 4 ATLAS and 5 CMS analyses from the Public Analysis Database of MadAnalysis5 for which signal region combination is currently available, and 2.) by a case study for an MSSM scenario in which both stops and sbottoms can be produced and have a variety of decays into charginos and neutralinos.
Reports on this Submission
Anonymous Report 1 on 2022-8-2 (Invited Report)
Strengths
1. Pedagogical description of the implementation method.
2. Validation summaries for 9 searches.
Weaknesses
1. Some of the validation results are problematic in terms of accuracy.
The paper "Signal region combination with the full and simplified likelihoods in MADANALYSIS5" by G. Alguero, J. Araz, B. Fuks, and S. Kraml is a welcome development in the field of recasting at the LHC. It provides a tool, as a part of the MadAnalysis package, for automatic combination of signal regions in the searches published by ATLAS and CMS. Several methods for combination are implemented, depending on the data provided by the collaborations. A number of recent ATLAS and CMS searches make use of multi-bin signal regions, where the exclusion is decided on the basis of a fit to histograms of certain kinematic variables. This approach has advantages over the "best signal region" approach, which used to be more common in the past (and is still widely used). As the authors demonstrate, the method has a profound impact in the CMS searches that define a large number of signal regions (~100).
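To illustrate why combining disjoint signal regions can outperform the "best signal region" choice, here is a minimal sketch in the large-count Gaussian approximation, where each region has an expected significance of roughly s/sqrt(b). It assumes uncorrelated regions and no systematic uncertainties, which is not what the paper implements; the function names and yields are hypothetical placeholders.

```python
import math


def best_region_significance(signal, background):
    """Expected significance of the single most sensitive region (s / sqrt(b))."""
    return max(s / math.sqrt(b) for s, b in zip(signal, background))


def combined_significance(signal, background):
    """Expected significance when uncorrelated regions are added in quadrature."""
    return math.sqrt(sum(s * s / b for s, b in zip(signal, background)))


# Placeholder yields (not taken from any analysis): two disjoint regions.
signal_yields = [3.0, 4.0]
background_yields = [9.0, 16.0]

z_best = best_region_significance(signal_yields, background_yields)
z_comb = combined_significance(signal_yields, background_yields)
print(f"best region: {z_best:.3f}, combination: {z_comb:.3f}")
```

In this approximation the combined value can never be smaller than the best single region, and the gain grows with the number of regions of comparable sensitivity.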
In the second section of the paper, the authors provide a detailed description of the implementation of the statistical methods and of the interfaces to various data sources. For the ATLAS searches they use the pyhf package and the JSON-format input files provided by the experiment. For CMS, they use the covariance matrices provided by the collaboration. I find the examples showing the details of the implementation particularly useful.
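For orientation, the covariance-matrix approach can be sketched as follows. This assumes the common form of a simplified likelihood, a product of Poisson terms per signal region times a multivariate Gaussian constraint on additive background nuisances with the published covariance. It only evaluates the negative log-likelihood at given parameter values for two regions, without any profiling or limit setting, and it is not the MadAnalysis 5 code; all names and numbers are hypothetical.

```python
import math


def poisson_logpmf(n, lam):
    """Log of the Poisson probability of observing n events for expectation lam."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def simplified_nll(mu, theta, observed, signal, background, covariance):
    """Negative log-likelihood for two regions.

    Expected yield per region: mu * s_i + b_i + theta_i, where theta is
    constrained by a Gaussian with the given background covariance.
    """
    inv = invert_2x2(covariance)
    det = covariance[0][0] * covariance[1][1] - covariance[0][1] * covariance[1][0]
    nll = 0.0
    for n, s, b, t in zip(observed, signal, background, theta):
        lam = mu * s + b + t
        if lam <= 0:
            return math.inf  # unphysical expectation
        nll -= poisson_logpmf(n, lam)
    quad = sum(theta[i] * inv[i][j] * theta[j] for i in range(2) for j in range(2))
    nll += 0.5 * quad + 0.5 * math.log((2 * math.pi) ** 2 * det)
    return nll


# Placeholder inputs (not from any CMS analysis).
observed = [12, 7]
signal = [3.0, 1.5]
background = [10.0, 6.0]
covariance = [[4.0, 1.0], [1.0, 2.0]]
print(simplified_nll(1.0, [0.0, 0.0], observed, signal, background, covariance))
```

A full calculation would minimise this function over theta for each value of mu and build a test statistic from the result; for ATLAS, the pyhf JSON models encode the complete likelihood instead.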
The next section is devoted to validation of the searches. While some of them show a clear advantage, in other cases one can have doubts about their validity. I will specifically list my questions here:
ATLAS-SUSY-2018-31: I am not sure what the purpose of showing the LO analysis is. It is shown just for this search and is clearly off, which is not surprising as the missing K-factors can easily be of order 1.5. Otherwise the agreement is excellent (but the gain from the combination is also rather small).
ATLAS-SUSY-2018-04: The implementation inherits the problems of the original implementation, i.e. overshooting the acceptance rate by some 30%. What is puzzling to me is why the authors can reproduce the expected limit but fail with the observed one. I would naively expect a similar level of (dis)agreement. This question applies equally to ATLAS-SUSY-2019-08, CMS-SUS-16-039, and CMS-SUS-19-006.
CMS-SUS-16-039: While the agreement in the observed limit is impressive in the high mass region, there is clearly a problem in the 3-body decay region. Is it understood why? In any case the improvement with respect to the best SR is impressive.
CMS-SUS-19-006: The authors downplay the over-exclusion but to me it seems quite dangerous. If I read the plot correctly, for the expected limit at 0 LSP mass this shifts the bound from 2170 GeV to 2350 GeV. This means the upper limit on the cross section is a factor of 2.4 too strong. Perhaps it does not look that bad in the plot, but numerically it is not a negligible number. I would like to see, if possible, how the upper limits on the cross section compare in the exclusion plot (at least in a reasonable vicinity of the exclusion line).
The fourth section provides a toy example of a realistic MSSM setup (in contrast to the simplified models from the previous section). The advantage of the improved statistical treatment is clearly demonstrated.
My main request, apart from the minor questions in the main report, is to show how the upper limits on the cross section compare between the experiment and the recast. This is certainly more interesting than the LO analysis presented for one of the searches. I am aware that these data are not always available, but as far as I remember ATLAS used to provide such information in the auxiliary plots. I think this would give more confidence in the validity of the presented approach.
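The requested comparison could be organised along the following lines. This is only a sketch: it assumes that both the official and the recast 95% CL upper limits are available on a common grid of mass points, and the function names, the tolerance and all numbers are hypothetical placeholders rather than results from any search.

```python
def upper_limit_ratios(official, recast):
    """Ratio recast / official of cross-section upper limits per mass point.

    Both inputs map (mother mass, LSP mass) in GeV to an upper limit in pb.
    Points missing from the recast are skipped.
    """
    ratios = {}
    for point, ul_official in official.items():
        if point in recast and ul_official > 0:
            ratios[point] = recast[point] / ul_official
    return ratios


def flag_over_exclusion(ratios, tolerance=0.8):
    """Mass points where the recast limit is stronger than tolerance * official."""
    return sorted(point for point, ratio in ratios.items() if ratio < tolerance)


# Placeholder limits (pb), not taken from any publication.
official_limits = {(2000, 0): 1.0e-3, (2200, 0): 8.0e-4}
recast_limits = {(2000, 0): 9.5e-4, (2200, 0): 4.0e-4}

ratios = upper_limit_ratios(official_limits, recast_limits)
for point in flag_over_exclusion(ratios):
    print(point, f"recast/official = {ratios[point]:.2f}")
```

A ratio well below one near the exclusion line would correspond to the kind of over-exclusion discussed for CMS-SUS-19-006, where an upper limit that is a factor of 2.4 too strong would show up as a ratio of about 0.42.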