SciPost Submission Page
Trials Factor for Semi-Supervised NN Classifiers in Searches for Narrow Resonances at the LHC
by Benjamin Lieberman, Andreas Crivellin, Salah-Eddine Dahbi, Finn Stevenson, Nidhi Tripathi, Mukesh Kumar, Bruce Mellado
This is not the latest submitted version; this Submission thread has since been published.
Submission summary
Authors (as registered SciPost users): Benjamin Lieberman
Submission information
Preprint Link: https://arxiv.org/abs/2404.07822v1 (pdf)
Date submitted: 2024-04-22 16:26
Submitted by: Lieberman, Benjamin
Submitted to: SciPost Physics Core
Ontological classification
Academic field: Physics
Specialties:
Approaches: Computational, Phenomenological
Abstract
To mitigate the model dependencies of searches for new narrow resonances at the Large Hadron Collider (LHC), semi-supervised Neural Networks (NNs) can be used. Unlike fully supervised classifiers, these models introduce an additional look-elsewhere effect in the process of optimising thresholds on the response distribution. We perform a frequentist study to quantify this effect, in the form of a trials factor. As an example, we consider simulated $Z\gamma$ data to perform narrow resonance searches using semi-supervised NN classifiers. The results from this analysis provide substantiation that the look-elsewhere effect induced by the semi-supervised NN is under control.
Reports on this Submission
Report #2 by Anonymous (Referee 3) on 2024-5-24 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2404.07822v1, delivered 2024-05-24, doi: 10.21468/SciPost.Report.9128
Strengths
1- The paper highlights and addresses an important issue, namely an additional multiple-testing problem (aka “the look-elsewhere effect”) that arises when weakly supervised models are used in searches for new particles.
2- The investigation of different generative models (WGAN, VAE, KDE) to create fast simulators yielded useful information about the effectiveness of these models for the kind of data used.
3- The methodology presented is a step in the right direction in trying to quantify the additional look elsewhere effect.
4- The steps in the methodology are clear but see Weaknesses.
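As an illustration of point 2, a KDE-based fast simulator can be sketched in a few lines. The exponential "full simulation" sample below is purely hypothetical and merely stands in for the kind of spectra discussed in the manuscript:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical "full simulation" sample: a falling, mass-like spectrum.
full_sim = rng.exponential(scale=40.0, size=5000) + 100.0

# Fit a KDE to the sample and use it as a fast simulator:
# resampling from the fitted density yields arbitrarily many pseudo-events.
kde = gaussian_kde(full_sim)
pseudo_events = kde.resample(20000).ravel()

# The pseudo-data reproduce the bulk properties of the training sample.
print(pseudo_events.mean(), full_sim.mean())
```

The same resampling idea applies whether the density estimate comes from a KDE, a VAE, or a WGAN; only the fitting step changes.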
Weaknesses
1- While the methodology is a step in the right direction, it is far from clear that the proposed way to handle the statistically dependent Z-values (or, equivalently, the statistically dependent p-values) is superior to methods that are already in use for the standard look-elsewhere effect.
2- The authors point out correctly that the NN classifier needs to be statistically independent of the lepton-lepton-photon mass because the latter is used to define the two samples that are used for training, but they do not provide evidence that this is the case.
3- There is some inconsistency in notation in Eqs. (1), (2) and (4). The symbols for the invariant mass and for Z appear to change: the invariant mass is denoted by a symbol with lepton and photon subscripts in the text but by m in Eq. (2). In Eq. (3) the symbol Z is used for the signal significance. But in Section 4, which introduces the “probability density functions (PDF) of the significances”, the presumption is that these are the PDFs of the Z-values. If so, one might have expected symbols such as f_i(Z_i), i = 1,…,6 for each of the six significances Z_i, but instead one sees the symbol f(sigma). Since Z (from Eq. (3), a well-known generalization of Ns / sqrt(Nb)) can be interpreted as the “number of standard deviations above the background”, it is unclear what “sigma” is intended to represent in f(sigma). It could be sigma = Z sqrt(Nb), but that does not seem to be what is intended. This needs to be clarified. Furthermore, from Eq. (3) it is unclear why the domain of f(sigma) should extend to negative values. That also needs clarification.
4- It is unclear why the fits are not performed so that they give Ns and Nb directly. For example, if the standard particle physics toolkit RooFit were used that would be the case since the probability densities are automatically normalized.
5- In Section 4, it is unclear why the average PDF (Eq. (6)) is the appropriate quantity to use to compute the global p-value. (See discussion in Report.)
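For reference, if Eq. (3) is the standard asymptotic counting formula (an assumption the authors should confirm), it can be sketched as follows; the inputs are hypothetical:

```python
import numpy as np

def asymptotic_significance(n_s, n_b):
    """Assumed form of Eq. (3): the asymptotic counting significance
    Z = sqrt(2 * ((Ns + Nb) * ln(1 + Ns/Nb) - Ns)),
    which reduces to Ns / sqrt(Nb) in the limit Ns << Nb."""
    return np.sqrt(2.0 * ((n_s + n_b) * np.log1p(n_s / n_b) - n_s))

# Hypothetical yields: 10 signal events over 100 background events.
print(asymptotic_significance(10.0, 100.0))  # ~0.98, close to 10/sqrt(100) = 1.0
```

Note that this formula is non-negative by construction, which is precisely why the negative domain of f(sigma) in the manuscript calls for an explanation.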
Report
Since the purpose of using a weakly supervised NN is to look for statistically significant signals without bias, the authors need to strengthen the motivation for their specific analysis of the pseudo-experiments. For example, one might argue that one should take the largest Z-value under the null hypothesis and use its PDF to assess how frequently a fake signal of at least Z sigma occurs. Ultimately, it is not the trials factor itself that is of interest, but rather the p-value that automatically incorporates it.
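The max-Z construction suggested above can be sketched with toy pseudo-experiments. The correlation between working points and the observed local significance below are purely illustrative assumptions, not values taken from the manuscript:

```python
import numpy as np

rng = np.random.default_rng(42)
n_toys, n_wps = 100_000, 6  # pseudo-experiments x background-rejection working points

# Toy null model: correlated Z-values across the six working points
# (the correlation of 0.8 is an illustrative assumption).
cov = 0.8 * np.ones((n_wps, n_wps)) + 0.2 * np.eye(n_wps)
z = rng.multivariate_normal(np.zeros(n_wps), cov, size=n_toys)

z_max = z.max(axis=1)               # largest Z per pseudo-experiment
z_obs = 2.0                         # hypothetical observed local significance
p_global = np.mean(z_max >= z_obs)  # global p-value from the max-Z distribution
p_local = np.mean(rng.standard_normal(n_toys) >= z_obs)  # single-test p-value
print(p_local, p_global)            # trials factor ~ p_global / p_local
```

This directly yields the global p-value while automatically accounting for the dependence between the six Z-values, with no need for an averaged PDF.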
In Section 4, the authors note correctly that the six Z-values are not statistically independent, but then argue in the next sentence that “This approach facilitates an examination of the impact of background rejection on the results…”. This reviewer would agree if the goal were to answer the question: “Given this level of background rejection what is the probability to obtain a signal significance Z (as defined in Eq. (3)) equal to or larger than a specified number”. But to quantify the look-elsewhere effect induced by the NN it is necessary, somehow, to deal with the six Z-values. The authors’ proposal in Eq. (6) suffers from the lack of clarity about the intended meaning of “sigma”. But if sigma really means Z, it is unclear why the average PDF, Eq. (6), which is billed as the “global” PDF is the appropriate quantity to use to compute the global p-value. Is it not necessary to account for the statistical dependencies between the Z-values if all six are used?
Requested changes
1- Remove inconsistencies in notation.
2- Motivate the use of Eq. (6).
3- Given an analysis that yields K statistically dependent p-values (or Z-values) explain why methods such as the Bonferroni correction are not sufficient to account for this particular look elsewhere effect or why methods already in use in particle physics cannot be applied to this look elsewhere effect.
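A minimal sketch of the Bonferroni correction mentioned in point 3; the numbers are hypothetical, and the bound is conservative but valid under arbitrary dependence between the K tests:

```python
from scipy.stats import norm

# K statistically dependent tests (e.g. six working points).
K = 6
z_local = 3.0                                # hypothetical largest local Z
p_local = norm.sf(z_local)                   # one-sided local p-value
p_global_bonferroni = min(1.0, K * p_local)  # conservative global p-value
z_global = norm.isf(p_global_bonferroni)     # corresponding global significance
print(p_local, p_global_bonferroni, z_global)
```

Here a 3.0-sigma local excess is deflated to roughly 2.4 sigma globally; any proposed alternative should be compared against this simple, assumption-free baseline.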
Recommendation
Ask for major revision
Report #1 by Anonymous (Referee 1) on 2024-5-6 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2404.07822v1, delivered 2024-05-06, doi: 10.21468/SciPost.Report.8999
Report
The manuscript introduces an innovative approach for detecting anomalies indicative of the production and decay of a resonance beyond the Standard Model. Employing classifiers constructed within a semi-supervised neural network framework, the method enables an assessment of the associated look-elsewhere effect. A demonstration of the approach is provided in the context of potentially observing a signal within the $Z\gamma$ final state.
While the subject matter holds significant interest and the manuscript could certainly warrant consideration for publication within SciPost Physics Core, enhancing its comprehensiveness through further clarifications and elaborations would be beneficial. Specifically, it would be advisable for the authors to explore one or two additional illustrations of their method, including for instance conventional resonance searches with final states comprising a pair of jets, leptons, or photons, and to assess the impact of background systematics.
I now proceed with a list of comments that should be addressed by the authors.
1 - The introduction of the article could benefit from a discussion on experimental new physics searches utilising unsupervised machine learning methods, as well as a more comprehensive explanation of the distinctions/similarities between the proposed semi-supervised technique and other (semi-supervised) methods used for anomaly detection. The conclusions should be rewritten to reflect these considerations.
Furthermore, clarification is needed regarding the assertion that the proposed method is less model-dependent than other methods, especially semi-supervised or unsupervised ones.
2- The manuscript illustrates its methodology through resonance searches in the $Z\gamma$ final state. While the choice of this example is motivated by existing anomalies in LHC data, care should be taken to streamline the referencing and ensure clarity and conciseness in the justification. Consideration should be given to replacing the second paragraph on page 3 with a succinct statement detailing the rationale behind choosing the $Z\gamma$ analysis as an illustrative example of the proposed methodology. The heavy reliance on 16 self-citations out of 21 references, which exclude many relevant experimental papers, is in my opinion unnecessary in light of the actual topic of the present manuscript.
Furthermore, it is essential to accurately characterise the origin of these anomalies. Properly distinguishing between those confirmed by LHC collaborations and those proposed by phenomenological works, which may lack access to comprehensive statistical treatment, is crucial.
In addition, as written above, including other illustrative examples based on standard resonance searches in dijet, diphoton or dilepton final states, would be beneficial for readers.
3 - Section 2.1 lacks sufficient information on the simulation toolchain used. Event generation for the $pp\to Z\gamma \to \ell\ell\gamma$ process seems to enforce the intermediate $Z$-boson to be on-shell. However, since the mass window in $m_{\ell\ell\gamma}$ is large enough, off-shell $Z$ contributions, virtual-photon contributions, and their interference are relevant. It remains unclear whether they have been properly accounted for.
Furthermore, the discussion of the chosen parton density set is unclear. It is essential to clarify whether next-to-leading-order matrix elements have been consistently convolved with next-to-leading-order parton densities, and not with leading-order ones.
Finally, the text does not clearly distinguish between generator-level cuts and reconstructed-level cuts that are implemented in the simulation. Providing a clear delineation between these sets of cuts is crucial. Additionally, Section 2.1 should include details on preselection criteria, like cuts on the number of leptons and photons, that are currently not discussed.
4 - The manuscript should define central jets and specify the associated pseudo-rapidity cut.
5 - Figures 1 and 2 should be adjusted to improve readability. The missing transverse energy spectrum could be presented with a log scale or a reduced domain to enhance clarity. Additionally, in figure 2, all eight lower insets should indicate whether they refer to the sideband or signal mass window. In fact, consideration should be given to showing both these curves.
6 - The caption of figure 4 should define the acronym 'BR' for clarity.
7 - In Section 3.3, the manuscript should avoid using the term 'centre of mass' to refer to the 'centre of the signal mass window', as 'centre of mass' has a different, well-defined meaning.
8 - It would be instructive to assess the impact of background systematics on the calculation of local significance, especially considering the incomplete background modeling acknowledged by the authors in Section 2. Equation (3) should be generalised accordingly, and the results of Section 4 updated subsequently.
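One standard way to fold a background systematic into a counting significance is the profile-likelihood formula with a Gaussian constraint on the background expectation, sketched below. Whether this is the appropriate generalisation of the manuscript's Eq. (3) is for the authors to decide; the yields and uncertainty are hypothetical:

```python
import numpy as np

def significance_with_syst(s, b, sigma_b):
    """Asymptotic counting significance with a Gaussian background
    uncertainty sigma_b; reduces to sqrt(2((s+b)ln(1+s/b) - s)) as
    sigma_b -> 0. This is the well-known profile-likelihood formula,
    assumed here as one possible generalisation of Eq. (3)."""
    if sigma_b == 0.0:
        return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))
    t1 = (s + b) * np.log((s + b) * (b + sigma_b**2)
                          / (b**2 + (s + b) * sigma_b**2))
    t2 = (b**2 / sigma_b**2) * np.log1p(sigma_b**2 * s / (b * (b + sigma_b**2)))
    return np.sqrt(2.0 * (t1 - t2))

print(significance_with_syst(10.0, 100.0, 0.0))   # ~0.98, no systematics
print(significance_with_syst(10.0, 100.0, 10.0))  # smaller: systematics dilute Z
```

A 10% background uncertainty noticeably reduces the significance in this toy example, which illustrates why neglecting systematics could overstate the local Z-values entering the trials-factor study.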
9 - The bibliography should be carefully proofread to correct any errors. Specifically, attention should be given to identifying and correcting duplicate references (like references [2] and [3]), updating references that are now published (like reference [23]), and ensuring insertion of complete references (like [45] and [46]).
Recommendation
Ask for major revision