Trials Factor for Semi-Supervised NN Classifiers in Searches for Narrow Resonances at the LHC

Benjamin Lieberman; Salah-Eddine Dahbi; Andreas Crivellin; Finn Stevenson; Nidhi Tripathi; Mukesh Kumar; Bruce Mellado

This Submission thread is now published as

SciPost Phys. Core 7, 073 (2024)

Submission summary

Authors (as registered SciPost users):

Benjamin Lieberman

Submission information
Preprint Link:	https://arxiv.org/abs/2404.07822v4
Date accepted:	2024-10-30
Date submitted:	2024-09-09 09:57
Submitted by:	Lieberman, Benjamin
Submitted to:	SciPost Physics Core

Ontological classification
Academic field:	Physics
Specialties:	High-Energy Physics - Phenomenology
Approaches:	Computational, Phenomenological

Abstract

To mitigate the model dependencies of searches for new narrow resonances at the Large Hadron Collider (LHC), semi-supervised Neural Networks (NNs) can be used. Unlike fully supervised classifiers, these models introduce an additional look-elsewhere effect in the process of optimising thresholds on the response distribution. We perform a frequentist study to quantify this effect in the form of a trials factor. As an example, we consider simulated $Z\gamma$ data to perform narrow resonance searches using semi-supervised NN classifiers. The results of this analysis indicate that the look-elsewhere effect induced by the semi-supervised NN is under control.

List of changes

Referee 1: Comments and Response:

Dear Referee, we thank you for your continued review of our paper and constructive insight. We have carefully addressed each comment and provided responses below.

Comment 1.1. - Former comment #2. I still find it exaggerated that nearly half of the bibliography of this paper (35 references out of 73) is necessary to justify the choice of the Zγ example, that is just a showcase for the methodology. Moreover, among these 35 references, there are 15 self-citations (specifically, the entire set of chosen theory papers discussing the multi-lepton anomalies).
Additionally, the text is written in a way that might lead readers to believe that a model in which the Standard Model is extended by three scalars with well-defined masses has been confirmed by data. This is not the case, as such a confirmation could only be made by the LHC experiments (and not by any theoretical study). This should be made crystal clear, and the text should be rephrased accordingly.

Response 1.1. We thank the referee for their feedback and have rewritten the introductory paragraph in question. We have improved the description of the model and of the anomalies used to motivate the selection of the Zγ dataset, and have significantly reduced the number of self-citations. We believe that these changes provide a clear background and motivation for the choice of our showcase example without misleading readers or relying on unnecessary self-citations.

Comment 1.2. - Finally, the authors mentioned that they chose the multi-lepton anomalies as a showcase for the proposed method. While this is fine, I do not understand their reluctance to add a second example. As they stated in their reply to my report, the methodology is independent of the chosen physics case. Therefore, there should be no reason to refuse to add a second, different and maybe more traditional, physics example.

Response 1.2. We agree with the referee that an additional example would be advantageous. However, performing the complete frequentist study takes thousands of hours of computing time (approximately 3 minutes per pseudo-experiment, with 125000 pseudo-experiment iterations, i.e. about 6,250 hours in total), so we believe the selected example is sufficient and can serve as a basis for further studies. We have nevertheless added a sentence at the end of the conclusion stating that including additional examples in future studies would be beneficial.
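For scale, the figures quoted in this response can be multiplied out directly. The short Python sketch below does only that arithmetic; it gives the serial compute time, and the actual wall-clock time depends on how many pseudo-experiments were run in parallel, which the response does not state.

```python
# Back-of-the-envelope compute budget for the frequentist study,
# using the approximate figures quoted in Response 1.2.
MINUTES_PER_PSEUDO_EXPERIMENT = 3   # approximate, as quoted
N_PSEUDO_EXPERIMENTS = 125_000      # as quoted

total_minutes = MINUTES_PER_PSEUDO_EXPERIMENT * N_PSEUDO_EXPERIMENTS
total_hours = total_minutes / 60    # serial time, no parallelization assumed

print(f"{total_minutes:,} minutes = {total_hours:,.0f} hours")
# prints: 375,000 minutes = 6,250 hours
```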

Comment 2 - Former comment #3. The description of the simulation tool chain used in this work is now very clear, and I thank the authors for adding the associated text in section II.1. However, I do not understand why virtual-photon contributions have been ignored in the simulation of the signal. The process considered is pp→ℓℓγ with the lepton pair invariant mass being off the Z peak. Therefore, there is no reason to only consider Z-mediated diagrams, as the impact of the virtual-photon component on the signal is not negligible. This should be trivial to fix with MG5aMC.
Furthermore, the authors should explicitly state which PDF set they have used. Is it NNPDF 2.3 as suggested by the reference quoted?

Response 2. We apologize for not picking up this error, which was due to an internal miscommunication. The virtual-photon-mediated diagram is included in the simulation, and the statement in the paper has been amended. We thank the referee for this input, as it will be important for future studies that include signal processes.
As suggested, we have also stated the PDF set used (NNPDF 3.0).

Referee 3: Comments and Response:

We thank the referee for their thorough review and valuable comments. After extensive internal discussions, we have agreed that the results should include the classical trials factor in addition to the version previously presented. Accordingly, we have rewritten the results section to incorporate the calculation of the classical trials factor, which compares the probabilities of the maximum significances across all background rejection (BR) cuts and pseudo-experiments to the nominal values. This ensures a more comprehensive analysis and a clearer representation of the look-elsewhere effect in our study.
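A minimal sketch of how such a classical trials factor can be estimated from pseudo-experiments follows. It rests on our own assumptions, not on a description of the authors' code: the local significances are stored as an array with one row per background-only pseudo-experiment and one column per BR cut, the nominal p-value is the one-sided Gaussian tail probability, and the trials factor is the ratio of the global to the nominal p-value. The function names are hypothetical.

```python
import math

import numpy as np


def nominal_p_value(z):
    """One-sided Gaussian tail probability for a local significance z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def classical_trials_factor(significances, z_threshold):
    """Return (global p-value, nominal p-value, trials factor) at z_threshold.

    significances: array of shape (n_pseudo_experiments, n_br_cuts); entry [i, k]
        is the local significance of pseudo-experiment i with BR cut k
        (hypothetical layout).
    """
    z = np.asarray(significances, dtype=float)
    # Largest significance over all BR cuts in each pseudo-experiment, i.e. what
    # a search that scans the cuts would report.
    z_max = z.max(axis=1)
    # Global p-value: fraction of background-only pseudo-experiments whose
    # maximum significance reaches the threshold.
    p_global = float(np.mean(z_max >= z_threshold))
    p_local = nominal_p_value(z_threshold)
    return p_global, p_local, p_global / p_local


# Toy input: 100k background-only pseudo-experiments with 5 independent cuts.
rng = np.random.default_rng(0)
toy = rng.standard_normal((100_000, 5))
p_g, p_l, tf = classical_trials_factor(toy, 2.0)
print(f"p_global={p_g:.4f}  p_nominal={p_l:.4f}  trials factor={tf:.2f}")
```

For five independent cuts at Z = 2 the global p-value should come out close to 1 − (1 − 0.0228)^5 ≈ 0.109, giving a trials factor of about 4.8. Real BR cuts are nested and therefore strongly correlated, so a smaller trials factor is expected in practice.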

Comment 1: References 19-26 are woefully incomplete. The authors should do a more thorough and proper job surveying the literature and add many more references on the applications of weak supervision to anomaly searches at the LHC. A lot of work has been done by many authors on this topic and the authors are not giving the proper credit where credit is due.

Response 1: Thank you for your valuable feedback regarding the references. We have carefully reviewed your comment and have significantly expanded the literature citations in our manuscript, particularly in the area of weak supervision applied to anomaly searches at the LHC. We have added several key references to properly acknowledge the substantial body of work in this field and to give due credit to the contributions of many researchers. We believe this revision now more accurately reflects the extensive research efforts in this area.

Comment 2: "therefore the training samples should be indistinguishable to the NN apart from statistical fluctuations."
What if the SB and SR events have systematic differences (i.e. the features are not uncorrelated with m)? The authors do not demonstrate that the CWoLa method is even valid here.

Response 2: Thank you for your comment. The features used to train the NN are carefully selected to have no correlation with the mass (mℓℓγ), as stated in section 2.1: “... The mℓℓγ distribution, and any features correlated with it, cannot be used to train the NN due to the fact that the training samples, mass window (144 < mℓℓγ < 156 GeV) and side-band (132 < mℓℓγ < 168 GeV excluding mass window), are defined on this mass.” This was verified through the response of the neural network, which consistently (for each pseudo-experiment) yielded AUC scores of approximately 0.5, i.e. it could not distinguish the mass-window sample from the side-band sample. This provides evidence that no features are sufficiently correlated with the mass to influence the NN.
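The AUC check described in this response can be illustrated with a toy sketch. It is not the authors' analysis: the mass values are drawn uniformly over the quoted side-band and mass-window range, a single feature is used directly as the classifier score instead of a trained NN, and both feature definitions are hypothetical. A feature independent of mℓℓγ gives an AUC close to 0.5 for the mass-window versus side-band labels, whereas a feature that depends on mℓℓγ does not.

```python
import numpy as np


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)  # rank 1 = lowest
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = np.random.default_rng(1)
n = 100_000
# Toy background-only sample over the quoted side-band + mass-window range (GeV).
m = rng.uniform(132.0, 168.0, n)
in_window = (m > 144.0) & (m < 156.0)  # mass-window label used for CWoLa training

independent = rng.normal(size=n)                                # unrelated to m
mass_correlated = -np.abs(m - 150.0) + rng.normal(0.0, 2.0, n)  # depends on m

auc_indep = roc_auc(independent, in_window)
auc_corr = roc_auc(mass_correlated, in_window)
print(f"AUC (independent feature):     {auc_indep:.3f}")
print(f"AUC (mass-correlated feature): {auc_corr:.3f}")
```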

Comment 3: I don't understand Fig. 5b. Why is the p-value more significant than expected from Z_T for lower background rejection fractions (e.g. for 0% selection, the p-value is 3 sigma when Z_T is 2 sigma)?

Response 3: In Figure 5b, the p-values for lower background rejection fractions are more significant than expected from Z_T because fewer positive significances are yielded across all thresholds (as reflected in the distributions in Figure 5a). When the corresponding p-values are extracted for positive significance thresholds, this yields higher significances, as seen in Figure 5b. To make this clear we have added the following sentence to the results: “Note that, in Figure 5b, categories with lower background rejection differ more greatly
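The mechanism described in Response 3 can be illustrated by converting an empirical p-value into a Gaussian-equivalent significance. In this sketch the significance distributions are hypothetical: a standard normal, and a narrower normal (width 0.7, chosen arbitrarily) standing in for a category with fewer positive significances. Taking the p-value as the fraction of pseudo-experiments with significance at least Z_T is our assumption about how the p-values in Figure 5b are defined.

```python
from statistics import NormalDist

import numpy as np


def empirical_p_value(significances, z_t):
    """Fraction of pseudo-experiments whose significance is at least z_t."""
    return float(np.mean(np.asarray(significances, dtype=float) >= z_t))


def p_to_z(p):
    """Convert a one-sided p-value to a Gaussian-equivalent significance.

    Raises statistics.StatisticsError if p == 0 (no pseudo-experiment passed).
    """
    return NormalDist().inv_cdf(1.0 - p)


rng = np.random.default_rng(2)
n = 200_000
z_t = 2.0
# Hypothetical significance distributions: standard normal, and a narrower one
# with fewer positive entries (standing in for a low-BR category).
gaussian = rng.normal(0.0, 1.0, n)
narrow = rng.normal(0.0, 0.7, n)

z_gauss = p_to_z(empirical_p_value(gaussian, z_t))
z_narrow = p_to_z(empirical_p_value(narrow, z_t))
print(f"Z_T = {z_t}: Gaussian -> {z_gauss:.2f} sigma, narrow -> {z_narrow:.2f} sigma")
```

With Z_T = 2, the Gaussian case should return about 2 sigma, while the narrower distribution returns roughly 2/0.7 ≈ 2.9 sigma, i.e. a more significant p-value than Z_T alone suggests.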

Comment 4: If the NN sculpts the m distribution this will also inflate p-values. How can the authors be sure that it is a LEE and not sculpting?
Response 4: Thank you for this insight. As we understand it, sculpting occurs when the region of interest is close to the kinematic thresholds. We therefore selected a mass range around 150 GeV, which is far enough from the kinematic limits (approximately 100 GeV) to ensure that we are probing the LEE and not sculpting.

Reports on this Submission

Report #2 by Anonymous (Referee 7) on 2024-10-4 (Invited Report)

Strengths

The current version of the paper is much clearer.

Weaknesses

On the whole, the weaknesses identified in previous versions of the paper have been adequately addressed.

Report

With the changes made, the paper is acceptable. I recommend publication.

Recommendation

Publish (meets expectations and criteria for this Journal)

validity: good
significance: good
originality: good
clarity: good
formatting: good
grammar: good

Report #1 by Anonymous (Referee 6) on 2024-9-9 (Invited Report)

Report

It seems that the only remaining item is one on which the authors and I agree to disagree. I would have liked to see a second, more traditional, example showcasing the methodology. While the authors agreed with me, they mentioned that this would represent too much work and would not add much relative to the current example.

I therefore leave the decision to the editor (and won't block the acceptance process).

Recommendation

Publish (meets expectations and criteria for this Journal)

validity: -
significance: -
originality: -
clarity: -
formatting: -
grammar: -
