SciPost Submission Page

Trials Factor for Semi-Supervised NN Classifiers in Searches for Narrow Resonances at the LHC

by Benjamin Lieberman, Andreas Crivellin, Salah-Eddine Dahbi, Finn Stevenson, Nidhi Tripathi, Mukesh Kumar, Bruce Mellado

Submission summary

Authors (as registered SciPost users): Benjamin Lieberman
Submission information
Preprint Link: https://arxiv.org/abs/2404.07822v2
Date submitted: 2024-06-28 11:15
Submitted by: Lieberman, Benjamin
Submitted to: SciPost Physics Core
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approaches: Computational, Phenomenological

Abstract

To mitigate the model dependencies of searches for new narrow resonances at the Large Hadron Collider (LHC), semi-supervised Neural Networks (NNs) can be used. Unlike fully supervised classifiers, these models introduce an additional look-elsewhere effect in the process of optimising thresholds on the response distribution. We perform a frequentist study to quantify this effect, in the form of a trials factor. As an example, we consider simulated $Z\gamma$ data to perform narrow resonance searches using semi-supervised NN classifiers. The results from this analysis provide substantiation that the look-elsewhere effect induced by the semi-supervised NN is under control.

Author comments upon resubmission

Dear Editors,

We sincerely thank the referees for their thorough and thoughtful review of our paper. Their insights are invaluable to us, and we have addressed each of their comments in detail in the following response.

Kind Regards,
Benjamin Lieberman and authors

List of changes

Response to Referee report 1 Comments:
Comment 1: The introduction of the article could benefit from a discussion on experimental new physics searches utilising unsupervised machine learning methods, as well as a more comprehensive explanation of the distinctions/similarities between the proposed semi-supervised technique and other (semi-supervised) methods used for anomaly detection. The conclusions should be rewritten to reflect these considerations.
Furthermore, clarification is needed regarding the assertion that the proposed method is less model-dependent than other methods, especially semi-supervised or unsupervised ones.
Response 1:
We have improved the introductory paragraphs that present the proposed semi-supervised technique. They now introduce and compare unsupervised, semi-supervised and weakly supervised methods, with corresponding added references. Furthermore, we have added a discussion comparing the extent of model dependence. Although a demonstration that the proposed method is less model-dependent than alternative semi- or weakly supervised methods would be a valuable study, the outcome depends largely on the specific signal, background and region of interest, and it is not the focus of this study.

Comment 2: The manuscript illustrates its methodology through resonance searches in the 𝑍𝛾 final state. While the choice of this example is motivated by existing anomalies in LHC data, care should be taken to streamline the referencing and ensure clarity and conciseness in the justification. Consideration should be given to replacing the second paragraph on page 3 with a succinct statement detailing the rationale behind choosing the 𝑍𝛾 analysis as an illustrative example of the proposed methodology. The heavy reliance on 16 self-citations out of 21 references, which exclude many relevant experimental papers, is in my opinion unnecessary in light of the actual topic of the present manuscript.
Furthermore, it is essential to accurately characterise the origin of these anomalies. Properly distinguishing between those confirmed by LHC collaborations and those proposed by phenomenological works, which may lack access to comprehensive statistical treatment, is crucial.
In addition, as written above, including other illustrative examples based on standard resonance searches in dijet, diphoton or dilepton final states, would be beneficial for readers.

Response 2: In our analysis we selected the 𝑍𝛾 final state, motivated by the multi-lepton anomalies, as an ideal showcase for an analysis of semi-supervised learning for narrow resonances with topological requirements. Although 𝛾𝛾 is similarly motivated by the multi-lepton anomalies, we selected 𝑍𝛾. The methodology and results stand independently of the anomaly at 152 GeV and use it only as a showcase. Therefore, even if the 152 GeV excess goes away, the paper will still stand as a showcase. We have removed unnecessary self-citations and added relevant references from the LHC collaborations to better substantiate our motivation.

Comment 3: Section 2.1 lacks sufficient information on the simulation toolchain used. Event generation for the 𝑝𝑝→𝑍𝛾→ℓℓ𝛾 process seems to enforce the intermediate 𝑍-boson to be on-shell. However, since the mass window in 𝑚ℓℓ𝛾 is large enough, off-shell 𝑍 contributions, virtual-photon contributions, and their interference are relevant. It remains unclear whether they have been properly accounted for.
Furthermore, the discussion on the chosen parton density set is unclear. It is essential to clarify whether next-to-leading-order matrix elements have been consistently convolved with next-leading-order parton densities, and not leading-order ones.
Finally, the text does not clearly distinguish between generator-level cuts and reconstructed-level cuts that are implemented in the simulation. Providing a clear delineation between these sets of cuts is crucial. Additionally, Section 2.1 should include details on preselection criteria, like cuts on the number of leptons and photons, that are currently not discussed.

Response 3:
Our simulation accounts for the fact that the ℓℓγ invariant mass window of 130 to 170 GeV requires the Z boson to be off-shell. We configured MadGraph to include off-shell Z boson contributions, although a prompt photon is considered. The event generation used next-to-leading-order (NLO) parton distribution functions (PDFs) convolved with NLO matrix elements, ensuring consistency in the simulation. Generator-level cuts were applied during event generation in MadGraph, specifically an invariant mass cut on the ℓℓγ system to select events within the 130-170 GeV range. The generated events were then processed through PYTHIA for parton showering and DELPHES for detector simulation, where reconstructed-level cuts were applied to mimic experimental conditions. The reconstructed-level cuts included detailed preselection criteria such as the number of leptons and photons, their transverse momenta, and isolation requirements. Furthermore, overlap removal procedures were implemented to avoid double-counting objects: jets that were too close to leptons or photons were removed to ensure distinct identification of each particle. These updates, along with more detailed descriptions, have been included in the revised version of the paper.

Comment 4: The manuscript should define central jets and specify the associated pseudo-rapidity cut.

Response 4: We have added the definition and associated pseudo-rapidity cut for central jets to Section 2.1 (Monte Carlo Simulation).

Comment 5: Figures 1 and 2 should be adjusted to improve readability. The missing transverse energy spectrum could be presented with a log scale or a reduced domain to enhance clarity. Additionally, in figure 2, all eight lower insets should indicate whether they refer to the sideband or signal mass window. In fact, consideration should be given to showing both these curves.

Response 5: We have updated Figures 1 and 2 to improve readability. Firstly, the domains of the missing transverse energy spectrum, the number of jets and the number of central jets have been reduced. Secondly, the overlap between the plots and the legends has been removed. Finally, in Figure 2 the lower insets have been updated to include the relative difference for both the side band and mass window categories, with corresponding legends.

Comment 6: The caption of figure 4 should define the acronym 'BR' for clarity.

Response 6: We have added a definition for the acronym ‘BR’ in the caption of Figure 4 for improved clarity.

Comment 7: In Section 3.3, the manuscript should avoid using the term 'centre of mass' to refer to the 'center of the signal mass window', as 'centre of mass' has a different well-defined meaning.

Response 7: To avoid misinterpretation we have replaced our use of ‘center of mass’ with ‘center of the signal mass window’ as suggested.

Comment 8: It would be instructive to assess the impact of background systematics on the calculation of local significance, especially considering the incomplete background modeling acknowledged by the authors in Section 2. Equation (3) should be generalised accordingly, and the results of Section 4 updated subsequently.

Response 8: The study focuses on measuring the look-elsewhere effect. Because background systematics are conventionally evaluated separately, they have been factored out of this study and can be applied afterwards. They must therefore be added to the results of this case study before it is applied to a specific excess. The background systematics are obtained by using different fitting functions, which does not affect the DNN part of the analysis.

To make this clear to the reader we have added a footnote (“Note that the systematics related to the background fitting functions, i.e. the spurious signal analysis, are not included here. However, this uncertainty should be included on top of the additional look-elsewhere effect studied here and is not related to the use of NNs.”) following Equation 3.

Comment 9: The bibliography should be carefully proofread to correct any errors. Specifically, attention should be given to identifying and correcting duplicate references (like references [2] and [3]), updating references that are now published (like reference [23]), and ensuring insertion of complete references (like [45] and [46]).

Response 9: We apologize for the errors in the bibliography. We have removed duplicate references, updated references to articles that are now published, and completed the incomplete references.

Response to Referee report 2 Comments:
Weaknesses:
Comment (weakness) 1: While the methodology is a step in the right direction, it is far from clear that the proposed way to handle the statistically dependent Z-values or, equivalently the statistically dependent p-values, is superior to methods that are already in use for the standard look elsewhere effect.

Response (weakness) 1: The methodology is designed to evaluate the potential look-elsewhere effect arising from semi-supervised classifiers, rather than the standard look-elsewhere effect. This method employs the frequentist framework to calculate this effect directly (through statistics of multiple tests), instead of relying on the approximations typically used for calculating the classic look-elsewhere effect.

Comment (weakness) 2: The authors point out correctly that the NN classifier needs to be statistically independent of the lepton-lepton-photon mass because the latter is used to define the two samples that are used for training, but they do not provide evidence that this is the case.

Response (weakness) 2: To make clear the dependence of the NN training samples on the lepton-lepton-photon mass, we have added the definitions of the side band and mass window categories to Section 2.1 (previously they appeared only in Section 3.3).

Comment (weakness) 3: There is some inconsistency in the notation in Eqs. (1), (2) and (4). The symbols for invariant mass and Z seem to change. The invariant mass is denoted by a symbol with lepton and photon subscripts in the text but is denoted by m in Eq. (2). In Eq. (3) the symbol Z is used for signal significance. But in Section 4, which introduces the “probability density functions (PDF) of the significances”, the presumption is that these are the PDFs of the Z-values. If so, one might have expected to see symbols such as fi (Zi), i = 1,…,6 for each of the six significances Zi, but instead one sees the symbol f (sigma). Since Z (from Eq. (3), which is a well-known generalization of Ns / sqrt(Nb)) can be interpreted as the “number of standard deviations above the background”, it is unclear what “sigma” is intended to represent in f (sigma). It could be sigma = Z sqrt(Nb), but that does not seem to be what is intended. This needs to be clarified. Furthermore, from Eq. (3) it is unclear why the domain of f (sigma) should extend to negative values. That also needs clarification.

Response (weakness) 3: With regard to notation, please see requested change 1 and our response.
In Equation 3 the domain is extended to negative values to allow a full analysis of the dynamics of each sample. If a negative signal yield is found, the corresponding significance must reflect that. This is used to verify the fitting process and therefore to confirm the validity of the positive-only values of interest.
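The sign convention described in this response can be illustrated with a minimal sketch. It assumes that Equation 3 is the widely used asymptotic form Z = sqrt(2[(Ns + Nb) ln(1 + Ns/Nb) - Ns]), which this page does not reproduce. The function name and the choice to attach the sign of the fitted signal yield are illustrative assumptions and not necessarily the implementation used in the manuscript.

```python
import math


def signed_significance(n_s: float, n_b: float) -> float:
    """Asymptotic significance carrying the sign of the fitted signal yield.

    Assumed form: Z = sign(n_s) * sqrt(2 * ((n_s + n_b) * ln(1 + n_s / n_b) - n_s)).
    """
    if n_b <= 0:
        raise ValueError("background yield must be positive")
    if n_s <= -n_b:
        raise ValueError("signal yield must be larger than -n_b")
    if n_s == 0:
        return 0.0
    # The bracket is non-negative for every n_s > -n_b, so the root is real.
    q = 2.0 * ((n_s + n_b) * math.log1p(n_s / n_b) - n_s)
    return math.copysign(math.sqrt(max(q, 0.0)), n_s)


z_excess = signed_significance(10.0, 100.0)    # positive fitted yield
z_deficit = signed_significance(-10.0, 100.0)  # negative fitted yield
```

Under this assumption a fitted deficit maps to a negative Z, so a distribution of Z-values from background-only pseudo-experiments is expected to populate both signs.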

Comment (weakness) 4: It is unclear why the fits are not performed so that they give Ns and Nb directly. For example, if the standard particle physics toolkit RooFit were used that would be the case since the probability densities are automatically normalized.

Response (weakness) 4: The fits were initially performed using the RooFit toolkit, together with an asymptotic calculator to compute the significance, as suggested. They were then repeated using the more “manual” process described in the paper (after verifying that its results matched those of the internal toolkit methods) to provide a more comprehensive and clearer methodology.

Comment (weakness) 5: In Section 4, it is unclear why the average PDF (Eq. (6)) is the appropriate quantity to use to compute the global p-value. (See discussion in Report.)

Response (weakness) 5: Please see requested change 2 and our response.
Requested Changes:
Comment (requested change) 1: Remove inconsistencies in the notation

Response (changes) 1:
To improve consistency of notation we have updated Equation 2 to use the correct mass notation (mℓℓγ). We have updated Equations 4, 5, 6 and 7, as well as figure labels and in-text references, to use “Z” for significance rather than the misleading σ (e.g. changed f(σ)BR to f(Z)BR).

Comment (requested change) 2: Motivate use of Eq. 6 (... it is unclear why the average PDF, Eq. (6), which is billed as the “global” PDF is the appropriate quantity to use to compute the global p-value. Is it not necessary to account for the statistical dependencies between the Z-values if all six are used?)

Response (changes) 2: We acknowledge that Eq. 6 was misleading. We have therefore removed Eq. 6 and replaced it with explanatory text. The global distribution of results is calculated directly as the ensemble of all the outcomes across all background rejections (BRs), and not as an average or weighted average. Although different numbers of events enter the fits for each BR, the number of fits (and hence significance values) for each BR is equal to the number of pseudo-experiments. Thus the global values include outputs from all BRs, in order to understand the dynamics of the semi-supervised response across all BRs without focusing on potential bias in individual categories (which is exposed in the comparative plots of local/BR distributions).
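The pooling described in this response can be sketched as follows. The Z-values are standard-normal placeholders rather than fit results, and the number of pseudo-experiments, the integer BR labels and the threshold are assumed for illustration only. The six categories follow the six significances mentioned in the referee comment.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pseudo = 1000  # pseudo-experiments per BR (assumed value)
n_br = 6         # background-rejection categories

# Placeholder input: one array of fitted Z-values per BR category.
# In a real study these would come from the fits, not from a random generator.
z_by_br = {br: rng.standard_normal(n_pseudo) for br in range(n_br)}

# Global ensemble: every outcome from every BR, pooled without any weights.
z_global = np.concatenate([z_by_br[br] for br in range(n_br)])

# Tail fractions above an illustrative threshold, per BR and for the pooled set.
threshold = 2.0
local_tail = {br: float(np.mean(z >= threshold)) for br, z in z_by_br.items()}
global_tail = float(np.mean(z_global >= threshold))
```

Because each BR contributes exactly `n_pseudo` values, no category dominates the pooled ensemble, while the per-BR tail fractions remain available for the comparative plots.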

Comment (requested change) 3: Given an analysis that yields K statistically dependent p-values (or Z-values) explain why methods such as the Bonferroni correction are not sufficient to account for this particular look elsewhere effect or why methods already in use in particle physics cannot be applied to this look elsewhere effect.

Response (changes) 3: Methods such as Bonferroni and Vitells are approximations used to estimate the look-elsewhere effect (LEE). The frequentist framework used here calculates the LEE directly from pseudo-experiments, providing a true depiction of the extent of the induced error. Additionally, when calculating the look-elsewhere effect generated by a semi-supervised NN, one cannot assume that standard estimations are sufficient, so the effect must be calculated directly.
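The contrast between an approximate correction and a direct frequentist calculation can be illustrated with a toy sketch. Everything numerical here is assumed: the Z-values are correlated Gaussian toys with a hypothetical correlation `rho`, and the global p-value is defined as the fraction of background-only toys whose largest Z over all BR categories reaches the observed value. That definition is one common choice and is not necessarily the one used in the manuscript.

```python
import numpy as np
from statistics import NormalDist


def empirical_global_p(z_toys: np.ndarray, z_obs: float) -> float:
    """Fraction of toys whose largest Z over all categories reaches z_obs."""
    z_max = np.max(z_toys, axis=1)
    return float(np.mean(z_max >= z_obs))


def bonferroni_global_p(z_obs: float, n_tests: int) -> float:
    """Bonferroni bound: local one-sided p-value times the number of tests."""
    p_loc = 1.0 - NormalDist().cdf(z_obs)
    return min(1.0, n_tests * p_loc)


rng = np.random.default_rng(0)
n_toys, n_br, rho = 20000, 6, 0.8  # rho is an assumed correlation between BRs

# Correlated toys: a component shared by all BRs plus an independent component,
# giving unit variance and pairwise correlation rho.
common = rng.standard_normal((n_toys, 1))
indep = rng.standard_normal((n_toys, n_br))
z_toys = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * indep

z_obs = 2.0
p_local = 1.0 - NormalDist().cdf(z_obs)
p_global = empirical_global_p(z_toys, z_obs)
p_bonferroni = bonferroni_global_p(z_obs, n_br)
trials_factor = p_global / p_local
```

In this toy the correlated categories give a trials factor between 1 and the number of tests, so the Bonferroni bound is conservative. The direct calculation does not need an assumption about the correlation structure, because it reads the structure from the pseudo-experiments themselves.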

Once again, we extend our gratitude to the referee for their detailed and insightful comments. We trust that the comprehensive responses and revisions provided adequately address all the points raised. We are optimistic that our revised manuscript now meets the criteria for publication in SciPost.

Current status:
In refereeing