SciPost Submission Page
Goodness of fit by Neyman-Pearson testing
by Gaia Grosso, Marco Letizia, Maurizio Pierini, Andrea Wulzer
Submission summary
Authors (as registered SciPost users):  Marco Letizia 
Submission information  

Preprint Link:  https://arxiv.org/abs/2305.14137v2 (pdf) 
Code repository:  https://github.com/GaiaGrosso/NPLMGOF 
Date accepted: 2024-04-02
Date submitted: 2024-03-18 17:47
Submitted by:  Letizia, Marco 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
The Neyman-Pearson strategy for hypothesis testing can be employed for goodness of fit if the alternative hypothesis is selected from data by exploring a rich parametrised family of models, while controlling the impact of statistical fluctuations. The New Physics Learning Machine (NPLM) methodology has been developed as a concrete implementation of this idea, to target the detection of new physical effects in the context of high energy physics collider experiments. In this paper we compare this approach to goodness of fit with others, in particular with classifier-based strategies that share strong similarities with NPLM. From our comparison, NPLM emerges as the more sensitive test to small departures of the data from the expected distribution, and it is not biased towards detecting specific types of anomalies. These features make it well suited to agnostic searches for new physics at collider experiments. Its deployment in other scientific and industrial scenarios should be investigated.
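The core idea of the abstract can be illustrated with a deliberately small sketch. It is not the NPLM code from the linked repository, and it rests on several assumptions. Following the form used in earlier NPLM work as we understand it, the alternative is modelled as n(x|R) e^{f(x)}. Here f is drawn from a hypothetical quadratic family f(x) = θ·(1, x, x²), fitted to the data by minimising the likelihood-ratio loss, and t = -2 min L. Statistical fluctuations are handled only by calibrating t on pseudo-experiments generated under the reference hypothesis and quoting an empirical p-value. The Gaussian toy data, the Newton optimiser and the helper names `nplm_t` and `empirical_p_value` are illustrative choices.

```python
import numpy as np

def nplm_t(x_ref, x_data, n_expected, degree=2, n_iter=50):
    """Return t = -2 min_theta L(theta) for the hypothetical family
    f(x) = theta . (1, x, ..., x^degree), using the likelihood-ratio loss
    L = sum_R w (exp(f) - 1) - sum_D f."""
    w = n_expected / len(x_ref)          # expected events represented by each reference event
    phi_r = np.vander(x_ref, degree + 1, increasing=True)
    phi_d_sum = np.vander(x_data, degree + 1, increasing=True).sum(axis=0)

    def loss(theta):
        return w * np.sum(np.expm1(phi_r @ theta)) - phi_d_sum @ theta

    theta = np.zeros(degree + 1)         # theta = 0 is the reference hypothesis, where L = 0
    for _ in range(n_iter):
        e = w * np.exp(phi_r @ theta)
        grad = phi_r.T @ e - phi_d_sum
        hess = phi_r.T @ (phi_r * e[:, None])
        step = np.linalg.solve(hess, grad)
        if np.max(np.abs(step)) < 1e-10:
            break
        # Backtracking line search: never accept a step that increases the convex loss.
        s, current = 1.0, loss(theta)
        while loss(theta - s * step) > current:
            s *= 0.5
            if s < 1e-8:                 # no further decrease possible (rounding level)
                return -2.0 * current
        theta = theta - s * step
    return -2.0 * loss(theta)

def empirical_p_value(t_obs, t_null):
    """Fraction of reference-only pseudo-experiments with t >= t_obs; the +1
    terms avoid quoting p = 0 from a finite number of toys."""
    t_null = np.asarray(t_null)
    return (1 + np.sum(t_null >= t_obs)) / (1 + t_null.size)

rng = np.random.default_rng(0)
n_expected, n_ref = 2000.0, 20000
x_ref = rng.normal(size=n_ref)           # toy reference sample (standard normal)

# Null distribution: pseudo-data generated under the reference hypothesis,
# with a Poisson-fluctuating number of events.
t_null = [nplm_t(x_ref, rng.normal(size=rng.poisson(n_expected)), n_expected)
          for _ in range(100)]

# Hypothetical "observed" data: mean shifted by 0.15.
x_obs = rng.normal(loc=0.15, size=rng.poisson(n_expected))
t_obs = nplm_t(x_ref, x_obs, n_expected)
print(f"t_obs = {t_obs:.1f}, p = {empirical_p_value(t_obs, t_null):.3f}")
```

The quadratic family only keeps the sketch short. NPLM itself explores much richer families (kernel methods or neural networks), which is exactly why their flexibility has to be kept under control.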
Author comments upon resubmission
Dear Referees and Editor,
Thank you for acknowledging the relevance of our work, for the careful reading of the manuscript, and for the many comments that allowed us to improve its quality.
Below we comment on specific points that deserve further clarification.
Referee 1, Figure 7 (EFT): The amount of injected signal is such that all models show high power in this particular case. Note that this is not guaranteed, as shown for example in Figures 13 and 14 for different tests on the same data.
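The notion of power in this reply can be made concrete with a short sketch. It assumes the common convention of estimating power as the fraction of signal-injected pseudo-experiments whose t exceeds the (1 - α) quantile of the reference-only (null) distribution. The χ² and non-central χ² arrays below are placeholder toys, not the distributions behind Figures 7, 13 or 14, and `empirical_power` is a hypothetical helper name.

```python
import numpy as np

def empirical_power(t_null, t_alt, alpha=0.05):
    """Fraction of signal-injected pseudo-experiments whose test statistic exceeds
    the (1 - alpha) quantile of the reference-only (null) distribution."""
    threshold = np.quantile(np.asarray(t_null, dtype=float), 1.0 - alpha)
    return float(np.mean(np.asarray(t_alt, dtype=float) > threshold))

# Placeholder toy distributions (not results from the paper): a chi-squared null
# and non-central chi-squared alternatives with the same degrees of freedom.
rng = np.random.default_rng(0)
t_null = rng.chisquare(df=3, size=20000)
t_weak = rng.noncentral_chisquare(df=3, nonc=2.0, size=20000)
t_strong = rng.noncentral_chisquare(df=3, nonc=30.0, size=20000)
for alpha in (0.05, 0.003):
    print(alpha, empirical_power(t_null, t_weak, alpha), empirical_power(t_null, t_strong, alpha))
```

With a strong injected signal every reasonable test saturates near power 1, which is the situation described for Figure 7. A weaker signal separates the tests much more clearly.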
Referee 1, Paragraph 3.2: We thank the referee for these comments, as they help us clarify crucial aspects of our methodology. The correct procedure is the following:
- We use a supervised ML model with a loss such that, at the end of training, the ratio of the data-generating densities is approximated, i.e. f from eq. 1 (note that the actual likelihood ratio includes a Poisson distribution that models the fluctuations in the number of collected data points, as shown in eq. 4). The loss is a weighted logistic loss for the model based on kernel methods (eq. 2) and the "likelihood-ratio loss" for the one based on neural networks (eq. 3).
- In both cases, we plug the learned f into the expression for the extended likelihood ratio (eq. 6). This quantity can be seen as a metric from the point of view of the supervised ML model and as the test statistic from the point of view of the hypothesis test. (For the neural networks this comes automatically by evaluating the loss at the end of training, since this metric is directly optimised.) The neural-network model has no sigmoid and uses a linear output activation, while in the kernel approach the sigmoid is built into the definition of the loss.
- To understand the role of the different components of the NPLM methodology, we perform various tests. In one of them, we replace the final metric used as the test statistic with more traditional ML metrics, applying a sigmoid activation where necessary. These are variants of the NPLM method: the alternative is still data-driven, but the test statistic is no longer the likelihood ratio.
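The procedure in this reply can be sketched with a model that is linear in a few hypothetical features, so that both losses can be minimised with a handful of Newton steps. The loss and test-statistic expressions follow the form used in the NPLM literature as we understand it. N(R) expected reference events are represented by N_R reference events, each with weight w = N(R)/N_R, and data events carry unit weight. The paper's eqs. 2, 3 and 6 remain the authoritative definitions. The function names, the quadratic features and the Gaussian toy data are assumptions. The plain weighted logistic regression stands in for, and does not reproduce, the kernel-based model.

```python
import numpy as np

def lr_loss(f_ref, f_data, w):
    """Likelihood-ratio loss: sum_R w (exp(f) - 1) - sum_D f."""
    return np.sum(w * np.expm1(f_ref)) - np.sum(f_data)

def weighted_logistic_loss(f_ref, f_data, w):
    """Logistic loss log(1 + exp(-y f)) with y = +1 (weight 1) for data and
    y = -1 (weight w) for the reference; logaddexp keeps it numerically stable."""
    return np.sum(np.logaddexp(0.0, -f_data)) + np.sum(w * np.logaddexp(0.0, f_ref))

def extended_lr_t(f_ref, f_data, w):
    """Plug the learned f into the extended likelihood ratio:
    t = 2 [ sum_D f - sum_R w (exp(f) - 1) ]."""
    return -2.0 * lr_loss(f_ref, f_data, w)

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))   # overflow-free logistic function

def fit_linear(kind, phi_ref, phi_data, w, n_iter=100):
    """Fit f(x) = phi(x) @ theta with damped Newton steps on the chosen loss."""
    def loss(theta):
        f_r, f_d = phi_ref @ theta, phi_data @ theta
        return lr_loss(f_r, f_d, w) if kind == "lr" else weighted_logistic_loss(f_r, f_d, w)

    theta = np.zeros(phi_ref.shape[1])
    for _ in range(n_iter):
        f_r, f_d = phi_ref @ theta, phi_data @ theta
        if kind == "lr":      # per-event first (d) and second (h) derivatives in f
            d_r, d_d = w * np.exp(f_r), -np.ones_like(f_d)
            h_r, h_d = w * np.exp(f_r), np.zeros_like(f_d)
        else:
            s_r, s_d = sigmoid(f_r), sigmoid(f_d)
            d_r, d_d = w * s_r, s_d - 1.0
            h_r, h_d = w * s_r * (1.0 - s_r), s_d * (1.0 - s_d)
        grad = phi_ref.T @ d_r + phi_data.T @ d_d
        hess = phi_ref.T @ (phi_ref * h_r[:, None]) + phi_data.T @ (phi_data * h_d[:, None])
        step = np.linalg.solve(hess, grad)
        if np.max(np.abs(step)) < 1e-10:
            break
        s, current = 1.0, loss(theta)
        while loss(theta - s * step) > current:   # backtracking line search
            s *= 0.5
            if s < 1e-8:
                return theta
        theta = theta - s * step
    return theta

rng = np.random.default_rng(1)
n_expected, n_ref = 1000.0, 10000
x_ref = rng.normal(size=n_ref)
x_data = rng.normal(loc=0.1, size=rng.poisson(n_expected))   # hypothetical data
w = np.full(n_ref, n_expected / n_ref)
phi_ref = np.vander(x_ref, 3, increasing=True)                 # features (1, x, x^2)
phi_data = np.vander(x_data, 3, increasing=True)

t = {}
for kind in ("lr", "logistic"):
    theta = fit_linear(kind, phi_ref, phi_data, w)
    t[kind] = extended_lr_t(phi_ref @ theta, phi_data @ theta, w)
print(t)
```

For the likelihood-ratio loss, t is simply -2 times the minimised loss, matching the remark that for neural networks the test statistic comes directly from the loss at the end of training. The weighted logistic fit targets the same log-density ratio but minimises a different objective. Its t is therefore computed separately, and it cannot exceed the likelihood-ratio value obtained over the same family.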
Referee 2, unknown true distribution: Not knowing the true data-generating distributions is a common scenario in two-sample testing. Indeed, the goal is to test population-level hypotheses from data.
Referee 2, code: In this work we focus on comparing the NPLM methodology with different approaches to GoF. The code for NPLM, in both its kernel-method and neural-network implementations, was presented in earlier work and is publicly available. We will update the repository with instructions on how to reproduce the results of this paper.
List of changes
Referees 1 and 2: following the referees' suggestions, we improved the overall presentation and writing throughout the paper. In the figures, we increased the font size, simplified the notation and improved the labels.
Major changes involve:
- Abstract.
- Introduction: first paragraph.
- Caption of Figure 1.
- Central paragraph of page 4.
- Paragraphs at the end of page 5 and on page 6.
- Conclusions.
- Final part of Appendix A.2.
Published as SciPost Phys. 16, 123 (2024)