A simple guide from Machine Learning outputs to statistical criteria

Submission summary

As Contributors: Charanjit Kaur Khosa · Veronica Sanz
Arxiv Link: https://arxiv.org/abs/2203.03669v1 (pdf)
Code repository: https://github.com/high-energy-physics-ml/hypothesis-testing
Date submitted: 2022-03-16 17:06
Submitted by: Sanz, Veronica
Submitted to: SciPost Physics Core
Academic field: Physics
Specialties: High-Energy Physics - Phenomenology
Approaches: Theoretical, Phenomenological

Abstract

In this paper we propose ways to incorporate Machine Learning training outputs into a study of statistical significance. We describe these methods in supervised classification tasks using a CNN and a DNN output, and unsupervised learning based on a VAE. As use cases, we consider two physical situations where Machine Learning is often used: high-$p_T$ hadronic activity, and boosted Higgs in association with a massive vector boson.

Current status:
Editor-in-charge assigned

Submission & Refereeing History

Submission 2203.03669v1 on 16 March 2022

Reports on this Submission

Report

My apologies for the very late report!

This paper describes how to take the output of a neural network and use it for hypothesis testing in the context of collider physics. While the manuscript is mostly well-written, I am not sure what this paper adds to the literature. There is nothing special about neural networks: the paper describes how one takes a fixed observable and then uses simulations to create histograms of the observable to compute p-values. This is common knowledge, so I am unsure what is new in this paper. From the title, I was expecting to see some discussion of how one can use network outputs directly as test statistics instead of indirectly by first making histograms to compute likelihood ratios. This subject is also well-documented in other papers, but there could be room for additional innovation, such as in the area of uncertainty quantification (which is also not really discussed in detail, since the entire topic of nuisance parameters is relegated to references). I am sorry if I have misunderstood the paper, but in its present form, I cannot recommend publication in SciPost Physics.

• validity: -
• significance: -
• originality: -
• clarity: -
• formatting: -
• grammar: -

Author:  Veronica Sanz  on 2022-06-24  [id 2608]

(in reply to Report 1 on 2022-05-02)

Dear referee,

Thanks for reading the paper and providing comments. Note that we have submitted this manuscript to SciPost Physics Core, not SciPost Physics. This is a contribution which (we believe) may help readers to navigate between ML outputs and the usual statistical outputs, but it is by no means a significant addition to the field. Our purpose is to provide, as the title suggests, a simple guide to performing this translation for supervised and unsupervised ML outputs, and to show some of the limitations we found. We hope this clarifies a possible misunderstanding.

Best regards,

V. Sanz