SciPost Submission Page

A simple guide from Machine Learning outputs to statistical criteria

by Charanjit K. Khosa, Veronica Sanz, Michael Soughton

Submission summary

As Contributors: Charanjit Kaur Khosa · Veronica Sanz
arXiv Link: https://arxiv.org/abs/2203.03669v1
Code repository: https://github.com/high-energy-physics-ml/hypothesis-testing
Date submitted: 2022-03-16 17:06
Submitted by: Sanz, Veronica
Submitted to: SciPost Physics Core
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approaches: Theoretical, Phenomenological

Abstract

In this paper we propose ways to incorporate Machine Learning training outputs into a study of statistical significance. We describe these methods in supervised classification tasks using a CNN and a DNN output, and unsupervised learning based on a VAE. As use cases, we consider two physical situations where Machine Learning is often used: high-$p_T$ hadronic activity, and boosted Higgs in association with a massive vector boson.

Current status:
Editor-in-charge assigned


Reports on this Submission

Anonymous Report 1 on 2022-05-02 (Invited Report)

Report

My apologies for the very late report!

This paper describes how to take the output of a neural network and use it for hypothesis testing in the context of collider physics. While the manuscript is mostly well-written, I am not sure what this paper adds to the literature. There is nothing special about neural networks: the paper describes how one takes a fixed observable and then uses simulations to create histograms of that observable in order to compute p-values. This is common knowledge, so I am unsure what is new in this paper. From the title, I was expecting to see some discussion of how one can use network outputs directly as test statistics instead of indirectly, by first making histograms to compute likelihood ratios. This subject is also well-documented in other papers, but there could be room for additional innovation, such as in the area of uncertainty quantification (which is also not really discussed in detail, since the entire topic of nuisance parameters is relegated to references). I am sorry if I have misunderstood the paper, but in its present form, I cannot recommend publication in SciPost Physics.
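
[Editorial illustration] For readers unfamiliar with the procedure the report refers to (histogram a fixed observable from simulation, then compute a p-value), a minimal sketch follows. It is not taken from the manuscript or its code repository. The Beta-distributed "classifier scores", the event yields n_bkg and n_sig, the binning, and the number of toy experiments are all hypothetical choices made for illustration, and nuisance parameters are ignored entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for simulated classifier scores in [0, 1].
# In a real analysis these would be network outputs on simulated events.
bkg_scores = rng.beta(2.0, 4.0, size=200_000)
sig_scores = rng.beta(4.0, 2.0, size=200_000)

# Histogram the observable and scale to assumed expected event yields.
bins = np.linspace(0.0, 1.0, 11)
n_bkg, n_sig = 1000.0, 50.0  # assumed yields, illustration only
b = n_bkg * np.histogram(bkg_scores, bins=bins)[0] / bkg_scores.size
s = n_sig * np.histogram(sig_scores, bins=bins)[0] / sig_scores.size


def log_likelihood_ratio(counts, s, b):
    """Binned Poisson ln[L(s+b)/L(b)]; requires b > 0 in every bin."""
    return np.sum(counts * np.log1p(s / b) - s, axis=-1)


# Distribution of the test statistic under the background-only hypothesis.
toys_b = rng.poisson(b, size=(20_000, b.size))
q_b = log_likelihood_ratio(toys_b, s, b)

# One pseudo-dataset that contains signal, standing in for observed data.
observed = rng.poisson(s + b)
q_obs = log_likelihood_ratio(observed, s, b)

# Fraction of background-only toys at least as signal-like as the data.
# The +1 terms keep the estimate from being exactly zero with finite toys.
p_value = (1 + np.sum(q_b >= q_obs)) / (1 + q_b.size)
print(f"p-value under background-only: {p_value:.2e}")
```

With finite toys the smallest p-value this estimate can resolve is roughly one over the number of toys, so very significant results would require more toys or an asymptotic approximation.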

  • validity: -
  • significance: -
  • originality: -
  • clarity: -
  • formatting: -
  • grammar: -

Author:  Veronica Sanz  on 2022-06-24  [id 2608]

(in reply to Report 1 on 2022-05-02)

Dear referee,

Thanks for reading the paper and providing comments. Note that we have submitted this manuscript to SciPost Physics Core, not SciPost Physics. This is a contribution which (we believe) may help readers to navigate between ML outputs and the usual statistical outputs, but it is by no means a significant addition to the field. Our purpose is to provide, as the title suggests, a simple guide to performing this translation for supervised and unsupervised ML outputs, and to show some of the limitations we found. We hope this clarifies a possible misunderstanding.

Best regards,

V. Sanz
