SciPost Submission Page
Learning Selection Cuts With Gradients
by Mike Hance, Juan Robles
Submission summary
Authors (as registered SciPost users): Michael Hance

Submission information
Preprint Link: https://arxiv.org/abs/2502.08615v1 (pdf)
Code repository: https://github.com/scipp-atlas/cabin
Data repository: https://github.com/scipp-atlas/cabin-paper
Date submitted: 2025-02-14 05:23
Submitted by: Hance, Michael
Submitted to: SciPost Physics Core

Ontological classification
Academic field: Physics
Specialties:
Approaches: Experimental, Computational
Abstract
Many analyses in high-energy physics rely on selection thresholds (cuts) applied to detector, particle, or event properties. Initial cut values can often be guessed from physical intuition, but cut optimization, especially for multiple features, is commonly performed by hand, or skipped entirely in favor of multivariate algorithms like BDTs or classification networks. We revisit this problem, and develop a cut optimization approach based on gradient descent. Cut thresholds are learned as parameters of a network with a simple architecture, and can be tuned to achieve a target signal efficiency through the use of custom loss functions. Contractive terms in the loss can be used to ensure a smooth evolution of cuts as functions of efficiency, particle kinematics, or event features. The method is used to classify events in a search for Supersymmetry, and the performance is compared with traditional classification networks. An implementation of this approach is available in a public code repository and python package.
Current status:
Reports on this Submission
Report #1 by Matthew James Stephenson (Referee 1) on 2025-2-26 (Contributed Report)
Strengths
1. Preselection optimization: creating robust event filters before multivariate analysis.
2. Trigger system design: preserving the physical intuition of cut-based analyses.
3. Systematic uncertainty quantification: propagating errors through differentiable cut parameters.
Weaknesses
The ATLAS_significance_loss function is a notable weakness because:
1) Division-by-zero risk: when the background (b) approaches zero, the term n * torch.log(n/b) becomes numerically unstable.
2) Incorrect sigma handling: the sigma=0 approximation fails to account for vanishing-background scenarios.
A corrected implementation is posted as a pull request to the publicly available supplemental GitHub repository for this submission, at https://github.com/scipp-atlas/cabin/pull/6. It fixes these issues by clamping the arguments of logarithms, protecting against negative values in physical quantities, and maintaining differentiability through masked tensor operations.
Report
arXiv:2502.08615:
Numerical stability is of critical importance in machine learning applications for physics analysis. The effectiveness of arXiv:2502.08615v1's cut-optimization approach should extend properly to low-statistics regimes and pure-signal scenarios, provided its implementation is corrected by approving the proposed pull request at the publicly accessible repository paired with this submission, https://github.com/scipp-atlas/cabin/pull/6 (formatting issues, should they arise, can be corrected post-merge).
Mathematical demonstration of the critical error
For pure signal samples where b → 0:
n = s + b ≈ s
x = n * log(n/b) → s * log(s/0) → ∞
This produces infinite loss values and NaN gradients during training.
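To make the failure mode concrete, here is a minimal PyTorch snippet reproducing the divergence described above; the variable names s, b, n, and x follow the demonstration, and the snippet itself is illustrative rather than taken from the package:

```python
import torch

# Significance term from the demonstration: x = n * log(n/b), n = s + b.
# For a pure-signal batch (b = 0) the log argument diverges.
s = torch.tensor([10.0], requires_grad=True)  # signal yield
b = torch.tensor([0.0])                       # background yield -> 0

n = s + b
x = n * torch.log(n / b)  # n/b -> inf, so x -> inf
x.sum().backward()

print(torch.isinf(x).item())       # True: infinite loss value
print(torch.isnan(s.grad).item())  # True: NaN gradient during training
```

Running this confirms both symptoms at once: the forward pass yields an infinite loss, and backpropagation through the divergent logarithm poisons the gradient with NaN.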
-Matthew James Stephenson
Requested changes
Critical implementation errors were located in the significance calculation of https://arxiv.org/abs/2502.08615v1 (in the accessory code hosted on GitHub), and updated code resolving the numerical instability issues was provided as a pull request at https://github.com/scipp-atlas/cabin/pull/6.
Recommendation
Ask for minor revision
Author: Michael Hance on 2025-03-04 [id 5266]
(in reply to Report 1 by Matthew James Stephenson on 2025-02-26)
Hello,
Thanks for these comments on the code and the pull request to address the issues that were raised. The PR has been accepted and merged into the main branch of the github repo. We do not see that these comments motivate any changes to the paper itself, but the improvements to the code were welcome.
Best wishes,
-Mike
Matthew James Stephenson on 2025-02-26 [id 5246]
Corrigendum for arXiv:2502.08615:
I would like to address critical implementation errors in arXiv:2502.08615's significance calculation (https://doi.org/10.5281/zenodo.14927629) and provide updated code resolving the numerical instability issues in pull request https://github.com/scipp-atlas/cabin/pull/6.
Mathematical demonstration of the critical error:
For pure signal samples where b → 0:
n = s + b ≈ s
x = n * log(n/b) → s * log(s/0) → ∞
This produces infinite loss values and NaN gradients during training.
Once implemented, the corrected ATLAS_significance_loss function (which can presumably be traced to previous work published at https://cds.cern.ch/record/2736148, cited at the very end of the manuscript) will provide numerical stabilization, special-case handling, and gradient preservation by: adding ε-regularization (1e-12) to all denominators; clamping the arguments of logarithms; treating the pure-signal scenario (b = 0) explicitly; protecting against negative values in physical quantities; maintaining differentiability through masked tensor operations; and ensuring positive-definite outputs for stable backpropagation.
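A minimal sketch of these stabilization strategies, assuming the Asimov significance Z = sqrt(2 (n ln(n/b) − s)) with n = s + b in the σ_b → 0 limit; this is not the merged PR code, and the function name `stable_significance_loss` and the exact placement of ε are illustrative:

```python
import torch

def stable_significance_loss(s, b, eps=1e-12):
    # Illustrative sketch (not the merged PR code) of a stabilized
    # Asimov-significance loss, Z = sqrt(2*(n*ln(n/b) - s)), n = s + b.
    s = torch.clamp(s, min=0.0)          # protect against negative yields
    b = torch.clamp(b, min=0.0)
    n = s + b
    ratio = torch.clamp(n / (b + eps), min=eps)  # eps-regularized log argument
    z2 = 2.0 * (n * torch.log(ratio) - s)
    z = torch.sqrt(torch.clamp(z2, min=eps))     # positive-definite output
    return -z  # maximize significance by minimizing the negative

# Finite loss and gradients even for a pure-signal batch (b = 0).
s = torch.tensor([10.0], requires_grad=True)
b = torch.tensor([0.0])
loss = stable_significance_loss(s, b).sum()
loss.backward()
print(torch.isfinite(loss).item(), torch.isfinite(s.grad).item())
```

With b = 0 the ε-regularized ratio stays finite, so both the loss value and the gradient with respect to the signal yield remain finite, in contrast to the divergent behavior demonstrated above.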
-- Matthew James Stephenson