SciPost Submission Page
A Normalized Autoencoder for LHC Triggers
by Barry M. Dillon, Luigi Favaro, Tilman Plehn, Peter Sorrenson, Michael Krämer
This is not the latest submitted version.
Submission summary

Authors (as registered SciPost users): Barry Dillon · Luigi Favaro · Tilman Plehn
Preprint Link: https://arxiv.org/abs/2206.14225v2 (pdf)
Code repository: https://github.com/heidelberg-hepml/normalized-autoencoders
Date submitted: 2023-02-28 16:03
Submitted by: Favaro, Luigi
Submitted to: SciPost Physics

Ontological classification

Academic field: Physics
Abstract
Autoencoders are an effective analysis tool for the LHC, as they represent one of its main goals: finding physics beyond the Standard Model. The key challenge is that out-of-distribution anomaly searches based on the compressibility of features do not apply to the LHC, while existing density-based searches lack performance. We present the first autoencoder which identifies anomalous jets symmetrically in the directions of higher and lower complexity. The normalized autoencoder combines a standard bottleneck architecture with a well-defined probabilistic description. It performs better than all available autoencoders for top vs QCD jets and reliably identifies different dark-jet signals.
Author comments upon resubmission
We would like to point out that the detailed report by the first referee recommends publication in SciPost Physics.
We think that the introduction of Normalized AutoEncoders is an important step towards robust autoencoder-based anomaly detection searches in particle physics. Therefore, we believe the paper should be published in SciPost Physics.
List of changes
Changes based on report 1:
Major issue:
We agree with the referee that, given two datasets A and B, an over-density in the low-probability region of p_A does not imply an over-density in the low-probability region of p_B.
However, the underlying physics suggests this is true when comparing QCD and top jets. A top jet decays more than 70% of the time to a three-pronged jet, so a large region of phase space is expected to be populated exclusively by QCD jets, which must look different.
The differences between the average images of QCD and top jets, as well as in high-level observables (e.g. tau_3/tau_2), corroborate this point.
Given these differences, it is indeed surprising that an AE is not able to tag QCD jets, and we argue that this is connected to the ability of the network to interpolate simple features.
Additionally, it has been shown that the complexity bias is an issue for AEs, since the reconstruction error correlates strongly with the number of active pixels (see e.g. arXiv:2104.09051).
In this work, we show that an NAE is symmetric in the sense that we can reliably find over-densities in low-probability regions without modifying the network architecture or fine-tuning the parameters. We see the difference between performing density estimation on the direct and the inverse task, as shown in Figs. 3-4, and there is no indication that we fail to approximate the true data distribution.
We rephrased the first paragraph of the "QCD vs top jets" section to make this clear.
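The complexity bias discussed above can be seen in a toy example. The sketch below is illustrative only and is not the architecture used in the paper: a crude "bottleneck" that keeps only the largest pixels shows how a compression-based reconstruction error grows with the number of active pixels, independently of any signal content.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_error(img, keep=4):
    """Toy stand-in for an autoencoder bottleneck: keep only the `keep`
    largest pixel intensities and measure the mean squared residual."""
    flat = np.sort(img.ravel())[::-1]
    recon = np.zeros_like(flat)
    recon[:keep] = flat[:keep]
    return np.mean((flat - recon) ** 2)

# A 'simple' jet image with few active pixels and a 'complex' one with many.
simple_jet = np.concatenate([rng.random(4), np.zeros(4)])   # 4 active pixels
complex_jet = rng.random(8)                                  # 8 active pixels

# The reconstruction error tracks the number of active pixels:
# the simple image is reconstructed perfectly, the complex one is not.
print(reconstruction_error(simple_jet), reconstruction_error(complex_jet))
```

A jet with few active pixels fits through the bottleneck with no loss, so a cut on the reconstruction error alone preferentially tags high-complexity jets.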
Minor changes:
- abstract:
"Autoencoders are the ideal analysis tool for the LHC" -> "Autoencoders are an effective analysis tool for the LHC"
"its main goal of finding physics beyond the Standard Model" -> "one of its main"
- introduction:
"that no assumption" -> "that as few assumptions as possible"
"The problem with these studies...":
The end of the paragraph addresses this point. QCD jets are easier to compress since they have a simple structure; in other words, the effective dimensionality of a QCD jet is on average smaller than that of a top jet. Therefore, the network is still able to interpolate these features.
We modified the corresponding paragraph to explain our claims on applications of NAEs for LHC triggers.
- network and dataset:
"by using the reconstruction error ..." -> "The NAE training includes two steps... The pre-training phase builds an approximate..."
A new paragraph describes the training procedure and the role of the modified loss function.
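For concreteness, the two training phases can be sketched with a toy tied-weight linear autoencoder whose reconstruction error plays the role of the energy. Everything below is illustrative, not the paper's implementation: the dimensions, learning rates, numerical gradients, and the simple Langevin sampler are placeholders that only show the structure of the procedure, namely pre-training on the reconstruction loss and then lowering the energy on data while raising it on sampled negatives.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 2                                # toy data / bottleneck dimensions
W = rng.normal(scale=0.1, size=(k, d))     # tied-weight linear AE (illustrative)

def energy(x, W):
    """Reconstruction error, used as the energy E(x) of the model."""
    return np.sum((x - W.T @ (W @ x)) ** 2)

def grad_W(x, W, eps=1e-5):
    """Numerical gradient of E(x) w.r.t. W (fine at this toy size)."""
    g = np.zeros_like(W)
    for i in range(W.size):
        Wp, Wm = W.copy(), W.copy()
        Wp.flat[i] += eps
        Wm.flat[i] -= eps
        g.flat[i] = (energy(x, Wp) - energy(x, Wm)) / (2 * eps)
    return g

# Data concentrated along the first two directions (a low-dimensional manifold).
data = rng.normal(size=(64, d)) * np.array([2.0, 2.0, 0.1, 0.1])

# Phase 1: standard AE pre-training, minimising E(x) on the data.
for _ in range(300):
    x = data[rng.integers(len(data))]
    W -= 0.01 * grad_W(x, W)

def langevin_negative(W, steps=20, lr=0.05, noise=0.05):
    """Draw a negative sample by a few noisy gradient steps downhill in E(x)."""
    x = rng.normal(size=d)
    for _ in range(steps):
        A = np.eye(d) - W.T @ W
        x = x - lr * 2 * A.T @ (A @ x) + noise * rng.normal(size=d)
    return x

# Phase 2: normalised training. The contrastive update E(x_pos) - E(x_neg)
# keeps the energy low on data while pushing it up on the sampled negatives.
for _ in range(100):
    x_pos = data[rng.integers(len(data))]
    x_neg = langevin_negative(W)
    W -= 0.005 * (grad_W(x_pos, W) - grad_W(x_neg, W))
```

After training, the energy is small for points on the data manifold and large away from it, which is what makes the reconstruction error usable as an anomaly score in both directions.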
- Preprocessing:
We do not expect to lose physical information. The chosen preprocessing does not modify the structure of the jet (e.g. its prongness).
Additionally, the n-reweighting was selected to enhance features in jets with low complexity. Therefore, the improvement found in our dark-jets example is not accidental.
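As an illustration of the kind of intensity reweighting meant here (the exponent and the normalisation below are illustrative, not necessarily the exact choice in the paper): raising pixel intensities to a power n < 1 before renormalising flattens the intensity spectrum, so soft pixels, which carry the structure of low-complexity jets, gain relative weight.

```python
import numpy as np

def reweight(image, n=0.5):
    """Illustrative n-reweighting: intensities -> intensities**n, renormalised.
    For n < 1 this compresses the dynamic range and enhances soft pixels."""
    w = np.power(image, n)
    return w / w.sum()

# A toy jet image dominated by one hard pixel.
img = np.array([0.80, 0.15, 0.04, 0.01])
out = reweight(img, n=0.5)

# The softest pixel gains weight relative to the hardest one.
print(out)
```

The reweighted image still sums to one, but the ratio between the softest and the hardest pixel increases, making low-intensity features more visible to the network.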
- QCD vs top and QCD vs dark jets:
Added the black-line label to the captions.
- Outlook:
"However, density-based autoencoders have not been shown to work properly and have a massive dependence on data preprocessing"
Rephrased and added citations.
Reports on this Submission
Report #1 by Anonymous (Referee 2) on 2023-3-30 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2206.14225v2, delivered 2023-03-30, doi: 10.21468/SciPost.Report.6979
Report
Thank you for taking into account my feedback! Overall, I still have some concerns about some of the claims made in the manuscript and I would encourage the authors to rephrase/tone down some of them (see below). However, I will not insist on further changes and I defer to the editor/authors. This paper is an interesting study and I think SciPost Physics is a good venue for presenting the work.
I don't completely agree with your response to my question about symmetry, but I won't insist on further changes/studies. I also don't understand why your autoencoder is not also sensitive to pre-processing. If your method is learning something about the underlying density of the data (as all compression models do), then it should also be sensitive to how you pre-process the data. Some of the claims along these lines seem too strong.
I still find the strong emphasis on triggers (even in the title!) to be unjustified, as there is really no evidence that your approach is trigger-friendly or is otherwise specifically designed for running online. It is true that other people have shown that AEs can be implemented in hardware/firmware, but I don't see how that justifies such a big claim for your paper. Please consider further modifications along these lines.
Please change the first line of the conclusions ("Autoencoders are ML-analysis tools which ideally represent the idea behind LHC searches.") along the changes you made in the abstract/introduction.