SciPost Submission Page
Ephemeral Learning - Augmenting Triggers with Online-Trained Normalizing Flows
by Anja Butter, Sascha Diefenbacher, Gregor Kasieczka, Benjamin Nachman, Tilman Plehn, David Shih, Ramon Winterhalder
This is not the latest submitted version; this Submission thread has since been published.
Authors (as registered SciPost users): Sascha Diefenbacher · Tilman Plehn · Ramon Winterhalder
Preprint Link: https://arxiv.org/abs/2202.09375v1 (pdf)
Date submitted: 2022-03-08 18:01
Submitted by: Diefenbacher, Sascha
Submitted to: SciPost Physics
The large data rates at the LHC require an online trigger system to select relevant collisions. Rather than compressing individual events, we propose to compress an entire data set at once. We use a normalizing flow as a deep generative model to learn the probability density of the data online. The events are then represented by the generative neural network and can be inspected offline for anomalies or used for other analysis purposes. We demonstrate our new approach for a toy model and a correlation-enhanced bump hunt.
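The "compress an entire data set at once" idea can be made concrete with a toy sketch. The following is an editor's illustration, not the authors' implementation: a minimal one-dimensional affine "flow" z = (x - mu)/sigma mapped to a unit Gaussian, trained by online gradient descent on successive event buffers that are discarded after each update. A real normalizing flow would stack many learned invertible layers, but the buffer-then-update pattern is the same.

```python
# Toy sketch (hypothetical, not the paper's code): online training of a
# one-layer affine "flow" z = (x - mu) / sigma on streamed event buffers.
import numpy as np

rng = np.random.default_rng(0)
mu, log_sigma = 0.0, 0.0        # flow parameters, trained online
lr = 0.05                       # learning rate (illustrative choice)

def nll_grads(x, mu, log_sigma):
    """Gradients of the mean negative log-likelihood under the affine flow.

    Per-event NLL: 0.5 * z**2 + log_sigma + const, with z = (x - mu) / sigma.
    """
    sigma = np.exp(log_sigma)
    z = (x - mu) / sigma
    return -np.mean(z) / sigma, np.mean(1.0 - z**2)

# Stream of buffers: each buffer is discarded after one update, so the
# trained network is the only surviving representation of the data set.
for _ in range(400):
    buffer = rng.normal(3.0, 0.5, size=256)   # "events" arriving online
    g_mu, g_ls = nll_grads(buffer, mu, log_sigma)
    mu -= lr * g_mu
    log_sigma -= lr * g_ls

# The trained flow can then generate events offline for later analysis:
generated = mu + np.exp(log_sigma) * rng.standard_normal(10_000)
```

After the stream ends, `mu` and `exp(log_sigma)` have converged to the mean and width of the streamed data, and `generated` reproduces its distribution without any event having been written to disk.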
Submission & Refereeing History
Reports on this Submission
- Cite as: Anonymous, Report on arXiv:2202.09375v1, delivered 2022-04-07, doi: 10.21468/SciPost.Report.4889
This was a very pleasant read. Clearly and pedagogically written, and
the proof-of-concept investigations were straightforward to
follow. Most of the machine learning tools used are well known and
tested, but putting them in an extreme online environment in this way
is a novel and very interesting idea, which certainly merits
publication in SciPost.
The only thing I miss in this paper is a proper feasibility study
for actually implementing this at the LHC. I understand that this is
beyond the scope of the paper, but it would have been nice to see at
least an outlook giving the steps such a feasibility study would need
to follow. Can the authors envisage potential show-stoppers?
As an example, I would have liked to see comments about the scalability
of the concept. In the more advanced test case, the authors used only
five input variables per event, and storing those to disk for detailed
offline analysis would surely not be a problem, even for a year of
LHC running at an event rate of 100 kHz. So, assuming that the authors
envisage a much larger set of input variables, it would be nice to
understand how much more complex the network architecture would need to
be, and how much larger the buffer sizes and longer the training
cycles for each buffer would become. What would this mean in terms of the
requirements for the FPGAs that the authors suggest should be
used? Would they then be fast enough to handle the 100 kHz event rate?
- Cite as: Anonymous, Report on arXiv:2202.09375v1, delivered 2022-04-05, doi: 10.21468/SciPost.Report.4866
Strengths
1. The paper presents a novel use of generative neural networks to characterize ensembles of events in a periodic online manner to program a certain class of triggers.
2. The results of the paper show that the proposed trigger approach is, for the scenarios considered in the paper, more efficient than conventional bump-hunt approaches used at the LHC, as well as than more recent proposals like offline CWoLa algorithms that rely on data samples without truth labels such as "signal" and "background". The juxtaposition of the various problems and tools makes it easy for the reader to quickly see its effectiveness.
3. The paper considers a parametric toy problem to demonstrate the tool as well as a more realistic physics scenario that the tool can be used for at the LHC. The build-up makes it easy for the reader to understand the tool.
Weaknesses
1. The paper considers a realistic example in which the existing conventional cut-threshold approach may suffice. This may be a moot point with some textual clarifications. See question/comment 3.
2. The claim that the tool could potentially be deployed on FPGAs/GPUs is difficult to assess with the amount of detail given in the paper. This may be a moot point with some textual/figure clarifications. See questions/comments 8, 14, and 15.
Congratulations to the authors for the interesting proposal. The approach and the accompanying result show important progress in the field. This makes the paper worth publishing.
Some minor revisions/clarifications are requested.
Questions/comments regarding the physics/proposal presented:
1. The impact that the signal contamination has on the training could be made clearer. For example, one might naively guess that the signal is tiny in the beginning and that each update will make the signal significance larger. Herein lies the confusion: if the signal is larger after the update, won’t the subsequent update absorb the signal into the background model? Some clarification may alleviate such concerns.
2. As a follow-up to the previous point, can/does OnlineFlow account for changing pileup conditions? Assuming that it does this well, to what order-of-magnitude level of S/B can this tool find new physics? Presumably given enough training cycles and updates it can squeeze out the significance given any S/B.
3. The physics case considered is the LHC Olympics dataset containing a W’ decay, filtered to require a jet with pT > 1.2 TeV. This feels like an odd choice to showcase, as the threshold for the lowest unprescaled single-jet trigger at CMS and ATLAS was much lower, around 500 GeV, during Run 2. One can imagine an argument that the specifics of this problem do not matter, as it is demonstrating the capability, and that for deployment lower-pT jets would be targeted. Is that true? Assuming that it is, the reader is left to wonder whether the results of Figure 9 still hold. Some clarification would be appreciated.
4. The role, rationale, and impact of the dummy variables are vague. The appearance of dummy variables in the parametric example seems less controversial, since standard optimization tools may not work for 1d inputs for technical reasons. There is a mystery here, but not of big concern. However, for the LHCO problem the inputs are doubled with dummy variables. What is going on?
Editorial comments/suggestions approximately in order presented in the paper:
5. The introductory paragraph is confusing as it is trying to accomplish many things. Perhaps it might be helpful to break it up so that each paragraph has a point. For instance, the paper starts rather generally, but transitions quickly into rather technical details such as prescales, and ends with scouting / trigger level analysis. That’s the general comment, now some specific parts.
6. The introductory sentence gives the impression that the collision rate might be going up, rather than the trigger rate at the same collision rate. Perhaps it could be reworded.
7. The presentation of the scouting / trigger-level analysis could be better. As it is presented, even with the word “additionally”, it looks like the main use case of the trigger system at the LHC is scouting / trigger-level analysis. The authors may have tried to motivate the paper as soon as possible without actually giving the physics scenario they have in mind, and this causes much of the confusion.
8. The second paragraph on ML is confusing. It describes the offline use case, then the FPGA and GPU use cases. After the FPGA references [15-21], one reference is cited alone in the following sentence; what separates it from the others cited before? Also, “sophisticated networks” is rather subjective unless you specify the limitations of existing systems that prevent you from accomplishing something you have in mind. Lastly, the final sentence ends with “this idea”, which leaves the reader wondering what it is. What is also lacking in this paragraph is the connection to the next one, because the proposed strategy seems limited to the HLT. There are some hints in the paper that it may be deployable on FPGAs/GPUs, but details are lacking.
9. The final sentence of the third paragraph is rather general, without details. It requires too much work on the reader’s part to try to construct a trigger menu, even a toy one with only a few main items, using the proposed method.
10. The last sentence of the penultimate paragraph is confusing. Isn’t the network determination based on training data?
11. Would a sketch or example of the idea of “generative ML” be too pedagogical? Perhaps even a simple definition would help.
12. In the first paragraph, it is claimed that L1 trigger algorithms do not perform complex reconstructions. CMS deployed BDTs on FPGAs during Run 2, which might qualify as complex to some, so this claim seems somewhat subjective. The idea seems to be that L1 is not as complex as the HLT, so perhaps it could be reworded.
13. The bulleted steps are helpful. One part that triggered a question is step 3, where it mentions an “indication of new physics”. How can the system distinguish new physics from a detector flaw? The authors may know of a paper by A. Pol et al. (https://arxiv.org/abs/1808.00911) that discusses detector monitoring using anomaly detection. It would be interesting to know what the authors think of the new-physics vs. detector-issue ambiguity, even if it is beyond the scope of the paper. If appropriate, please add the citation.
14. The presentation of where and how OnlineFlow could be deployed could be made more concrete. Only after reading the paper does the reader understand that the tool is to be deployed in HLT-like environments where all of the reconstructed inputs, including sophisticated variables that require tracking information, are made available. This preprocessing is taken for granted (which may be somewhat reasonable for the HLT, although that depends on whether full tracking is needed) and would not be a given for FPGA/GPU-like setups.
15. For example, Figure 2 is confusing as to where OnlineFlow gets its Measurement, which I think is in the HLT-like preprocessing environment; in the current diagram a line connects it to the detector. Since the Update arrow feeds into both LVL1 and HLT, the inclusion of LVL1 adds to the confusion. It almost feels like the paper would be stronger if it focused on the HLT application for now, with the possibility of an FPGA/GPU implementation in the future. This seems to be what is already desired in the paper, but the presentation is a bit confusing.
16. Section 3.1, after referencing Figure 4, states that OnlineFlow reproduces the peak. OnlineFlow seems to have a weak bump above background, but it is much flatter than the S+B. Is that what you mean? This feels like it did _not_ reproduce the peak. Please clarify.
17. In section 3.3, it is not clear to the reader why the parameters given in equation 4 are important enough to document in the paper. Perhaps it is sufficient to state them in the figure? Also, the relevant figure should be referenced earlier in section 3.3.
18. Is there significance to the prescale factor of ~4? It is not obvious to the reader why 4, if it is significant. Perhaps it is empirical to the problem at hand. Please clarify.
19. The final paragraph of section 3 is rather dense and difficult to follow. Why is the error band for OnlineFlow generally larger than for the data? Can this be reduced by sampling more frequently, or is it an intrinsic feature of the chosen network?
20. Figure 9 and others: it is difficult to match each color to the legend as is, especially for a color-challenged reader or a reader with a black-and-white printer. If you wish to keep the current aesthetics, perhaps a comment could be added stating that the order of the legend follows the order of the curves, if it indeed does, or the order could be changed so that this holds.
21. Another suggestion on the figures is to make the lines in the legend much thicker so that the colors show well. More orthogonality in color choices and color intensity would also help.
22. Lastly, regarding figures, it might help to highlight the result for OnlineFlow by choosing, for example, a thicker line.
23. “prove” should be “proof” in the final paragraph.
24. Format of quotes closing Quantum Universe. Various past and present tenses used. Not sure if intentional.
25. The bibliography is in need of serious work. Some examples: incorrect authors (1, 2); missing authors (3, 4); inconsistent journal abbreviations (Phys. Rev. Lett., JINST, Journal of Instrumentation, Physical Review Letters, etc.); inconsistent capitalizations (cms, tev, fpga, lhc, etc.); incorrect grammar (“and et al.” in 17); inconsistent formatting (et al. both in italics and in roman); redundant or missing links (the doi is given twice in 32, etc.); misformatted subscripts (anti-ktjet, etc.); internal notes remaining in 8 and 18; inconsistent use of et al. (15 gives ten authors before et al., whereas 22 gives one author).
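As an aside on point 18, for readers unfamiliar with the mechanism: a trigger prescale factor of N keeps only one in N otherwise-accepted events, with each kept event reweighted by N offline so that rates and yields are preserved in expectation. A toy illustration (an editor's example, not from the paper):

```python
# Toy illustration (hypothetical, not from the paper) of a trigger prescale:
# keep one in PRESCALE accepted events and reweight each kept event by
# PRESCALE, so the effective event count is preserved offline.
PRESCALE = 4

accepted_events = list(range(1000))       # events passing the trigger cut
kept = accepted_events[::PRESCALE]        # keep every 4th event
weights = [float(PRESCALE)] * len(kept)   # offline weight per kept event

print(len(kept))       # 250 events written out
print(sum(weights))    # 1000.0 -- effective event count is preserved
```

Whether a factor of exactly 4 matters for the paper's results, or is just an empirical choice for the problem at hand, is the question raised above.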