SciPost Submission Page

QCD or What?

by Theo Heimel, Gregor Kasieczka, Tilman Plehn, Jennifer M Thompson

Submission summary

Authors (as registered SciPost users): Tilman Plehn · Jennifer Thompson
Submission information
Date accepted: 2019-02-19
Date submitted: 2019-01-15 01:00
Submitted by: Thompson, Jennifer
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
  • High-Energy Physics - Phenomenology
Approaches: Experimental, Theoretical


Autoencoder networks, trained only on QCD jets, can be used to search for anomalies in jet-substructure. We show how, based either on images or on 4-vectors, they identify jets from decays of arbitrary heavy resonances. To control the backgrounds and the underlying systematics we can de-correlate the jet mass using an adversarial network. Such an adversarial autoencoder allows for a general and at the same time easily controllable search for new physics. Ideally, it can be trained and applied to data in the same phase space region, allowing us to efficiently search for new physics using un-supervised learning.

List of changes

Review 1

1 - show the MC statistics are high enough to support the conclusions,
or generate more

*Also added event counts to the dark shower section. We used more than 100k events each for signal and background in the final testing. This means that even the histogram for the 5% least QCD-like jets still contains more than 5000 events.

2 - show or discuss how well the method should work on groomed (pile-up
suppressed) jets.

*We now explicitly recommend using PU removal techniques that do not rely on grooming.

3 - show or discuss impact of the detector simulation used

*Similarly, no detailed detector simulation was included. We expect the autoencoder to learn novel jet-shape variables from the distributions of constituents. There is no a-priori reason why these jet shapes would suffer from larger effects due to the detector simulation than widely used variables like groomed mass, n-subjettiness or energy correlation functions. For the practical application of the autoencoder we foresee training on data, making this technique even less subject to differences between data and simulation than ordinary approaches.

4 - address questions/requests for clarification in the attached PDF
(which include the above as the most significant) (I also highlighted
some bits of text which look like typos or may need rephrasing)
+pdf comments: 1808.08979v2_report_attachment-1.pdf (attached)
Typos fixed:
-> page 5: Statically -> Statistically
-> page 12: added commas: to, for example,
-> page 12: as usually -> as usual
-> page 14: changed order of references 39<->40
-> page 16: added 'to thank' twice to the acknowledgments

page 3: encodes/preserves instead of 'bottleneck
describes the features'.

*Changed to encodes

page 3: Changed 'calorimeter information' to 'pixellated energy'

page 3: Comment on the reliability of DELPHES as a detector simulator. Is pixellation the dominant effect?

*Delphes is now the de-facto standard for this kind of phenomenological study, especially for jet substructure. The main effects are indeed a semi-realistic granularization and smearing of the energy.

page 4: how does this work with groomed jets?

*There is no fundamental reason why this should not work for groomed jets as well. However, we do not suggest applying grooming together with the autoencoder, as grooming algorithms make inherent assumptions about the structure of a parton shower, and a major feature of the autoencoder is that it does not need these assumptions. Added a statement along these lines to the text.

page 6: what is used for constituent masses?

*The masses of the constituents are not used for boosting. Each 4-vector is boosted directly into the jet rest frame, so the constituent mass is effectively defined through m^2 = E^2 - p^2.
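
The boost described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mass_squared` and `boost_to_rest_frame`, the (E, px, py, pz) row layout, and the two toy constituents are all hypothetical. The constituent mass never enters as an input; it follows from each 4-vector through m^2 = E^2 - p^2 and is unchanged by the boost.

```python
import numpy as np

def mass_squared(p4):
    """Invariant mass squared m^2 = E^2 - |p|^2 for rows of (E, px, py, pz)."""
    p4 = np.asarray(p4, dtype=float)
    return p4[..., 0] ** 2 - np.sum(p4[..., 1:] ** 2, axis=-1)

def boost_to_rest_frame(constituents):
    """Boost each constituent into the rest frame of the summed jet 4-vector."""
    constituents = np.asarray(constituents, dtype=float)
    jet = constituents.sum(axis=0)
    beta = jet[1:] / jet[0]                  # jet velocity in units of c
    beta2 = float(beta @ beta)
    if beta2 == 0.0:                         # jet is already at rest
        return constituents.copy()
    gamma = 1.0 / np.sqrt(1.0 - beta2)
    energy = constituents[:, 0]
    momentum = constituents[:, 1:]
    beta_dot_p = momentum @ beta             # beta . p for every constituent
    energy_new = gamma * (energy - beta_dot_p)
    coeff = (gamma - 1.0) * beta_dot_p / beta2 - gamma * energy
    momentum_new = momentum + np.outer(coeff, beta)
    return np.column_stack([energy_new, momentum_new])

# Two massless toy constituents (E, px, py, pz) in GeV; values are illustrative.
toy_jet = np.array([[50.0, 30.0, 0.0, 40.0],
                    [60.0, 0.0, 36.0, 48.0]])
boosted = boost_to_rest_frame(toy_jet)
```

In the boosted frame the constituent momenta sum to zero and the summed energy equals the jet mass, while `mass_squared` returns the same value for each constituent before and after the boost.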

Page 6: why is LoLa needed when the information is already included in the 4-vector?

*The LoLa implementation transforms the 4-vector such that it includes physically relevant information. This helps the network to learn physics and removes symmetries in the phi/eta plane, which the network would otherwise have to learn itself. This implementation is discussed further in the reference: arXiv:1707.08966 [hep-ph].

Page 8: the question to the data is


page 9: Comment on jargon-heaviness.

*We would like to have a couple of sentences detailing the technical implementation for anyone interested.

page 12: How do you know if an analysis technique is orthogonal?

*The anomalous events can be analysed further in multiple ways. One example of an orthogonal feature would be the Njet distribution of the full event, as this is a jet-by-jet tagger.

page 13: display mean on number of constituents plots (for comparison)

*Added plots with means

Page 16:
"We find (essentially) the same flat jet mass distribution for each slice in the loss function. For instance top decay jets are now collected in the least QCD-like slices and lead to a distinct peak in the jet mass"

*Removed the word flat

Review 2
1- On page 2, when saying "we can choose our input format to deep
learning analysis tools", do the authors mean here choose an input
format best adapted for deep learning frameworks?

*Added 'This allows us to pick the data format that is best suited to a given problem.'

2- On page 2, regarding the use of jet images: it seems to me that while
jet images have historically been the first representation used in
conjunction with deep learning networks, there is no particular
consensus on which input type is preferred, and in fact there has been
substantial work in exploring other techniques. I would suggest citing
some of these other methods in this paragraph as well, such as:
* arXiv:1702.00748 (cited)
* arXiv:1704.02124 (cited)
* arXiv:1704.08249 (cited)
* arXiv:1710.01305 (cited)
* arXiv:1712.07124 (cited)
* arXiv:1807.04758 (cited)
* arXiv:1810.05165 (added)

*Now all the approaches are cited on page 1. Added a comment in the conclusions that "This technique is also compatible with other jet representations and network architectures." We only tested images and LoLa because of prior experience, but we currently see no limitation in using the autoencoder on other representations.

3- On page 2, regarding how to address systematic uncertainties: While I
agree that this article presents an interesting angle, using adversarial
networks to study some of these limitations, I think the statement is
too broad. There are certainly other systematic uncertainties beyond
those considered here.

*Also added a statement on refiner networks. Uncertainties that arise from differences between data and simulation can in general be "solved" by training on data as proposed by the autoencoder. The adversarial training here is only used to control correlations, not to suppress systematic uncertainties.

4- On page 5, equation (2). An obvious downside to this loss function,
and to the jet image approach in general, is that it is very sensitive
to rotations: a small rotation, while leaving the physical properties
mostly unchanged, will lead to a large value of the loss function. A
discussion of this point and whether the authors have any insights into
how it impacts the results would be useful.

*It is true that jet images have problems with rotations. However, our pre-processing rotates the jet images to a standardised format. Moreover, rotations are even less of a problem for the loss function, which measures the difference between the input image and the same image after processing by the autoencoder. Unless the autoencoder LEARNS to rotate, and there are no training incentives for it to do so, there will be no relevant rotations.

5- On page 6, equation (3). The (kμ,i) matrix is not IRC safe: for
example, a collinear splitting will result in a reshuffle of the
columns, as well as a change of the values in eight of the entries. Did
the authors study the impact of this unsafety?

*The authors are aware that LoLa is inherently IRC unsafe. Again we would like to point out the on-going top tagging comparison and especially the EFN vs PFN result: most of the recent gains in performance through machine learning seem to be inherently IRC unsafe. This is a very interesting question for further study, but beyond the scope of this article. See also previous and next comment.

6- On page 7, equation (7). Since the matrix compared before and after
autoencoding can change substantially due to effects that are not
physically relevant, e.g. soft or collinear splittings, does this impact
the performance of the loss function?
similar to point 4

*No. The loss function does not compare two "identical" jets where jet A underwent a soft splitting and jet B did not; it compares the initial jet A with A'=decoder(encoder(A)), and B with B'. The autoencoder does not induce splittings or similar effects, so there is no place where they could enter. It is only relevant that the encoding/decoding procedure works equally well for A and B if they are physically similar, and this is ensured by the large training statistics.
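
The structure of this argument, that each jet is compared only with its own reconstruction, can be illustrated with a toy model. The sketch below is not the network used in the paper: it assumes a linear autoencoder built from a truncated SVD as a stand-in, random vectors in place of jets, and a mean squared error as the loss. All names (`encoder`, `decoder`, `reconstruction_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "background": 500 flattened 4x4 inputs lying near a 3-dimensional subspace.
basis = rng.normal(size=(3, 16))
background = rng.normal(size=(500, 3)) @ basis + 0.01 * rng.normal(size=(500, 16))

# Linear stand-in for an autoencoder: keep the 3 leading SVD directions.
mean = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mean, full_matrices=False)
components = vt[:3]                      # bottleneck of 3 latent numbers

def encoder(x):
    return (x - mean) @ components.T

def decoder(z):
    return z @ components + mean

def reconstruction_loss(x):
    """Per-jet mean squared error between A and A' = decoder(encoder(A))."""
    x = np.atleast_2d(x)
    return np.mean((x - decoder(encoder(x))) ** 2, axis=1)

typical = rng.normal(size=(1, 3)) @ basis    # drawn like the background
anomalous = rng.normal(size=(1, 16))         # generic input off the subspace

loss_typical = reconstruction_loss(typical)[0]
loss_anomalous = reconstruction_loss(anomalous)[0]
```

The loss never compares two different jets with each other. It is small for inputs resembling the training sample and larger for inputs the bottleneck cannot represent, which is what allows it to serve as an anomaly score.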

7- On page 8. It would be interesting to see this study done on groomed
jets, to remove the impact of soft wide angle partons on the jet mass
considered as input.
- copy of ref 1.2 -

8- On page 8, just after the middle of the page: "We know from many
studies that the jet mass is the single most powerful observable in
separating QCD jets from hadronically decaying heavy states". This is
only true at parton level, without considering non-perturbative or
pile-up effects. Otherwise, some of the many studies should be cited.

*Changed to "a powerful observable".

9- On page 12, the last sentence of section 2 is missing a "to".

10- On page 16, the second sentence of the second paragraph in the
Outlook section is missing an "of".
* fixed

Review 3

(1) The authors should add a discussion about experimental effects such
as imperfect calibration and badly measured input objects

(2) The authors should directly test the impact of imperfect calibration
and badly measured input objects by injecting a known miscalibration to
the particle-flow objects to see the impact on the weakly-supervised
adversarial network. It would build confidence if such effects were
shown to be negligible or could be mitigated in some fashion.

* Autoencoding is not a weakly supervised but an unsupervised technique. The final discriminant is trained and evaluated on data. Therefore no effects due to calibration or bad measurements exist.

Published as SciPost Phys. 6, 030 (2019)

Reports on this Submission

Anonymous Report 3 on 2019-2-11 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:1808.08979v3, delivered 2019-02-11, doi: 10.21468/SciPost.Report.819


I am satisfied with the changes provided by the authors and recommend the paper for publication.
A few minor points (referring to Review 2)

I would suggest citing all the different ML methods on page 1, as the authors wrote in their reply; this does not seem to be the case in v3: 1710.01305, 1712.07124 and 1807.04758 are still only cited in section 2.3.

My point here is that this behaviour of the loss function will lead to undesirable properties. For example, for two particles that have very similar energies but end up in neighbouring bins, the loss function will be large even though the jets are almost identical.
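
A toy version of this point, assuming a pixel-wise mean squared error between two images (the exact form of equation (2) is not reproduced in this thread, so this is an assumption), is sketched below. The 4x4 grid, the unit energy and the helper `pixel_mse` are hypothetical choices for illustration.

```python
import numpy as np

def pixel_mse(a, b):
    """Pixel-wise mean squared error between two jet images."""
    return float(np.mean((a - b) ** 2))

jet_a = np.zeros((4, 4))
jet_a[1, 1] = 1.0          # one deposit of unit energy

jet_b = np.zeros((4, 4))
jet_b[1, 2] = 1.0          # same energy, shifted into the neighbouring bin

jet_c = np.zeros((4, 4))
jet_c[1, 1] = 0.9          # same bin, 10% less energy

shift_loss = pixel_mse(jet_a, jet_b)       # two pixels differ by 1 each
rescale_loss = pixel_mse(jet_a, jet_c)     # one pixel differs by 0.1
```

With these inputs the one-bin shift costs 200 times more than a 10% change in energy, even though a deposit close to a bin edge can move to the neighbouring pixel under a very small physical change.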

The issue is that a jet where one parton undergoes a soft or collinear splitting is identical to the one without that splitting. Therefore, an ideal autoencoder should in principle be able to collapse these two example jets into the same latent space projection, which cannot be the case here because the loss function does not consider them equivalent. But I agree that this problem is perhaps beyond the scope of this paper.

Anonymous Report 2 on 2019-1-31 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:1808.08979v3, delivered 2019-01-31, doi: 10.21468/SciPost.Report.809


In my previous report, I suggested two additions:

(1) The authors should add a discussion about experimental effects such
as imperfect calibration and badly measured input objects

(2) The authors should directly test the impact of imperfect calibration
and badly measured input objects by injecting a known miscalibration to
the particle-flow objects to see the impact on the weakly-supervised
adversarial network. It would build confidence if such effects were
shown to be negligible or could be mitigated in some fashion.

The authors' reply was:

* Autoencoding is not a weakly supervised but an unsupervised technique. The final discriminant is trained and evaluated on data. Therefore no effects due to calibration or bad measurements exist.

My follow up:

I appreciate the fact that this unsupervised algorithm is trained directly on data, but that does not mean there is no impact of imperfect detector calibration. In fact, I believe that unsupervised algorithms are currently being investigated by the LHC collaborations precisely to find detector issues that are not known about. Let me outline in detail the problem below.

The response of calorimeters to particles is non-linear and highly dependent on the detector geometry. This response is corrected (either at calorimeter or jet level) using test-beam data and MC simulations. The correction cannot be perfect. Furthermore, the data can contain noise and/or hot-cells that have not been correctly identified and removed from the object reconstruction.

The upshot of experimental reality above is that the unsupervised algorithm can learn features of the detector rather than the underlying physics. It is not clear to what extent this affects the current proposal. I would very much like to know the answer by quantitative tests, but I understand this might be beyond the scope of the current study (which is interesting and should be published). However, a short discussion on experimental effects would be useful to the reader and to promote further investigation.

Requested changes

The authors should add a discussion regarding the fact that unsupervised algorithms could learn features of an imperfectly calibrated (or noisy) detector and that additional checks using control regions in the data would be needed to make sure that any such experimental issues were not mistaken for a new physics signal.

Author:  Jennifer Thompson  on 2019-02-01  [id 423]

(in reply to Report 2 on 2019-01-31)
answer to question

We agree that this is an important point for us to address, and would like to add the following paragraph to the paper:

"A set of events flagged by the autoencoder as anomalies does not automatically qualify as a signal of new physics. It is standard experimental procedure to test whether any signal could be caused by detector effects. Typical tests include checks whether events cluster geometrically (all jets originate from a specific region in the $\eta$-$\phi$ plane, hinting at a misbehaving region of the calorimeter) or temporally (from a specific run or run-period, hinting at problematic LHC or detector conditions). In the case of autoencoding jet images, an additional test would be an analysis of the correlation with well-understood substructure variables such as n-subjettiness, which is opportune to understand the topology of the identified signal. Finally, mis-calibrations of the jet-energy that cause an artificial mass peak can be taken care of using control regions --- if the mass peak is present in sidebands as well, it is likely a miscalibration. All of these are relevant experimental considerations and should be included in any concrete study. However, autoencoding is no more susceptible (and arguably less so) than traditional techniques based on MC simulation."

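The geometric check mentioned in the proposed paragraph could look roughly like the sketch below. It is an illustration under assumptions, not an experimental procedure from the paper: the function `hot_cells`, the 8x8 binning, the |eta| < 2.5 acceptance, the binomial pull and the threshold of 5 are all hypothetical choices, and the jets are random toy data with one artificially misbehaving cell.

```python
import numpy as np

def hot_cells(eta, phi, flagged, bins=8, threshold=5.0):
    """Return (eta_bin, phi_bin) indices where flagged jets are over-represented."""
    edges = [np.linspace(-2.5, 2.5, bins + 1),
             np.linspace(-np.pi, np.pi, bins + 1)]
    total, _, _ = np.histogram2d(eta, phi, bins=edges)
    anomalous, _, _ = np.histogram2d(eta[flagged], phi[flagged], bins=edges)
    rate = flagged.mean()                       # global fraction of flagged jets
    expected = rate * total
    # Binomial spread per cell; clipped to avoid division by zero in empty cells.
    sigma = np.sqrt(np.clip(total * rate * (1.0 - rate), 1e-12, None))
    pull = (anomalous - expected) / sigma
    return np.argwhere(pull > threshold)

# Toy data: uniform jets, 5% flagged at random, plus one cell flagging everything.
rng = np.random.default_rng(1)
n_jets = 20000
eta = rng.uniform(-2.5, 2.5, n_jets)
phi = rng.uniform(-np.pi, np.pi, n_jets)
flagged = rng.random(n_jets) < 0.05
in_bad_cell = (eta >= 0.0) & (eta < 0.625) & (phi >= 0.0) & (phi < np.pi / 4)
flagged |= in_bad_cell

cells = hot_cells(eta, phi, flagged)
```

A cell reported by such a check would point to a detector region to inspect before any anomaly is interpreted as a signal. The same pattern applies to temporal clustering if run numbers replace the eta-phi cells.
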
Anonymous on 2019-02-05  [id 428]

(in reply to Jennifer Thompson on 2019-02-01 [id 423])

That looks good to me. Thanks for adding this paragraph. I have no further comments/requests.

Report 1 by Jonathan Butterworth on 2019-1-23 (Invited Report)

  • Cite as: Jonathan Butterworth, Report on arXiv:1808.08979v3, delivered 2019-01-23, doi: 10.21468/SciPost.Report.802


My only remaining query is the question I had about constituent masses, where I think we may be misunderstanding each other. The reply is simply that the constituent mass is E^2-p^2. However, if the constituents are charged particle tracks we know p; if they are calorimeter clusters we know E. In a sophisticated particle-flow reconstruction we have a measure of both. But the mass of the constituent particles is not so well known in any case, and I can imagine that once the (large) boost of the jet is removed by boosting the particles into the rest frame of the fat jet, the mass of the constituents has a significant impact. So: are the true particle masses used? Or are all constituents taken to be massless? Or something else? Or can you convince me it is irrelevant or I have misunderstood?

Requested changes

see above, please clarify.

Author:  Jennifer Thompson  on 2019-01-30  [id 418]

(in reply to Report 1 by Jonathan Butterworth on 2019-01-23)
answer to question

Thank you for the clarification. We have produced distributions (attached) for QCD and top jets for both the maximum constituent mass/jet mass and the maximum constituent mass/energy for 10000 jets. In all cases the constituent mass is much smaller than both the constituent energy and the jet mass. The constituent masses are therefore insignificant in these samples, even after boosting to the jet rest frame.
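
For readers who want to reproduce this kind of check, the two ratios could be computed per jet as sketched below. This is one possible reading of the quantities named above, not the authors' code: it assumes the first ratio is the largest constituent mass divided by the jet mass and the second is the largest constituent mass-to-energy ratio. The function names and the two-constituent toy jet with charged-pion masses are hypothetical.

```python
import numpy as np

def invariant_mass(p4):
    """Mass from rows of (E, px, py, pz); negative m^2 from rounding is clipped."""
    p4 = np.asarray(p4, dtype=float)
    m2 = p4[..., 0] ** 2 - np.sum(p4[..., 1:] ** 2, axis=-1)
    return np.sqrt(np.clip(m2, 0.0, None))

def max_mass_ratios(constituents):
    """Return (max constituent mass / jet mass, max constituent mass / energy)."""
    constituents = np.asarray(constituents, dtype=float)
    m_const = invariant_mass(constituents)
    m_jet = invariant_mass(constituents.sum(axis=0))
    return (float(m_const.max() / m_jet),
            float(np.max(m_const / constituents[:, 0])))

PION_MASS = 0.13957  # GeV, charged pion; used only as a toy input

# Toy jet: two constituents with pion masses and momenta of 50 and 60 GeV.
momenta = np.array([[30.0, 0.0, 40.0],
                    [0.0, 36.0, 48.0]])
energies = np.sqrt(np.sum(momenta ** 2, axis=1) + PION_MASS ** 2)
toy_jet = np.column_stack([energies, momenta])

ratio_to_jet_mass, ratio_to_energy = max_mass_ratios(toy_jet)
```

For this toy jet both ratios are below one percent. Histogramming the two numbers over a jet sample would give distributions of the kind described in the reply.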



Anonymous on 2019-02-03  [id 424]

(in reply to Jennifer Thompson on 2019-01-30 [id 418])

Thanks, that seems clear enough!
