Autoencoder networks, trained only on QCD jets, can be used to search for
anomalies in jet substructure. We show how, based either on images or on
4-vectors, they identify jets from decays of arbitrary heavy resonances. To
control the backgrounds and the underlying systematics we can de-correlate the
jet mass using an adversarial network. Such an adversarial autoencoder allows
for a general and at the same time easily controllable search for new physics.
Ideally, it can be trained and applied to data in the same phase space region,
allowing us to efficiently search for new physics using unsupervised learning.
1 - show the MC statistics are high enough to support the conclusions,
or generate more
*Added event counts also to the dark shower section. We used more than 100k events for each signal and background for the final testing. This means that even the histogram for the 5% least QCD-like jets still contains more than 5000 events.
2 - show or discuss how well the method should work on groomed (pile-up
suppressed) jets
*We now explicitly recommend using PU removal techniques that do not rely on grooming.
3 - show or discuss impact of the detector simulation used
*Similarly, no detailed detector simulation was included. We expect the autoencoder to learn novel jet-shape variables from the distributions of constituents. There is no a priori reason why these jet shapes would suffer from larger effects due to the detector simulation than widely used variables like groomed mass, N-subjettiness, or energy correlation functions. For the practical application of the autoencoder we foresee training on data, making this technique even less subject to differences between data and simulation than ordinary approaches.
4 - address questions/requests for clarification in the attached PDF
(which include the above as the most significant) (I also highlighted
some bits of text which look like typos or may need rephrasing)
+pdf comments: 1808.08979v2_report_attachment-1.pdf (attached)
-> page 5: Statically -> Statistically
-> page 12: added commas: to, for example,
-> page 12: as usually -> as usual
-> page 14: changed order of references 39<->40
-> page 16: added 'to thank' twice to the acknowledgments
page 3: encodes/preserves instead of 'bottleneck
describes the features'.
*Changed to encodes
page 3: Changed 'calorimeter information' to 'pixellated energy'
page 3: Comment on the reliability of DELPHES as a detector simulator. Is pixellation the dominant effect?
*Delphes is now the de facto standard for this kind of phenomenological study, especially for jet substructure. The main effects are indeed a semi-realistic granularisation and smearing of the energy.
page 4: how does this work with groomed jets?
*There is no fundamental reason why this should not work for groomed jets as well. However, we do not suggest applying grooming together with the autoencoder: grooming algorithms make inherent assumptions about the structure of the parton shower, and a major feature of the autoencoder is that it does not need such assumptions. Added a statement along these lines to the text.
page 6: what is used for constituent masses?
*The masses of the constituents are not used for boosting; the 4-vector is directly boosted into the jet rest frame, so effectively the constituent mass squared is E^2 - |p|^2 (zero for massless constituents).
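To make this concrete, here is a minimal numpy sketch (our own illustration, not the paper's actual code) of boosting a raw constituent 4-vector into the jet rest frame; no constituent mass hypothesis enters:

```python
import numpy as np

def boost_to_jet_rest_frame(jet, p):
    """Boost a constituent 4-vector p = (E, px, py, pz) into the rest
    frame of the (moving) jet 4-vector.  The raw 4-vector is boosted
    directly, so the constituent mass is implicitly sqrt(E^2 - |p|^2)."""
    E_j, P_j = jet[0], jet[1:]
    beta = P_j / E_j                       # jet velocity vector
    b2 = beta @ beta
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]                      # beta . p
    E_new = gamma * (p[0] - bp)
    p_new = p[1:] + beta * ((gamma - 1.0) * bp / b2 - gamma * p[0])
    return np.concatenate(([E_new], p_new))
```

Boosting the jet 4-vector itself returns (m_jet, 0, 0, 0), and a massless constituent stays massless after the boost.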
Page 6: why is LoLa needed when the information is already included in the 4-vector?
*The LoLa implementation transforms the 4-vectors such that they include physically relevant information. This helps the network to learn physics and removes symmetries in the phi/eta plane which the network would otherwise have to learn itself. The implementation is discussed further in arXiv:1707.08966 [hep-ph].
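As an illustration of the kind of map such a layer performs, the sketch below computes physics-motivated features from a constituent matrix. This is a simplified, fixed version; the actual LoLa layer of arXiv:1707.08966 also contains trainable linear combinations:

```python
import numpy as np

METRIC = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric

def lola_features(k):
    """Map constituent 4-vectors k (shape (N, 4), rows (E, px, py, pz))
    to physics-motivated features: invariant masses, transverse momenta,
    energies, and pairwise Minkowski distances."""
    m2 = np.einsum('ia,ab,ib->i', k, METRIC, k)   # invariant mass^2 per row
    pt = np.hypot(k[:, 1], k[:, 2])               # transverse momentum
    e = k[:, 0]                                   # energy
    diff = k[:, None, :] - k[None, :, :]          # pairwise differences
    d2 = np.einsum('ija,ab,ijb->ij', diff, METRIC, diff)
    return m2, pt, e, d2
```

Feeding the network these invariants directly removes, for example, the azimuthal symmetry it would otherwise have to learn from the raw 4-vectors.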
Page 8: the question to the data is
page 9: Comment jargon-heaviness.
*We would like to have a couple of sentences detailing the technical implementation for anyone interested.
page 12: How do you know if an analysis technique is orthogonal?
*The anomalous events can be analysed further in multiple ways. One example of an orthogonal feature would be the Njet distribution of the full event, since the autoencoder is a jet-by-jet tagger while Njet is an event-level observable.
page 13: display mean on number of constituents plots (for comparison)
*Added plots with means
"We find (essentially) the same flat jet mass distribution for each slice in the loss function. For instance top decay jets are now collected in the least QCD-like slices and lead to a distinct peak in the jet mass"
*Removed the word flat
1- On page 2, when saying "we can choose our input format to deep
learning analysis tools", do the authors mean here choose an input
format best adapted for deep learning frameworks?
*Added 'This allows us to pick the data format that is best suited to a given problem.'
2- On page 2, regarding the use of jet images: it seems to me that while
jet images have historically been the first representation used in
conjunction with deep learning networks, there is no particular
consensus on which input type is preferred, and in fact there has been
substantial work in exploring other techniques. I would suggest citing
some of these other methods in this paragraph as well, such as:
* arXiv:1702.00748 (cited)
* arXiv:1704.02124 (cited)
* arXiv:1704.08249 (cited)
* arXiv:1710.01305 (cited)
* arXiv:1712.07124 (cited)
* arXiv:1807.04758 (cited)
* arXiv:1810.05165 (added)
*Now all the approaches are cited on page 1. Added a comment in the conclusions that "This technique is also compatible with other jet representations
and network architectures." We only tested images and LoLa because of prior experience, but we currently see no limitation in using the autoencoder on other representations.
3- On page 2, regarding how to address systematic uncertainties: While I
agree that this article presents an interesting angle, using adversarial
networks to study some of these limitations, I think the statement is
too broad. There are certainly other systematic uncertainties beyond
those considered here.
*Also added a statement on refiner networks. Uncertainties that arise from differences between data and simulation can in general be "solved" by training on data, as proposed for the autoencoder. The adversarial training here is only used to control correlations, not to suppress systematic uncertainties.
4- On page 5, equation (2). An obvious downside to this loss function,
and to the jet image approach in general, is that it is very sensitive
to rotations: a small rotation, while leaving the physical properties
mostly unchanged, will lead to a large value of the loss function. A
discussion of this point and whether the authors have any insights into
how it impacts the results would be useful.
*It is true that jet images have problems with rotations. However, our pre-processing rotates jet images into a standardised format. Rotations are even less of a problem for the loss function: it measures the difference between the input image and the same image processed by the autoencoder. Unless the autoencoder learns to rotate, and there are no training incentives for it to do so, there will be no relevant rotations.
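A minimal sketch of the standard centering-and-rotation pre-processing (the details of the paper's actual pipeline may differ; this is only an illustration of why rotated copies of the same jet map to nearly the same image):

```python
import numpy as np

def preprocess(eta, phi, pt):
    """Centre constituents on their pT-weighted centroid and rotate the
    principal axis onto a fixed direction, so jets differing only by a
    rotation end up in (nearly) the same standardised orientation."""
    w = pt / pt.sum()
    eta = eta - w @ eta                   # centre on weighted centroid
    phi = phi - w @ phi
    cov = np.cov(np.vstack([eta, phi]), aweights=w, bias=True)
    axis = np.linalg.eigh(cov)[1][:, -1]  # leading principal axis
    theta = np.arctan2(axis[1], axis[0])
    c, s = np.cos(-theta), np.sin(-theta)
    eta, phi = np.array([[c, -s], [s, c]]) @ np.vstack([eta, phi])
    return eta, phi, pt
```

After this step, an input rotation only changes the image up to the residual discrete (reflection) ambiguity of the principal axis.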
5- On page 6, equation (3). The (kμ,i) matrix is not IRC safe: for
example, a collinear splitting will result in a reshuffle of the
columns, as well as a change of the values in eight of the entries. Did
the authors study the impact of this unsafety?
*The authors are aware that LoLa is inherently IRC unsafe. Again we would like to point out the ongoing top-tagging comparison, and especially the EFN vs PFN result: most of the recent gains in performance through machine learning seem to be inherently IRC unsafe. This is a very interesting question for further study, but beyond the scope of this article. See also the previous and next comments.
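To make the referee's point concrete, a toy numpy illustration (our own numbers) of why the constituent matrix is not collinear-safe: splitting one massless constituent into two collinear halves leaves the jet 4-momentum unchanged but changes the shape and entries of the matrix the network sees:

```python
import numpy as np

# original jet: two massless constituents
k = np.array([[40., 0., 0., 40.],
              [30., 30., 0., 0.]])

# collinear splitting of the second constituent into two equal halves:
# same total 4-momentum, different constituent matrix
split = np.array([[40., 0., 0., 40.],
                  [15., 15., 0., 0.],
                  [15., 15., 0., 0.]])

assert np.allclose(k.sum(axis=0), split.sum(axis=0))  # jet unchanged
assert k.shape != split.shape                         # input matrix changed
```

A fixed-size network input built from such a matrix therefore responds to physically irrelevant splittings, which is the IRC-unsafety the referee refers to.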
6- On page 7, equation (7). Since the matrix compared before and after
autoencoding can change substantially due to effects that are not
physically relevant, e.g. soft or collinear splittings, does this impact
the performance of the loss function?
similar to point 4
*No. The loss function does not compare two "identical" jets where jet A underwent a soft splitting and jet B did not; it compares the initial jet A with A' = decoder(encoder(A)), and B with B'. The autoencoder does not induce splittings, so there is no place where these effects could occur. It only matters that the encoding/decoding procedure works equally well for A and B if they are physically similar, which is ensured by the large training statistics.
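The structure of the anomaly score makes this explicit. A minimal sketch with toy encoder/decoder functions (purely illustrative, not the trained networks): each jet is compared only with its own reconstruction, never with a soft- or collinear-modified partner:

```python
import numpy as np

def reco_loss(x, encoder, decoder):
    """Per-jet anomaly score: loss(A) = mean || A - decoder(encoder(A)) ||^2.
    The comparison is always between a jet and its own reconstruction."""
    x_hat = decoder(encoder(x))
    return np.mean((x - x_hat) ** 2)

# toy bottleneck: keep the first two entries, pad the rest with zeros
encoder = lambda x: x[:2]
decoder = lambda z: np.concatenate([z, np.zeros(2)])
```

A jet living in the space the bottleneck can represent reconstructs perfectly (loss 0); a jet with structure outside it gets a large loss.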
7- On page 8. It would be interesting to see this study done on groomed
jets, to remove the impact of soft wide angle partons on the jet mass
considered as input.
*Same reply as for point 2 of the first report.
8- On page 8, just after the middle of the page: "We know from many
studies that the jet mass is the single most powerful observable in
separating QCD jets from hadronically decaying heavy states". This is
only true at parton level, without considering non-perturbative or
pile-up effects. Otherwise, some of the many studies should be cited.
*Changed to "a powerful observable".
9- On page 12, the last sentence of section 2 is missing a "to".
10- On page 16, the second sentence of the second paragraph in the
Outlook section is missing an "of".
(1) The authors should add a discussion about experimental effects such
as imperfect calibration and badly measured input objects
(2) The authors should directly test the impact of imperfect calibration
and badly measured input objects by injecting a known miscalibration to
the particle-flow objects to see the impact on the weakly-supervised
adversarial network. It would build confidence if such effects were
shown to be negligible or could be mitigated in some fashion.
* Autoencoding is not a weakly supervised but an unsupervised technique. The final discriminant is trained and evaluated on data. Therefore, imperfect calibration or badly measured objects do not introduce any data-versus-simulation effects.