SciPost Submission Page
Symmetries, Safety, and Self-Supervision
by Barry M. Dillon, Gregor Kasieczka, Hans Olischlager, Tilman Plehn, Peter Sorrenson, and Lorenz Vogel
Submission summary
Authors (as registered SciPost users): Barry Dillon · Tilman Plehn · Lorenz Vogel

Submission information
Preprint Link: scipost_202108_00046v2 (pdf)
Date accepted: 2022-04-08
Date submitted: 2022-01-18 19:16
Submitted by: Dillon, Barry
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Approaches: Theoretical, Phenomenological
Abstract
Collider searches face the challenge of defining a representation of high-dimensional data such that physical symmetries are manifest, the discriminating features are retained, and the choice of representation is new-physics agnostic. We introduce JetCLR to solve the mapping from low-level data to optimized observables through self-supervised contrastive learning. As an example, we construct a data representation for top and QCD jets using a permutation-invariant transformer-encoder network and visualize its symmetry properties. We compare the JetCLR representation with alternative representations using linear classifier tests and find it to work quite well.
Author comments upon resubmission
List of changes
CHANGES FROM REPORT 1
1 - A sentence has been added in the second paragraph of section 2. A description of what the jet constituents represent has been added to the third paragraph of section 2.
2 - Added a few sentences at the beginning of section 4.
3 - References added on page 5.
4 - While it's true that the eta distribution of the jets is not flat, the physical properties of the jet, such as mass or subjettiness, are invariant to shifts in eta. Since we look at individual jets alone rather than whole events, we expect the physical properties to depend only on the differences in eta and phi between the constituents within the jet, much like jet angularities or the energy flow polynomials.
While applying the augmentations we do centre the jets initially in the eta-phi plane, then perform a rotation around the centre of the jet, and then a random translation in eta-phi. The initial centring is for convenience in doing the rotation, and so that we can compare the case of using just centred jets with that of teaching the network to produce translation-invariant representations. A sentence has been added on page 5 to make this clearer. A schematic sketch of this augmentation chain is included after this list.
5 - Two sentences have been added to the beginning of section 3 to clarify this.
6 - The discussion in the "Attention" subsection has been extended and hopefully written in a clearer way. The vector q_i is simply the query matrix multiplied by the vector x_i; this is now clearer from the text. A schematic sketch of the attention step is also included after this list.
7 - References added, now on page 10.
8 - This is a very interesting question. So far we have only looked in detail at individual jets, so the augmentations are driven towards enforcing symmetries that are completely agnostic to the event-level dynamics. If we were to look at whole events, we would certainly need to reconsider the types of augmentations and symmetries we include. The most obvious of these are the translations in eta. So, the proposed set-up is only universal in the context of studying individual jets. More work needs to be done when applying the technique to multi-jet or whole events.
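To illustrate point 4 above, here is a minimal numpy sketch of the augmentation chain: centring a jet in the eta-phi plane, rotating it about its centre, and applying a random translation. The array conventions, function name, and shift range are our own illustrative choices, not the paper's code:

```python
import numpy as np

def augment_jet(pt, eta, phi, rng, max_shift=1.0):
    """Centre a jet in the eta-phi plane, rotate it about its centre,
    then apply a random translation in eta-phi."""
    # pT-weighted centroid defines the jet centre
    eta = eta - np.average(eta, weights=pt)
    phi = phi - np.average(phi, weights=pt)
    # random rotation about the jet centre
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    eta, phi = c * eta - s * phi, s * eta + c * phi
    # random translation in eta and phi
    eta = eta + rng.uniform(-max_shift, max_shift)
    phi = phi + rng.uniform(-max_shift, max_shift)
    return pt, eta, phi
```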
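And to illustrate point 6, a minimal sketch of one scaled dot-product self-attention step over jet constituents, in which each constituent vector x_i is mapped to a query q_i by the query matrix. This is the textbook transformer operation under standard conventions, not the paper's exact implementation:

```python
import numpy as np

def self_attention(X, Q, K, V):
    """X: (n_constituents, d) feature vectors x_i; Q, K, V: (d, d_k) matrices.
    Row i of X @ Q is q_i, the query matrix applied to x_i."""
    queries, keys, values = X @ Q, X @ K, X @ V
    scores = queries @ keys.T / np.sqrt(K.shape[1])   # scaled dot products
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over constituents
    return weights @ values                           # attended constituent features
```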
CHANGES FROM REPORT 2
1 - Interesting point. The only reason we limit to the 50 hardest constituents is computational efficiency. We found that once the soft+collinear augmentations and IR-safe masking were added, the representations were automatically insensitive to the lower-pT constituents. In our application we used the top-tagging dataset from 1902.09914, which is widely used as a benchmark. Here MPI and pile-up are ignored, so that the tagging performance and the ability to remove MPI/pile-up effects are kept separate. It would definitely be interesting to use representation learning in the presence of MPI and pile-up, and to test ways of removing these effects using self-supervision. A few sentences have been added to page 3 to explain this. A schematic sketch of the constituent truncation is included after this list.
2 - Below eq. 1 we have added a short paragraph to set up the discussion in this section.
3 - These papers were already cited in the introduction and on page 8, but a further citation has now been added to the beginning of section 3. Note that the energy-flow-network citation was accidentally missed in the first arXiv submission, but was added for the SciPost submission.
4 - The discussion in the "Attention" subsection has been extended and hopefully written in a clearer way. We feel that the attention mechanism deserves some explanation here, so we have attempted to clarify its importance in extracting correlations between jet constituents. We have also tried to make the structure of the mechanism clearer in the text, and amended figure 2 slightly so that the indices on the attention vectors are more informative.
5 - Thanks for the comment!
6 - A paragraph and some additional comments have been added to the beginning of section 4. A hint of the “very different applications” is also added.
7 - A paragraph describing the ROC curve is given at the beginning of the JetCLR subsection in section 4.
8 - The EFPs are calculated using the code from the energyflow package presented in the EFP paper; we have now stated this on page 12. It is already mentioned in the JetCLR performance section that the EFPs perform better using a linear discriminant analysis classifier, with a much more detailed comparison in the appendix. However, we have added a few more sentences on this comparison at the end of section 4. We do not comment on the effect that the implicit linearity of the EFPs has on their performance with a linear classifier, only noting that the performance varies a lot depending on the linear classifier used.
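For reference on point 8, a minimal self-contained sketch of computing EFP inputs with the energyflow package and classifying them with scikit-learn's linear discriminant analysis. The degree cut, measure options, and toy data are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
import energyflow as ef
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# toy stand-in jets: arrays of constituents in the (pT, y, phi) hadronic measure
jets = [rng.random((50, 3)) for _ in range(200)]
labels = rng.integers(0, 2, 200)            # 1 for top, 0 for QCD (toy labels)

efpset = ef.EFPSet('d<=5', measure='hadr', beta=1, normed=True)
X = efpset.batch_compute(jets)              # (n_jets, n_efps) EFP feature matrix

lda = LinearDiscriminantAnalysis()
lda.fit(X[:150], labels[:150])              # linear classifier test on EFP inputs
scores = lda.decision_function(X[150:])     # discriminant output for held-out jets
```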
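And, as promised under point 1, a minimal sketch of keeping only the 50 hardest constituents of a jet; the zero-padding convention is our assumption:

```python
import numpy as np

def hardest_constituents(constituents, n_max=50):
    """constituents: (n, 3) array of (pT, eta, phi).
    Keep the n_max highest-pT constituents; zero-pad shorter jets."""
    order = np.argsort(constituents[:, 0])[::-1]   # sort by descending pT
    hard = constituents[order[:n_max]]
    padded = np.zeros((n_max, 3))
    padded[:len(hard)] = hard                      # zeros mark absent constituents
    return padded
```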
The points listed under ‘Requested changes’ are covered in the points above.
CHANGES FROM REPORT 3
None required.
CHANGES FROM REPORT 4
1 - This is an interesting point; however, we did not explore alternative distance measures in the representation space. There is a brief discussion of this in the 'Uniformity vs alignment' subsection in section 2, and we have added a few sentences here to clarify the role of the distance measure in eq. 6. A schematic sketch of such a contrastive loss is included after this list.
2 - In fig. 4, 'test data' refers to the data held aside for the linear classifier test, which is used to calculate the performance curves shown in green. This linear classifier test is re-done every 10 epochs, which is the source of the slight variation in the green curve. The red curve, however, is calculated from the training data used to train the JetCLR transformer, which in turn produces the representations for the test data used in the linear classifier. So the red curve is an indication of how well trained the JetCLR model is, while the green curve is an indication of the quality of the representations from JetCLR at each epoch.
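To make the role of the distance measure in point 1 concrete, a minimal sketch of an NT-Xent-style contrastive loss for a single positive pair of representations, with cosine similarity as the distance measure; the temperature value and all names here are ours, not necessarily the paper's eq. 6 conventions:

```python
import numpy as np

def contrastive_loss(z_i, z_j, negatives, temperature=0.5):
    """NT-Xent-style loss for a positive pair (z_i, z_j) of representations;
    `negatives` are the other representations in the batch. The distance
    measure enters only through the cosine similarity."""
    def cos_sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos_sim(z_i, z_j) / temperature)
    neg = sum(np.exp(cos_sim(z_i, z_k) / temperature) for z_k in negatives)
    return -np.log(pos / (pos + neg))
```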
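And for point 2, a minimal sketch of the linear classifier test on frozen JetCLR representations; the choice of scikit-learn's logistic regression and of ROC AUC as the summary metric is ours:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def linear_classifier_test(reps_train, y_train, reps_test, y_test):
    """Fit a linear classifier on representations of the training jets,
    then score it on the held-aside test representations."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(reps_train, y_train)
    probs = clf.predict_proba(reps_test)[:, 1]
    return roc_auc_score(y_test, probs)   # quality measure of the representations
```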
Published as SciPost Phys. 12, 188 (2022)
Reports on this Submission
Report #2 by Anonymous (Referee 6) on 2022-1-22 (Invited Report)
- Cite as: Anonymous, Report on scipost_202108_00046v2, delivered 2022-01-22, doi: 10.21468/SciPost.Report.4219
Report
I would like to thank the authors for carefully addressing my comments (report 1). I am happy to recommend the publication of the paper in its present form. In contrast to the comment by Andy Buckley, I consider the work quite innovative, in that it transfers and adopts a novel and cutting-edge ML method to a challenging and relevant application in HEP, i.e. jet representations. I clearly recommend publishing the article in SciPost Physics!
Report #1 by Andy Buckley (Referee 2) on 2022-1-19 (Invited Report)
- Cite as: Andy Buckley, Report on scipost_202108_00046v2, delivered 2022-01-19, doi: 10.21468/SciPost.Report.4200
Strengths
My points from the first reporting round have been implemented satisfactorily, from a quick review of the response and the updated document.
Report
While a good paper, with strong practical advice on this method for encoding physical symmetries via augmentations and permutation invariance, I am not sure its significance meets the criteria given for SciPost Physics: with reference to the criteria, it is not a "groundbreaking discovery" or novel link between research areas, and while an interesting approach with apparent potential, it's a stretch to call this a "breakthrough on a previously-identified and long-standing research stumbling block" or obviously opening "a new pathway"... as perhaps hinted by the (humorous?) conclusion to the abstract. My feeling is that, inasmuch as the distinction matters, according to the strict criteria this would be more appropriate for acceptance without further iteration or review in SciPost Physics Core. From the authors' point of view, I don't think the specific imprint should make any significant practical difference.
Author: Barry Dillon on 2022-01-21 [id 2117]
(in reply to Report 1 by Andy Buckley on 2022-01-19)
While we are grateful to the referee for their comments, we strongly disagree with the referee's conclusion that this paper does not meet the standards for SciPost Physics. We address the referee's criticisms below:
1 - This paper proposes an entirely new method for training neural networks to have approximate invariance to a pre-defined set of transformations or augmentations of the data. We demonstrated the effectiveness of this method through a comparison with the best and most widely used data representations in a linear classifier test. Whether or not this constitutes a "ground-breaking discovery" is of course subjective, but in the application of machine-learning tools to particle phenomenology we certainly feel that it does.
2 - The method builds on cutting edge research on self-supervision from the machine-learning community and is therefore clearly a "novel link between research areas".
3 - Constructing symmetry-invariant representations within deep learning tools has been a "long-standing problem" in particle phenomenology. For example, in the paper we describe the inadequacies of the most widely used representation of jet data in machine-learning applications, the jet images, and how they are preprocessed to achieve rotational and translational invariance. We then explicitly compare our invariant JetCLR representations to the jet images in the linear classifier test.
4 - Given that this is the first paper to propose and demonstrate that self-supervision (in particular contrastive learning) can be used to enforce approximate invariance to transformations or augmentations of the data in neural networks, this paper clearly opens "a new pathway" for research in machine-learning applications to particle phenomenology problems. While not stated explicitly in the paper, there are many other potential applications of this work in particle phenomenology, and possibly beyond.
5 - Our senior author would like to point out that by enforcing the quoted conditions the way the referee is doing it, it is unlikely that ATLAS or CMS would have published any paper since 2012. From the feedback we have received from the theory and experimental community, and from our own judgement of conceptual machine-learning progress, we are convinced that our paper should, without any doubt, be published in SciPost Physics.
Anonymous on 2022-01-19 [id 2106]
I recommend to publish the article in its current version.