SciPost Submission Page
Reports of My Demise Are Greatly Exaggerated: $N$-subjettiness Taggers Take On Jet Images
by Liam Moore, Karl Nordström, Sreedevi Varma, Malcolm Fairbairn
As Contributors: Sreedevi Varma
Submitted by: Varma, Sreedevi
Submitted to: SciPost Physics
Domain(s): Exp., Theor. & Comp.
Subject area: High-Energy Physics - Phenomenology
We compare the performance of a convolutional neural network (CNN) trained on jet images with dense neural networks (DNNs) trained on $N$-subjettiness variables to study the distinguishing power of these two separate techniques applied to top quark decays. We find that they perform almost identically and are highly correlated once jet mass information is included, which suggests they are accessing the same underlying information; this information can be intuitively understood as being contained in 4-, 5-, 6-, and 8-body kinematic phase spaces, depending on the sample. This suggests both of these methods are highly useful for heavy object tagging and provides a tentative answer to the question of what the image network is actually learning.
Submission & Refereeing History
Reports on this Submission
Anonymous Report 1 on 2019-6-27 (Invited Report)
Strengths:
1. A robust comparison of different representations of jets for machine learning is provided.
2. The type(s) of information used by the machine learning methods is probed.
3. The importance of mass smearing issues with standard pre-processing techniques is clearly discussed.
4. The writing in the paper is clear and understandable.
Weaknesses:
1. The comparison is restricted to calorimeter/infrared-safe information, without significant discussion of how additional information (counts, charge, flavor) can change the conclusions.
2. Important experimental issues such as pileup and detector effects are not included.
3. A more detailed discussion of the physics of different mass smearing effects is lacking.
Report:
This paper provides a direct comparison of two very different feature representations of jets for machine learning applications: jet images with convolutional neural networks (CNNs) and a basis of N-subjettiness observables with dense neural networks (DNNs). The two jet representations are found to provide consistent performance, using the identification of hadronic top decays in the semi- and highly-boosted limits as a case study. Further, the relevance of jet mass information (smeared by pre-processing steps) for tagging resonant decays is highlighted quantitatively.
This paper tackles the important question of how to explore different jet representations and what information the machine is using. These issues will become increasingly important as these powerful modern machine learning tools advance toward use in experimental analyses. I believe that this paper will be suitable for publication in SciPost once the questions and points discussed below are suitably addressed.
As the studies in the paper appear robust and technically sound, I think it is only necessary to add discussions and clarifications in response to the questions and comments below.
Questions and comments:
1. What determined the pixelization choices (51x51 and 37x37)? Are these intended to reflect experimental resolutions, to optimize the training of the CNN, or something else?
2. Translating the maximum-pT pixel to (eta,phi)=(0,0) is not equivalent to subtracting the pT-weighted centroid, i.e. translating the centroid to (eta,phi)=(0,0). Both of these are sensible pre-processing choices, but they are not the same (as claimed in the paper) since the max pixel and the pT-mean pixel may be different. It is worth clarifying this in the text.
3. How do the conclusions change with the introduction of additional information (particle types, counts, charges, etc.)? How can a similar comparison study be done, even in principle? The jet images can include multiple colour channels to accommodate this information, whereas the N-subjettiness observables cannot easily do this.
4. The importance of jet mass is a crucial element of this paper. There are different pre-processing steps that smear the jet mass to different extents, which would be important to enumerate in the paper. For instance, normalizing the pT to 1 impacts both the jet images and the N-subjettiness observables. Discretization affects both CNN and CNN1. Rotation affects only CNN1.
5. If the pT were divided by a constant (e.g. pT->pT/400 GeV for the low pT sample) instead of being normalized to sum to 1, it would provide some of the preprocessing benefits without significantly smearing the mass. Would the authors advocate for something like this instead of adding the mass value back in separately?
6. What is the scope of your conclusions about observable bases vs. image representations and the relevance of mass? It is worth spelling this out more clearly in the conclusions. To me, the conclusions appear generally relevant for using infrared-safe information to identify multi-prong topologies originating from resonant decays.
7. The question of "What is the machine learning?" is a broad one of great recent interest with these black-box methods. This paper appears to probe the slightly more narrow question of "What information can the machine be using?" by restricting to N-body kinematics. This is a semantic issue, but it may be worth clarifying in the introduction that this is specifically what will be explored in the paper.
Small typos or wording issues:
1. p. 4: "shown in 1." should be "shown in Figure 1."
2. p. 6: "50 epochs . As". There is a period issue here, and the next sentence appears to be a non sequitur.
3. p. 7: "The output layer are two nodes" sounds a little odd. Perhaps "consists of two nodes" would be more suitable?
4. Fig. 6: "The image networks once again under-performs" would be better as "under-perform".
5. p. 8: "Figures ??" should be something like "Figures 7-9".
Requested changes:
1. Discuss the choice of pixelization and revise the discussion of translations in Sec. 2.2.
2. Include additional discussion of the mass smearing for the different methods.
3. Add discussion about how additional information (particle-type, charge, counts, etc.) would affect the conclusions.
4. Spell out the scope and impact of the conclusions more clearly in Sec. 4.
5. Clarify the aspect(s) of "what the machine is learning" that will be probed in the paper.
6. In the introduction, include some relevant references to the alluded-to experimental uses of refs. [2-9].