SciPost Submission Page
CapsNets Continuing the Convolutional Quest
by Sascha Diefenbacher, Hermann Frost, Gregor Kasieczka, Tilman Plehn, Jennifer M. Thompson
As Contributors: Tilman Plehn · Jennifer Thompson
Submitted by: Thompson, Jennifer
Submitted to: SciPost Physics
Subject area: High-Energy Physics - Phenomenology
Capsule networks are ideal tools to combine event-level and subjet information at the LHC. After benchmarking our capsule network against standard convolutional networks, we show how multi-class capsules extract a resonance decaying to top quarks from both the QCD di-jet and the top continuum backgrounds. We then show how its results can be easily interpreted. Finally, we use associated top-Higgs production to demonstrate that capsule networks can work on overlaying images to go beyond calorimeter information.
Author comments upon resubmission
List of changes
1 - Add discussion of differences between original capsule implementation and
the current method.
- Both implementations are identical by design. To clarify this, we added:
``Analogous to the original capsule paper, we transition between
convolutional and capsule part by re-shaping...'' in section 4.
2 - Add information about how many routings were used.
- We used 3 routings, as was shown to be optimal in other
studies. To reflect this, we have rephrased the sentence
``We repeated this for a chosen number of routings, where
three iterations have in other studies given the best results''
to
``We repeated this for 3 routings, which has been
shown in other studies to give the best results''
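
For readers unfamiliar with the routing step referred to in point 2, the following is a minimal NumPy sketch of routing-by-agreement with three iterations, following the general scheme of the original capsule paper. It is an illustration only and not the code used for the paper; the function names (`squash`, `route`), the array layout, and the toy input sizes are assumptions made for this example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Rescale vectors to length in [0, 1) while keeping their direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def route(u_hat, n_routings=3):
    """Routing-by-agreement.

    u_hat: prediction vectors of shape (n_in, n_out, dim), i.e. what each
           input capsule predicts for each output capsule (assumed layout).
    Returns the output capsule vectors of shape (n_out, dim).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                     # routing logits
    for _ in range(n_routings):
        c = softmax(b, axis=1)                      # couplings, sum to 1 over outputs
        s = np.sum(c[:, :, None] * u_hat, axis=0)   # weighted sum over input capsules
        v = squash(s)                               # output capsule vectors
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)  # agreement update
    return v

# Toy usage: 8 input capsules, 3 output capsules, 4-dimensional vectors.
rng = np.random.default_rng(0)
v = route(rng.normal(size=(8, 3, 4)), n_routings=3)
print(v.shape)
```

The number of routings enters only through the loop count, so changing `n_routings` is enough to compare different choices in such a sketch.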
3 - More information about the preprocessing. The authors
mention that CapsNets need less preprocessing, and it sounds like
only scaling the images so the most intense pixel has a value of 1.0
was done. Is this the same for the benchmarks of the Rutgers DeepTop
Taggers as done here?
- We have added: ``In contrast to the minimal pre-processing
we use for the event image capsule network, for the Rutgers tagger and the
jet image capsule network we employ
the full pre-processing for each jet as described in Ref. .
The jets are selected and centered around the $p_T$ weighted centroid of the
jet, and rotated such that the major principal axis is vertical.
The image is then flipped to ensure that the maximum activity is in
the upper-right-hand quadrant. Finally, the images are pixelated and normalized.''
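
The pre-processing steps quoted in point 3 can be sketched as follows. This is a hedged illustration in NumPy, not the implementation of the cited reference: the function name `preprocess_jet`, the image size `n_pix`, the image half-width, the reading of the flip as "more $p_T$ in the right and upper halves", and the normalization to a maximum pixel value of 1 are all assumptions made for this example. Azimuthal wrap-around is ignored, i.e. `phi` is assumed to be given relative to the jet axis.

```python
import numpy as np

def preprocess_jet(eta, phi, pt, n_pix=40, half_width=1.0):
    """Center, rotate, flip, pixelate and normalize one jet (illustrative)."""
    eta, phi, pt = (np.asarray(a, dtype=float) for a in (eta, phi, pt))

    # 1) Center on the pT-weighted centroid.
    x = eta - np.average(eta, weights=pt)
    y = phi - np.average(phi, weights=pt)

    # 2) Rotate so that the major principal axis is vertical.
    xy = np.vstack([x, y])
    cov = (pt * xy) @ xy.T / pt.sum()        # pT-weighted 2x2 covariance
    _, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    major = eigvec[:, -1]                    # axis with the largest spread
    theta = np.arctan2(major[0], major[1])   # angle of that axis from +y
    c, s = np.cos(theta), np.sin(theta)
    x, y = c * x - s * y, s * x + c * y

    # 3) Flip so that most activity is in the upper-right quadrant
    #    (assumed here to mean: more pT at x > 0 and at y > 0).
    if pt[x < 0].sum() > pt[x > 0].sum():
        x = -x
    if pt[y < 0].sum() > pt[y > 0].sum():
        y = -y

    # 4) Pixelate: pT-weighted 2D histogram, first array axis is x.
    edges = np.linspace(-half_width, half_width, n_pix + 1)
    image, _, _ = np.histogram2d(x, y, bins=(edges, edges), weights=pt)

    # 5) Normalize (assumption: most intense pixel is set to 1).
    peak = image.max()
    return image / peak if peak > 0 else image

# Toy usage with random constituents.
rng = np.random.default_rng(1)
img = preprocess_jet(rng.normal(0.3, 0.2, 50),
                     rng.normal(-0.5, 0.1, 50),
                     rng.uniform(1.0, 10.0, 50))
print(img.shape)
```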
4 - Compare CapsNets to networks of similar architecture but without
the Capsules for the ``Pooling CapsNets'' architecture of Figure 6.
- We have now made this comparison, and have included the plot in a response to the report.
We observe a small but persistent increase in performance by using
capsule networks over a dense network with a similar architecture. This
comes along with the advantage of having the capsule vectors themselves,
which provide a window into how the network is making decisions.
5 - Consider adding a W′ signal to see how CapsNets deal with a signal
which has some substructure and similar kinematics. While
pre-selection should be able to deal with some of the differences here, it
would be useful for the study of the CapsNets themselves.
- A W' analysis would be an interesting new application for our
network, but it would be a whole project in itself and falls
outside the scope of our current publication.
6 - Some mention of how the results of  compare to the t¯tH classifier used here.
- We have added ``For this set-up we find comparable performance to
Ref. , with an AUC of 0.715, which is slightly above their
7 - [Optional] Publicly available code or code snippets.
- Unfortunately, we are unable to dedicate the time needed to prepare a public code release that would be useful to the community.