SciPost Submission Page

CapsNets Continuing the Convolutional Quest

by Sascha Diefenbacher, Hermann Frost, Gregor Kasieczka, Tilman Plehn, Jennifer M. Thompson

This Submission thread is now published as SciPost Phys. 8, 023 (2020)

Submission summary

As Contributors: Sascha Diefenbacher · Tilman Plehn · Jennifer Thompson
Arxiv Link: (pdf)
Date accepted: 2019-12-02
Date submitted: 2019-11-04 01:00
Submitted by: Diefenbacher, Sascha
Submitted to: SciPost Physics
Academic field: Physics
  • High-Energy Physics - Phenomenology
Approaches: Theoretical, Experimental


Capsule networks are ideal tools to combine event-level and subjet information at the LHC. After benchmarking our capsule network against standard convolutional networks, we show how multi-class capsules extract a resonance decaying to top quarks from both the QCD di-jet and the top continuum backgrounds. We then show how its results can be easily interpreted. Finally, we use associated top-Higgs production to demonstrate that capsule networks can work on overlaying images to go beyond calorimeter information.

Ontology / Topics

Large Hadron Collider (LHC) · Quantum chromodynamics (QCD)

Author comments upon resubmission

We would like to thank the referee for their interest and careful reading of our paper, as well as their useful comments. We have clarified some points in our paper and addressed the points the referee has raised. Please find attached our list of changes.

List of changes

1 - Add discussion of differences between original capsule implementation and
the current method.
- Both implementations are identical by design. To clarify this, we added:
``Analogous to the original capsule paper, we transition between
convolutional and capsule part by re-shaping...'' in section 4.

2 - Add information about how many routings were used.
- We used 3 routings, as was shown to be optimal in other
studies. We have rephrased a sentence to reflect this:
``We repeated this for a chosen number of routings, where
three iterations have in other studies given the best results''
now reads
``We repeated this for 3 routings, which has been
shown in other studies to give the best results''
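
For readers unfamiliar with the routing step mentioned in point 2, the following is a minimal NumPy sketch of routing-by-agreement with three iterations, in the spirit of the original capsule paper. It is an illustration only, not the code used for the paper; the function names, the array layout of `u_hat`, and the small `eps` regulator are assumptions made for this example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors to length < 1 while keeping their direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_routings=3):
    """Routing-by-agreement.

    u_hat: prediction vectors with (assumed) shape (n_in, n_out, dim),
           i.e. what each input capsule predicts for each output capsule.
    Returns the output capsule vectors with shape (n_out, dim).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits
    for _ in range(n_routings):
        c = softmax(b, axis=1)                     # coupling coefficients
        s = np.sum(c[:, :, None] * u_hat, axis=0)  # weighted sum over inputs
        v = squash(s)                              # output capsule vectors
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)  # agreement update
    return v
```

The length of each returned vector can be read as the activation of the corresponding output capsule, which is why the squashing keeps it below one.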

3 - More information about the preprocessing. The authors
mention that CapsNets need less preprocessing, and it sounds like
only scaling the images so the most intense pixel has a value of 1.0
was done. Is this the same for the benchmarks of the Rutgers DeepTop
Taggers as done here?
- We have added: ``In contrast to the minimal pre-processing
we use for the event image capsule network, for the Rutgers tagger and the
jet image capsule network we employ
the full pre-processing for each jet as described in Ref. [10].
The jets are selected and centered around the $p_T$ weighted centroid of the
jet, and rotated such that the major principal axis is vertical.
The image is then flipped to ensure that the maximum activity is in
the upper-right-hand quadrant. Finally, the images are pixelated and normalized.''
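
The pre-processing steps quoted in point 3 can be sketched roughly as follows. This is a hedged illustration, not the procedure of Ref. [10] or the authors' code: the function name `preprocess_jet`, the image size `n_pix`, the window `half_width`, the choice to normalize to the most intense pixel, and the assumption that `phi` is already given relative to the jet axis (no wrap-around handling) are all assumptions made for this example.

```python
import numpy as np

def preprocess_jet(eta, phi, pt, n_pix=40, half_width=1.0):
    """Centre, rotate, flip, pixelate and normalize one jet (illustrative)."""
    eta, phi, pt = (np.asarray(a, dtype=float) for a in (eta, phi, pt))

    # 1. Centre on the pT-weighted centroid.
    x = eta - np.average(eta, weights=pt)
    y = phi - np.average(phi, weights=pt)

    # 2. Rotate so that the major principal axis is vertical.
    pts = np.vstack([x, y])
    cov = (pts * pt) @ pts.T / pt.sum()          # pT-weighted covariance
    evals, evecs = np.linalg.eigh(cov)
    major = evecs[:, np.argmax(evals)]
    theta = np.arctan2(major[0], major[1])       # angle from the y axis
    c, s = np.cos(theta), np.sin(theta)
    x, y = c * x - s * y, s * x + c * y

    # 3. Flip so that the quadrant with the most pT is the upper-right one.
    best_sum, best_signs = -1.0, (1, 1)
    for sx in (1, -1):
        for sy in (1, -1):
            q = pt[(sx * x > 0) & (sy * y > 0)].sum()
            if q > best_sum:
                best_sum, best_signs = q, (sx, sy)
    x, y = best_signs[0] * x, best_signs[1] * y

    # 4. Pixelate; the first image axis is x, the second is y.
    window = [[-half_width, half_width], [-half_width, half_width]]
    img, _, _ = np.histogram2d(x, y, bins=n_pix, range=window, weights=pt)

    # 5. Normalize (assumption: scale the most intense pixel to 1).
    if img.max() > 0:
        img = img / img.max()
    return img
```

By contrast, the minimal pre-processing discussed in the referee's question would correspond to steps 4 and 5 alone.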

4 - Compare CapsNets to networks of similar architecture but without
the capsules, for the ``Pooling CapsNets'' architecture of Figure 6.
- We have now made this comparison, and have included the plot in a response to the report.
We observe a small but persistent increase in performance by using
capsule networks over a dense network with a similar architecture. This
comes along with the advantage of having the capsule vectors themselves,
which provide a window into how the network is making decisions.

5 - Consider adding a W′ signal to see how CapsNets deal with a signal
that has some substructure and similar kinematics. While
pre-selection should be able to deal with some of the differences here, it
would be useful for the study of the CapsNets themselves.
- A W' analysis would be an interesting new application for our
network, but it would be a whole project in itself and falls
outside the scope of our current publication.

6 - Some mention of how the results of [43] compare to the $t\bar{t}H$ classifier used here.
- We have added ``For this set-up we find comparable performance to
Ref. [43], with an AUC of 0.715, which is slightly above their
upper limit.''

7 - [Optional] Publicly available code or code snippets.
- Unfortunately, we are unable to dedicate the time needed to prepare a public version of the code that would be useful to the community.

Submission & Refereeing History

Resubmission 1906.11265v3 on 4 November 2019

Reports on this Submission

Anonymous Report 1 on 2019-11-14 (Invited Report)


The authors have performed the checks which I asked for. I am now satisfied and happy to recommend this paper for publication.

