SciPost Submission Page

CaloDREAM -- Detector Response Emulation via Attentive flow Matching

by Luigi Favaro, Ayodele Ore, Sofia Palacios Schweitzer, Tilman Plehn

Submission summary

Authors (as registered SciPost users): Luigi Favaro · Ayodele Ore · Sofia Palacios Schweitzer · Tilman Plehn
Submission information
Preprint Link: https://arxiv.org/abs/2405.09629v2  (pdf)
Code repository: https://github.com/heidelberg-hepml/calo_dreamer
Data repository: https://calochallenge.github.io/homepage/
Date submitted: 2024-05-31 22:40
Submitted by: Ore, Ayodele
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approach: Computational

Abstract

Detector simulations are an exciting application of modern generative networks. Their sparse high-dimensional data combined with the required precision poses a serious challenge. We show how combining Conditional Flow Matching with transformer elements allows us to simulate the detector phase space reliably. Namely, we use an autoregressive transformer to simulate the energy of each layer, and a vision transformer for the high-dimensional voxel distributions. We show how dimension reduction via latent diffusion allows us to train more efficiently and how diffusion networks can be evaluated faster with bespoke solvers. We showcase our framework, CaloDREAM, on datasets 2 and 3 of the CaloChallenge.
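As a rough orientation for readers, the following is a minimal sketch of a conditional flow matching objective with a linear interpolation path; `model` and `cond` are generic placeholders, and the networks, conditioning, and preprocessing used in the paper are more elaborate.

```python
import torch

def cfm_loss(model, x1, cond):
    """Minimal CFM objective: regress the constant velocity of a straight
    path from Gaussian noise x0 to a data sample x1 (shape: batch x dim)."""
    x0 = torch.randn_like(x1)        # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1)   # one uniform time per sample
    xt = (1 - t) * x0 + t * x1       # point on the linear path at time t
    v_target = x1 - x0               # velocity of the linear path
    v_pred = model(xt, t, cond)      # network predicts the velocity field
    return ((v_pred - v_target) ** 2).mean()
```

Sampling then amounts to integrating the learned velocity field from noise at t = 0 to data at t = 1, which is where the fast bespoke solvers mentioned in the abstract enter.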

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission

Reports on this Submission

Report #4 by Anonymous (Referee 2) on 2024-7-16 (Invited Report)

Report

The accurate and efficient simulation of calorimeter showers is a topic of high interest in collider physics. Currently, traditional calorimeter simulations are the major bottleneck of the Large Hadron Collider (LHC) simulation pipeline. The topic covered in the manuscript is thus of major significance.

The authors propose a novel and highly promising machine-learning-based method, dubbed CaloDREAM, capable of generating high-fidelity showers. Their strategy involves several state-of-the-art machine learning techniques: Conditional Flow Matching (CFM), autoregressive Vision Transformers (ViTs), and latent diffusion with Variational Autoencoders (VAEs). The time-consuming sampling step of the CFM implementation is alleviated using bespoke samplers, for which they compare several methods. Their implementation is trained and tested on datasets 2 and 3 from the CaloChallenge, and the accuracy is assessed via classifier-based methods.
The manuscript is well written. It provides a generally well-explained description of the methods, as well as potential ways they could be further improved.
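As a purely illustrative sketch of how these components could fit together at sampling time (the function and variable names below are placeholders, not the authors' API):

```python
import torch

def sample_shower(solve, energy_net, shape_net, vae, E_inc):
    """Schematic two-stage generation: draw the layer energies first, then
    the shower shape conditioned on them. `solve` integrates a CFM network
    from noise to data, e.g. with a bespoke few-step solver; the VAE decode
    step applies to the latent-diffusion variant."""
    u = solve(energy_net, cond=E_inc)                         # layer energies
    z = solve(shape_net, cond=torch.cat([E_inc, u], dim=-1))  # (latent) shape
    return vae.decode(z), u                                   # voxels + energies
```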

Requested changes

- If I understand correctly, you used the same classifier architecture for all accuracy tests, not only for the ones in Fig. 10. This should be clarified in the manuscript. Also, how were these parameters chosen? Is it clear that they are optimal for both the high-level and low-level tests?

- You feed the vision transformer patches made of n voxels each. After the ViT generation, how are the patches transformed back into voxels? Is the energy per patch uniformly distributed over the n voxels? Have you checked whether there is a noticeable error/loss of information just from mapping voxels to patches and back to voxels? (One possible lossless mapping is sketched after this list.)

- According to Table 6, shape sampling for DS2 is faster with the ViT than with the LaViT. Why is this the case?

- Typo: ‘can be be understood’ in the introduction.
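Regarding the patch question above: one lossless convention, assuming contiguous groups of p voxels and an output head that predicts all p voxel values per patch token (hypothetical names, not necessarily the authors' implementation), is a pure reshape:

```python
import torch

def patchify(x, p):
    """Group a (batch, n_voxels) tensor into (batch, n_voxels // p, p) patches."""
    b, n = x.shape
    assert n % p == 0, "patch size must divide the number of voxels"
    return x.view(b, n // p, p)

def unpatchify(tokens):
    """Exact inverse of patchify: flatten (batch, n_patches, p) back to voxels."""
    b, n_patches, p = tokens.shape
    return tokens.reshape(b, n_patches * p)
```

Under this convention the mapping is exactly invertible; information would only be lost if the model instead predicted a single scalar per patch that had to be redistributed over the voxels.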

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: high
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Report #3 by Anonymous (Referee 3) on 2024-7-16 (Invited Report)

Strengths

1. High quality output
2. Adoption of new, cutting-edge techniques
3. Focus on reducing inference time

Weaknesses

1. Tersely written, especially when describing techniques, requiring very careful reading to understand the exact procedures
2. Some aspects of the results are not fully quantified (e.g. measurements of inference time and a more thorough characterization of the physics performance are missing)

Report

This paper presents a model competitive with other leading entries in the community challenge. It is based on new techniques, some of which may be adopted more broadly now that they have been demonstrated to work. I recommend accepting the paper after some minor improvements to clarify details in the text, as suggested below.

Requested changes

1. page 1, par 1: The claim that simulating detector interactions is a bottleneck in speed and precision would be more convincing if it were supported by citations. During Run 2, the HEP Software Foundation led an effort to quantify this, which can be found in https://arxiv.org/abs/1803.04165.
2. page 1, par 2: The assertion "we have to ensure that LHC simulations remain first-principle and are not replaced by data-driven modelling" should be supported by further discussion. One could argue that, depending on the hypothesis being tested, only event generation necessarily needs to occur from first principles.
3. page 1, par 4: Why is CaloChallenge dataset 1 omitted? It includes some interesting distinctions compared to the other datasets: a different geometry and different particle types. This paper would certainly be improved by including it, but if that is not feasible, at least some discussion should be added to say why it is not there.
4. page 2, footnote: typo avaiable -> available
5. page 5, par 1 / fig 1: If the Transfusion encoder/decoder is trained along with the CFM generator (rather than using a frozen, pretrained encoder/decoder), does that imply that the outputs c_i implicitly depend on the diffusion step time t, and if so, would it be better to condition these outputs on t explicitly?
6. page 6, par 3: This discussion of patching does not specify how the actual patch sizes used in Table 2 were chosen. Was a hyperparameter scan performed?
7. page 9, par 2: Is it just the a_i and b_i parameters that are learnable when training the bespoke solver? This is not quite clear as currently written. (An illustrative sketch of such a solver follows this list.)
8. fig 6: Though the 1 MeV cut decreases the proportion of events populating the first sparsity bin, the overall agreement in the sparsity distribution actually worsens. Is this understood?
9. page 13, par 2: This paragraph may be intended to explain the question posed above. Did the authors mean to say that it fixes the sparsity *down to* λ_10 <~ 0.7, rather than *up to*? Otherwise it does not make sense.
10. section 4.5: While the classifier AUC and other related quantities are powerful, they also suffer from some limitations and caveats (assumption of optimality in training, dependence on amount of training data, etc.). The paper would benefit from an additional non-parametric evaluation using e.g. the Frechet Particle Distance.
11. page 19, par 2: typo hyperparmeters -> hyperparameters
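Regarding point 7, the following hypothetical sketch illustrates the kind of few-step solver in which only per-step coefficients (here `a` and `b`) are trained while the flow network stays frozen; it shows the concept only, not the paper's actual parametrization:

```python
import torch
from torch import nn

class BespokeSolver(nn.Module):
    """Few-step Euler-like sampler with trainable per-step coefficients."""
    def __init__(self, n_steps):
        super().__init__()
        self.a = nn.Parameter(torch.ones(n_steps))                    # state scales
        self.b = nn.Parameter(torch.full((n_steps,), 1.0 / n_steps))  # step sizes
        self.register_buffer("t", torch.arange(n_steps) / n_steps)    # fixed times

    def forward(self, x, velocity, cond):
        # `velocity` is the frozen CFM network; only a and b receive gradients
        for i in range(len(self.a)):
            t_i = self.t[i].expand(x.shape[0], 1)
            x = self.a[i] * x + self.b[i] * velocity(x, t_i, cond)
        return x
```

Training such a solver would minimize the distance between its few-step output and a high-accuracy reference solution of the same flow.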

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: high
  • clarity: good
  • formatting: good
  • grammar: good

Report #1 by Anonymous (Referee 1) on 2024-7-5 (Invited Report)

Strengths

See report below.

Weaknesses

See report below.

Report

Simulating particle-material interactions in high-energy physics detectors is a time- and CPU-consuming task, especially the interactions in the calorimeters. Generative machine learning techniques provide a new approach to this task, with the current challenge being to reach high dimensionality and high fidelity. The community has released common datasets via the CaloChallenge. This work reports the "CaloDREAM" method and tests it on datasets 2 and 3 provided by the CaloChallenge.

The method is novel and combines various techniques. It decouples the deposited energy per layer from the shower shape, using independent energy and shape networks to generate the two separately. The energy network uses a transfusion technique, and the shape network uses a vision transformer.
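Schematically, this decoupling corresponds to factorizing the shower density, with u the layer energies, x the normalized shower shape, and E_inc the incident energy:

    p(u, x | E_inc) = p(u | E_inc) · p(x | u, E_inc)

where the energy network learns the first factor and the shape network the second.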

Regarding the results, the networks still struggle to learn some details of the calorimeter showers, as can be seen from Figs. 5-7 and 11. Further refinement of the networks and/or a better training strategy would be needed.

The manuscript is well written and complete: it motivates the problem to be solved in the field, explains how the networks are constructed, presents the results on datasets 2 and 3 of the CaloChallenge, and closes with a discussion of future research directions. I suggest accepting the paper after minor editorial changes to address my comments/suggestions detailed below.

Requested changes

1. Section 4.1 "obtaining AUC scores around 0.51": The AUC value is impressive; however, from Fig. 4, especially the u_44 distribution, the difference between G4 and CaloDREAM is quite large. Is this because the classifier is mainly affected by the peak of the distribution, where the agreement looks better? Could you clarify in the text whether this AUC is a training or test AUC? (A schematic of such a train/test AUC check is sketched after this list.)

2. Section 4.3 "networks developed for lower-dimensional phase spaces also give the necessary precision for high-dimensional phase spaces": From the context, the motivation for maintaining this assumption is not clear. Do you imply some degree of transfer learning? If so, it would be nice to detail the motivation.

3. Section 4.4 "It is interesting to note that the performance of a given solver can be significantly different between high- and low-level classifiers.": I am not sure how the conclusion of a "significant difference" is drawn. From Fig. 9, the solvers are close to each other except for Euler. I suggest rephrasing this sentence to remove "significantly".

4. Fig 10: Each plot shows two sets of results, Geant4 and Gen. I do not see how the Geant4 results are obtained. I thought the plot shows the "weight" of each event that maps from Gen to Geant4, namely, in Eq. 21, p_data is the Geant4 distribution and p_model is the Gen distribution. Am I missing something?

5. Fig 10, table on the right: I could not find definitions of LL/HL.

6. Fig 10, table on the right: These AUCs are all nice, but the DS3 HL results are quite different between the ViT and the LaViT. Are they subject to large uncertainty? Did you get stable results when training the classifier a second time?

7. Sect 5 "However, further studies are needed to understand the effects of mapping the distributions into real detectors with irregular geometries": the rest of this paragraph is true for both datasets. I suggest starting a new paragraph.

8. Appendix A.2 "which gives an AUC score of 0.512(5)": Again, from Fig. 11, the separation is quite large. I am not sure why the classifier gives such good AUC scores. Is this a training or test AUC? Did you check the classifier score distribution?

9. Table 6: It is worth commenting on why the LaViT is slower on DS2 (assuming it should be faster?).

10. Section 3.3 "Finally, the phase space configurations are provided by the the ...": duplicated "the".

11. Section 3.4 "... with coupling layers [90] which stems from ...": the referent of "which" is not clear.
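Regarding points 1, 6, and 8, a schematic of the kind of check being requested: report the AUC on a held-out test set and its spread over classifier retrainings. The toy data and hyperparameters below are placeholders, not those of the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def test_auc(X, y, seed):
    """Train a Geant4-vs-generated classifier; return the AUC on held-out data."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    clf = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=300, random_state=seed)
    clf.fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# toy stand-ins for Geant4 (label 0) and generated (label 1) feature vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.00, 1.0, (2000, 10)),
               rng.normal(0.05, 1.0, (2000, 10))])
y = np.concatenate([np.zeros(2000), np.ones(2000)])

aucs = [test_auc(X, y, seed=s) for s in range(5)]
print(f"test AUC = {np.mean(aucs):.3f} +- {np.std(aucs):.3f}")
```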

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

  • validity: high
  • significance: good
  • originality: high
  • clarity: high
  • formatting: good
  • grammar: good
