SciPost Submission Page

Fast Perfekt: Regression-based refinement of fast simulation

by Moritz Wolf, Lars O. Stietz, Patrick L. S. Connor, Peter Schleper, Samuel Bein

Submission summary

Authors (as registered SciPost users): Samuel Bein
Submission information
Preprint Link: https://arxiv.org/abs/2410.15992v1  (pdf)
Date submitted: 2024-10-22 10:26
Submitted by: Bein, Samuel
Submitted to: SciPost Physics Core
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
  • Mathematical Physics
Approaches: Experimental, Computational

Abstract

The availability of precise and accurate simulation is a limiting factor for interpreting and forecasting data in many fields of science and engineering. Often, one or more distinct simulation software applications are developed, each with a relative advantage in accuracy or speed. The quality of insights extracted from the data stands to increase if the accuracy of faster, more economical simulation could be improved to parity or near parity with more resource-intensive but accurate simulation. We present Fast Perfekt, a machine-learned regression that employs residual neural networks to refine the output of fast simulations. A deterministic morphing model is trained using a unique schedule that makes use of the ensemble loss function MMD, with the option of an additional pair-based loss function such as the MSE. We explore this methodology in the context of an abstract analytical model and in terms of a realistic particle physics application featuring jet properties in hadron collisions at the CERN Large Hadron Collider. The refinement makes maximum use of domain knowledge, and introduces minimal computational overhead to production.
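As a rough illustration of the two kinds of loss named in the abstract, the sketch below computes an ensemble loss (a squared MMD between two samples) and a pair-based loss (the MSE between matched events). It is a minimal NumPy sketch, not the authors' implementation: the Gaussian kernel, the fixed `bandwidth`, the biased estimator, and the toy `fullsim`/`fastsim` samples are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Pairwise Gaussian kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD; compares ensembles, needs no pairing."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def mse(x, y):
    """Mean squared error; only meaningful when row i of x is paired with row i of y."""
    return float(np.mean((x - y) ** 2))


# Toy stand-ins (assumption): both samples derive from the same ground truth,
# so events are paired row by row, and the "fast" sample is shifted and scaled.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=(500, 2))
fullsim = truth + rng.normal(0.0, 0.1, size=truth.shape)
fastsim = 1.1 * truth + 0.3 + rng.normal(0.0, 0.1, size=truth.shape)

print("MMD^2(fastsim, fullsim):", mmd2(fastsim, fullsim))
print("MSE(fastsim, fullsim):  ", mse(fastsim, fullsim))
```

The MMD term is unchanged if the events of either sample are shuffled, whereas the MSE term depends on the event-by-event pairing, which is why the latter requires a shared ground truth.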

Current status:
Awaiting resubmission

Reports on this Submission

Report #2 by Anonymous (Referee 1) on 2024-12-18 (Invited Report)

Strengths

1. The paper contains a realistic LHC example where the Fast Perfekt refinement leads to significant improvements over the established fast simulation. Improvements are even visible in correlations with hidden features.

2. The neural network architecture is well-motivated as it learns the residuals to the fast-sim features, and evaluating the network is computationally cheap.

3. The analytical example gives a very clear explanation of how the method works.

Weaknesses

1. The introduction does not provide the wider context of other machine learning methods developed to accelerate or improve detector simulations.

2. In the LHC example, it is not clear whether the proposed two-step training procedure really leads to a good balance between accurate modeling of the target distribution and modeling of the correlations between the fast-sim and target distribution.

Report

Refining the output of existing fast simulators is a promising approach for fast and accurate detector simulations, as it can improve their quality with small computational effort while making use of existing domain knowledge. Therefore the Fast Perfekt method is a relevant contribution to the field.

The method is described clearly and in sufficient detail to be reproduced. There are some aspects that I would like to be clarified in a minor revision (see "Requested changes"), otherwise the acceptance criteria are met.

Requested changes

- All the equations and in-line math seem to be written in bold font. Is that done on purpose?

- Section 1: The introduction contains relatively few references. Alternative or complementary methods to speed up detector simulations should be mentioned here, for instance fast calorimeter simulations with machine learning (see the CaloChallenge paper, 2410.21611, for an overview of different approaches).

- Section 2.3: The MDMM method is used to determine the value of the Lagrange multiplier between the two loss terms. However, as the authors note, the two loss terms become anti-correlated at some point during the training. The training task is therefore not a constrained optimization, but it has to find a balance between two (potentially) conflicting objectives instead. Please clarify why it is advantageous in this situation to choose the MDMM method instead of a hand-tuned multiplier $\lambda$. The latter would also allow for a smoother transition between the two training phases.

- Section 4.3: As discussed in the text and shown in Tab. 2, there was almost no effect of the two-stage training compared to the MMD-only training in the LHC example. The authors then argue that optimizing the refinement using distribution matching alone is sufficient in this case because the modes of the fastsim already coincide well with the target distribution. It would be helpful to clarify whether that really means that the bias introduced from the pure distribution matching training is negligible. It could also mean that the first training stage is not effective in improving the correlations between the fast-sim and refined points, because its effects are "forgotten" by the network during the second training stage.
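
For orientation, the two options contrasted in the Section 2.3 comment can be written in generic form. This is a sketch in assumed notation, not the manuscript's equations: $f$ and $g$ stand for the two loss terms (which one plays the role of objective and which of constraint is not specified here), $\theta$ for the network parameters, $\eta$ for a learning rate, and $c$ for a damping constant.

```latex
% Hand-tuned weighting: \lambda is a fixed hyperparameter.
\mathcal{L}_{\mathrm{fixed}}(\theta) = f(\theta) + \lambda\, g(\theta)

% Generic MDMM form: \lambda is learned alongside \theta, with a quadratic damping term.
\mathcal{L}_{\mathrm{MDMM}}(\theta, \lambda) = f(\theta) + \lambda\, g(\theta) + \frac{c}{2}\, g(\theta)^{2}

% Updates: gradient descent on \theta, gradient ascent on \lambda.
\theta \leftarrow \theta - \eta\, \frac{\partial \mathcal{L}_{\mathrm{MDMM}}}{\partial \theta},
\qquad
\lambda \leftarrow \lambda + \eta\, \frac{\partial \mathcal{L}_{\mathrm{MDMM}}}{\partial \lambda} = \lambda + \eta\, g(\theta)
```

In the second form $\lambda$ keeps growing for as long as $g(\theta) > 0$, so the weighting adapts during training rather than being chosen by hand.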

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: good
  • clarity: high
  • formatting: good
  • grammar: excellent

Author:  Samuel Bein  on 2024-12-20  [id 5057]

(in reply to Report 2 on 2024-12-18)

Dear reviewer,

Thank you for the thorough and helpful review, and for considering the paper a valuable contribution. We agree with the basis of all of the comments and propose changes to the paper draft in most cases. We respond inline below, together with the proposed changes. Regarding very high-dimensional cases, such as refining individual hits within a shower, we realized that we need to clarify the purpose of Fast Perfekt: at least in its presented form, it is intended to act on final variables of the simulation and not on intermediate quantities such as hits. We clarify this in the item-by-item responses below.

== Requested changes

Reviewer: Is there a way to find out which loss to use in a non-toy case? I guess one always wants to include as many variables as possible, so there will not be an omniscient MMD available. I'm asking because I was wondering about the application to calorimeter simulation, where EM showers for the same incident conditions (incident energy / angle) are very similar, but hadronic showers (for example from pions) differ very much from shower to shower.

Authors: We propose to tweak the final paragraph to make it clearer that we endorse using the 1-stage, MMD-based training in cases where simulation is refined to match real data, and that the 2-stage method is ideal to make fastsim more like fullsim. For intermediate objects like sim hits from EM showering, the loss terms would likely need to be restructured. Fast Perfekt’s purpose is really to morph the final analysis variables (summary statistics) of objects and events. We propose to add the following sentence in the last paragraph of the conclusions: “The refinement acts on final variables or summary statistics of the simulation (i.e., jet properties) rather than intermediate quantities (i.e., calorimeter shower hits).”

Reviewer: When comparing $corr(x,h)$ to $corr(\hat{x},h')$, the results will be skewed if $h$ and $h'$ differ from each other. I miss a discussion on this in section 2 and also the toy case does not consider $h\neq h'$.

Authors: We agree that we can improve the correlation to the hidden observable, but not to the extent that would be possible if one could also refine $h$. This case is explored in the Delphes example, where the pT and other variables are included in the omniscient MMD but not directly refined. We propose to add a qualifying statement about correlations to hidden observables: “we also note that the network cannot refine the hidden variable itself, but only correlations to the target hidden variable.”

Reviewer: Will the method still work if the Fastsim model suffers from mode-collapse? I'm asking because in the limit of arbitrarily close Fastsim-samples, a deterministic model will not be able to spread the samples apart.

Authors: In an extreme case of mode collapse, where all elements of the feature vector default to a given value, Fast Perfekt would not be able to refine such an effect away, because the refinement is deterministic and does not add its own noise. However, if some variables (or the ground truth) have not exhibited mode collapse, there is still the possibility to refine.
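
The argument in this reply can be checked with a small numerical sketch. The `refine` function below is a hypothetical stand-in for a deterministic residual refinement (one random layer, `x + tanh(xW + b)`), not the Fast Perfekt network; the sample sizes and feature values are likewise arbitrary assumptions.

```python
import numpy as np


def refine(x, weights, bias):
    """Hypothetical deterministic residual map: input plus a learned-style correction."""
    return x + np.tanh(x @ weights + bias)


rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 3))
bias = rng.normal(size=3)

# Full mode collapse: every event has the same three features.
collapsed = np.tile([0.5, -1.0, 2.0], (1000, 1))
spread_collapsed = refine(collapsed, weights, bias).std(axis=0)

# Partial collapse: only feature 0 is stuck, features 1 and 2 still vary.
partial = collapsed.copy()
partial[:, 1:] = rng.normal(size=(1000, 2))
spread_partial = refine(partial, weights, bias).std(axis=0)

print("spread after refining fully collapsed input:    ", spread_collapsed)
print("spread after refining partially collapsed input:", spread_partial)
```

With identical inputs the refined outputs remain identical, so no spread can be created. When other inputs still vary, the correction to the collapsed feature depends on them, and that feature can acquire a spread.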

Reviewer: The use of the terms "fastsim" and "fullsim" in the LHC example is misleading, as both are essentially fastsim, just with different parameter cards.

Authors: We propose to modify the sentences on line 243 for clarity to read: “These events are then processed twice in parallel using Delphes, once with the default CMS detector implementation and treated as the fullsim data set, and once with a ‘flawed’ implementation yielding the data set we treat as the fastsim.”

Minor comments

Reviewer: The abbreviation MDMM is first used below equation 4, but only explained a few paragraphs below that, just above equation 5.

Authors: We agree with the minor comments and propose to properly define and cite the MDMM when it is first introduced.

Reviewer: The paragraph below equation 6 uses once $N$, and $m$ otherwise.

Authors: This typo has been corrected.

Reviewer: The introduction of section 4 refers to "data collected at the LHC", when in fact only simulation is used in the following.

Authors: We propose to change “based on” to “tailored to” so that it cannot be interpreted as meaning that we have used real data.

Reviewer: GEANT4 asks for 3 references (see https://geant4.web.cern.ch/ bottom)

Authors: We agree to this change.

end of requested changes ==

Best regards,

Sam, on behalf of the authors

Report #1 by Anonymous (Referee 2) on 2024-12-16 (Invited Report)

Strengths

- timely: precise and fast simulation is crucial for the success of the current and future generation of collider experiments
- light-weight: it provides a much more lightweight alternative to generative AI-based fast simulation
- versatile: can be used on top of such generative models or "traditional" fast simulation

Weaknesses

- examples seem very small (fewer than 5 dimensions), and it's unclear whether the shown performance scales to other applications (where the refinement would be more important)

Report

Report on the Manuscript "Fast Perfekt: Regression-based refinement of fast simulation" by Moritz Wolf, Lars O. Stietz, Patrick L.S. Connor, Peter Schleper, and Samuel Bein.
The authors propose a machine-learning-based algorithm to morph samples from "fastsimulation" closer to samples from "fullsimulation". This provides a light-weight alternative to generative AI applications to fast simulation. Depending on whether or not a "pairing" of events from full- and fastsim based on common ground truth information is possible, the authors propose and investigate the use of two different loss functions and training strategies. The model shows good performance in the considered examples and therefore seems promising.
I think the manuscript should be published in SciPost; however, I have a few questions that I would like to see addressed first (see below).

Requested changes

- Is there a way to find out which loss to use in a non-toy case? I guess one always wants to include as many variables as possible, so there will not be an omniscient MMD available. I'm asking because I was wondering about the application to calorimeter simulation, where EM showers for the same incident conditions (incident energy / angle) are very similar, but hadronic showers (for example from pions) differ very much from shower to shower.
- When comparing $corr(x,h)$ to $corr(\hat{x},h')$, the results will be skewed if $h$ and $h'$ differ from each other. I miss a discussion on this in section 2 and also the toy case does not consider $h\neq h'$.
- Will the method still work if the Fastsim model suffers from mode-collapse? I'm asking because in the limit of arbitrarily close Fastsim-samples, a deterministic model will not be able to spread the samples apart.
- The use of the terms "fastsim" and "fullsim" in the LHC example is misleading, as both are essentially fastsim, just with different parameter cards.


Minor comments
- The abbreviation MDMM is first used below equation 4, but only explained a few paragraphs below that, just above equation 5.
- The paragraph below equation 6 uses once $N$, and $m$ otherwise.
- The introduction of section 4 refers to "data collected at the LHC", when in fact only simulation is used in the following.
- GEANT4 asks for 3 references (see https://geant4.web.cern.ch/ bottom)

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: top
  • clarity: high
  • formatting: perfect
  • grammar: perfect
