SciPost Submission Page
Accelerating Monte Carlo event generation -- rejection sampling using neural network event-weight estimates
by Katharina Danziger, Timo Janßen, Steffen Schumann, Frank Siegert
Submission summary
Authors (as registered SciPost users): Timo Janßen · Steffen Schumann · Frank Siegert

Submission information
Preprint Link: scipost_202202_00024v2 (pdf)
Date accepted: 2022-04-28
Date submitted: 2022-03-28 11:51
Submitted by: Schumann, Steffen
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Approaches: Computational, Phenomenological
Abstract
The generation of unit-weight events for complex scattering processes presents a severe challenge to modern Monte Carlo event generators. Even when using sophisticated phase-space sampling techniques adapted to the underlying transition matrix elements, the efficiency for generating unit-weight events from weighted samples can become a limiting factor in practical applications. Here we present a novel two-staged unweighting procedure that makes use of a neural-network surrogate for the full event weight. The algorithm can significantly accelerate the unweighting process, while it still guarantees unbiased sampling from the correct target distribution. We apply, validate and benchmark the new approach in high-multiplicity LHC production processes, including $Z/W$+4 jets and $t\bar{t}$+3 jets, where we find speed-up factors up to ten.
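Schematically, the two-stage rejection step can be summarised as follows; this is a generic sketch, and the notation as well as the determination of the maxima may differ in detail from the paper:

$$
P_1 = \min\!\left(1,\frac{s(x)}{s_{\max}}\right), \qquad
\tilde w_1 = \max\!\left(1,\frac{s(x)}{s_{\max}}\right), \qquad
P_2 = \min\!\left(1,\frac{w(x)/s(x)}{r_{\max}}\right), \qquad
\tilde w_2 = \max\!\left(1,\frac{w(x)/s(x)}{r_{\max}}\right),
$$

where $s(x)$ is the cheap surrogate weight, $w(x)$ the exact event weight (evaluated only for events that survive the first stage), and $s_{\max}$, $r_{\max}$ are reference maxima for $s$ and $w/s$, respectively. Accepted events carry the residual weight $\tilde w = \tilde w_1 \tilde w_2$; since $P_1 P_2\,\tilde w_1 \tilde w_2 = w(x)/(s_{\max}\, r_{\max}) \propto w(x)$, the weighted sample of accepted events still follows the target distribution.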
Author comments upon resubmission
We would like to thank the referee for the detailed second report and for reiterating details on several points to clarify misunderstandings in our reading of them. This is very much appreciated! We have carefully revisited the points raised and compile our responses below, following the ordering of the referee. We have adjusted, extended and clarified the text in the paper accordingly, as detailed below.
-
We thank the referee for spotting this mistake of ours. It is correctly pointed out that our Eq. (6) does not include the overweights from the first unweighting stage. While this does not affect our algorithm, it has obvious consequences for the employed performance measures. We have corrected Eq. (6) to what the referee has given as Eq. (3) in his/her report. In fact, the former version of Eq. (6) was a typo: in our implementation of the performance-measure computation we had already used the proper form, including all overweights in the determination of the α factors. Accordingly, none of the quoted results needed to be corrected.
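To spell out what entering all overweights into the α factors amounts to: a common measure of the statistical dilution caused by residual event weights is the Kish effective sample size. The generic expression below is meant as an illustration only and need not coincide, in notation or detail, with the corrected Eq. (8),

$$
\alpha \;=\; \frac{N_{\mathrm{eff}}}{N} \;=\; \frac{\left(\sum_{i=1}^{N}\tilde w_i\right)^{2}}{N\,\sum_{i=1}^{N}\tilde w_i^{2}}\,,
\qquad \tilde w_i = \tilde w_i^{(1)}\,\tilde w_i^{(2)}\,,
$$

where $\tilde w_i^{(1)}$ and $\tilde w_i^{(2)}$ denote the residual overweights of event $i$ from the first and second unweighting stage, respectively.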
-
The reference to the Alpgen paper is helpful and we thank the referee for pointing us to it. We think it has some similarities with our method which we were not aware of. Accordingly, we have added this reference to our revised paper. However, our overweight treatment has been developed independently, and our Eq. (5) is the approach we consider appropriate for the case at hand. The fact that there is no indication of a mishandling of the overweights in our examples (see the toy example in Fig. 1 and the deviations plot in Fig. 9) convinces us of the correctness of our equation. It is, however, not equivalent to the equation(s) suggested by the referee. We have also found no counterpart for the referee's equation in the given reference. To study this further, we applied both equations to a simple toy example. The results can be seen in the attached plot (see the resubmitted paper, at the very end). We find that the suggested formula does not reproduce the target function, implying that it handles some of the overweights wrongly. These can be attributed to the case where $x < x_\text{max}$ and $s > w_\text{max}$. Possibly there is a typo in the equation suggested by the referee. In any case, we see no reason to change our expression, given that it produces correct results.
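To make this kind of validation concrete, a self-contained toy script in the spirit of such a check is sketched below. The target, surrogate and (deliberately too small) maxima are invented for illustration, chosen so that overweights occur in both stages and the residual-weight treatment can be seen to reproduce the target shape; this is not the authors' actual toy setup.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented 1D toy: points x are drawn uniformly on [0, 1], the "exact" event
# weight is w(x) = f(x), and s(x) is a deliberately imperfect surrogate.
def f(x):   # target weight function
    return 1.0 + 4.0 * np.exp(-50.0 * (x - 0.5) ** 2)

def s(x):   # surrogate, slightly off near the peak so that w/s > 1 occurs
    return 1.0 + 3.0 * np.exp(-30.0 * (x - 0.45) ** 2)

# Reduced maxima (smaller than the true ones) so that overweights appear in
# both stages; in a realistic setup these would come from a pre-run.
s_max, r_max = 2.5, 1.2

xs, weights = [], []
for _ in range(200_000):
    x = rng.random()
    p1 = s(x) / s_max
    if rng.random() > min(1.0, p1):
        continue                      # rejected on the cheap surrogate
    w1 = max(1.0, p1)                 # residual overweight from stage one
    r = f(x) / s(x)                   # exact weight only for survivors
    p2 = r / r_max
    if rng.random() > min(1.0, p2):
        continue                      # rejected on the exact/surrogate ratio
    w2 = max(1.0, p2)                 # residual overweight from stage two
    xs.append(x)
    weights.append(w1 * w2)           # combined residual event weight

# Compare the weighted histogram of accepted events to the normalised target.
hist, edges = np.histogram(xs, bins=25, range=(0.0, 1.0),
                           weights=weights, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
grid = np.linspace(0.0, 1.0, 100_001)
target = f(centres) / np.mean(f(grid))   # f normalised to unit integral on [0, 1]
print(np.max(np.abs(hist / target - 1.0)))   # few-percent statistical scatter
```

Since the product of acceptance probabilities and residual weights equals $w(x)/(s_\text{max}\, r_\text{max})$ in every overweight configuration, the weighted sample is unbiased by construction.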
-
To more clearly illustrate the interplay of our method with the sampling technique used for generating events, we have extended the discussion in Sec. 2, now also briefly introducing importance sampling and the multi-channel method, see the new Eqs. (4)-(6). This is then picked up in the discussion of our actual deliverable, i.e. fully differential cross-section integrals, in Sec. 3. We have significantly extended the discussion of Eq. (15) and elaborate on possible alternative treatments for a multi-channel sampler. The case we present in our paper indeed uses a single NN with the external particles' three-momenta as input variables, which are generated by a (multi-channel) probability density specific to Amegic and the considered process. Our network thus effectively learns the ratio $f/g$ (with $g$ the total mapping function, i.e. the sum of all channels). However, as we do not use as input variables the random numbers that have channel-specific mappings to momenta, we do not need to keep track of the individual channels, for example through channel-specific NN surrogates or via the random number used to select the channel as an additional input variable.
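For reference, the importance-sampling and multi-channel relations alluded to here have the familiar textbook form (the notation of the paper's new Eqs. (4)-(6) may differ):

$$
I=\int f(x)\,\mathrm{d}x=\int\frac{f(x)}{g(x)}\,g(x)\,\mathrm{d}x\;\approx\;\frac{1}{N}\sum_{i=1}^{N}\frac{f(x_i)}{g(x_i)}\,,\qquad x_i\sim g\,,
\qquad\text{with}\qquad
g(x)=\sum_{k}\alpha_k\,g_k(x)\,,\quad \alpha_k\ge 0\,,\quad \sum_k\alpha_k=1\,,
$$

such that the event weight the surrogate has to reproduce is $w(x)=f(x)/g(x)$, with $g$ the total (channel-summed) mapping density. The channel weights $\alpha_k$ are not to be confused with the $\alpha$ factors of the performance measures.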
-
Clearly, as we already point out for our toy example, the surrogate can come from any source, including a VEGAS grid or any other importance-sampling density. This is also touched upon in the introduction to Sec. 3. However, here we concentrate on and explore the potential of NNs, which we believe have particularly promising capabilities.
-
We have added a sentence and an equation to the relevant footnote to stress the fact that non-expert users have to rely on the generated cross section as calculated by the MC program, which should contain the correct normalisation for the given set of events that have been generated. We prefer not to single out the overweight case with further equations for $\sigma_\text{gen}$, since these would not only have to treat overweight events faithfully, but also correctly include $N_\text{trials}$ from the unweighting and from potential rejections in ME+PS merging, (negative) weights from the NLO+PS matching procedure, phase-space biasing weights, and other advanced features of modern MC programs.
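For orientation, the basic relation users rely on presumably takes the familiar form below; the exact expression and notation of the added footnote equation may differ, and, as stated above, further weights from merging, matching or biasing are deliberately not spelled out here:

$$
\sigma_{\mathrm{gen}}\;=\;\frac{1}{N_{\mathrm{trials}}}\sum_{i=1}^{N_{\mathrm{trials}}} w_i\,,
$$

i.e. the average weight over all trial events, which the MC program reports as the cross section corresponding to the generated event sample.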
-
We have made the thesis available on CDS and included an explicit reference, Ref. [91].
Again we would like to thank the referee for insisting, which has helped us to make the manuscript significantly clearer! We hope that the paper in its present form qualifies for publication in SciPost Physics.
List of changes
- extended discussion at the introduction to Sec. 2
- former Eq. (6), now Eq. (8), has been corrected
- extended discussion in Sec. 3.1
- updated footnote 3, concerning sample normalisation
Published as SciPost Phys. 12, 164 (2022)
Reports on this Submission
Report #2 by Anonymous (Referee 8) on 2022-4-7 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202202_00024v2, delivered 2022-04-07, doi: 10.21468/SciPost.Report.4887
Report
I thank the authors for their careful consideration of my comments and the associated changes to the text, which clarify and improve the overall quality of the paper. This fully answers all my concerns, and I therefore recommend publishing the paper in its current form.
I would like to apologize to the authors, since I was clearly missing a point on the multi-channel treatment. Thanks a lot for the clarification.
For the record, I would like to comment on the new validation plot provided by the authors in their latest answer. I have run the same toy example and do not observe any bias, but I do not think that investigating our disagreement is relevant for the publication of the paper.
Report #1 by Tilman Plehn (Referee 4) on 2022-3-28 (Invited Report)
Report
Thank you for addressing all my comments, and sorry for the misunderstanding. I think the paper is seriously cool and a significant step in ML-enhanced event generation, so let's publish it!