# Accelerating Monte Carlo event generation -- rejection sampling using neural network event-weight estimates

### Submission summary

- As Contributors: Timo Janßen · Steffen Schumann · Frank Siegert
- arXiv Link: https://arxiv.org/abs/2109.11964v1 (pdf)
- Date submitted: 2021-09-28 21:13
- Submitted by: Schumann, Steffen
- Submitted to: SciPost Physics
- Academic field: Physics
- Specialties: High-Energy Physics - Experiment · High-Energy Physics - Phenomenology
- Approaches: Computational, Phenomenological

### Abstract

The generation of unit-weight events for complex scattering processes presents a severe challenge to modern Monte Carlo event generators. Even when using sophisticated phase-space sampling techniques adapted to the underlying transition matrix elements, the efficiency for generating unit-weight events from weighted samples can become a limiting factor in practical applications. Here we present a novel two-staged unweighting procedure that makes use of a neural-network surrogate for the full event weight. The algorithm can significantly accelerate the unweighting process, while it still guarantees unbiased sampling from the correct target distribution. We apply, validate and benchmark the new approach in high-multiplicity LHC production processes, including $Z/W$+4 jets and $t\bar{t}$+3 jets, where we find speed-up factors up to ten.

###### Current status:
Editor-in-charge assigned

### Submission & Refereeing History

Submission 2109.11964v1 on 28 September 2021

## Reports on this Submission

### Report

The paper shows a nice and crucial study on the way to accelerate event generation for the LHC using modern numerical methods. It should definitely be published. I have a few comments which the authors might want to consider, maybe other readers will have similar questions:

- I am sorry, but I do not understand the justification behind the two methods of removing large weights, especially the choice of their hyperparameters. To an outsider they still appear like statistical cheating or keeping one's fingers crossed: how do I know that the 1% of the rate that I remove is not exactly the tail I care about? Is that ever checked? I am not sure how this kind of laissez-faire approach squares with the strict statement that we always sample from the true distribution at the end of the algorithm.

- Along the same line of argument: where in phase space do the events in the tails of Fig. 3 typically sit?

- Personally, I would not mind a graphical representation of the two-step algorithm, just to make it a little easier to understand and to find in the extensive description of Sec. 2.

- In Fig. 5, could you include the $R_{ij}$ distributions directly, since they feature the sharp jet-separation cuts?

- For instance on p. 20, the authors mention that some channels are slower than others. Could you please say why, and how this scales, for instance, with the number of Feynman diagrams or whatever else?

- Maybe I am stupid, but how sure are we that the surrogate model has to be trained on each partonic channel individually? I would be interested in seeing what happens when one trains on a combination; maybe the excellent NN interpolation helps there as well?

- I am seriously impressed by Fig. 8, given that the network is small. Could you maybe also show $R_{ij}$? And comment on why you get the $W$ and top masses so impressively well?

- I have a few open questions on the next steps of these analyses, just like the other referees. What can I expect at NLO, for instance, and what happens if we include variable numbers of jets? Does the standard Sherpa approach imply that they will always be represented by different networks?

- Finally, it would be nice to cite our paper on flow generators with uncertainties (2104.04543), as it attempts to address the comment at the bottom of p. 3. Sorry that our complete study came too late.
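Regarding the first comment above about removing large weights: one way to make the "remove 1% of the rate" hyperparameter concrete is a quantile-type prescription such as the following sketch. This is purely illustrative (the function name, the `tail_fraction` parameter, and the exact cut-off convention are assumptions, not necessarily the paper's definition):

```python
import numpy as np

def reduced_max_weight(weights, tail_fraction=0.01):
    """Illustrative prescription for a reduced maximum weight.

    Pick the largest weight w_max such that the total weight carried
    by events above it stays within `tail_fraction` of the full sum,
    i.e. at most that fraction of the rate is affected by the cut.
    """
    w = np.sort(np.asarray(weights, dtype=float))[::-1]  # descending
    total = w.sum()
    tail = np.cumsum(w)  # running sum of the largest weights
    # first index where the cumulative tail exhausts the budget
    idx = np.searchsorted(tail, tail_fraction * total)
    return w[idx] if idx < len(w) else w[-1]
```

Whether the discarded tail matters physically would still have to be checked against the distributions of interest, which is precisely the referee's point.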

As I said, really nice paper!!

• validity: -
• significance: -
• originality: -
• clarity: -
• formatting: -
• grammar: -

### Strengths

1. This paper is very well written; the arguments are clearly and concisely presented.
2. The work is timely, in that the CPU cost of precision predictions needs to be reduced. While the Born-level predictions presented are no longer considered time-consuming, the method could be of interest in other applications.

### Weaknesses

1. The method studied relates mostly to ML methods for approximating Born-level matrix elements. Other studies already apply similar methods to NLO calculations.
2. There is no discussion of the validity of using the sum of squared weights as an estimate of the uncertainty in each bin after the given method has been applied.

### Report

The timely presentation of an ML method for improving the unweighting efficiency is very well written and presents convincing arguments. Essentially, the unweighting probability for each point in phase space is first evaluated approximately and then corrected with the full contribution from the matrix element. The authors present several examples of distributions in support of the method.
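The two-stage procedure described here can be sketched as follows. This is a hedged reconstruction from the description above, not the authors' actual implementation: all function names, the maxima `s_max` and `r_max`, and the overweight handling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stage_unweight(sample_point, surrogate, full_weight,
                       s_max, r_max, n_target):
    """Sketch of two-stage rejection sampling with a cheap surrogate.

    Stage 1 accepts a point with probability min(1, s/s_max) using the
    inexpensive surrogate weight s; only then is the expensive exact
    weight w evaluated, and stage 2 accepts with probability
    min(1, r/r_max) for the correction ratio r = w/s, so accepted
    events follow the true target distribution.
    """
    events = []
    while len(events) < n_target:
        x = sample_point()                        # draw a phase-space point
        s = surrogate(x)                          # cheap NN weight estimate
        if rng.random() < min(1.0, s / s_max):    # first-stage accept
            w = full_weight(x)                    # exact matrix-element weight
            r = w / s                             # correction ratio
            if rng.random() < min(1.0, r / r_max):  # second-stage accept
                # overweights (s > s_max or r > r_max) retain a residual
                # weight so the procedure stays unbiased
                wt = max(1.0, s / s_max) * max(1.0, r / r_max)
                events.append((x, wt))
    return events
```

With maxima chosen large enough, every accepted event carries unit weight; the residual weights only appear for overweight events, which is where the questions about reduced maximum weights arise.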

What is missing from the presentation is a discussion of the extent to which using the sum of squared weights as an uncertainty is correct in the surrogate method. Figure 9 is meant to show that the difference between the predictions from the full and the surrogate approaches follows a normal distribution centered at 0 with width 1. However, if I am not mistaken, the quantity along the x-axis will carry units unless the denominator $(\Delta_\mathrm{full}^2 + \Delta_\mathrm{surrogate}^2)$ is square-rooted, which would obviously change the distribution. Or perhaps the authors use a different (unstated) definition of the $\Delta$s? Returning to the original point: to what degree is the variance of the surrogate weights themselves an indication of the uncertainty of the surrogate distribution? This is, after all, the only distribution accessible after an unweighting procedure, so it would be undesirable if a correct estimate of the uncertainty relied upon the underlying generated distribution.
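For concreteness, the pull the report refers to would presumably be defined as below (the names and conventions here are assumptions, not taken from the paper):

```python
import math

def pull(n_full, n_surr, delta_full, delta_surr):
    """Dimensionless pull between full and surrogate bin contents.

    delta_full and delta_surr are the per-bin uncertainties, i.e. the
    square roots of the sums of squared event weights in each bin.
    Note the square root in the denominator: without it the quantity
    carries units and cannot be expected to be standard-normal.
    """
    return (n_full - n_surr) / math.sqrt(delta_full**2 + delta_surr**2)
```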

### Requested changes

Minor points:
1. Specify whether the x-axis in Figure 9 should have a square root applied in the denominator, given that $\sigma=1$ as discussed in Section 4.3.3.
2. The introduction mentions that an evaluation of the matrix element can take O(1 s). It would be helpful to note whether this timing refers to a single core of an older CPU or to a full GPU.

• validity: good
• significance: good
• originality: good
• clarity: high
• formatting: excellent
• grammar: good

### Strengths

1. Very active and promising line of investigation in the context of developing new approaches to Monte Carlo event generation.
2. Very clear and useful presentation of the problem and unbiased presentation of the state-of-the-art result.
3. Direct and very natural approach, i.e. using ML techniques to improve the sampling of the matrix element times phase space. First steps in terms of a two-staged generation: first approximate, then a final resampling. Results look promising.
4. Possibilities for extensions to NLO (which are explicitly mentioned, but not checked numerically).

### Weaknesses

1. It is not clear how the multi-channel sampling is handled. More details on this topic would be useful to the reader.
2. The correlation between the increase in the maximum weight and the speed gain could be a potential issue. This is also mirrored by the rather significant dependence of the final gains on the exact definition of the maximum weight.
3. The pattern of the gains in different subprocesses does not seem to be fully accounted for by the explanations provided in the text. In $t\bar{t}$+jets, for example, there is a huge gain in $t_\mathrm{full}/t_\mathrm{surr}$ for the multi-gluon channel which is then completely lost.

### Report

I think the paper is extremely well written and presented. The key elements of the study are clearly stated. The metrics used for the evaluation of the performance of the new algorithms allow the reader to have a clear picture. The algorithms are clearly explained.

The topic of using ML techniques to speed up MC generation is attracting considerable attention in the HEP community, and this work is extremely timely.

Apart from a few critical points which will need further studies before the method can be "industrialised", results are promising. Once the points highlighted above and the requests below have been addressed this work should be published.

### Requested changes

1. [Suggestion] Would it be possible to have a two-dimensional $\{w/s, w\}$ plot? One would like to know, in particular, whether the events for which $|\log(w/s)|$ is large are also events where the event weights are large, and in any case how they are distributed.

2. [Request] I would like to have an assessment on the interplay of the new method with multi-channel integration.

3. [Suggestion] For the comparison plots, e.g. Fig. 5 and Figs. 7, 8, it could also be useful to have an ML-boosted sample with much higher statistics, to see whether systematic effects appear.

• validity: high
• significance: high
• originality: good
• clarity: top
• formatting: excellent
• grammar: excellent

### Report

I have attached the report as a PDF file.

### Attachment

• validity: -
• significance: -
• originality: -
• clarity: -
• formatting: -
• grammar: -