SciPost Submission Page
Accelerating Monte Carlo event generation – rejection sampling using neural network event-weight estimates
by Katharina Danziger, Timo Janßen, Steffen Schumann, Frank Siegert
Submission summary
As Contributors:  Timo Janßen · Steffen Schumann · Frank Siegert 
arXiv Link:  https://arxiv.org/abs/2109.11964v1 (pdf) 
Date submitted:  2021-09-28 21:13 
Submitted by:  Schumann, Steffen 
Submitted to:  SciPost Physics 
Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
The generation of unit-weight events for complex scattering processes presents a severe challenge to modern Monte Carlo event generators. Even when using sophisticated phase-space sampling techniques adapted to the underlying transition matrix elements, the efficiency for generating unit-weight events from weighted samples can become a limiting factor in practical applications. Here we present a novel two-staged unweighting procedure that makes use of a neural-network surrogate for the full event weight. The algorithm can significantly accelerate the unweighting process, while it still guarantees unbiased sampling from the correct target distribution. We apply, validate and benchmark the new approach in high-multiplicity LHC production processes, including $Z/W$+4 jets and $t\bar{t}$+3 jets, where we find speed-up factors up to ten.
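The two-staged unweighting described in the abstract can be sketched as plain rejection sampling with a cheap surrogate gate in front of the expensive exact weight. This is only an illustrative sketch, not the authors' implementation: the function names, the uniform random numbers, and the way the reference maxima `s_max` and `r_max` are obtained are all assumptions.

```python
import random


def two_stage_unweight(sample, s_fn, w_fn, s_max, r_max, rng=random.random):
    """Two-stage rejection sampling with a cheap surrogate weight.

    sample : callable returning a candidate phase-space point x
    s_fn   : fast surrogate estimate s(x) of the event weight,
             e.g. a neural network
    w_fn   : exact (expensive) event weight w(x)
    s_max  : reference maximum of the surrogate weight
    r_max  : reference maximum of the ratio w(x)/s(x)

    Returns (x, event_weight). The event weight is 1 except for the
    rare 'overweight' events where s or w/s exceeds its reference
    maximum; carrying that residual weight keeps the sampling unbiased.
    """
    while True:
        x = sample()
        s = s_fn(x)
        # Stage 1: cheap accept/reject using only the surrogate
        if rng() >= min(1.0, s / s_max):
            continue  # rejected without ever evaluating w(x)
        # Stage 2: evaluate the exact weight only for survivors and
        # correct for the surrogate's approximation via the ratio w/s
        w = w_fn(x)
        r = w / s
        if rng() >= min(1.0, r / r_max):
            continue
        # Residual weight exceeds 1 only if a reference maximum was exceeded
        weight = max(1.0, s / s_max) * max(1.0, r / r_max)
        return x, weight
```

The gain comes from stage 1: most candidate points are rejected after only the fast surrogate call, so the expensive exact weight is evaluated mainly for points that end up being kept.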
Current status:
Submission & Refereeing History
Reports on this Submission
Report 4 by Tilman Plehn on 2021-11-18 (Invited Report)
Report
The paper shows a nice and crucial study on the way to accelerate event generation for the LHC using modern numerical methods. It should definitely be published. I have a few comments which the authors might want to consider, maybe other readers will have similar questions:
- I am sorry, but I do not understand the justification behind the two methods of removing large weights, especially their hyperparameters. To an outsider they still appear like statistical cheating or keeping-fingers-crossed, and how do I know that the 1% of the rate that I remove is not the tail I care about? Is that ever checked? I am not sure how this kind of laissez-faire approach corresponds to the strict statement that we always need to consider the true distributions at the end of the algorithm.
- Along the same line of argument: where in phase space do the events in the tails of Fig. 3 typically sit?
- Personally, I would not mind a graphical representation of the two-step algorithm, just to make it a little easier to understand and find in the extensive description of Sec. 2.
- In Fig. 5, could you include the Rij distributions directly, since they have the sharp jet-separation cuts?
- For instance on p. 20, the authors mention that some channels are slower than others. Could you please say why, and how this scales for instance with the number of Feynman diagrams or whatever else?
- Maybe I am stupid, but how sure are we that the surrogate model has to be trained on each partonic channel individually? I would be interested in seeing what happens when we train on a combination; maybe the excellent NN-interpolation helps there as well?
- I am seriously impressed by Fig. 8, given that the network is small. Could you maybe also show Rij? And comment on why you get the W and top masses so impressively well?
- I have a few open questions on the next steps of these analyses, just like the other referees. So what can I expect from NLO, for instance, and what happens if we include variable numbers of jets? Does the standard Sherpa approach imply that they will always be represented by different networks?
- Finally, it would be nice to cite our paper on flow generators with uncertainties (2104.04543), as it attempts to address the comment on the bottom of p. 3. Sorry that our complete study came too late.
As I said, really nice paper!!
Anonymous Report 3 on 2021-11-16 (Invited Report)
Strengths
1. This paper is very well written, the arguments are clearly and concisely presented.
2. The work is timely, in that the CPU cost of precision predictions needs to be reduced. While the Born-level predictions presented are no longer considered time-consuming, the method could be of interest in other applications.
Weaknesses
1. The method studied relates mostly to ML methods to approximate Born-level matrix elements. Other studies already apply similar methods to NLO calculations.
2. There is no discussion of the validity of using the sum of weights squared as an indication of the uncertainty in each bin, after the use of the given method.
Report
The timely presentation of an ML method for improving the unweighting efficiency is very well written and presents convincing arguments. Essentially, the unweighting probability for each point in phase space is evaluated approximately, and then corrected with the full contribution from the matrix element. The authors present several examples of distributions in support of the method.
What is missing from the presentation is a discussion of the extent to which the use of the sum of weights squared as an uncertainty is correct in the surrogate method. Figure 9 is meant to show that the difference in the predictions from the full and the surrogate approaches follows a normal distribution centered at 0 and with width 1. However, if I am not mistaken, then the quantity along the x-axis will contain units, unless the denominator (Delta_full^2 + Delta_surrogate^2) is square-rooted? Which would obviously change the distribution. Or perhaps the authors use a different (unstated) definition of the Deltas? Returning to the original point: To what degree is the variance of the surrogate weights themselves an indication of the uncertainty of the surrogate distribution? This of course is the only distribution that would be accessible after an unweighting procedure, so it would be undesirable if a correct estimate of the uncertainty relied upon the underlying generated distribution.
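For concreteness, the quantity under discussion is the standard per-bin pull. The following is a minimal sketch, not taken from the paper, showing the square-rooted variant that makes the pull dimensionless; whether this matches the definition behind Fig. 9 is exactly the referee's question.

```python
import math


def bin_pull(n_full, delta_full, n_surr, delta_surr):
    """Per-bin pull between the full and surrogate predictions.

    n_*     : bin contents of the two histograms
    delta_* : their statistical uncertainties, e.g. the square root
              of the sum of squared event weights in the bin

    With the square root in the denominator the pull is dimensionless;
    for compatible, independent predictions it is approximately
    standard normal (mean 0, width 1).
    """
    return (n_full - n_surr) / math.sqrt(delta_full**2 + delta_surr**2)
```

Without the square root, the denominator would carry the squared units of the bin contents and the distribution of the quantity would no longer be a unit-width Gaussian.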
Requested changes
Minor points:
1. Specify whether the x-axis in Figure 9 should have a square root applied in the denominator, when sigma=1 as discussed in Section 4.3.3.
2. The introduction mentions that an evaluation of the matrix element can take O(1s). It would be helpful to note whether this timing is for a single core on an old CPU, or for a full GPU.
Anonymous Report 2 on 2021-11-13 (Invited Report)
Strengths
1. Very active and promising line of investigation in the context of developing new approaches to Monte Carlo event generation.
2. Very clear and useful presentation of the problem and unbiased presentation of the state-of-the-art result.
3. Direct and very natural approach to study, i.e. to use ML techniques to improve on the matrix-element times phase-space sampling. First steps in terms of a two-staged generation, first approximate and then final resampling. Results look promising.
4. Possibilities for extensions to NLO (which are explicitly mentioned, but not checked numerically).
Weaknesses
1. It is not clear how the multichannel sampling is handled. More details on this topic would be useful to the reader.
2. The correlation between the increase in the max weight and the speed gain could be a potential issue. This is also mirrored by the rather significant dependence of the final gains on the exact definition of the max weight.
3. The pattern of the gains in different subprocesses does not seem to be fully accounted for by the explanations provided in the text. In $t\bar{t}$+jets, for example, there is a huge gain in t_full/t_surr for the multi-gluon channel which is then completely lost.
Report
I think the paper is extremely well written and presented. The key elements of the study are clearly stated. The metrics used for the evaluation of the performance of the new algorithms allow the reader to have a clear picture. The algorithms are clearly explained.
The topic of using ML techniques to speed up MC generation is attracting considerable attention in the HEP community and it is extremely timely.
Apart from a few critical points which will need further studies before the method can be "industrialised", results are promising. Once the points highlighted above and the requests below have been addressed this work should be published.
Requested changes
1. [Suggestion] Would it be possible to have a two-dimensional {w/s, w} plot? One would like to know, in particular, whether the events for which log(w/s) is large are also events where the event weights are large, and in any case how they are distributed.
2. [Request] I would like to have an assessment on the interplay of the new method with multichannel integration.
3. [Suggestion] For the comparison plots, e.g. Fig. 5 and Figs. 7 and 8, it could also be useful to have an ML-boosted sample with much higher statistics to see whether systematic effects appear.
Anonymous Report 1 on 2021-11-5 (Invited Report)
Report
I have attached the report as a PDF file.