SciPost Submission Page
Exploring phase space with Neural Importance Sampling
by Enrico Bothmann, Timo Janßen, Max Knobbe, Tobias Schmale, Steffen Schumann
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users):  Enrico Bothmann · Timo Janßen · Max Knobbe · Steffen Schumann 
Submission information  

Preprint Link:  https://arxiv.org/abs/2001.05478v2 (pdf) 
Date submitted:  2020-01-30 01:00 
Submitted by:  Schumann, Steffen 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 

Approaches:  Theoretical, Computational, Phenomenological 
Abstract
We present a novel approach for the integration of scattering cross sections and the generation of partonic event samples in high-energy physics. We propose an importance sampling technique capable of overcoming typical deficiencies of existing approaches by incorporating neural networks. The method guarantees full phase space coverage and the exact reproduction of the desired target distribution, in our case given by the squared transition matrix element. We study the performance of the algorithm for a few representative examples, including top-quark pair production and gluon scattering into three- and four-gluon final states.
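The core idea behind the abstract — sampling from a proposal density g(x) and correcting with weights f(x)/g(x) — can be sketched in a few lines. This is a minimal, self-contained illustration of plain importance sampling: the Gaussian `target` and the uniform proposal are stand-ins of our own choosing for the squared matrix element and the trained neural map, not part of the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Stand-in for |M|^2 times phase-space factors: a narrow Gaussian peak.
    return np.exp(-0.5 * ((x - 0.7) / 0.05) ** 2)

def proposal_sample(n):
    # Stand-in for the (neural) importance-sampling map: uniform on [0, 1].
    return rng.uniform(0.0, 1.0, n)

def proposal_density(x):
    # Uniform proposal density on the unit interval.
    return np.ones_like(x)

n = 100_000
x = proposal_sample(n)
w = target(x) / proposal_density(x)   # importance weights w_i = f(x_i)/g(x_i)

integral = w.mean()                   # Monte Carlo estimate of the integral
err = w.std(ddof=1) / np.sqrt(n)      # statistical uncertainty of the estimate
unweighting_eff = w.mean() / w.max()  # hit-or-miss unweighting efficiency
```

The better the proposal matches the target, the flatter the weight distribution becomes — which simultaneously shrinks `err` and raises `unweighting_eff`; a neural sampler adapts `proposal_sample`/`proposal_density` to achieve this.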
Reports on this Submission
Report 2 by Tilman Plehn on 2020-03-02 (Invited Report)
 Cite as: Tilman Plehn, Report on arXiv:2001.05478v2, delivered 2020-03-02, doi: 10.21468/SciPost.Report.1548
Strengths
The paper is one of the first applications of machine learning to MC generation and one of the first applications of normalizing flow networks. It is very well done, technically state of the art, and indicates serious potential.
Weaknesses
Just little stuff, plus why does the bloody 4-jet case not really work???
Report
I really like the paper, it should be published with minor changes. Requested changes are largely suggestions concerning the presentation.
Requested changes
1. Conceptually, I am not sure I understand the two figures of merit and how they are linked. Any chance you could explain more about this on p.3, where they are introduced?
2. I am sorry, but I find the beginning of Sec. 2.3 hard to read for NN non-experts. Any chance you could illustrate it better? Do you have an image illustrating your network structure?
3. I am not super-happy with the coverage arguments, which are partly linked to mapping an infinite weight space on a finite phase space. What exactly is not possible there? We did look into this problem in our Bayesian classification paper, and it is painful with the sigmoid function, but I do not understand the fundamental issues you are implying;
4. Looking at Fig. 1 (lower) I would argue that all you see is a lack of training data in the tails, which we know causes problems for GANs universally, for instance in event generation or in unfolding. Why do you consider this a fundamental problem?
5. Concerning top decays: why is VEGAS not good? Or is flat just unusually good, because you mapped out the BW?
6. Please explain Fig. 2 more carefully, like what does it show, and what do the peaks imply? By the way, why not use a linear weight scale for Fig. 2? It seems not obvious given the scaleless latent space;
7. At the latest towards the bottom of p.11 I am wondering if the definition of the loss function is related to the two figures of merit, please say something about this;
8. In Sec. 4.2 the elephant sitting in the room is the lack of improvement in the 4-jet case. Please say more about this, including maybe what the reason might be and what you tried to improve it. Do you really think it can be improved with better GPUs? It looks like a problem of multi-channel naively combined with NNs, also after reading HoChi's paper;
9. Now on to some comments concerning the presentation: I have to admit that I do not like switching back and forth between integral representations like Eq. (1) and sum representations like Eq. (2). Why do you do that? In any case, if you do this please make sure everything is well defined...
10. Please explain the phase space coverage for non-experts, because it is the main constraint on applying NNs to phase space generation, cf. p.3, point (i);
11. At the beginning of Sec. 2.1 it might be useful to mention that all of this is just a variable transform where we only really care about the Jacobian?
12. Please explain Eq. (11) more carefully, and how it relates to Eq. (7);
13. What does the \sim before Eq. (12) mean?
14. The symbols in Eq. (23) are not clearly defined there, I think;
15. Why the funny E_N notation in Tab. 1?
16. Moving Eq. (24) to somewhere in the introduction might make the introduction a little less dry/formal;
17. I am sorry, but the argument in the second paragraph on p.13 (The event weight distribution) is not clear to me. Can you have a look at this paragraph, please?
18. For applications of flow networks in physics it would be nice to cite the paper(s) by Ulli Koethe. They are not particle physics, but they are the first physics applications I know of.
Anonymous Report 1 on 2020-02-26 (Invited Report)
 Cite as: Anonymous, Report on arXiv:2001.05478v2, delivered 2020-02-26, doi: 10.21468/SciPost.Report.1537
Strengths
1. The paper provides a thorough review of existing methods for adaptive Monte Carlo integration. It outlines clearly how known techniques based on Neural Networks can fail to fill the full phase space and therefore yield a biased integration result.
2. The authors construct a novel adaptive integration algorithm based on Neural Networks and Normalizing Flows. They apply this algorithm to various test cases, including a three-body decay, top-quark pair production and decay at a lepton collider, and partonic three- and four-jet production.
3. They present a comprehensive comparison between one of the best existing adaptive integrators (Vegas) and their new technique for weight distributions, event generation efficiencies and integration uncertainties on physical observables of potential experimental interest.
Report
The paper provides a thorough review of existing methods for adaptive Monte Carlo integration. It outlines clearly how some techniques based on Neural Networks can fail to fill the full phase space and therefore yield a biased integration result.
The authors then proceed to construct a novel adaptive integration algorithm based on Neural Networks and Normalizing Flows. They apply this algorithm to various test cases, including a three-body decay, top-quark pair production and decay at a lepton collider, and partonic three- and four-jet production.
They present a comprehensive comparison between one of the best existing adaptive integrators (Vegas) and their new technique for weight distributions, event generation efficiencies and integration uncertainties on physical observables of potential experimental interest. They conclude that the efficiency of their novel integrator is superior to Vegas at low final-state multiplicity, but drops at higher multiplicity due to a lower compute efficiency.
I highly recommend this preprint for publication. I would suggest a minor modification, which is to include a reference to Nucl. Phys. B9 (1969) 568-576 in the last paragraph of Sec. 2, where channel construction is discussed.
Author: Steffen Schumann on 2020-02-27 [id 746]
(in reply to Report 1 on 2020-02-26)
We would like to thank the reviewer for the comments. We will include a reference to the suggested article in the revised version of our manuscript.
Author: Steffen Schumann on 2020-03-27 [id 779]
(in reply to Report 2 by Tilman Plehn on 2020-03-02)
Dear Tilman,
thank you for the detailed comments and suggestions. We have tried to answer all your queries and give our detailed replies below. An updated version (v3) has been submitted to the arXiv as well as SciPost.
We hope that the revised version can be considered for publication in SciPost.
All the best,
Steffen (for the authors)
R1  We have somewhat extended the discussion of the unweighting efficiency and the variance and mention their complementarity for optimising a sampler.
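The complementarity mentioned here (and queried in point 1 of the report) can be made concrete: both figures of merit follow from one and the same sample of importance weights. The sketch below is our own illustration, assuming the weights w_i = f(x_i)/g(x_i) are already available as a NumPy array; the function name is hypothetical and not taken from the paper.

```python
import numpy as np

def figures_of_merit(w):
    """Two figures of merit from importance weights w_i = f(x_i)/g(x_i).

    rel_std drives the Monte Carlo integration error; the hit-or-miss
    unweighting efficiency <w>/max(w) drives the cost of generating
    unit-weight events. Optimising one does not automatically optimise
    the other: the variance is set by the bulk of the weight
    distribution, the efficiency by its single largest outlier.
    """
    w = np.asarray(w, dtype=float)
    rel_std = w.std(ddof=1) / w.mean()
    unweighting_eff = w.mean() / w.max()
    return rel_std, unweighting_eff

# A perfectly adapted sampler (constant weights) is optimal for both:
rel_std, eff = figures_of_merit(np.ones(1000))
# rel_std == 0.0, eff == 1.0

# A single large outlier barely moves rel_std's bulk contribution
# but collapses the unweighting efficiency:
rel_std_out, eff_out = figures_of_merit([1.0] * 99 + [100.0])
```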
R2  We added a more detailed introduction to Sec. 2.3, this was indeed too brief before.
R3/4/10  We have extended the discussion and added additional data to Fig. 1a, illustrating the fact that even a significant increase in training data does not yield a satisfactory phase space coverage. In consequence, the non-surjective NN sampler does not reproduce the desired target distribution.
R5/6  We extended the discussion on the event weight distribution, i.e. Fig. 2. In the discussion of the top-quark decay example we explicitly refer to the complementarity of improving the variance vs. the unweighting efficiency.
R7/14  We have clarified the relation between the loss function and the figures of merit, adapting the notation to enhance clarity. To this end, we have re-expressed the loss function in terms of the variables used throughout, providing further explanation and a reference.
R8  Indeed, no definite answer on why the performance for the higher-multiplicity processes drops can be given yet. We now stress this more clearly in the corresponding paragraph as well as in the conclusions, and we have removed the misleading statement that the number of training epochs was limited by our computational resources.
R9  We have carefully checked our notation and would like to stick with both the integral and the sum representation, with the latter providing the numerical estimate for the former. To somewhat clarify matters we have moved the reference to the integrand mean to Eq. (2).
R11  We have adjusted the first sentence of the second paragraph in Sec. 2.1.
R12  We reworded the description of Eq. (11) and added a reference to Eq. (7).
R13  This was meant to represent that the variable x is distributed according to $p_X(x)$; we now explicitly state this in words.
R15  Given that our integrals have different physical meanings, i.e. decay widths or various cross sections, we hope that using the integral estimate $E_N$ helps to relate the physics applications to the discussion of the method in Secs. 2/3. For clarification we have adjusted the captions in the various results tables.
R16  We have added to the introduction a reference to the physics examples to be discussed.
R17  We revised the corresponding paragraph.
R18  Thanks for the suggestion for additional references. The work of Koethe is certainly interesting and related in subject. However, given the rather loose connection to physics applications therein, we would in fairness have to include a long list of other references to flow network applications as well, which we feel is beyond the scope of the paper.