SciPost Submission Page
MadNIS - Neural Multi-Channel Importance Sampling
by Theo Heimel, Ramon Winterhalder, Anja Butter, Joshua Isaacson, Claudius Krause, Fabio Maltoni, Olivier Mattelaer, Tilman Plehn
This is not the latest submitted version.
This Submission thread is now published.
Submission summary
Authors (as registered SciPost users):  Joshua Isaacson · Claudius Krause · Olivier Mattelaer · Tilman Plehn · Ramon Winterhalder 
Submission information  

Preprint Link:  https://arxiv.org/abs/2212.06172v1 (pdf) 
Date submitted:  2022-12-20 10:53 
Submitted by:  Winterhalder, Ramon 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
Theory predictions for the LHC require precise numerical phase-space integration and generation of unweighted events. We combine machine-learned multi-channel weights with a normalizing flow for importance sampling to improve classical methods for numerical integration. We develop an efficient bi-directional setup based on an invertible network, combining online and buffered training for potentially expensive integrands. We illustrate our method for the Drell-Yan process with an additional narrow resonance.
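For readers less familiar with the underlying technique, the classical multi-channel importance sampling that the learned channel weights and flow-based mappings generalize can be sketched in a few lines. This is a hypothetical one-dimensional toy with Gaussian channels, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-peak integrand (hypothetical, not the paper's example):
# a sum of two Gaussians, with exact integral 2 * sigma * sqrt(2*pi)
sigma = 0.05
means = np.array([0.25, 0.75])

def f(x):
    return sum(np.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means)

def g(x, m):
    # normalized channel density: a Gaussian mapped onto one peak
    return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

alpha = np.array([0.5, 0.5])   # channel weights alpha_i, summing to one

n = 100_000
ch = rng.choice(2, size=n, p=alpha)        # draw a channel per event
x = rng.normal(means[ch], sigma)           # sample x from that channel
mixture = alpha[0] * g(x, means[0]) + alpha[1] * g(x, means[1])
w = f(x) / mixture                         # multi-channel importance weight

estimate = w.mean()                        # Monte-Carlo estimate of the integral
exact = 2 * sigma * np.sqrt(2 * np.pi)
```

Because each channel density here matches one peak exactly, the weights are constant and the variance vanishes; MadNIS replaces the fixed alpha_i and channel mappings with trained networks.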
Reports on this Submission
Anonymous Report 3 on 2023-3-2 (Invited Report)
 Cite as: Anonymous, Report on arXiv:2212.06172v1, delivered 2023-03-02, doi: 10.21468/SciPost.Report.6833
Strengths
Novel multi-channel invertible-neural-network incarnation of adaptive Monte-Carlo integration
Weaknesses
The chosen examples still are very simplistic phase spaces, and no performance comparisons with existing adaptive implementations are provided.
Report
The authors discuss a novel multi-channel adaptation of geometrically
intricate multi-dimensional integration volumes, with the application
of Monte-Carlo integration and event simulation for high-energy
physics in mind. This is a very important field of research, not only
to accelerate research in high-energy physics, but also because it
contributes to making computationally intensive research potentially
more sustainable. The adaptive weight optimization a la VEGAS is
replaced by the authors with deep-learning methods based on invertible
neural networks. The manuscript serves as a(nother) proof-of-principle
implementation of such a learning-based adaptive phase-space
integration and studies its performance with different starting
setups: flat priors and physics-inspired priors of different levels of
sophistication. It contains academic examples demonstrating the
method, and a Drell-Yan-like hadron-collider example with a still
simplistic, but realistic, phase space. As such, I consider this a very
interesting development that has already picked up interest and
applications by several research groups all over the world. There are
no major criticisms from my side, just a few remarks or questions,
some of them out of simple curiosity:
(1) There are a few cases where definitions are not provided in the
manuscript: while I believe that the different activation functions are
accessible in the literature and need not be specified, as they are
anyway not comprehensible without a certain ML background, the unit
hypercube U_i on p. 3 should be defined as such.
(2) The authors introduce multi-channel adaptive-weight VEGAS-type
sampling and give a lot of references, but in this context it would be
fair to cite https://inspirehep.net/literature/472150 and maybe also
its parallel incarnation https://inspirehep.net/literature/1704950.
(3) For the trainable multi-channel weights in Eq. (19), the choice on
the right-hand side, specified to be an arbitrary real number, to me
seems to also lie in the interval [0,1], unless the channel weights
were not positive semi-definite. Could the authors specify which way
they meant this to be?
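For reference, one standard way to obtain channel weights in [0,1] that sum to one from unconstrained real network outputs is a softmax normalization; whether this is what the authors intend in Eq. (19) is exactly the question above. A hypothetical sketch, not the paper's code:

```python
import numpy as np

def softmax(z):
    # map arbitrary reals to a normalized probability vector
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# unconstrained network outputs, one per channel (hypothetical values)
logits = np.array([1.3, -0.7, 2.1])

alpha = softmax(logits)        # each weight in (0,1), weights sum to one
```

Under this convention the network is free to output any real number, while the resulting channel weights are automatically positive and normalized.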
(4) In Eq. (23) there seems to be a typo: the derivative in the
right-hand expression should be a capital G, not g.
(5) For the variance-weighted training on p. 9, the authors describe an
algorithm to prevent channels from being empty during training: besides
the fact that an empty channel would mean that the channel is badly
learned, is there any other reason for this?
(6) For the soft permutations by means of rotations, the authors do
mention that they are not so beneficial in the case of phase-space
features aligned with the coordinate axes, or after having been aligned
by a certain mapping. On the other hand, for non-aligned features, would
they not be distorted or wound up between phase-space boundaries by
these rotations, or are the rotations only probed in learning close to
the identity?
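For context, such a soft permutation is an orthogonal rotation between coupling layers, sketched here in two dimensions with a small angle as a near-identity initialization. This is a hypothetical minimal example, not the paper's implementation:

```python
import numpy as np

# a small trainable angle: near-identity initialization of the rotation
theta = 0.05

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([0.3, 0.8])       # point in the latent space of the flow
y = R @ x                      # forward pass through the rotation layer

# exact inverse: R is orthogonal, so R^{-1} = R^T
x_back = R.T @ y

# rotations are volume-preserving, so the Jacobian determinant is one
jac_det = np.linalg.det(R)
```

The exact invertibility and unit Jacobian are what make rotations attractive between coupling blocks; the question above concerns how far from the identity such rotations actually drift during training.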
(7) Do the authors foresee a classification of channels for
phase-space channels that behave very similarly, e.g. in s-channel
top-pair production via QCD and QED: while the colour structure is
different, the phase-space structure is completely identical (mapping
of Eq. (58))?
(8) The phase-space cut studied in Fig. 7 and also the invariant-mass
cut for the DY process affect the different channels in a not too
different manner. What do the authors expect for phase-space cuts that
act vastly differently on the different phase-space channels, e.g. a
rapidity cut on s- vs. t-channel-like kinematics?
(9) At the end of the section with the implementation details for the
Z'-augmented DY process, the authors compare their integration with the
result from MG5aMC; however, they do not provide a timing comparison.
It would be nice to put this into perspective.
(10) In Eqs. (81) and (84), should the second argument of the
Kullback-Leibler divergence be g, not q?
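For reference, the asymmetry that makes the argument order of the Kullback-Leibler divergence matter can be checked directly on discrete toy distributions (hypothetical numbers, not taken from the paper):

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) for discrete distributions; asymmetric in its arguments
    return np.sum(p * np.log(p / q))

f_true = np.array([0.7, 0.2, 0.1])   # hypothetical target density
g_flow = np.array([0.5, 0.3, 0.2])   # hypothetical learned flow density

d_fg = kl(f_true, g_flow)            # "forward" divergence
d_gf = kl(g_flow, f_true)            # "reverse" divergence, generally different
```

Since the two orderings give different values and drive the learned density toward different behaviour (mode-covering vs. mode-seeking), the second argument in Eqs. (81) and (84) is worth stating unambiguously.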
Anonymous Report 2 on 2023-3-2 (Invited Report)
 Cite as: Anonymous, Report on arXiv:2212.06172v1, delivered 2023-03-02, doi: 10.21468/SciPost.Report.6829
Strengths
The paper contains some new developments to instrument INNs for multi-channel integration, in particular when using local channel weights, i.e. in the MG5aMC approach.
Weaknesses
The chosen examples are rather trivial, to the benefit of the presentation, but they do not necessarily reflect the potential or limitations of the proposed developments.
Report
The paper contributes some new ideas to adapt normalizing flows, i.e. INNs, to the task of multi-channel integration. This is in particular relevant for multi-parton matrix-element generators, which form an integral part of modern event generators.
The paper is carefully written and contains new ideas and material. While the buffered training and the proposed general rotations between coupling layers are generic, the local channel weights are rather specific to the MG5aMC single-diagram-enhanced integrator.
The novelties are well motivated and introduced, and there is sufficient validation to prove correctness. However, the chosen examples are rather trivial and do not fully explore the potential of the techniques. Two examples:
First, the limitation of the single-diagram-enhanced sampler lies in the assumption that interference effects are rather small. It would have been interesting to show more clearly whether this limitation can be ameliorated, by constructing an example with large positive/negative interference. The Z' sample does not fully serve this purpose.
Second, the chosen LHC setup is way too trivial to explore the potential of the proposed trainable rotations of the integration variables. This leaves that part essentially untested, and one can only speculate about its impact on the performance for other examples, where (typically!) certain variables might not be well aligned with the actual target function.
While a realistic and challenging LHC example corresponding to a multi-particle production process might be beyond the scope of the authors' idea for the presented paper, an illustrative toy example in Sec. 2 that features non-vanishing interference effects should be easy to add.
Requested changes
Please add in Sec. 2 an example that features non-vanishing interference and poses a challenge to the standard multi-channel integrator based on single-diagram enhancement, and discuss the performance of your approach.
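A toy version of the requested example is easy to set up: two overlapping complex "amplitudes" whose coherent sum contains a sign-changing interference term, together with MG5aMC-style single-diagram-enhanced channel weights. A hypothetical sketch, not the paper's process:

```python
import numpy as np

# Two overlapping Breit-Wigner-like complex amplitudes (hypothetical example)
def A1(x):
    return 1.0 / (x - 0.3 - 0.02j)

def A2(x):
    return -1.0 / (x - 0.7 - 0.02j)

x = np.linspace(0.0, 1.0, 1001)

full = np.abs(A1(x) + A2(x)) ** 2                 # coherent sum, with interference
incoherent = np.abs(A1(x)) ** 2 + np.abs(A2(x)) ** 2
interference = full - incoherent                   # 2 Re(A1 A2*), changes sign

# single-diagram-enhanced channel weights a la MG5aMC:
# each channel follows one squared diagram, normalized to the incoherent sum
alpha1 = np.abs(A1(x)) ** 2 / incoherent
alpha2 = np.abs(A2(x)) ** 2 / incoherent
```

Because the interference term is sizable and takes both signs, the incoherent channel decomposition no longer matches the full integrand, which is exactly the regime in which the standard single-diagram-enhanced sampler is expected to struggle.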
Anonymous Report 1 (Invited Report)
Strengths
Very clear and detailed presentation of the application of new approaches from machine learning to standard problems in multi-channel Monte-Carlo integration.
Report
The paper presents the application of new approaches using machine learning to a standard problem in multi-channel Monte-Carlo integration. The presentation is very clear and concise, and the paper should be published in its present form.