SciPost Submission Page
MadNIS - Neural Multi-Channel Importance Sampling
by Theo Heimel, Ramon Winterhalder, Anja Butter, Joshua Isaacson, Claudius Krause, Fabio Maltoni, Olivier Mattelaer, Tilman Plehn
This is not the latest submitted version.
This Submission thread is now published.
Submission summary
Authors (as registered SciPost users):  Joshua Isaacson · Claudius Krause · Olivier Mattelaer · Tilman Plehn · Ramon Winterhalder 
Submission information  

Preprint Link:  https://arxiv.org/abs/2212.06172v1 (pdf) 
Date submitted:  2022-12-20 10:53 
Submitted by:  Winterhalder, Ramon 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
Theory predictions for the LHC require precise numerical phase-space integration and generation of unweighted events. We combine machine-learned multi-channel weights with a normalizing flow for importance sampling to improve classical methods for numerical integration. We develop an efficient bi-directional setup based on an invertible network, combining online and buffered training for potentially expensive integrands. We illustrate our method for the Drell-Yan process with an additional narrow resonance.
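For readers less familiar with the underlying technique, the classical multi-channel importance sampling that the learned channel weights and flow-based mappings generalize can be sketched in a few lines. This is a hypothetical one-dimensional toy with Gaussian channels, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-peak integrand (hypothetical, not the paper's example):
# a sum of two Gaussians, with exact integral 2 * sigma * sqrt(2*pi)
sigma = 0.05
means = np.array([0.25, 0.75])

def f(x):
    return sum(np.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means)

def g(x, m):
    # normalized channel density: a Gaussian mapped onto one peak
    return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

alpha = np.array([0.5, 0.5])   # channel weights alpha_i, summing to one

n = 100_000
ch = rng.choice(2, size=n, p=alpha)        # draw a channel per event
x = rng.normal(means[ch], sigma)           # sample x from that channel
mixture = alpha[0] * g(x, means[0]) + alpha[1] * g(x, means[1])
w = f(x) / mixture                         # multi-channel importance weight

estimate = w.mean()                        # Monte-Carlo estimate of the integral
exact = 2 * sigma * np.sqrt(2 * np.pi)
```

Because each channel density here matches one peak exactly, the weights are constant and the variance vanishes; MadNIS replaces the fixed alpha_i and channel mappings with trained networks.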
Reports on this Submission
Anonymous Report 3 on 2023-3-2 (Invited Report)
 Cite as: Anonymous, Report on arXiv:2212.06172v1, delivered 2023-03-02, doi: 10.21468/SciPost.Report.6833
Strengths
Novel multi-channel invertible-neural-network incarnation of adaptive Monte-Carlo integration
Weaknesses
The chosen examples still are very simplistic phase spaces, and no performance comparisons with existing adaptive implementations are provided.
Report
The authors discuss a novel multi-channel adaptation of geometrically
intricate multi-dimensional integration volumes, with the application
of Monte-Carlo integration and event simulation for high-energy
physics in mind. This is a very important field of research, not only
to accelerate research in high-energy physics, but also because it
contributes to making computationally intensive research potentially
more sustainable. The adaptive weight optimization a la VEGAS is
replaced by the authors with deep-learning methods based on invertible
neural networks. The manuscript serves as a(nother) proof-of-principle
implementation of such a learning-based adaptive phase-space
integration and studies its performance with different starting
setups: flat priors and physics-inspired priors of different levels of
sophistication. It contains academic examples demonstrating the
method, and a Drell-Yan-like hadron-collider example with a still
simplistic, but realistic, phase space. As such, I consider this a very
interesting development that has already picked up interest and
applications by several research groups all over the world. There are
no major criticisms from my side, just a few remarks or questions,
some of them out of simple curiosity:
(1) There are a few cases where definitions are not provided in the
manuscript: while I believe that the different activation functions are
accessible in the literature and need not be specified, as they are
anyway not comprehensible without a certain ML background, the unit
hypercube U_i on p. 3 should be defined as such.
(2) The authors introduce multi-channel adaptive-weight VEGAS-type
sampling and give a lot of references, but in this context it would be
fair to cite https://inspirehep.net/literature/472150 and maybe also
its parallel incarnation https://inspirehep.net/literature/1704950.
(3) For the trainable multi-channel weights in Eq. (19), the choice on
the right-hand side, specified to be an arbitrary real number, to me
seems to also lie in the interval [0,1], unless the channel weights
were not positive semi-definite. Could the authors specify which way
they meant this to be?
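For reference, one standard way to obtain channel weights in [0,1] that sum to one from unconstrained real network outputs is a softmax normalization; whether this is what the authors intend in Eq. (19) is exactly the question above. A hypothetical sketch, not the paper's code:

```python
import numpy as np

def softmax(z):
    # map arbitrary reals to a normalized probability vector
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# unconstrained network outputs, one per channel (hypothetical values)
logits = np.array([1.3, -0.7, 2.1])

alpha = softmax(logits)        # each weight in (0,1), weights sum to one
```

Under this convention the network is free to output any real number, while the resulting channel weights are automatically positive and normalized.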
(4) In Eq. (23) there seems to be a typo: the derivative in the
right-hand expression should be a capital G, not g.
(5) For the variance-weighted training on p. 9, the authors describe an
algorithm to prevent channels from being empty during training: besides
the fact that an empty channel would mean that the channel is badly
learned, is there any other reason for this?
(6) For the soft permutations by means of rotations, the authors do
mention that they are not so beneficial in the case of phase-space
features aligned with the coordinate axes, or after having been aligned
by a certain mapping. On the other hand, for non-aligned features, would
they not be distorted or wound up between phase-space boundaries by
these rotations, or are the rotations only probed in learning close to
the identity?
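For context, such a soft permutation is an orthogonal rotation between coupling layers, sketched here in two dimensions with a small angle as a near-identity initialization. This is a hypothetical minimal example, not the paper's implementation:

```python
import numpy as np

# a small trainable angle: near-identity initialization of the rotation
theta = 0.05

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([0.3, 0.8])       # point in the latent space of the flow
y = R @ x                      # forward pass through the rotation layer

# exact inverse: R is orthogonal, so R^{-1} = R^T
x_back = R.T @ y

# rotations are volume-preserving, so the Jacobian determinant is one
jac_det = np.linalg.det(R)
```

The exact invertibility and unit Jacobian are what make rotations attractive between coupling blocks; the question above concerns how far from the identity such rotations actually drift during training.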
(7) Do the authors foresee a classification of channels for
phase-space channels that behave very similarly, e.g. in s-channel
top-pair production via QCD and QED: while the colour structure is
different, the phase-space structure is completely identical (mapping
of Eq. (58))?
(8) The phase-space cut studied in Fig. 7 and also the invariant-mass
cut for the DY process affect the different channels in a not too
different manner. What do the authors expect for phase-space cuts that
act vastly differently on the different phase-space channels, e.g. a
rapidity cut on s- vs. t-channel-like kinematics?
(9) At the end of the section with the implementation details for the
Z'-augmented DY process, the authors compare their integration with the
result from MG5aMC; however, they do not provide a timing comparison.
It would be nice to put this into perspective.
(10) In Eqs. (81) and (84), should the second argument of the
Kullback-Leibler divergence be g, not q?
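For reference, the asymmetry that makes the argument order of the Kullback-Leibler divergence matter can be checked directly on discrete toy distributions (hypothetical numbers, not taken from the paper):

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) for discrete distributions; asymmetric in its arguments
    return np.sum(p * np.log(p / q))

f_true = np.array([0.7, 0.2, 0.1])   # hypothetical target density
g_flow = np.array([0.5, 0.3, 0.2])   # hypothetical learned flow density

d_fg = kl(f_true, g_flow)            # "forward" divergence
d_gf = kl(g_flow, f_true)            # "reverse" divergence, generally different
```

Since the two orderings give different values and drive the learned density toward different behaviour (mode-covering vs. mode-seeking), the second argument in Eqs. (81) and (84) is worth stating unambiguously.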
Anonymous Report 2 on 2023-3-2 (Invited Report)
 Cite as: Anonymous, Report on arXiv:2212.06172v1, delivered 2023-03-02, doi: 10.21468/SciPost.Report.6829
Strengths
The paper contains some new developments to instrument INNs for multi-channel integration, in particular when using local channel weights, i.e. in the MG5aMC approach.
Weaknesses
The chosen examples are rather trivial, to the benefit of the presentation, but they do not necessarily reflect the potential or limitations of the proposed developments.
Report
The paper contributes some new ideas to adapt normalizing flows, i.e. INNs, to the task of multi-channel integration. This is in particular relevant for multi-parton matrix-element generators, which form an integral part of modern event generators.
The paper is carefully written and contains new ideas and material. While the buffered training and the proposed general rotations between coupling layers are generic, the local channel weights are rather specific to the MG5aMC single-diagram-enhanced integrator.
The novelties are well motivated and introduced, and there is sufficient validation to prove correctness. However, the chosen examples are rather trivial and do not fully explore the potential of the techniques. Two examples:
First, the limitation of the single-diagram-enhanced sampler lies in the assumption that interference effects are rather small. It would have been interesting to show more clearly whether this limitation can be ameliorated, by constructing an example with large positive/negative interference. The Z' sample does not fully serve this purpose.
Second, the chosen LHC setup is way too trivial to explore the potential of the proposed trainable rotations of the integration variables. This leaves that part essentially untested, and one can only speculate about its impact on the performance for other examples, where (typically!) certain variables might not be well aligned with the actual target function.
While a realistic and challenging LHC example corresponding to a multi-particle production process might be beyond the scope of the authors' idea for the presented paper, an illustrative toy example in Sec. 2 that features non-vanishing interference effects should be easy to add.
Requested changes
Please add in Sec. 2 an example that features non-vanishing interference and poses a challenge to the standard multi-channel integrator based on single-diagram enhancement, and discuss the performance of your approach.
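A toy version of the requested example is easy to set up: two overlapping complex "amplitudes" whose coherent sum contains a sign-changing interference term, together with MG5aMC-style single-diagram-enhanced channel weights. A hypothetical sketch, not the paper's process:

```python
import numpy as np

# Two overlapping Breit-Wigner-like complex amplitudes (hypothetical example)
def A1(x):
    return 1.0 / (x - 0.3 - 0.02j)

def A2(x):
    return -1.0 / (x - 0.7 - 0.02j)

x = np.linspace(0.0, 1.0, 1001)

full = np.abs(A1(x) + A2(x)) ** 2                 # coherent sum, with interference
incoherent = np.abs(A1(x)) ** 2 + np.abs(A2(x)) ** 2
interference = full - incoherent                   # 2 Re(A1 A2*), changes sign

# single-diagram-enhanced channel weights a la MG5aMC:
# each channel follows one squared diagram, normalized to the incoherent sum
alpha1 = np.abs(A1(x)) ** 2 / incoherent
alpha2 = np.abs(A2(x)) ** 2 / incoherent
```

Because the interference term is sizable and takes both signs, the incoherent channel decomposition no longer matches the full integrand, which is exactly the regime in which the standard single-diagram-enhanced sampler is expected to struggle.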
Anonymous Report 1 (Invited Report)
Strengths
Very clear and detailed presentation of the application of new approaches from machine learning to standard problems in multi-channel Monte-Carlo integration.
Report
The paper presents the application of new approaches using machine learning to a standard problem in multi-channel Monte-Carlo integration. The presentation is very clear and concise, and the paper should be published in its present form.