SciPost Submission Page

Neural Network-Based Approach to Phase Space Integration

by Matthew D. Klimek, Maxim Perelstein

Submission summary

As Contributors: Matthew Klimek
Arxiv Link: https://arxiv.org/abs/1810.11509v2 (pdf)
Date submitted: 2020-04-27
Submitted by: Klimek, Matthew
Submitted to: SciPost Physics
Discipline: Physics
Subject area: High-Energy Physics - Phenomenology

Abstract

Monte Carlo methods are widely used in particle physics to integrate and sample probability distributions on phase space. We present an Artificial Neural Network (ANN) algorithm optimized for this task, and apply it to several examples of relevance for particle physics, including situations with non-trivial features such as sharp resonances and soft/collinear enhancements. Excellent performance has been demonstrated, with the trained ANN achieving unweighting efficiencies between 30% and 75%. In contrast to traditional algorithms, the ANN-based approach does not require that the phase space coordinates be aligned with resonant or other features in the cross section.

Current status:
Editor-in-charge assigned



Reports on this Submission

Anonymous Report 2 on 2020-6-4 Invited Report

Strengths

This paper gives a new technique to perform phase space integration using artificial neural networks.

1) This is indeed an interesting subject, and sampling phase space more efficiently in Monte Carlo event simulations to obtain unweighted events would be useful to the community. These types of tools are used by a substantial number of particle phenomenologists, and their improvement would make an impact.

2) Artificial neural networks are a relatively new area of study in particle physics, and applying them to Monte Carlo simulations is promising. This paper is new work, showing a potentially interesting application of these new methods. Artificial neural networks may improve existing tools, and contributions such as this study are worthwhile.

3) The authors show that artificial neural networks can efficiently generate unweighted events for integrands with resonances or singularities, even without the variable transformations to different coordinate systems that are normally used to sample sharp peaks more efficiently.

4) The method presented here is a continuous mapping of phase space, unlike the discrete grid of phase space used by VEGAS. Methods of this type promise a much more efficient sampling of phase space than the usual discrete sampling.

5) The authors explore many different 3-body final states that present different issues for event generation: resonances and collinear/soft singularities.

Weaknesses

The method of this paper appears solid. My major concerns are the comparisons between the artificial neural networks and more traditional methods, and the presentation of results.

1) The authors consistently compare their unweighting efficiency to MadGraph's (MG's) unweighting efficiency. It is not clear this is an apples to apples comparison.

1a) First, it's not clear whether the unweighting efficiencies of the artificial neural network (ANN) and MG are quoted before or after training. For a fair comparison, both should be quoted either before or after training of the integration grid and the ANN.

1b) Second, the authors specifically design their ANN for a 3-body final state while MG is for generic N-body final states. As the authors do in their paper, a 3-body phase space can be written as a 2-dimensional integration. Hence, all sampled points are physically allowed, which can increase efficiency. MG is designed for a general final state and not specifically a 3-body final state, which may add additional inefficiencies beyond the unweighting procedure.

2) The presentation of results could be better. There are several plots showing results. However, there is only one comparing the ANN output with the true distribution, and no plot comparing the ANN with more traditional methods.

Report

The authors present a promising new method to perform Monte Carlo simulations in particle physics by using artificial neural networks (ANNs). The proposal is to use an ANN to efficiently sample phase space and generate unweighted events. In traditional Monte Carlo methods, it can be difficult to efficiently sample regions where an integrand is sharply peaked. This happens often in particle physics, for example when there are narrow resonances or collinear and soft singularities. A major hurdle is that Monte Carlo integrators such as VEGAS use a grid to sample the phase space. As the authors point out, an advantage of an ANN is that it doesn't use a grid but samples phase space continuously.
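To make the difficulty with sharply peaked integrands concrete, the following minimal sketch (not taken from the paper) applies hit-or-miss unweighting to a toy one-dimensional Breit-Wigner peak sampled uniformly. The integrand, the width `GAMMA`, the sample size, and the helper names are assumptions chosen for illustration. The unweighting efficiency is assumed here to be the mean weight divided by the maximum weight, which is the usual convention.

```python
import random

GAMMA = 0.01  # assumed toy width of the resonance, in units of the sampling range


def breit_wigner(x, x0=0.5, gamma=GAMMA):
    """Toy integrand: an unnormalized Breit-Wigner peak centered at x0."""
    return 1.0 / ((x - x0) ** 2 + gamma ** 2)


def unweighting_efficiency(weights):
    """Mean weight divided by maximum weight (assumed definition)."""
    return sum(weights) / (len(weights) * max(weights))


def unweight(points, weights, rng):
    """Hit-or-miss: keep each point with probability w / w_max."""
    w_max = max(weights)
    return [x for x, w in zip(points, weights) if rng.random() * w_max < w]


rng = random.Random(1)
points = [rng.random() for _ in range(20000)]  # uniform sampling, no adaptation
weights = [breit_wigner(x) for x in points]

efficiency = unweighting_efficiency(weights)
kept = unweight(points, weights, rng)
print(f"efficiency = {efficiency:.3f}, kept {len(kept)} of {len(points)} points")
```

With uniform sampling most points fall in the tails of the peak and are rejected, so the fraction of kept events stays small. An adapted grid or a trained map aims to flatten the weights and raise this fraction.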

This paper is well written and timely. The authors apply their method to many different benchmark scenarios in particle physics involving both narrow resonances and collinear/soft singularities. The analysis, the ANN, and the motivations are well explained. However, the comparison with more traditional VEGAS Monte Carlo methods is lacking. This is detailed under the "Weaknesses" portion of this report. That being said, it is straightforward to have an apples to apples comparison between the ANN and VEGAS methods. Under "Requested changes" there are several suggestions to help with these corrections, as well as suggestions of plots to illustrate the comparisons. Once these changes are implemented, the paper will be ready to publish.

Requested changes

As mentioned above, it does not seem the authors are performing an apples to apples comparison between their proposed methods and traditional VEGAS methods. Here are several requests for clarification, new plots, and different comparisons.

1) On the bottom of pg. 8 the authors say "We therefore expect that, with an ANN trained for one parameter point, it would be relatively fast to re-train that ANN for a nearby parameter point compared to starting with a randomly initialized ANN, ..." However, this is also true of traditional VEGAS methods, where you can save the grid as a "statefile." These statefiles can also be used for different parameter points as long as the phase space is the same. The authors should make clear this is not just a feature of the ANN.

2) The authors should make it clear if the quoted ANN unweighting efficiency is before or after the training of the neural network.

3) In light of points (1,2), the comparison I think should be made is between the unweighting efficiencies of the ANN and VEGAS after the training of both. Three-body phase spaces are simple enough that it is straightforward to use VEGAS for the phase space integration. Indeed, the authors already list which variables they are using and how to map them onto a unit square, and this can be implemented in VEGAS as well as in their ANN. By training VEGAS, I mean precisely point (1). An initialization run can be used to create a VEGAS grid. The unweighting efficiency can then be calculated from a VEGAS run that takes the initialized grid as input. This would be an apples to apples comparison of two dedicated 3-body Monte Carlos, and would be a fairer comparison than using MadGraph.

4) There should be plots comparing the distributions obtained with traditional VEGAS methods and with the ANN. For example, Fig. 5 could show both the VEGAS and ANN histograms. This would show explicitly how well the two methods sample the phase space. This comparison should be made for all plots in Figs. 5, 6, and 7.

5) In Fig. 6, as the authors say, it is clear that the ANN is finding both the aligned and unaligned resonances. It would be very helpful to explicitly show the 1-D unaligned invariant mass distribution for both the ANN and VEGAS. This would explicitly show how well the ANN and VEGAS are reconstructing that invariant mass.

6) This one is purely optional. For Fig. 7, the authors point out that enforcing a cut by setting the integrand to zero can make the training of the ANN problematic. Hence, they introduce the function in Eq. (14) to make the cuts continuous. It would be nice to see the comparison between the current Fig. 7 and strictly setting an integrand to zero to explicitly illustrate this point.
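The comparison requested in point 3 amounts to a two-stage workflow: adapt the grid in an initialization run, then freeze it and measure the unweighting efficiency. The sketch below illustrates that workflow with a simplified one-dimensional adaptive grid written for this purpose. It is not the VEGAS implementation (it has no damping or smoothing), and the Breit-Wigner integrand, bin count, sample sizes, and function names are all assumptions. The efficiency is assumed to be the mean weight divided by the maximum weight.

```python
import random


def integrand(x, x0=0.5, gamma=0.01):
    """Assumed toy integrand: a narrow Breit-Wigner peak on [0, 1]."""
    return 1.0 / ((x - x0) ** 2 + gamma ** 2)


def sample(edges, rng):
    """Pick a bin uniformly, then a point uniformly inside it.

    Returns the bin index, the point, and the Jacobian 1 / p(x).
    """
    n_bins = len(edges) - 1
    i = rng.randrange(n_bins)
    width = edges[i + 1] - edges[i]
    return i, edges[i] + width * rng.random(), n_bins * width


def refine(edges, bin_totals):
    """Move the edges so every bin holds an equal share of the accumulated weight."""
    n_bins = len(edges) - 1
    floor = 1e-12 * sum(bin_totals)  # keeps empty bins from causing 0/0
    totals = [t + floor for t in bin_totals]
    share = sum(totals) / n_bins
    new_edges, accumulated, i = [edges[0]], 0.0, 0
    for k in range(1, n_bins):
        goal = k * share
        while i < n_bins - 1 and accumulated + totals[i] < goal:
            accumulated += totals[i]
            i += 1
        # Linear interpolation inside old bin i, clamped against rounding.
        fraction = min(1.0, (goal - accumulated) / totals[i])
        new_edges.append(edges[i] + fraction * (edges[i + 1] - edges[i]))
    new_edges.append(edges[-1])
    return new_edges


def run(edges, n_samples, rng):
    """One pass over the grid: per-sample weights and per-bin weight totals."""
    weights, bin_totals = [], [0.0] * (len(edges) - 1)
    for _ in range(n_samples):
        i, x, jacobian = sample(edges, rng)
        w = integrand(x) * jacobian
        weights.append(w)
        bin_totals[i] += w
    return weights, bin_totals


def efficiency(weights):
    """Mean weight divided by maximum weight (assumed definition)."""
    return sum(weights) / (len(weights) * max(weights))


rng = random.Random(3)
n_bins = 50
edges = [k / n_bins for k in range(n_bins + 1)]

untrained, _ = run(edges, 20000, rng)
for _ in range(5):  # initialization run: adapt the grid
    _, bin_totals = run(edges, 5000, rng)
    edges = refine(edges, bin_totals)
trained, _ = run(edges, 20000, rng)  # frozen grid: measurement only

print(f"efficiency before training: {efficiency(untrained):.3f}")
print(f"efficiency after training:  {efficiency(trained):.3f}")
```

Quoting one method before adaptation and the other after it would compare the first number for one with the second number for the other, which is the mismatch points 2 and 3 ask the authors to rule out.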

  • validity: high
  • significance: good
  • originality: high
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Anonymous Report 1 on 2020-6-1 Invited Report

Strengths

1) Novel approach
2) Improved efficiency
3) Opens up a new research direction in the field

Weaknesses

1) Implementation of ANN architecture

Report

The paper presents a novel approach for sampling and integrating probability distributions in phase space, suitably parametrized as a unit hypercube. The basic idea is to sample the input space uniformly, and recover the required target distribution $f(y)$ by building a map $y_{\mathbf w}(x)$ with the help of an ANN. This approach is complementary to traditional algorithms like VEGAS and, as the authors have argued and demonstrated with several examples, leads to a higher efficiency. Two additional advantages are: first, the map is smooth as opposed to piecewise linear, which offers a better approximation to the target distribution; and second, the need to use special coordinates aligned with interesting features in the distribution is avoided. In my opinion the paper meets the criteria for novelty and originality, and the method could be usefully applied in support of experimental analyses (after all, it is the experimentalists who are the end users of such large MC production jobs). In addition, the paper has already had an impact by inspiring several follow-up studies (including a paper published in SciPost), which further expanded and refined the method. Therefore I think the paper can be published.
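The weighting behind such a map can be illustrated in one dimension. The sketch below is not the authors' network: it replaces the ANN with an assumed one-parameter power map $y_a(x) = x^a$ and an assumed toy target $f(y) = 3y^2$ on the unit interval, and assigns each uniformly sampled $x$ the weight $f(y)\,|dy/dx|$. The weights average to the integral for any $a > 0$, and they become constant when the map equals the inverse cumulative distribution ($a = 1/3$ for this target).

```python
import random


def target(y):
    """Assumed toy target density on [0, 1]; it integrates to 1."""
    return 3.0 * y * y


def power_map(x, a):
    """Stand-in for the trained map: returns y(x) = x**a and |dy/dx|."""
    return x ** a, a * x ** (a - 1.0)


def sample_weights(a, n, rng):
    """Draw x uniformly in (0, 1] and return the weights f(y(x)) * |dy/dx|."""
    weights = []
    for _ in range(n):
        x = 1.0 - rng.random()  # avoids x == 0, where the Jacobian can diverge
        y, jacobian = power_map(x, a)
        weights.append(target(y) * jacobian)
    return weights


rng = random.Random(7)
for a in (1.0, 0.5, 1.0 / 3.0):
    w = sample_weights(a, 50000, rng)
    mean = sum(w) / len(w)
    print(f"a = {a:.3f}: integral ~ {mean:.3f}, efficiency ~ {mean / max(w):.3f}")
```

In this toy setting "training" would mean tuning `a` toward 1/3. The power map is monotonic, so it is invertible by construction; a general network does not guarantee this, which is the concern raised in the first requested change.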

Requested changes

I have only a few comments and recommendations which the authors should consider before the final version.

1. Ideally, it would be great if the authors can implement the method using a reversible neural network architecture - this is needed in order to properly compute the Jacobian (1), as discussed in subsequent works 2001.05478 and 2001.05486. At the same time, considering that the purpose of the paper is only to demonstrate the viability of this approach (which the authors have) and not to provide a deployment-ready final product, such an exercise is not crucial to the utility of the paper to the community. A disclaimer (in the introduction and conclusions) that reversible networks should be used (or a rough argument why the used architecture is unlikely to map two inputs to the same value) will be sufficient to warrant publication. Along those lines, the authors should perhaps also add a comment on the surjectivity issue brought up by 2001.05478 (Sec. 2.3).

2. The comparison in Fig. 5 is only qualitative - the two distributions are not supposed to match exactly, although the text might leave this impression: "the raw events match the resonance feature very well". To avoid confusion, perhaps add the words "before unweighting" to the figure caption as well, and expand the discussion in the text to clarify the exact purpose of the comparison shown in the plot. As a validation check, is it possible to add a second panel showing distributions after unweighting, e.g., comparing VEGAS, ANN and the true distribution?

3. The authors have focused on the efficiency as their performance benchmark. Yet other benchmarks are also worth exploring - for example, the smoothness of the approximation was mentioned a few times as an advantage over VEGAS, so it would be nice to see an example quantifying the improvement (if any) in terms of the precision (on the calculated integral or on the shape of the generated distribution) which can be achieved with the proposed method over VEGAS or similar piecewise linear approximations.

4. There is a typo on page 5 line 7 from the top: "and indicateD with a subscript". Also in several places in the text there is a reference to "the A". This could be the journal style, but perhaps it can be clarified that what is meant is "Appendix A" and not just "The A".

  • validity: high
  • significance: high
  • originality: top
  • clarity: high
  • formatting: excellent
  • grammar: perfect
