SciPost Submission Page

How to GAN Event Subtraction

by Anja Butter, Tilman Plehn, Ramon Winterhalder

This is not the latest submitted version.

Submission summary

Authors (as registered SciPost users): Tilman Plehn · Ramon Winterhalder
Submission information
Preprint Link: https://arxiv.org/abs/1912.08824v2  (pdf)
Date submitted: 2020-01-31 01:00
Submitted by: Winterhalder, Ramon
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approach: Computational

Abstract

Subtracting and adding event samples are common problems in LHC simulations. We show how generative adversarial networks can produce new event samples with a phase space distribution corresponding to added or subtracted input samples. We illustrate some general features using toy samples and then show explicit examples of background and non-local collinear subtraction events in terms of unweighted 4-vector events. This event sample manipulation reflects the excellent interpolation properties of neural networks.

Current status:
Has been resubmitted

Reports on this Submission

Report #3 by Anonymous (Referee 4) on 2020-3-22 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:1912.08824v2, delivered 2020-03-22, doi: 10.21468/SciPost.Report.1590

Strengths

The application of GAN techniques discussed in the manuscript is new, as far as I can tell, and it is quite clearly explained in the manuscript.

Weaknesses

The actual domain of applicability of the method to concrete collider problems is not clear.

Report

The manuscript reports on the idea of using GANs to perform the following operation. Using two sets of events (say, "B" and "S", possibly generated by two Monte Carlo generators), one can train a GAN to produce a generator of events that follow the difference between the distributions of the two original sets (i.e., "B-S"). The method is applied to toy problems, with the aim of outlining potential applications to collider phenomenology.
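
For orientation, here is a minimal toy sketch (in PyTorch) of how such a subtraction can be set up. This is not the authors' code; the one-dimensional distributions, the network sizes and the 30/70 yield split are illustrative assumptions. The discriminator compares the "B" sample with a mixture of generator output and "S" events, so the only way the mixture can match "B" is for the generator to fill in the missing density, i.e. to learn "B-S".

# Toy sketch only, not the implementation of arXiv:1912.08824: the generator
# output is mixed with S events and the mixture is played against the B sample.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "Monte Carlo" samples (assumed): B = 70% exponential background + 30%
# Gaussian peak, S = the exponential background alone.
def sample_B(n):
    mask = torch.rand(n) < 0.3
    gauss = torch.randn(n) * 0.3 + 2.0
    expo = torch.distributions.Exponential(1.0).sample((n,))
    return torch.where(mask, gauss, expo).unsqueeze(1)

def sample_S(n):
    return torch.distributions.Exponential(1.0).sample((n,)).unsqueeze(1)

f_sub = 0.3   # assumed relative yield of the difference B-S inside B

generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
discriminator = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                              nn.Linear(64, 64), nn.ReLU(),
                              nn.Linear(64, 1))
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

batch = 512
n_gen = int(f_sub * batch)   # generator events per mixed batch
n_s = batch - n_gen          # S events per mixed batch

for step in range(5000):
    real = sample_B(batch)
    # mixed "fake" batch: generator output combined with S events
    fake = torch.cat([generator(torch.randn(n_gen, 8)), sample_S(n_s)])

    # discriminator update: separate the B sample from the generator+S mixture
    opt_d.zero_grad()
    d_loss = bce(discriminator(real), torch.ones(batch, 1)) \
           + bce(discriminator(fake.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # generator update: make the mixture indistinguishable from B, which
    # pushes the generated events towards the difference B-S
    opt_g.zero_grad()
    fake = torch.cat([generator(torch.randn(n_gen, 8)), sample_S(n_s)])
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()

Note that this simple picture implicitly assumes that the relative yields of the two samples are known and that the difference is non-negative, which is exactly where the requested changes below come in.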

This application of GAN techniques is new, as far as I can tell, and it is quite clearly explained in the manuscript. The manuscript however leaves a number of open questions concerning the actual domain of applicability of the method to concrete collider problems.

In light of this, I believe that the manuscript is not ready for publication in its current form. If the authors can address the points reported below, and clarify the potential of the method for concrete problems (and convincingly support its competitive advantages), one could then turn to less crucial aspects related to the presentation of the GAN algorithm and to the comparison with other strategies that might be employed for the same task, which should perhaps be slightly extended.

Requested changes

The following points should be addressed before the manuscript can be considered for publication:

1- It is unclear what happens if "B-S" does not have a definite sign, namely if the event density distribution P_B(x) is larger than P_S(x) in some region of "x" and smaller than P_S(x) in some other region. In this case, neither P_S-P_B nor P_B-P_S is a density, and the problem seems ill-defined. Since this can happen in potential applications (see below), one should ask what the GAN would return if trained on a problem of this type, and whether the method would at least allow one to recognise that there is an issue, or whether it would instead produce wrong results.

On page 11 it is mentioned that the sign of "B-S" is not a problem because one can always learn "S-B" instead of "B-S". However, this seems to assume that "B-S" has a fixed sign over the entire feature space x, so the comment is not sufficient to address the question above.

2- The first class of applications mentioned in the manuscript is referred to as "background subtraction". However, I could not find a discussion of what this should concretely be useful for. The example worked out in the manuscript (photon background subtracted from Drell-Yan, in section 3.1) does not shed light on this aspect, because it is not clear why one might want to perform such a subtraction.

Maybe the method is supposed to help with problems such as extracting the new physics contribution from a simulation that also contains the standard model, for instance in cases where the new physics effect is small and the approach based on bins becomes computationally unfeasible. If this is the case, it should be clearly stated in the manuscript. However, one should also take into account that performing a subtraction would be needed only if simulating the new physics contribution separately is not feasible. This is the case in the presence of quantum-mechanical interference between the SM and new physics. However, in the presence of interference, "B-S" does not have a definite sign in general. So the feasibility and the usefulness of the approach in this domain depend on point "1)".

3- The second class of applications is "subtraction" (see section 3.2). Also in this case, the final goal is not clearly stated in the paper. A short paragraph at the end of page 11 alludes to the fact that this could help MC@NLO event generation. If this is the case, it should be clearly stated and extensively explained. Also, it is found in section 3.2 that the required task of subtracting the collinear contribution cannot be accomplished because the method cannot deal with "B-S" distributions that are very small. Would this prevent the method from working, eventually?

  • validity: ok
  • significance: ok
  • originality: high
  • clarity: ok
  • formatting: -
  • grammar: -

Author:  Ramon Winterhalder  on 2020-05-12  [id 823]

(in reply to Report 3 on 2020-03-22)

1- It is unclear what happens if "B-S" does not have a definite sign, namely if the event density distribution P_B(x) is larger than P_S(x) in some region of "x" and smaller than P_S(x) in some other region. In this case, neither P_S-P_B nor P_B-P_S is a density, and the problem seems ill-defined. Since this can happen in potential applications (see below), one should ask what the GAN would return if trained on a problem of this type, and whether the method would at least allow one to recognise that there is an issue, or whether it would instead produce wrong results.

-> We expanded the discussion of signs and the zero function in Sec. 2.3. As a matter of fact, our CS-like example already has such a sign problem, which we solve with an offset.
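
In formulas, the point is that

  P_B(x) - P_S(x) \;\ge\; 0 \qquad \text{for all } x

is needed for "B-S" to define an event density. If the difference changes sign, one possible way out (a sketch of the general idea; the symbols P_0 and c below are illustrative and the detailed prescription in Sec. 2.3 may differ) is to GAN a shifted combination

  P_B(x) + c\,P_0(x) - P_S(x) \;\ge\; 0 \qquad \text{for sufficiently large } c ,

with a known reference density P_0, and to subtract the c P_0 admixture again in a second step.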

2- The first class of applications mentioned in the manuscript is referred to as "background subtraction". However, I could not find a discussion of what this should concretely be useful for. The example worked out in the manuscript (photon background subtracted from Drell-Yan, in section 3.1) does not shed light on this aspect, because it is not clear why one might want to perform such a subtraction.

Maybe the method is supposed to help with problems such as extracting the new physics contribution from a simulation that also contains the standard model, for instance in cases where the new physics effect is small and the approach based on bins becomes computationally unfeasible. If this is the case, it should be clearly stated in the manuscript. However, one should also take into account that performing a subtraction would be needed only if simulating the new physics contribution separately is not feasible. This is the case in the presence of quantum-mechanical interference between the SM and new physics. However, in the presence of interference, "B-S" does not have a definite sign in general. So the feasibility and the usefulness of the approach in this domain depend on point "1)".

-> Again, we admit that we only work with a toy model. We now add a brief discussion of an appropriate problem, namely the kinematics of a GANned 4-body-decay signal from signal-plus-background and background samples.

3- The second class of applications is "subtraction" (see section 3.2). Also in this case, the final goal is not clearly stated in the paper. A short paragraph at the end of page 11 alludes to the fact that this could help MC@NLO event generation. If this is the case, it should be clearly stated and extensively explained. Also, it is found in section 3.2 that the required task of subtracting the collinear contribution cannot be accomplished because the method cannot deal with "B-S" distributions that are very small. Would this prevent the method from working, eventually?

-> We added some more discussion, including the subtraction of on-shell events as a combination of the two examples. However, we admit that we are not MC authors with a clear vision of where exactly such a tool would enter which MC code. We also improved the numerics in Sec. 3.2 to show that, given some more optimization and enough training time, we do not expect precision to be an immediate showstopper.

-> Altogether, we would like to thank the three referees and everyone who has discussed with us since the first version of the paper came out. We have changed the paper in many places, including abstract, introduction, physics discussions, and outlook. This is why we are confident that the current version is significantly improved over the original draft and hope that SciPost agrees with that judgement.

Report #2 by Ayan Paul (Referee 1) on 2020-3-18 (Invited Report)

  • Cite as: Ayan Paul, Report on arXiv:1912.08824v2, delivered 2020-03-17, doi: 10.21468/SciPost.Report.1583

Report

I have provided a detailed report in the attached file. I do not recommend this paper for publication.

  • validity: ok
  • significance: ok
  • originality: good
  • clarity: low
  • formatting: good
  • grammar: good

Author:  Ramon Winterhalder  on 2020-05-12  [id 822]

(in reply to Report 2 by Ayan Paul on 2020-03-18)

-> We fear there is a misunderstanding of our problem statement: our goal is to construct a network that can generate events according to the difference of two probability distributions. The referee's network does an excellent job in constructing the distribution corresponding to the difference of two event samples, but it cannot be extended to generate statistically independent events.

  1. I do not think Ref.[11] uses a generative network. They use a DNN and show that they perform better than Ref.[12], which uses a GAN. The authors can maybe take a deeper look into these papers.

-> Thank you for pointing this out, we wanted to cite Ref.[11] alongside with the generative phase-space studies, took it out now.

  2. The authors do not provide the code that they use, any details about it, or what framework they used (PyTorch, scikit-learn, TensorFlow, etc.). The authors also do not provide the data they used for the training. While this is not necessary, it is useful to have if someone wants to reproduce their results. I would suggest the authors provide all these details (possibly in a public repository) and also an example code, since the work is primarily computational.

-> We added a footnote clarifying that our code and our test data are available upon request. We also added details on our software.

  3. The authors do not make explicit the training times and the hardware used for training the GANs. This is useful to benchmark it against other regression methods.

-> As mentioned above, we have doubts that this helps in comparing with regression networks, given that we do not actually do a regression :) In any case, we find that quoting such numbers is not helpful in a field with a collaborative spirit, but we have a track record of happily participating in proper comparison studies.

  4. The authors do not describe how they get the error-bars in the left panels of Fig.2, Fig.3, etc. Are they from Eq.(1)?

-> They are, and we clarified this in the text.

Report #1 by Anonymous (Referee 5) on 2020-3-7 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:1912.08824v2, delivered 2020-03-07, doi: 10.21468/SciPost.Report.1559

Report

A machine-learning method is proposed to construct new event samples that follow a distribution obtained by summing or subtracting given input samples. The method is based on the use of generative adversarial networks (GANs).

The GAN architecture takes two (or more) sets of input events following given distributions and trains a generator whose output follows a distribution corresponding to a linear combination of the given inputs. Typical cases correspond to the sum or subtraction of distributions.

Simple one-dimensional toy examples are considered to explain the general GAN architecture and to test its applicability. In these cases it is shown that the GAN approach can correctly reproduce the subtracted distribution, within the 1-sigma error band derived from a binned analysis.
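
As a minimal illustration of such a binned reference band (our assumption of plain per-bin Poisson counting errors added in quadrature, with a toy sample composition chosen for illustration; the paper's own prescription is its Eq.(1), which is not reproduced here):

import numpy as np

rng = np.random.default_rng(0)
# toy samples in the spirit of the 1D examples: B = background + peak, S = background only
events_s = rng.exponential(1.0, 70_000)
events_b = np.concatenate([rng.exponential(1.0, 70_000),
                           rng.normal(2.0, 0.3, 30_000)])

bins = np.linspace(0.0, 5.0, 41)
n_b, _ = np.histogram(events_b, bins=bins)
n_s, _ = np.histogram(events_s, bins=bins)

diff = n_b - n_s              # binned estimate of the subtracted sample B-S
sigma = np.sqrt(n_b + n_s)    # per-bin 1-sigma band from Poisson errors in quadrature

Any GAN-based estimate of B-S can then be compared bin by bin against diff +- sigma.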

Two examples with actual LHC Monte Carlo simulations are also considered. The first one is the subtraction of the photon continuum from the p p -> e+ e- process at the LHC. The second one is the subtraction of the collinear radiation part from the p p -> Z g process. In both cases the GAN method seems able to perform the desired subtraction.

There are a couple of points that are not fully discussed in the paper, but are crucial to understand the usefulness of the proposed GAN method.

1) The first point concerns the reconstruction error associated with the GAN approach. In the toy examples it is shown that the GAN approach is able to reproduce the target distribution within the 1-sigma error band obtained from a binned analysis. This result, however, might be strongly influenced by the hyper-parameters, i.e. the neural network structure, the training parameters and the training algorithm. In all the examples the hyper-parameters are carefully adjusted to obtain good performance.

It is not at all clear, and not discussed in the text, how the reconstruction error is influenced by those choices. In a more realistic set-up, in which one cannot compute the error from a binned analysis, one would not know a priori the error associated with the GAN reconstruction. This might be an issue if one wants to use these techniques for physics analyses, in which all the sources of systematic errors should be carefully estimated and taken into account.

Is there a way to get such an estimate for the GAN approach?

Notice that, at the end of the Outlook section, there is the sentence “we have shown how to use a GAN to manipulate event samples avoiding binning”. Therefore it seems clear that this method is proposed as an alternative to binning. As such, a proper treatment of errors would be needed.

2) The examples discussed in the paper do not seem to be particularly useful from the LHC point of view (“We are aware of the fact that our toy examples are not more than an illustration of what a subtraction GAN can achieve”, taken from the Outlook section). Although the hope is expressed that the method will be used for actual LHC analyses (“we hope that some of the people who do LHC event simulations will find this technique useful”), there is no mention of possible “useful” applications. Do the Authors know any example of a “useful” application of the GAN technique?

  • validity: high
  • significance: good
  • originality: high
  • clarity: good
  • formatting: -
  • grammar: -

Author:  Ramon Winterhalder  on 2020-05-12  [id 821]

(in reply to Report 1 on 2020-03-07)

1) It is not at all clear, and not discussed in the text, how the reconstruction error is influenced by those choices. In a more realistic set-up, in which one cannot compute the error from a binned analysis, one would not know a priori the error associated with the GAN reconstruction. This might be an issue if one wants to use these techniques for physics analyses, in which all the sources of systematic errors should be carefully estimated and taken into account. Is there a way to get such an estimate for the GAN approach?

-> What should we say - we are not aware of a serious study of uncertainties in generative networks, but we are working on it. As a matter of fact, we are starting a more serious collaboration on this crucial LHC question with local ML experts...

Notice that, at the end of the Outlook section, there is the sentence “we have shown how to use a GAN to manipulate event samples avoiding binning”. Therefore it seems clear that this method is proposed as an alternative to binning. As such, a proper treatment of errors would be needed.

-> We completely agree; this paper is really meant as another motivation for the major effort of studying errors in GAN output. We added a comment along this line to Sec. 2.

2) The examples discussed in the paper do not seem to be particularly useful from the LHC point of view (“We are aware of the fact that our toy examples are not more than an illustration of what a subtraction GAN can achieve”, taken from the Outlook section). Although the hope is expressed that the method will be used for actual LHC analyses (“we hope that some of the people who do LHC event simulations will find this technique useful”), there is no mention of possible “useful” applications. Do the Authors know any example of a “useful” application of the GAN technique?

-> We changed the introduction, the respective sections, and the outlook accordingly. Now there should be a clearer picture of where such an event subtraction might come in handy.
