SciPost Submission Page
How to GAN Event Subtraction
by Anja Butter, Tilman Plehn, Ramon Winterhalder
This is not the latest submitted version.
This Submission thread is now published as
Submission summary
Authors (as registered SciPost users):  Tilman Plehn · Ramon Winterhalder 
Submission information  

Preprint Link:  https://arxiv.org/abs/1912.08824v2 (pdf) 
Date submitted:  2020-01-31 01:00 
Submitted by:  Winterhalder, Ramon 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approach:  Computational 
Abstract
Subtracting and adding event samples are common problems in LHC simulations. We show how generative adversarial networks can produce new event samples with a phase space distribution corresponding to added or subtracted input samples. We illustrate some general features using toy samples and then show explicit examples of background and non-local collinear subtraction events in terms of unweighted 4-vector events. This event sample manipulation reflects the excellent interpolation properties of neural networks.
Reports on this Submission
Anonymous Report 3 on 2020-3-22 (Invited Report)
Cite as: Anonymous, Report on arXiv:1912.08824v2, delivered 2020-03-22, doi: 10.21468/SciPost.Report.1590
Strengths
The application of GAN techniques discussed in the manuscript is new, as far as I can tell, and it is quite clearly explained in the manuscript.
Weaknesses
The actual domain of applicability of the method to concrete collider problems is not clear.
Report
The manuscript reports on the idea of using GANs to perform the following operation. Using two sets of events (say, "B" and "S", possibly generated by two Monte Carlo generators), one can train a GAN to produce a generator of events that follow the difference between the distributions of the two original sets (i.e., "B-S"). The method is applied to toy problems, with the aim of outlining potential applications to collider phenomenology.
This application of GAN techniques is new, as far as I can tell, and it is quite clearly explained in the manuscript. The manuscript however leaves a number of open questions concerning the actual domain of applicability of the method to concrete collider problems.
In light of this, I believe that the manuscript is not ready for publication in the current form. If the authors can address the points reported below, and clarify the potential of the method to address concrete problems (and convincingly support its competitive advantages), one could address other less crucial aspects related with the presentation of the GAN algorithm and the comparison with other strategies that might be employed for the same task, which should perhaps be slightly extended.
Requested changes
The following points should be addressed before the manuscript can be considered for publication:
1) It is unclear what happens if "B-S" does not have a definite sign, namely if the event density distribution P_B(x) is larger than P_S(x) in some region of "x", and smaller than P_S(x) in some other region. In this case, neither P_S - P_B nor P_B - P_S is a density, and the problem seems ill-defined. Since this can happen in potential applications (see below), one should ask what the GAN would return if trained on a problem of this type, and whether the method would at least allow one to recognise that there is an issue or would instead produce wrong results.
On page 11 it is mentioned that the sign of "B-S" is not a problem because one can always learn "S-B" instead of "B-S". However, this seems to assume that "B-S" has a fixed sign on the entire feature (x) space, so this comment is not sufficient to address the question above.
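The sign question in point 1) can be probed on the input samples before any network is trained. The following is a minimal sketch, not part of the manuscript: it histograms both samples, assumes independent Poisson fluctuations in each bin, and flags bins in which B-S is significantly positive or negative. The function name `sign_regions`, the threshold `n_sigma`, and the two Gaussian toy samples are illustrative assumptions.

```python
import numpy as np

def sign_regions(sample_b, sample_s, bins, n_sigma=2.0):
    """Flag bins where the binned difference B-S is significantly > 0 or < 0."""
    n_b, edges = np.histogram(sample_b, bins=bins)
    n_s, _ = np.histogram(sample_s, bins=edges)
    diff = n_b - n_s
    # Assumption: independent Poisson counts, so var(B-S) = B + S per bin.
    err = np.sqrt(n_b + n_s)
    # Empty bins get significance 0 instead of a division by zero.
    signif = np.divide(diff, err, out=np.zeros(diff.shape), where=err > 0)
    return diff, signif > n_sigma, signif < -n_sigma

# Hypothetical toy inputs: two shifted Gaussians with equal event counts.
rng = np.random.default_rng(0)
sample_b = rng.normal(0.0, 1.0, size=100_000)
sample_s = rng.normal(1.0, 1.0, size=100_000)

diff, positive, negative = sign_regions(
    sample_b, sample_s, bins=np.linspace(-4.0, 5.0, 19)
)
mixed_sign = bool(positive.any() and negative.any())
print("B-S changes sign:", mixed_sign)
```

If both flags fire, neither B-S nor S-B is a valid density over the full range, which is the situation this point asks about.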
2) The first class of applications mentioned in the manuscript is referred to as "background subtraction". However, I could not find a discussion of what this should be concretely useful for. The example worked out in the manuscript (photon background subtracted from Drell-Yan, in section 3.1) does not shed light on this aspect because it is not clear why one might want to perform such a subtraction.
Maybe the method is supposed to help with problems such as extracting the new physics contribution from a simulation that also contains the standard model, for instance in cases where the new physics effect is small and the approach based on bins becomes computationally unfeasible. If this is the case, it should be clearly stated in the manuscript. However, one should also take into account that performing a subtraction would be needed only if simulating the new physics contribution separately is not feasible. This is the case in the presence of quantum-mechanical interference between SM and new physics. However, in the presence of interference, "B-S" does not have a definite sign in general. So the feasibility and the usefulness of the approach in this domain depend on point "1)".
3) The second class of applications is "subtraction" (see section 3.2). Also in this case, the final goal is not clearly stated in the paper. A short paragraph at the end of page 11 alludes to the fact that this could help MC@NLO event generation. If this is the case, it should be clearly stated and extensively explained. Also, it is found in section 3.2 that the required task of subtracting the collinear contribution cannot be accomplished because the method cannot deal with "B-S" distributions that are very small. Would this eventually prevent the method from working?
Report 2 by Ayan Paul on 2020-3-18 (Invited Report)
Cite as: Ayan Paul, Report on arXiv:1912.08824v2, delivered 2020-03-17, doi: 10.21468/SciPost.Report.1583
Report
I have provided a detailed report in the attached file. I do not recommend this paper for publication.
Author: Ramon Winterhalder on 2020-05-12 [id 822]
(in reply to Report 2 by Ayan Paul on 2020-03-18)
> We fear there is a misunderstanding in our problem statement: our goal is to construct a network that can generate events according to the difference of two probability distributions. The referee's network does an excellent job in constructing the distribution corresponding to the difference of two event samples, but it cannot be extended to generate statistically independent events.
- I do not think Ref.[11] uses a generative network. They use a DNN and show that they perform better than Ref.[12] which uses a GAN. The authors can maybe take a deeper look into these papers.
> Thank you for pointing this out. We wanted to cite Ref.[11] alongside the generative phase-space studies and have now taken it out.
- The authors do not provide the code that they use or any details about it or what framework they used (PyTorch/SciKit Learn/ TensorFlow etc.). The authors also do not provide the data they used for the training. While this is not necessary, it is useful to have it if someone wants to reproduce their results. I would suggest the authors provide all these details (possibly in a public repository) and also an example code since the work is primarily computational.
> We added a footnote clarifying that our code and our test data are available upon request. We also added details on our software.
- The authors do not make explicit the training times and the hardware used for training the GANs. This is useful to benchmark it against other regression methods.
> As mentioned above we have doubts that this helps in comparing with regression networks, given that we do not actually do a regression :) In any case, we find that quoting such numbers is not helpful in a field with collaborative spirit, but we have a track record of happily participating in proper comparison studies.
- The authors do not describe how they get the error bars in the left panels of Fig.2, Fig.3, etc. Are they from Eq.(1)?
> They are, and we clarified this in the text.
Anonymous Report 1 on 2020-3-7 (Invited Report)
Cite as: Anonymous, Report on arXiv:1912.08824v2, delivered 2020-03-07, doi: 10.21468/SciPost.Report.1559
Report
A machine-learning method is proposed to construct new event samples that follow a distribution obtained by summing or subtracting given input samples. The method is based on the use of generative adversarial networks (GANs).
The GAN architecture takes two (or more) sets of input events following given distributions and trains a generator whose output follows a distribution corresponding to a linear combination of the given inputs. Typical cases correspond to the sum or subtraction of distributions.
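For the subtraction case, the target of the generator can be written explicitly. The following is a sketch under the assumption that the two inputs contain $N_B > N_S$ events with normalized densities $P_B(x)$ and $P_S(x)$; the symbols $N_B$ and $N_S$ do not appear in the report and are introduced here only for illustration.

```latex
P_{B-S}(x) = \frac{N_B \, P_B(x) - N_S \, P_S(x)}{N_B - N_S},
\qquad
\int \mathrm{d}x \, P_{B-S}(x) = 1 .
```

This expression is a probability density only if $N_B \, P_B(x) \geq N_S \, P_S(x)$ for all $x$.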
Simple one-dimensional toy examples are considered to explain the general GAN architecture and to test its applicability. In these cases it is shown that the GAN approach can correctly reproduce the subtracted distribution, within the 1-sigma error band derived by a binned analysis.
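For reference, the error band of a binned subtraction can be estimated as sketched below. The sketch assumes independent Poisson-distributed bin counts, so that the uncertainty on B-S in each bin is sqrt(B+S); whether this matches the prescription used in the manuscript is not stated in this thread. The function name `binned_difference` and the bin counts are made up for illustration.

```python
import numpy as np

def binned_difference(n_b, n_s):
    """Return the per-bin difference B-S and its 1-sigma Poisson uncertainty."""
    n_b = np.asarray(n_b, dtype=float)
    n_s = np.asarray(n_s, dtype=float)
    diff = n_b - n_s
    # Assumption: independent Poisson counts, so the variances add.
    err = np.sqrt(n_b + n_s)
    return diff, err

# Hypothetical counts: first bin has a small difference of two large numbers,
# second bin has a difference comparable to the counts themselves.
diff, err = binned_difference([10_000, 400], [9_900, 100])
rel = err / np.abs(diff)
print(diff, err, rel)
```

The first bin illustrates why small differences of large samples are statistically expensive: the relative uncertainty exceeds 100%, while in the second bin it stays below 10%.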
Two examples with actual LHC Monte Carlo simulations are also considered. The first one is the subtraction of the photon continuum from the p p -> e+ e- process at the LHC. The second one is the subtraction of the collinear radiation part from the p p -> Z g process. In both cases the GAN method seems able to perform the desired subtraction.
There are a couple of points that are not fully discussed in the paper, but are crucial to understand the usefulness of the proposed GAN method.
1) The first point regards the reconstruction error associated with the GAN approach. In the toy examples it is shown that the GAN approach is able to reproduce the target distribution within the 1-sigma error band obtained from a binned analysis. This result, however, might be strongly influenced by the hyperparameters, i.e. the neural network structure, the training parameters and the training algorithm. In all the examples the hyperparameters are carefully adjusted to obtain a good performance.
It is not at all clear, and not discussed in the text, how the reconstruction error is influenced by those choices. In a more realistic setup, in which one cannot compute the error from a binned analysis, one would not know a priori the error associated with the GAN reconstruction. This might be an issue if one wants to use these techniques for physics analyses, in which all the sources of systematic errors should be carefully estimated and taken into account.
Is there a way to get such an estimate for the GAN approach?
Notice that, at the end of the Outlook section, there is the sentence “we have shown how to use a GAN to manipulate event samples avoiding binning”. Therefore it seems clear that this method is proposed as an alternative to binning. As such, a proper treatment of errors would be needed.
2) The examples discussed in the paper do not seem to be particularly useful from the LHC point of view (“We are aware of the fact that our toy examples are not more than an illustration of what a subtraction GAN can achieve”, taken from the Outlook section). Although the hope that the method is used for actual LHC analyses is expressed (“we hope that some of the people who do LHC event simulations will find this technique useful”), there is no mention of possible “useful” applications. Do the Authors know any example of a “useful” application of the GAN technique?
Author: Ramon Winterhalder on 2020-05-12 [id 821]
(in reply to Report 1 on 2020-03-07)
1) It is not at all clear, and not discussed in the text, how the reconstruction error is influenced by those choices. In a more realistic setup, in which one cannot compute the error from a binned analysis, one would not know a priori the error associated with the GAN reconstruction. This might be an issue if one wants to use these techniques for physics analyses, in which all the sources of systematic errors should be carefully estimated and taken into account. Is there a way to get such an estimate for the GAN approach?
> What should we say: we are not aware of a serious study of uncertainties in generative networks, but we are working on it. As a matter of fact, we are starting a more serious collaboration on this crucial LHC question with local ML experts...
Notice that, at the end of the Outlook section, there is the sentence “we have shown how to use a GAN to manipulate event samples avoiding binning”. Therefore it seems clear that this method is proposed as an alternative to binning. As such, a proper treatment of errors would be needed.
> We completely agree, this paper is really meant as another motivation for the major effort of studying errors in GAN output. We added a comment along this line to Sec.2.
2) The examples discussed in the paper do not seem to be particularly useful from the LHC point of view (“We are aware of the fact that our toy examples are not more than an illustration of what a subtraction GAN can achieve”, taken from the Outlook section). Although the hope that the method is used for actual LHC analyses is expressed (“we hope that some of the people who do LHC event simulations will find this technique useful”), there is no mention of possible “useful” applications. Do the Authors know any example of a “useful” application of the GAN technique?
> We changed the introduction, the respective sections, and the outlook accordingly. Now there should be a clearer picture of where such an event subtraction might come in handy.
Author: Ramon Winterhalder on 2020-05-12 [id 823]
(in reply to Report 3 on 2020-03-22)
1) It is unclear what happens if "B-S" does not have a definite sign, namely if the event density distribution P_B(x) is larger than P_S(x) in some region of "x", and smaller than P_S(x) in some other region. In this case, neither P_S - P_B nor P_B - P_S is a density, and the problem seems ill-defined. Since this can happen in potential applications (see below), one should ask what the GAN would return if trained on a problem of this type, and whether the method would at least allow one to recognise that there is an issue or would instead produce wrong results.
> We expanded the discussion of signs and the zero function in Sec.2.3. As a matter of fact, our CS-like example already has such a sign problem, which we solve with an offset.
2) The first class of applications mentioned in the manuscript is referred to as "background subtraction". However, I could not find a discussion of what this should be concretely useful for. The example worked out in the manuscript (photon background subtracted from Drell-Yan, in section 3.1) does not shed light on this aspect because it is not clear why one might want to perform such a subtraction.
Maybe the method is supposed to help with problems such as extracting the new physics contribution from a simulation that also contains the standard model, for instance in cases where the new physics effect is small and the approach based on bins becomes computationally unfeasible. If this is the case, it should be clearly stated in the manuscript. However, one should also take into account that performing a subtraction would be needed only if simulating the new physics contribution separately is not feasible. This is the case in the presence of quantum-mechanical interference between SM and new physics. However, in the presence of interference, "B-S" does not have a definite sign in general. So the feasibility and the usefulness of the approach in this domain depend on point "1)".
> Again, we admit that we only work with a toy model. We now add a brief discussion of an appropriate problem, namely the kinematics of a GANned 4-body-decay signal from signal-plus-background and background samples.
3) The second class of applications is "subtraction" (see section 3.2). Also in this case, the final goal is not clearly stated in the paper. A short paragraph at the end of page 11 alludes to the fact that this could help MC@NLO event generation. If this is the case, it should be clearly stated and extensively explained. Also, it is found in section 3.2 that the required task of subtracting the collinear contribution cannot be accomplished because the method cannot deal with "B-S" distributions that are very small. Would this eventually prevent the method from working?
> We added some more discussion, including the subtraction of on-shell events as a combination of the two examples. However, we admit that we are not MC authors with a clear vision of where exactly such a tool would enter which MC code. We also improved the numerics in Sec.3.2 to show that, given some more optimization and enough training time, we do not expect precision to be an immediate show stopper.
> Altogether, we would like to thank the three referees and everyone who has discussed with us since the first version of the paper came out. We have changed the paper in many places, including abstract, introduction, physics discussions, and outlook. This is why we are confident that the current version is significantly improved over the original draft and hope that SciPost agrees with that judgement.