SciPost Submission Page
GANplifying Event Samples
by Anja Butter, Sascha Diefenbacher, Gregor Kasieczka, Benjamin Nachman, Tilman Plehn
Submission summary
As Contributors:  Sascha Diefenbacher · Tilman Plehn 
Arxiv Link:  https://arxiv.org/abs/2008.06545v3 (pdf) 
Date submitted:  2021-03-26 11:47 
Submitted by:  Diefenbacher, Sascha 
Submitted to:  SciPost Physics 
Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
A critical question concerning generative networks applied to event generation in particle physics is if the generated events add statistical precision beyond the training sample. We show for a simple example with increasing dimensionality how generative networks indeed amplify the training statistics. We quantify their impact through an amplification factor or equivalent numbers of sampled events.
Author comments upon resubmission
We are thankful to the referees for their insightful and helpful comments and suggestions. We integrated the suggestions into the text, with any modifications indicated in the list of Changes. Additionally we would like to clarify some of the raised points:
 I think that the factor x25 of expected statistics increase should actually be 19. My math is: LHC data amount to 160 fb^-1. HL-LHC should deliver 3000 fb^-1. So 3000/160 ~ 19. To get your 25, one would need LHC data to amount to 120 fb^-1.
> We had based this on the CERN website (https://home.cern/resources/faqs/highluminositylhc), which states that the HL-LHC target is 4000 fb^-1.
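The two numbers in this exchange can be checked with a few lines of arithmetic. The sketch below only uses the luminosities quoted above (160 fb^-1 from the referee, 3000 and 4000 fb^-1 for the two HL-LHC figures); the variable names are illustrative and not taken from the paper.

```python
# Integrated luminosities in fb^-1, as quoted in the exchange above.
lhc_data = 160.0          # referee's figure for current LHC data
hl_lhc_referee = 3000.0   # HL-LHC figure used by the referee
hl_lhc_cern_faq = 4000.0  # HL-LHC target the authors cite from the CERN FAQ

# Expected increase in statistics under each assumption.
factor_referee = hl_lhc_referee / lhc_data
factor_authors = hl_lhc_cern_faq / lhc_data

print(f"referee: x{factor_referee:.2f}, authors: x{factor_authors:.2f}")
```

With the same 160 fb^-1 baseline, the 4000 fb^-1 target reproduces the factor of 25, while 3000 fb^-1 gives the referee's value of about 19.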
 I am concerned by the fact that 100 points are not a lot, and the fit in Fig. 1 shows some bias (within statistical uncertainties, of course). Did you verify that the fit is unbiased on average? Did you try more options than your 100-point experiments? I am worried that the results of Fig. 1 might be biased by the low statistics of the fit sample.
> We have generated a sizable number of versions of Fig. 1, without observing a common bias. For larger training samples we see a similar effect, but of course less dramatic. We now also mention this in the text.
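The kind of check described here, repeating a small-sample experiment many times and looking at the average of the fitted values, can be sketched as follows. This is a deliberately simplified stand-in and not the authors' camel-back fit: it assumes a single Gaussian N(4,1) and uses the sample mean as the fitted parameter. The number of pseudo-experiments (2000) and all names are hypothetical.

```python
import random
import statistics

def fit_mean(sample):
    """Maximum-likelihood estimate of the mean of a single Gaussian."""
    return statistics.fmean(sample)

rng = random.Random(1)
true_mu, sigma = 4.0, 1.0   # toy truth values (assumption)
n_points = 100              # size of each pseudo-experiment
n_experiments = 2000        # number of repetitions (assumption)

# Repeat the 100-point experiment and collect the fitted parameter.
estimates = [
    fit_mean([rng.gauss(true_mu, sigma) for _ in range(n_points)])
    for _ in range(n_experiments)
]

# A common bias would show up as a non-zero average offset from the truth.
average_bias = statistics.fmean(estimates) - true_mu
# The spread across experiments is the statistical uncertainty of one fit.
spread = statistics.stdev(estimates)

print(f"average bias: {average_bias:+.4f}, spread: {spread:.4f}")
```

An individual experiment can deviate from the truth by roughly the spread, while the average over many experiments should be compatible with zero bias if the estimator is unbiased.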
 You say on page 4 that your uncertainty evaluation for the fit is equivalent to using the full covariance matrix returned by the fit. Did you verify that? Why not just use the covariance matrix? I would be interested to see the comparison. I agree with you that this is true in the absence of fit biases, for perfectly chisq distributions. But are you in that situation? Your likelihood should have a twofold ambiguity, and the two minima might overlap if the parameter determination is not precise enough [here I am assuming that the fit parameters are the two mu and sigma + the relative fraction]
> Thanks for pointing this out, we have amended the sentence to be more clear, as it is indeed not just the fit uncertainty, but also the variation stemming from the individual dataset. The main reason we chose this method is to make sure we are consistent between our 3 approaches.
 How was the GAN architecture chosen? Was any optimization performed? Did you try alternative architectures?
> We performed some small optimization by hand, mostly varying the learning rates and the depth of the networks. We also tested other setups, such as WGANs and GANs without gradient penalty. However, once we found a method that produced consistent and stable training results, we did not optimize the setup further, so as to keep our message as general as possible.
 In the 2D example, why did you centre one of the two Gaussians at a negative r value? Doing so, you reduce this contribution to a tail below the other Gaussian, so the sense of the camel back shape is gone. I was puzzled by this choice. I would have used some other positive value of mu for the second Gaussian. The same comment applies to the N-dim case.
> The reason we use this specific formula is somewhat involved. In principle, what we wanted is a Gaussian located at +4 that we can use as our radius. We sample from this Gaussian, then multiply the radii with vectors sampled from an N-dimensional unit sphere, and the end result is our N-dimensional data. However, a simple Gaussian at +4, N(4,1), with a cutoff at 0 is not normalized. This led us to use the absolute value |N(4,1)|. This is, however, equivalent to the expression N(-4,1) + N(4,1), as our dataset is by design symmetric around the origin: it is irrelevant whether half of the unit-sphere vectors are assigned a positive radius and the other half a negative one, or whether all are assigned a positive radius; the resulting distributions are equivalent. We went with this expression over |N(4,1)| because we hoped it would make it easier to see the similarities between 1D and ND.
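The sampling procedure described in this reply can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it reads the radial mixture as N(-4,1) + N(4,1) with equal weights (which is what the referee's remark about a negative centre suggests), draws uniform directions by normalizing Gaussian vectors, and uses hypothetical function names and sample sizes.

```python
import math
import random

def sample_radius(rng, mu=4.0, sigma=1.0):
    """Signed radius from the equal-weight mixture N(-mu, sigma) + N(mu, sigma)."""
    sign = rng.choice((-1.0, 1.0))
    return sign * rng.gauss(mu, sigma)

def sample_unit_vector(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sample_point(dim, rng):
    """One N-dimensional data point: signed radius times a random direction."""
    r = sample_radius(rng)
    return [r * x for x in sample_unit_vector(dim, rng)]

rng = random.Random(0)
points = [sample_point(5, rng) for _ in range(10_000)]

# The distance from the origin is |r|, so it follows |N(4,1)| regardless of
# the sign assigned to the radius.
distances = [math.sqrt(sum(x * x for x in p)) for p in points]
mean_distance = sum(distances) / len(distances)
print(f"mean distance from origin: {mean_distance:.3f}")
```

Because the directions are uniform on the sphere, flipping the sign of a radius maps a point to its antipode, which is equally likely. The signed and the absolute-value versions therefore produce the same distribution of points.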
 In the Introduction, the authors state that NNs go beyond naive interpolation due to the structure of the network, which e.g. introduces some level of smoothness in the functions it tries to represent. That statement seems intuitive, but one is left with the question of to what extent it holds for different NN architectures and datasets. For example, could the authors comment on why they chose this particular camel back example, and whether they have tested their procedure with other functional forms?
> We have played with other functional forms. The main issue was to avoid Gaussian-like functions, because they can be trivially learned. The camel back seemed as un-Gaussian as possible, and its topology with a central hole adds a challenge to the network.
List of changes
 Added reference to 'challenging the current simulation tools' statement
 Added a brief discussion to the introduction about the statistical and systematic precision trade-off of GANs.
 Added mention in text that amplification is also observable for larger sample sizes
 Made descriptions of the 3 compared approaches (Sample, fit, GAN) more visible
 Updated the figures to be more readable, and updated the caption to explain the dotted lines.
 Amended the sentence on fit uncertainties to be more clear
 Clarified sentence regarding the effects of increasing the numbers of quantiles
 Added a brief explanation on how we prevent mode collapse in the GAN training.
 Added a brief comment why we increase the number of samples from 100 to 500 when going to 5D
Reports on this Submission
Report 1 by Veronica Sanz on 2021-4-27 (Invited Report)
Report
I am happy with the changes made in the text after referee 1 and I made recommendations. I recommend publication in its current form.