SciPost Submission Page

Uncertainties associated with GAN-generated datasets in high energy physics

by Konstantin T. Matchev, Prasanth Shyamsundar

This is not the current version.

Submission summary

As Contributors: Konstantin Matchev · Prasanth Shyamsundar
Date submitted: 2020-06-02 02:00
Submitted by: Shyamsundar, Prasanth
Submitted to: SciPost Physics
Academic field: Physics
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approach: Phenomenological


Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we present an argument cautioning against the usage of this method to meet the simulation requirements of an experiment, namely that data generated by a GAN cannot statistically be better than the data it was trained on.
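The central claim can be illustrated with a toy sketch. It is not taken from the paper: a Gaussian fitted to the training sample stands in for a trained GAN, and the function name, sample sizes, and seed are assumptions chosen for illustration. Over repeated pseudo-experiments, the spread of the mean estimated from a large generated sample tracks the spread obtained from the small training sample (about 1/sqrt(n_train)), not the naive 1/sqrt(n_generated).

```python
import numpy as np

def spread_of_mean_estimates(n_train=100, n_generated=20_000, n_trials=400, seed=0):
    """Compare the run-to-run spread of the sample mean for training vs generated data.

    Hypothetical toy setup: the "generator" is a Gaussian fitted to the
    training sample, used here as a stand-in for a GAN (an assumption).
    """
    rng = np.random.default_rng(seed)
    true_mu, true_sigma = 0.0, 1.0
    train_means = np.empty(n_trials)
    generated_means = np.empty(n_trials)
    for i in range(n_trials):
        # Small, traditionally simulated training sample
        train = rng.normal(true_mu, true_sigma, size=n_train)
        # "Train" the stand-in generator by fitting mean and standard deviation
        mu_hat, sigma_hat = train.mean(), train.std(ddof=1)
        # Draw a much larger sample from the fitted generator
        generated = rng.normal(mu_hat, sigma_hat, size=n_generated)
        train_means[i] = train.mean()
        generated_means[i] = generated.mean()
    # Standard deviation across pseudo-experiments of each estimator
    return train_means.std(ddof=1), generated_means.std(ddof=1)

spread_train, spread_generated = spread_of_mean_estimates()
naive_spread = 1.0 / np.sqrt(20_000)  # what 20,000 independent true events would give
print(f"training sample:  {spread_train:.4f}")
print(f"generated sample: {spread_generated:.4f}")
print(f"naive 1/sqrt(M):  {naive_spread:.4f}")
```

In this toy setup the generated sample inherits the fluctuations of the fitted parameters, so producing more events from the same generator does not shrink the uncertainty on the mean.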

Current status:
Has been resubmitted

Reports on this Submission

Anonymous Report 2 on 2020-07-07 (Invited Report)

Strengths


Provides a welcome critique of sometimes over-reaching claims for generative methods, particularly as applied to event generation, with many sensible comments on the propagation of statistical uncertainties through analysis chains.

Weaknesses


1. The overall contention that statistical convergence to the true asymptotic behaviour cannot be achieved by generative methods from a low-statistics input sample is obvious, although I would agree that this is sometimes not obvious from the literature on such techniques.

2. The end of Section 1 says that what follows will be "presented carefully", but to my surprise the argument made is not analytical, but polemical, via analogy and intuition, and does not add much to the statement of Section 1. I actually rather enjoyed the presentation, but the absence of formal proof weakens the case for publication, especially given that the core insight is not novel.

3. The bold statement on p3 seems to me to itself be an over-reach, and is a key example of where a formal approach would greatly improve the strength of the argument. I suspect that it is technically correct for a relatively narrow definition of "GAN-generated", but the informal nature of the presentation means that definition is not fully clear. In particular, a process limited by detector simulation costs rather than event generation evidently *could* achieve improved accuracy for parameter inference, provided the statistics of the detector response training (as opposed to the physics MC samples) is not a leading systematic. On the other hand, if the fundamental process-modelling statistics are a limitation, the input to GAN training itself has statistical uncertainties which cannot obviously be reduced by (effectively) bootstrapping. Such gaps in the argument are in fact nicely summarised in the caveats of Section 3, where the difference between in-principle and pragmatic limitations is made clear.

Report


Apologies for this late review, due to the new global circumstances. I enjoyed the short paper, but found it lacking in rigour which undermines the point being made (which I think is very sensible if not 100% watertight, but well known in at least the event generation community).

As a sensible commentary on not getting too excited about AI methods to the extent of forgetting basic statistics, it is rather good, but I am not convinced that such a polemic justifies journal publication. And given the lack of rigour in the central section, I think it would be more effective as messaging to compress Sections 1 and 2 into a single succinct summary of the argument, followed by the Caveats making clear that the perfect need not be the enemy of the good.

Requested changes

1. Make a more rigorous argument in Section 2, clarifying through formalism the scenarios and uncertainty classes being referred to.

2. Reduce repetition of the well-known main argument... but I suspect this would happen anyway by formalising Section 2.

  • validity: good
  • significance: low
  • originality: low
  • clarity: good
  • formatting: excellent
  • grammar: excellent

Author:  Prasanth Shyamsundar  on 2021-06-25  [id 1525]

(in reply to Report 2 on 2020-07-07)

We thank the referee for their comments and suggestions.
Addressing the list of perceived weaknesses:

1. Our focus in this paper is not just on convergence to the true asymptotic behavior, but on the discrepancy between the generative models and the true distribution even at finite statistics. In this sense, our main arguments are not trivial.

2. We thank the referee for the suggestions. We now present our argument in the form of a theorem in Section 2 and information-theoretic demonstrations in Section 3 (both newly added in this version).

3. We have made the presentation of our arguments more formal throughout the paper to address these concerns.

We have also addressed both the requested changes with an almost complete rewrite of the paper, which now stands at ~27 pages as opposed to the previous version at ~11 pages.


Anonymous Report 1 on 2020-07-01 (Invited Report)

Strengths


The paper is asking an extremely important question, relevant not only for particle physics applications of generative networks, but also for broader applications of machine learning.

Weaknesses


The argument given in Sec.2 is neither mathematically nor formally backed up, nor is it illustrated by an analysis. More than that, looking at broader applications of machine learning I would argue that it is most likely wrong. Specifically:
1- The sentence `This implies that the model (parameter) discriminating power...' comes out of nowhere. Why is this implied?
2- The example given below is not correct. A network can interpolate and it can learn an (approximate) functional form, because a neural network is nothing but a learned function. Sampling from a network can improve the analysis, even if those additional events are not a fully adequate replacement for the same number of properly sampled points. Where am I wrong?
3- In Sec.2.2 it sounds like the authors are complaining correctly that GAN analyses typically fail to address the issue of robustness and information content. This is true, I completely agree, but this does not mean that it could not be shown;
4- Making a cynical argument like the one at the end of Sec.2.2 really requires being right, otherwise it is irritating and not appropriate;
5- I am not sure why the authors only discuss GANs and not other generative networks. Is this meant as an indication that the argument does not hold generally? For instance event generation is done with GANs the same way it is done with VAEs.
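
The interpolation argument in point 2 can be made concrete with a toy sketch. It is an illustration only, not an analysis from the paper or the report: a Gaussian fit stands in for the learned function, and the function name, threshold, sample size, and seed are assumptions. The sketch compares two estimates of a tail probability from the same small training sample: direct counting, and evaluation of the fitted model (the infinite-sample limit of sampling from it).

```python
import math
import numpy as np

def tail_probability_spreads(n_train=200, threshold=2.0, n_trials=2000, seed=1):
    """Run-to-run spread of two estimates of P(x > threshold).

    Hypothetical toy setup: the "learned function" is a Gaussian fitted to
    the training sample, assumed here to match the true functional form.
    """
    rng = np.random.default_rng(seed)
    count_estimates = np.empty(n_trials)
    model_estimates = np.empty(n_trials)
    for i in range(n_trials):
        train = rng.normal(0.0, 1.0, size=n_train)
        # Estimate 1: fraction of training events above the threshold
        count_estimates[i] = np.mean(train > threshold)
        # Estimate 2: tail probability of the Gaussian fitted to the same events
        mu_hat, sigma_hat = train.mean(), train.std(ddof=1)
        z = (threshold - mu_hat) / (sigma_hat * math.sqrt(2.0))
        model_estimates[i] = 0.5 * math.erfc(z)
    # Standard deviation across pseudo-experiments of each estimator
    return count_estimates.std(ddof=1), model_estimates.std(ddof=1)

spread_count, spread_model = tail_probability_spreads()
print(f"counting:     {spread_count:.4f}")
print(f"fitted model: {spread_model:.4f}")
```

Under these assumptions the fitted-model estimate fluctuates less than direct counting, but the gain comes from the assumed functional form, and the spread remains set by the size of the training sample. If the assumed form were wrong, the fitted estimate would also be biased.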


Report

Please note that I am not saying that I can solve the problem the authors address. The issue I have is that I see plenty of evidence that either their general claim is wrong, or it is so specific that most applications of generative networks will end up on the extensive list of exceptions. The latter case would be interesting formally, but it needs a solid proof, separating the good use of generative networks from the bad use.

Requested changes

Please provide a formal and solid proof for the claims in Sec.2. And I am sorry, but my guess is that this will not be possible, because the evidence I have seen in the literature points in the other direction.

  • validity: poor
  • significance: poor
  • originality: poor
  • clarity: low
  • formatting: excellent
  • grammar: excellent

Author:  Prasanth Shyamsundar  on 2021-06-25  [id 1524]

(in reply to Report 1 on 2020-07-01)

We thank the referee for their comments and suggestions.
Addressing the list of perceived weaknesses:

1. We thank the referee for this comment. We have converted the sentence in question into a theorem in Section 2, and supported it with mathematical demonstrations in Section 3, a proof in Section 4, and a toy example in Appendix A (all newly added).

2. We thank the referee for this comment. This is a common argument used in favor of GANs. We address this in Section 3.4 and Section 5 (both newly added).

3. We support our claims regarding the limitations of GANs rigorously in Sections 3 and 4 and Appendix A.

4. The argument in question is correct, and to better get the point across, we have rephrased the wording (now at the end of Section 4) and added Figure 1, where the parallels between GANning simulated data and GANning real collider data are explicit.

5. Our arguments do apply to other generative models like VAEs. Our focus on GANs was dictated by their prevalent usage in the literature. We have modified our summary section to indicate this.

Regarding the criticism in the report: We believe that there is no evidence in the literature that our claims are wrong. We discuss this in the newly added Section 6, which reconciles our results with those in the literature. In the revised version, we have extensively discussed examples of both good and bad usages of GANs, as requested.

We have also addressed the requested change by supporting our claims rigorously in Sections 3 and 4. The paper has been almost completely rewritten, and now stands at ~27 pages as opposed to the previous version at ~11 pages.

