SciPost Submission Page
Uncertainties associated with GAN-generated datasets in high energy physics
by Konstantin T. Matchev, Prasanth Shyamsundar
As Contributors: Konstantin Matchev · Prasanth Shyamsundar
Arxiv Link: https://arxiv.org/abs/2002.06307v2
Date submitted: 2020-06-02 02:00
Submitted by: Shyamsundar, Prasanth
Submitted to: SciPost Physics
Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we present an argument cautioning against the usage of this method to meet the simulation requirements of an experiment, namely that data generated by a GAN cannot statistically be better than the data it was trained on.
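The abstract's central claim can be illustrated with a toy Monte Carlo. This is a minimal sketch under stated assumptions, not the authors' method: the true events are taken to be standard normal, and a Gaussian whose mean and width are fitted to the training sample stands in for a GAN (no neural network is trained). The parameter of interest is the mean, estimated only from the generated events, and the function name `generated_mean_spread` is hypothetical.

```python
import numpy as np


def generated_mean_spread(n_train, n_generated, n_trials=500, seed=0):
    """Spread, over repeated toy experiments, of a mean estimated from generated events.

    Toy assumption: true events are standard normal (true mean 0), and the
    "generator" is a Gaussian fitted to the training sample. It stands in
    for a GAN; no GAN is trained here.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_trials)
    for t in range(n_trials):
        # Traditionally simulated training sample of limited size
        train = rng.normal(0.0, 1.0, size=n_train)
        # "Train" the toy generator: fit its two parameters to the sample
        mu_hat, sigma_hat = train.mean(), train.std(ddof=1)
        # Generate a (possibly much larger) sample from the fitted model
        generated = rng.normal(mu_hat, sigma_hat, size=n_generated)
        # Parameter estimate computed on the generated events only
        estimates[t] = generated.mean()
    return estimates.std()


if __name__ == "__main__":
    for n_gen in (100, 10_000, 100_000):
        spread = generated_mean_spread(n_train=100, n_generated=n_gen)
        naive = 1.0 / np.sqrt(n_gen)  # what the generated sample size alone would suggest
        print(f"n_generated={n_gen:>7}: spread={spread:.4f}, naive 1/sqrt(M)={naive:.4f}")
```

If the argument holds in this toy, the spread should level off near 1/sqrt(n_train) = 0.1 as n_generated grows, rather than following the naive 1/sqrt(n_generated).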
Reports on this Submission
Anonymous Report 2 on 2020-7-7 (Invited Report)
Provides a welcome critique of sometimes over-reaching claims for generative methods, particularly as applied to event generation, with many sensible comments on the propagation of statistical uncertainties through analysis chains.
1. The overall contention, that statistical convergence to the true asymptotic behaviour cannot be achieved by generative methods from a low-statistics input sample, is obvious, although I would agree that this is sometimes not obvious from the literature on such techniques.
2. The end of Section 1 says that what follows will be "presented carefully", but to my surprise the argument made is not analytical, but polemical, via analogy and intuition, and does not add much to the statement of Section 1. I actually rather enjoyed the presentation, but the absence of formal proof weakens the case for publication, especially given that the core insight is not novel.
3. The bold statement on p3 itself seems to me to be an over-reach, and is a key example of where a formal approach would greatly improve the strength of the argument. I suspect that it is technically correct for a relatively narrow definition of "GAN-generated", but the informal nature of the presentation means that this definition is not fully clear. In particular, a process limited by detector simulation costs rather than event generation evidently *could* achieve improved accuracy for parameter inference, provided the statistics of the detector response training (as opposed to the physics MC samples) are not a leading systematic. On the other hand, if the fundamental process-modelling statistics are a limitation, the input to GAN training itself has statistical uncertainties which cannot obviously be reduced by (effectively) bootstrapping. Such gaps in the argument are in fact nicely summarised in the caveats of Section 3, where the difference between in-principle and pragmatic limitations is made clear.
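The remark about (effective) bootstrapping can be checked with a small toy. This is a minimal sketch under stated assumptions: input events are standard normal (true mean 0), and resampling with replacement is used as a simple proxy for a generator that can only reproduce the information in its low-statistics input sample. The function name `bootstrap_mean_rms` is hypothetical.

```python
import numpy as np


def bootstrap_mean_rms(n_input, n_resampled, n_trials=1000, seed=1):
    """RMS deviation from the true mean (0) of a mean computed on an 'enlarged' sample.

    Toy assumption: input events are standard normal. The enlarged sample is
    a bootstrap resample (with replacement) of the low-statistics input.
    """
    rng = np.random.default_rng(seed)
    errors = np.empty(n_trials)
    for t in range(n_trials):
        # Low-statistics input sample (e.g. limited full simulation)
        sample = rng.normal(0.0, 1.0, size=n_input)
        # "Enlarge" it by resampling with replacement
        enlarged = rng.choice(sample, size=n_resampled, replace=True)
        # Deviation of the enlarged-sample mean from the true mean
        errors[t] = enlarged.mean() - 0.0
    return np.sqrt(np.mean(errors ** 2))


if __name__ == "__main__":
    for m in (50, 2_000, 20_000):
        print(m, round(bootstrap_mean_rms(n_input=50, n_resampled=m), 4))
```

In this toy the RMS error should stay close to 1/sqrt(50) ≈ 0.14 however large the resampled set is made, since the extra events carry no information beyond the 50 inputs.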
Apologies for this late review, due to the new global circumstances. I enjoyed the short paper, but found it lacking in rigour, which undermines the point being made (a point I think is very sensible, if not 100% watertight, but well known in at least the event generation community).
As a sensible commentary on not getting too excited about AI methods to the extent of forgetting basic statistics, it is rather good, but I am not convinced that such a polemic justifies journal publication. And given the lack of rigour in the central section, I think it would be more effective as messaging to compress Sections 1 and 2 into a single succinct summary of the argument, followed by the Caveats making clear that the perfect need not be the enemy of the good.
1. Make a more rigorous argument in Section 2, clarifying through formalism the scenarios and uncertainty classes being referred to.
2. Reduce repetition of the well-known main argument... but I suspect this would happen anyway by formalising Section 2.
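As one possible formalisation of the kind requested above (an assumption-laden sketch, not taken from the paper), let D be a training sample of N events, G_D a generator trained on D (including any randomness in the training itself), and \hat\theta_M an estimator computed on M events drawn from G_D. The law of total variance then gives

```latex
\operatorname{Var}\big(\hat\theta_M\big)
  = \mathbb{E}_D\!\left[\operatorname{Var}\big(\hat\theta_M \mid D\big)\right]
  + \operatorname{Var}_D\!\left(\mathbb{E}\big[\hat\theta_M \mid D\big]\right)
  \;\ge\; \operatorname{Var}_D\!\left(\mathbb{E}\big[\hat\theta_M \mid D\big]\right).
```

The first term contains the sampling noise of the generated sample, which typically shrinks as M grows (it need not vanish if training is stochastic); the second is set by D alone and does not depend on M, so generating more events cannot push the variance below it. The inequality says nothing about how this floor compares with the variance of an estimator applied to D directly, which is where a well-chosen functional form could help.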
Anonymous Report 1 on 2020-7-1 (Invited Report)
The paper is asking an extremely important question, relevant not only for particle physics applications of generative networks, but also for broader applications of machine learning.
The argument given in Sec.2 is neither mathematically nor formally backed up, nor is it illustrated by an analysis. More than that, looking at broader applications of machine learning I would argue that it is most likely wrong. Specifically:
1- The sentence `This implies that the model (parameter) discriminating power...' comes out of nowhere. Why is this implied?
2- The example given below is not correct. A network can interpolate and it can learn an (approximate) functional form, because a neural network is nothing but a learned function. Sampling from a network can improve the analysis, even if those additional events are not a fully adequate replacement for the same number of properly sampled points. Where am I wrong?
3- In Sec.2.2 it sounds like the authors are correctly complaining that GAN analyses typically fail to address the issue of robustness and information content. This is true, and I completely agree, but this does not mean that it could not be shown.
4- Making a cynical argument like the one at the end of Sec.2.2 really requires one to be right; otherwise it is irritating and not appropriate.
5- I am not sure why the authors only discuss GANs and not other generative networks. Is this meant as an indication that the argument does not hold generally? For instance event generation is done with GANs the same way it is done with VAEs.
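The interpolation point in item 2 can be illustrated with a toy in which the generative model has the correct functional form. This is a minimal sketch under stated assumptions, not a GAN: true events follow an exponential with rate 1, the "model" is an exponential whose rate is fitted by maximum likelihood, and the target is the fraction of events in a narrow bin [2.0, 2.1). The function name `bin_fraction_spreads` and all parameter values are hypothetical.

```python
import numpy as np


def bin_fraction_spreads(n_events=200, n_generated=50_000, n_trials=500,
                         lo=2.0, hi=2.1, seed=2):
    """Compare two estimates of the fraction of events in a narrow bin [lo, hi).

    Toy assumption: true events are exponential with rate 1, and the
    "generative model" is an exponential with its rate fitted by maximum
    likelihood, i.e. its functional form is exactly right.
    """
    rng = np.random.default_rng(seed)
    raw, gen = np.empty(n_trials), np.empty(n_trials)
    for t in range(n_trials):
        events = rng.exponential(1.0, size=n_events)
        # (a) Direct histogram estimate from the original events
        raw[t] = np.mean((events >= lo) & (events < hi))
        # (b) Fit the model (MLE of the rate), then sample many events from it
        rate_hat = 1.0 / events.mean()
        generated = rng.exponential(1.0 / rate_hat, size=n_generated)
        gen[t] = np.mean((generated >= lo) & (generated < hi))
    return raw.std(), gen.std()


if __name__ == "__main__":
    raw_spread, gen_spread = bin_fraction_spreads()
    print(f"histogram: {raw_spread:.5f}, fitted model + sampling: {gen_spread:.5f}")
```

In this toy the model-based bin estimate should be several times more precise than the raw histogram. The gain comes from the information supplied by the assumed exponential form rather than from the number of generated events; with a mis-specified form the model-based estimate would instead be biased.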
Please note that I am not saying that I can solve the problem the authors address. The issue I have is that I see plenty of evidence that either their general claim is wrong, or it is so specific that most applications of generative networks will end up on the extensive list of exceptions. The latter case would be interesting formally, but it needs a solid proof, separating the good use of generative networks from the bad use.
Please provide a formal and solid proof for the claims in Sec.2. And I am sorry, but my guess is that this will not be possible, because the evidence I have seen in the literature points in the other direction.