SciPost Submission Page
How to GAN LHC Events
by Anja Butter, Tilman Plehn, Ramon Winterhalder
As Contributors: Tilman Plehn · Ramon Winterhalder
Submitted by: Winterhalder, Ramon
Submitted to: SciPost Physics
Subject area: High-Energy Physics - Phenomenology
Event generation for the LHC can be supplemented by generative adversarial networks, which generate physical events and avoid highly inefficient event unweighting. For top pair production we show how such a network describes intermediate on-shell particles, phase space boundaries, and tails of distributions. It can be extended in a straightforward manner to include for instance off-shell contributions, higher orders, or approximate detector effects.
Submission & Refereeing History
- Report 3 submitted on 2019-11-29 08:37 by Anonymous
- Report 2 submitted on 2019-11-11 00:13 by Anonymous
- Report 1 submitted on 2019-11-10 00:17 by Anonymous
- Report 2 submitted on 2019-10-28 10:46 by Anonymous
- Report 1 submitted on 2019-10-03 00:53 by Anonymous
Reports on this Submission
Anonymous Report 2 on 2019-10-28 Invited Report
- Cite as: Anonymous, Report on arXiv:1907.03764v3, delivered 2019-10-28, doi: 10.21468/SciPost.Report.1265
see previous report
see previous report and detailed requests below
Comments on arXiv version 3
- The authors claim that they were able to sample the "full phase space" of the ttbar process. There is no indication that the GAN can sample the phase space without "holes", and there is no discussion of how much of the phase space the GAN can sample outside the training data, which is a rather small subset of the true high-dimensional phase space. This could be visualized by producing many more events with the GAN than were used for training, and by displaying the sampled phase space with very fine bin resolutions, revealing the granularity of the training data. Without such a check it cannot be concluded that the GAN scans the "full phase space". Both the training sample and the capacity of the GAN are huge, and it is not clear what we learn beyond the 1 million training events.
- Showing e.g. phi_object1 vs. phi_object2 with very fine bin sizes could be a way to demonstrate how the GAN fills the "holes" in the high-dimensional phase space beyond the training data. It would also show whether mode collapse is really avoided.
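The granularity check suggested above can be made quantitative. A minimal sketch, using toy uniform angles in place of the actual training and GANned (phi_1, phi_2) pairs (the function name and bin count are illustrative assumptions, not taken from the paper): bin two azimuthal angles much more finely than the training statistics support, and compare the fraction of empty cells between the training sample and a much larger GANned sample.

```python
import numpy as np

def empty_bin_fraction(phi1, phi2, bins=500):
    """Fraction of empty cells in a fine-binned 2D histogram of two angles.

    With bins far finer than the training statistics support, the training
    data will show "holes" (empty cells); a GAN that truly interpolates the
    phase space, rather than replaying training events, should fill them.
    """
    h, _, _ = np.histogram2d(phi1, phi2, bins=bins,
                             range=[[-np.pi, np.pi], [-np.pi, np.pi]])
    return np.mean(h == 0)

# Toy stand-ins for the azimuthal angles of two reconstructed objects;
# real use would take the training events and the GANned events.
rng = np.random.default_rng(0)
train = rng.uniform(-np.pi, np.pi, size=(10**5, 2))  # ~1e5 training events
gan = rng.uniform(-np.pi, np.pi, size=(10**6, 2))    # 10x more GANned events

f_train = empty_bin_fraction(train[:, 0], train[:, 1])
f_gan = empty_bin_fraction(gan[:, 0], gan[:, 1])
# A GAN that merely memorized the training events would keep f_gan close
# to f_train; genuine interpolation drives f_gan toward zero.
```

A collapsed mode would show up the same way, as a persistent cluster of empty cells that more GANned statistics never fills.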
- I would like to repeat the request to show the phi distributions of all 6 objects. It would be interesting to show that they are indeed flat, given the claim of reference 14.
- It is not clear how essential the MMD term is for reproducing the distributions. The effect of the MMD term on the phi, eta, and pT distributions should also be shown.
- The details of the MMD configuration should be added to the draft, i.e. which kernels, widths, etc. have been used.
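For concreteness, this is the quantity whose configuration is being asked about. A minimal sketch of squared MMD with a Gaussian kernel on 1D samples (the kernel choice and width `sigma` here are illustrative assumptions, exactly the details the report asks the authors to state):

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """(Biased) squared maximum mean discrepancy between 1D samples x, y
    with a Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 sigma^2)).
    `sigma` is the kernel width whose value the report asks for."""
    def k(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-d2 / (2.0 * sigma ** 2))
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(1)
same = mmd2(rng.normal(0, 1, 500), rng.normal(0, 1, 500))  # ~0
diff = mmd2(rng.normal(0, 1, 500), rng.normal(2, 1, 500))  # clearly > 0
```

Because the kernel width sets the length scale on which distribution differences are penalized, the reported results can depend noticeably on it, which is why stating the kernel and widths matters for reproducibility.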
- Since the authors do not want to mention the use of MMD in the title, I would recommend mentioning it at least in the abstract.
- The code should be released with this publication. At the very least, the data produced by the GAN and the training data should be made available; otherwise the results are hardly reproducible.
- Page 5, "for each point": I assume you mean for each "batch".
Anonymous Report 1 on 2019-10-3 Invited Report
- Cite as: Anonymous, Report on arXiv:1907.03764v3, delivered 2019-10-02, doi: 10.21468/SciPost.Report.1208
This is a followup report, now considering v2. Thank you to the authors for addressing my comments on v1. I now only have two followup points:
- Fig. 4: I still don't understand how the GAN can do better (closer to the true distribution) than the stat. uncertainty on the training dataset. Please explain.
- v1 comment: Can you please demonstrate that your GAN is really able to generate statistically independent examples? If you really claim that it gets the full distribution correct, please show that it can model the tails as well as the bulk. You could maybe do this with bootstrapping to show that the statistical power of a GAN dataset that is 10x bigger than the training one is really 10x the one of the original dataset. My guess is that this will be true for the bulk, but not for the tails (in which case, perhaps you could modify your claims a bit).
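The bootstrapping check proposed above can be sketched concretely. With toy exponential samples standing in for the training and the 10x larger GANned dataset (the function names and the tail cut are illustrative assumptions), one compares the bootstrap uncertainty of a tail observable between the two samples:

```python
import numpy as np

def bootstrap_std(sample, estimator, n_boot=200, seed=0):
    """Bootstrap uncertainty of `estimator` evaluated on `sample`."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    vals = [estimator(sample[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.std(vals)

# Toy stand-ins: `train` plays the role of the training events, `gan` the
# 10x larger GANned sample; here `gan` is truly independent by construction.
rng = np.random.default_rng(2)
train = rng.exponential(scale=1.0, size=10_000)
gan = rng.exponential(scale=1.0, size=100_000)

tail_frac = lambda s: np.mean(s > 4.0)  # observable probing the tail
err_train = bootstrap_std(train, tail_frac)
err_gan = bootstrap_std(gan, tail_frac)
# For statistically independent events the tail uncertainty scales like
# 1/sqrt(N), so err_gan should be about err_train / sqrt(10); a GAN that
# effectively replays its training data would not achieve this reduction.
```

Running the same comparison separately in the bulk and in the tails would directly test the referee's guess that the gain holds in the bulk but not in the tails.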
Your answer: We already say that not all regions are perfectly learned. We see a systematic effect due to the low statistics of the training/batch data, which is described in the text. Furthermore, we show a correlation plot which demonstrates that the full phase space is covered. We have also checked carefully that there are indeed no holes.
Followup: Perhaps I should say this another way: you are advocating that people can use your tool to augment physics-based simulations. If I have a simulator, I could use your method to make e.g. 10x the number of events I started with. In order for me to believe that this is a useful exercise, you need to convince me that the 10x more events I got are statistically independent from the original physics-based simulation. If they are not, then I have not gained with the GAN. In my first comment, I proposed a way to show this, but there may be other ways to convince the reader.