Uncertainties associated with GAN-generated datasets in high energy physics

Matchev, Konstantin T.; Roman, Alexander; Shyamsundar, Prasanth

doi:10.21468/SciPostPhys.12.3.104

SciPost Physics

SciPost Phys. 12, 104 (2022) · published 24 March 2022

Abstract

Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we point out that data generated by a GAN cannot statistically be better than the data it was trained on, and critically examine the applicability of GANs in various situations, including a) replacing the entire Monte Carlo pipeline or parts of it, and b) producing datasets for use in highly sensitive analyses or sub-optimal ones. We present our arguments using information-theoretic demonstrations, a toy example, and a formal statement, and identify some potential valid uses of GANs in collider simulations.
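
The central claim, that generated data cannot be statistically better than the training data, can be illustrated with a minimal numerical sketch. This is not the toy example from the paper: as a stand-in for a GAN it assumes the simplest possible generative model, a Gaussian fitted to the training sample, and the function name and all parameter values (`n_train`, `n_gen`, `n_trials`) are hypothetical choices made for illustration. The sketch trains on a small sample, generates a much larger one, and compares how much the estimated mean fluctuates across repeated experiments.

```python
import numpy as np


def spread_of_mean_estimates(n_train=100, n_gen=10_000, n_trials=2_000, seed=0):
    """Compare the spread of mean estimates from training vs. generated data.

    Assumption: a Gaussian fitted to the training sample plays the role of
    the generative model (a stand-in for a GAN, not the paper's setup).
    """
    rng = np.random.default_rng(seed)
    train_means = np.empty(n_trials)
    gen_means = np.empty(n_trials)
    for i in range(n_trials):
        # "Traditionally simulated" training sample from the true distribution.
        train = rng.normal(0.0, 1.0, size=n_train)
        # "Train" the generative model: fit its two parameters to the sample.
        mu_hat, sigma_hat = train.mean(), train.std(ddof=1)
        # Generate a much larger dataset from the fitted model.
        generated = rng.normal(mu_hat, sigma_hat, size=n_gen)
        train_means[i] = train.mean()
        gen_means[i] = generated.mean()
    # Spread across repeated experiments = true statistical uncertainty.
    return train_means.std(ddof=1), gen_means.std(ddof=1)


n_train, n_gen = 100, 10_000
train_spread, gen_spread = spread_of_mean_estimates(n_train=n_train, n_gen=n_gen)
# Uncertainty one would naively quote by treating generated events as independent.
naive_spread = 1.0 / np.sqrt(n_gen)

print(f"spread of mean from training data : {train_spread:.4f}")
print(f"spread of mean from generated data: {gen_spread:.4f}")
print(f"naive 1/sqrt(n_gen) expectation   : {naive_spread:.4f}")
```

Under these assumptions the generated sample is 100 times larger than the training sample, yet the spread of its mean stays close to that of the training data rather than shrinking toward the naive 1/sqrt(n_gen) value, because every generated event inherits the statistical fluctuations of the fitted parameters.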

TY  - JOUR
PB  - SciPost Foundation
DO  - 10.21468/SciPostPhys.12.3.104
TI  - Uncertainties associated with GAN-generated datasets in high energy physics
PY  - 2022/03/24
UR  - https://scipost.org/SciPostPhys.12.3.104
JF  - SciPost Physics
JA  - SciPost Phys.
VL  - 12
IS  - 3
SP  - 104
A1  - Matchev, Konstantin T.
AU  - Roman, Alexander
AU  - Shyamsundar, Prasanth
ER  -

@Article{10.21468/SciPostPhys.12.3.104,
	title={{Uncertainties associated with GAN-generated datasets in high energy physics}},
	author={Konstantin T. Matchev and Alexander Roman and Prasanth Shyamsundar},
	journal={SciPost Phys.},
	volume={12},
	pages={104},
	year={2022},
	publisher={SciPost},
	doi={10.21468/SciPostPhys.12.3.104},
	url={https://scipost.org/10.21468/SciPostPhys.12.3.104},
}

Authors / Affiliations

¹ Konstantin T. Matchev,
¹ Alexander Roman,
¹ ² Prasanth Shyamsundar

Funder for the research work leading to this publication

United States Department of Energy [DOE]