How to GAN Event Unweighting

Event generation with neural networks has seen significant progress recently. The big open question is still how such new methods will accelerate LHC simulations to the level required by upcoming LHC runs. We target a known bottleneck of standard simulations and show how their unweighting procedure can be improved by generative networks. This can, potentially, lead to a very significant gain in simulation speed.


Introduction
First-principle simulations have defined data analysis at the LHC since its beginning. The success of the LHC in establishing the Standard Model as the fundamental theory of particle interactions is largely owed to such precision simulations and the qualitative progress in our understanding of QCD. Because the HL-LHC will produce a data set more than 25 times the current Run 2 data set, the current theory challenge is to provide significantly faster simulations, while at the same time increasing the precision to the per-cent level and better. This goal is defined by QCD precision predictions as well as by the expected size of experimental uncertainties, which seriously limit the use of leading-order simulations even for complex signatures at future LHC runs. While it is hard to accelerate standard tools to the required level, there is justified hope that modern machine learning will allow us to reach this goal.
Going back to the LHC motivation, the key question is where we can gain significant speed in precision theory simulations. First, we can use flow networks to improve the phase space sampling [15,17]. In addition, we can employ generative networks because they learn more information than a statistically limited training data set [4]. This is why neural networks are successfully used to encode parton densities [49]. Finally, there exist promising hints for network extrapolation in jet kinematics [23].
In this paper we target a well-known bottleneck of LHC event simulation, the transformation of weighted into unweighted events [50]. Usually, the information about a differential scattering rate is first encoded in a combination of event weights and the event distribution over phase space. To compare with data we ideally work with unit-weight events, where all information is encoded in the event distribution. For complex processes, the standard unweighting procedures suffer from low efficiency. We will show how a generative network, specifically a GAN, should be able to speed up the corresponding event generation very significantly [51]. We will start with a 1-dimensional and a 2-dimensional toy model in Sec. 2, to illustrate our uwGAN idea in the context of standard approaches. In Sec. 3 we will then use a simple LHC application to show how our GAN-unweighting method can be applied to LHC simulations.

Unweighting GAN
Before we show how networks can be useful for LHC simulations, we briefly introduce event unweighting as it is usually done, how a generative network can be used for this purpose, and when such a network can beat standard approaches. First, we will use a 1-dimensional camel distribution to illustrate the loss function which is needed to capture the event weights. Second, we use a 2-dimensional Gaussian ring as a simple example where our method circumvents known challenges of standard tools.

Unweighting
For illustration purposes, we consider an integrated cross section of the form
$$ \sigma = \int dx \, \frac{d\sigma}{dx} \equiv \int dx \, w(x) , $$
where $d\sigma/dx$ is the differential cross section over the $m$-dimensional phase space $x$. To compute this integral numerically we draw $N$ phase space points or events $\{x\}$ and evaluate
$$ \sigma \approx \langle w(x) \rangle = \frac{1}{N} \sum_{i=1}^{N} w(x_i) . $$
The event weight $w(x)$ describes the probability for a single event $x$. Sampling $N$ phase space points $\{x\}$ and evaluating their weights $\{w\}$ defines $N$ weighted events $\{x, w\}$. The information on the scattering process is encoded in a combination of event weights and phase space density. This can be useful for theory computations, but actual events come with unit weights, so all information is encoded in their phase space density alone.
We can easily transform $N$ weighted events $\{x, w\}$ into $M$ unweighted events $\{x\}$ using a hit-or-miss algorithm, where in practice $M \ll N$. It re-scales the weight $w$ into a probability to keep or reject the event $x$,
$$ w_\text{rel} = \frac{w}{w_\text{max}} , $$
and then uses a random number $R \in [0, 1]$ such that the event is kept if $w_\text{rel} > R$. The obvious shortcoming of this method is that we lose a lot of events. For a given event sample the unweighting efficiency is [15]
$$ \epsilon_\text{uw} = \frac{\langle w \rangle}{w_\text{max}} . \tag{4} $$
If the differential cross section varies strongly, $\langle w \rangle \ll w_\text{max}$, this efficiency is small and the LHC simulation becomes CPU-intensive.
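The hit-or-miss step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the toy integrand $w(x) = 3x^2$ on the unit interval and the function names `unweight` and `unweighting_efficiency` are assumptions made for this example.

```python
import random


def unweight(events, weights, rng):
    """Hit-or-miss unweighting: keep an event with probability w / w_max."""
    w_max = max(weights)
    kept = []
    for x, w in zip(events, weights):
        w_rel = w / w_max            # re-scaled weight, read as a probability
        if w_rel > rng.random():     # R drawn uniformly from [0, 1)
            kept.append(x)
    return kept


def unweighting_efficiency(weights):
    """Mean weight divided by maximum weight."""
    return sum(weights) / len(weights) / max(weights)


rng = random.Random(1)

# Toy integrand (assumption for illustration): w(x) = 3 x^2, sampled flat on [0, 1)
events = [rng.random() for _ in range(100_000)]
weights = [3.0 * x * x for x in events]

unweighted = unweight(events, weights, rng)
eff = unweighting_efficiency(weights)
print(f"kept {len(unweighted)} of {len(events)} events, efficiency {eff:.3f}")
```

For this toy integrand roughly one third of the events survive, which matches the ratio of mean to maximum weight. A more strongly peaked integrand lowers this fraction accordingly.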
A standard method to improve the sampling and integration are phase space mappings, or coordinate transformations $x \to y(x)$,
$$ \sigma = \int dx \, w(x) = \int dy \, \left| \frac{\partial x}{\partial y} \right| w(x(y)) \equiv \int dy \, \tilde{w}(y) . $$
Ideally, the new integrand $\tilde{w}(y)$ is nearly constant and the structures in $w(x)$ are fully absorbed by the Jacobian. In that case $\langle \tilde{w} \rangle \approx \tilde{w}_\text{max}$, and the unweighting efficiency of Eq.(4) approaches one. This method of choosing an adequate coordinate transformation is called importance sampling. The most frequently used algorithm is Vegas [52,53], which assumes that the sampling density $g(x) = |\partial y / \partial x|$ factorizes into phase space directions, as we will discuss later.
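The effect of a phase space mapping on the unweighting efficiency can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy: the integrand $w(x) = 3x^2$ and the mapping $x = \sqrt{y}$, which samples $x$ with density $2x$, are chosen only because everything can be checked by hand. It does not reproduce the adaptive grid of Vegas.

```python
import math
import random


def efficiency(weights):
    """Unweighting efficiency: mean weight over maximum weight."""
    return sum(weights) / len(weights) / max(weights)


def w(x):
    """Toy integrand (assumption): integrates to 1 on [0, 1]."""
    return 3.0 * x * x


rng = random.Random(2)
n = 100_000

# Flat sampling in x: the weights inherit the full structure of w(x)
flat = [w(rng.random()) for _ in range(n)]

# Mapping x = sqrt(y), y uniform in (0, 1]: x follows the density 2x,
# and the Jacobian |dx/dy| = 1 / (2x) absorbs part of the structure
mapped = []
for _ in range(n):
    y = 1.0 - rng.random()               # lies in (0, 1], so x is never zero
    x = math.sqrt(y)
    mapped.append(w(x) / (2.0 * x))      # new integrand w(x) |dx/dy| = 1.5 x

eff_flat = efficiency(flat)
eff_mapped = efficiency(mapped)
print(f"efficiency flat: {eff_flat:.3f}, mapped: {eff_mapped:.3f}")
```

Both samples estimate the same integral, but the mapped weights vary less, so the efficiency roughly doubles. A mapping that sampled $x$ exactly according to $w(x)$ would make all weights equal and push the efficiency to one.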
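In contrast to vetoing most of the weighted events, we propose to use all of them to train a generative model to produce unweighted events. We follow a standard GAN setup with spectral normalization [54] as regularization method,
$$ L_D = \langle -\log D(x) \rangle_{x \sim P_T} + \langle -\log (1 - D(x)) \rangle_{x \sim P_G} , \qquad L_G = \langle -\log D(x) \rangle_{x \sim P_G} . $$
For weighted training events, the information in the true distribution $P_T$ factorizes into the distribution of sampled events $Q_T$ and their weights $w(x)$. To capture this combined information we replace the expectation values by weighted means for batches of weighted events,
$$ L_D^\text{(uw)} = \frac{\langle -w(x) \log D(x) \rangle_{x \sim Q_T}}{\langle w(x) \rangle_{x \sim Q_T}} + \langle -\log (1 - D(x)) \rangle_{x \sim P_G} , \qquad L_G^\text{(uw)} = \langle -\log D(x) \rangle_{x \sim P_G} . $$
Because the generator produces unweighted events with $w_G(x) = 1$, their weighted mean reduces to the standard expectation value. This way, our unweighting GAN (uwGAN) unweights events, and the standard GAN is just a special case with all information encoded in the event distribution.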
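To make the replacement of expectation values by weighted means concrete, the following sketch evaluates the two losses on a small batch. It assumes the usual binary cross-entropy form of the discriminator loss and a non-saturating generator loss; the function names and the toy discriminator outputs are hypothetical and only serve the illustration.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def discriminator_loss(d_true, w_true, d_gen):
    """Weighted mean over training events, plain mean over generated events.

    d_true, d_gen: discriminator outputs in (0, 1); w_true: event weights.
    """
    true_term = weighted_mean([-math.log(d) for d in d_true], w_true)
    gen_term = sum(-math.log(1.0 - d) for d in d_gen) / len(d_gen)
    return true_term + gen_term


def generator_loss(d_gen):
    """Generated events carry unit weight, so a plain mean is enough."""
    return sum(-math.log(d) for d in d_gen) / len(d_gen)


# Hypothetical discriminator outputs and event weights for one batch
d_true = [0.9, 0.6, 0.8]
w_true = [2.0, 0.5, 1.0]
d_gen = [0.2, 0.4]

loss_weighted = discriminator_loss(d_true, w_true, d_gen)
loss_unit = discriminator_loss(d_true, [1.0] * len(d_true), d_gen)
loss_rescaled = discriminator_loss(d_true, [10.0 * w for w in w_true], d_gen)
print(loss_weighted, loss_unit, generator_loss(d_gen))
```

Two properties are worth noting. With unit weights the loss falls back to the standard GAN loss, and a global rescaling of all weights drops out because the weighted mean is normalized by the sum of weights.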

One-dimensional camel back
We illustrate the unweighting GAN with a 1-dimensional camel back distribution built from Gaussians $\mathcal{N}(x; \mu, \sigma)$. To see how the GAN reacts to different ways of spreading the information between weights and distributions, we define three event samples,
· unweighted events distributed according to the camel back, $X_u = (x_\text{camel}, w_\text{uniform})$;
· uniformly distributed events, $X_w = (x_\text{uniform}, w_\text{camel})$; and
· a hybrid split $X_\text{hybrid}$, with the information shared between the event distribution and the weights.
For the camel back example our training data will consist of 1 million weighted events. We use 32 units within 2 layers in the generator and 32 units within 3 layers in the discriminator.
In the hidden layers, we employ the ELU activation function for both networks. To compensate an imbalance in the training we update the discriminator ten times as often as the generator. As a first test, we show in Fig. 1 that the uwGAN reproduces the target distribution from unweighted events, uniformly distributed events, and weighted events equally well. The limitation is always the poorly populated tails in the training data.
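The different ways of splitting the information between event positions and weights can be sketched as follows. The camel back parameters below (two Gaussians with relative fractions 0.3 and 0.7, means 0 and 2, width 0.5) are illustrative assumptions and not taken from the text, and only the fully unweighted and the uniformly sampled cases are shown.

```python
import math
import random

# Illustrative camel back parameters (assumptions, not values from the text)
FRACTIONS, MEANS, SIGMA = (0.3, 0.7), (0.0, 2.0), 0.5
X_MIN, X_MAX = -2.0, 4.0


def gauss(x, mu, sigma):
    """Normalized Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def camel(x):
    """Two-peak camel back density."""
    return sum(f * gauss(x, mu, SIGMA) for f, mu in zip(FRACTIONS, MEANS))


def sample_unweighted(n, rng):
    """Events follow the camel back, all weights equal one."""
    events = []
    for _ in range(n):
        mu = MEANS[0] if rng.random() < FRACTIONS[0] else MEANS[1]
        events.append((rng.gauss(mu, SIGMA), 1.0))
    return events


def sample_weighted(n, rng):
    """Events are uniform in x, the camel back sits in the weights."""
    events = []
    for _ in range(n):
        x = rng.uniform(X_MIN, X_MAX)
        events.append((x, camel(x)))
    return events


def weighted_mean_x(events):
    return sum(x * w for x, w in events) / sum(w for _, w in events)


rng = random.Random(3)
x_u = sample_unweighted(20_000, rng)
x_w = sample_weighted(20_000, rng)
mean_u = weighted_mean_x(x_u)
mean_w = weighted_mean_x(x_w)
print(f"mean of x: unweighted {mean_u:.3f}, uniform with weights {mean_w:.3f}")
```

Both samples encode the same distribution, so weighted observables such as the mean of $x$ agree within statistical fluctuations. What differs is how many events effectively contribute in the peaks and in the tails.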
To benchmark our unweighting GAN, we first sample the true distribution with a large number of events and bin them finely, in our 1-dimensional case $10^{10}$ events in 2000 bins equally distributed over the full range $x = -2 \ldots 4$. Such statistics go far beyond the training sample and are only needed to define a truth benchmark. We then generate an equally large sample of $10^{10}$ GAN events and compare the two high-statistics samples, weighted truth events and unweighted GANned events, in the top panels of Fig. 2. From the bin-wise ratio we see that the GAN reproduces the true distribution at the few per-cent level, again limited by the tails.
Given the true distribution, we can compute event-wise factors which would be needed to shift the GANned unit weights to reproduce the true distribution exactly. We refer to them as truth-correction weights for each (unweighted) GAN event. Because we rely on the binned truth information we assign the same truth correction to all GAN events in a given, narrow bin. Formally, we assume that the generator distribution $P_G$ approximates the true distribution $P_T$, so the bin-wise ratio $P_T(x)/P_G(x)$ for each unweighted event, given its phase space position $x$, should tend to one. The actual values for the truth-correction weights are shown in the bottom panels of Fig. 2. For the full $x$-range we see that they are strongly peaked around unity, but with sizeable tails. The fact that the distribution is not symmetric and includes significant statistical fluctuations suggests that our network could be further improved. Nevertheless, the vast majority of events have a truth-correction below 3%. In the right panel we see the same distribution after removing the tails. Literally all GAN events now come with a truth-correction below 3%. Comparing the upper and lower panels of Fig. 2 we also see that these truth-correction weights are not statistically distributed corrections, fluctuating rapidly as a function of $x$. Instead, they reflect systematic limitations to the precision with which the GAN learns $P_T(x)$ and encodes it into the phase space distribution.
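A binned truth correction of this kind can be sketched as the ratio of two normalized histograms, evaluated in the bin of each generated event. The helper names and the tiny example sample below are hypothetical, and the sketch assumes a uniform 1-dimensional binning.

```python
def bin_index(x, x_min, x_max, n_bins):
    """Bin of x on a uniform grid, or None if x lies outside the range."""
    if not x_min <= x < x_max:
        return None
    index = int((x - x_min) / (x_max - x_min) * n_bins)
    return min(index, n_bins - 1)  # guard against floating-point round-up


def binned_density(xs, weights, x_min, x_max, n_bins):
    """Weighted histogram, normalized to unit sum."""
    hist = [0.0] * n_bins
    for x, w in zip(xs, weights):
        b = bin_index(x, x_min, x_max, n_bins)
        if b is not None:
            hist[b] += w
    total = sum(hist)
    return [h / total for h in hist]


def truth_corrections(x_gen, x_true, w_true, x_min, x_max, n_bins):
    """One correction per generated event: binned truth over binned generator density."""
    p_gen = binned_density(x_gen, [1.0] * len(x_gen), x_min, x_max, n_bins)
    p_true = binned_density(x_true, w_true, x_min, x_max, n_bins)
    corrections = []
    for x in x_gen:
        b = bin_index(x, x_min, x_max, n_bins)
        if b is not None:
            corrections.append(p_true[b] / p_gen[b])
    return corrections


# Tiny hypothetical example with two bins on [0, 1)
x_true = [0.1, 0.2, 0.6, 0.7]
w_true = [1.0, 1.0, 3.0, 3.0]
x_gen = [0.1, 0.3, 0.6, 0.9]
corr = truth_corrections(x_gen, x_true, w_true, 0.0, 1.0, 2)
print(corr)
```

In this example the generator puts half of its events into each bin while the truth carries a quarter and three quarters of the weight, so the corrections are 0.5 and 1.5. A perfectly trained generator would give corrections of one in every populated bin.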
As discussed above, Vegas encodes $P_T(x)$ jointly into the phase space distribution and event weights [52,53]. This means we can compare the GAN and Vegas encodings in the phase space distribution by comparing the truth-correction weights, in the sense that for Vegas they will define the perfectly trained output. After a series of 150 adaptation steps, Vegas reaches the weight distribution shown in Fig. 2, corresponding to an unweighting efficiency of 0.75. Note that after 50 adaptation steps, this Vegas unweighting efficiency was 0.95. The reason is that Vegas is optimized for integration by using tight grids in the bulk and wide grids in the tails. The longer Vegas adapts its grid, the more events are removed from the tails. This improves the numerical integration at the cost of the unweighting efficiency. Indeed, in Fig. 2 we see that the high-weight tails of the Vegas truth-correction are comparable to the GAN case. Again the tails in the event weights correspond directly to the tails of the density distribution over $x$. When it comes to unweighting the Vegas events, these tails become a major problem, because they drive the denominator in Eq.(4).

Two-dimensional Gaussian ring
Knowing a weakness of Vegas we now choose a 2-dimensional circle in the $x$-$y$ plane with a Gaussian radial distribution as our second example,
$$ P(x, y) = N \exp \left[ -\frac{1}{2\sigma^2} \left( \sqrt{(x - x_0)^2 + (y - y_0)^2} - r_0 \right)^2 \right] , $$
with $x_0 = y_0 = 0.5$, $r_0 = 0.25$, and $\sigma = 0.05$. The normalization is then given by $N \approx 5.079$. For the GAN we slightly modify our network architecture and replace the ELU activation function by the ReLU activation function in the generator. Furthermore, we now use 256 units within 8 layers in both the generator and the discriminator.
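The quoted normalization can be cross-checked numerically. The sketch below assumes that the ring is a Gaussian in the radial distance from the circle of radius $r_0$, i.e. proportional to $\exp[-(r - r_0)^2 / (2\sigma^2)]$, and integrates it over the unit square with a simple midpoint rule. The grid size is an arbitrary choice.

```python
import math

X0, Y0, R0, SIGMA = 0.5, 0.5, 0.25, 0.05


def ring_shape(x, y):
    """Unnormalized Gaussian ring (assumed functional form)."""
    r = math.hypot(x - X0, y - Y0)
    return math.exp(-0.5 * ((r - R0) / SIGMA) ** 2)


def normalization(n_grid=400):
    """Inverse of the midpoint-rule integral over the unit square."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        for j in range(n_grid):
            total += ring_shape(x, (j + 0.5) * h)
    return 1.0 / (total * h * h)


norm_numeric = normalization()
# Narrow-ring approximation: integral ~ 2 pi r0 * sigma * sqrt(2 pi)
norm_analytic = 1.0 / (2.0 * math.pi * R0 * SIGMA * math.sqrt(2.0 * math.pi))
print(f"numeric {norm_numeric:.4f}, analytic {norm_analytic:.4f}")
```

Both numbers agree with $N \approx 5.079$. The analytic estimate works because the ring is narrow, $\sigma \ll r_0$, and sits five standard deviations away from both the center and the edges of the unit square.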
In Fig. 3 we show the true distribution as well as the asymmetry of the truth and GANned distributions. As for the 1-dimensional camel back, large relative deviations are limited to the tail of the distribution, in this case including the center of the circle. In the lower-left panel of Fig. 3 we see how these regions contribute little to the integral over the density.
It is clear that the Vegas algorithm cannot reproduce the circular shape, because the ring does not factorize into the individual phase space directions. Instead, Vegas constructs a square with a low unweighting efficiency. Again, we compare the GAN and Vegas truth-correction weights in the lower-right panel of Fig. 3. As expected, the uwGAN now does significantly better, albeit with truth corrections up to ±25% in the tails. Just like for the 1-dimensional example, the tails in the truth-correction correspond directly to the tails in the density, so they reflect the statistical limitations of the training sample. For a realistic application the key question becomes how this kind of truth correction compares to the standard approaches and if it is sufficient given the general statistical limitations in poorly populated phase space regions.
As a side remark, it is of course possible to compute the truth corrections without binning for the 1-dimensional and 2-dimensional toy models. However, for a realistic LHC problem that will in general not be the case, so we stick to the binned definition throughout this paper. We have explicitly tested that our binned distributions agree with the exact truth-correction distributions for the two toy models.

Unweighting Drell-Yan
So far, we have considered two toy examples to motivate our uwGAN. Next, we need to apply it to a simple LHC process, where we can study the phase space patterns in some detail. We consider the Drell-Yan process
$$ pp \to \mu^+ \mu^- . $$
We generate 500k weighted events at a CM energy of 14 TeV. The 4-dimensional fiducial phase space is defined by the minimal acceptance cut
$$ m_{\mu\mu} > 50~\text{GeV} \tag{13} $$
to avoid the photon pole in the numerical event generation. The technical requirement on the weighted training events is that they should cover a wide range of weights, so we can test if the uwGAN can deal with this practical challenge. This means we cannot use a standard Monte Carlo, where sophisticated phase space mappings encode $p_T$ and $m_{\mu\mu}$ very well.
We implement our own custom event generator in Python, extracting the matrix elements from Sherpa [55], the parton densities from LHAPDF [56], and employing the Rambo-on-diet sampling [57,58]. The integration over the parton momentum fractions is symmetrized in terms of $\tau = x_1 x_2$ as the first phase-space variable, with Eq.(13) translating into $\tau_\text{min} = (50~\text{GeV})^2 / (14~\text{TeV})^2 \approx 1.28 \times 10^{-5}$. Mapping the phase space onto a unit hyper-cube defines two random numbers $r_{1,2}$. With an additional random number $r_3 = (\cos\theta + 1)/2$ we can parametrize the 4-dimensional phase space.

In Fig. 4 we show the weight distribution for our event generator, where the shown 500k event weights are computed as the product of scattering amplitude, parton density, and phase-space factor. While the distribution is very smooth, indicating that the phase space is sampled precisely, the range of weights poses a problem for an efficient event unweighting. Even if we are willing to ignore more than 0.1% of the generated events, we still need to deal with event weights from $10^{-30}$ to $10^{-4}$. Effects contributing to this vast range are the $Z$ peak, the strongly dropping $p_T$-distributions, and our deliberately poor phase space mapping. The classic unweighting efficiency defined by Eq.(4) is 0.22%, which is considered high for state-of-the-art tools applied to complex LHC processes.

In the following panels of Fig. 5 we show a set of kinematic distributions, first for the 500k weighted training events including the deviation from a high-precision truth sample. Indeed, this training data set describes $E_\mu$ all the way to 6 TeV and $m_{\mu\mu}$ beyond 250 GeV with deviations below 5%. The perfectly flat $\phi_\mu$ distribution turns out to be the challenge in our specific phase space parametrization, with bin-wise deviations of up to 20% from the true distribution.
In addition to the weighted training data, we also show the kinematic distributions for unweighted events from a standard algorithm. We use the hit-or-miss method described in Sec. 2.1 without any further improvements, which limits the number of unweighted events to about 1000. Correspondingly, the standard unweighted events only cover $E_\mu$ to 1 TeV and $m_{\mu\mu}$ to 110 GeV. For $\phi_\mu$ the deviations also exceed those of the training data significantly. This poor behavior is simply an effect of the low unweighting efficiency and a serious challenge for LHC precision simulations.
Alternatively, we can employ our uwGAN to unweight the Drell-Yan training data. To take into account symmetries, we only generate the degrees of freedom of the process. By construction, this guarantees momentum conservation and on-shell conditions. Before passing them to the discriminator, both the generated batches $\{x_G\}$ and the truth batches $\{x_T\}$ are brought into a common parametrization which includes the associated event weight $w$. In order to reproduce the sharp resonance appearing in the $m_{\mu\mu}$ distribution, which originates from the $Z$ boson propagator, we employ an additional MMD loss [24]. For this we generalize the standard MMD loss [59] into a weighted version to accommodate the event weights appearing in the training data. With a kernel $k$, the weighted MMD takes the form
$$ \text{MMD} = \left[ \frac{\langle w(x) w(x') k(x, x') \rangle_{x, x' \sim Q_T}}{\langle w(x) w(x') \rangle_{x, x' \sim Q_T}} + \langle k(y, y') \rangle_{y, y' \sim P_G} - 2 \, \frac{\langle w(x) k(x, y) \rangle_{x \sim Q_T, y \sim P_G}}{\langle w(x) \rangle_{x \sim Q_T}} \right]^{1/2} , $$
where we already use that $w_G(y) = 1$. Note that we use MMD instead of MMD$^2$ as this increases the sensitivity of the loss close to zero. This loss is then added to the generator objective.

In the right panel of Fig. 4 we again show the truth-correction weights for our uwGAN events, evaluated on the binned phase space either in terms of the unit hyper-cube ($r_j = 0 \ldots 1$) or the appropriately cut phase space of Eq.(17). The number of bins ignores empty bins and shows the limitations of our bin-wise extraction of the truth correction. While some of the truth corrections are not negligible, we also know that they appear in the tails of the generated phase space distribution and can easily be traced. Even if we consider the finite and bin-wise-defined truth correction with a grain of numerical salt, we find the performance of our relatively slim network quite convincing, given that we start from weighted events with more than 25 orders of magnitude in weights. Most importantly, the tails of the truth correction are a result of the uwGAN unweighting, not a limiting factor like for the standard unweighting procedure.

The appropriate measure of success for our uwGAN is the set of predicted kinematic distributions. In Fig. 5 we compare the weighted training data, a corresponding unweighted event sample using the standard algorithm, and the uwGAN results. In the lower panels we show the relative differences to the truth, defined as a high-statistics version of the training sample. While the training data agrees with the truth very well, we see its statistical limitations in the tail of the $E_\mu$-distribution. In addition, the $\phi_\mu$ distribution for the weighted training data is noisier than one would expect for a smooth phase space.
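The weighted MMD term used to sharpen the $m_{\mu\mu}$ resonance can be sketched for a single observable as follows. This is a simplified illustration: it assumes a Gaussian kernel with a hand-picked width and the biased estimator that includes the diagonal terms, neither of which is specified in the text. The function names and the toy mass values are hypothetical.

```python
import math


def gaussian_kernel(a, b, width=5.0):
    """Gaussian kernel (assumption: the kernel choice is not fixed by the text)."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def weighted_mmd(x_true, w_true, y_gen, kernel=gaussian_kernel):
    """Square root of the weighted MMD^2 between weighted and unit-weight events."""
    sum_w = sum(w_true)
    n_gen = len(y_gen)
    # truth-truth term, weighted mean over all pairs
    tt = sum(wa * wb * kernel(a, b)
             for a, wa in zip(x_true, w_true)
             for b, wb in zip(x_true, w_true)) / sum_w ** 2
    # generator-generator term, unit weights
    gg = sum(kernel(a, b) for a in y_gen for b in y_gen) / n_gen ** 2
    # mixed term, weighted on the truth side only
    tg = sum(wa * kernel(a, b)
             for a, wa in zip(x_true, w_true)
             for b in y_gen) / (sum_w * n_gen)
    # clip tiny negative values from rounding before taking the root
    return math.sqrt(max(tt + gg - 2.0 * tg, 0.0))


# Hypothetical invariant masses in GeV: weighted truth batch and generated batch
m_true = [88.0, 91.0, 92.0, 120.0]
w_true = [1.0, 4.0, 3.0, 0.1]
m_gen = [89.0, 91.5, 93.0, 100.0]

print(f"weighted MMD: {weighted_mmd(m_true, w_true, m_gen):.4f}")
```

A weighted event with weight two contributes exactly like two identical unit-weight events, which is the property that lets the generator, producing unit weights only, match a weighted training batch.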
The uwGANned events also reproduce the truth information well. As always, the GAN learns the phase space information only to the point where it lacks training statistics and the GAN undershoots the true distribution [24]. This limitation can be quantitatively improved by using different network architectures [48]. In our case it affects the phase space coverage for $E_\mu \gtrsim 4.5$ TeV and $m_{\mu\mu} \gtrsim 250$ GeV. These values are not on par with the training data, but much better than for standard unweighting. In terms of rate and required event numbers there is a factor 100 between standard unweighting and the uwGAN method. In addition, the $\phi_\mu$ distribution shows that the neural network interpolation actually smooths out the noisy training data.

Outlook
First-principle precision simulations are a defining aspect of LHC physics and one of the main challenges in preparing for the upcoming LHC runs. Given the expected experimental uncertainties, we need to significantly improve both the precision and the speed of the theory-driven event generation to avoid theory becoming the limiting factor for the majority of LHC analyses. One promising avenue is modern machine learning concepts applied to LHC event generation.
In this study we proposed a significant improvement to one of the numerical bottlenecks in LHC event generation, the unweighting procedure. Such an unweighting step is part of every event generator, and for complex final states it rapidly becomes a limiting factor. We showed how to train a generative network on weighted events, with a loss function designed to generate events of unit weights, or unweighted events.
For a 1-dimensional and a 2-dimensional toy model we have shown that our uwGAN can indeed be used for event unweighting and that in the limit of perfect training it reproduces the true phase space distributions just like standard methods such as Vegas. While we cannot beat the Vegas performance for a 1-dimensional test case, our uwGAN easily circumvents the Vegas limitations from the assumed dimensional factorization.
As an LHC benchmark we use $\mu^+ \mu^-$ production and a poor in-house event generator with a low unweighting efficiency over phase space. The uwGAN performs significantly better than the standard unweighting procedure, both in kinematic tails and for noisy training data. While it is not clear how large the speed gain from using an NN-unweighting in standard event generators will be, this application of generative networks could be easily implemented in the established LHC event generation chain. While we were finalizing this study, similarly promising ideas were presented in Ref. [60], showing how generative networks benefit from training on weighted events.

Support by the IMPRS-PTFS and by HeiKA is acknowledged. The research of AB and TP is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant 396021762 - TRR 257 Particle Physics Phenomenology after the Higgs Discovery.