Variance Reduction via Simultaneous Importance Sampling and Control Variates Techniques Using Vegas

Monte Carlo (MC) integration is an important computational technique in the physical sciences. Practical considerations require that the calculations be performed as accurately as possible for a given set of computational resources. To improve the accuracy of MC integration, a number of useful variance reduction algorithms have been developed, including importance sampling and control variates. In this work, we demonstrate how these two methods can be applied simultaneously, thus combining their benefits. We provide a python wrapper, named CoVVVR, which implements our approach using the Vegas program. The improvements are quantified with several benchmark examples from the literature.


Introduction
In many fields of science, including particle physics, one has to compute the value of an integral over some domain Ω with volume V, where x ∈ ℝ^d is a d-dimensional vector of independent variables. The function f(x) can be evaluated for a given x, but can be arbitrarily complicated. In high energy physics, integrations like (1) are ubiquitous, arising when computing total cross-sections (or differential cross-sections with respect to low-dimensional event variables), particle lifetimes, convolutions with transfer functions describing detector effects, etc. Although widespread, this problem is fundamentally challenging: with the exception of a few trivial cases (typically found in textbooks), the integral cannot be performed analytically, and one has to resort to numerical methods for its evaluation. Monte Carlo (MC) methods are particularly suited for high-dimensional integrals, since their accuracy scales as √(Var[f]/N_trials), where Var[f] is the variance of f(x) over the domain Ω and N_trials is the number of trials [1]. One obvious way to improve the accuracy is to increase N_trials, but this approach soon hits a wall due to resource limitations. Therefore, much attention has been placed on designing variance-reducing techniques. Some of the classic variance reduction methods include importance sampling (IS), stratified sampling, antithetic variables, and control variates (CV) [2]. More recent techniques use quasirandom or low-discrepancy point sets in a class of methods known as Quasi Monte Carlo [3,4], apply multigrid ideas in Multilevel Monte Carlo estimation [5], or leverage machine learning (ML) [6-21]. A parallel research thrust has been the synthesis of ideas, whereby one tries to apply two such techniques at once, for example by using several control variates [22], combining antithetic variates and control variates [23,24], or combining control variates and adaptive importance sampling [25,26].
In MC methods, random variates are used to sample the function of interest over a number of trials. In importance sampling, one samples points from a distribution p(x) that is, ideally, close to being proportional to |f(x)|. Each sampled point x is weighted by the ratio f(x)/p(x), and the integral is estimated as the expected value of the event weights. While it may be possible to find good sampling distributions analytically, this is typically very difficult in practice. Instead, one uses techniques like VEGAS [27,28], Foam [29], or modern neural-network-based methods [13-21,30-33] to learn good sampling distributions; these adapt the sampling distribution to better mirror f by computing the value of f at various trial points. In particular, the VEGAS algorithm adapts the sampling distribution by dividing the domain of integration into subregions of equal importance, so that the variance of the MC estimate is reduced. A related method, additionally implemented in VEGAS+ [34], is stratified sampling, whereby one uses stratification to homogenize the resulting weights within each subregion.
The control variates method, instead, aims to reduce the variance by modifying the integrand. This is accomplished by adding and subtracting a function (the control variate) that is highly correlated with the integrand and whose integral is known. The integral of the control variate converges to its known value, while the variance of the control variate partially cancels the variance of the original integrand. The challenge is to find a control variate that is indeed highly correlated (or anti-correlated) with the integrand f. Usually, this can be achieved only in special, low-dimensional cases. Control variates have been used in various domain-inspired applications in [35-41].
The thrust of this paper is to develop a general strategy for combining the well-established importance sampling technique with control variates. We note that the VEGAS method already computes, in a general way, a function that is highly correlated with the integrand. We illustrate how this function (or its evolution as it converges to an optimal value) can serve as a control variate to improve the accuracy for a given computational budget, or equivalently, to reduce the computation time for a given target accuracy. The paper is accompanied by a python code, COVVVR, short for CONTROL VARIATES AND VEGAS VARIANCE REDUCTION. The source code is publicly available at https://github.com/crumpstrr33/covvvr, together with an introductory tutorial for its usage.
This paper is organized as follows. Section 2 lays the theoretical groundwork of our approach. Sections 2.1, 2.2 and 2.3 briefly review the basic ideas of Monte Carlo integration, importance sampling, and control variates, respectively. Section 2.4 introduces the joint application of importance sampling and control variates, while Section 2.5 explains how one can leverage the successive VEGAS approximations as control variates. Section 3 presents the results from our numerical experiments quantifying the achieved precision improvement on some known benchmark examples. Section 4 contains a description and usage examples of the COVVVR code. Section 5 is reserved for a summary and conclusions.

Control variates and VEGAS
In this section, we describe the main idea of our method using the example of a one-dimensional integration of a real function f(x) from a to b, where for definiteness we take a < b. In special cases, a can be −∞ and/or b can be +∞.

Naive Monte Carlo integration
In the naive Monte Carlo approach, x is sampled uniformly over the domain Ω, i.e., x ∼ U(Ω), obtaining N independent and uniformly distributed samples x_1, x_2, . . ., x_N. An estimate I_N of the integral (3) is obtained in terms of the expectation value E_{x∼U(Ω)}[f] of the function f(x) over the domain Ω:

I_N = V E_{x∼U(Ω)}[f] ≈ (V/N) Σ_{i=1}^N f(x_i),

where V is the volume of Ω and the x_i ∼ U(Ω). In the case of a one-dimensional integral as in (3), V = b − a. The uncertainty on I_N can be estimated in terms of the sample variance

Var[f] = (1/(N−1)) Σ_{i=1}^N ( f(x_i) − E[f] )²,

so that the error on I_N, proportional to √(Var[f]/N), decreases as N^{−1/2}. However, increasing N arbitrarily is infeasible, as it runs into resource limitations. This motivates methods which attempt to reduce the variance Var[f] of the integrand function f(x). Two such methods are discussed in the next two subsections.
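The naive estimator above can be sketched in a few lines of standalone Python (an illustrative sketch, not part of the COVVVR package):

```python
import math
import random

def naive_mc(f, a, b, n, seed=0):
    """Naive Monte Carlo estimate of the integral of f over (a, b), with error."""
    rng = random.Random(seed)
    vals = [f(a + (b - a) * rng.random()) for _ in range(n)]  # x ~ U(a, b)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)        # sample variance
    volume = b - a
    estimate = volume * mean
    stderr = volume * math.sqrt(var / n)                      # decreases as N^(-1/2)
    return estimate, stderr

# Example: integral of x^2 over (0, 1) is 1/3
est, err = naive_mc(lambda x: x * x, 0.0, 1.0, 100_000)
```

Quadrupling the number of samples only halves the error, which is the resource wall mentioned above.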

Importance sampling
In importance sampling, we choose a sampling function for the random variates that resembles f(x) as closely as possible, and whose integral over the range (a, b) is known. By rescaling with the value of this integral, the new sampling function can be turned into a unit-normalized probability distribution function (PDF), which we shall denote by p(x). The integral of interest (3) can then be rewritten as

I = ∫_a^b [f(x)/p(x)] p(x) dx = E_{x∼p}[f/p],

which can in turn be estimated with Monte Carlo after N trials as

I_N = (1/N) Σ_{i=1}^N f(x_i)/p(x_i), where x_i ∼ p(x).

In analogy to (5), the error on the estimate (8) is given by √(Var[f/p]/N). This is minimized when p(x) ∝ |f(x)|. In other words, whenever the two functions f(x) and p(x) have similar shapes, the variance is reduced and the precision of the estimate is improved. In fact, if we could choose p(x) to be exactly proportional to f(x) everywhere, then I_N would be exactly equal to I for any N. Thus, importance sampling is most beneficial when p(x) mimics f(x) as closely as possible.
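A minimal sketch of this estimator (illustrative code, not the paper's implementation), for f(x) = 3x² on (0, 1) sampled from the PDF p(x) = 2x via its inverse CDF:

```python
import math
import random

def is_mc(f, sample_p, pdf_p, n, seed=0):
    """Importance-sampling estimate of the integral of f, drawing x from PDF p."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        x = sample_p(rng)
        weights.append(f(x) / pdf_p(x))        # event weight f(x)/p(x)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean, math.sqrt(var / n)            # estimate and its error

# f(x) = 3x^2 on (0, 1), integral = 1; p(x) = 2x has inverse CDF sqrt(u)
f = lambda x: 3.0 * x * x
est, err = is_mc(f, lambda rng: math.sqrt(rng.random()), lambda x: 2.0 * x, 50_000)
```

Here the weight f/p = 1.5x has a much smaller variance than f itself, so the same number of trials yields a more precise estimate.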

Control variates
In contrast to the importance sampling method, which involves a rescaling of the integrand, the control variates method adds a correction term to f(x). The modified integrand can be taken as

f̃(x) = f(x) − c ( g(x) − E[g] ),

where g(x) is the control variate, E[g] is its known expectation value, and c is a parameter which at this point we are free to choose. It is easy to see that the modification (11) does not change the value of the integral, i.e., E[f̃] = E[f].
We can now leverage the freedom in the choice of the parameter c to minimize the variance

Var[f̃] = Var[f] − 2c Cov(f, g) + c² Var[g].

Requiring dVar[f̃]/dc = 0 gives the optimal value of c as

c* = Cov(f, g)/Var[g].

The resulting variance is

Var[f̃] = Var[f] ( 1 − ρ(f, g)² ),

where ρ(f, g) = Cov(f, g)/√(Var[f] Var[g]) is the familiar Pearson correlation coefficient. Note that if |ρ(f, g)| > 0, the variance is necessarily reduced. Furthermore, the higher the correlation between g(x) and f(x), the larger the benefit. Therefore, just like in the method of importance sampling, we desire a function g(x) that i) is highly correlated with f(x) and ii) has a known expectation value E[g]. The method can be easily generalized to the case of multiple control variates; Appendix A contains the derivation of the optimal values c*_i of the respective coefficients c_i in that case.
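The single-CV estimator with the optimal coefficient c* can be sketched as follows (an illustrative example with g(x) = x, whose mean over U(0, 1) is known to be 1/2):

```python
import math
import random

def cv_mc(f, g, g_mean, n, seed=0):
    """Control-variate estimate of E[f] over U(0,1), using g with known mean."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    fv = [f(x) for x in xs]
    gv = [g(x) for x in xs]
    fbar = sum(fv) / n
    gbar = sum(gv) / n
    cov = sum((a - fbar) * (b - gbar) for a, b in zip(fv, gv)) / (n - 1)
    var_g = sum((b - gbar) ** 2 for b in gv) / (n - 1)
    c_star = cov / var_g                       # optimal c* = Cov(f,g)/Var[g]
    est = fbar - c_star * (gbar - g_mean)      # modified estimator
    return est, c_star

# f(x) = exp(x): E[f] = e - 1; g(x) = x is highly correlated with f
est, c = cv_mc(math.exp, lambda x: x, 0.5, 20_000)
```

Since ρ(eˣ, x) is close to 1 on (0, 1), the residual variance (1 − ρ²)Var[f] is a small fraction of the original.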

Combining importance sampling and control variates
The two methods discussed in the previous two subsections 2.2 and 2.3 (importance sampling and control variates) can be combined as follows. Given a known PDF p(x) and a control variate function g(x), we modify the integrand as

f̃(x) = f(x)/p(x) − c ( g(x)/p(x) − E_p[g/p] ).

The corresponding Monte Carlo estimate is

I_N = (1/N) Σ_{i=1}^N [ f(x_i)/p(x_i) − c ( g(x_i)/p(x_i) − E_p[g/p] ) ], with x_i ∼ p(x).

We still need a p(x) that is approximately proportional to f(x) and a g(x) whose integral is known, such that f/p is correlated with g/p. In this case, the optimal value of c is given by

c* = Cov(f/p, g/p)/Var[g/p].
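A sketch of the combined estimator (illustrative code; the function choices are ours, not the paper's): for f(x) = 3x², p(x) = 2x and g(x) = x² with ∫g = 1/3, the ratio g/p is perfectly correlated with f/p, so the variance cancels entirely and the estimate becomes exact.

```python
import math
import random

def is_cv_mc(f, g, g_int, sample_p, pdf_p, n, seed=0):
    """Combined importance-sampling + control-variate estimate of the integral of f.

    Draws x ~ p, forms the weights f/p and g/p, and subtracts the optimal
    multiple of (g/p - integral of g), as in the combined estimator above.
    """
    rng = random.Random(seed)
    xs = [sample_p(rng) for _ in range(n)]
    fw = [f(x) / pdf_p(x) for x in xs]
    gw = [g(x) / pdf_p(x) for x in xs]
    fbar = sum(fw) / n
    gbar = sum(gw) / n
    cov = sum((a - fbar) * (b - gbar) for a, b in zip(fw, gw)) / (n - 1)
    var = sum((b - gbar) ** 2 for b in gw) / (n - 1)
    c_star = cov / var                         # optimal c* = Cov(f/p, g/p)/Var[g/p]
    return fbar - c_star * (gbar - g_int)

# f/p = 1.5x and g/p = 0.5x are perfectly correlated, so the estimate is exact
est = is_cv_mc(lambda x: 3.0 * x * x, lambda x: x * x, 1.0 / 3.0,
               lambda rng: math.sqrt(rng.random()), lambda x: 2.0 * x, 10_000)
```

This limiting case makes the mechanism transparent: the closer g/p tracks f/p, the more of the importance-sampling residual variance the control variate removes.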

Repurposing VEGAS intermediate outputs as control variates
VEGAS is an adaptive, iterative Monte Carlo algorithm in which, at each iteration i, a unit-normalized probability distribution p_i(x) is updated to serve as the distribution for importance sampling as described in Section 2.2 [27]. Since VEGAS stores and reports these sampling distributions for each iteration, they can be usefully repurposed as control variates. We can then apply the result in (20), choosing the IS function as p(x) = p_n(x) from the final iteration n and the control variate function g(x) = p_i(x) from some previous iteration i < n. Note that since p_i(x) is unit-normalized, the last term in (20) becomes E_p[g/p] = ∫ p_i(x) dx = 1. In this way, VEGAS can be used to simultaneously provide both the sampling distribution p(x) and the control variate g(x). In principle, different prescriptions for selecting the previous iteration i to be used in (22) can be designed. Choosing among those prescriptions can be done on a case-by-case basis, as discussed below in Section 3.
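A minimal toy version of this idea, with two hand-made densities standing in for actual VEGAS grids (an illustrative sketch under that simplification, not the COVVVR implementation): we sample from a "final" adapted PDF p_final and reuse an "earlier-iteration" PDF p_early as the control variate. Both are unit-normalized on (0, 1), so the control variate's expectation E_p[p_early/p_final] = ∫ p_early dx = 1 is known exactly.

```python
import math
import random

f = lambda x: 3.0 * x * x                      # integrand; integral over (0,1) is 1
p_final = lambda x: 0.5 + x                    # unit-normalized "final" sampling PDF
sample_final = lambda u: -0.5 + math.sqrt(0.25 + 2.0 * u)  # its inverse CDF
p_early = lambda x: 1.0                        # cruder "earlier-iteration" PDF

rng = random.Random(0)
n = 50_000
xs = [sample_final(rng.random()) for _ in range(n)]
fw = [f(x) / p_final(x) for x in xs]           # importance-sampling weights
gw = [p_early(x) / p_final(x) for x in xs]     # control-variate weights
fbar, gbar = sum(fw) / n, sum(gw) / n
cov = sum((a - fbar) * (b - gbar) for a, b in zip(fw, gw)) / (n - 1)
var_g = sum((b - gbar) ** 2 for b in gw) / (n - 1)
c_star = cov / var_g
est = fbar - c_star * (gbar - 1.0)             # the CV term has known mean 1
```

Here f/p_final and p_early/p_final are anticorrelated (c* comes out negative), which reduces the variance just as well as a positive correlation would.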

Test cases
For comparison with other studies, we compute values for the benchmark functions presented in Ref. [13]. The domain of integration for each variable is the range from 0 to 1, and the corresponding true values are given in the second column of Table III in Ref. [13]. We confirmed those values, as shown in Table 1, and in what follows focus on the accuracy of the MC estimate.
For clarity, the definitions of the functions used as benchmarks are reproduced below.

d-dimensional Gaussians
The d-dimensional Gaussians are defined as a product of d one-dimensional Gaussians,

f(x) = (2πσ²)^{−d/2} exp( −Σ_{i=1}^d (x_i − µ)²/(2σ²) ),

with mean µ = 0.5 and standard deviation σ = 0.2. They serve as a good starting point and display the effectiveness of control variates for separable functions. We consider d = 2, d = 4, d = 8 and d = 16.
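Because the integrand is separable, its integral over the unit hypercube factorizes into d identical one-dimensional pieces, each equal to erf(0.5/(σ√2)) by symmetry around µ = 0.5. A quick sanity check (illustrative code, not the benchmark harness of Ref. [13]):

```python
import math
import random

MU, SIGMA = 0.5, 0.2

def gauss_d(x):
    """Product of len(x) one-dimensional Gaussians with mean MU and width SIGMA."""
    norm = (2.0 * math.pi * SIGMA ** 2) ** (-0.5 * len(x))
    return norm * math.exp(-sum((xi - MU) ** 2 for xi in x) / (2.0 * SIGMA ** 2))

# Each 1d factor integrates to erf(0.5 / (SIGMA * sqrt(2))) over (0, 1)
exact_1d = math.erf(0.5 / (SIGMA * math.sqrt(2.0)))

rng = random.Random(0)
d, n = 2, 200_000
est = sum(gauss_d([rng.random() for _ in range(d)]) for _ in range(n)) / n
```

For d = 2 the naive estimate should agree with exact_1d² to a few parts per thousand at this sample size.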

Annulus with cuts
The fourth function is an annulus defined by cuts at r_min and r_max:

f(x_1, x_2) = 1 if r_min < r < r_max, and 0 otherwise,

where r is the radial distance in the (x_1, x_2) plane. The cut parameters are chosen to be r_min = 0.2 and r_max = 0.45.
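A quick sketch of this indicator function (illustrative; the source does not state the annulus center, so we assume (0.5, 0.5) here, for which the ring lies entirely inside the unit square and the integral is simply the annulus area π(r_max² − r_min²)):

```python
import math
import random

R_MIN, R_MAX = 0.2, 0.45

def annulus(x1, x2, cx=0.5, cy=0.5):
    """Indicator of r_min < r < r_max; the center (0.5, 0.5) is our assumption."""
    r = math.hypot(x1 - cx, x2 - cy)
    return 1.0 if R_MIN < r < R_MAX else 0.0

# With this center the ring fits inside [0,1]^2, so the integral is the ring area
exact = math.pi * (R_MAX ** 2 - R_MIN ** 2)

rng = random.Random(0)
n = 200_000
est = sum(annulus(rng.random(), rng.random()) for _ in range(n)) / n
```

Discontinuous integrands like this one are a stress test for importance sampling, since no smooth p(x) can match the sharp edges of the ring.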

One-loop scalar box integral
The fifth function represents a one-loop scalar box integral encountered in the calculation of gg → gh, an important contribution to Higgs production via gluon fusion. The integrand function f_5(x_1, x_2, x_3) in this case is a sum of four terms of the type 1/F²_Box, where F_Box is the box function defined in Ref. [13]. We test with the same parameters as in [13], including m_i = 173.9 for i = 1−4.

Polynomial functions
The final set of integrand functions are quadratic polynomials of d variables, as defined in Ref. [13]. We consider d = 18, d = 54 and d = 96.

Numerical Comparisons
Results from integrating the six types of test functions from Section 3.1 are shown in Table 1.
The first two columns list the name of the function and the dimension of the integration. The third column shows the expected true answer, while the next two columns give the MC estimates from VEGAS and from COVVVR, respectively. For the latter, we use the one control variate which gives the maximum variance reduction, as explained below, and choose the optimal value c* according to (21). In the last two columns we show the normalized RMS error defined in (31). We see that, as expected, the accuracy using a CV is comparable or improved.
As implied by eq. (19), the variance reduction results from the presence of a correlation between the function ratios f(x)/p(x) and g(x)/p(x). This correlation is illustrated in Figure 1 for the 16-dimensional Gaussian (left) and the 96-dimensional polynomial (right). The iteration used for the CV was chosen automatically. The correlation is readily visible by eye, and the correlation coefficient is ρ ≈ 0.42 (left) and ρ ≈ 0.75 (right).
The effect of adding more than one CV is illustrated in Table 2 and Figure 2. We show results with one (1 CV), two (2 CVs), or all 49 intermediate approximations (All CVs) from VEGAS as control variates. The results are quoted in terms of the variance reduction percentage (VRP) and the corresponding computational cost (in seconds, as well as relative to the VEGAS benchmark time). In each case, we select the CV or CVs that reduce the variance the most. Table 2 confirms that, by construction, the use of CVs always improves the accuracy of the estimate. The size of the VRP effect depends on the type of function at hand and can vary from ∼1% to as much as 50% for one CV and 60% for two CVs. The associated computational cost is an increase of about 1.5−2.5 times for one CV and 2−4 times for two CVs. Note that an increase in accuracy could also be achieved by increasing the number of events used in a standard VEGAS calculation; we discuss the benefits of the CV method later.

Table 1: Results for the test case integrals described in Section 3.1 with n = 50 iterations, N_events = 5,000 events per iteration, averaged over N_runs = 1,000 runs. The results displayed in the CVInt column were obtained with the one control variate that gives maximum variance reduction. The normalized RMS error is given by (31).

The dependence of the variance reduction on the choice of CVs is illustrated in Figure 2. The heat map in each panel depicts the variance reduction due to 1 CV (along the diagonal) and 2 CVs (all combinations). We show results averaged over 10 runs for a 16d Gaussian (left panel) and the scalar top loop (right panel). In each run, we perform 50 iterations with at most 25,000 evaluations per iteration. Along each axis, each panel contains a plot of the normalized variance (red line) and the correlation coefficient between the target function f(x) and the respective CV (purple line).
Figure 2 shows that the optimal iteration from which to choose a CV can vary significantly: in the case of the 16d Gaussian it is around iteration 30, while for the scalar top loop integral it is at the very beginning. When we have the freedom to choose two CVs, some interesting patterns appear in the heat maps; for the Gaussian, the optimal choice is the pair of iterations around 25 and 40.
Since the optimal choice of iteration is a priori unknown, Table 3 shows the corresponding results when the iteration is decided and fixed at the very beginning. In this case, we pick the CV from the iteration which is 1/4 of the way through the run, i.e., i = ⌊n/4⌋. The number of events N_events and the total number of iterations n are chosen so as to reach the precision required in Ref. [13], namely, a relative uncertainty of 10⁻⁴ for the first 11 cases and 10⁻⁵ for the last three. We see that even when the CV is fixed rather arbitrarily, the variance reduction is still significant and can reach ∼50%, as in the case of the polynomial functions.
Table 2: Results for the variance reduction percentage (VRP) after using one, two, or all 49 intermediate approximations from VEGAS as control variates. In each case, we ran n = 50 iterations with at most N_events = 5,000 events per iteration, averaged over N_runs = 100 independent runs. The time column shows the corresponding computational cost (in seconds and relative to the VEGAS benchmark time).

COVVVR: Control variates & Vegas variance reduction
The paper is accompanied by a python code, COVVVR, short for CONTROL VARIATES & VEGAS VARIANCE REDUCTION, which can be used to reproduce our results. The source code is publicly available at https://github.com/crumpstrr33/covvvr. In this section, we provide an introductory tutorial.
Installation: COVVVR can be installed via pip:

$ python -m pip install covvvr
Usage: The workflow involves creating a class that inherits from the Function class and passing it to the CVIntegrator. The Function class contains the function to be integrated, along with other information such as its name, the true value of the integral (if available), and the parameters of the function. This class can be one of the built-in functions, such as those used in this paper, or a custom-made function. The CVIntegrator class performs the integration and stores the results, such as the mean and variance.
To avoid dealing with classes directly, one can use the classic_integrate function, which performs the steps above and returns the CVIntegrator object. Note that in the classic_integrate function, the bounds argument is not optional, as it is needed to define the dimension of the integral. (In contrast, the bounds argument is optional for the CVIntegrator class, since CVIntegrator determines the dimension of the integral from the Function class being passed, and then sets the limits of integration from 0 to 1 by default.)

Specifying control variate iterations:
There are multiple valid arguments that can be passed to cv_iters, for both the CVIntegrator class and classic_integrate, which we list here.
• For using one control variate, pass an integer representing the iteration to use.
• For multiple control variates, pass a list of integers representing the iterations to use.
• The string 'all' will use every iteration.
• The string 'auto1' will estimate the best single CV to use by sampling each possibility with a number of samples specified by the user via auto1_neval.
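The idea behind the 'auto1' option can be sketched generically (an illustration of the selection criterion, not covvvr's own code): from a small pilot sample, pick the candidate CV with the largest squared correlation with f, since by eq. (19) the variance reduction grows with ρ².

```python
import math
import random

def best_single_cv(f_vals, cv_candidates):
    """Pick the candidate control variate maximizing the squared correlation
    with f, estimated from a pilot sample (the VRP is proportional to rho^2)."""
    n = len(f_vals)
    fbar = sum(f_vals) / n
    var_f = sum((v - fbar) ** 2 for v in f_vals) / (n - 1)
    best_i, best_rho2 = None, -1.0
    for i, gv in enumerate(cv_candidates):
        gbar = sum(gv) / n
        var_g = sum((v - gbar) ** 2 for v in gv) / (n - 1)
        cov = sum((a - fbar) * (b - gbar) for a, b in zip(f_vals, gv)) / (n - 1)
        rho2 = cov * cov / (var_f * var_g)
        if rho2 > best_rho2:
            best_i, best_rho2 = i, rho2
    return best_i, best_rho2

rng = random.Random(1)
xs = [rng.random() for _ in range(2_000)]
f_vals = [x * x for x in xs]
candidates = [[math.sin(2.0 * math.pi * x) for x in xs],  # weakly correlated
              [x for x in xs]]                            # strongly correlated
best_i, best_rho2 = best_single_cv(f_vals, candidates)
```

On this toy example the linear candidate wins, since corr(x, x²) ≈ 0.97 on the unit interval.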

Manual use of function class:
To call the integrand manually, use function or f. These wrap the private, vectorized _function/_f used by VEGAS.

Summary and Outlook
The MC method is widely used in many disciplines, and the issue of variance reduction in MC estimates is as old as the method itself. A number of techniques have been developed to reduce the variance of MC estimates relative to brute-force sampling, including importance sampling and control variates. In this paper, we combined these two methods by leveraging the ability of the importance sampling method implemented in VEGAS to automatically provide viable control variate candidates. We demonstrated and quantified the resulting variance reduction using several benchmark functions considered previously in a similar context [13]. Our new approach was able to reduce the variance by O(1−50)% using 1 CV and O(1−60)% using 2 CVs. The reduced variance comes at the cost of some computational time. In our experiments, the main source of additional computational overhead is the evaluation of the control variates for all the data points, using the intermediate VEGAS distributions. The inherent theoretical advantages of our scheme should always be weighed against the possibility of improving the precision by simply increasing the number of data points. The recommended use case of our technique is in situations where the latter approach is comparatively less effective, which will be the case if computing the integrand f is sufficiently computationally expensive. Potential examples of such situations in particle physics include inference involving slow simulation of the experimental apparatus, or the presence of additional convolutions in the definition of the integrand f(x) itself, e.g., due to transfer functions. Note that the technique proposed here is meant to improve the estimate of the integral of f(x) and is not intended for improving event generation: the event weights are modified in such a way that the resulting differential distributions differ from f(x), and the contributions of the control variates only cancel out after integrating over the full phase space.
There are several possible avenues for refining the technique proposed in this paper. The VEGAS algorithm has a few parameters which control how the sampling distributions are iteratively adapted. The effect of these parameters on the performance of the VEGAS-based control variates can be studied in order to construct good controls. Furthermore, the VEGAS grid adaptation algorithm could potentially be modified for the specific purpose of providing good control variates. There are also other possible avenues for reducing the variance of an MC estimate which leverage deep learning [42,43] or specific domain knowledge about the integrand [44-51]. Ideally, one would also like to better understand for what types of integrand functions f(x) our technique is more likely to produce a significant improvement (see Table 2). These ideas are left for future work.

Figure 1: Correlation between the function f/p and the control variate g/p, shown for the 16d Gaussian (left) and the 96d polynomial (right), run for n = 100 iterations with at most N_events = 10,000 evaluations per iteration. The iteration used for the control variate (CV) was chosen automatically using the parameter cv_iters="auto1" described in Section 4.

Figure 2: The variance reduction due to 1 control variate (along the diagonal) and 2 control variates (all combinations). We show results for n = 50 iterations with at most N_events = 5,000 evaluations per iteration, averaged over N_runs = 100 runs, for a 16d Gaussian (left panel) and the scalar top loop (right panel). The solid red line shows the normalized variance, i.e., the variance when using the i-th control variate iteration divided by the variance obtained without any control variates.

Table 3: Comparison between native VEGAS and the addition of a single control variate (CV). The number of events and iterations are chosen so as to reach the precision required in Ref. [13], namely, a relative uncertainty of 10⁻⁴ for the first 11 cases and 10⁻⁵ for the last three. There are at most N_events = 5,000 events per iteration. The iteration chosen for the CV is set at a quarter of the total number of iterations. VRP is the variance reduction percentage, i.e., the percentage reduction in variance between VEGAS and CVInt.