Targeting Multi-Loop Integrals with Neural Networks

Numerical evaluations of Feynman integrals often proceed via a deformation of the integration contour into the complex plane. While valid contours are easy to construct, the numerical precision for a multi-loop integral can depend critically on the chosen contour. We present methods to optimize this contour using a combination of optimized, global complex shifts and a normalizing flow. They can lead to a significant gain in precision.


Introduction
High-precision predictions based on quantum field theory are the cornerstone of the LHC research program, probing fundamental physics and paving the way towards identifying physics beyond the Standard Model. Improved analysis techniques, better controlled systematics and the planned 25-fold luminosity increase during the LHC Runs 3 and 4 translate into a major challenge for the corresponding theory calculations and simulations [1]. Furthermore, the prospect of a future lepton collider will require the calculation of electroweak corrections with many different mass scales, where the underlying multi-loop integrals are so far out of reach for analytical approaches, while numerical methods are promising [2].
Essentially all such ML-applications are driven by three properties of neural networks: they are very flexible in what they can describe and how they are trained, they provide excellent interpolation, and they are extremely fast once trained. These numerical advantages naturally lead us to investigate where ML could be useful in multi-loop calculations.
We present a first application of modern neural networks to loop integrals. Our starting points are Feynman integrals in a parametric representation, where the loop momenta have been integrated out analytically by the standard procedure, leading to the two Symanzik polynomials U and F, see e.g. [1] for a description. Such integrals often have poles which manifest themselves as powers of 1/ε in dimensional regularization and which can be factorized efficiently with sector decomposition [37,38]. After factorizing the poles, integrable singularities, related for example to thresholds, remain. They can be dealt with by a deformation of the integration contour into the complex plane [39][40][41][42][43][44]. An automated procedure to do so was first implemented in SecDec [45] and has been refined in SecDec-3 [46] and pySecDec [47][48][49].
The deformation of the integration contour can be performed in many ways, the only requirement is that no pole is crossed by the deformation. Applications to multi-loop integrals with a certain complexity show that the numerical precision can vary by orders of magnitude depending on the choice of a particular contour. In this work we present methods to optimize the choice of the contour based on neural networks.
In Section 2 we briefly review the construction of the Feynman parametric representation of multi-loop integrals, which forms the starting point of our investigations, as well as the contour deformation procedure employed in pySecDec. In Section 3 we describe our new approach to contour deformation based on neural networks and show results for several examples, before we give an Outlook.

Multi-loop Feynman integrals
For our study of ML-methods we focus on the numerical evaluation of integrals in the Feynman-parameter representation. Before we show how neural networks can improve the numerical evaluation of such integrals, we briefly review their definition and the way they are evaluated in pySecDec.

Feynman parametrization
A generic scalar Feynman integral in D space-time dimensions with L loops and N propagators of arbitrary powers ν_j can be represented by

G = \int \prod_{l=1}^{L} \frac{\mathrm{d}^D k_l}{i\pi^{D/2}} \; \prod_{j=1}^{N} \frac{1}{P_j^{\nu_j}(\{k\},\{p\},m_j^2)} \;, \qquad (1)

where the propagators P_j are of the form

P_j(\{k\},\{p\},m_j^2) = q_j^2 - m_j^2 + i\delta \;, \qquad (2)

with q_j being a linear combination of loop momenta k and external momenta p. Introducing Feynman parameters leads to

G = \frac{\Gamma(N_\nu)}{\prod_{j=1}^{N}\Gamma(\nu_j)} \int_0^\infty \prod_{j=1}^{N} \mathrm{d}x_j\, x_j^{\nu_j-1} \; \delta\Big(1-\sum_{l=1}^{N} x_l\Big) \int \prod_{l=1}^{L} \frac{\mathrm{d}^D k_l}{i\pi^{D/2}} \; \frac{1}{\big[\sum_{j=1}^{N} x_j P_j\big]^{N_\nu}} \;, \qquad N_\nu = \sum_{j=1}^{N}\nu_j \;. \qquad (3)

Further details can be found e.g. in [1,37]. Integration over the momenta gives us an expression in terms of the Symanzik polynomials U and F,

G = \frac{(-1)^{N_\nu}\,\Gamma(N_\nu - LD/2)}{\prod_{j=1}^{N}\Gamma(\nu_j)} \int_0^\infty \prod_{j=1}^{N} \mathrm{d}x_j\, x_j^{\nu_j-1} \; \delta\Big(1-\sum_{l=1}^{N} x_l\Big) \; \frac{U^{N_\nu-(L+1)D/2}(\vec{x})}{F^{N_\nu-LD/2}(\vec{x})} \;. \qquad (4)

The first Symanzik polynomial, U, is a positive semi-definite function of the Feynman parameters. The second Symanzik polynomial, F, contains kinematic invariants and Feynman parameters. A vanishing F is a necessary, but not sufficient, condition for infrared or kinematic singularities to arise.
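As a toy illustration of the role of F (our own example, not one of the diagrams studied below): for the equal-mass one-loop bubble, after integrating out the delta distribution, the second Symanzik polynomial on the unit interval reads F(x) = -s x(1-x) + m^2, and it develops a zero inside the integration region exactly above the physical threshold s = 4 m^2:

```python
# Toy example (ours, not from the paper): equal-mass one-loop bubble.
# On the unit interval the second Symanzik polynomial reads
# F(x) = -s*x*(1-x) + m^2; it vanishes inside [0,1] iff s >= 4*m^2.
import numpy as np

def F_bubble(x, s, m2):
    return -s * x * (1.0 - x) + m2

x = np.linspace(0.0, 1.0, 1001)
m2 = 1.0

assert F_bubble(x, 3.9, m2).min() > 0.0   # below threshold: no zero of F
assert F_bubble(x, 4.1, m2).min() < 0.0   # above threshold: F changes sign
```

Below threshold the integrand is regular on the real contour; above threshold the zero of F is what forces the contour deformation discussed next.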

Contour deformation in pySecDec
After mapping all integration variables onto the unit hypercube and integrating out the delta distribution, we can absorb any additional factors into redefined polynomials, which we continue to denote by U and F. Then, eq. (4) can be written in the compact form

G = \int_0^1 \mathrm{d}^{N-1}x \; I(\vec{x}) \;, \qquad I(\vec{x}) = (-1)^{N_\nu}\,\Gamma\Big(N_\nu - \frac{LD}{2}\Big)\, \frac{U^{N_\nu-(L+1)D/2}(\vec{x})}{F^{N_\nu-LD/2}(\vec{x})} \;, \qquad (5)

where, again, F(x) can vanish inside the integration region. If we deform the integration over a Feynman parameter away from the real line segment x ∈ [0,1] into the complex z-plane, Cauchy's theorem ensures that the integral does not change as long as no singularities are enclosed by the contour,

G = \int_\gamma \mathrm{d}^{N-1}z \; I(\vec{z}) \;. \qquad (6)

In the complex plane, the −iδ prescription in eq. (1) and eq. (4) ensures that we stay on the physical and causal Riemann sheet.
To construct an appropriate deformation into the complex plane we write z_j = x_j − i τ_j(\vec{x}) and expand F(\vec{z}) around \vec{x},

F(\vec{z}) = F(\vec{x}) - i \sum_j \tau_j \frac{\partial F(\vec{x})}{\partial x_j} - \frac{1}{2} \sum_{j,k} \tau_j \tau_k \frac{\partial^2 F(\vec{x})}{\partial x_j \partial x_k} + \frac{i}{6} \sum_{j,k,l} \tau_j \tau_k \tau_l \frac{\partial^3 F(\vec{x})}{\partial x_j \partial x_k \partial x_l} + \mathcal{O}(\tau^4) \;. \qquad (7)

The second term gives the leading contribution to the imaginary part of F(\vec{z}). We can guarantee that it is always negative by choosing τ_j ∝ ∂F(x)/∂x_j. We also ensure that the integration endpoints are invariant under the contour deformation by requiring τ_j ∝ x_j(1 − x_j), defining the contour deformation as

\tau_j = \lambda_j\; x_j (1 - x_j)\, \frac{\partial F(\vec{x})}{\partial x_j} \;. \qquad (8)

The deformation parameters λ_j can be chosen arbitrarily, provided they are small enough for the leading order in τ to dominate the imaginary part of F(\vec{z}). If the λ_j are too large, the tri-linear term in eq. (7) can flip the sign of the imaginary part.
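A minimal numerical sketch of this deformation for the toy bubble polynomial F(x) = -s x(1-x) + m^2 (our own example, not one of the paper's integrals): since this F is quadratic, Im F(z) = -τ F'(x) exactly, which is non-positive everywhere and vanishes only at the endpoints and at the stationary point of F:

```python
# Sketch (toy quadratic polynomial, ours): deform z = x - i*tau with
# tau = lam * x*(1-x) * dF/dx. For quadratic F the imaginary part is
# Im F(z) = -tau * F'(x) = -lam * x*(1-x) * F'(x)**2 <= 0.
import numpy as np

s, m2, lam = 5.0, 1.0, 0.05            # above threshold: s > 4*m^2

def F(z):
    return -s * z * (1.0 - z) + m2

def dF(x):
    return -s * (1.0 - 2.0 * x)

x = np.linspace(0.0, 1.0, 501)
tau = lam * x * (1.0 - x) * dF(x)
z = x - 1j * tau                        # endpoints x = 0, 1 stay fixed

assert np.all(F(z).imag <= 1e-15)       # contour stays on the correct side
assert np.allclose(z[[0, -1]], [0.0, 1.0])
```

The x(1-x) factor visibly freezes the endpoints, so Cauchy's theorem applies with unchanged integration limits.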
In SecDec 3.0 [46], the λ_j are chosen by first determining their maximal values, at which the tri-linear terms in eq. (7) have the same magnitude as the linear ones; then, fractions of these maximal values are selected via several heuristics depending on the sampled values of ∂F(x)/∂x_j; a detailed description is provided in Section 6.2.3 of Ref. [50]. In pySecDec [47] the λ_j are selected as the smallest of the corresponding values evaluated on a set of sampled points. In both cases the initial selection is followed by iterative refinement steps: if during the integration a sign check error occurs for one of the sampling points (i.e. either Im F(\vec{z}(\vec{x})) is found to be positive or Re U(\vec{z}(\vec{x})) is found to be negative), then all λ_j are multiplied by a factor of 0.9 and the integration is repeated. As a consequence, pySecDec often selects the largest allowed λ vector along the initially chosen direction.
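The iterative 0.9-rescaling can be sketched as follows, again for a toy polynomial (ours; here a cubic, F(x) = -s x(1-x)^2 + m^2, chosen because its tri-linear term actually flips the sign of Im F once λ is too large):

```python
# Sketch of the 0.9-rescaling refinement described above, for a toy cubic
# F(x) = -s*x*(1-x)**2 + m^2 (ours): the tri-linear term in the expansion
# of F(z) makes Im F positive somewhere once lambda is too large.
import numpy as np

s, m2 = 5.0, 1.0
rng = np.random.default_rng(1)
x = rng.random(10**4)                   # fixed sample of test points

def im_F_deformed(lam):
    dF = -s * (1.0 - 4.0*x + 3.0*x**2)  # dF/dx for the cubic polynomial
    z = x - 1j * lam * x * (1.0 - x) * dF
    return (-s * z * (1.0 - z)**2 + m2).imag

lam = 100.0                             # deliberately far too large
while np.any(im_F_deformed(lam) > 1e-12):
    lam *= 0.9                          # sign check error -> shrink lambda

assert lam < 100.0                      # at least one refinement happened
assert np.all(im_F_deformed(lam) <= 1e-12)
```

Since all λ_j are rescaled together, this procedure only moves along the initially chosen direction in λ-space, which is the limitation the optimization methods below address.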

Example diagrams
The Feynman diagrams we use to develop and benchmark our approaches are shown in Figure 1.
The top left diagram is a one-loop pentagon integral as it occurs in the production of a top quark pair in association with another massive particle and depends on four independent Mandelstam invariants as well as the top quark mass and the invariant mass of p 5 . Analytically it depends on logarithms and dilogarithms of ratios of kinematic invariants, leading to a complicated branch-cut structure. After Feynman parametrization the corresponding integral is described by 4 independent Feynman parameters.
The top right diagram is a two-loop box diagram with one massive on-shell leg and one off-shell leg. This diagram is a topology occurring for example in ttV production at two loops, where the boson V is radiated off an external top quark. It is close to the configuration of a 2-loop gluon ladder diagram where the exchange of gluons between two top quark lines gives rise to a Coulomb singularity. The analytic expression for this type of diagram is not known, but it is anticipated that it will contain elliptic functions. This integral depends on 6 Feynman parameters and is the most complicated example we consider in terms of dimensionality.
The diagram on the lower left of Figure 1 is a two-loop three-point function with a massive sub-triangle occurring, for instance, in NLO corrections to Higgs production in gluon fusion. It is the easiest 2-loop diagram we consider and serves as a stepping stone towards more complicated 2-loop diagrams. Analytic results for this diagram can be found in Refs. [51][52][53]. Depending on 5 Feynman parameters this integral is in between the previous two examples in terms of dimensionality of the integration.
The diagram on the lower right is a topology occurring in Higgs+jet production in gluon fusion at two loops. Its analytic expression contains elliptic functions and therefore marks the cutting edge of what is currently accessible analytically. It has been calculated (semi-)analytically in Refs. [54,55] and also served as a benchmark for the development of the program pySecDec [47], where it is contained in the list of examples. This integral is 5-dimensional, so it has the same number of Feynman parameters as the triangle diagram, but it depends on four kinematic invariants rather than two.

Machine learning contour deformations
Numerically solving the contour integral introduced in eq. (6), with the contour deformation defined in eq. (8), still leaves the question how to choose optimal values for λ, and the functional form of eq. (8) is not necessarily optimal either. The Monte Carlo estimate of the integral is

G \approx \frac{1}{n} \sum_{i=1}^{n} \det\!\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right)\bigg|_{\vec{x}^{(i)}} I\big(\vec{z}(\vec{x}^{(i)})\big) \;, \qquad (10)

with the x^{(i)} drawn uniformly in the unit hypercube. Its statistical error

\sigma = \sqrt{\frac{\mathrm{Var}}{n}} \;, \qquad \mathrm{Var} = \Big\langle \big| \det\big(\tfrac{\partial \vec{z}}{\partial \vec{x}}\big)\, I(\vec{z}(\vec{x})) \big|^2 \Big\rangle - \Big| \Big\langle \det\big(\tfrac{\partial \vec{z}}{\partial \vec{x}}\big)\, I(\vec{z}(\vec{x})) \Big\rangle \Big|^2 \;, \qquad (11)

is minimized if the integrand approaches a constant,

\det\!\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right) I(\vec{z}(\vec{x})) \approx \mathrm{const} \;. \qquad (12)

Correspondingly, to construct an optimal contour through a neural network we use the variance of the Monte Carlo integration for large n as the loss function,

L = \Bigg\langle \bigg| \det\!\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right) I(\vec{z}(\vec{x})) - \Big\langle \det\!\big(\tfrac{\partial \vec{z}}{\partial \vec{x}}\big)\, I(\vec{z}(\vec{x})) \Big\rangle \bigg|^2 \Bigg\rangle \;. \qquad (13)

All terms inside the absolute value squared are complex numbers. Note that the loss function has to be real-valued.
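In code the variance loss is straightforward; a minimal sketch (ours), with f_i standing for the Jacobian-weighted complex integrand values:

```python
# Minimal sketch (ours): the loss is the variance of the complex Monte Carlo
# weights f_i = Jacobian * integrand; it is real-valued and vanishes exactly
# when the weighted integrand is constant.
import numpy as np

def variance_loss(f):
    return np.mean(np.abs(f - np.mean(f))**2)   # real by construction

rng = np.random.default_rng(0)
f = rng.normal(size=1000) + 1j * rng.normal(size=1000)

assert variance_loss(f) > 0.0
assert np.isclose(variance_loss(np.full(1000, 2.0 + 3.0j)), 0.0)
```

Taking the absolute value before squaring is what makes the loss real and differentiable with respect to any parameters of the contour.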

Global complex shift
The standard pySecDec approach of choosing the deformation parameters works fast, because it only requires evaluating F(x) and its derivatives on a set of e.g. 10^4 points, and it often produces λ_j that are good enough in practice. For challenging integrals, however, it is useful to invest time into improving the λ_j. For this purpose, we search for deformation parameters λ_j which minimize the Monte Carlo integration error, or the loss function L, on a reduced set of points.

[Figure caption fragment: the cross marks the optimum point selected by Λ-glob; note that the integration error only varies between 0.32 and 0.56 over 3 orders of magnitude in λ_0 and λ_1, which we find to be a common feature.]
The beneficial effect of tuning the λ_j values is illustrated in Figures 2 and 3. Two aspects complicate this minimization problem. First, the Monte Carlo integration error depends on the reference sample and can be noisy, as shown for the ladder2L example in Figure 2: different choices of the 10^6 sampling points lead to a variation of the integration error by up to an order of magnitude. Minimizing the loss on a frozen set would overfit to the selected points and lead to a non-optimal choice for the actual integration. Second, the allowed region in λ-space has a non-trivial shape and is currently only determined by searching for sign check errors. This noisy determination of the allowed region becomes a problem in practice, because the optimal λ_j often lie close to its boundary, as can be seen from Figure 2. As a side remark, this is why the λ-construction in standard pySecDec can simply choose the largest possible λ_j-vector in some predetermined direction and still work well in practice.
To circumvent these problems in optimizing the λ_j we introduce the Λ-glob algorithm, a modified version of the Rprop algorithm [56,57] with added explicit handling of the allowed region for the λ_j. A detailed description is given in Algorithm 1. We start with a point λ_j. To converge faster to a potentially far-away minimum we work on a logarithmic scale, ℓ_j = log λ_j. In the first step, called backtracking, the algorithm decreases ℓ_j by some increment β if a sign check error E_sign(ℓ_j) occurs. After backtracking, we choose a proposal point ℓ̂_j employing gradient descent. If the loss for the proposal point is smaller than for the previous point, it is kept, and the step size s_j is increased or decreased depending on whether the gradient of the loss has kept or changed its sign. If the loss is larger, the proposal point is rejected and the step size s_j is decreased by a factor of η−. All these steps are repeated a predetermined number of times. To avoid overfitting, we draw a new sample of points x^{(i)} for each iteration.
The Λ-glob algorithm benefits from a large initial step size that allows it to efficiently step over local minima of L. Because the step size is adjusted automatically, we do not have to define a learning schedule, unlike for standard gradient descent optimization.
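The interplay of backtracking, proposals, and step-size adaptation can be sketched in one dimension. This is our own toy objective with a synthetic forbidden region and re-sampled noise, not Algorithm 1 verbatim:

```python
# One-dimensional toy version of the ideas above (ours, not Algorithm 1
# verbatim): Rprop-style updates on l = log(lambda) with backtracking out of
# the forbidden region; the loss is re-sampled on each call, mimicking a
# fresh Monte Carlo sample per iteration.
import math
import random

random.seed(7)
L_STAR, L_MAX = 1.0, 2.0                 # toy minimum; sign errors for l > L_MAX
ETA_PLUS, ETA_MINUS, BETA = 1.2, 0.5, 0.3

def sign_error(l):
    return l > L_MAX

def loss(l):                             # noisy toy loss, new "sample" per call
    return (l - L_STAR)**2 + 0.005 * random.gauss(0.0, 1.0)

def grad(l, eps=0.1):
    return (loss(l + eps) - loss(l - eps)) / (2.0 * eps)

l, step, prev_sign = 3.0, 0.5, 0.0
for _ in range(300):
    while sign_error(l):                 # backtracking step
        l -= BETA
    g = math.copysign(1.0, grad(l))
    proposal = l - step * g              # gradient-descent proposal
    if loss(proposal) < loss(l):         # accept: adapt the step size
        step *= ETA_PLUS if g == prev_sign else ETA_MINUS
        l, prev_sign = proposal, g
    else:                                # reject: shrink the step size
        step *= ETA_MINUS

assert abs(l - L_STAR) < 0.5 and not sign_error(l)
```

The backtracking loop retreats from the forbidden region before each proposal, so the iterate can sit close to the boundary, which is where the optimal values often lie.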
In Figure 3 we show the landscape of the loss function and observe that our algorithm has found the global minimum and that this minimum is very broad. While it is very flat in λ 0 and λ 1 individually, a correlated shift increases the loss function more steeply. Results of this algorithm for different Feynman integrals are presented in Figure 4. In addition to a standard Monte Carlo algorithm we also show the result for the Quasi Monte Carlo algorithm in pySecDec [48]. Depending on the integral and the kinematic configuration Λ-glob gives comparable or improved results compared to the standard pySecDec construction.

Generalized local transformation
Moving beyond the optimization of a global deformation parameter, we can exploit the full freedom of the reparametrization with a local transformation of λ to further minimize the Monte Carlo error. In principle any reparametrization in eq. (8) would serve our purpose as long as the defined contour does not cross or enclose any singularities. However, a good contour should also obey the following criteria:
1. The Monte Carlo error should be minimized, i.e. the product of the Jacobian of the transformation and the integrand should be nearly constant, see eq. (12);
2. The endpoints have to be fixed for Cauchy's theorem to be applicable as in eq. (6);
3. The parametrization should be numerically stable and, if possible, have tractable Jacobians.
A tractable Jacobian is not only more stable numerically, it also helps to make sure that the procedure gives meaningful results.
Let us consider again the integral along a contour γ,

G = \int_\gamma \mathrm{d}^{N-1}z \; I(\vec{z}) \;. \qquad (14)

We parametrize this contour in terms of the real parameters y_j ∈ [0,1],

z_j(\vec{y}) = y_j - i\,\lambda_j(\vec{y})\; y_j (1 - y_j)\, \frac{\partial F(\vec{y})}{\partial y_j} \;, \qquad (15)

where the form of the imaginary part guarantees the correct boundary conditions. In contrast to eq. (8), the deformation λ_j is now a local parameter, depending on y.
To minimize the variance in the numerical integration, y needs to be sampled according to some non-trivial probability distribution. In practice, this probability distribution is neither known nor is it possible to easily sample from it. Therefore, we introduce an additional mapping

\vec{y} = f(\vec{x}) \;, \qquad f: [0,1]^{N-1} \to [0,1]^{N-1} \;, \qquad (16)

such that uniformly sampled x lead to the desired distribution of y. Except for the boundaries, the functions λ and f can be chosen freely. A flexible and promising way to parametrize these functions is with neural networks. A critical aspect of the reparametrization are the Jacobians

\frac{\partial \vec{z}}{\partial \vec{y}} \qquad \text{and} \qquad \frac{\partial \vec{y}}{\partial \vec{x}} \;, \qquad (17)

the first of which is complex. For these Jacobians to be non-singular, we require our mappings to be bijective. While the complex Jacobian is always non-singular by construction, we have to ensure explicitly that the function f is bijective. The function λ does not have to be bijective. However, one needs to ensure that the sub-Jacobian ∂λ_j/∂y_k entering eq. (15) is numerically stable.
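The chain rule behind this split of the full Jacobian can be checked numerically; the following sketch uses toy stand-ins for f and for the contour (our own choices, not the trained networks):

```python
# Sketch (toy functions, ours): the full Jacobian of x -> y -> z factorizes
# into a real determinant det(dy/dx) from the bijective map f and a complex
# determinant det(dz/dy) from the contour parametrization.
import numpy as np

lam = 0.1

def f(x):                  # toy bijective map [0,1]^2 -> [0,1]^2
    return x**2

def z_of_y(y):             # toy contour: z_j = y_j - i*lam*y_j*(1-y_j)
    return y - 1j * lam * y * (1.0 - y)

def jacobian(func, x, eps=1e-6):
    n = len(x)
    J = np.empty((n, n), dtype=complex)
    for k in range(n):
        dx = np.zeros(n); dx[k] = eps
        J[:, k] = (func(x + dx) - func(x - dx)) / (2.0 * eps)
    return J

x = np.array([0.3, 0.7])
total = np.linalg.det(jacobian(lambda t: z_of_y(f(t)), x))
split = np.linalg.det(jacobian(z_of_y, f(x))) * np.linalg.det(jacobian(f, x))

assert np.isclose(total, split)
```

Splitting the determinant this way is what allows the real factor to be made tractable with a normalizing flow, as discussed in the next section, while only the complex factor needs numerical evaluation.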

Normalizing flow setup
The reasons to split the full mapping x → z into the real mapping x → y and the complex mapping y → z as in eq. (15) and eq. (16) are the following: First, it will allow us to use a normalizing flow [58][59][60][61][62] for the real mapping, giving us a tractable Jacobian. The Jacobian of the complex mapping needs to be evaluated numerically and is computationally more expensive than the Jacobian of the normalizing flow. Second, when we evaluate a kinematic phase-space point which does not require any contour deformation, as there are no integrable singularities, we just turn off the complex mapping. The real mapping then becomes a version of neural importance sampling [4][5][6][7][8][9].
To train our network we will use the variance loss defined in eq. (13). From the discussion of the Λ-glob algorithm and Figure 2 we know that the integration error is fairly insensitive to small changes in λ. Furthermore, for the parametrization in eq. (15), we found that making λ local (i.e. dependent on x) also hardly affects the loss. Because the introduction of a neural network comes with a computational cost, we keep λ global in our NN-approach. This means we rely on the Λ-glob algorithm to first find optimized λ j and then use a normalizing-flow network to optimize the sampling of the real parameters and minimize the variance. In our experiments we perform the numerical loop integration for various Feynman diagrams given in Figure 1, which are represented in the N -dimensional Feynman parameter space.

Network architecture
Normalizing flows encode a bijective mapping between a physics and a latent space. The model can be evaluated in either direction with comparable efficiency, at least in the invertible network (INN) variant [63][64][65]. Even if we are not interested in this symmetric evaluation, normalizing flows have the considerable advantage of a tractable Jacobian. A simple realization are stacked coupling layers [64,66], where we split the input vector x into x_1 and x_2 and use an element-wise multiplication and sum to define the mapping

\vec{y}_1 = \vec{x}_1 \, e^{s_1(\vec{x}_2)} + t_1(\vec{x}_2) \;, \qquad \vec{y}_2 = \vec{x}_2 \, e^{s_2(\vec{y}_1)} + t_2(\vec{y}_1) \;, \qquad (18)

where s_1, s_2, t_1 and t_2 are parametrized by neural networks. The Jacobian of such a coupling block is [64]

J = \begin{pmatrix} \mathbb{1} & 0 \\ \frac{\partial \vec{y}_2}{\partial \vec{y}_1} & \mathrm{diag}\big(e^{s_2(\vec{y}_1)}\big) \end{pmatrix} \begin{pmatrix} \mathrm{diag}\big(e^{s_1(\vec{x}_2)}\big) & \frac{\partial \vec{y}_1}{\partial \vec{x}_2} \\ 0 & \mathbb{1} \end{pmatrix} \;. \qquad (19)

While J is not triangular, we will only be interested in the log-determinant, which can be calculated efficiently as

\log \det J = \sum_i s_{2,i}(\vec{y}_1) + \sum_i s_{1,i}(\vec{x}_2) \;. \qquad (20)

For all examples we employ a normalizing flow consisting of these affine coupling blocks, where each coupling block describes a bijective mapping R^N ↔ R^N. To map the Feynman parameters x ∈ [0,1]^N from the unit hypercube to R^N bijectively we apply the logit function

\mathrm{logit}(x) = \log\frac{x}{1-x} \;, \qquad (21)

which is the inverse of the sigmoid function

\sigma(x) = \frac{1}{1+e^{-x}} \;. \qquad (22)

As both Jacobians are diagonal, these functions can easily be combined with the coupling blocks. For convenience, we use the sigmoid as the final network layer, such that the output domain is again the unit hypercube. We sandwich 14 coupling blocks between the logit and sigmoid functions. In each coupling block we use a simple fully connected neural network consisting of 3 layers with 128 units and Leaky ReLU as activation function. To regularize the exponentials of the affine coupling block we use soft clamping [66], s_clamp = c · tanh(s), with c = 0.5, and activation normalization [65]. Furthermore, we use random orthogonal matrices [67] to allow for more interaction between the two parts y_1, y_2 in the coupling blocks. Our network is implemented using TensorFlow [68].
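A single affine coupling block and its log-determinant can be sketched in a few lines of numpy. Here random fixed matrices stand in for the subnetworks s_i, t_i (our own minimal stand-in, not the 14-block architecture described above), and the analytic log-determinant is checked against a numerically computed Jacobian:

```python
# Minimal sketch of one affine coupling block (ours): tiny fixed stand-ins
# for the subnetworks s1, t1, s2, t2. The analytic log-determinant
# sum(s2(y1)) + sum(s1(x2)) is verified against a numerical Jacobian.
import numpy as np

rng = np.random.default_rng(0)
d = 2                                   # half-dimension: x = (x1, x2) in R^(2d)

W = {k: 0.3 * rng.standard_normal((d, d)) for k in ("s1", "t1", "s2", "t2")}

def net(key, v):                        # stand-in subnetwork
    return np.tanh(W[key] @ v)

def coupling(x):
    x1, x2 = x[:d], x[d:]
    y1 = x1 * np.exp(net("s1", x2)) + net("t1", x2)
    y2 = x2 * np.exp(net("s2", y1)) + net("t2", y1)
    return np.concatenate([y1, y2])

x = rng.random(2 * d)
x1, x2 = x[:d], x[d:]
y1 = x1 * np.exp(net("s1", x2)) + net("t1", x2)
logdet = np.sum(net("s2", y1)) + np.sum(net("s1", x2))   # analytic log det J

eps, J = 1e-6, np.empty((2 * d, 2 * d))  # numerical Jacobian, central diffs
for k in range(2 * d):
    dx = np.zeros(2 * d); dx[k] = eps
    J[:, k] = (coupling(x + dx) - coupling(x - dx)) / (2.0 * eps)

assert np.isclose(np.linalg.slogdet(J)[1], logdet, atol=1e-5)
```

The inexpensive sum over the s-outputs is exactly the "tractable Jacobian" that makes normalizing flows attractive for importance sampling.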

Training
Before introducing the neural network, we employ the Λ-glob algorithm to find optimal values of λ_j, which minimize the variance loss in eq. (13) and define a valid contour on the physical Riemann sheet. Next, we train the normalizing flow to re-sample the real parameters in the spirit of neural importance sampling. This gives us a complex mapping y → z parametrized as in eq. (8) with optimized λ_j, and a real mapping x → y = f(x), where f is represented by a normalizing flow.
For kinematic phase-space regions below threshold, no contour deformation is needed. Here the Λ-glob algorithm will find λ j = 0, the complex mapping eq. (8) will be omitted, and the real mapping alone will improve the calculation. The complete workflow is summarized in Figure 5.
In contrast, for kinematic phase-space points above threshold the contour is vital, and we need to make sure to have the correct sign for the imaginary part of F as well as for the real part of U. For the Λ-glob algorithm we use a simple backtracking method to discard a proposal state and step back, i.e. reduce the value of λ_j, if a sign check error occurs. For the network we add a term to the loss function. As we employ the Adam optimizer [69], this sign loss has to be differentiable, so we add

L_\mathrm{sign} = Y \left[ \frac{\mathrm{ReLU}\big(\mathrm{Im}\, F(\vec{z}(\vec{x}))\big)}{X_F} + \frac{\mathrm{ReLU}\big(-\mathrm{Re}\, U(\vec{z}(\vec{x}))\big)}{X_U} \right] \;,

with ReLU(x) = max{0, x}, to the variance loss of eq. (13). Using a validation set x_val, the relative scales Y, X_F and X_U are estimated at the beginning of the training and updated every K-th iteration from the sampled values of Im F(\vec{z}(\vec{x}_val^{(i)})) and Re U(\vec{z}(\vec{x}_val^{(i)})). In practice, we find that K = 10 works well in our experiments. An illustration of the sign loss and its derivative for the F and U parts is shown in the left and right panels of Figure 6, respectively.
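The sign penalty can be sketched as follows; the ReLU structure follows the text, while the exact combination of the scales Y, X_F, X_U is our assumption:

```python
# Hedged sketch (ours) of a differentiable sign penalty: ReLU terms that
# activate only when Im F > 0 or Re U < 0. The normalization via Y, X_F,
# X_U is an assumption, not the paper's exact formula.
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def sign_loss(im_F, re_U, Y=1.0, X_F=1.0, X_U=1.0):
    """Penalty > 0 only if a sample violates Im F <= 0 or Re U >= 0."""
    return Y * np.mean(relu(im_F) / X_F + relu(-re_U) / X_U)

good = sign_loss(np.array([-0.2, -0.1]), np.array([0.5, 1.0]))
bad  = sign_loss(np.array([-0.2,  0.3]), np.array([0.5, -1.0]))

assert good == 0.0 and bad > 0.0
```

Because ReLU is piecewise linear, the penalty contributes gradients only from violating samples, leaving valid contours untouched.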
Moreover, a numerical bottleneck in our contour optimization is the calculation of the complex-valued determinant and its derivative. As the TensorFlow implementation of complex-valued determinants yields wrong gradients, we implement our own version of the determinant. It relies on the recursive Laplace expansion and becomes computationally expensive for higher-dimensional cases. This can be seen in the GPU-memory usage in Figure 7, which is significantly higher for processes involving more Feynman parameters, such as the ladder2L example. This is one of the reasons why the timings are not competitive with those of the standard contour deformation in pySecDec. The largest benefit from the ML-approach is expected for high-dimensional multi-scale cases, where the contour avoiding all poles and branch cuts is a highly non-trivial hypersurface in the complex integration space. In such cases the gain in numerical precision can be so large that it outweighs the time spent training the network. Indeed, the true advantage would show up in calculations of complete amplitudes, rather than individual integrals, containing a few integrals that would barely converge at all in pySecDec but would converge well with an optimized contour.
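A minimal version of such a Laplace-expansion determinant (our own sketch, exponential in the matrix dimension and therefore only sensible for small matrices) can be written as:

```python
# Sketch (ours) of a recursive Laplace expansion for complex determinants,
# expanding along the first row; cost grows factorially with the dimension.
import numpy as np

def laplace_det(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    # det A = sum_k (-1)^k * A[0, k] * det(minor_0k)
    return sum((-1)**k * A[0, k] * laplace_det(np.delete(A[1:], k, axis=1))
               for k in range(n))

A = np.array([[1.0 + 2.0j, 0.5],
              [3.0,        1.0 - 1.0j]])

assert np.isclose(laplace_det(A), np.linalg.det(A))
```

The factorial scaling of this expansion is consistent with the memory growth observed for the higher-dimensional ladder2L example.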

Performance
Finally, we illustrate the performance gain achieved by applying the Λ-glob algorithm alone as well as its combination with the normalizing flow.
In Figure 8 we show results for the triangle2L (left) and the elliptic2L (right) integral. For both integrals we consider the first sector integral after sector decomposition. We sample 100 phase-space points varying over 4-5 orders of magnitude in the squared center-of-mass energy s ≡ (p_1 + p_2)^2. For both processes we intentionally consider points below and above threshold, in order to also compare the performance when no contour deformation is needed. We normalize the kinematic invariants using m^2 = 1. For the triangle2L integral, shown in the left panel of Figure 8, the average integration error over all phase-space points is reduced by a factor of two for the Λ-glob algorithm and by a factor of five for our ML-approach. In the low-energy regime the error reduction stays around the average value. For increasing energies towards the threshold at s/m^2 = 1, the absolute integration error of the standard pySecDec method and of the pure Λ-glob algorithm increases, while the absolute integration error of our ML-approach keeps decreasing. This results in a relative performance gain by a factor of up to 30 close to the threshold. The threshold being located at s/m^2 = 1 is a consequence of considering sector one, which effectively corresponds to a topology where one of the massive triangle propagators connecting to p_3 is pinched. In contrast, for the elliptic2L sector 1 integral, shown in the right panel of Figure 8, the importance sampling through the normalizing flow reduces the integration error by a factor of 20 and does not show the rising profile towards the threshold. The average integration error is reduced by a factor of 7 or 2, depending on whether the additional mapping of the normalizing flow is used or not. The kinematic points for this diagram are chosen to have varying values of t = (p_1 + p_3)^2 and p_4^2.
In general, for energies close to but above threshold the performance gain is less pronounced, as the contour deformation in this regime has less freedom for optimization and the effect of modifying the real parts is diminished. For increasing energies, the absolute integration error also increases and eventually starts fluctuating. This is driven by the singularities moving towards the endpoints. A possible way to control this behavior has been proposed in Ref. [49]. Together with the absolute integration error, the improvement factors also start to fluctuate strongly for large energies.
Finally, in Figure 9 we show the results for the more complicated pentagon1L (left) and ladder2L (right) integrals. Again, for both integrals we consider the first sector integral after sector decomposition. The increased complexity originates both from a higher-dimensional integration space, i.e. more Feynman parameters, and from more kinematic scales being involved. In order to cover possible dependencies on kinematic variables other than s and m^2, we sample different kinematic phase-space points for the same values of s/m^2. For both integrals we find that the average integration error is reduced by a factor of two for the Λ-glob algorithm. By employing the ML-method we achieve average error reduction factors of 6 and 8 for the ladder2L and pentagon1L, respectively. For individual phase-space points we achieve an improvement factor of up to 30. However, there are also phase-space points for which both the Λ-glob and the flow-supplemented algorithm show inferior performance. This clearly indicates the shortcomings of the optimization procedures, which are related to the strict sign requirement on the imaginary part.

Outlook
We have shown, for the first time, that the application of modern machine learning methods to numerical multi-loop calculations can lead to a considerable reduction of the numerical uncertainties and, correspondingly, of the required evaluation time. This has been achieved in a two-step procedure, first applying an algorithm to globally optimize the contour deformation parameters λ, and subsequently employing a normalizing flow to optimize the complex integration contour, after splitting the full contour deformation into a real and an imaginary part. We have demonstrated the performance with several one- and two-loop examples. All of these examples contain massive propagators and several kinematic scales, leading to a complicated threshold structure of the integrand, such that the contour deformation is a highly non-trivial task, which was dealt with successfully by the neural networks. While the results presented in this paper can only be a first step, they strongly motivate further investigations.