Two-loop amplitude generation in OpenLoops

Numerical tools, such as OpenLoops, provide NLO scattering amplitudes for a very wide range of hard scattering amplitudes in a fully automated way. In order to match the numerical precision of current and future experiments, however, the higher precision of NNLO calculations is essential, and their automation in a similar tool highly desirable. In our approach, D-dimensional amplitudes are decomposed into loop-momentum tensor integrals with coefficients constructed in four dimensions and rational terms. We present a fully generic algorithm for the efficient numerical construction of the tensor coefficients, which constitutes an important building block for an automated NNLO tool.


Introduction
Precise Monte Carlo simulations of scattering processes have played a major role in the success of the LHC. The hard scattering amplitudes at the core of these simulations can be obtained by fully automated numerical tools, such as OpenLoops [1][2][3], at tree and one-loop level. This is sufficient in order to obtain LO and NLO predictions for most processes, but in order to fully exploit the potential of the LHC and future colliders, NNLO predictions are required for a wide range of processes. While dedicated NNLO calculations, which involve two-loop amplitudes, exist for many 2 → 2 and a few 2 → 3 processes, a fully automated NNLO tool for processes with four, five and possibly more scattering particles would greatly expand the scope of precision phenomenology.
In the following, we will summarize the building blocks of the NLO OpenLoops program, and describe the ones required for a future NNLO tool. We will then present a major building block for a NNLO OpenLoops program, namely a new algorithm for the numerical construction of two-loop amplitudes in terms of loop-momentum tensor integrals.

Tree-level and one-loop amplitude construction in OpenLoops
L-loop scattering amplitudes are computed as sums of Feynman diagrams Γ , the amplitudes of which depend on the helicity configuration h of the external particles and are factorised into a colour factor and colour-stripped amplitude, While the colour factors C L,Γ are handled algebraically, the colour-stripped amplitudes A L,Γ are constructed numerically in OpenLoops. Tree-level diagrams are decomposed into subtrees w a , represented as blue bubbles in our graphs, which are then constructed through recursion steps, from two subtrees with less propagators and a universal function X derived from the Feynman rule of the connecting vertex and adjacent propagator. The denominator contains the mass m a and momentum k a of this propagator. The recursion starts from the external wave functions, and ends in connecting two subtrees which form the full diagram. This recursion is implemented in four dimensions, achieving a high level of efficiency through the recycling of already constructed subtrees in multiple tree-level and loop diagrams. 1 Starting from one-loop level, divergences can appear and need to be treated through renormalisation and IR subtraction procedures. In addition, the numerators of Feynman integrals are constructed in integer dimensions in a numerical tool. Hence, one-loop amplitudesM 1 in D dimensions are split into an amplitude M 1 constructed from Feynman integrals with four-dimensional numerators and a remainder stemming from (D − 4)-dimensional numerators. The latter can be fully reconstructed through rational counterterm [4][5][6][7] insertions into tree level amplitudes, which are computed together with the one-loop UV counterterms in the chosen renormalisation scheme as M 0,1l−CT .
For a large class of processes the helicity and colour-summed squared tree-level amplitude where 1/N hcs encodes the average over initial-state helicity and colour d.o.f as well as symmetry factors for identical final-state particles (see [3]), constitutes the LO contribution of the scattering probability density, while the NLO contribution is computed from the Born-loop interference The colour-stripped amplitude of a one-loop diagram Γ is given by with the integration measure in loop momentum space dq = µ 2 d Dq (2π) D and scalar propagator denominators D a (q) = (q + p a ) 2 − m 2 a with mass m a and external momentum p a . The numerator factorises into loop segments with at most linear q-dependence, which consist of a loop vertex and propagator encoded in the universal building blocks Y, Z and one or two external sub-trees w a with external momentum k a . These segments should be understood as matrices with Lorentz or spinor indices β a−1 , β a . In OpenLoops, the one-loop diagram is cut open at a chosen off-shell propagator D 0 and the resulting chain of segments constructed recursively through steps (k = 1, . . . , N ) starting from N 0 = 1. The numerator can be written as and the numerical recursion is implemented at the level of the tensor coefficients N k,µ 1 ...µ r , retaining the analytical structure in q throughout the amplitude construction. The tensor integrals in the resulting amplitude are either reduced a posteriori, using external libraries such as Collier [8], or on the fly, i.e. during the amplitude construction [2], with Collier or OneLoop [9] for the final evaluation of scalar integrals. This completely generic algorithm is fully implemented for NLO QCD and NLO EW and available in the public OpenLoops tool [3].

Structure of a two-loop OpenLoops tool
A full NNLO calculation consists of a double-virtual, real-virtual and real-real part. The latter two are already provided by the public OpenLoops tool, as well as the squared one-loop amplitude entering the renormalised double-virtual contribution, where the bar marks the amplitude in D dimensions and the operator R the renormalisation procedure. In the following, we focus on the crucial piece for which new efficient methods need to be developed and implemented in the OpenLoops framework, namely the Born two-loop interference. The numerators of two-loop integrals are again decomposed into a part that can be numerically constructed in four dimensions, and (D − 4)-dimensional remainders. In [10][11][12] it was demonstrated that the renormalised D-dimensional two-loop amplitude can be split into amplitudes computed with four-dimensional loop numerators, where the four terms on the rhs are the unrenormalised two-loop amplitude, the one-loop amplitude with one-loop rational and UV counterterm insertions, the tree-level amplitude with two-loop rational and UV counterterm insertions, and the tree-level amplitude with double one-loop rational and UV counterterm insertions. 2 The most challenging part is the Born two-loop interference term constructed with four dimensional numerators where we use (2), and the sum is taken over the full set of two-loop diagrams Γ of the scattering process. In the following, we will discuss two-loop diagrams, which become 1PI on amputation of all external subtrees. 3 The colour-stripped amplitude A 2,Γ of an irreducible two-loop diagram is constructed from three chains, C 1 (q 1 ), C 2 (q 2 ) and C 3 (q 3 ), connected by two vertices V 0 , V 1 , and has the form with the three denominator chains (i = 1, 2, 3) The numerator construction is again performed at the level of tensor coefficients. The tensor integral reduction and evaluation is then the remaining piece to be developed and implemented in our framework. For the tensor coefficient construction we exploit the factorisation of the numerator into three chains and two connecting vertices, and the factorisation of the chains -each dependent on a single loop momentum -into segments of the same structure as the one-loop segments, Here we made the dependence of each segment on the helicity d.o.f. h (i) a of the associated subset of external particles explicit. 4 In order to construct (13), the numerator is interfered with the colour factor and full Born amplitude of the process, 4 The helicity labels are defined in an additive way, such that the global helicity h = h (i) a , which are constructed from the segment helicities h (i) a . For the simplicity of our description in this section, we assume three-point vertices V 0,1 . In the case of four-vertices V 0,1 with external subtrees, additional helicity labels are introduced for these two vertices. This is also fully implemented and included in the studies presented in section 4. For details on the helicity definitions and treatment see [13].
The objective is now to construct U(q 1 , q 2 ) in a recursive way, at the level of tensor coefficients, which are then contracted with tensor integrals in the two loop momentaq 1 ,q 2 . In order to find the most efficient recursion with N r steps, with V N r = U(q 1 , q 2 ) and the building blocks −1 }, a CPU cost analysis for the possible algorithms of this form was performed, each for several QED and QCD Feynman diagrams. Here we estimated the CPU cost of each step by the number of multiplications, the most expensive numerical operation. The most efficient recursion was then fully implemented and validated for QED and QCD corrections to SM processes. It consists of the following steps. 0. The three chains are sorted by their number of segments, such that N 1 ≥ N 2 ≥ N 3 . The order of V 0 and V 1 is determined by vertex type, such that the number of multiplications in the following steps is minimal (for details see [13]).

The shortest chain C 3 (q 3 ) is constructed through the recursion
and n = 0, . . . , N 3 − 1. This is usually the chain with the least helicity d.o.f. and intermediate results can be recycled in multiple Feynman diagrams, such that this step is negligible in the overall CPU cost for a full process.
2. The full diagram is then constructed through a sequence of sub-recursions: 2.1 The longest chain C 1 (q 1 ) is constructed through steps and n = 0, . . . , N 1 −1. The initial condition defined in (18) contains the interference with the full Born, which allows for the on-the-fly summation of the helicities of each chain segment during the recursion step, in which it is attached. 5 Since C 1 (q 1 ) is the longest chain, a large portion of helicity d.o.f is already summed over at a stage at which the partially constructed diagram depends only on a single loop momentum.

2.2
The two-loop vertex V 1 is connected to the previously constructed chains, summing over the helicities of C 3 (q 3 ), and introducing the dependence on a second loop momentum, and hence a much higher complexity.

2.4
The remaining chain C 2 (q 2 ) is constructed through steps and n = 0, . . . , N 2 − 1. Here the complexity stemming from the high tensor ranks in the loop momenta is counterbalanced by the dependence on only a few remaining helicities. By construction, in the final result all helicities are summed.
This algorithm is completely generic. For QED and QCD corrections to the SM, it has been fully implemented and validated at the level of tensor coefficients in the OpenLoops framework.

CPU efficiency and numerical stability
In order to test the CPU efficiency of this new algorithm, we computed the tensor coefficients for a wide range of QED and QCD processes, each for 1000 uniform random phase space points (psp) on a computer with a single Intel i7-6600U @ 2.6 GHz processor and 16GB RAM. The average time per psp is shown in the upper plot of Fig. 1 against the number of Feynman diagrams. The runtimes for the complete two-loop recursion, including full colour and helicity sums, range from a few ms for simple QED and QCD processes to O(1s) for more complex 2 → 3 processes. The computation time scales linearly with the number of diagrams. It is interesting to compare these two-loop (2l) time measurements to the ones for the corresponding real-virtual corrections, i.e. the same process with one additional photon or gluon at one-loop level (1l+g), which constitutes another, already fully available component of a full NNLO calculation. The ratio of 2l and 1l+g timings is depicted in the lower plot of Fig. 1, once including only the tensor coefficient constructions, and once including the tensor integrals in the one-loop calculation. These ratios are fairly constant over all considered processes with 2l (tensor coefficients) 1l+g (tensor coefficients) = 9 ± 3, 2l (tensor coefficients) 1l+g (full calculation) = 4 ± 1.
Compared to the corresponding one-loop tensor coefficient construction with two extra gluons/photons, the two-loop tensor integral construction is even a factor 3 − 8 faster (see [13]). Considering the much higher complexity of two-loop diagrams as compared to one-loop diagrams, these are very promising values, and we expect that the efficiency of the future tool for full two-loop calculations will largely depend on the efficiency of the tensor reduction. Our implementation of this new algorithm also shows high numerical stability at the level of the tensor integral coefficients, as demonstrated by relative uncertainty measurements for 2 → 2 and 2 → 3 QCD amplitudes computed in double precision for 10 5 uniform random psp. These relative uncertainties are in the range of 10 −16 to 10 −14 for the bulk of the psp, and never below order 10 −12 and 10 −11 . For details we refer to [13].  For processes with external e ± , QED corrections were considered, for all other processes QCD corrections. Amplitudes with external tops were computed with a massive top, all others with purely massless internal fermions.

Conclusion
We presented a completely new algorithm for the CPU efficient and numerically stable construction of the loop-momentum tensor coefficients of two-loop amplitudes. This is an important building block in the development of a fully automated two-loop tool in the OpenLoops framework.