QED at NNLO with McMule

McMule is a framework for fully differential higher-order QED calculations of scattering and decay processes involving leptons. It keeps finite lepton masses, which regularises collinear singularities. Soft singularities are treated with dimensional regularisation and using FKS$^\ell$ subtraction. We describe the implementation of the framework in Fortran 95, list the processes that are currently implemented, and give instructions on how to run the code. In addition, we present new phenomenological results for muon-electron scattering and lepton-proton scattering, including the dominant NNLO corrections. While the applications presented focus on MUonE, MUSE, and P2, the code can be used for a large number of planned and running experiments.


Introduction
Perturbation theory is a well-established tool to provide accurate theoretical descriptions of many scattering and decay processes. In fact, it is often the case that the coupling (either electromagnetic, strong, or electroweak) is small enough to facilitate a perturbative treatment and nonperturbative effects are either subdominant or can be isolated and modelled to a sufficient precision. Hence, there has been a huge effort and impressive progress in computational techniques for higher-order perturbative calculations.
While most of the effort of the community is geared towards high-energy colliders, there is also a very important low-energy programme ongoing. For example elastic electron-proton scattering at the Jefferson Laboratory lead to a determination of the weak charge of the proton by QWeak [1] or allowed PRad [2] to provide crucial input towards the solution of the proton radius puzzle [3,4]. The same process has been measured at MAMI by the A1 collaboration [5] to determine form factors and will be studied again at MESA, where P2 [6] aims at a precise determination of the weak mixing angle through an asymmetry measurement at a small beam energy of 155 MeV. A similar approach but using electron-electron scattering is pursued by the Moller experiment [7].
Two planned experiments for which we provide new results are MUonE [8] and MUSE [9]. The idea of MUonE is to use a 150 GeV muon beam at CERN to measure the differential cross section for elastic muon-electron scattering at a centre-of-mass energy of √ s ∼ 400 MeV. This is motivated by the connection [10] of hadronic vacuum polarisation (HVP) effects with the anomalous magnetic moment of leptons. From the shape of the muon-electron cross section it is possible to extract the effective electromagnetic coupling and, hence, to obtain an independent determination of the leading hadronic contribution. The idea of MUSE is to measure simultaneously electron-proton and muon-proton scattering, for positively and negatively charged leptons. The experiment will be carried out at the Paul Scherrer Institut with lepton momenta O(100 MeV) and will shed further light on the proton radius puzzle and two-photon exchange contributions. Bhabha scattering is a further example which has been studied extensively [11] in connection with luminosity measurements. Finally, we mention muon and tau decay processes that can be described through QED corrections in the Fermi theory. This list is tailored towards the applications discussed in this paper and is by no means complete. But it shows that there is a demand for precise higher-order QED calculations for low-energy scattering and decay processes involving leptons. It is the aim of McMule (Monte Carlo for MUons and other LEptons) to provide a Monte Carlo code that can be used to obtain precise theoretical predictions for a wide range of low-energy processes dominated by QED effects, with a particular focus on processes involving muons. More precisely, McMule is an integrator that allows to obtain histograms for arbitrary, fully differential observables.
QED calculations are typically simpler than QCD computations. First, due to the abelian nature of QED, the algebra is less involved. A more important aspect is the simplified structure of infrared singularities in QED, which reduces the complexity of the divergent phase-space integrations. Generally, it is a highly non-trivial problem to move from matrix elements to fully differential physical observables. However, the abelian gauge structure of QED leads to a simple Yennie-Frautschi-Suura (YFS) exponentiation of multiple soft singularities [12]. Also, in QED collinear singularities are only possible if a gauge boson (photon) becomes collinear to a fermion. These singularities can be regularised through non-vanishing fermion masses.
The relative simplicity of QED might well be responsible for a remarkable divide in the computational techniques that are used in the QED and QCD community. Typically, scattering processes in QED are computed using an infinitesimal photon mass to regularise infrared singularities and using a slicing method to extract the infrared-divergent part of phase-space integrations. In McMule we follow more closely techniques familiar from QCD calculations and therefore use dimensional regularisation and a subtraction scheme. In this context, the simplicity of the infrared structure of QED has been exploited in [13], where a subtraction scheme at next-to-next-to leading order (NNLO) and beyond has been developed that allows to obtain arbitrary fully exclusive quantities as soon as the matrix elements are known.
Despite the simplicity of QED, there is one aspect in which QED computations are more complicated than QCD calculations. It is related to potentially large logarithms log(m 2 /Q 2 ) that are remnants of collinear singularities. Here, Q is a typical scale of the process, which is often much larger than some of the fermion masses m. In QCD, quantities are usually considered that are inclusive enough such that final-state collinear singularities cancel. Hence, no corresponding large logarithms appear in the final result. Initial-state collinear singularities are factorised into parton distribution functions. Thus, it is possible to set m = 0, often to a very good approximation. In QED, this is not the case. Many distributions that are measured are dominated by these logarithms, such that it is often not possible to work with massless leptons. The dominance of the logarithmic terms can be exploited to obtain approximate expressions for higher-order corrections, see e.g. [14] for a review. Keeping finite fermion masses is a substantial complication for the evaluation of virtual corrections. In addition it potentially leads to numerical problems if a fully differential Monte Carlo approach is taken. Thus, in many cases QED results cannot be simply extracted from corresponding QCD results, but a dedicated effort is required.
The Fortran 95 code McMule can be downloaded at https://gitlab.psi.ch/mcmule/mcmule where also an up-to-date table of implemented processes, a documentation, and some sample results can be found. At the time of writing, the following processes are implemented: where ℓ and ℓ ′ are different leptons and l is either equal to ℓ ′ or the third possible lepton. The lepton decay processes are computed in the Fermi theory. For the processes with a proton p the approximation is made whereby its interaction is only due to the exchange of a single photon. In this article we will start in Section 2 by briefly recapitulating the techniques we use to do fully differential higher-order QED calculations. The structure of the code, which consists of several modules with a simple, mostly hierarchic structure is described in Section 3. In Section 4 we perform a basic leading-order (LO) calculation in order to illustrate how to run the code. The following two sections are devoted to our main new phenomenological results. We start with MUonE in Section 5. First we explain how to use McMule to reproduce next-to-leading order (NLO) results available in the literature [15]. Then we present new results for µ-e scattering, including numerically dominant NNLO corrections. Section 6 is devoted to lepton-proton scattering. We discuss how to extend the partial NNLO calculation of the previous section to elastic e-p and µ-p scattering and provide some phenomenological results adapted to P2 and MUSE. These processes are just the beginning of the McMule programme. In Section 7 we discuss possible future developments of McMule. Finally, the input parameters used by McMule are listed in Appendix A.

QED corrections as implemented in McMule
As mentioned in the introduction, some of the techniques used within McMule are somewhat different to what is typically used for higher-order QED calculations. For a start, infrared singularities due to soft photons are regularised through dimensional regularisation in d = 4 − 2ǫ dimensions. The photon is kept strictly massless also in intermediate steps. However, the masses of the fermions are always kept at their physical value and not set to zero. This regularises all collinear singularities in QED and gives rise to terms involving the logarithm of the fermion mass, log(m 2 /Q 2 ). Such terms often form the dominant corrections in QED and, thus, it is essential to keep fermion masses different from zero. This leads to a substantial complication in the evaluation of virtual corrections. If the two-loop amplitudes are available only for massless fermions, massification [16][17][18][19][20] can be used to obtain amplitudes suitable for our approach. While it is possible to partially resum logarithmic terms, at the current stage no effort is made within McMule to do so. Presently, McMule is a strict fixed-order fully differential particle/parton-level Monte Carlo integrator.
Since we are dealing with low-energy processes we always renormalise the fermion masses and coupling in the on-shell scheme. The treatment of infrared singularities that occur when combining real and virtual corrections is coded according to the FKS subtraction method [21,22] and its generalisation beyond NLO for massive QED, FKS ℓ [13].
The core idea of this method is to render the phase-space integration of a real matrix element finite by subtracting all possible soft limits. The subtracted pieces are partially integrated over the phase space and combined with the virtual matrix elements to form finite integrands. For a detailed discussion of the method we refer to [13]. Here, we just give a schematic overview with the basic information required to understand the structure of the code.
The NLO corrections σ (1) to a cross section are split into a n-particle and (n + 1)-particle contribution and are written as In (2c), ξ 1 is a variable of the (n + 1)-parton phase space dΦ d=4 n+1 that corresponds to the (scaled) energy of the emitted photon. For ξ 1 → 0 the real matrix element M (0)f n+1 develops a singularity. The superscripts (0) and f indicate that the matrix element is computed at tree level and is finite, i.e. free of explicit infrared poles 1/ǫ. In order to avoid an implicit infrared pole upon integration, the ξ 1 integration is modified by the factor ξ 1 (1/ξ 1 ) c , where the distribution (1/ξ 1 ) c acts on a test function f (ξ 1 ) as Thus, for ξ 1 < ξ c , the integrand is modified through the subtraction of the soft limit f (0). This renders the integration finite. However, it also modifies the result. The missing piece of the real corrections can be trivially integrated over ξ 1 . This results in the integrated eikonal factorÊ(ξ c ) times the tree-level matrix element for the n-particle process, M n . The factorÊ(ξ c ) has an explicit 1/ǫ pole that cancels precisely the corresponding pole in the virtual matrix element M (1) n . Thus, the combined integrand of (2b) is free of explicit poles, hence denoted by M (1)f n , and can be integrated numerically over the n-particle phase space dΦ d=4 n .
The parameter ξ c that has been introduced to split the real corrections can be chosen arbitrarily as long as where the sum is over all masses in the final state. The ξ c dependence has to cancel exactly between (2b) and (2c) since at no point any approximation was made in the integration. Checking this independence is a very useful tool to test the implementation of the method as well as its numerical stability. The finite matrix element M (1)f n is simply the first-order expansion of the general YFS exponentiation formula [12] for soft singularities where we exploited the implicit factor α inÊ. As detailed in [13], for QED with massive fermions this scheme can be extended to NNLO and, in fact, beyond. The NNLO corrections are split into three parts Thus we have to evaluate n-parton contributions, single-subtracted (n + 1)-parton contributions, and double-subtracted (n + 2)-parton contributions. This structure will be mirrored in the Fortran code. The ξ c dependence cancels, once all three contributions are taken into account. An example of this will be shown in Figure 6. The method described above has actually already been used for several processes. The radiative [23] and rare decay [24] of the muon and tau [25] have been implemented at NLO in the Fermi theory in a fully differential code. In addition, the Michel decay of the muon has been added at NNLO [13]. These results have been verified by comparison to more analytic and more inclusive computations [26][27][28][29][30]. Thus, the method is fully established and McMule can be seen as a natural extension of these previous computations and a container to include further phenomenologically relevant processes.

Structure of McMule
McMule is written in Fortran 95 with helper and analysis tools written in python 1 . An online documentation can be found at the git repository listed in the introduction [31]. The code is written with two kinds of applications in mind. First, several processes are implemented, some at NLO, some at NNLO. Since new processes are continuously added, we refer to the online documentation for a list of available processes. For these, the user can define an arbitrary (infrared safe), fully differential observable and compute cross sections and distributions. Second, the program is set up such that additional processes can be implemented by supplying the relevant matrix elements.
To obtain a copy of McMule we recommend the following approach $ git clone --r e c u r s i v e https :// gitlab . psi . ch / mcmule / mcmule To build McMule, a Fortran compiler such as gfortran and a python installation is needed. The main executable can be compiled by running $ ./ c o n f i g u r e $ make mcmule Alternatively, we provide a Docker container [32] for easy deployment and legacy results. In multiuser environments, udocker [33] can be used instead. In either case, a pre-compiled copy of the code can be obtained by calling $ docker pull yulrich / mcmule # requires Docker to be i n s t a l l e d $ udocker pull yulrich / mcmule # requires uDocker to be i n s t a l l e d We provide instructions on how McMule is used in Section 4.
When started, mcmule reads options from stdin as specified in Table 1 of Section 4. The value and error estimate of the integration is printed to stdout and the full status of the integration is written in a machine-readable format into a folder called out/ (see below).
McMule consists of several modules with a simple, mostly hierarchic structure. The relation between the most important Fortran modules is depicted in Figure 1. A solid arrow indicates "using" the full module, whereas a dashed arrow is indicative of partial use. In what follows we give a brief description of the various modules and mention some variables that play a prominent role in the interplay between the modules. global def: This module simply provides some parameters such as fermion masses that are needed throughout the code. It also defines prec as a generic type for the precision used. 2 Currently, this simply corresponds to double precision.
functions: This is a library of basic functions that are needed at various points in the code. This includes dot products, eikonal factors, the integrated eikonal, and an interface for scalar integral functions among others.
collier: This is an external module [34][35][36][37]. It will be linked to McMule during compilation and provides the numerical evaluations of the scalar and in some cases tensor integral functions in functions.
phase space: The routines for generating phase-space points and their weights are collected in this module. Phase-space routines ending with FKS are prepared for the FKS subtraction procedure with a single unresolved photon. In the weight of such routines a factor ξ 1 is omitted to allow the implementation of the distributions in the FKS method, see (2c). This corresponds to a global variable xiout. This factor has to be included in the integrand of the module integrands. Also the variable ksoft is provided that corresponds to the photon momentum without the (vanishing) energy factor ξ 1 . Routines ending with FKSS are routines with two unresolved photons, see (6c). Correspondingly, a factor ξ 1 ξ 2 is missing in the weight. The global variables xiout1 and xiout2 as well as ksoft1 and ksoft2 are provided.  Matrix elements are grouped into process groups such as muon decay (mudec) or µ-e and µ-p scattering (mue). Each process group contains a mat el module that provides all matrix elements for its group. Simple matrix elements are coded directly in this module. More complicated results are imported from sub-modules not shown in Figure 1. A matrix element starting with P contains a polarised initial state. A matrix element ending in av is averaged over a neutrino pair in the final state.
{pg}: In this module the soft limits of all applicable matrix elements of a process group are provided to allow for the soft subtractions required in the FKS scheme. These limits are simply the eikonal factor evaluated with ksoft from phase space times the reduced matrix element, provided through mat el.
This module also functions as the interface of the process group, exposing all necessary functions that are imported by mat el, which collects all matrix elements as well as their particle labelling or particle identification.
user: For a user of the code who wants to run for an already implemented process, this is the only relevant module. At the beginning of the module, the user has to specify the number of quantities to be computed, nr q, the number of bins in the histogram, nr bins, as well as their lower and upper boundaries, min val and max val. The last three quantities are arrays of length nr q. The quantities themselves, i.e. the measurement function, is to be defined by the user in terms of the momenta of the particles in quant. Cuts can be applied by setting the logical variable pass cut to false 4 . Some auxiliary functions like (pseudo)rapidity, transverse momentum etc. are predefined in functions. Each quantity has to be given a name through the array names.
Further, user contains a subroutine called inituser. This allows the user to read additional input at runtime, for example which of multiple cuts should be calculated. It also allows the user to print some information on the configuration implemented.
vegas: As the name suggests this module contains the adaptive Monte Carlo routine vegas [38]. The binning routine bin it is also in this module, hence the need for the binning metadata, i.e. the number of bins and histograms (nr bins and nr q, respectively) as well as their bounds (min val and max val) and names, from user.
integrands: In this module the functions that are to be integrated by vegas are coded. There are three types of integrands: non-subtracted, single-subtracted, and double-subtracted integrands, corresponding to the three parts of (6). The matrix elements to be evaluated and the phase-space routines used are set using function pointers through a subroutine initpiece.
The factors ξ i that were omitted in the phase-space weight have to be included here for the single-and double-subtracted integrands.
mcmule: This is the main program, but actually does little else than read the inputs and call vegas with a function provided by integrands.
test: For developing purposes, a separate main program exists that is used to validate the code after each change. Reference values for matrix elements and results of short integrations are stored here and compared against.
The library of matrix elements deserves a few comments. As matrix elements quickly become very large, we store them separately from the main code. This makes it also easy to extend the program by minimising the code that needs to be changed.
We group the various contributions into process group, generic processes, and generic pieces as indicated in Figure 2. The generic process is a prototype for the physical process such as ℓp → ℓp (cf. Section 6) where the flavour of the lepton ℓ is left open. The generic piece describes a part of the calculation such as the real or virtual corrections, i.e. the different pieces of (2) (or correspondingly (6) at NNLO), that themselves may be further subdivided as is convenient. In particular, in some cases a generic piece is split into various phase-space partitions, as in the example of em2emREE in Figure 2. A more detailed listing of the various contributions required for µ-e scattering is given in Figure 5.
When running mcmule, the code generates a statefile from which the full state of the integrator can be reconstructed should the integration be interrupted. This makes the statefile ideal to also store results in a compact format. To analyse these results, we provide a python tool pymule, additionally to the main code for McMule. The tool pymule, which can be found under tools/pymule, uses numpy [39] for data storage and matplotlib for plotting [40]. While pymule works with any python interpreter, IPython [41] is recommended. A full list of functions provided can be found in the online manual of pymule [31].
An important issue are numerical instabilities arising in problematic regions of the phase space. This is typically the case if an emitted photon becomes soft or collinear to a massive, but light, fermion. For soft photon emission the numerical instability is related to the FKS subtraction discussed in Section 2. When ξ 1 becomes very small, the difference (3) becomes potentially troublesome as f (ξ 1 ) can be calculated less precisely than f (0). To avoid this, we choose a very small softcut, below which we set the integrand directly to zero. In the collinear case small fermion masses give rise to pseudo-collinear singularities that further complicate a numerical stable evaluation of the matrix element. McMule addresses this issue through a dedicated tuning of the phase-space parametrisation to help the vegas integration find and deal with these problematic regions. In addition, a collcut is applied if the photon becomes very collinear to a light fermion. During development, softcut and collcut are varied to make sure that, within the integration error, the cross section is independent of the chosen values. Afterwards, a suitable value is chosen and hard-coded. However, the user retains the ability to modify this in inituser.

Running McMule: double radiative muon decay as an example
In order to provide a simple example with concrete instructions on how to run the code and to illustrate how it works, we consider the double radiative decay of the muon µ → e[νν]γγ at leading order. Since the neutrinos are not detected, we average over them, indicated by the brackets. Hence, we have to be fully inclusive with respect to the neutrinos. But the code allows to make any cut on the other final-state particles.
To be concrete let us assume we want to compute two distributions, the missing energy / E ≡ E(µ) − E(e) − E(γ 1 ) − E(γ 2 ) and cos θ e , the cosine of the angle between the outgoing positron and the muon polarisation. Both quantities are determined in the rest frame of the decaying muon. Of course, / E corresponds to the combined energies of the neutrinos. To avoid an infrared singularity in the branching ratio, we have to require a minimum energy of the photons. We choose this to be E γ ≥ 10 MeV individually for both photons. In addition, we require for the angle between the two photons θ γγ > 15 • .
As mentioned in Section 3 the quantities are defined in the module user (file src/user.f95).
McMule process group mudec generic process m2enn: µ → ννe, τ → ννe, τ → ννµ generic piece m2enn0 generic piece m2ennF generic piece m2ennR  where we have decided to have 50 bins for both distributions and nr q determines the number of distributions. The boundaries for the distributions are set as 0 < / E < 50 MeV and −1 ≤ cos θ e ≤ 1. The quantities themselves are defined in the function quant. This function takes arguments q1 to q7. These are the momenta of the particles, arrays of length 4 with the fourth entry the energy. To figure out which momentum corresponds to which particle the user needs to check the headers in the module mat el or in the manual [31]. In our case, we find ! ! From file m u d e c _ p m 2 e n n g g a v . f95 use mudec , only : p m 2 e n n g g a v! !( p1 , n1 , p2 , p3 , p4 , p5 , p6 ) ! ! for massive ( and massless ) electron ! ! average over neutrino tensor taken Indicating that we have p1 for the incoming µ, p2 for the outgoing e, and p5 and p6 for the two outgoing photons. The momenta of the neutrinos must be given but do not enter, as we average over them. Schematically, the function quant might look like if ( p5 (4) < 10. _prec . or . p6 (4) < 10. _prec ) pass_cut = . false . if ( cos_th ( p5 , p6 ) > 0 . 9 6 5 9 2 6 ) pass_cut = . false .
Here, we have used the function cos th provided by the module functions. This returns the cosine of the angle between the two momenta given as arguments. We have also specified the polarisation vector pol1 in accordance with the µ + beam used by MEG. This polarisation has been measured [42] to be P µ = −0.85 ± 0.05. Since McMule defines the polarisation through µ − , the sign has to be changed. The variable pass cut controls the cuts. Initially it is set to true, to indicate that the event is kept. Applying a cut amounts to setting pass cut to false. All that remains to be done is to prepare the input read by mcmule from stdin, as specified in Table 1.
Variable name Data type Comment nenter ad integer calls / iteration during pre-conditioning itmx ad integer iterations during pre-conditioning nenter integer calls / iteration during main run itmx integer iterations during main run ran seed integer random seed xinormcut1 real(prec) the 0 < ξ c ≤ 1 parameter xinormcut2 real(prec) the second ξ c parameter for NNLO (or the δ cut ) which piece char (20) the part of the calculation to perform flavour char(8) the particles involved (opt) unknown the user can request further input during userinit To be concrete let us assume we want to use 10 iterations with 1000 × 10 3 points each for pre-conditioning and 20 iterations with 5000 × 10 3 points each for the actual numerical evaluation. We pick a random seed, say 24225, and for the input variable which piece we enter m2enngR. Since the double radiative muon decay is not on the list of processes (1), we actually compute the real corrections (hence the suffix R) of the generic process µ → eννγ. The flavour variable is set to mu-e. We could e.g. use tau-e to change from the generic process µ → eννγ to the process τ → eννγ. This system will be used for other processes as well. The input variable which piece determines the generic process and the part of it that is to be computed (i.e. tree level, real, double-virtual etc.). In a second step, the input flavour associates actual numbers to the parameters entering the matrix elements and phase-space generation.
Thus, we run the code by giving the input $ ./ mcmule 1000 10 5000 In practice the input will typically not be given by hand. We mention a more efficient way in Section 5 as well as the manual [31]. The two variables xinormcut1 and xinormcut2 have no effect at all for a tree-level calculation and will be discussed in Section 5.1 in the context of the NLO and NNLO run for muon-electron scattering. We also ignore the optional input for the moment.
Alternatively, McMule can be run using Docker or udocker without compiling it first by running $ ./ tools / run -docker . sh -i yulrich / mcmule : latest \ -u path / to / user . f95 -r followed by the same input as above. Now the mule is ready to trot. After about fifteen minutes on an Intel i5 processor, it returns which, after adding the factor G 2 F α 2 /Γ µ results in a branching ratio of B = 2.6611(4) · 10 −5 . Here Γ µ is the measured width of the muon and G F the Fermi constant as given in the Appendix A. The error is of course only the statistical error of the Monte Carlo and not a theory error. In addition, the two distributions are written into a binary file that contains the full state of the integrator out/m2enngR mu-e S0000024225X1.00000D1.00000 ITMX020x005M.vegas. The corresponding results are shown as green histograms in Figure 3, where dB/d cos θ e has been normalised.
The results have rather poor statistics. In particular the precision in the low-energy tail of the sharply falling / E distribution is very low since the Monte Carlo generates very few points there. If the user is interested in this tail it is advisable to perform dedicated runs. This can be done simply by adding a cut like if ( emiss > 20.) pass_cut = . false .
in quant. The Monte Carlo will then adapt and result in a more precise determination of the / E distribution in the region / E < 20 MeV. For illustration in Figure 3 such a tailored run with the same statistics is overlayed in yellow to the original run in the plot for / E. This is all that is required for simply running the code. In what follows we give a brief outline how the code works. The first step it does in mcmule is to associate the numerical values of the masses, as specified through flavour. In particular, the generic masses Mm and Me are set to Mmu and Mel. This is done in initflavour(scms), defined in global def. For other processes this might also involve setting e.g. centre-of-mass energies scms to default values.
Next, the function to be integrated by vegas is determined. This is a function stored in integrands. There are basically three types of integrands: a standard, non-subtracted integrand sigma 0, a single-subtracted integrand sigma 1 needed beyond LO, and a double-subtracted integrand sigma 2 needed beyond NLO. It is the variable which piece that determines which of the three functions is called. Usually, for a LO case, we only need sigma 0. However, since the process µ → eννγγ as such is not implemented in McMule, we compute it at LO by calling the real corrections of the radiative muon decay µ → eννγ. Thus, from a technical point of view we call a single-subtracted integrand. The function quant, however, is constructed such that no subtraction takes place. This is ensured by the demand E γ > 10 MeV. In addition, which piece determines ndim, the dimension of the integration (11 in our case), and the matrix element that needs to be called, Pm2ennggAV(q1,n1,q2...q6). The name of the function suggests we compute µ(q 1 , n 1 ) → e(q 2 )[νν]γ(q 5 )γ(q 6 ) with the polarisation vector n1 of the initial lepton, and the neutrinos are averaged over. Note that the momenta of the neutrinos are given as arguments, even if they are redundant. This simplifies the code a lot because it means that all matrix elements have the same calling convention.
The interplay between the function sigma 1(x,wgt,ndim) and vegas is as usual, through an array of random numbers x of length ndim. In addition the vegas weight of the event, wgt, is passed. The function sigma 1 simply evaluates the complete weight wg of a particular event by combining wgt with the matrix element supplemented by symmetry, flux, and phase-space factors. In a first step a phase-space routine of phase space is called. For our calculation this is the optimised phase space psd6 p 25 26 m50 fks(x,p1,Mm,p2,Me,p3...p6,0.,weight) generating the momenta with correct masses as well as the phase-space weight weight. The d in the name of the phase-space routine indicates that we are considering a decay process (one initial state particle), the 6 indicates the total number of momenta generated and the meaning of fks will be explained below. The other labels indicate the particular tuning and partition which are irrelevant in this case. With these momenta the observables to be computed are evaluated with a call to quant. If one of them passes the cuts, the variable cuts is set to true. This triggers the computation of the matrix element and the assembly of the full weight. In a last step, the routine bin it, stored in vegas, is called to put the weight into the correct bins of the various distributions. These steps are done for all events and those after pre-conditioning are used to obtain the final distributions.
Since, technically speaking, we are computing a subtracted matrix element, the code also generates for each event the associated soft event, i.e. the same event with ξ 1 → 0. This is realised by having a parametrisation of the phase space, such that setting the first entry of x to 0 results in ξ 1 → 0. Such a phase-space routine is called FKS compatible and named with the ending fks. It is then checked whether the subtraction condition ξ 1 < ξ c , (3), is satisfied. If yes, quant is evaluated with this new set of momenta, and if the event passes, the soft limit of the matrix element is evaluated and the subtraction is performed according to (3). The global variable xiout is required for this, since it is left out of the FKS phase-space weight and has to be included in the integrand. In our case, the soft event never passes the cuts, due to the requirement q6(4) > 10. in quant.
To conclude this section, we mention that the process considered here is actually relevant to searches for lepton-flavour violating decays mediated by a light particle X. Indeed, double radiative muon decay µ → eννγγ in the region of very small / E cannot be distinguished from µ → eX with X → γγ.
MEG has performed a search for this decay [43]. In order to assess the background from double radiative muon decay, we have computed the / E distribution, with cuts adapted to the MEG detector. Here, θ γ and φ γ are the polar and azimuthal angles of the two photons. Further, we require that the two photons can be separated in the calorimeter. This is implemented by specifying them to be δx = 20 cm apart on the detector surface which is at a radius of R = 67.85 cm resulting in The results are shown in Figure 4, where special emphasis has been given to the small / E region. Integrating the differential distribution up to / E = 10 MeV yields a branching ratio of B( / E < 10 MeV) = 1.2 × 10 −14 .

Muon-electron scattering with McMule
Muon-electron scattering is a classic process which at low energy is completely dominated by QED. There is renewed interest in this process in connection with the long-standing (3 − 4)σ discrepancy between the anomalous magnetic moment or (g − 2) µ of the muon and its Standard Model prediction. The theory calculation of (g − 2) µ suffers from uncertainties originating from non-perturbative hadronic corrections. The largest source of this uncertainty is the HVP, followed by the contribution due to hadronic light-by-light scattering [44]. A better understanding of the hadronic contributions is therefore of utmost importance, even more so in light of the new (g − 2) µ experiments at Fermilab [45] and J-Parc [46] that will further increase the experimental precision achieved by the BNL E831 experiment [47]. The HVP correction can be related to measurement data of electron-positron annihilation using a dispersive approach [48,49]. The resulting integrand is, however, highly fluctuating due to hadronic resonances and threshold effects. This makes the corresponding analysis rather challenging. Furthermore, a recent lattice evaluation [50] of the HVP contribution to (g − 2) µ substantially deviates from the dispersive approach.
In this context it has been proposed to extract the HVP corrections from the measurement of the running of the QED coupling in the space-like region [51]. Contrary to the traditional time-like approach, the corresponding integrand is smooth and free of resonances. Moreover, this would yield an independent determination of the HVP contribution resulting, in turn, in a better understanding of the theory error.
While the original proposal was based on Bhabha scattering [51][52][53], it was recently established that the elastic scattering of muons on atomic electrons could, in principle, be sufficiently sensitive to reach a competitive precision with this novel approach [54]. This is the objective of the proposed MUonE experiment [8]. Since in this case the effect of the HVP to the running of the QED coupling for muon-electron scattering ranges from 10 −3 to 10 −5 , the differential cross section would have to be measured at a precision of 10 ppm.
The radiative corrections to muon-electron scattering represent one source of systematic uncertainty that has to be carefully studied [55]. To reach the target precision these corrections have to be known at a level of 10 ppm as well. The effect of hadronic corrections was recently addressed in [56,57] using two independent methods. At leading order also Z-exchange has to be taken into account. The main corrections are, however, due to QED radiation. The minimal requirement is expected to be the NNLO QED corrections, for which we have to consider up to two photons in the final state matched to leading-logarithmic resummation. In this section we report on the progress towards this goal made through McMule.

Running McMule for muon-electron scattering
The NLO QED corrections to muon-electron scattering have been known for a long time [58,59]. Motivated by the MUonE experiment, they have been revisited and, together with the NLO electroweak corrections, implemented in a fully differential Monte Carlo code [15]. As a first step towards a sufficiently precise description of muon-electron scattering within McMule, we have also implemented the NLO QED corrections. We have compared our results with [15] and found full agreement. McMule also contains the dominant electronic NNLO corrections that are proportional to Q 2 µ Q 6 e , where Q µ and Q e denote the charge of the muon and electron, respectively [55]. Also this part of the code is fully verified after comparing the observables defined in 'Setup 2' and 'Setup 4' of [15] with [60]. Since [15,60] use a photon mass as infrared regulator and the phase-space slicing method, the agreement is a strong cross check for a correct technical implementation. Details of the computation and physical results will be presented in Section 5.2. In this section we focus on a description on how to run the code.
There are several changes with respect to the example discussed in Section 4. First of all, the process is different. The generic process now is em2em. For a tree-level computation we can proceed analogous to Section 4 with which piece set to em2em0. For a NLO computation we need to evaluate the virtual and real corrections. As shown in (2), using FKS this results in two terms, the subtracted real corrections (2c) and the finite virtual corrections (2b), i.e. the virtual corrections combined with the infrared counterterm. The corresponding which piece are em2emR and em2emF, respectively.
The results obtained with em2emR and em2emF taken separately are ξ c dependent. This dependence has to cancel in the sum. The ξ c parameter is set through the variable xinormcut1 of Table 1. The latter has to be set to a value between 0 and 1 and is related to ξ c through (4) as xinormcut1= ξ c /ξ max . Checking the independence of physical results on ξ c serves as a consistency check and is an implicit check on the infrared safety of the observable implemented in quant. To do this, it helps to disentangle em2emF into em2emV and em2emC, according to (2b). 6 The former corresponds to the pure virtual corrections whereas the latter is the infrared counterterm, i.e. the integrated eikonal times the tree-level matrix element. Of course, taken separately these terms are infrared divergent. McMule returns the finite part, as defined in [31,61]. Only em2emC depends on ξ c and this part is typically much faster in the numerical evaluation. process group mue generic process em2em: µe → µe generic piece em2em0 generic piece em2emV, em2emC generic piece em2emFEE, em2emFEM, em2emFMM generic piece em2emA generic piece em2emREE partition em2emREE15, em2emREE35 generic piece em2emREM, em2emRMM generic piece em2emFFEEEE generic piece em2emRFEEEE partition em2emRFEEEE15, em2emRFEEEE35 generic piece em2emRREEEE partition em2emRREEEE1516, em2emRREEEE3536 generic piece em2emAA, em2emAFEE em2emNFEE generic piece em2emAREE partition em2emAREE15, em2emAREE35 Figure 5: A complete list of contributions (which piece) currently implemented for µ-e scattering. This is a subset of Figure 2.
In fact, em2emR and em2emF are divided up further, as can be seen in Figure 5, where a complete tree of possible which piece for the generic process em2em is depicted. This additional separation corresponds to a gauge-invariant split of the NLO corrections into emission/absorption from the electron line EE, emission/absorption from the muon line MM, and the interference EM. As shown in [15], the EE contributions are by far dominant. In addition, these contributions suffer most from pseudo singularities that arise from photon emission nearly collinear to the electron. To deal with these regions of phase space in a numerically stable way, there is one further purely technical partitioning of em2emREE into em2emREE15 and em2emREE35. These two partitions have a tuned phase space in s 15 = 2p 1 ·p 5 and s 35 = 2p 3 ·p 5 , respectively, to deal with initial-state and final-state pseudo-collinear singularities.
Finally, we note that also hadronic contributions are implemented. This is done together with the leptonic vacuum polarisation (VP) in em2emA. The user can then set the variables nel, nmu, ntau, and nhad to decide which contributions to include. For the calculation of the HVP the Fortran library alphaQED [62][63][64] is used. Specifically, we rely on the hadronic stand-alone version hadr5n12.f.
Choosing different random seeds, varying ξ c and having to compute the various real and virtual parts results in quite a few jobs. A particularly convenient way to run McMule is using menu files. A menu file contains a list of jobs to be computed such that the user will only have to vary the random seed and ξ c as the statistical requirements are defined globally in a config file. This is completed by a submission script, usually called submit.sh. The submit script is what will need to be launched. It will take care of the starting of different jobs. It can be run on a normal computer or on a Slurm cluster [65].
To prepare the run in this way we can use pymule, a tool provided together with McMule. When using pymule create, we are asked various questions, most of which have a default answer in square brackets. In the end pymule will create a directory, where all results will be stored. In addition pymule also provides tools to analyse the results, such as combining runs with different random seeds and different choices of ξ c . A more detailed description of pymule can be found in the online documentation [31,61].
Moving from NLO to NNLO increases the number of partial results further. Now we have to run with which piece set to em2emRREEEE (double-real corrections), em2emRFEEEE (real virtual corrections), and em2emFFEEEE (double-virtual corrections). As discussed above, the additional ending EEEE indicates that only electronic corrections are included. As for the real corrections, also the real-virtual and the double-real contributions are computed with a partition to disentangle initial-state and final-state pseudo-collinear singularities.
Also at NNLO the correction due to hadronic and leptonic VP is included. These contributions are split up according to the classification of [57]. The diagrams where the VP factorises are implemented in em2emAA, em2emAR, and em2emAF. The former takes into account diagrams with one or two insertions of the VP into the tree level diagram. The latter two implement QED NLO corrections combined with one insertion. The remaining non-factorisable vertex correction can be computed via em2emNF. This relies on the results of [56] which uses the hyperspherical integration method to calculate the hadronic corrections to muon-electron scattering [66].
As listed in Table 1, running at NNLO there are two ξ c variables to be set in the input. However, to obtain ξ c independent physical results it is imperative that they are set equal, xinormcut1=xinormcut2. The reason McMule works with two variables is that for computations with massless fermions, xinormcut2 corresponds to the unphysical cut variable related to the collinear subtraction, often denoted by δ cut . Also, an independent xinormcut2 can be used for internal checks.
An example of a typical check of the ξ c (in)dependence is shown in Figure 6, where the nparticle (orange), (n + 1)-particle (green), and (n + 2)-particle (red) contributions are shown separately for the total cross section according to 'Setup 4' of [15]. 7 These are just the parts given in (6). In the sum (blue) the ξ c dependence cancels. This can be seen particularly well in the bottom panel, where the results of seven separate choices of ξ c are shown, together with a 1σ band of a fit to the ξ c dependence.
Once more we stress that this cancellation is exact. Thus, in principle any choice is allowed. However, for very small choices of ξ c there are large numerical cancellations. In the case of production runs, it is thus advisable to pick a value of ξ c where the separate contributions have roughly the same magnitude as the final result. From experience, a choice around ξ c ∼ 0.1 is a good starting point.

The dominant NNLO corrections
We now turn to the technical details of the calculation as well as the phenomenological discussion of the results. As previously mentioned, at NNLO we restrict ourselves to the gauge-invariant subgroup that only contains electronic corrections, i.e. contributions proportional to Q 2 µ Q 6 e . These corrections are expected to be dominant compared to the other contributions at this perturbative order as a consequence of enhanced collinear logarithms. To be consistent, at NNLO we therefore also only include VP with electrons inside the loop.
Furthermore, we assume the electron to be unpolarised in correspondence with the atomic electrons of the MUonE experiment. Our results are therefore independent of the muon polarisation due to parity invariance of QED. Additionally, the considered gauge-invariant subset is also independent of whether the muon beam consists of µ + or µ − . This does not hold, however, for the full set of NLO QED corrections that is included here. This includes the muon and tau VP.
The double-virtual diagrams were calculated with the full electron mass dependence using the analytic expressions for the heavy quark form factors of [67]. Furthermore, the genuine two-loop corrections to the photon self-energy were taken from [68]. The diagrams for the real-virtual and double-real contributions were generated using QGraf [69] and calculated with Package-X [70]. An independent calculation was performed using FORM [71]. Complicated scalar triangle-and box-functions were then evaluated with the COLLIER library [34]. Additionally, COLLIER was used to perform a numerical stable tensor reduction in problematic regions of the phase space.
With the momenta of the particles labelled as in (9) we define the invariants t e = (p 1 − p 3 ) 2 and t µ = (p 2 − p 4 ) 2 . In the case of purely virtual corrections we have t e = t µ . The energy of the outgoing electron and muon are denoted by E e and E µ , respectively. Additionally, we use θ e and θ µ as the corresponding scattering angles relative to the beam axis. We further assume a muon beam of energy E = 150 GeV, consistent with the M2 beam line at CERN North Area [8].
The total cross section is ill-defined due to the behaviour dσ/dt ∼ t −2 with t min ≤ t ≤ 0. We therefore have to apply a cut on the maximal value of t or equivalently on the minimal energy of the outgoing electron. In all of the results below we have chosen E e > 1 GeV. To model the geometry of the detector we require in addition that θ µ > 0.3 mrad.
with s the centre-of-mass energy. This allows to restrict radiation with the elasticity cut In the following we present results with and without this additional cut, in order to analyse its impact on the radiative corrections. A similar effect can be expected as in the case of the acoplanarity cut of [15], where the NLO corrections flatten out significantly.
In summary, we consider the two scenarios • S1: E e > 1 GeV, θ µ > 0.3 mrad, The order-by-order contributions, σ (i) , to the integrated cross section, σ 2 = σ (0) + σ (1) + σ (2) , are presented in Table 2. 8 It also shows the corresponding K factors defined as In particular, we present distributions with respect to θ e and t µ . The differential cross section at LO as well as at NNLO are displayed in the upper panels. In addition, the lower panels show the differential K factors with x ∈ {θ e , t µ }. In dotted lines, the K factors without the inclusion of the VP are shown. We first remark that the numerical error for the distribution dσ/dθ e (Figure 7) is much smaller than for dσ/dt µ (Figure 8). This is due to the fact that the cross section in the latter case is practically zero in most parts of the kinematically allowed region. As exemplified in Section 4, the statistics for dσ/dt µ could be drastically improved using tailored runs. Nevertheless, the discontinuities of Figure 8 indicate that the Monte Carlo error for individual bins provided by McMule might be underestimated.
Furthermore, sizeable NLO and NNLO corrections of up to 30% and 0.5%, respectively, can be observed. Naively, one could therefore conclude that the target precision of 10 ppm of MUonE is far out of reach. First of all, however, it has to be noted that the enhancement of the corrections at the end points of the distributions are due to soft photon emission. For a reliable description in this region, the logarithms need to be resummed. The leading logarithms can be resummed with a  7: The differential cross section w.r.t. θ e at LO (green) and NNLO (red) for scenarios S1 (without elasticity cut) and S2 (with elasticity cut). The NLO and NNLO K factors are shown in blue and red, respectively. parton shower. Moreover as detailed in [55], also the calculation of the next-to-leading logarithms to all orders might be feasible. Secondly, the elasticity cut has the important effect of significantly reducing the variation in the K factors. Since the MUonE experiment proposes to measure ratios of cross sections of different kinematic regions to cancel systematic uncertainties as opposed to absolute values, the flatness of the corrections is highly advantageous [55,72].

Lepton-proton scattering with McMule
There is a long history in the study of elastic electron-proton and muon-proton scattering and the computation of radiative corrections to these processes started in the sixties [73]. If lepton-mass effects are taken into account appropriately, the two processes are the same from a computational point of view. Typically, the corrections to lepton-proton scattering at NLO [74,75] are split into different gauge-independent parts: corrections from the lepton line, corrections due to VP effects, corrections from the proton line and the so-called two-photon exchange corrections [76][77][78][79][80]. The latter contain contributions from inelastic intermediate states. This makes it difficult to obtain solid predictions from first principles. Looking at the charge asymmetry, i.e. the difference between ℓ + p and ℓ − p scattering, is a useful tool to gain more information on two-photon exchange [81].
Going beyond NLO, the situation becomes considerably more complicated. To mention just two complications, the VP effects cannot be factorised any longer [57] and apart from the emission of photons also the emission of an additional ℓ + ℓ − pair potentially needs to be considered.
Due to the small mass of the electron, corrections from the lepton line are typically dominant for ep → ep. Hence, they received particular attention. Effects beyond the the soft approximation for the radiation from the electron were considered [82], as well as resummation of leading logarithmic effects [83]. Recently a calculation including the corresponding NNLO effects was presented in [84]. Resummation was also studied in an effective theory approach [85].
For these corrections, the only difference between lepton-proton scattering and the results presented in the previous section is due to the fact that the proton is not pointlike. This can be accounted for by parametrising the photon-proton interaction through form factors. Of course, electron-proton scattering has been used to determine these form factors and, in particular, their behaviour for small momentum transfer squared. This allows for an extraction of the proton radius, see e.g. [86]. However, for the results presented in this section we will simply use the standard dipole form factors, as given in Appendix A.
Thus, in this section we show NNLO results for unpolarised elastic lepton-proton scattering in the approximation that the lepton interacts with the proton through the exchange of a single photon with the standard dipole form factor. All lepton mass effects as well as leptonic and hadronic VP effects are taken into account. On the other hand, two (or more) photon exchange as well as radiation from the proton is neglected. We also make the assumption that there are no additional lepton pairs in the final state.

NNLO effects in elastic electron-proton scattering
As a first example, we consider e − p → e − p in a setting with kinematics adapted to the P2 experiment [6]. An incoming electron of energy E = 155 MeV is scattering off a proton initially at rest. We consider scattering angles in the range 25 • < θ e < 45 • . Following [84], we also apply a cut on the energy of the outgoing electron and require E e > 45 MeV.  Starting with the total cross section (subject to the cuts above) we list the results in Table 3. Apart from listing the full NLO and NNLO corrections, σ (1) and σ (2) , we also give separately the VP contribution (leptonic and hadronic) to the NLO and NNLO corrections. While the NLO corrections are rather large (about 5% with the VP contributing about 1%) the NNLO corrections are below 0.1%.
The first differential observable we consider is dσ/dθ e . In the top panel of Figure 9 we show the LO (green) and NNLO (red) differential cross section. The latter includes VP contributions. In order to assess the effect of higher-order corrections we show the K factors in the bottom panel. The solid (dotted) lines refer to the corrections with (without) VP contributions. Since there are no large logarithms for this observable, the size of the corrections is in agreement with the expectation due to the counting of powers of α for all values of θ e . Consequently, the missing N 3 LO contributions due to emission from the electron are expected to be O(10 −6 ) and, hence, negligible. Emission from the proton and two-photon exchange contributions, however, will need to be properly taken into account.
Results similar to those shown in Figure 9 have been presented in [84], not including VP contributions. Our NLO results (without VP) agree with these results. However, we disagree substantially with the NNLO corrections of [84], even if we adapt to their calculation and include the electron loop in the two-loop vertex diagram. With respect to the results presented in Section 5.2 that have been verified independently by [60], the only new ingredients are the matrix elements. They have been compared pointwise with [84] and agree.
As a second example we show dσ/d|t| in Figure 10. The difference between the LO result (green) and NNLO result (red) is barely visible in the top panel. The size of the higher-order correction can be read off from the lower panel. While at LO, t = (p 2 − p 4 ) 2 determined from the proton kinematics is the same as t e ≡ (p 1 − p 3 ) 2 determined from electron kinematics, these two quantities start to differ at NLO. In our approximation the 'true' Q 2 that enters the form factors is Q 2 = −t. To illustrate the difference, we show the K factors with |t| as well as the electronic |t e |. The size of the corrections differs by about 20% between the two observables.
Generally speaking, the corrections are well under control for most values of Q 2 = |t|, but increase towards the endpoint as in Figure 8. Indeed, NLO (NNLO) correction up to 10% (0.5%) are found in the tail of the distribution, and to obtain a very precise theoretical prediction in this region, large logarithms would have to be resummed.

NNLO effects in elastic muon-proton scattering
Elastic muon-proton scattering µp → µp can be used to obtain an independent extraction of the proton radius and shed light on possible differences between muons and electrons. In fact, MUSE [9] will measure simultaneously ℓ ± -p scattering with ℓ ∈ {e, µ}. Since we are neglecting two-photon exchange, there is no difference between ℓ + and ℓ − and the only difference to the process of Section 6.1 is the mass of the lepton. As we will see below, the larger mass of the muon     typically results in smaller corrections.
For the purpose of illustration we consider an incoming muon of momentum | p 1 | = 210 MeV scattering off a proton at rest. For the scattering angle range we use 20 • < θ µ < 100 • , as appropriate for MUSE. We include the same contributions as in Section 6.1. Again we start with the total cross section (subject to the cuts above) and present our results in Table 4. In this case, the NLO (NNLO) corrections are just over 10 −2 (10 −4 ) and are actually dominated by the VP contributions.
The results for the differential cross section dσ/dθ µ are depicted in Figure 11. Again we show the K factor with (solid) and without (dotted) VP contributions. This shows the dominance of the VP effects which themselves are entirely driven by the contribution of the electron. The corrections are roughly a factor 4 smaller than for electron-proton scattering shown in Figure 9. Accordingly, we expect N 3 LO corrections from the emission of the muon to contribute well below O(10 −6 ) to dσ/dθ µ . This is encouraging in particular if these effects are seen as background to measure and study two-photon contributions.
As a second differential observable we consider dσ/dE kin µ , where the kinetic energy of the muon is defined as E kin µ ≡ E µ − m µ . At LO there is a one-to-one relation between the scattering angle θ µ and E kin µ . Beyond LO, for a given θ µ there will be events with smaller E kin µ due to additional radiation. In order to illustrate this, we define four θ µ bands as follows: The corresponding values for E kin µ at LO are also indicated. At LO, all events of a given band will fall into this range of E kin µ . This can be seen in the top panel of Figure 12, where dσ/dE kin µ at NNLO is shown in red (band 1), azure (band 2), green (band 3), and yellow (band 4). Outside the LO E kin µ range, the cross section falls sharply and is only non-zero due to radiative events. The middle panel shows the NLO K factor. Since K (1) is formally infinity outside the LO E kin µ range, this factor is only shown in the region where the LO cross section does not vanish. Finally, in the lowest panel we show the NNLO K factor. Within the LO E kin µ range, these corrections are small in accordance with the α 2 suppression. Outside the LO E kin µ range, however, the NNLO corrections are quite large, up to 1.5%. This is not very surprising, since in this kinematic regime the NNLO terms are in fact only a NLO description of the observable.

Future developments of McMule
Once a mule has made up its mind, it is difficult to stop it. Hence, there will be continuous further developments and extensions of the code.
Roughly speaking, further developments can be divided into two classes. First, new processes or more complete descriptions of already implemented ones will be added. Second, there will be technical advances that improve the performance and precision, and potentially enable the implementation of more complicated processes.
With respect to the first class, work is ongoing to implement Møller scattering and e + e − → γ γ at NNLO. As an example for improvements on already implemented scattering processes we mention the inclusion of polarisation effects. So far, only the descriptions of decays are available for polarised initial states. Many of the processes mentioned in the introduction are, however, related to measurements of asymmetries that typically require polarised initial states. Such observables also often require the inclusion of electroweak corrections, as they lead to parity-violating effects. In addition, a full NNLO description of muon-electron scattering is envisaged. In order to go beyond the approximation of electronic corrections the full two-loop matrix element is required. The corresponding integrals in the limit of massless electrons are known [87,88] and the amplitude is being computed [89]. In addition, also the one-loop matrix element for eµ → eµγ is required. The implementation of one-loop amplitudes for NNLO calculations requires particular care, since they are to be integrated over singular corners of the phase space. This results in two requirements. First, they have to be implemented with extreme numerical stability. Second, the numerical evaluation has to be reasonably fast.
To address these issues, in the long term it is probably advisable to link McMule to a dedicated code that evaluates higher-order amplitudes. There are several one-loop tools that specialise in this (for example [90][91][92][93]). While so far all attempts regarding automated computations of oneand two-loop amplitudes were dedicated to high-energy processes, it should be possible to adapt these tools to QED computations with massive fermions. OpenLoops [94] is one such tool that in the past has been relied upon for real virtual corrections. We also plan to set up an interface to OpenLoops to facilitate their numerically stable calculation. Of course, a major hurdle on the path towards using an external tool for all amplitudes is that the tool would have to be extended to two-loop calculations. While first steps have been made in this direction [95], we anticipate that two-loop amplitudes will have to be implemented directly in McMule for the time being.
Also related to the numerical stability of the integration is the treatment of pseudo singularities related to near collinear emission of photons. This is dealt with by splitting up the phase space such that only a small number (ideally one) of pseudo singularities is possible in each partition. Then the phase space is tuned such that there is a simple one-to-one match between the dangerous regions of phase space and an integration variable. Such a phase-space parametrisation typically results in a stable and reliable numerical evaluation of the integrals. As a possible further development, there is the option to subtract the pseudo-collinear singularity and add back a partially integrated counterterm [96]. However, since the logarithms arising from these phase-space region are physical, it is important to have a very flexible and exclusive treatment of the final-state particles.
Since FKS ℓ works at all orders in perturbation theory, it is only the lack of the matrix elements that prevents us from going beyond NNLO. One example, where a N 3 LO calculation might be feasible in the near future concerns the dominant electronic contribution to muon-electron scattering. As a more futuristic development we mention the idea to possibly compute the finite (eikonal-subtracted and ultraviolet-renormalised) matrix elements M (ℓ−i)f n+i that are the ingredients of FKS ℓ directly numerically.
Finally, many observables will be dominated by large logarithms, at least in some range of the distributions. Combining fixed-order calculations with a QED parton shower is a generic and powerful tool to resum the leading logarithms. Thus, the mule might want to take a shower after a hard day's work. The structure of FKS ℓ is particularly amicable to a YFS parton shower because it already exploits the YFS structure.
Apart from technical developments we have also made steps towards being as open as possible with our results and facilitating their cross checks. All data that has been used in the plots presented here are available on a public git repository https://gitlab.psi.ch/mcmule/user-library For each data set, we give the input data and a SHA1 identifier of the code used to create it. Since the code is available as a Docker image, anyone will be able to reproduce our results, regardless of operating system and dependencies. We hope this will accelerate progress in the theoretical description of low-energy particle physics experiments.

A Input parameters
The computations in this paper are performed in the on-shell scheme for the coupling and using pole masses. Accordingly, the input parameters we use are [97] To convert from MeV to µb we use (c ) 2 = 1 = 3.89379372 · 10 8 MeV 2 µb. When presenting branching ratios of the muon, we always divide by the full width, determined from the lifetime 2.196981 · 10 −6 s as Γ µ = 2.995984 · 10 −16 MeV. The interaction of the photon of momentum q = p ′ − p with the proton is parametrised as where Q 2 = −q 2 ≥ 0. The form factors F 1 and F 2 are related to the Sachs form factors as where τ ≡ Q 2 /(4m 2 p ). Using the standard dipole parametrisation with Λ 2 = 0.71 GeV 2 we set Here κ = 2.79284734 is the proton's magnetic moment in units of the nuclear magneton.