Reconstructing partonic kinematics at colliders with Machine Learning

In the context of high-energy physics, a reliable description of the parton-level kinematics plays a crucial role in understanding the internal structure of hadrons and in improving the precision of the calculations. Here, we study the production of one hadron and a direct photon, including up to Next-to-Leading Order Quantum Chromodynamics and Leading-Order Quantum Electrodynamics corrections. Using a code based on Monte-Carlo integration, we simulate the collisions and analyze the events to determine the correlations among measurable and partonic quantities. Then, we use these results to feed three different Machine Learning algorithms that allow us to find the momentum fractions of the partons involved in the process, in terms of suitable combinations of the final-state momenta. Our results are compatible with previous findings and suggest a powerful application of Machine Learning to model high-energy collisions at the partonic level with high precision.


Introduction
Thanks to recent technological advances and increased computational power, Machine Learning (ML) has taken our everyday life by storm. Applications of ML cover fields as diverse as image and speech recognition, automatic language translation, product recommendation, stock-market prediction and medical diagnosis, to mention a few examples. High-energy physics has not remained indifferent to the opportunities offered by these techniques. In recent years several applications have been developed, particularly with regard to data analysis. Novel jet-clustering algorithms that use improved classification to identify structures [1], reconstruction of the Monte-Carlo (MC) parton-shower variables [2], and reconstruction of the kinematics [3] are just some of the explored uses. In particular, the high-luminosity upgrade of the Large Hadron Collider (LHC) and the upcoming Electron-Ion Collider (EIC) are feeding the interest of the community in ML 1 . From a theoretical perspective, there has been progress in the calculation of higher-order scattering amplitudes assisted by ML algorithms [4] and, in phenomenology, the NNPDF collaboration has pioneered the determination of the partonic structure of hadrons [5][6][7][8][9][10][11][12].
The success of the perturbative expansion of Quantum Chromodynamics (QCD) in describing processes involving hadrons lies in the factorisation of the physical observables into hard (perturbative, process-dependent) and soft (non-perturbative, universal) terms [13]. The former describe the interaction between elementary particles, while the latter encode all the information concerning non-perturbative physics, i.e., the description of the partons inside the hadrons before the interaction and their posterior hadronisation into detected particles. For the soft terms, perturbation theory only determines the scale evolution once they are known at some reference scale, and thus they must be obtained from data through global fits 2 .
The simplest description of a hadron is that of a collection of partons moving in the same direction. The probability of finding a particular parton a in a hadron H carrying a fraction x of its momentum is given by the parton distribution function (PDF) f H/a (x, µ), when the hadron is explored at scale µ. After the hard interaction all outgoing coloured particles will hadronise; the probability of a parton a to fragment into a hadron H with a fraction z of its original momentum is described by the fragmentation function (FF) D a/H (z, µ). This collinear picture is the best explored and in this framework several sets of PDFs and FFs have been extracted using standard regression techniques (e.g. [15][16][17][18][19]), MC sampling (e.g. [20,21]) and MC sampling with neural networks (e.g. [5]).
In order to perform a meaningful calculation, the hard cross-section must be convoluted with the PDFs and/or FFs, over the corresponding momentum fractions of the partons. In the inclusive deep inelastic scattering (DIS) process, where a lepton and a parton inside a hadron interact by exchanging momentum Q^2 ≥ 1 GeV^2, measuring the scattered lepton (and/or the final hadrons) provides the full kinematics of the event. Unfortunately, in proton-proton (p + p) collisions the situation is not so simple. One has to estimate the momenta of the initial partons (which enter in the evaluation of the PDFs) using the measured momenta and scattering angles of the final-state particles. Depending on the process and the characteristics of the detectors, this can become a complicated task. Despite its inherent complexity, it is of the utmost importance in some situations. For example, in the case of asymmetric proton-nucleus (p + A) collisions, particles created in the backward (nucleus-going) direction are linked to initial partons in the nucleus with large x, while those in the forward (proton-going) direction are related to partons in the nucleus with low x. Depending on its exact value, one could have an enhancement or a suppression of the nuclear PDF w.r.t. the free-proton one. Knowing the region of the detector associated with the kinematics of interest for a given process is also relevant for the efficient design and construction of the detectors [22]. The proper mapping of the measured kinematics onto the partonic level is crucial for a correct evaluation of the cross-sections and interpretation of the perturbative calculations. This can be done analytically at leading order (LO) for processes involving few particles, but as one considers higher orders the emission of real particles makes it hard to fully determine the kinematics, and normally phenomenological approximations are used.
In the present work, we aim to use ML to determine the relation between the measurable four-momenta of the final particles and the parton-level kinematics. In particular, we focus on p + p collisions with one photon plus one hadron in the final state, computed using QCD and Quantum Electrodynamics (QED) corrections. This process has already been identified as an interesting observable at the Relativistic Heavy-Ion Collider (RHIC) [23]. Our goal is to obtain the functions that, depending on the four-momenta of the photon and hadron, give x i (the fraction of momentum of the proton i carried by the parton coming from it, i = 1, 2) and z, the fraction of energy of the parton coming from the hard interaction that is taken by the hadron (in our analysis a pion).
This article is organised as follows. In Sec. 2 we describe the framework used to implement the MC simulation of hadron-photon production, with special emphasis on the isolation prescription (Sec. 2.1). Relevant phenomenological aspects of the process are discussed in Sec. 3. The distributions w.r.t. different variables are presented in Sec. 3.1, with the purpose of identifying the most probable configurations. We also explore the correlations between different measurable variables and the partonic momentum fractions in Sec. 3.2. In Sec. 4, we detail the implementation of reconstruction algorithms based on ML to approximate the partonic momentum fractions using only measurable quantities. Finally, we discuss the results and comment on potential future applications of our methodology in Sec. 5.

Computational setup
From the theoretical point of view, the calculation relies on the factorization theorem to separate the low-energy hadron dynamics (i.e. the non-perturbative component embodied in the PDFs and FFs) from the perturbative interactions of the fundamental particles. This approach is valid in the high-energy regime, under the assumption that the typical energy scale of the process is much larger than Λ_QCD ≈ 200 MeV. The process under consideration is described by Eq. (2.1), and the differential cross-section is given by Eq. (2.2), where {a_i} denote the possible flavours of the partons entering into the fundamental high-energy collision. f_{H_i/a_j}(x, μ_I) is the PDF of the parton at the initial-state factorization scale μ_I, and D_{a_j/h}(z, μ_F) is the FF of the parton at the final-state factorization scale μ_F. The partonic cross-section, dσ, depends on the kinematics of the partons as well as on the factorization and renormalization (μ_R) scales, and can be computed using perturbation theory. It is worth appreciating that we consider all the partons to be massless.
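To make the factorised structure concrete, here is a minimal Monte-Carlo sketch of the convolution in Eq. (2.2). The parton-distribution and fragmentation shapes below are purely illustrative toys (normalised to unity on [0, 1]), not the fitted sets used in this work, and the hard weight is trivial:

```python
import random

# Toy illustration of the factorised cross-section of Eq. (2.2):
# sigma ~ int dx1 dx2 dz  f(x1) f(x2) D(z) |hard|^2,
# estimated by uniform Monte-Carlo sampling. All shapes are
# illustrative stand-ins, NOT the NNPDF/DSS sets used in the text.

def toy_pdf(x):
    return 6.0 * x * (1.0 - x)        # normalised: integral over [0,1] is 1

def toy_ff(z):
    return 3.0 * (1.0 - z) ** 2       # normalised: integral over [0,1] is 1

def toy_hard(x1, x2, z):
    return 1.0                        # trivial partonic weight for the sketch

def mc_cross_section(n_events, seed=42):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        x1, x2, z = rng.random(), rng.random(), rng.random()
        total += toy_pdf(x1) * toy_pdf(x2) * toy_ff(z) * toy_hard(x1, x2, z)
    return total / n_events           # MC estimate of the triple integral

print(mc_cross_section(200_000))      # should approach 1.0 (all factors normalised)
```

The actual code uses adaptive importance sampling rather than uniform draws, but the accumulation of weighted events per phase-space point is the same idea.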
In Eq. (2.2) we consider the photon as a parton, i.e. a_i ∈ {q, g, γ}. Namely, we rely on the extended parton model to include mixed QCD-QED corrections in a consistent way [24][25][26][27][28]. However, we will assume that the fragmentation of a photon into any hadron is highly suppressed w.r.t. the same process initiated by a QCD parton. This implies that we neglect D_{γ/h} and that a_3 is always a QCD parton (quark or gluon). Also, since we are looking for a photon in the final state, we can write Eq. (2.3), which leads to Eq. (2.4), where a_4 is a QCD parton. By rewriting Eq. (2.2) in this way, it is possible to separate two different mechanisms that originate photons in the final state 3 . The first term describes the direct production of an observed photon in the partonic collision; in the second term, the observed resolved photon is generated from a non-perturbative process initiated by the parton a_4. It is worth appreciating that these contributions are not individually distinguishable; however, the latter can be suppressed by applying adequate prescriptions. Realising that the resolved component appears in the context of hadronisation, with the photon being produced together with a bunch of hadrons, one can exploit this signature to enhance the direct photon: this is the motivation for introducing isolation prescriptions. By selecting mainly those events that contain photons isolated from hadronic energy, the total cross-section can be approximated by Eq. (2.5), i.e. neglecting the resolved component and summing over all QCD-QED partons. The partonic cross-section dσ_{a_1 a_2 → a_3 γ} incorporates the isolation prescription and is described in greater detail in Sec. 2.1.
We can now move on to the discussion of how to include the QED corrections. The next-to-leading order (NLO) pure QCD corrections for this process were computed in Refs. [23,29]. Since in this case we are dealing with mixed QCD-QED corrections, we have to consider the two couplings involved in the perturbative expansion. From the computational point of view, we can profit from the Abelianization techniques to directly obtain QED contributions from the QCD ones [26,27,[30][31][32]. Given that the energy scale of the process is roughly O(10 GeV), we have α_S ≈ 0.12 and α ≈ 1/129. This means α ≈ α_S^2, indicating that the LO QED corrections have the same weight as the NLO QCD ones. Therefore, the dominant contribution is given by the partonic channels q q̄ → g γ and q g → q γ at O(α_S α), i.e.
with S_2 the measure function containing the definition of the kinematical selection cuts for the 2 → 2 sub-processes. We then have to include the O(α_S^2 α) and O(α_S α^2) contributions, associated with the partonic channels of Eq. (2.7) and with qγ → qγ , q q̄ → γγ , (2.8) respectively. In this way, the corrections to the partonic cross-section, dσ^{ISO,(1)}, are given in Ref. [33], where ŝ is the partonic center-of-mass energy and r denotes the extra parton associated with the real-radiation correction. |M^{(0)}|^2 and |M^{(1)}|^2 are the squared matrix elements for the tree-level and one-loop corrections, respectively. In these expressions, S_3 represents the measure function that implements the experimental cuts and the isolation prescription for the 2 → 3 sub-processes.
Since we are dealing with higher-order corrections, singularities will appear in the calculation. The LO QED contribution is given by a (finite) Born-level process. However, the NLO QCD corrections involve both ultraviolet (UV) and infrared (IR) singularities that must be regularized and cancelled to obtain a physical result. The regularization was done using Dimensional Regularization (DREG) [34][35][36][37]. The virtual corrections were computed starting from the one-loop QCD amplitude for the process 0 → q q̄ g γ, removing the UV poles through renormalization in the MS scheme. In order to cancel the IR singularities, we relied on the subtraction formalism [38][39][40][41][42], splitting the real phase-space into regions containing only one kind of IR singularity. When combining the real and the virtual corrections, some of the IR divergences associated with final-state radiation (FSR) cancel by virtue of the KLN theorem [43,44]. To achieve a full cancellation, however, counter-terms were added to remove the remaining initial-state and final-state contributions absorbed into the PDFs and FFs, respectively. In this way, the master formula for the partonic cross-section at NLO QCD + LO QED accuracy is symbolically given by Eq. (2.10), where dσ^{ISO,cnt,(I)}_{a_1 a_2 → a_3 γ} and dσ^{ISO,cnt,(F)}_{a_1 a_2 → a_3 γ} are the initial- and final-state IR counter-terms, respectively. Here, C^{UV}_{a_1 a_2 → a_3 γ} is the renormalization counter-term for the partonic process a_1 a_2 → a_3 γ in the MS scheme 4 .

Isolation prescription and other assumptions
In order to suppress events with photons originated from the decay of hadrons, it is necessary to implement an isolation prescription. The idea behind most of the strategies available in the literature consists of quantifying the amount of hadronic energy surrounding a well-identified photon, and rejecting events with more hadronic energy than a certain threshold. Whilst most of the prescriptions work nicely at LO, not all of them are infrared safe. For instance, it is known that choosing a fixed cone eliminates events that play a crucial role in the cancellation of IR singularities. Thus, special care is needed in the implementation of these methods 5 .
In this work, we rely on the smooth cone prescription introduced in Ref. [49]. Its main advantage is that it suppresses the resolved component without preventing the emission of soft/collinear QCD radiation, which makes it IR-safe and fully suitable for higher-order calculations. First, we fix a reference point in the rapidity-azimuth plane, (η_0, φ_0), and define the distance to it through Eq. (2.11), with (η_j, φ_j) the angular coordinates of the parton j. Once we identify a photon in the detector, we trace a cone of radius r_0 around it and look for QCD partons inside. If no QCD radiation lies inside the cone, the photon is isolated. If not, we identify the QCD partons inside the cone, {a_j}, and measure their distance to the photon following Eq. (2.11). Then, for a fixed r ≤ r_0, we calculate the total hadronic transverse energy E_T(r) deposited within a distance r of the photon, according to Eq. (2.12). We want to restrict E_T by imposing an upper bound, thus limiting the amount of hadronic energy surrounding the photon. In the fixed cone prescription, this limit is a constant. For the smooth prescription, however, we introduce an arbitrary smooth function ξ(r) satisfying ξ(r) → 0 for r → 0, and require E_T(r) < ξ(r) for every r < r_0. Only if this condition is fulfilled is the photon isolated; otherwise, the event is rejected.
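As a concrete sketch, the smooth-cone check can be coded directly. The profile below uses the common Frixione-type choice ξ(r) = ε E_T^γ ((1 − cos r)/(1 − cos r_0))^n; the prescription only requires ξ(r) → 0 smoothly, so the specific ε and n (and the toy event record) are assumptions of this sketch:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the rapidity-azimuth plane, as in Eq. (2.11)."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:                       # fold azimuthal difference into [0, pi]
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def is_isolated(photon, partons, r0=0.4, eps=1.0, n=4):
    """Smooth-cone isolation: E_T(r) < xi(r) for every r < r0.

    photon/partons: dicts with keys 'pt', 'eta', 'phi' (toy event record);
    eps and n parametrise the assumed Frixione-type profile xi(r).
    """
    inside = sorted(
        (delta_r(p["eta"], p["phi"], photon["eta"], photon["phi"]), p["pt"])
        for p in partons
    )
    et = 0.0                                 # running hadronic E_T within radius r
    for r, pt in inside:
        if r >= r0:
            break                            # partons outside the cone are irrelevant
        et += pt
        xi = eps * photon["pt"] * ((1.0 - math.cos(r)) / (1.0 - math.cos(r0))) ** n
        if et >= xi:
            return False                     # too much hadronic energy close to the photon
    return True
```

Since E_T(r) is a step function that only jumps at the parton distances while ξ(r) is monotonically increasing, checking the condition at each parton's distance is sufficient.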
The experimental implementation of this criterion requires a very high angular resolution, something that is usually not achievable in practice. This is one of the reasons why most of the current experiments still rely (mainly) on the fixed cone prescription. Fortunately, the difference between both approaches can be neglected for several relevant observables [46,47]. In any case, technological improvements in detector science will certainly reduce the experimental limitations in the near future.
Finally, let us mention one further detail about the implementation. We will neglect the partonic channel q q̄ → γγ in Eq. (2.5), which would imply the introduction of the fragmentation function D_{γ/h}. From the point of view of perturbation theory, this fragmentation can be interpreted as a collinear electromagnetic splitting γ → a + X, with a a QCD parton that undergoes hadronization to generate the observed hadron h. By naive power counting, this contribution is O(α^3) and turns out to be sub-leading w.r.t. the NLO QCD + LO QED terms studied in this work 6 .

Phenomenological results
Using the formalism explained in the previous Section, we calculated the unpolarized cross-section via a code that uses adaptive MC integration. In this program, the different contributions to the 2 → 2 and 2 → 3 processes are computed independently, and kinematic cuts can be imposed. In particular, we reproduced the experimental cuts corresponding to the PHENIX detector, given in Eq. (3.1),
with η the rapidity of the particles measured in the hadronic center-of-mass frame. On top of that, we require |φ_h − φ_γ| > 2 to retain those events with the photon and hadron produced almost back-to-back. We perform the simulations at a centre-of-mass (c.m.) energy √S_CM = 200 GeV for RHIC and at √S_CM = 13 TeV for LHC Run II, keeping in the latter case the same cuts described in Eq. (3.1). Since the pion is the lightest hadron and the most copiously produced, we restrict our attention to the case h = π^+. Additionally, we considered the scenario for the Tevatron at √S_CM = 1.96 TeV, because it involves proton-antiproton (p + p̄) collisions. In principle, this process might exhibit a different dependence on the PDFs and FFs, compared to p + p collisions.
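A minimal event-selection filter along these lines might look as follows. The back-to-back requirement |φ_h − φ_γ| > 2 is taken from the text, while the p_T windows and the |η| ≤ 0.35 acceptance are illustrative placeholders standing in for Eq. (3.1), not the exact values used in the analysis:

```python
import math

def passes_cuts(ev,
                pt_pi_min=2.0,               # GeV, hypothetical threshold
                pt_gamma_range=(5.0, 15.0),  # GeV, hypothetical window
                eta_max=0.35,                # PHENIX-like central acceptance (assumed)
                dphi_min=2.0):               # back-to-back cut from the text
    """ev: dict with 'pt_pi', 'pt_gamma', 'eta_pi', 'eta_gamma', 'phi_pi', 'phi_gamma'."""
    dphi = abs(ev["phi_pi"] - ev["phi_gamma"]) % (2.0 * math.pi)
    dphi = min(dphi, 2.0 * math.pi - dphi)   # fold into [0, pi]
    return (ev["pt_pi"] >= pt_pi_min
            and pt_gamma_range[0] <= ev["pt_gamma"] <= pt_gamma_range[1]
            and abs(ev["eta_pi"]) <= eta_max
            and abs(ev["eta_gamma"]) <= eta_max
            and dphi > dphi_min)
```

In the MC code the analogous filter is applied event by event before accumulating weights.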
Regarding the non-perturbative ingredients of the calculation, we used the LHAPDF package [50,51] to have a unified framework for the PDF implementation. We relied on the NNPDF4.0NLO [12] and NNPDF3.1luxQEDNLO [52][53][54][55] parton distributions for the pure QCD and mixed QCD-QED calculations, respectively. In both cases, we use the set 0, which corresponds to an average over the different replicas. For the fragmentation functions, we used the DSS2014 set at NLO accuracy [18,56]. Also, we evolve the QCD and QED couplings using the one-loop RGE with the initial conditions α S (m Z ) = 0.118 and α(m Z ) = 1/128.
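For reference, the one-loop RGE evolution of the strong coupling from the initial condition α_S(m_Z) = 0.118 fits in a few lines. Fixing nf = 5 active flavours is an assumption of this sketch; a full implementation would handle flavour thresholds:

```python
import math

M_Z = 91.1876  # GeV, Z-boson mass used as the reference scale

def alpha_s(Q, nf=5, alpha_mz=0.118):
    """One-loop running coupling: 1/a(Q) = 1/a(m_Z) + beta0/(2*pi) * ln(Q/m_Z)."""
    beta0 = 11.0 - 2.0 * nf / 3.0
    return alpha_mz / (1.0 + alpha_mz * beta0 / (2.0 * math.pi) * math.log(Q / M_Z))
```

The coupling decreases with Q, as dictated by asymptotic freedom, and grows toward the lower scales relevant for this process.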
Finally, we fixed the factorization and renormalization scales to be equal to the average transverse momentum of the hadron and the photon, i.e. Eq. (3.2): μ_I = μ_F = μ_R = (p_T^π + p_T^γ)/2.
Regarding the implementation of the smooth isolation criterion, we used the function ξ(r) given in Eq. (3.3), where E_T^γ is the transverse energy of the photon and r_0 = 0.4. As mentioned before, the only requirement on ξ(r) is that ξ(r) → 0 smoothly as r → 0, and Eq. (3.3) fulfils this condition.

One-dimensional distributions
Since we are looking at the process p + p → π + γ + X, the experimentally accessible variables, measured in the c.m. system, are
V_Exp = {p_T^π, p_T^γ, η_π, η_γ, cos(φ_π − φ_γ)} . (3.4)
Notice that we consider only the difference of the azimuthal angles, because the problem has rotational symmetry around the collision axis. Moreover, it turns out that cos(φ_π − φ_γ) is a variable often used by experimental collaborations [57].
Figure 1. Unpolarized cross-section for the production of one photon plus one pion as a function of the transverse momentum of the pion (left) and the photon (right), respectively. We considered the selection cuts described in the previous section, for LHC Run II and RHIC, respectively.
In Figures 1, 2, 3, 4 and 5 we present the single-differential cross-section as a function of the variables V_Exp for RHIC and LHC Run II. Our predictions are shown at LO QCD (dotted red), NLO QCD (solid black) and NLO QCD + LO QED (dashed blue) accuracy, considering the default scale choice defined in Eq. (3.2). First, we study the pion (p_T^π) and photon (p_T^γ) transverse-momentum spectra in Fig. 1. The cross-section increases for higher c.m. energies, and the impact of the QED corrections also becomes more sizable.
The distribution in p_T^π falls faster than the p_T^γ spectrum, mainly because of the convolution with the FFs. In fact, the experimental cuts imposed ensure an important contribution of events with close-to-Born kinematics. In this case, p_T^γ is associated with the transverse momentum of the parton that fragments into a pion with momentum fraction z. Since the FFs tend to favour the region with z ≤ 0.2 [58], the suppression observed in Fig. 1 can be understood. Next, we present the distributions in the rapidities (Fig. 2) and in the azimuthal variable cos(φ_π − φ_γ) (Fig. 3). In both cases, we show a comparison between RHIC and LHC Run II. For the rapidity distribution, we appreciate a significant NLO QCD correction, although the added LO QED effects are very small. Regarding the azimuthal spectrum, we observe in Fig. 3 a peak in the back-to-back region (i.e. cos(φ_π − φ_γ) = −1), with a fast suppression for configurations beyond Born-level kinematics. Besides the distributions w.r.t. the experimentally accessible quantities, we can compute the differential cross-section as a function of the partonic momentum fractions, x_1, x_2 and z. For p + p collisions we consider only the distributions w.r.t. x_1, due to the symmetry of the system. In what follows, x and x_1 will be used interchangeably. The corresponding plots are shown in Fig. 4, for x = x_1 (left) and z (right). We notice that the experimental cut in p_T^γ induces a restriction on the maximum value of x involved in the collision. In fact, using a LO approximation, we obtain an upper bound x_Max, beyond which the cross-section is drastically suppressed. For RHIC and LHC Run II, it translates to x_Max ≈ 0.03 and x_Max ≈ 0.001, respectively. Thus, we will use this information to restrict the x-range in the correlation analysis presented in the next section. In this way, we will avoid dealing with regions with a negligible amount of events.
Notice that the higher the energy of the process, the lower the x-range accessible by the experiment. Regarding the dependence on z (right panel of Fig. 4), the distribution reaches almost the endpoint region (i.e. z = 1) with a reasonable amount of events. The fact that we impose p_T^π ≥ 2 GeV translates into a lower bound z_Min, which corresponds to z_Min ≈ 0.004 and z_Min ≈ 0.0001 for RHIC and LHC Run II, respectively. In contrast to the case of the x-distribution, here the higher the energy of the process, the wider the accessible z-range. It is worth noticing that the FFs used in this work do not include data with z ≤ 0.05 in the fit, and extrapolations into that region are most likely unreliable. The distribution presents a peak, located at z_Peak ≈ 0.35 for RHIC (z_Peak ≈ 0.25 for LHC Run II). The position of the peaks depends on the explicit functional form of the PDFs and the FFs.
To conclude this section, we study the case of p + p̄ collisions at the Tevatron, with √S_CM = 1.96 TeV. In this case, the symmetry between x_1 and x_2 is broken, since x_1 (x_2) corresponds to the momentum fraction of a parton inside the proton (antiproton). In Fig. 5 we present the distribution for x_1 (left) and x_2 (right). We can appreciate that the distribution in x_2 reaches a peak around x_2 ≈ 0.01 and then falls faster than the x_1-distribution. We know from previous studies that the gg partonic channel is dominant [23], and thus we expect the differences to take place in the q q̄ and qQ channels. This also has an impact when studying the x_1 vs. x_2 correlations, as we will show in the next subsection.

Correlations with the partonic momentum fractions
Since one of the main goals of this work is to reconstruct the partonic kinematics starting from experimentally accessible quantities, it is useful to first study the correlations among the different variables. This helps us to prioritize certain ansatzes depending on their functional form, in such a way that we capture the leading behaviour when exploring linear models. In the following, we restrict the discussion to RHIC kinematics (with the cuts defined in the previous section).
We start by considering the relation between x = x_1 and the transverse momentum of the particles in the final state. In Fig. 6, we present the correlation between x_1 and p_T^γ (left column) and p_T^π (right column). Each bin contains the corresponding integrated cross-section at LO QCD (upper row) and NLO QCD + LO QED (lower row) accuracy. Notice that the inclusion of higher-order corrections leads to a broadening of the patterns, caused by the presence of events in previously empty bins due to an extended phase-space. This is a general behaviour that also manifests itself when studying the correlations of other variables. Events with low p_T^γ are associated with low x_1, and there is a roughly linear correlation between these variables. Events with low p_T^π are mostly uniformly spread in the region x_1 ∈ [0.2, 0.6]. This behaviour is expected from the fact that the photon originates from the partonic event (its energy is directly related to the energy of the colliding partons), whilst the pion comes from hadronization (which implies the convolution with the FF and the consequent spreading of the distributions).
Next we move on to analyze the correlation between x = x_1 and the rapidities of the particles in the final state. It is important to highlight that the analysis here does depend on the momentum fraction being used, i.e. x_1 or x_2, since the rapidity introduces an asymmetry in the direction of the colliding particles. We show, in Fig. 7, the plots of x_1 vs. η_γ at LO QCD (left) and NLO QCD + LO QED (right) accuracy, respectively. Similar results were found when considering x_1 vs. η_π and are thus not presented here. Since the distributions are rather flat for −0.3 ≤ η ≤ 0.3, we find that most of the events are uniformly distributed for x_1 ∈ [0.2, 0.5]. Finally, notice that below x_1 ≈ 0.2, the cross-section falls steeply as a consequence of the imposed kinematical cuts, and the bins are empty. The analogous results for the z dependence are presented in Fig. 8, the upper (lower) row corresponding to the LO QCD (NLO QCD + LO QED) contributions. In the left column we show the correlation between z and p_T^γ, and in the right column between z and p_T^π. The former seems to be slightly negative, i.e. smaller values of z tend to be favoured in events with higher p_T^γ, while the latter has a concentration of events in the low-p_T^π region with z ≥ 0.4. Also, as expected, events with high p_T^π require higher values of z, since the amount of partonic energy is limited by the cut p_T^γ ≤ 15 GeV. The correlation between z and the rapidities of the final-state particles shows a rather flat dependence on η, as depicted in Fig. 9 for the case of η_γ (similar plots were obtained when considering η_π).
Let us now consider the correlations with the azimuthal variable cos(φ_π − φ_γ) in Fig. 10. Of course, the contributions associated with the Born kinematics are restricted to the first bin, because cos(φ_π − φ_γ) = −1 (i.e. the pion and the photon are produced back-to-back). The remaining bins are heavily suppressed, since they only receive contributions from the real radiation. We see that the events are strongly concentrated in the medium- and low-x region, without a clear trend or dependence w.r.t. cos(φ_π − φ_γ). For z, the distribution spreads over more bins, and there is a subtle trend favouring events with a larger azimuthal separation (values of cos(φ_π − φ_γ) closer to −1) and slightly lower values of z. Finally, we analyze the correlation between x_1 and x_2 for p + p collisions. In Fig. 11, we show the correlation plots at LO QCD (left) and NLO QCD + LO QED (right) accuracy, for RHIC kinematics. As expected, there is a compact region containing events at LO, reflecting the kinematical constraints of a 2 → 2 process. The events are concentrated in the low-x region and show a strong positive linear correlation between x_1 and x_2: this reflects the fact that it is more probable to have events in the back-to-back region, in agreement with Fig. 10. When introducing higher-order corrections, the real-emission phase-space gets enlarged and the distributions are spread. In any case, the positive correlation between x_1 and x_2 remains, with a strong concentration of events in the middle- and low-x region. Also, it is worth appreciating that the NLO real corrections are not enough to enhance the number of events with rather different values of x_1 and x_2. This is, in part, a consequence of the kinematical cuts, which favour central events rather than highly boosted ones. To conclude this section, let us comment on the importance of the study of correlations.
Since we want to reconstruct the partonic momentum fractions by using the measurable variables, it is important to know which ones are the most relevant. From the previous discussion, we expect that x strongly depends on p_T^γ (positive correlation) but not on the other variables. Analogously, z exhibits a negative correlation with p_T^γ, a positive one with p_T^π and a slight dependence on cos(φ_π − φ_γ). This knowledge will be applied to the construction of a basis of functions for determining x and z in the next section.
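These statements can be quantified with a cross-section-weighted Pearson coefficient, with the MC event weights playing the role of the binned cross-section. A minimal sketch:

```python
import math

def weighted_corr(xs, ys, ws):
    """Weighted Pearson correlation between samples xs and ys with weights ws."""
    W = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / W          # weighted mean of xs
    my = sum(w * y for y, w in zip(ys, ws)) / W          # weighted mean of ys
    cov = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws)) / W
    vx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws)) / W
    vy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws)) / W
    return cov / math.sqrt(vx * vy)
```

For instance, `weighted_corr(x1_values, pt_gamma_values, event_weights)` close to +1 would confirm the strong positive x_1 vs. p_T^γ correlation discussed above (the variable names here are hypothetical).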

Reconstruction of parton kinematics
We now focus on our main goal, which is to determine the partonic variables x_1, x_2 and z in terms of the measured momenta of the final-state particles. At LO this is fully determined by energy-momentum conservation, and thus the LO case will serve as a control. The real challenge appears at NLO, where real emissions prevent a straightforward determination of closed analytic formulae: this is what we will attempt to approximate using ML 7 .
In supervised ML, we have an initial set of data (the training set) and we want to map it onto another known set (the target). Each entry in the training set is a vector of dimension d, with d the number of variables (features) that the target depends upon. We also assume that there is some underlying function, the so-called target function, that connects the two; the task of a ML algorithm is to find a good estimate of this function. This estimator, in turn, depends on a set of parameters that is determined by minimising a function (the cost function) that measures some distance between the prediction of the estimator and the actual targets. As a last step, one takes another set of data with corresponding labels (the test set) and checks how well the estimator performs on it. To prevent the estimator from performing well on the training data set but poorly on the test set (overfitting), the cost function also includes some parameters that control the trade-off between a low training cost and a low test cost. The number of regularization parameters depends on the specific method used, and their optimal values have to be found by picking those that minimize the test cost function.
Armed with these basic concepts, we first discuss the generation of our input and target sets using the outputs of our MC code. After that, we present results obtained through the application of supervised ML for estimating x ≡ x 1 and z at LO QCD and NLO QCD + LO QED accuracy. For the purpose of the present analysis, we explore three models: a Linear Model (LM), a Gaussian Regression (GR) and the Multi-Layer Perceptron (MLP) algorithm based on neural networks. These models have been implemented in Python using the scikit-learn library [60].
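A minimal scikit-learn sketch of the three estimators is shown below, fitted on synthetic stand-in data: the five features mimic the measurable variables and the toy target is driven mostly by one of them. The hyperparameters are illustrative, not those used in our analysis:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Stand-ins for (pT_pi, pT_gamma, eta_pi, eta_gamma, cos dphi); toy target "x1"
X = rng.uniform(size=(300, 5))
y = 0.5 * X[:, 1] + 0.02 * rng.normal(size=300)   # mostly driven by the 2nd feature

X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

models = {
    "LM": LinearRegression(),
    "GR": GaussianProcessRegressor(),
    "MLP": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)    # R^2 on the held-out set
    print(name, round(scores[name], 3))
```

In the actual analysis the inputs are the (binned) measurable variables of Eq. (3.4) and the targets are the partonic momentum fractions, with the hyperparameters tuned against the test cost as described above.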

Construction of the training data sets
The training and test sets were generated with the MC code used and described in the previous sections. As already mentioned, it deals independently with each term of the computation (LO, NLO real radiation, NLO virtual terms, NLO counter-terms). This poses two difficulties when generating the training set for feeding the ML algorithms. On the one hand, only the LO calculations are finite on their own; for the NLO cross-section, we have to combine all terms (real, virtual and counter-terms) to obtain a meaningful finite quantity. On the other hand, by the very nature of the MC integration, no two identical points are generated in the sampling, which in turn spoils the fully local cancellation of the divergences. Instead, one has to split the different variables into bins and sum over all contributions entering each of them. If a sufficient number of points is sampled, the divergences cancel and we obtain the finite cross-section per bin. This is a common feature of MC integration, and many codes provide routines that take care of this for one-dimensional binning. In our case we are interested in a more differential observable, so we had to generate a large number of points to meet this condition. Moreover, not all sampled points pass the selection cuts: e.g., of the 10^9 points sampled we retain ≈ 30% at LO.
7 A formal description of the ML methods that we used is beyond the scope of this work, and would take much more than a single article. Moreover, much literature is available on the topic (see e.g. [59]), so we will leave out such a discussion and mention just the few basic concepts needed in the rest of the section.
For the LO we can directly use the generated points, but for the NLO case we need to discretize the differential cross-section w.r.t. the external kinematical variables defined in Eq. (3.4). For this purpose, we create a five-dimensional grid by binning the variables in V Exp . Explicitly, we define 10 bins for p γ T and p π T , 5 bins for η γ and η π , and 6 bins for cos(φ π − φ γ ). The set of discretized experimentally-measurable variables is denoted V̄ Exp , where ā denotes the mean value of the variable a in a given bin; in total, V̄ Exp contains 15000 bins. Then, we define the cross-section σ j per bin by integrating over the j-th bin the fully-differential hadronic cross-section dσ (a function of the partonic momentum fractions and the experimentally-measurable variables V Exp ), with x j,MIN (x j,MAX ) the minimum (maximum) value of the variable x in the j-th bin and x̄ the corresponding average of x over that bin. At LO, σ j can be straightforwardly calculated, since we only need to integrate the tree-level scattering amplitude over a 2 → 2 phase-space. However, as we explained in Sec. 2, the NLO corrections include several contributions calculated with different kinematics (virtual, real, counter-terms): all of these are taken into account in dσ and integrated over their corresponding phase-space to obtain σ j 8 .
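As an illustration of this discretization step, a toy version of the five-dimensional binning can be written with NumPy. The event sample, ranges and weights below are invented placeholders; only the grid layout (10 × 10 × 5 × 5 × 6 = 15000 bins) follows the text:

```python
# Toy sketch of the 5-D binning of weighted MC events described above.
# Kinematic ranges and weights are illustrative assumptions, not the
# actual MC code's values.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# One row per event: (pT_gamma, pT_pi, eta_gamma, eta_pi, cos(dphi))
events = np.column_stack([
    rng.uniform(10, 60, n),    # p_T^gamma
    rng.uniform(2, 30, n),     # p_T^pi
    rng.uniform(-2, 2, n),     # eta_gamma
    rng.uniform(-2, 2, n),     # eta_pi
    rng.uniform(-1, 1, n),     # cos(phi_pi - phi_gamma)
])
weights = rng.normal(1.0, 0.3, n)   # MC weights (NLO terms may be negative)

# 10 x 10 x 5 x 5 x 6 = 15000 bins, matching the grid in the text.
edges = [np.linspace(10, 60, 11), np.linspace(2, 30, 11),
         np.linspace(-2, 2, 6), np.linspace(-2, 2, 6),
         np.linspace(-1, 1, 7)]
sigma_per_bin, _ = np.histogramdd(events, bins=edges, weights=weights)
print(sigma_per_bin.shape)   # (10, 10, 5, 5, 6)
```

Summing the signed weights per bin is what allows the divergent real, virtual and counter-term contributions to cancel bin by bin once enough points are sampled.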
Once the grid and the discretized cross-section are defined, we use the MC code to generate three histograms for each bin in the grid. These histograms correspond to the distributions dσ j /dx 1 , dσ j /dx 2 and dσ j /dz, respectively. So, given a point p j in the grid, defined according to Eq. (4.4),

8 It is worth appreciating that binning could be avoided using a fully-local framework for computing higher-order corrections [61,62]. One of these methods is the Four-Dimensional Unsubtraction (FDU) [63][64][65][66], based on the Loop-Tree Duality [67][68][69]. Since FDU leads to a fully-differential and finite representation of the cross-section, it constitutes a perfectly suited candidate to improve the efficiency of the analysis presented in this article.
we can define the bin averages (x 1 ) j , (x 2 ) j and (z) j of Eqs. (4.5)-(4.7), which correspond to the weighted averages of the partonic momentum fractions extracted from the histograms generated with the MC code. At this stage, we can identify V̄ Exp as the training set and {(x 1 ) j , (x 2 ) j , (z) j } as the target one. Then, we can train the ML algorithms to find the target functions that will allow us to reconstruct the MC partonic momentum fractions X 1,REAL , X 2,REAL and Z REAL . To conclude this discussion, notice that the definitions given in Eqs. (4.5)-(4.7) are crucial beyond LO. In fact, for a 2 → 2 process, fixing the bin p j ∈ V̄ Exp leads to a unique value of the partonic momentum fractions, as explained in Ref. [23]. Due to the presence of 2 → 3 sub-processes contributing to the real radiation, the value of {x 1 , x 2 , z} for a given p j is not unambiguously defined at NLO (and beyond). If we pick an event with a fixed p j from our NLO MC generator, the real partonic momentum fractions might take any of the kinematically-allowed values. However, the probability of the different outcomes is given by the differential cross-section of the event, which motivates the definitions introduced in Eqs. (4.5)-(4.7). In the following, we explain how these data sets are used with the different ML frameworks.
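The spirit of these weighted averages can be sketched in a few lines: for one bin p_j of the measurable variables, the momentum fraction assigned to that bin is the cross-section-weighted mean of x over the corresponding histogram. The histogram values below are a toy stand-in for the MC output dσ_j/dx:

```python
# Cross-section-weighted average of x over one bin's histogram, in the
# spirit of Eqs. (4.5)-(4.7). The histogram below is an invented toy
# stand-in for the MC distribution dsigma_j/dx.
import numpy as np

x_centers = np.linspace(0.05, 0.95, 10)      # bin centers of dsigma_j/dx
dsigma_dx = np.array([5., 8., 6., 4., 2.5, 1.5, 1., .6, .3, .1])  # toy weights

x_bar = np.sum(x_centers * dsigma_dx) / np.sum(dsigma_dx)
print(round(x_bar, 3))   # → 0.267
```

At LO the histogram collapses onto a single value of x per bin, so this average reduces to the unique LO solution; at NLO it encodes the full distribution of kinematically-allowed values.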

Linear regression
Linear methods, as the name indicates, provide the estimation of the target function as a linear combination of the input set. However, the linearity occurs at the level of the parameters, and one can apply prior knowledge to construct new features upon which the target dependence is simpler. Choosing a good set of features (basis) plays an important role in achieving an accurate reconstruction.
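This idea (a model linear in its coefficients acting on hand-built, possibly non-linear features) can be sketched as follows. The target and feature choices are illustrative and not the paper's exact basis:

```python
# A linear-in-parameters fit on constructed features: the model is linear
# in the coefficients even though the features are non-linear functions of
# the raw kinematics. Target and features are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
pt_gamma = rng.uniform(10, 60, 2000)
pt_pi = rng.uniform(2, 30, 2000)
z_true = pt_pi / pt_gamma                   # toy target

# Basis: raw variables plus a physically-motivated ratio feature.
features = np.column_stack([pt_gamma, pt_pi, pt_pi / pt_gamma])

fit = LinearRegression().fit(features, z_true)
# The fit should put all its weight on the ratio feature.
print(np.round(fit.coef_, 3))   # close to [0, 0, 1]
```

Because the toy target is itself an element of the basis, the regression recovers it exactly, which is the mechanism behind the perfect LO reconstruction discussed below.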
For example, at LO we take inspiration from the exact analytical expressions given by Eqs. (4.11)-(4.13) and propose the basis of Eq. (4.14). We then expect x 1 to be well reconstructed by a linear combination of the first two elements of the basis (with coefficient 1), whilst z should be mainly proportional to the last element. In Fig. 12, we show the correlation between the MC partonic momentum fractions (vertical axis) and the output of the linear regression (horizontal axis). Each bin contains the integrated cross-section at LO QCD accuracy. We can appreciate that the reconstruction is perfect, and the LM approach leads exactly to Eqs. (4.11)-(4.13). When dealing with the NLO scenario, an enlargement of the basis should in principle be expected: the elements of B LO are not enough to fully capture the additional dependencies introduced by the NLO real kinematics. In fact, in Ref. [23] the authors proposed expressions that agree with Eqs. (4.11)-(4.13) at LO, but introduce an additional dependence on the azimuthal variables at higher orders. The study of correlations performed at NLO QCD accuracy using these expressions showed a good reconstruction of the MC partonic momentum fractions.
With this precedent in mind, we propose here to include additional functional dependencies to obtain a more flexible reconstruction. We start by defining a primitive set of functions, in such a way that the reconstructed variables take the form of Eq. (4.19). The ansatz proposed in Eq. (4.19) generalizes the basis B LO and includes products of up to three kinematical variables, which gives more flexibility to fit the data. In total, there are eighty-one functions in the basis, which we denominate the general basis. However, as we will now explicitly see, a larger basis does not imply a better reconstruction.
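A basis of this kind can be generated automatically rather than written out by hand. The sketch below uses scikit-learn's PolynomialFeatures to build all products of up to three primitive functions; the primitives K_i chosen here are placeholders, not the exact set of the text:

```python
# Automatic generation of a "general basis" of products of up to three
# primitive kinematical functions. The primitives below are illustrative
# placeholders, not the paper's exact K_i.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
pt_gamma = rng.uniform(10, 60, 100)
pt_pi = rng.uniform(2, 30, 100)
cos_dphi = rng.uniform(-1, 1, 100)

# Three toy primitive functions of the measurable kinematics.
K = np.column_stack([pt_pi, 1.0 / pt_gamma, cos_dphi])

# All products of up to three primitives (with repetition), no constant term.
basis = PolynomialFeatures(degree=3, include_bias=False).fit_transform(K)
print(basis.shape[1])   # 19 monomials for 3 primitives up to degree 3
```

With the full set of primitives used in the text, the same construction yields the eighty-one-element general basis; here, three primitives give 3 + 6 + 10 = 19 monomials.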
If we take Eq. (4.19) with Y = {x 1 , z}, we obtain the results shown in the upper row of Fig. 13. In this figure, we indicate the strength of the correlation with the integrated cross-section per bin at NLO QCD + LO QED accuracy. The fitted coefficients are given in App. A. We can appreciate that the reconstruction is good in the low-x and low-z region. This is expected because the cross-section is larger in that region, so there are more data points to perform the fit. However, the reconstruction becomes noisy and imprecise for higher values of the momentum fractions. The LM is unable to keep the functional dependencies that better approximate the real momentum fractions in regions with a low number of events. For this reason, we explore a second approach. We take advantage of the findings in Sec. 3.2 and distinguish different bases for Y = x 1 and Y = z. It was shown that x 1 exhibits a positive correlation with p γ T , so we remove the contributions involving K 6 = (p γ T ) −1 from Eq. (4.19). Regarding z, the conclusion of Sec. 3.2 was that it is correlated with K 6 = (p γ T ) −1 and K 2 = p π T , and that it also presents a mild correlation with K 5 . So, we remove the contributions that involve the primitive functions K 1 and K 7 . As a result, we propose a physically-motivated reconstruction by taking Eq. (4.19) and imposing the constraints of Eq. (4.20) for x and of Eq. (4.21) for z. The coefficients obtained with these assumptions are presented in App. A, whilst the corresponding correlations with the real MC momentum fractions are shown in the middle row of Fig. 13. We can appreciate that the correlation is slightly better for z, but it is worse for x. Even if the physically-motivated basis includes elements that are selected according to the correlations with physical variables, it turns out that the abundance of points in a particular region of the parameter space imposes a very tight constraint on the whole fit. For z, this is not a big problem, since it seems to be dominated by the ratio p π T /p γ T .
However, the dependence of x w.r.t. the kinematical variables is more complicated, and a linear fit is not enough to capture it. Thus, reducing the basis does not lead to an improved reconstruction of the momentum fractions.
To conclude this discussion, let us mention that we tested the LM with another basis inspired by the LO formulae. Namely, this LO-inspired basis is given by Eq. (4.22) for x ≡ x 1 and Eq. (4.23) for z. In this case, the reconstruction was even worse, as can be seen in the lower row of Fig. 13. In particular, X 1,REC seems to be uncorrelated with X 1,REAL . So, we can appreciate that the approach followed in Ref. [23] was more efficient than the LM: forcing a linear combination that describes the LO kinematics, and then using the same formulae at higher orders, allows one to achieve a more precise reconstruction. In the next subsections, we explore other methods that lead to a better approximation of the MC momentum fractions in a more automatized way.

Gaussian regression
While the LM method provides a good description of the LO case, at NLO the result strongly depends on the variables used to feed the algorithm. As the larger basis seems to render a slightly better reconstruction, we could use this as a motivation to further expand our basis, e.g. by including higher powers of its elements. However, this relies on deciding i) which combinations of K i are appropriate, and ii) up to which power it would be convenient to go. The first point was addressed in Subsec. 4.2 by constructing several bases, with different degrees of success. Regarding the second point, we could try different powers of a given basis, but this would be a cumbersome task. A more general and computationally efficient approach can be implemented by using the kernel trick (see e.g. [70,71]). In this method, the feature vector in the calculation is replaced by writing everything in terms of a function (kernel) of the dot product of the elements of the training set. In particular, we use the radial basis function (RBF) kernel, defined as

k(x i , x j ) = exp( −d(x i , x j ) 2 / (2 l 2 ) ),

where x i , x j are two elements of the training set, d(x i , x j ) is the Euclidean distance between them, and l is a distance parameter (not necessarily the same for all {i, j}). The RBF has the advantage of including all possible powers of the exponent, and therefore we expect a better reconstruction of the kinematic variables.

Figure 14. Correlation between the MC momentum fractions (i.e. X REAL and Z REAL ) versus the ones obtained at NLO QCD + LO QED accuracy (X REC and Z REC ). We show the results corresponding to the GR approach, using the general basis (upper row), the physically-motivated basis (middle row) and the LO-inspired basis (lower row).
Similarly to the LM, the GR requires a set of input variables. In order to properly compare the methods, we take the same bases for both. The GR also needs the user to select the width of each Gaussian function, l, which is by default l = 1. In principle it could be different for each feature of the input set, but for simplicity we keep it feature-independent. However, we did find better reconstructions when using different l for x and z. The optimal values of l for each basis can be found in Table 1. We find that, when using the most general basis, a better agreement between the reconstructed and the real data sets requires broad Gaussian functions. In addition, if we reduce the basis, the GR tends to require wider Gaussian functions to achieve a good description of the data sets. Finally, we find that in the physically-motivated basis the GR finds the best agreement by choosing l = 1 for the prediction of x and l = 1.5 for z, i.e. sharp Gaussian functions are needed, meaning that a combination of these variables is enough to reproduce the full data sets.
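A minimal sketch of Gaussian regression with a fixed-width RBF kernel, in the spirit of the procedure above, is the following. The one-dimensional toy target and the two length scales are illustrative only:

```python
# Gaussian (process) regression with a fixed RBF length scale l, as in the
# text's GR setup. The 1-D target and the l values are toy assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, (200, 1))
y = np.sin(X[:, 0]) + 0.02 * rng.normal(size=200)

for l in (1.0, 1.5):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=l),
                                  optimizer=None,   # keep l fixed by hand
                                  alpha=1e-3).fit(X, y)
    print("l = %.1f, train R^2 = %.3f" % (l, gp.score(X, y)))
```

Setting `optimizer=None` keeps the user-chosen length scale fixed, mirroring the scan over l described in the text rather than letting the library optimize it.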
These facts can be appreciated in Fig. 14, where we present the results obtained at NLO QCD + LO QED accuracy. As expected, the inclusion of higher-order terms (higher non-linearity) in the training set brings a significant improvement with respect to the LM, in particular for the reconstruction of x. In addition, we point out that, among the three bases, the reconstruction of x is in general harder than that of the z momentum fraction. The general basis can extract enough information to almost completely determine a function for the prediction of the momentum fractions, but with wide Gaussian functions. In contrast, the physically-motivated basis does a good job in the determination of z but is not as accurate in the extraction of x, although it requires sharp Gaussian functions, meaning that they are well localized and determined.
To conclude this section, we appreciate that the GR method leads to a more reliable reconstruction of the MC momentum fractions compared to the LM. The best results are obtained with a larger basis, in order to have more flexibility. Moreover, the non-linearity inherent to the GR allows us to overcome the overfitting in the low-x and low-z region that we observed in the LM, leading to a very accurate reconstruction in a wider range.

Neural Networks
Before jumping into the results of this section, let us briefly remind the reader what a neural network (NN) is. The building blocks of a NN are algorithms (called Perceptrons) used in supervised learning to decide whether an input belongs to a class or not (binary classifiers).
They consist of a set of input values X that are linearly combined with weights W and independent terms B (biases), after which the sum is transformed by a (usually non-linear) activation function f, giving an output Y: Y = f(z) with z = X · W + B. Each Perceptron mimics a neuron, and a combination of them makes a NN. The standard nomenclature labels the inputs and outputs as the input and output layers, respectively. To increase the capabilities of the NN (and its complexity) one can add more neurons in between, organised in hidden layers. Neither the activation functions connecting one layer to the next nor the number of neurons in each hidden layer need to be the same. The learning proceeds in two steps. First, the NN computes the output from the inputs (feedforward). In a second step (back-propagation), it calculates the cost and then minimizes it. This can be implemented in different ways, one of the most popular being stochastic gradient descent 9 . The choice of the activation function(s) and relevant parameters is highly non-trivial, and trial and error was required to find a configuration that could reproduce the momentum fractions. A non-exhaustive comparison of different combinations is presented in App. B, but here we limit ourselves to presenting the results corresponding to the parameters summarised in Table 2, which are used within the scikit-learn framework.
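A configuration of this type can be sketched with scikit-learn's MLPRegressor. The architecture and target below are illustrative assumptions, not the values of Table 2:

```python
# Multi-Layer Perceptron regression in scikit-learn, mirroring the
# feedforward/back-propagation training described above. Hidden-layer
# sizes, activation and target are illustrative, not the Table 2 values.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (2000, 2))
y = X[:, 0] * X[:, 1]            # toy non-linear target

mlp = MLPRegressor(hidden_layer_sizes=(32, 32),   # two hidden layers
                   activation="relu",
                   solver="adam",                  # a stochastic-gradient variant
                   max_iter=2000,
                   random_state=0).fit(X, y)
print("train R^2 = %.3f" % mlp.score(X, y))
```

Each entry of `hidden_layer_sizes` adds one hidden layer with the given number of neurons, which is exactly the knob varied in the architecture tests of App. B.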
The results of the MLP algorithm are presented in Fig. 15 for the LO QCD contribution (upper row) and the NLO QCD + LO QED correction (lower row). In the LO case the reconstruction is quite good, without reaching the level of accuracy of the LM or GR. This is strong evidence that the complexity of the NN machinery greatly exceeds that of the task to be solved. In the NLO case, on the contrary, the reconstruction is much better than the one obtained with the LM using any basis, and similar to the GR one with the general basis (upper row of Fig. 14). The plots show an almost perfect agreement in all bins for both x and z. The largest discrepancy appears for x, which can be partially attributed to the higher complexity of the target function for x than for z, already suggested by the analytic LO expressions. Indeed, almost all trials performed with different methods and configurations arrive at reasonable relations between Z REAL and Z REC . However, for x, we have to either increase the number of elements in our basis (GR) or the number of layers/nodes (NN).

Figure 15. Left: Comparison of the momentum fractions X REAL and X REC obtained with MLP neural networks with the parameters given in Table 2. The upper (lower) row corresponds to the LO QCD (NLO QCD + LO QED) data set. Right: same as the left panels but for Z REAL and Z REC .
In any case, we can highlight that the MLP algorithm does not require choosing any particular basis: the complexity is translated into defining the proper architecture. This task is more suitable for automation, and thus more appropriate for tackling generic physical processes regardless of the number or kind of particles involved. Whereas the LM or GR could take advantage of physically-motivated parameter choices to speed up an accurate reconstruction, the NN framework relies mainly on computational power to reduce the problem to a black-box function.

Conclusions and outlook
In this work we have explored the reconstruction of the parton-level kinematics for the process p + p → γ + h using Machine-Learning (ML) tools. First, we implemented the calculation in a Monte-Carlo (MC) code with NLO QCD and LO QED accuracy. We relied on the FKS algorithm to cancel the infrared singularities, and on the smooth cone isolation criterion to select those events with direct photons. This prescription is crucial to have access to cleaner information from the hard process.
Then, we studied different kinematical distributions with the purpose of identifying the regions with the largest number of events. After imposing selection cuts similar to those used by experimental collaborations, dynamical cuts were induced in the x and z distributions. These restrictions were taken into account when selecting events for analysing the correlations between experimentally-accessible quantities (p T , η and φ for the photon and pion) and the partonic momentum fractions. We found that x strongly depends on p γ T (positive correlation) but not on the other variables, whilst z exhibits a negative correlation with p γ T , a positive one with p π T and a mild dependence on cos(φ π − φ γ ). After that, we applied ML algorithms to reconstruct the partonic variables x 1 , x 2 and z. We started by introducing a proper discretization of the multi-differential cross-section w.r.t. the set of variables {p π T , p γ T , η π , η γ , cos(φ π − φ γ )}, in order to have a reliable estimation of the higher-order corrections in each bin. For these distributions, we generated the data sets and explored three different ML reconstruction strategies: linear methods (LM), Gaussian Regression (GR) and the Multi-Layer Perceptron (MLP). For the first two approaches, we introduced three bases of functions inspired by the results obtained from the analysis of two-dimensional correlations in Sec. 3.2. In all cases, the reconstruction at LO QCD accuracy was very successful, and in agreement with the known analytical expressions. When dealing with the NLO QCD + LO QED corrections, the flexibility of the MLP approach leads to a very reliable reconstruction, achieving a better performance than the LM and comparable to the GR when using a sufficiently large basis. In particular, the LM results were highly influenced by the abundance of data in the low-x and low-z region, leading to an unreliable fit when extrapolated outside these regions.
It is worth appreciating that the number of assumptions related to the setup of the MLP framework is rather limited, compared to those made for the linear and Gaussian regressions. In particular, we want to highlight that there was no need to introduce a specific basis of functions, which makes this approach fully process-independent and suitable for other analyses.
In conclusion, the application of ML-inspired methods (and Neural Networks in particular) is suitable for unveiling the partonic kinematics at hadron colliders, including also higher-order corrections. In this way, ML-assisted event reconstruction might allow us to achieve a highly-precise description of the deepest constituents of matter and their interactions, complementing the current developments in other areas of theoretical particle physics.
Acknowledgments

H.-P. is also funded by Sistema Nacional de Investigadores from CONACyT and PROFAPI 2022 (Universidad Autónoma de Sinaloa). P.Z. acknowledges support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Research Unit FOR 2926, grant number 409651613.

A Coefficients for the Linear Method
For completeness, we present the coefficients associated to the linear regression for each of the three bases studied in Subsec. 4.2. We restrict our attention to the fit of the data sets at NLO QCD + LO QED accuracy, since the LO contributions were perfectly in agreement with the analytical LO formulae. In Tab. 4 we present the coefficients of the most general basis, Eq. (4.19), that reproduce the plots in the upper row of Fig. 13. The parameters of the physically-motivated basis, given by Eq. (4.19) with the constraints of Eqs. (4.20)-(4.21), are in Tabs. 5 and 6 for x and z, respectively. The corresponding correlation with the real MC variables can be seen in the middle row of Fig. 13. Finally, the coefficients for the LO-inspired basis, associated to the constraints in Eqs. (4.22)-(4.23), can be found in Tab. 7. These fall short in the quality of the fit, as we can appreciate from the lower row of Fig. 13.

B Comparison of different NN architectures
We summarize here some results that were obtained before the optimal architecture described in Subsec. 4.4 was found. In Tab. 3 we present the parameters corresponding to three different tests implemented.

Table 3. Architectures for the MLP in three different tests (TEST 1, TEST 2, TEST 3) for the reconstruction of the momentum fractions at NLO in QCD. All parameters are taken to be the same for X REC and Z REC .
In TEST1 (upper row of Fig. 16), we use fewer neurons per layer and fewer layers than for obtaining the results in Fig. 15. We find a poor agreement between the real and reconstructed quantities, in particular for low-z bins. An improvement is achieved by increasing the number of layers and neurons per layer (TEST2), while simultaneously requiring the NN to see no variation of the cost function (within a given tolerance) through a larger number of iterations. As seen in Fig. 16 (middle row), this gives a better reconstruction, though it is still far from ideal. A third example, TEST3, reinforces the conditions for convergence and returns a significantly improved result (lower row of Fig. 16). Each step towards a more complex architecture and more stringent requirements for convergence translates into an increase of the computational time required for the training. These and other trials have guided us to the selection of the best architecture for our task, summarised in Tab. 2.

Figure 16. Comparison of the momentum fractions X REAL vs. X REC (left) and Z REAL vs. Z REC (right) obtained with the MLP at NLO QCD + LO QED accuracy. The parameters for TEST1 (upper row), TEST2 (middle row) and TEST3 (lower row) are given in Tab. 3.