The MSHT20 parton distribution functions

We present the new MSHT20 set of parton distribution functions (PDFs) of the proton, determined from a global analysis of the available hard scattering data and superseding the MMHT14 sets. The parameterisation is now adapted and extended, and we include a large number of new data sets: the final HERA and Tevatron data, and a significant number of LHC data sets on vector boson production, inclusive jets and top quark distributions. We include up to NNLO QCD corrections for all data sets that play a major role in the fit. There are some changes to central PDF values and a significant reduction in the uncertainties, but the PDFs and the predictions using them are generally within one standard deviation of the MMHT14 results. We discuss the phenomenological impact of our results.


Introduction
We summarise the most important results pertaining to the MSHT20 PDFs [1]. The acronym MSHT stands for Mass Scheme Hessian Tolerance, i.e. it incorporates some of the central and enduring features of our approach, and is now intended to be a permanent naming convention. The 2020 analysis includes new theoretical developments, an extended parameterisation (particularly for d̄/ū and the strange quark) and extended eigenvector sets. Much new data is included, largely from the LHC, but also final HERA and Tevatron data sets. Nearly all cross sections are included at NNLO in QCD perturbation theory. The fit quality is generally very good, but there are problems with correlated uncertainties and tensions for some data sets. NNLO is now very much the default, and NLO QCD is clearly no longer sufficient for real precision. The new PDFs join the list of others recently obtained via global fits [2][3][4].

Theoretical Procedures
As in the MMHT14 [5] analysis we use a general-mass variable flavour number scheme based on the TR scheme [6,7], using the "optimal" choice [8] for smoothness near threshold. We apply deuteron and heavy nuclear corrections: the former are fitted using a four-parameter model, as in MMHT14, while the latter use the same corrections [9] as MMHT14, with the fit allowing an additional penalty-free freedom of order 1%. We fit data with systematic uncertainties using either nuisance parameters where possible (the preferred method) or the correlation matrix provided, and use statistical correlations whenever these are available. (Some old data sets which are dominated by uncorrelated uncertainties and/or where there is a limited understanding of correlations have errors added in quadrature.) We fit to absolute cross sections in preference to normalised ones, to avoid the loss of information from the normalisations.
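The nuisance-parameter treatment of correlated systematics can be sketched generically as follows. This is an illustration of the standard penalty form of the χ², not the actual MSHT20 fitting code; all function and variable names are ours.

```python
import numpy as np

def chi2_nuisance(data, theory, sigma_uncorr, corr_sys):
    """chi^2 with correlated systematics treated via nuisance parameters.

    data, theory : arrays of length N
    sigma_uncorr : uncorrelated errors, length N
    corr_sys     : (K, N) array; corr_sys[k, i] is the shift of point i
                   induced by one sigma of correlated source k

    Minimising the penalty form
        chi^2 = sum_i ((d_i - t_i - sum_k beta_k s_ki)/sigma_i)^2 + sum_k beta_k^2
    analytically over the K nuisance parameters beta_k is equivalent to
        chi^2 = r^T C^{-1} r,  C = diag(sigma^2) + sum_k s_k s_k^T,
    which is what we evaluate here.
    """
    r = data - theory
    cov = np.diag(sigma_uncorr**2) + corr_sys.T @ corr_sys
    return float(r @ np.linalg.solve(cov, r))
```

The covariance-matrix option mentioned in the text corresponds to supplying the full correlated covariance directly instead of building it from the individual sources.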
The analysis includes many new NNLO corrections compared to MMHT14. We now use the NNLO calculations for dimuon production [10], where the correction is negative, and larger in size at lower x. This negative correction allows the strange quark to be larger in the fit to the dimuon data and helps relieve the tension between the dimuon data [11] and the LHC W, Z data [12][13][14], which prefer a larger strange quark [15], as seen in Fig. 1. Nearly all other data have theoretical calculations at full NNLO precision. In particular, we also include NNLO cross-section calculations [16] for all LHC jet data, i.e. we fit inclusive jet production at 2.76, 7 and 8 TeV, using the larger available jet radius, e.g. R = 0.6, 0.7, and scales µ_{R,F} = p_{T,jet}. (Older Tevatron jet data are still included with the threshold approximation to NNLO [17], which is a better approximation for these data, and they also carry little weight.) The CMS 7 TeV W + c data [18] only have NLO theory available for the specific measurement, but the correction is not expected to be large compared to the uncertainties and the few data points carry little weight. The Z p_T distribution and all top quark cross sections used are included at full NNLO. We also use EW corrections where possible, if these are not already subtracted from the supplied data.
There has been a very significant extension of our parameterisation. In MMHT14 the general form used for the input PDFs was, schematically, x f(x, Q_0^2) = A (1 − x)^η x^δ (1 + Σ_{i=1..n} a_i T_i(y)), where the T_i are Chebyshev polynomials in y = 1 − 2√x. It was shown in [19], using a fit to pseudo-data, how the achievable precision improves with increasing n. In MMHT14 n = 4 was deemed sufficient, but using n = 6 leads to much better than 1% precision. Hence, we have investigated extending the parameters of the different flavour PDFs sequentially using n = 6, and now also parameterise (d̄/ū) instead of (d̄ − ū), with the sole constraint that (d̄/ū) → constant as x → 0. This leads to significant improvements in the global fit: changing to (d̄/ū)(x, Q_0^2) improved the fit, extending the valence distributions was not significant, but further extending g(x, Q_0^2) gave Δχ²_tot ∼ −50, and finally extending sea(x, Q_0^2) and s_+(x, Q_0^2) gave Δχ²_tot = −73. Overall we see an improvement in the fit to high-x fixed-target data, a reduction in the tension between the E866 DY ratio data and LHC data, and an improvement in the description of the LHC lepton asymmetry data, while the gluon-induced improvement is in HERA and other data. Using n = 6 in general now, except for s − s̄, means an increase to 52 parton parameters.
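As a concrete illustration of this Chebyshev-based input form (our own sketch with illustrative parameter names, not the MSHT fitting code), it can be evaluated directly with numpy's Chebyshev routines:

```python
import numpy as np
from numpy.polynomial import chebyshev

def xf_input(x, A, delta, eta, a):
    """Schematic MSHT-style input parameterisation at Q0^2:

        x f(x) = A * x^delta * (1-x)^eta * (1 + sum_{i=1..n} a_i T_i(y)),

    with y = 1 - 2*sqrt(x) mapping x in (0,1) onto y in (-1,1).
    n = len(a) is the polynomial order discussed in the text
    (n = 4 in MMHT14, n = 6 for most flavours in MSHT20).
    """
    x = np.asarray(x, dtype=float)
    y = 1.0 - 2.0 * np.sqrt(x)
    coeffs = np.concatenate(([1.0], a))  # 1 + sum_i a_i T_i(y)
    return A * x**delta * (1.0 - x)**eta * chebyshev.chebval(y, coeffs)
```

The square-root argument y(x) spreads the polynomial's resolving power over both the low- and high-x regions, which is why a modest n already gives sub-percent flexibility.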

New Data Sets
The first new data set to be updated compared to the MMHT14 PDFs was the final HERA total cross-section data [20]. This was already studied in [21] and found to have a limited effect on the PDFs, but there was some trouble fitting the lower Q², x data. We now also include the final combined σ_cc̄ and σ_bb̄ data [22]. The best fit gives χ² = 132/79, quite high, but there is no tension with other data within the global fit, except the inclusive HERA data, which carry enormously more weight on the relevant PDFs. The fit at low Q² is not optimal, but similar results are seen in other PDF studies [22].
Another important new data set is the D0 electron/W asymmetry. We first fit the D0 e asymmetry [23], and found good agreement with MMHT14, but alternatively we can use the W asymmetry [24]. The W^{+/−} boson is produced preferentially in the proton/antiproton direction, but the V−A structure of the lepton decay means the e^{+/−} is emitted preferentially opposite to the W^{+/−}: leptons at a particular η_e come from a range of η_W values, which dilutes the direct constraint on the PDFs at a given x. Mapping the lepton to the W asymmetry requires PDF-dependent modelling, but with only a small uncertainty, and this gives a more direct constraint from the W asymmetry data. We see a reduced uncertainty on d/u compared to using the e asymmetry. There is a marked effect at very high x, where d_V is reduced.

The MSHT20 analysis contains a large amount of new LHC data: extremely high precision data on W, Z production at 7 TeV from ATLAS, and high precision W^{+/−} data and double-differential Z data at 8 TeV; CMS 8 TeV precise data on the W^{+/−} rapidity distributions; LHCb data at 7 and 8 TeV on W, Z rapidity distributions at higher rapidity; W + c jets data at 7 TeV from CMS; ATLAS high-mass Drell-Yan data at 8 TeV; ATLAS data on W^{+/−} + jets at 8 TeV; Z p_T distributions at 8 TeV; new data on σ_tt̄ at 8 TeV plus ATLAS single-differential distributions in p_{T,t}, M_tt̄, y_t, y_tt̄ and CMS double-differential distributions in p_{T,t}, y_t, both at 8 TeV; and inclusive jet data from ATLAS at 7 TeV and CMS at 2.76, 7 and 8 TeV. We include all these recent LHC data updates in the fit at NNLO (for the default α_S(M_Z²) = 0.118). The fit quality is generally good, as seen in Table 1. These data lead to an increase in the strange quark for 0.001 < x < 0.3 and changes in the details of d̄ and ū, though some of these are also partially from the parameterisation change. There is a slight decrease in the high-x gluon. We will illustrate these changes later.
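The sensitivity of the Tevatron W asymmetry to d/u at high x can be made explicit at leading order (a standard textbook expression, our illustration rather than a formula from the analysis):

```latex
A_W(y_W) = \frac{d\sigma(W^+)/dy_W - d\sigma(W^-)/dy_W}
                {d\sigma(W^+)/dy_W + d\sigma(W^-)/dy_W}
\;\simeq\;
\frac{u(x_1)\,d(x_2) - d(x_1)\,u(x_2)}{u(x_1)\,d(x_2) + d(x_1)\,u(x_2)},
\qquad x_{1,2} = \frac{M_W}{\sqrt{s}}\, e^{\pm y_W} .
```

Here x_1 and x_2 refer to the proton and antiproton, sea-quark contributions are neglected, and the antiquark densities in the antiproton have been rewritten via charge conjugation; at large y_W one parton momentum fraction becomes large, so A_W probes d/u directly at high x.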
Generally the fit is good, but the most straightforward approach gives a distinctly poor fit quality to some data sets due to tensions between different kinematic regions (e.g. rapidity bins) or different differential distributions of the same data. Sometimes this is clearly related to modelling-type systematic uncertainties, particularly for jet and tt data, as illustrated in detail in [42,43], and for some data sets we use the sort of smooth decorrelation advocated for 8 TeV ATLAS inclusive jet data [44].

The new PDFs
When determining the PDF uncertainties in MSHT20 we go from 25 eigenvector pairs to 32: there is one extra parameter for each PDF and two for s + s̄. The mean tolerance is T ∼ 3 − 4. About half the constraints are primarily provided by precision electroweak collider data, largely the D0 W asymmetry, the 7 and 8 TeV ATLAS W, Z data and the CMS W data. 8-10 eigenvectors are mainly constrained by the E866 Drell-Yan ratio, which is vital for the d̄/ū constraint; ∼ 10 eigenvectors are constrained by fixed-target DIS data (i.e. BCDMS, NMC, NuTeV, CCFR), and these data sets still mainly constrain the high-x quarks; ∼ 10 eigenvectors are constrained by the CCFR, NuTeV dimuon data, i.e. this is still the main constraint on the strange quark and its asymmetry. Hence, a fully global fit is necessary to constrain all the PDFs without the use of assumptions and/or models. HERA data provide good constraints on the widest variety of PDF parameters, mainly the gluon and light sea, but are now very rarely the single best constraint. However, the HERA data are a very strong constraint on the best-fit PDFs: central values and uncertainties at small x are strongly constrained by HERA data, as seen in Fig. 3, and the quark normalisation at high x is also affected, which is related to the sum rules.

We now consider the new MSHT20 PDFs compared to those of MMHT14. First we show the gluon distribution, Fig. 4 (left), where there is no significant change in the central value, though the uncertainty is reduced. The details of the shape at high x depend on the LHC jet, Z p_T and differential tt̄ data. The Z p_T data pull the gluon up and the differential tt̄ data pull it down, each also affecting the lower-x normalisation via the momentum sum rule. This is seen in Fig. 4 (right). Not all jet data pull in the same direction, though the total effect is slightly downwards.
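The eigenvector sets are used with the standard asymmetric Hessian "master formula" to propagate PDF uncertainties to any observable. A minimal generic sketch (illustrative names, not MSHT code):

```python
import numpy as np

def hessian_errors(f0, f_plus, f_minus):
    """Asymmetric Hessian PDF uncertainties on an observable F.

    f0              : central-set prediction F(S_0)
    f_plus, f_minus : predictions F(S_k^+), F(S_k^-) over the eigenvector
                      pairs (32 pairs for MSHT20)

    err_up   = sqrt( sum_k [ max(F(S_k^+)-F0, F(S_k^-)-F0, 0) ]^2 )
    err_down = sqrt( sum_k [ max(F0-F(S_k^+), F0-F(S_k^-), 0) ]^2 )
    """
    fp = np.asarray(f_plus, dtype=float)
    fm = np.asarray(f_minus, dtype=float)
    up = np.sqrt(np.sum(np.maximum(np.maximum(fp - f0, fm - f0), 0.0) ** 2))
    dn = np.sqrt(np.sum(np.maximum(np.maximum(f0 - fp, f0 - fm), 0.0) ** 2))
    return up, dn
```

The tolerance T quoted in the text enters upstream: each eigenvector is displaced until Δχ² reaches T², so the spread of the F(S_k^±) already encodes it.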
More significant changes in the PDFs include an increase in the strange quark below x = 0.1, Fig. 5 (left), due to the ATLAS 7 and 8 TeV data, which influence the PDFs similarly. There is also a significant change in the shape of the valence quarks, most notably d_V, due to the LHC data on W, Z production and the improved parameterisation flexibility, Fig. 5 (right). The strange asymmetry is similar to MMHT14, but is now non-zero outside uncertainties. There is a change in the details of the light antiquarks at high x, where the constraints are weak, and a slight decrease at low x to compensate for the increase in the strange quark. The details of the ū, d̄ difference, shown in Fig. 6, are completely changed due to the new type of parameterisation. There is a huge increase in the uncertainty at small x, and a slight tendency for negative d̄ − ū. However, a different impression is formed by looking at d̄/ū, which has a small low-x uncertainty, and notably the ratio → 1 as x → 0 to good accuracy, even without this being imposed as a constraint. As well as at NNLO, we also produce PDFs at NLO (and still at LO, where the fit is very poor). At NLO we start to notice a significant deterioration in fit quality for some of the precision LHC data; NNLO is now very much preferred.
The strong coupling value obtained from the analysis is α_S(M_Z²) = 0.1174 ± 0.0013 [45]. There are constraints from a variety of new LHC data, but in different directions: in general the jet data prefer a slightly lower, and the W, Z data a slightly higher, α_S(M_Z²), and no single new set constrains α_S(M_Z²) more strongly than a number of older data sets. PDF sets with varying quark masses, updating the previous results [46], are also provided.

Predictions
We show in Fig. 7 the predictions for a variety of benchmark processes. There are some changes in σ_W, σ_Z and particularly their ratio, largely due to the changes in the strange quarks. For the gluon-initiated top-pair and Higgs cross sections there is an improvement in the uncertainties, but the central values remain stable. We have also produced numerous predictions for data sets not included in the fit. For example, there is a good prediction for the CMS 13 TeV W + c data [47], which mainly depend on the strange quarks. Single top data are not fitted (since the uncertainties are much larger than the PDF uncertainties), but good predictions are obtained (using [48,49]) for the 13 TeV CMS data [50], as seen in Fig. 8.

Conclusion
We have presented the MSHT20 PDF analysis. LHC data are starting to have a very significant impact on PDF extractions. Theory precision is catching up to that of the data, e.g. the NNLO calculations for jets, differential top-pair production and the Z, W p_T distributions. We have also made improvements in our PDF parameterisation, which give a better fit to the data, relieve some data tensions and increase some uncertainties in extreme kinematic regions. There are significant changes in the d̄, ū difference, the s + s̄ distribution and the small-x d_V(x) distribution, for both uncertainties and central values. Generally there is stability for the other PDFs, but with a reduction of the uncertainties in the PDFs and benchmark processes. Precision data and theory are causing problems in cases where correlated systematics (which increasingly dominate) are important, and improved interplay between theory and experiment on these seems a priority. Additional PDF sets with varying α_S(M_Z²) and quark masses have appeared, as have PDFs including the photon distribution [51]. Theory uncertainties on the MSHT PDFs will appear, but will take a little longer.