SciPost Submission Page
Reweighting Monte Carlo Predictions and Automated Fragmentation Variations in Pythia 8
by Christian Bierlich, Philip Ilten, Tony Menzo, Stephen Mrenna, Manuel Szewc, Michael K. Wilkinson, Ahmed Youssef, Jure Zupan
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users):  Christian Bierlich · Philip Ilten · Tony Menzo 
Submission information  

Preprint Link:  https://arxiv.org/abs/2308.13459v2 (pdf) 
Code repository:  https://gitlab.com/uchep/mlhadweightsvalidation 
Date submitted:  2023-10-18 14:47 
Submitted by:  Ilten, Philip 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
This work reports on a method for uncertainty estimation in simulated collider-event predictions. The method is based on a Monte Carlo-veto algorithm, and extends previous work on uncertainty estimates in parton showers by including uncertainty estimates for the Lund string-fragmentation model. This method is advantageous from the perspective of simulation costs: a single ensemble of generated events can be reinterpreted as though it was obtained using a different set of input parameters, where each event now is accompanied with a corresponding weight. This allows for a robust exploration of the uncertainties arising from the choice of input model parameters, without the need to rerun full simulation pipelines for each input parameter choice. Such explorations are important when determining the sensitivities of precision physics measurements. Accompanying code is available at https://gitlab.com/uchep/mlhadweightsvalidation.
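The reinterpretation of a single event sample described in the abstract can be sketched with a toy accept-reject sampler. This is a minimal illustration under stated assumptions, not the Pythia 8 implementation: the function names, the unnormalised Lund-like shape, the fixed transverse mass squared `mT2`, and the parameter values are all chosen here for illustration, and the weight rule (accepted trial: ratio of acceptance probabilities; rejected trial: ratio of rejection probabilities) is the standard one for reweighting a veto step, assumed rather than copied from the accompanying code.

```python
import math
import random


def lund_like(z, a, b, mT2=0.5):
    """Toy, unnormalised Lund-like shape (1/z) (1-z)^a exp(-b mT2 / z) on (0, 1]."""
    return (1.0 - z) ** a * math.exp(-b * mT2 / z) / z


def sample_with_weight(f_base, f_alt, g_max, rng):
    """Draw z from f_base by accept-reject and return (z, weight for f_alt)."""
    weight = 1.0
    while True:
        z = 1.0 - rng.random()           # uniform trial in (0, 1]
        p_base = f_base(z) / g_max       # baseline acceptance probability
        p_alt = f_alt(z) / g_max         # alternative acceptance probability
        if rng.random() < p_base:
            # Accepted trial: ratio of acceptance probabilities.
            return z, weight * p_alt / p_base
        # Rejected trial: ratio of rejection probabilities.
        weight *= (1.0 - p_alt) / (1.0 - p_base)


def generate(n, base, alt, seed=1, mT2=0.5):
    """Return n (z, weight) pairs sampled with base=(a, b), reweighted to alt=(a, b)."""
    rng = random.Random(seed)
    # exp(-c/z)/z <= 1/(c e) for z > 0, so this constant bounds both shapes.
    g_max = 1.0 / (min(base[1], alt[1]) * mT2 * math.e)
    f_base = lambda z: lund_like(z, base[0], base[1], mT2)
    f_alt = lambda z: lund_like(z, alt[0], alt[1], mT2)
    return [sample_with_weight(f_base, f_alt, g_max, rng) for _ in range(n)]


# One baseline sample, reinterpreted for a nearby (hypothetical) parameter choice.
events = generate(20000, base=(0.68, 0.98), alt=(0.60, 1.10))
sum_w = sum(w for _, w in events)
mean_weight = sum_w / len(events)
reweighted_mean_z = sum(w * z for z, w in events) / sum_w
```

In this sketch the mean weight should be statistically compatible with one, and the weighted mean of `z` should agree with a sample generated directly at the alternative parameters, as long as the alternative shape is not large where the baseline shape is close to zero.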
Reports on this Submission
Strengths
1. In the manuscript, the authors presented for the first time the possibility of reweighting Monte Carlo hadronisation predictions.
2. This approach opens up a new avenue to achieve hadronisation uncertainty efficiently (especially if full detector simulation is considered) and can be potentially used for tuning (fitting) hadronisation models.
3. The manuscript is supplemented with publicly available code (gitlab.com/uchep/mlhadweightsvalidation).
Weaknesses
1. The authors consider a simplified version of the string model (i.e. they reweighted just a few parameters of the model).
2. It is unclear how this method can be applied when the base function has a domain with zero value.
3. Some citations are missing (see also the other report)
Report
In the manuscript “Reweighting Monte Carlo Predictions and Automated Fragmentation Variations in Pythia 8” the authors present for the first time a framework for reweighting Monte Carlo hadronisation predictions. This approach opens up new possibilities for efficiently obtaining hadronisation uncertainties (especially if a full detector simulation is considered) and can potentially be used to tune (fit) hadronisation models. The results presented are, in my opinion, very interesting. However, before the article is published, I have some comments/questions (see requested changes), the answers to which I think would help to further improve the article.
Requested changes
Requested changes (in order of appearance in the text):
1. In the Introduction, the authors write that the proposed approach can, for example, be applied to a multiparticle interaction model. This does not seem obvious to me. It is also not clear to me how the approach could be applied for colour reconnection, which could be considered a part of hadronization or MPI. Could the authors please elaborate on this more?
2. It would be good to add references to more modern versions of the software and add references to Parton Shower/Generators uncertainty studies of other groups.
3a. The authors consider a simplified version of the string model. For example, they neglect to reweight the flavour parameters of the model. What is the reason for this? The string model has many more parameters (for example, the Monash tune, which the authors use as the default setting for the Lund model, had more than 20 hadronisation parameters tuned). How would the methods work for such a large number of parameters? What are the potential problems in such a more realistic situation?
3b. Related to this question is the problem described at the bottom of page 4 concerning low-mass strings. The authors write openly about this problem, but it is not clear what exact limitations this problem brings to the estimation of hadronisation uncertainty using the proposed method.
4. Some of the parameters of the string model are discrete. How can the method be applied to the discrete parameters?
5. In section 3.1 the authors write:
“This agreement breaks down, if the Lund fragmentation function for the alternative parameter values is large in a range where the Lund fragmentation function approaches zero for the baseline parameter values, as shown in fig. 2 (bottom left) and fig. 3 (bottom right). The reweighting then requires large weights and samples the phase space poorly.” It is even worse in the case when the base function has a domain where it is zero and the function for the alternative parameter is nonzero.
Clearly, in that situation, the method cannot be applied. This appears to be a serious limitation of the method. It would be interesting to see what solutions the authors propose in such a situation.
6. In Fig. 2, in the bottom panel for the case of b=0.58, there is a large error bar for the $w'$ line at charged multiplicities of 30-35. What is the reason for that?
7. In perturbative calculations, a variation of the renormalisation and factorisation scales is usually used as a rough estimate of uncertainty. What variations in the parameters of the Lund model would the authors suggest to estimate the uncertainties associated with it?
In summary, I would like to say that I think the document is very interesting. However, there are still some issues which have to be addressed before publication. Therefore, I recommend that the authors address the points raised above before resubmitting their paper.
Strengths
1. The authors provide a first implementation and validation of a hadronisation code that can generate alternative event weights for relevant parameter variations, thus greatly reducing the cost of subsequent detector simulation steps in the simulated events pipeline at high-energy colliders such as the LHC. This lays the groundwork to establishing automated on-the-fly hadronisation uncertainties for particle level simulation results, similar to what has been established for on-the-fly hard process and parton shower uncertainties in recent years.
2. The authors point out potential numerical issues with the ansatz when the baseline and the target probability distributions do not overlap well, as has been observed for shower variations, and attempt to find diagnostics to spot pathological cases. This is also studied in the given validation results.
Weaknesses
1. Some citations are missing (see "Requested changes").
2. The criterion of how far the mean is from 1 is not well argued in my opinion, or perhaps I don't understand the point the authors are trying to make here. The actual issue, i.e. the weaker statistical significance due to the wider weight distribution, will of course lead to larger fluctuations of $\mu$ around 1 (as seen from the MC error $\sigma$ given for $(1-\mu)$ for the pathological cases) but then the reliable and more direct measure would be $\sigma$, not $(1-\mu)$, which might be arbitrarily close to 0 given its statistical nature. Alternatively, quoting the effective sample size might be the most natural measure and would at the same time communicate the loss of significance for the alternative weight event sample in a clear way.
Report
This submission reports the first development and application of reweighting methods to a hadronisation model (here, Lund String hadronisation, which is one of two main models in wide use) and the validation of an implementation that enables for the first time the generation of alternative-weight samples for hadronisation uncertainty studies.
Mostly the same methods have been developed for the perturbative parts of the Monte Carlo simulation toolchain (matrix elements, parton showers, matching/merging) between 2011 and 2016 and their use has become a standard for large-scale simulated event sample production at the LHC by ATLAS and CMS and a useful tool for phenomenological studies alike. I expect that the calculation of alternative weights for the hadronisation part will have a similar impact.
Therefore, the submission is a highly relevant contribution. It is of a high quality and definitely worth publishing in SciPost Physics. However, I include a few points in "Requested changes" which were confusing to me and/or that I think could be improved, and I would ask the authors to address them in a minor revision.
Requested changes
1. In the introduction on page 2, after "efficient methods exist for the hard process and the parton shower", it is in my opinion not sufficient to cite only the VINCIA and PYTHIA reweighting-related publications [3, 4]. As for the hard process, LO reweighting is trivial, but publications that developed NLO reweighting should be added, e.g. [1310.7439] for the reweighting of NLO Monte-Carlo using the Catani-Seymour subtraction. When it comes to the shower (which might include reweighting in the context of matching and merging which should perhaps be mentioned, too), the implementations of the two other general-purpose generators heavily in use at the LHC, in [1605.08256] (HERWIG) and [1606.08753] (SHERPA), should be cited.
2. The previous point also applies to the use of the same citation group [3, 4] on page 2 after the sentence "The presented method is similar to the one used previously for parton shower uncertainty estimates", i.e. [1605.08256] and [1606.08753] should be cited, too. Here, as you now refer to the reweighting of the veto algorithm, one should also cite [0912.3501], which predates any of the parton shower uncertainty papers by about two years and uses the same "modified veto algorithm" method, albeit to bias shower emissions to generate additional photons. See App. B of [0912.3501].
3. Since in my (limited) understanding the HERWIG cluster hadronization model is itself an implementation based on ideas from the 80s [Nucl. Phys. B214 (1983) 201, Nucl. Phys. B239 (1984) 349, Nucl. Phys. B288 (1987) 729, Nucl. Phys. B238 (1984) 492], it is unclear why it was picked out as an example, instead of citing also the second implementation of this method in a widely-used general-purpose event generator, i.e. in SHERPA [hep-ph/0311085].
4. As mentioned in the "Weaknesses" part of the review, I am not convinced by the arguments around eqs. (13)-(15) to put forward the deviation of the mean from unity as the most prominent/straightforward criterion to assess the quality of the reweighting. Isn't it much clearer/more straightforward (and less dependent on random fluctuations, which might send the mean $\mu$ arbitrarily close to unity even if $\sigma$ is large) to use the MC error $\sigma$ of the $w'$ sample itself, or the effective sample size? Eqs. (13)-(15) are not even required to establish this, as it is self-evident that the weight distribution widens by the reweighting. So I would ask the authors to either (i) point out what I have misunderstood and/or (ii) clarify their reasoning in the draft, why the deviation of the mean itself is the relevant criterion here or (iii) just use the greater Monte-Carlo error and/or reduced effective sample size as a criterion and discuss that for the given results. In the latter case, it would be interesting to quote the effective sample size in Tab. 1.
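For reference, the quantities compared in the last point can be written out explicitly. The expressions below are the standard definitions for a sample of $N$ alternative weights $w'_i$ (with the Kish form of the effective sample size) and are an assumption made here for illustration; they are not taken from eqs. (13)-(15) of the manuscript.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} w'_i, \qquad
s^2 = \frac{1}{N}\sum_{i=1}^{N}\left(w'_i-\mu\right)^2, \qquad
\sigma = \frac{s}{\sqrt{N}}, \qquad
n_\mathrm{eff} = \frac{\left(\sum_{i} w'_i\right)^2}{\sum_{i}\left(w'_i\right)^2}
               = N\,\frac{\mu^2}{\mu^2+s^2}.
```

With these definitions a wider weight distribution (larger $s$) increases $\sigma$ and lowers $n_\mathrm{eff}$ together, whereas for an unbiased reweighting $1-\mu$ only fluctuates around zero with a typical size of $\sigma$.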
Author: Philip Ilten on 2024-03-12 [id 4359]
(in reply to Report 1 on 2023-12-05)
We thank the reviewer for their constructive comments, which we address in detail below. We note that since the submission we have further developed the code which has resulted in some of the distributions and numbers changing (in particular, a bug in how the diquark production was handled was fixed).
A.2 Weaknesses
 Thank you for the suggested citations; we have added them.
 We agree that we should also quote the effective sample size in addition to the 1 − μ metric, and give further details below about why we think 1 − μ is still a relevant metric.
A.4 Requested changes
 Thank you for the suggestions. We have added these citations. For matching and merging, we have included a sentence with citations, although we note that typically these methods are not used in the context of variations or uncertainty estimation. We have tried to cover the major methods (initial methods, POWHEG, MC@NLO, CKKWL, MLM, MENLOPS, UMEPS, UNLOPS, and FxFx), but since this is a very active area, it is possible that we have missed relevant citations, so please let us know, if you notice any that need adding.
 Thank you, we have now included these citations at that point.
 This is an oversight on our part. We have added citations here.
 We agree that we should include the effective sample size in Table 1, which we have now done. However, we have still included 1 − μ for the following reason. Consider a reweighted distribution that is wider than the underlying base distribution. As you note, n_eff will tell us the reduced sample size. However, if our reweighted distribution extends beyond the support of the base distribution, this will not be apparent from n_eff, and we will be in a scenario where we are missing a substantial portion of our phase space. Here, 1 − μ is useful because it will now deviate from zero in a statistically significant way and indicate that we do not have proper support for the reweighted distribution. Additionally, if our reweighting algorithm is simply wrong, we will see a statistically significant deviation of 1 − μ from zero. Already in the writing of this paper 1 − μ helped us find a number of bugs in our implemented algorithm. We have added an additional paragraph after the discussion of 1 − μ to clarify this.
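The support argument in this reply can be illustrated with a deliberately extreme toy case. The sketch below is an assumption-laden illustration, not code from the paper: the densities, the helper names, and the use of the Kish effective sample size are all choices made here. The baseline is uniform on [0, 0.5) and the alternative is uniform on [0, 1), so half of the alternative phase space is never sampled.

```python
import random


def base_density(x):
    """Baseline: uniform on [0, 0.5)."""
    return 2.0 if 0.0 <= x < 0.5 else 0.0


def alt_density(x):
    """Alternative: uniform on [0, 1), extending beyond the baseline support."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def weight_diagnostics(weights):
    """Return (1 - mean weight, Kish effective sample size)."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return 1.0 - total / n, total * total / total_sq


rng = random.Random(7)
xs = [0.5 * rng.random() for _ in range(1000)]        # sampled from the baseline
weights = [alt_density(x) / base_density(x) for x in xs]
one_minus_mu, n_eff = weight_diagnostics(weights)
# Every weight is 0.5, so n_eff equals the full sample size (1000.0),
# while 1 - mu is 0.5 and flags the missing half of the phase space.
```

Because all weights are identical, the effective sample size looks ideal even though the reweighted sample cannot populate the region above 0.5; only the deviation of the mean weight from one reveals the problem.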
Author: Philip Ilten on 2024-03-12 [id 4360]
(in reply to Report 2 on 2024-01-03)
We thank the reviewer for their constructive comments. We address them in detail below. We note that since submission we have further developed the code which has resulted in some of the distributions and numbers changing (in particular, a bug in how the diquark production was handled was fixed).
B.2 Weaknesses
B.4 Requested changes