SciPost Submission Page
Iterative HOMER with uncertainties
by Anja Butter, Ayodele Ore, Sofia Palacios Schweitzer, Tilman Plehn, Benoît Assi, Christian Bierlich, Philip Ilten, Tony Menzo, Stephen Mrenna, Manuel Szewc, Michael K. Wilkinson, Ahmed Youssef, Jure Zupan
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Ayodele Ore · Sofia Palacios Schweitzer · Tilman Plehn · Manuel Szewc |
| Preprint link: | scipost_202601_00048v1 (pdf) |
| Code repository: | https://github.com/ayo-ore/iterative-homer |
| Date submitted: | Jan. 20, 2026, 6:24 a.m. |
| Submitted by: | Ayodele Ore |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Computational, Phenomenological |
Abstract
We present iHOMER, an iterative version of the HOMER method to extract Lund fragmentation functions from experimental data. Through iterations, we address the information gap between latent and observable phase spaces and systematically remove bias. To quantify uncertainties on the inferred weights, we use a combination of Bayesian neural networks and uncertainty-aware regression. We find that the combination of iterations and uncertainty quantification produces well-calibrated weights that accurately reproduce the data distribution. A parametric closure test shows that the iteratively learned fragmentation function is compatible with the true fragmentation function.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Author comments upon resubmission
We thank the referees for their endorsement of the manuscript, and address their comments below:
> Do I understand correctly that the method is specific to the string model? If so, could you comment on the limitation on the flexibility coming from the string model? And would it be possible to adapt the method to other hadronization models, e.g. the cluster model?
The reviewer is correct: HOMER builds on the string model to construct its implicit fragmentation function and therefore inherits the inductive bias of that choice. In this work we observed that iHOMER is flexible enough to capture an arbitrary string fragmentation function, but we have not studied whether it can match other fragmentation models at the observable level (even when the underlying fragmentation picture is different), so we cannot comment on how strongly the choice of the string model limits its flexibility in that sense. Adapting the method to another hadronization model is, in some sense, straightforward: one writes down the explicit generation structure of that model and reweights the variables of interest with the appropriate factorized function. This is by design, since the HOMER methodology is meant to exploit whatever simulator is available, irrespective of its internal details, provided those details are known well enough to construct the appropriate weights. We have clarified these points in the introduction and the conclusion.
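To make the factorized reweighting concrete, the following is a minimal Python sketch under toy assumptions: `f_sim` and `f_target` are hypothetical stand-ins for the baseline and inferred fragmentation functions of a single emission variable, and the event weight is simply the product of per-emission ratios. It illustrates the structure that would carry over to another hadronization model, not the actual HOMER implementation.

```python
import numpy as np

# Toy baseline and target densities for a single emission variable z in (0, 1).
# These are placeholders: in a real application the ratio would involve the
# simulator's fragmentation function and its learned replacement.
def f_sim(z):
    return 2.0 * (1.0 - z)

def f_target(z):
    return 6.0 * z * (1.0 - z)

def event_weight(emissions):
    """Factorized event weight: product over emissions of target/baseline ratios."""
    return np.prod([f_target(z) / f_sim(z) for z in emissions])

rng = np.random.default_rng(0)
emissions = rng.uniform(0.05, 0.95, size=10)  # stand-in for one fragmentation chain
print(event_weight(emissions))
```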
> The method assumes negligible systematic uncertainty in the reweighting step 1 and negligible statistical uncertainty in the factorization step 2. Is this always guaranteed? What would one need to do if otherwise?
In step 2, it is relatively safe to neglect statistical uncertainties: only simulated data is used for training, so there is no hard limit on the statistics and one can simply simulate more events as needed. That said, the networks in step 2 could also be trained as Bayesian networks if desired.
Regarding step 1, the answer depends on the kind of systematic uncertainty. The assumption that the simulator is perfect except for the hadronization model does not hold in general. In a realistic application, these uncertainties (e.g. from parton-shower or detector mismodelling) should be included by following existing approaches, for example fitting HOMER conditional on a range of nuisance parameters (1506.02169). This is a direction for future work that we flag in the outlook. The other assumption, that the classifier is sufficiently expressive, is reasonable for the 13-dimensional feature space defined by our high-level observables. Since the step-1 classifier operates on observable inputs, one can always use a hold-out dataset to check that the variables of interest are reweighted to a satisfactory accuracy and that the classifier predictions are calibrated. This kind of validation should always be performed with HOMER, and Appendix C serves as this check. We have extended this appendix with a demonstration of the calibration of the BNN classifier score.
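As an illustration of the kind of hold-out calibration check mentioned above, the sketch below builds a reliability curve with scikit-learn on toy labels and scores; in a real validation the labels and scores would come from the step-1 classifier evaluated on a hold-out dataset.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Placeholder hold-out set: labels (1 = data, 0 = simulation) and classifier
# scores; in practice these come from the trained step-1 classifier.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=5000)
y_prob = np.clip(0.3 + 0.4 * y_true + rng.normal(0.0, 0.15, size=5000), 0.01, 0.99)

# For a calibrated classifier, the empirical fraction of positives in each bin
# matches the mean predicted probability in that bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"mean predicted: {p:.2f}   empirical fraction: {f:.2f}")
```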
> Could you explain further how using the Machine Learning based hadronization models could improve MCEG accuracy?
In this work, we focus on how machine learning can improve MCEG accuracy by providing a more flexible hadronization model that better reproduces the data within its range of validity, thereby reducing the associated error budget of any measurement that depends on it. We have expanded on this point in the introduction.
> Could you provide any benchmarks on the training / computing time with the proposed method?
We do not report the training times because they depend on a number of arbitrary factors, most notably the size of the training datasets and the available computing hardware.
Compared to the original HOMER method, the modifications to the training losses and network architectures are not significant bottlenecks and do not meaningfully extend the training time. The training time does, of course, scale with the number of iterations. We find that after the first iteration, subsequent iterations converge faster, since each stage learns finer and finer corrections; as a result, the scaling is better than linear.
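The better-than-linear scaling can be pictured with the schematic loop below, a hedged sketch in which a hypothetical `train_correction` callable stands in for a full training pass and per-event weights are composed multiplicatively across iterations, so each pass only has to learn a residual correction.

```python
import numpy as np

def iterative_reweighting(train_correction, n_iterations, n_events):
    """Schematic iteration loop (not the actual iHOMER implementation).

    `train_correction` is a hypothetical callable that, given the current
    per-event weights, trains a model and returns residual correction factors.
    Once the weights already describe the data well, these factors are close
    to one, so later passes have less to learn.
    """
    weights = np.ones(n_events)
    for _ in range(n_iterations):
        correction = train_correction(weights)  # learn what is still missing
        weights *= correction                   # compose corrections multiplicatively
    return weights

# Toy correction that drives each weight toward a target of 2; the corrections
# approach one as the weights converge, mimicking finer and finer adjustments.
toy = lambda w: 1.0 + 0.5 * (2.0 - w) / w
print(iterative_reweighting(toy, n_iterations=5, n_events=3))
```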
List of changes
- Added text in response to referee comments, highlighted in the manuscript.
- Added summary diagram of the method (Figure 1).
