We introduce Histories and Observables for Monte-Carlo Event Reweighting (HOMER), a novel method for extracting a fragmentation model directly from experimental data without requiring an explicit parametric form. HOMER consists of three steps: training a classifier to distinguish simulation from data, inferring the weights of single fragmentations, and calculating the weight of the full hadronization chain. We illustrate HOMER on a simplified hadronization problem, a $q\bar{q}$ string fragmenting into pions, and extract a modified Lund string fragmentation function $f(z)$. We then apply HOMER to three types of experimental data: (i) binned distributions of high-level observables, (ii) unbinned event-by-event distributions of these observables, and (iii) full particle cloud information. After demonstrating that $f(z)$ can be extracted from data (the inverse of hadronization), we also show that, at least in this limited setup, the fidelity of the extracted $f(z)$ degrades only modestly when moving from (i) to (ii) to (iii). Public code is available at https://gitlab.com/uchep/mlhad.
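The three steps can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the HOMER implementation in the linked repository. It turns a classifier output into an event weight with the standard likelihood-ratio trick, $w = p/(1-p)$. It takes the single-emission weights from a known ratio of normalized Lund symmetric functions, $f(z) \propto z^{-1}(1-z)^a \exp(-b m_T^2/z)$, instead of inferring them from data. It also assumes the chain weight is simply the product of the single-emission weights, ignoring any subtleties of the actual fragmentation algorithm. All function names and parameter values are hypothetical.

```python
import math
from functools import lru_cache
from typing import Iterable, Tuple

Params = Tuple[float, float]  # hypothetical (a, b) of the Lund symmetric function


def classifier_event_weight(p_data: float, eps: float = 1e-7) -> float:
    """Likelihood-ratio trick: for a classifier trained on equal-sized samples to
    output p(data | event), the odds p / (1 - p) estimate p_data(x) / p_sim(x)."""
    p = min(max(p_data, eps), 1.0 - eps)  # clip to keep the odds finite
    return p / (1.0 - p)


def lund_f(z: float, a: float, b: float, mT2: float) -> float:
    """Unnormalized Lund symmetric fragmentation function,
    f(z) ∝ (1/z) (1 - z)^a exp(-b mT^2 / z), for 0 < z < 1."""
    return (1.0 - z) ** a * math.exp(-b * mT2 / z) / z


@lru_cache(maxsize=None)
def lund_norm(a: float, b: float, mT2: float, n: int = 4000) -> float:
    """Integral of lund_f over (0, 1) by the midpoint rule, cached per parameter set."""
    h = 1.0 / n
    return h * sum(lund_f((i + 0.5) * h, a, b, mT2) for i in range(n))


def single_emission_weight(z: float, mT2: float, base: Params, target: Params) -> float:
    """Ratio of normalized f(z) under target and base parameters.
    Written in closed form so the 1/z factors cancel and small-z underflow
    in lund_f cannot produce 0/0."""
    (a0, b0), (a1, b1) = base, target
    shape = (1.0 - z) ** (a1 - a0) * math.exp(-(b1 - b0) * mT2 / z)
    return shape * lund_norm(a0, b0, mT2) / lund_norm(a1, b1, mT2)


def chain_weight(emissions: Iterable[Tuple[float, float]],
                 base: Params, target: Params) -> float:
    """Weight of a full fragmentation chain, ASSUMED here to be the product of
    single-emission weights over the (z, mT^2) values of its emissions."""
    w = 1.0
    for z, mT2 in emissions:
        w *= single_emission_weight(z, mT2, base, target)
    return w


if __name__ == "__main__":
    base, target = (0.68, 0.98), (0.9, 0.98)  # illustrative (a, b), b in GeV^-2
    print(classifier_event_weight(0.75))  # prints 3.0
    print(chain_weight([(0.3, 0.1), (0.6, 0.1), (0.45, 0.1)], base, target))
```

Two quick consistency checks apply to any choice of per-emission weight. Setting `target` equal to `base` gives a chain weight of exactly 1. Averaging `single_emission_weight` over $z$ drawn from the base $f(z)$ returns 1, so the reweighting preserves normalization.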
Author comments upon resubmission
We thank the reviewers for constructive comments. Below we address in detail the issues raised, and list the corresponding changes made to the manuscript.
List of changes
We have added a more detailed explanation in section 2.2.3 of how the measured fragmentation function is obtained. It complements the discussion in section 3.1.1, which describes how fig. 7 was obtained.
We have added a warning in section 3 about a possible source of overfitting, recommending the use of three datasets for more realistic applications.
We have switched fig. 3, and the other figures in the same style, to logarithmic binning.
We have clarified our definition of point cloud and the use of a Deep Sets-based classifier in section 3.2.
We have fixed the typos pointed out in the reports.