SciPost Submission Page
Generative Unfolding of Jets and Their Substructure
by Antoine Petitjean, Anja Butter, Kevin Greif, Sofia Palacios Schweitzer, Tilman Plehn, Jonas Spinner, Daniel Whiteson
Submission summary
| Authors (as registered SciPost users): | Sofia Palacios Schweitzer · Antoine Petitjean · Tilman Plehn · Jonas Spinner |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2510.19906v2 (pdf) |
| Code repository: | http://github.com/heidelberg-hepml/high-dim-unfolding |
| Date submitted: | Nov. 10, 2025, 5:55 p.m. |
| Submitted by: | Antoine Petitjean |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Computational, Phenomenological |
Abstract
Unfolding, for example of distortions imparted by detectors, provides suitable and publishable representations of LHC data. Many methods for unbinned and high-dimensional unfolding using machine learning have been proposed, but no generative method scales to the several hundred dimensions necessary to fully characterize LHC collisions. This paper proposes a 3-stage generative unfolding framework that is capable of unfolding several hundred dimensions. It is effective to unfold the jet-level kinematics as well as the full substructure of light-flavor jets and of top jets, and is the first generative unfolding study to achieve high precision on high-dimensional jet substructure.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work.
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission
Reports on this Submission
Strengths
- The paper is clearly and concisely written, and the introductory sections, the main content, and the summary are well balanced.
- The problem setup (variable-length, high-dimensional unfolding) is an important challenge in the field.
- The staging "multiplicity->kinematics->constituents" is pragmatic and, as far as I know, a novel approach.
- The physics-motivated pre-processing is state-of-the-art.
- The architecture comparison with the various flavors of equivariance is highly interesting.
Weaknesses
Please see the requested changes.
Report
The paper is well suited for the journal and I recommend publication after the following two items have been addressed.
Requested changes
- A possible prior dependence is noted (Eq. 5), but apparently no attempt is made to quantify it, let alone correct for it (a correction may not be a requirement). I believe the quantification should be done. All these methods are only as strong as their weakest element, and more often than not we encounter showstoppers when moving from demonstrators (like this work) to real-world applications.
- The performance evaluation should be more elaborate and go beyond 1D comparisons. In particular, the separation of multiplicity from kinematics warrants more detailed studies. What happens if the former is off but not the latter, and/or vice versa? This could be probed with a C2ST test on the unfolded quantities. Can an independent classifier tell the unfolded data apart from the training data in cases where the two should be very close? If so, in which aspects do they differ? I don't mean ML-based performance evaluation as a binary metric (success or failure of the method), since there will always be tails where the algorithm doesn't work, but as a tool to quantify how much of the bulk phase space is accessible to unfolding and in which regions more work is needed. I appreciate the tau_21 remark, but please follow up on this.
- Consider making the top quark data set available on Zenodo or reference it otherwise. "Upon request from the authors" probably does not satisfy the journal's criteria: "Provide (directly in appendices, or via links to external repositories) all reproducibility-enabling resources: explicit details of experimental protocols, datasets and processing methods, or processed data and code snippets used to produce figures, etc."
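To illustrate the kind of C2ST check suggested in the second item, here is a minimal sketch. It is not the authors' method or code. The function name `c2st_auc`, the toy Gaussian samples, and the choice of a plain logistic regression trained by gradient descent are all assumptions made for illustration; a real study would likely use a stronger classifier and the actual unfolded and reference jet observables. A held-out AUC close to 0.5 indicates the two samples are hard to distinguish, while larger values indicate a detectable mismatch.

```python
import numpy as np


def c2st_auc(sample_a, sample_b, seed=0, n_steps=500, lr=0.1):
    """Classifier two-sample test (hypothetical helper).

    Trains a logistic regression to separate sample_a (label 0) from
    sample_b (label 1) and returns the AUC on a held-out half of the data.
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([sample_a, sample_b]).astype(float)
    y = np.concatenate([np.zeros(len(sample_a)), np.ones(len(sample_b))])

    # Shuffle, then split into equal-sized training and test halves.
    perm = rng.permutation(len(x))
    x, y = x[perm], y[perm]
    n_train = len(x) // 2
    x_tr, y_tr = x[:n_train], y[:n_train]
    x_te, y_te = x[n_train:], y[n_train:]

    # Standardize features using training statistics only.
    mu = x_tr.mean(axis=0)
    sigma = x_tr.std(axis=0) + 1e-12
    x_tr = (x_tr - mu) / sigma
    x_te = (x_te - mu) / sigma

    # Logistic regression fitted by full-batch gradient descent.
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(x_tr @ w + b)))
        grad = p - y_tr
        w -= lr * (x_tr.T @ grad) / n_train
        b -= lr * grad.mean()

    # AUC from the rank-sum (Mann-Whitney U) statistic on held-out scores.
    scores = x_te @ w + b
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y_te.sum()
    n_neg = len(y_te) - n_pos
    return (ranks[y_te == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(0.0, 1.0, size=(2000, 3))   # toy stand-in for truth-level features
    good_unfold = rng.normal(0.0, 1.0, size=(2000, 3))  # same distribution as the reference
    biased_unfold = rng.normal(0.5, 1.0, size=(2000, 3))  # deliberately shifted distribution
    print("AUC, matching samples:", c2st_auc(reference, good_unfold))
    print("AUC, shifted samples: ", c2st_auc(reference, biased_unfold))
```

Applying the same test in slices of phase space, or separately to the multiplicity and kinematic features, would give a map of where the unfolded sample is distinguishable from the reference rather than a single pass or fail verdict.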
Recommendation
Ask for major revision
