SciPost Submission Page

How to Unfold Top Decays

by Luigi Favaro, Roman Kogler, Alexander Paasch, Sofia Palacios Schweitzer, Tilman Plehn, Dennis Schwarz

Submission summary

Authors (as registered SciPost users): Luigi Favaro · Sofia Palacios Schweitzer · Tilman Plehn
Submission information
Preprint Link: https://arxiv.org/abs/2501.12363v2
Date submitted: 2025-02-06 09:31
Submitted by: Palacios Schweitzer, Sofia
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approaches: Experimental, Computational, Phenomenological

Abstract

Using unfolded top-quark decay data we can measure the top quark mass, as well as search for unexpected kinematic effects. We show how generative unfolding enables both tasks and how both benefit from unbinned, high-dimensional unfolding. Our method includes an unbiasing step with respect to the training data and promises significant advantages over standard methods, in terms of flexibility and precision.

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
In refereeing

Reports on this Submission

Report #1 by Anonymous (Referee 1) on 2025-4-1 (Invited Report)

Strengths

This paper studies the application of unbinned unfolding methods to semileptonic top-antitop final states in a boosted topology. These unbinned methods use ML algorithms to learn the features of the multidimensional phase space and allow the detector effects to be removed from the data.

(1) The authors demonstrate (for the first time) that the hadronic top decay can be unfolded at the per-event level, allowing the mass of the top decay products to be reconstructed and the top quark mass to be extracted by a fit to a newly-binned distribution.

(2) The authors carry out detailed and robust studies into the unfolding performance, including the underlying ML architecture and methods to reduce biases from the assumed top quark mass in the simulation/training.

(3) The authors compare the performance of their unfolding when extracting the top mass to the published CMS results.

(4) The results demonstrate that the method works as a proof of principle and this opens up future possible applications to LHC data.

Weaknesses

(1) The paper does not address the impact of backgrounds, which in particular could bias a constraint on the trijet mass sampled from the 'data' in the training. This is discussed below in the requested changes.

(2) Likewise, the paper does not address the impact of additional sources of systematic uncertainty. Of particular interest is the possible interplay between these systematics and the top mass used in the training. This is discussed below in the requested changes.

Report

I recommend the paper is published after the authors answer the questions detailed below and modify the draft accordingly. The research done in this paper meets the SciPost criteria for originality.

Requested changes

(1) Backgrounds: W+jets and single-top are present in the CMS phase space with about 5% contamination from each (see ref [34]). The paper does not address backgrounds and there are some questions that they pose:
- Most importantly for the results in this paper: could backgrounds bias the m_jjj^batch requirement in sec 3.2? This is particularly relevant for the W+jets background, which will have a different shape, but might also be true for single top. The paper should explain how this bias would be mitigated during training. Ideally, this bias could be studied in this paper along with any solutions needed to mitigate it.
- How can backgrounds be subtracted in this method? I presume this has been studied elsewhere and if so should be cited. If not, some statement needs to be made as to how it can be done.

(2) Systematics: the paper considers the bias due to the choice of top mass in the training and takes steps to mitigate the bias. However, there are systematic uncertainties that change the shape of the m_jjj distribution. During unfolding, there would presumably be some interplay between these systematics and the steps taken to remove the top mass bias. It isn’t clear whether any impact of these systematics is then smaller, similar to, or larger than the systematics in standard unfolding methods using TUnfold. Systematics that jump to mind include jet energy scale/resolution and the hadronisation/shower models. At minimum, some statements are needed to explain that these systematics exist and qualitatively explain the impact (perhaps by citation to previous work). Ideally, the authors could perform some sort of injection test to show the impact of such systematics on this method.

(3) Comparison to CMS data: Fig 10 shows that the statistical precision is much better for the unbinned unfolded method compared to TUnfold and this is stated to be from the finer binning allowed in the analysis. I think the discussion around this needs to be more detailed:
- The improvement looks better than simply the finer binning, because the TUnfold stat-only error in fig 10 is ±0.21 whereas the 5-bin CFM fit has a stat-only error of ±0.19 and the CFM-4d 60 bin result has a stat-only error of ±0.17. This feature should be explained in the text.
- It is likely that this difference is due to the use of the CMS measurement, which contains background subtraction and also fluctuations in the data itself. Would it not be better to compare apples-to-apples by applying TUnfold directly to your simulated events?
- If there is an improvement in stat uncertainty due to finer binning, there might be some tradeoff with worse systematics due to jet energy resolution. This should be discussed.
- More trivially, the y-axis range on figure 10 should be reduced, as we are most interested in the range 0 < Delta Chi^2 < 10.
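
For reference, the stat-only errors discussed in point (3) are conventionally read off as the half-width of the Delta Chi^2 = 1 interval. The following is a minimal sketch of that reading, assuming a parabolic Delta Chi^2 curve; the function names and the toy scan values are hypothetical and are not taken from the paper or from the CMS measurement.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def stat_uncertainty_from_scan(masses, chi2):
    """Least-squares parabola fit to a chi^2 scan.

    Returns (best_fit_mass, sigma), where sigma is the half-width of the
    Delta chi^2 = 1 interval, i.e. sigma = 1 / sqrt(a) for curvature a.
    """
    centre = sum(masses) / len(masses)
    xs = [m - centre for m in masses]  # centre the scan for numerical stability
    s = [sum(x ** k for x in xs) for k in range(5)]                      # sums of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, chi2)) for k in range(3)]  # sums of x^k * y
    # Normal equations for y = a*x^2 + b*x + c
    A = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(A)

    def solve(col):
        # Cramer's rule: replace one column of A by the right-hand side
        m = [row[:] for row in A]
        for i in range(3):
            m[i][col] = rhs[i]
        return det3(m) / d

    a, b = solve(0), solve(1)
    best = centre - b / (2.0 * a)
    sigma = 1.0 / a ** 0.5
    return best, sigma


# Toy scan (hypothetical numbers): minimum at 172.5 GeV, width 0.2 GeV
masses = [171.5 + 0.1 * i for i in range(21)]
chi2 = [((m - 172.5) / 0.2) ** 2 + 3.0 for m in masses]
best, sigma = stat_uncertainty_from_scan(masses, chi2)
```

Restricting the plotted range to small Delta Chi^2 makes this crossing, and hence the difference between the quoted stat-only errors, directly visible.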

(4) Results with and without mjjj sampling: on page 13, it is stated that both the mjjj batch sampling of data and the use of different top masses in the training are required for unbiased results. Could a plot be added to show this? It could show, for example, the original bias, the result with only mjjj batch sampling, the result with only combined training samples, and the result with both. It would help the reader to understand the relative importance of each step.

Minor queries/changes:

(5) Does the unfolding rely on events being present at both truth and reco, or also correct for truth&!reco and reco&!truth?
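
To make the two categories in query (5) concrete: truth&!reco events are usually corrected with a selection efficiency, and reco&!truth events with a fake rate. The following is a minimal sketch of how both could be estimated from simulation, assuming each simulated event carries two hypothetical boolean flags; it is not a description of the procedure used in the paper.

```python
def acceptance_corrections(events):
    """Estimate efficiency and fake rate from simulated events.

    `events` is a list of (passes_truth, passes_reco) boolean pairs
    (hypothetical flags for the truth-level and reco-level selections).
    """
    n_truth = sum(1 for t, r in events if t)
    n_reco = sum(1 for t, r in events if r)
    n_both = sum(1 for t, r in events if t and r)
    # Efficiency: fraction of truth-level events that are also reconstructed;
    # its inverse corrects for truth&!reco losses.
    efficiency = n_both / n_truth
    # Fake rate: fraction of reco-level events with no truth-level match;
    # it corrects for reco&!truth contamination.
    fake_rate = (n_reco - n_both) / n_reco
    return efficiency, fake_rate


# Toy sample: 6 matched events, 2 truth-only, 2 reco-only
events = [(True, True)] * 6 + [(True, False)] * 2 + [(False, True)] * 2
efficiency, fake_rate = acceptance_corrections(events)
```

An unfolding trained only on matched pairs would leave both corrections to be applied separately.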

(6) Text improvements:
- finite efficiency -> inefficiency (p4)
- recoconstruction -> reconstruction (p5)
- section 2.4: this aims for a complete description of generative unfolding, but quite a lot of terms are not defined, e.g. w(x_gen), w(x_reco), p_latent.

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: top
  • clarity: high
  • formatting: good
  • grammar: excellent
