SciPost Submission Page
Machine Learning for Event Reconstruction in the CMS Phase-2 High Granularity Calorimeter Endcap
by Théo Cuisset
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Théo Cuisset |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2510.01851v1 (pdf) |
| Date submitted: | Oct. 5, 2025, 3:11 p.m. |
| Submitted by: | Théo Cuisset |
| Submitted to: | SciPost Physics Proceedings |
| Proceedings issue: | The 2nd European AI for Fundamental Physics Conference (EuCAIFCon2025) |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Experimental, Computational |
The author(s) disclose that the following generative AI tools have been used in the preparation of this submission:
Used GPT-5 for language advice
Abstract
The high-luminosity era of the LHC will offer a greatly increased number of events for more precise Standard Model measurements and Beyond Standard Model searches, but will also pose unprecedented challenges to the detectors. To meet these challenges, the CMS detector will undergo several upgrades, including the replacement of the current endcap calorimeters with a novel High-Granularity Calorimeter (HGCAL). To make optimal use of this innovative detector, new and original algorithms are being devised. A dedicated reconstruction framework, The Iterative Clustering (TICL), is being developed within the CMS Software (CMSSW). This new framework is designed to fully exploit the high spatial resolution and precise timing information provided by HGCAL. Several key ingredients of the object reconstruction chain already rely on Machine Learning (ML) techniques, and their usage is expected to develop further in the future. The existing reconstruction strategies will be presented, stressing the role played by ML techniques in exploiting the information provided by the detector. The areas where ML techniques are expected to play a role in future developments will also be discussed.
Reports on this Submission
Strengths
1- The paper is very clear.
2- It does a good job of giving a good overview of what the contribution was about in a limited space.
Weaknesses
Report
A colleague and I have reviewed the paper titled "Machine Learning for Event Reconstruction in the CMS Phase-2 High Granularity Calorimeter Endcap" with great interest. Despite the limited space, the paper manages to give a good overview of the subject. I do not have any major comments on the content, just a few recommendations to improve the clarity of the paper.
Content:
- At the end of Section 1, it is mentioned that the HGCAL will feature 6M channels, and Figure 3 suggests that about 1M sensors will be active in a typical event. Taking the numbers at face value, this implies an average occupancy of 15-20%, which seems high. I believe this deserves a comment somewhere.
- The usefulness of Figure 4 is unclear. It seems to be a fairly standard picture of an electron undergoing bremsstrahlung. Perhaps the author can decide whether Figure 4 adds value in the HGCAL context and, if so, clarify this in the text.
- It might be useful to better quantify the benefits of the ML approaches, for example by reporting numerical improvements observed in the cited reference (e.g., percentage gain in resolution, efficiency, background rejection) and/or by explicitly stating the metrics used to evaluate performance (AUC, ROC, etc.).
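The occupancy estimate in the first comment can be checked with a quick calculation. The following is only a minimal sketch that takes the two quoted figures at face value (about 6 million channels, about 1 million active sensors per event); the variable names are illustrative and the inputs are rounded values from the comment, not detector simulation output.

```python
# Rough average occupancy from the two rounded numbers quoted in the comment.
# Both inputs are assumptions taken at face value, not measured values.
total_channels = 6_000_000   # "6M channels" quoted from Section 1 of the paper
active_sensors = 1_000_000   # "about 1M" active sensors read off Figure 3

# Average occupancy = fraction of channels that are active in a typical event.
average_occupancy = active_sensors / total_channels

print(f"average occupancy = {average_occupancy:.1%}")
```

This is a detector-wide average only; it says nothing about how occupancy varies between regions of the calorimeter.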
Minor editorial
- Figure 2 is referenced before Figure 1: how about swapping them?
- I don’t see a reason to keep writing tracksters in italics after they have been introduced in the first paragraph of Section 2.
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)

Author: Théo Cuisset on 2025-12-09 [id 6126]
(in reply to Report 1 on 2025-11-27)
Dear referee,
Thank you for reviewing the paper. I have addressed your remarks. My answers are given below, in the order of your comments:
Content:
The occupancy in HGCAL in a 200-pileup event is high, as the density of sensors is much higher in the regions of high activity (high pseudorapidity and the EM calorimeter). Occupancy ranges from 5% to 60% in the silicon sensors depending on the region (the scintillators have low occupancy but represent only about 6% of the number of channels). (NB: These numbers are from the HGCAL TDR, so they are not completely up to date.) I have extended a sentence in the introduction (after "energy deposition and timing"), which now reads: "In total, the calorimeter will feature about 6 million sensors, each capable of measuring both energy deposition and timing, with occupancy ranging from 60\% in the front of the EM section down to less than 1\% in the rear scintillator sensors~\cite{HGCALTDR}."
Figure 4 was purely for illustration. I have removed it.
In Section 5 (hadron regression), I have added a quantified improvement. The sentence now reads: "Using a Graph Neural Network, fed with all the reconstructed hits of a test beam setup with charged pions (without pileup), it was shown that an improvement in the energy resolution of hadronic showers up to a factor 2 was possible [12], as the neural network can learn the structure of the shower and partly compensate for energy leakage."
Minor editorial
done
done