SciPost Submission Page
Precision calibration of calorimeter signals in the ATLAS experiment using an uncertainty-aware neural network
by ATLAS Collaboration
Submission summary
Authors (as registered SciPost users): ATLAS Collaboration

Submission information
Preprint Link: https://arxiv.org/abs/2412.04370v1 (pdf)
Date submitted: 2024-12-16 15:26
Submitted by: Collaboration, ATLAS
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Approaches: Experimental, Computational
Abstract
The ATLAS experiment at the Large Hadron Collider explores the use of modern neural networks for a multi-dimensional calibration of its calorimeter signal defined by clusters of topologically connected cells (topo-clusters). The Bayesian neural network (BNN) approach not only yields a continuous and smooth calibration function that improves performance relative to the standard calibration but also provides uncertainties on the calibrated energies for each topo-cluster. The results obtained by using a trained BNN are compared to the standard local hadronic calibration and to a calibration provided by training a deep neural network. The uncertainties predicted by the BNN are interpreted in the context of a fractional contribution to the systematic uncertainties of the trained calibration. They are also compared to uncertainty predictions obtained from an alternative estimator employing repulsive ensembles.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Reports on this Submission
Report
The ATLAS Collaboration presents a detailed study of Bayesian neural networks (BNN) for the energy calibration of clusters in the ATLAS calorimeters at the LHC. The study is novel in that it is one of the first applications of BNNs under realistic conditions at a collider experiment. It is also an application in a highly relevant field of LHC physics, as energy clusters in calorimeters are used in almost every data analysis at the LHC. This work could hence very well open new pathways in the analysis of LHC data, as deep neural networks are frequently used for calibrations but the uncertainties associated with their predictions have not gained much attention so far. It is hence plausible that the presented work contributes to improving the precision of measurements at the ATLAS experiment and possibly other high-energy physics experiments.
The presented work contains a detailed study of the achievable mean energy regression and the associated energy resolution, in comparison to uncalibrated clusters (EM scale), to an ATLAS standard calibration, and to a previously published regression with a deep neural network (DNN). In addition, the uncertainty predictions of the BNN are studied in detail and are compared to the predictions from repulsive ensembles (RE), which provide an independent uncertainty measure. The BNN is shown to provide calibration improvements over the standard technique that are similar to (and sometimes even better than) those of the DNN. The uncertainties of the BNN and the RE are found to be consistent and conservative, which gives confidence in their robustness.
The paper is in general well written and contains interesting results that are worth publishing in this journal. My main criticism is two-fold:
1) The discussion of uncertainties is not consistent throughout the paper. As the uncertainties are the main purpose of this study, this should be improved. The main difficulty arises from the different nomenclature of uncertainties in data science and high-energy physics, and hence from the interpretation of the BNN and RE uncertainties. Please refer to my comments below.
2) The difference in performance between the BNN and the DNN is not discussed in detail. In principle, the BNN should have the same performance as the DNN, unless there are differences between these approaches beyond the BNN’s estimate of the weight variances. One difference is the use of three Gaussians in the loss term of the BNN (see the illustrative sketch below), which is mentioned in the paper but not discussed in detail. Other differences could be due to the self-regularizing nature of BNNs or to differences in the networks’ input features, pre-processing, widths, depths, etc. Also here, please refer to my comments below.
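To make the point about the loss term concrete, the following is a minimal sketch of a K-component Gaussian mixture negative log-likelihood (written in PyTorch; all tensor names are illustrative and the paper's actual parameterisation may well differ):

```python
import math
import torch
import torch.nn.functional as F

def gaussian_mixture_nll(target, logits, means, log_sigmas):
    """Negative log-likelihood of a K-component Gaussian mixture.

    target     : (N,)   regression targets, e.g. the log of the energy response
    logits     : (N, K) unnormalised mixture weights predicted by the network
    means      : (N, K) component means
    log_sigmas : (N, K) log of the component widths
    All names are illustrative; the ATLAS implementation may differ.
    """
    log_weights = F.log_softmax(logits, dim=-1)          # log pi_k, normalised over K
    sigmas = torch.exp(log_sigmas)
    # log N(target | mu_k, sigma_k) for every component k
    log_norm = (-0.5 * ((target.unsqueeze(-1) - means) / sigmas) ** 2
                - log_sigmas - 0.5 * math.log(2.0 * math.pi))
    # stable evaluation of log sum_k pi_k N(target | mu_k, sigma_k)
    return -torch.logsumexp(log_weights + log_norm, dim=-1).mean()
```

With K = 3 this corresponds to the three-Gaussian likelihood mentioned in the paper; K = 1 recovers a single heteroscedastic Gaussian, which is the ablation suggested further below.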
I recommend this paper to be published in SciPost Physics once my comments below are answered.
Uncertainties:
page 12, last paragraph of Section 3.2: “They provide measures of the ultimate accuracy achieved with the trained calibration network.” - This is a strong statement that would benefit from a more detailed discussion of what predictive uncertainties are, how they are defined, and what they are intended to cover. I suggest introducing the relevant concepts much earlier and with more clarity, rather than in Section 5 (see also below).
page 18, last paragraph of Section 3.3.2: “They potentially represent important contributions to the nuisances characterising the overall local systematic uncertainties.” - This statement, too, is potentially strong. It is important for the reader to understand what is meant by “local systematic uncertainties” in this context, and this would again benefit from a clearer discussion of what the BNN uncertainties are intended to cover, which is only introduced very late in Section 5.
Figure 2: this figure introduces statistical and systematic uncertainties without clarifying whether they correspond to the HEP understanding of these terms. Again, information from Section 5 is necessary to understand this.
Section 5: Only this section introduces the concepts of epistemic and aleatoric uncertainties. As discussed in the three points above, it is necessary to discuss these earlier in the paper, so that no confusion arises for readers who are not already familiar with BNNs. It is necessary to be very clear about how epistemic and aleatoric uncertainties are linked to statistical and systematic uncertainties. I find the discussion in Section 5 not very clear. First, Section 5.1 indicates that epistemic uncertainties may be understood as statistical uncertainties and aleatoric uncertainties as systematic uncertainties. At the end of Section 5.1, it is mentioned that both (epistemic and aleatoric uncertainties) give rise to nuisance parameters (= systematic uncertainties in the HEP physicists’ understanding). Then, in Section 5.2, it is discussed that both systematic and statistical uncertainties have epistemic and aleatoric “components”, without specifying how large these components could be or whether they are even well defined. In addition to making the discussion clearer here, it is important to make the link to the HEP physicists’ understanding of statistical and systematic uncertainties in calibrations or measurements.
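For concreteness, the decomposition that is usually quoted for BNN regression is (my notation; I assume the paper follows this convention, which should be confirmed):

```latex
\sigma_{\mathrm{tot}}^{2}(x) \;=\;
\underbrace{\mathbb{E}_{\theta \sim p(\theta \mid D)}\!\left[\sigma^{2}(x,\theta)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{\mathrm{Var}_{\theta \sim p(\theta \mid D)}\!\left[\mu(x,\theta)\right]}_{\text{epistemic}}
```

where \mu(x,\theta) and \sigma(x,\theta) are the mean and width predicted for input x with weights \theta sampled from the learned posterior. Relating these two terms explicitly to what a HEP reader calls statistical and systematic uncertainties, already at their first use, would avoid most of the confusion described above.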
Comparison of BNN to DNN:
Section 3.3.2: Do you have evidence that 3 Gaussians are appropriate for approximating the likelihood? How do you choose this number? How do the results differ if you change it?
Figures 6, 7, 9: What is the origin of the differences in performance between BNN and DNN? Wouldn’t one expect that the mean predictions of the BNN reproduce those of the DNN if they use the same inputs and are both expressive enough for this regression task? How does the depth and width of the BNN and DNN compare? How is the DNN regularized compared to the BNN (which typically does not require regularization)? Or is this coming from the Gaussian mixture model? If it is the Gaussian mixture model, you may be able to show this by reducing the likelihood to a single Gaussian in the BNN.
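As a concrete form of the suggested cross-check (an illustrative sketch only, reusing the hypothetical naming of the mixture sketch above), retraining with a single-component likelihood would isolate the effect of the Gaussian mixture model:

```python
import math
import torch

def single_gaussian_nll(target, mean, log_sigma):
    """K = 1 limit of the mixture loss sketched above: heteroscedastic Gaussian NLL.
    Comparing a BNN trained with this loss to the nominal three-component version
    would show whether the BNN-DNN differences stem from the mixture model."""
    sigma = torch.exp(log_sigma)
    return (0.5 * ((target - mean) / sigma) ** 2
            + log_sigma + 0.5 * math.log(2.0 * math.pi)).mean()
```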
Other clarifications and questions:
page 11, last paragraph: I suggest clarifying in Eq. (3) which of these inputs are “all of the topo-cluster observables employed by the classification and the hadronic calibration in the LCW sequence”. I assume that these are those from zeta_clus^EM to f_emc, but it would be helpful to state this explicitly, so that the reader does not need to look it up in the literature referenced earlier.
page 14, discussion of R^EM_clus: I am missing a discussion in terms of physics of the peak at ~50 in Figure 1c) in the EM and mixed components. Why do the effects that reduce the true deposited energy (denominator) not also reduce the measured energy at the EM scale (numerator)? Why is there a peak in particular at 50? Is this related to selection cuts in the measured energy (numerator) and if yes: how?
Table 3: Why do you use a batch size for inference? In general, I would expect inference on the full validation/test dataset.
page 16, footnote: The footnote claims that there are no “observable effects on the accuracy of the predictions” when using functions other than Gaussians for the representation of the distribution over the network weights. The statement reads as if this was tested in the context of this work, while I would have expected it to be a general statement about BNNs. If it is the former, please give more information. If it is the latter, please support it with a reference.
page 21: Why do you write “|Delta_E^EM| >= Delta^kappa_E” and not “-Delta_E^EM > |Delta^kappa_E|”? Delta_E^EM should be mostly negative, as the EM scale should be further away from the hadronic scale than the calibrated scales. However, the hadronic scale (as in Delta^kappa_E in the formula) can vary around the true scale and hence needs the absolute value.
page 21, footnote 11: Why does the EM scale correspond to the hadronic scale (i.e. E^dep_clus) at high energies?
page 21, last paragraph before Section 4.1.3: What is the difference between a kinematic bias and a kinematic shift?
Figure 5: Why do you cut the y-axis at ~7, while there are relevant features up to ~100? Is the second peak around 50 (Fig. 1c) reproduced by the BNN and the RE?
page 23, first paragraph: I did not understand the explanation for the “upper ridge”. What does the following sentence mean? “The upper ridge populated by topo-clusters with rising REM for decreasing femc is likely introduced by E^EM_clus at EM scale that overestimates energy deposits more and more dominated by ionisations in hadronic showers extending into the Tile calorimeter.” Please rephrase it for clarity.
Section 4.2.2: Please clarify how the choice of the loss function is connected to the fact that the mean response is described worse than the median response. Isn’t the loss function based on means rather than medians?
page 24, last paragraph: Which input features are used to tag the resistant topo-clusters? Please provide more detail to the readers.
Figure 13: What does an uncertainty of ~10 mean? An uncertainty of 10 on R or on log(R)? (Also related to the next question.)
page 33, Eq. (20): I did not understand this formula (and hence Figure 12). If the target is log10(R), why isn’t the pull then just (log10R_prediction - log10R_target)/sigma_prediction. Why do you multiply by R^BNN_clus? Does sigma_prediction represent the variation in log10R (the training target) or in R itself? I may be missing something here that may need clarification in the text of the paper.
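For reference, the pull I would naively have expected, and the error-propagation relation that might explain the extra factor if sigma_prediction is the width of log10 R (my notation, an assumption to be confirmed):

```latex
\mathrm{pull} \;=\; \frac{\log_{10} R^{\mathrm{BNN}}_{\mathrm{clus}} - \log_{10} R^{\mathrm{target}}_{\mathrm{clus}}}{\sigma_{\log_{10} R}}\,,
\qquad
\sigma_{R} \;\approx\; \ln(10)\, R^{\mathrm{BNN}}_{\mathrm{clus}}\, \sigma_{\log_{10} R}\,.
```

If the factor of R^BNN_clus in Eq. (20) is indeed such a Jacobian converting a width in log10 R into a width on R itself, stating this explicitly in the text would resolve the confusion.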
page 35: Do all clusters in the gap region show these large variations or only a small fraction? If it is only a small fraction, can you point to what causes those few topo-clusters to be badly regressed? (This is also related to the next question.)
page 35, last paragraph: “appropriate and traceable total uncertainty” - What is meant by “appropriate” and “traceable” and how do you come to this conclusion? Did you check that an uncertainty of 10 is indeed the correct value? (For example by a pull plot for these badly estimated topo-clusters?)
page 37, last sentence of the conclusions: I did not follow the argument why high-level variables are better than low-level variables for the retraining of a neural network. Please consider rephrasing for more clarity.
Typos and minor suggestions:
page 10, numbered list, item 3: “it not all” -> “if not all”
page 19, last paragraph: I suggest rephrasing the statement that the median “is better defined than the statistical mean”. The choice of median/mode/arithmetic mean/… is only a choice. You argue why you use the median here, which is fine, but I would refrain from a statement saying that one or the other choice is “better defined”. Please consider rephrasing.
page 29: “the use *of* a Gaussian mixture model”
page 34, first line: I suggest to rephrase “confirms that” to “is consistent with the observation that”.
Recommendation
Ask for minor revision
Strengths
1- First application of BNNs to calorimeter calibration in ATLAS.
2- Per-topo-cluster uncertainty estimates, a key advantage over standard deep learning-based calibrations.
3- Performance comparisons with existing methods (LCW and DNN) are well-structured.
4- Logical structure and good readability.
Weaknesses
1- The work does not yet demonstrate performance on real experimental data, making systematic uncertainties related to simulation inaccuracies a concern.
2- The discussion on Bayesian inference and loss function derivation is somewhat lengthy and should be streamlined.
3- It is not clear if the trained models and inference scripts will be provided for independent verification.
Report
This paper presents an application of Bayesian Neural Networks (BNNs) for the multi-dimensional calibration of calorimeter signals in the ATLAS experiment. The proposed method provides a continuous and smooth calibration function and estimates uncertainties associated with the calibrated energies. The results are compared with standard local hadronic calibration (LCW) and a deep neural network (DNN)-based approach. The BNN-calibrated energy is shown to improve upon previous techniques, with an additional advantage of providing well-characterized uncertainties.
The study is well motivated and relevant for calorimeter-based measurements with the ATLAS experiment. The application of uncertainty-aware machine learning techniques in energy calibration represents an important step forward. Its practical importance, however, has not yet been demonstrated.
The methodology provides a foundation for future applications of uncertainty-aware deep learning in detector calibrations beyond calorimetry, potentially extending to other domains such as jet reconstruction.
The paper is well structured, with minimal jargon and a clear motivation. The methodology is described in sufficient detail, although some explanations of Bayesian techniques could be made more concise. The methodology, including dataset composition, feature selection, training procedures, and evaluation metrics, is clearly documented. The paper also references an external repository for further details on implementation.
It meets the journal acceptance criteria.
Requested changes
1- The discussion on Bayesian inference and loss function derivation is somewhat lengthy and should be streamlined.
2- "They potentially represent important contributions to the nuisances characterising the overall local systematic uncertainties.". This statement deserves a justification, i.e. a test which shows that this is indeed the case.
Recommendation
Publish (meets expectations and criteria for this Journal)