SciPost Submission Page
Invertible Networks or Partons to Detector and Back Again
by Marco Bellagente, Anja Butter, Gregor Kasieczka, Tilman Plehn, Armand Rousselot, Ramon Winterhalder, Lynton Ardizzone, Ullrich Köthe
This is not the current version.
As Contributors: Tilman Plehn · Ramon Winterhalder
Arxiv Link: https://arxiv.org/abs/2006.06685v2
Date submitted: 2020-07-07 02:00
Submitted by: Winterhalder, Ramon
Submitted to: SciPost Physics
For simulations where the forward and the inverse directions have a physics meaning, invertible neural networks are especially useful. A conditional INN can invert a detector simulation in terms of high-level observables, specifically for ZW production at the LHC. It allows for a per-event statistical interpretation. Next, we allow for a variable number of QCD jets. We unfold detector effects and QCD radiation to a pre-defined hard process, again with a per-event probabilistic interpretation over parton-level phase space.
Reports on this Submission
Anonymous Report 3 on 2020-8-25 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2006.06685v2, delivered 2020-08-25, doi: 10.21468/SciPost.Report.1938
This paper introduces for the first time the use of invertible neural networks to the key issue of unfolding. It lays the initial foundations for the development of a new generation of machine-learning based unfolding methods, which might have substantial impact on a wide range of LHC analyses.
The two main weaknesses of this paper that I see are the following:
1. It is difficult to evaluate how well the cINN algorithm performs against other state-of-the-art ML approaches, notably OmniFold.
2. It is also very difficult to know how robust the trained unfolding algorithm is and what issues one might encounter when applying it to real data, where there is no possibility of comparing to the "truth", and the paper does not really address this issue.
As such, while the ideas in the paper are promising, it is hard to evaluate if they could become competitive tools in real LHC analyses.
The authors introduce a novel method using a conditional invertible neural network to invert detector simulations directly on observables. This tackles a crucial problem present when analyzing experimental data at the LHC. Overall, the paper is well written and an interesting contribution to the literature, taking a novel approach to unfolding. As such I recommend it for publication after the authors address my minor remarks.
- In the introduction, the authors could cite 1004.2006 and 1712.01814
- The authors mention several alternative methods, such as OmniFold, but do not provide any comparison to show the performance of their method except against their own previous FCGAN model. If a full comparison is too difficult to achieve, some discussion of the differences with 1911.09107 and 1806.00433 could be included.
- One limitation of having a single data sample that is then split into training and test sets is that it is very difficult to know if the resulting INN suffers from overfitting. While perhaps imperfect, one could probe this by applying a (conditional) INN trained on a specific process, tune, or parton shower to a sample generated differently, and gain at least some sense of the robustness of the unfolding method.
- It would be helpful to include a ratio of cINN to truth in figures 8 and 10 as was done in other figures.
- The paper makes broad claims about the applicability of the method to high-level observables; however, only a handful of observables are shown. Have the authors looked at others, e.g. jet shapes or substructure observables?
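The ratio panels requested for figures 8 and 10 amount to a bin-by-bin division of the unfolded histogram by the truth histogram. The following is a minimal sketch of that calculation, not the authors' plotting code: the function name `ratio_to_truth`, the binning, and the toy Gaussian samples are all assumptions made for illustration.

```python
import numpy as np

def ratio_to_truth(unfolded, truth, bins):
    """Bin-by-bin ratio of an unfolded sample to the truth sample.

    Bins with no truth entries are returned as NaN instead of dividing by zero.
    """
    n_unf, edges = np.histogram(unfolded, bins=bins)
    n_tru, _ = np.histogram(truth, bins=edges)   # reuse the same bin edges
    ratio = np.full(len(n_tru), np.nan)
    mask = n_tru > 0
    ratio[mask] = n_unf[mask] / n_tru[mask]
    return edges, ratio

# Toy stand-ins (hypothetical): "truth" and a slightly smeared "unfolded" sample
rng = np.random.default_rng(0)
truth = rng.normal(80.0, 5.0, size=10_000)
unfolded = truth + rng.normal(0.0, 0.5, size=truth.size)

edges, ratio = ratio_to_truth(unfolded, truth, bins=np.linspace(60.0, 100.0, 21))
```

A ratio close to one across the populated bins indicates agreement in shape; empty truth bins are flagged as NaN so that they can be masked in a plot.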
Anonymous Report 2 on 2020-8-13 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2006.06685v2, delivered 2020-08-13, doi: 10.21468/SciPost.Report.1897
The authors discuss the use of an invertible neural network (INN) to unfold detector effects and, in addition, the QCD shower. It appears that INNs are indeed a nice tool for this purpose. The paper is well written and provided that the authors can answer satisfactorily my comments and concerns below, I would recommend it for publication.
According to the acceptance criteria of SciPost Physics (https://scipost.org/SciPostPhys/about#criteria), in my opinion this paper falls under expectation 4: 'Provide a novel and synergetic link between different research areas'. The (all required) general acceptance criteria could be met once the points below are addressed.
1. In the second paragraph of the introduction, the authors list some shortcomings of current methods of simulation and unfolding. The two statements below appear selectively strong.
1.a They claim that one 'cannot avoid simulating events for each point in model space' when testing BSM hypotheses. This is perhaps true for *some* variables and only if one has a specific model to test. However, in an EFT framework where a small set of Wilson coefficients can be identified (WZ at high pT is a particular example here), a fit in the space of Wilson coefficients can be constructed avoiding the need for point by point simulation.
1.b Regarding recasting an existing analysis, the statement seems absolute while in reality this is not the case. For example see: ATL-PHYS-PUB-2020-007.
Since the authors claim that these shortcomings leave us with no choice but to invert the simulation chain, I would appreciate a more objective and nuanced discussion of these points.
2. The authors choose to invert the simulation chain back to the leading order partonic level distributions. Why is this the right choice? If the goal is to apply this inverted simulation to actual LHC events, it's far from obvious that the leading order partonic distributions are the correct choice. Several points enter here. First, a measured distribution is not purely signal but also contains backgrounds which cannot be removed. In addition, some processes suffer from large QCD and even QED corrections which distort the leading order distributions. At the very least a discussion of these issues should be given and ideas as to how to address them should be proffered.
3. If the end goal is to use such a setup for bounding or discovering BSM models that contribute to diboson processes (e.g., their chosen example of WZ), where one of the electroweak bosons decays hadronically, the use of jet substructure techniques is indispensable. With this in mind, could the authors elaborate on their statement that this technique only works for analyses that don't employ jet substructure techniques?
4. According to SciPost Physics' general acceptance criteria 3 & 5, enough details should be provided such that the results are reproducible. At least all details of the networks used should be given in an appendix in order to reproduce the results in this paper. While not required, it would be even better if sample code is shared on a git hosting platform.
5. In general, a clear discussion of what can be gained from an event by event unfolding as they suggest is lacking.
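Point 1.a can be illustrated with a toy sketch. It assumes a single Wilson coefficient `c` that enters the amplitude linearly, so that the rate in a given bin is a quadratic polynomial in `c`; three benchmark simulations then fix the polynomial and no further point-by-point simulation is needed. The benchmark values and cross sections below are made-up numbers for illustration only, and none of the names come from the paper.

```python
import numpy as np

# Hypothetical benchmark points in the Wilson coefficient c and made-up
# "simulated" cross sections (arbitrary units) at those points.
c_bench = np.array([-1.0, 0.0, 1.0])
sigma_bench = np.array([1.3, 1.0, 1.1])

# Assumed model: sigma(c) = sigma_sm + c * sigma_int + c**2 * sigma_quad.
# Three benchmarks determine the three coefficients exactly.
A = np.vander(c_bench, N=3, increasing=True)   # columns: 1, c, c^2
sigma_sm, sigma_int, sigma_quad = np.linalg.solve(A, sigma_bench)

def sigma(c):
    """Interpolated cross section at any value of c, without new simulation."""
    return sigma_sm + sigma_int * c + sigma_quad * c**2

sigma_half = sigma(0.5)
```

With several coefficients the same idea applies with more benchmark points, since the polynomial then also contains mixed terms.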
Anonymous Report 1 on 2020-8-4 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2006.06685v2, delivered 2020-08-04, doi: 10.21468/SciPost.Report.1883
In the article "Invertible Networks or Partons to Detector and Back Again" the authors present a new proof of concept that machine learning techniques can be used to dramatically improve the efficiency of collider simulations. To the best of my knowledge this is the first time that an invertible neural network has been applied to a full simulation, thereby providing a simple way to unfold data to parton level.
The paper is well written overall. I believe it is easily up to the standard expected by SciPost, but I have a few remarks and questions which the authors should consider before publication.
1) Firstly, a very minor point. The subscripts 'd' and 'p' are introduced in the 2nd paragraph of section 2 but more clearly defined at the beginning of section 2.1.
3) Before equation (13) the authors state 'the form gets modified by an exponential'. I did not understand the logic here, although it appears the construction has already been used in the literature. Did the authors mean 'can be modified'? If so, perhaps a few more words on how this improves the numerical performance would be helpful.
4) Figures 8 and 10 do not include the useful ratio plots given in other figures. I suggest they should be added to these cases.
6) The authors may like to clarify the choice of parameters in Tables 1 and 2, and whether there has been any tuning of the parameters specific to this test case. Could there be bias introduced if the 10% testing sample has also been used in the subsequent analysis?
7) The analysis of jet radiation is interesting. As I understand it this is an attempt to unfold to different stages in the parton shower. Is this method dependent on having a leading order hard scattering before the shower in the simulation?
8) The outlook takes a very positive stance on the generalisation to any collider simulation. It would be interesting if the authors could expand upon the limitations of the current study. The reference process, pp->WZ->lljj, has a very simple resonance structure at leading order so perhaps there could be new features in other channels that would cause problems for the network. Perhaps the authors could state where they feel future developments and improvements would be interesting or necessary before the method could be reliably applied to real data.
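On point 3, a common construction in the literature is the affine coupling layer, in which the scale factor is written as the exponential of a network output. The sketch below shows that generic construction and is not taken from the paper; whether it matches equation (13) exactly is an assumption, and `s_fn` and `t_fn` are hypothetical stand-ins for learned subnetworks.

```python
import numpy as np

def affine_coupling_forward(x, s_fn, t_fn):
    """Transform the second half of x, conditioned on the first half."""
    x1, x2 = np.split(x, 2, axis=-1)
    s, t = s_fn(x1), t_fn(x1)
    y2 = x2 * np.exp(s) + t        # exp(s) > 0, so the scale never vanishes
    log_det = s.sum(axis=-1)       # log|det J| reduces to the sum of s
    return np.concatenate([x1, y2], axis=-1), log_det

def affine_coupling_inverse(y, s_fn, t_fn):
    """Exact inverse; s_fn and t_fn are only ever evaluated, never inverted."""
    y1, y2 = np.split(y, 2, axis=-1)
    s, t = s_fn(y1), t_fn(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)

# Hypothetical stand-ins for the learned subnetworks
s_fn = lambda h: np.tanh(h)
t_fn = lambda h: 0.5 * h

x = np.array([[0.3, -1.2, 2.0, 0.7]])
y, log_det = affine_coupling_forward(x, s_fn, t_fn)
x_rec = affine_coupling_inverse(y, s_fn, t_fn)
```

In this form the exponential keeps the scale strictly positive, so the layer is invertible for any network output, and the log-determinant of the Jacobian is simply the sum of the `s` values.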