
SciPost Submission Page

$\nu$-Flows: Conditional Neutrino Regression

by Matthew Leigh, John Andrew Raine, Tobias Golling


Submission summary

Authors (as registered SciPost users): John Raine
Submission information
Preprint Link: scipost_202208_00052v2  (pdf)
Code repository: https://github.com/mattcleigh/neutrino_flows/
Data repository: https://zenodo.org/record/6782987
Date submitted: 2022-11-15 16:26
Submitted by: Raine, John
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approach: Phenomenological

Abstract

We present $\nu$-Flows, a novel method for restricting the likelihood space of neutrino kinematics in high-energy collider experiments using conditional normalising flows and deep invertible neural networks. This method allows the recovery of the full neutrino momentum which is usually left as a free parameter and permits one to sample neutrino values under a learned conditional likelihood given event observations. We demonstrate the success of $\nu$-Flows in a case study by applying it to simulated semileptonic $t\bar{t}$ events and show that it can lead to more accurate momentum reconstruction, particularly of the longitudinal coordinate. We also show that this has direct benefits in a downstream task of jet association, leading to an improvement of up to a factor of 1.41 compared to conventional methods.

Author comments upon resubmission

We would like to thank the reviewers for their very useful feedback and suggested changes. We have incorporated and addressed all points in the latest draft of the manuscript, and believe that they have made this a far stronger manuscript, in particular in the introduction and conclusions.

Following on from the requests and questions received, we have the following general comments.

We have been asked to provide a convincing argument or scenario that we are improving physics parameter measurements and not just able to plot the neutrino momentum posterior distributions.
We would like to point out that, as presented in the results section, we study not only the conditional posterior distributions but also the impact on the event reconstruction efficiencies with the chi2 approach (an illustrative sketch of such a chi2 assignment is given after the reference list below).
Combinatoric jet-parton assignment is a key component of a wide range of top quark analyses at ATLAS and CMS for measuring e.g. (differential) cross-sections [1-4], top quark mass [5-8], charge asymmetry [9] and spin correlation [10].
By being able to quantify the improvement in the reconstruction efficiency in the chi2 method, we can show that the approach is valuable for a wide range of measurements rather than focussing on a single one, and without needing to shift the focus of the manuscript away from the core method.
As such we feel that this comment is already addressed in the manuscript; these references are all included in the updated draft, with added emphasis on this point in the text.

[1] https://arxiv.org/abs/1610.04191
[2] https://arxiv.org/abs/1803.08856
[3] https://arxiv.org/abs/1908.07305
[4] https://arxiv.org/abs/2108.02803
[5] https://arxiv.org/abs/1507.01769
[6] https://arxiv.org/abs/1805.01428
[7] https://arxiv.org/abs/1810.01772
[8] https://arxiv.org/abs/1905.02302
[9] https://arxiv.org/abs/2208.12095
[10] https://arxiv.org/abs/1511.06170
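For concreteness, a minimal sketch of a chi2-style jet-parton assignment of the kind referred to above. The masses, resolutions, and chi2 terms are assumed placeholders for illustration, not the exact definition or values used in the manuscript:

```python
import itertools
import numpy as np

# Assumed placeholder constants (GeV); not the values used in the paper.
M_W, M_TOP = 80.38, 172.5    # pole masses
SIG_W, SIG_TOP = 10.0, 15.0  # assumed mass resolutions

def inv_mass(four_vectors):
    """Invariant mass of a sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.sum(four_vectors, axis=0)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def best_assignment(jets):
    """Scan (b_had, q1, q2) permutations and keep the lowest-chi2 triplet."""
    best, best_chi2 = None, np.inf
    for b, q1, q2 in itertools.permutations(range(len(jets)), 3):
        if q1 > q2:  # the two light-quark jets are interchangeable
            continue
        m_w = inv_mass([jets[q1], jets[q2]])
        m_t = inv_mass([jets[b], jets[q1], jets[q2]])
        chi2 = ((m_w - M_W) / SIG_W) ** 2 + ((m_t - M_TOP) / SIG_TOP) ** 2
        if chi2 < best_chi2:
            best, best_chi2 = (b, q1, q2), chi2
    return best, best_chi2
```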

We would also like to address the confusion that we are simply picking a neutrino from the posterior distribution of neutrino momenta. This is a conditional posterior distribution given the O(50)-dimensional input space, with permutation invariance over all the jets in the event. This is not something that can trivially be done by drawing from the Monte Carlo directly, as one would have to know the jet assignments first. Furthermore, one would need to preserve the full joint posterior distribution in that high-dimensional space, which could contain sparsely populated regions, in order to evaluate it for all possible conditional values.
During development we found the inclusion of all final-state objects critical to achieving the best performance, so even learning this solely as the joint distribution of neutrino and lepton kinematics would not be sufficient.
Furthermore, normalising flows are very adept at learning conditional distributions and perform very well at interpolation.
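As an illustration of this conditional-density idea (not the ν-Flows architecture itself, which stacks many invertible layers such as coupling blocks), a minimal PyTorch sketch of a single conditional affine transform trained by maximum likelihood; all dimensions and tensors are stand-ins:

```python
import math
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """Minimal conditional density estimator: one affine transform whose
    shift and log-scale are functions of the conditioning vector."""

    def __init__(self, dim_x, dim_cond, hidden=64):
        super().__init__()
        self.dim_x = dim_x
        self.net = nn.Sequential(
            nn.Linear(dim_cond, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim_x),
        )

    def _params(self, cond):
        shift, log_scale = self.net(cond).chunk(2, dim=-1)
        return shift, log_scale.clamp(-5.0, 5.0)  # keep scales numerically sane

    def log_prob(self, x, cond):
        shift, log_scale = self._params(cond)
        z = (x - shift) * torch.exp(-log_scale)   # map data to the base space
        log_base = -0.5 * z.pow(2).sum(-1) - 0.5 * self.dim_x * math.log(2 * math.pi)
        return log_base - log_scale.sum(-1)       # change-of-variables Jacobian

    def sample(self, cond, n_samples):
        shift, log_scale = self._params(cond)
        z = torch.randn(n_samples, *shift.shape)
        return z * torch.exp(log_scale) + shift   # (n_samples, batch, dim_x)

# Training minimises the negative conditional log-likelihood of the true
# neutrino momenta given the event observables (stand-in tensors here):
flow = ConditionalAffineFlow(dim_x=3, dim_cond=50)
x = torch.randn(128, 3)        # true neutrino (px, py, pz)
cond = torch.randn(128, 50)    # O(50)-dimensional event observables
loss = -flow.log_prob(x, cond).mean()
loss.backward()
```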

Additionally, although we are confident that ν-Flows can be applied to final states with multiple neutrinos, this has not been demonstrated in the work covered by this manuscript.
This is something we plan to demonstrate in future work expanding the applications of ν-Flows to more processes. For now, we have changed the statement in the paper to reflect that the architecture can be modified to any number of neutrinos, without making explicit statements about how well it will perform (except that, being a harder challenge, it is expected to be somewhat less performant than the one-neutrino case).

For the ν-Flows architecture, although we restrict the inputs to the first 10 jets in the event, in principle we do not need to and can go to higher multiplicities; the cut-off at 10 was applied at the data-processing level.
This choice has an almost negligible impact, as fewer than one per mille of all events have 10 jets, and even fewer are expected to have 11 or more.
Furthermore, we included the target coordinate system in the hyperparameter scan for the ν-Flows model in this work. The choice presented in the manuscript performed best of all, with px, py, pz second best, as measured by the RMSE to the truth and by the loss function of the flow.
All choices of coordinate system still performed well, so the coordinate choice does not have a strong influence on the performance.

An additional point raised, which we also think deserves further discussion, concerns the reliance on the Deep Sets component being able to "identify" the b-jet from the leptonic W, whether performing event reconstruction first would help with reconstructing the neutrino, and how this might lead to a cyclical dependence.
This is a very good point and something that we indeed have been discussing amongst ourselves as we move to using this for more advanced methods of jet association.

We originally ran some tests using a fixed jet ordering based on truth matching (thereby telling the network which jet was the b_lep, b_had, etc.), and this performed much better than our current setup, suggesting that improved identification of which jet comes from which parton would be very useful.
We have also tested the performance without including any jet information, and find that including the jets via the Deep Sets does bring a gain in performance, even without the network knowing which jet is which.
Our hope is also that, with a Deep Sets architecture, the network can use the summation in the pooling step in a manner akin to momentum conservation. As a result, we hope that this is a generalisable architecture which would not need further optimisation for other final states and underlying processes.
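For reference, a minimal sketch of the kind of permutation-invariant sum pooling meant here (assumed and simplified; layer sizes and feature counts are illustrative, not those of the paper):

```python
import torch
import torch.nn as nn

class JetDeepSet(nn.Module):
    """Sketch of permutation-invariant jet pooling: embed each jet with a
    shared network phi, sum over the jet axis, then process with rho."""

    def __init__(self, jet_feats=4, embed=32, out=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(jet_feats, embed), nn.ReLU(),
                                 nn.Linear(embed, embed))
        self.rho = nn.Sequential(nn.Linear(embed, out), nn.ReLU())

    def forward(self, jets, mask):
        # jets: (batch, n_jets, jet_feats); mask: (batch, n_jets), 1 = real jet
        h = self.phi(jets) * mask.unsqueeze(-1)  # zero out padded jets
        pooled = h.sum(dim=1)                    # sum pooling: order-invariant
        return self.rho(pooled)                  # and additive, like a momentum sum
```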
But indeed there is a cyclic dependency, whereby the jet-parton assignment and the neutrino estimation each improve the other, and a combined training approach is indeed something we would like to investigate in future work. For now, we hope that the simpler architecture means this approach would be the egg that comes before the chicken.

A suggestion was made on investigating poorly reconstructed events. This is not something we have done yet but absolutely want to do as part of future studies. A further suggestion was made to perform a study that explicitly removes poorly reconstructed events.
We wholeheartedly agree with this suggested study, though for this initial paper we wanted to restrict the scope to the machine learning approach. We would choose to study potential downstream applications and interpretations in detail when applying this in a concrete setting with the inclusion of background processes and a target measurement.
We are currently exploring different metrics for flagging "poorly estimated" events, for example the variance of the 256 samples; however, since the learned kinematic distributions of well-reconstructed events are often multimodal, we need to balance genuine multimodality against poor reconstruction with a single wide distribution.
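A sketch of what such a variance-based flag could look like (shapes, data, and the cut are stand-ins for illustration):

```python
import numpy as np

# One candidate metric for flagging "poorly estimated" events: the spread of
# the flow samples per event. Caveat from the text above: a wide spread can
# also come from a genuinely multimodal, well-learned posterior, so this
# cannot be used blindly.
samples = np.random.randn(1000, 256, 3)        # (events, samples, components)
per_event_spread = samples.var(axis=1).sum(axis=-1)
suspects = np.argsort(per_event_spread)[-50:]  # events with the widest posteriors
```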

Finally, the colour palette of the plots has been checked to be colour-blind friendly for most forms of colour blindness; however, when adapting the plots for greyscale we found them less readable overall. As such we have left the colour scheme as it was.

Once again, we would like to reiterate our thanks for the feedback on this manuscript.
Best regards,
Johnny, on behalf of the authors

List of changes

Introduction

On request we have reduced the number of potential studies mentioned in the introduction; however, we prefer to keep the mention of potential applications to avoid giving the impression that ν-Flows is purely for top-quark physics.
We also motivate the choice of chi2 as a quantitative measure of the improvement from ν-Flows, with the addition of further references.

- "...momentum must sum to zero." -> Overly strong statement rephrased to be more in line with standard description:

“Instead, their presence is inferred from the momentum imbalance calculated from all
visible particles in the plane perpendicular to the beam pipe. This imbalance is known as
the missing transverse momentum…”

- Paragraph introducing potential applications has been reformulated to remove emphasis from processes not covered in the paper.

- "...the possible phase space..." -> improved the clarity by removing the “phase space” sentence and replaced it with:
“Any meaningful insight into the kinematics of non-interacting can be useful in a wide range of both SM measurements and BSM searches.”

Method

- We have changed the phrasing to remove the use of "ill-posed", and focus on the sample in the case study rather than mentioning the dilepton case.

- Parameter theta now defined in the text as network parameters

- "our work falls under unfolding" -> removed explicit statement, as work is not necessarily unfolding

- "by default agrees with direct measurements" -> Removed the sentence, so that it is more coherent. The paragraph now reads:
“Restrictions on the probability space of momenta are achievable by testing the probability of potential solutions under the observed kinematics of reconstructed physics objects in the event and the relationships between them given the assumed process.”

Case Study

- "designed and applied for" -> "applied to"

- "produced frequently...relatively high efficiency" -> removed sentence

- Added a Feynman diagram of ttbar (semi-leptonic decay)

- "at least two bjets, two other jets" -> Removed ambiguity: "The final-state of this process contains at least four jets, two of which are required to be identified as b-jets, a lepton, and a single neutrino."

- Ref. [26] has been moved to the beginning of paragraph, including additional reference for ttbar example

- Defined all terms in the equation, including $m_W$, in the text:
“Here $p^\ell_x$, $p^\ell_y$, $p^\ell_z$, $E^\ell$ are the components of the four-momentum of the lepton, and $m_\ell$ is its invariant mass (511 keV for electrons and 105.7 MeV for muons); $p^\nu_\mathrm{T}$ is the transverse momentum of the neutrino, measured by $p_\mathrm{T}^\mathrm{miss}$, with $x$ and $y$ components $p^\nu_x$ and $p^\nu_y$. The mass of the $W$ boson is set to $m_W = 80.38$ GeV.”

- Expanded on the statement that the quadratic-equation approach has several drawbacks (previously only one was given):
“This approach has several drawbacks. Firstly, by assuming an exact value for $m_W$, any results or downstream tasks are biased, as it does not consider the natural width of $m_W$. Secondly, it assumes that the transverse momentum of the neutrino $p^\nu_\mathrm{T}$ is perfectly captured by $p_\mathrm{T}^\mathrm{miss}$ and does not account for misidentification, resolution, or mismodelling effects in the lepton or $p_\mathrm{T}^\mathrm{miss}$ reconstruction. These two effects can lead to Equation 3 yielding no real solutions. Here, the convention is to drop the imaginary component. An additional drawback is that even in the case where all objects are perfectly reconstructed, the equation can yield two real solutions. There is typically no strong reason to favour one solution over the other, though the result with the smaller magnitude is usually taken. Alternatively, both solutions are considered in any downstream tasks.”
(An illustrative sketch of this quadratic solution is given after this item.)

And to later in the section

“…different approaches need to be optimised for final states with other multiplicities of neutrinos in the final state, for example Neutrino Weighting [28–30] in the case of dilepton ttbar production.”
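For concreteness, a minimal sketch of the conventional quadratic method referenced in the first quote above, under the on-shell $m_W$ assumption; the function name and the example kinematics are illustrative only:

```python
import numpy as np

M_W = 80.38  # GeV, on-shell W mass, matching the value quoted above

def neutrino_pz(lep, met_x, met_y, m_lep=0.0):
    """Solve the W-mass constraint for the neutrino pz (a sketch of the
    conventional quadratic method; not code from the repository).

    lep is the charged lepton (E, px, py, pz); met_x/met_y stand in for the
    neutrino transverse momentum. Returns both real solutions, or the real
    part twice when the discriminant is negative (the usual convention of
    dropping the imaginary component).
    """
    e, px, py, pz = lep
    mu = 0.5 * (M_W**2 - m_lep**2) + px * met_x + py * met_y
    a = e**2 - pz**2                 # equals pT_lep^2 + m_lep^2
    b = mu * pz / a
    disc = b**2 - (e**2 * (met_x**2 + met_y**2) - mu**2) / a
    if disc < 0.0:
        return b, b                  # no real solution: drop the imaginary part
    root = np.sqrt(disc)
    return b - root, b + root        # the smaller |pz| is usually taken

sol_lo, sol_hi = neutrino_pz(lep=(55.0, 30.0, 20.0, 40.0), met_x=35.0, met_y=-10.0)
```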

- Revised the statement that ν-Flows learns the resolution of the lepton kinematics, to be more accurate:
“By providing ν-Flows with additional information from the event, it learns the probabilistic relationship between $p_\mathrm{T}^\mathrm{miss}$, $p^\ell$, and the target.”

- Revised the statement on the application of ν-Flows to final states with multiple neutrinos:
“... while performance is expected to degrade, the architecture of ν-Flows can be trivially scaled to predict any fixed number of neutrino momenta, depending on the chosen underlying process.”

- "decaying leptonically to a bjet..." -> Changed the statement to be:
“The data used in this work consists of simulated ttbar events where exactly one of the top quarks produces a b-jet and leptonically decaying W boson.”

- added an explicit statement on the b-tagging criteria of selected jets: “At least two of the jets are required to pass the b-tagging criteria.”

- "These include...event observables" -> remove comma after "kinematics"

- Added an explicit mention that ν-Flows takes the leading 10 jets ordered by pT (NB: this is not a strict requirement of the architecture but was done for technical reasons)

- Clarified that the coordinate system is part of the hyperparameter scan, but that it has only a small impact on performance:
“The coordinate system used to represent the momentum of each physics object, including the neutrino, was optimised as part of a hyperparameter scan, though there is not a strong dependence on the coordinate choice. In this study, using $\eta$ instead of $p_z$ was found to deliver the best performance, alongside the natural logarithm of the energy $\log E_j$ for the lepton and jets.”
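An illustrative helper for this representation (an assumed sketch, not code from the repository):

```python
import numpy as np

def to_flow_coords(px, py, pz, e):
    """Convert Cartesian four-momentum components to the (pT, eta, phi,
    log E) representation described above. Vectorised over arrays."""
    pt = np.hypot(px, py)
    eta = np.arcsinh(pz / pt)    # pseudorapidity
    phi = np.arctan2(py, px)
    return pt, eta, phi, np.log(e)
```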

- "For cross-validation, 10% of the training is..." -> "For cross-validation, 10% of the training dataset is..."

Performance
- Added information on training and inference time
“The ν-Flows (ν-FF) network was trained using an NVIDIA GeForce RTX 2080 Ti, and the minimum validation loss was reached after approximately four (two) hours. Single-event inference for one neutrino, as measured on an AMD Ryzen 5900HX, is O(20 ms). For a single event, multiple solutions can be calculated with the flow in parallel, and multiple events can be processed as a batch, resulting in faster inference times over a full dataset.”

- Clarified the definition of ν-Flows(mode) and how it is determined; the likelihood here is the same as that in the loss, via the change-of-variables formula. We now refer directly to using Equation 2 to calculate this probability, and have removed jargon and unclear phrasing.

- Added clarification of "mode" to the figure captions:
“The $\eta$ marginal of the full conditional probability density learned by $\nu$-Flows is shown in orange. The $\nu$-Flows(sample) method corresponds to taking a single random sample from the conditional probability distribution, and $\nu$-Flows(mode) corresponds to taking the most probable solution, which is equivalent to choosing the value at the peak of the distribution.”
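A sketch of how the two variants could be obtained from a trained flow (assuming the sample/log_prob interface of the earlier snippet; the mode is approximated here by the highest-likelihood draw):

```python
import torch

def pick_solutions(flow, cond, n=256):
    """Return a nu-Flows(sample)-style random draw and a nu-Flows(mode)-style
    most-probable draw per event (illustrative, not the paper's code)."""
    samples = flow.sample(cond, n)                       # (n, batch, dim)
    log_p = torch.stack([flow.log_prob(s, cond) for s in samples])
    idx = log_p.argmax(dim=0)                            # most probable per event
    mode = samples[idx, torch.arange(cond.shape[0])]
    return samples[0], mode                              # random draw, approx. MAP
```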

- The ν-FF details have been moved to a dedicated subsection of Section 3

- "true values of the neutrino" -> "true values of the neutrino momenta"

- Removed the statement on why there may be a preference for one solution. We have discussed this amongst ourselves and find that our initial suggestion assumed too much about the architecture; in principle the Deep Sets component learns to extract the information most useful to the task from the jets and additional inputs.
In general, the preference comes from the training data themselves, as the learned probability distribution over the neutrino kinematics reflects the data seen during training.

- We have tested the proposal that ν-FF is the mean of the 256 samples, and now state this in the manuscript rather than hypothesising:
“We observe that the ν-FF predictions are almost identical to taking the average of the 256 samples generated by the flow. This is expected, as the symmetrical loss function used to train ν-FF collapses the posterior towards its centroid value.”

- Added a statement on investigating poorly reconstructed events, as these have not yet been understood (is it due to the architecture, a lack of information, or something fundamental to the events?):
“An important avenue of future work is investigating the common features of the events with poor reconstruction.”

- "the negative bias in nu-FF is..." -> this sentence has been moved to the previous paragraph

- "too high a variance" -> "a higher variance"

- Added z axis labels to Figs 5, 9, 10, 11

- Addition of many more references for chi2 and neutrino determination

- Added clarification to the text on the truth assignment used in the chi2 method:
“For truth labelling, jets were matched to partons within a radius of $\Delta R < 0.4$. Events containing jets matched to multiple partons were removed from the training and evaluation datasets.”
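A minimal sketch of such ambiguity-rejecting truth matching (assumed implementation; function names and data layout are illustrative):

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with phi wrapped into (-pi, pi]."""
    dphi = np.mod(phi1 - phi2 + np.pi, 2 * np.pi) - np.pi
    return np.hypot(eta1 - eta2, dphi)

def match_event(jets, partons, r_max=0.4):
    """Match partons to jets within delta R < r_max. Returns the parton->jet
    map, or None when any jet matches more than one parton (event dropped,
    as in the quoted text). jets and partons are lists of (eta, phi) pairs."""
    n_matched = np.zeros(len(jets), dtype=int)
    assignment = {}
    for ip, (p_eta, p_phi) in enumerate(partons):
        for ij, (j_eta, j_phi) in enumerate(jets):
            if delta_r(j_eta, j_phi, p_eta, p_phi) < r_max:
                n_matched[ij] += 1
                assignment[ip] = ij
    return None if (n_matched > 1).any() else assignment
```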

Conclusions

The conclusion is also suitably shorter, removing references to studies not performed. However, we directly address that we have not tested this approach on multiple-neutrino systems or other final states.

"significant improvements in downstream tasks" -> "in the downstream task of jet-parton assignment"
References

- Fixed typo in collaboration -> Collaboration

- Additional references added

Appendix

- Added an extra plot in the appendix showing nearly no correlation between $m_W$ and $p_\mathrm{T}^\mathrm{miss}$ + truth $p_z$.

- Added a table of the sigma values used in the chi2 method to the appendix.

- Added the matching efficiencies for the b_had and W_had in a table

Current status:
Has been resubmitted

Reports on this Submission

Anonymous Report 2 on 2023-1-13 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:scipost_202208_00052v2, delivered 2023-01-12, doi: 10.21468/SciPost.Report.6525

Report

Overall I find v2 of this paper to be a significant improvement in terms of language and presentation, with most of my comments addressed.

I would still like to see more on the poorly reconstructed events; e.g., some basic checks of whether there are common kinematic features in these events, in order to better understand what the network is/isn't learning. Additionally, the discussions the authors included in their response, regarding the improvements from including additional jets, giving the truth jet-parton assignments, etc., cover interesting studies that I think are worth discussing in the paper.

Nonetheless, I think the paper is already of publishable quality.

Requested changes

p3: "of potential process" -> "of potential processes"

p6: The $\Delta R$ requirement for truth matching is stated in the text as 0.2, but in the authors' response as 0.4 - check the paper has the right number :) Out of interest, how many events were removed due to ambiguous matching, and how many due to incomplete matching?

p6: "This is in contrast to traditional approaches
where different approaches need to be optimised for final states with other multiplicities of
neutrino s in the final state, for example Neutrino Weighting [35–37] in the case of dilepton
t ̄t production"
-> this sentence is messy. I think what is trying to be conveyed is that "there are specific methods for specific neutrino multiplicities"? Since nu-flows would be re-optimised for whatever use-case (as stated in the sentence at the end of section 2), I think this sentence should be revisited. Again, since no attempt is made at final states with >1 neutrino, I think one should be careful on overpromising extensions because it is unclear how performant it will be.



p9: Please add discussion of the mass-constraint method shown in Fig 5. The observation that ν-FF more or less gives the average of the ν-Flows method is interesting; I almost wonder if a combination of the ν-FF and constraint methods would be able to give a prediction of similar quality to the full ν-Flows method, by using the distance from the ν-FF to the two quadratic solutions to weight them, and what this would imply about what ν-Flows is really learning.


p11: I find the text in Fig 7 very difficult to read - can the figures be made a bit larger to improve this? Figs 11, 12, 13, and 15 are better in that way, though slightly increasing the text size in all of these would also be good.

p12: Some additional references for jet-parton assignment with deep learning are still missing, e.g. https://arxiv.org/abs/2010.09206 and https://arxiv.org/abs/2012.03542


references: check for author/collaboration formatting, e.g. [37] is "T. C. Collaboration" :)

  • validity: high
  • significance: ok
  • originality: high
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Anonymous Report 1 on 2023-1-11 (Invited Report)

Report

I find the manuscript much improved, and the authors have replied in detail to the comments received.

First of all, I agree with the authors' statement that obtaining the posterior distribution of the neutrino conditional on a high-dimensional space is indeed not trivial, and is a problem that their method solves well. However, I still believe that further (quantitative) evidence needs to be presented to claim that this improves the physics reach of ongoing analyses.

With the rescoping of the current draft, as manifested in the improved Introduction and Conclusion sections, I believe that this article is now suitable for publication.

  • validity: good
  • significance: ok
  • originality: good
  • clarity: good
  • formatting: good
  • grammar: good
