SciPost Submission Page
NuHepMC: A standardized event record format for neutrino event generators
by Steven Gardiner, Joshua Isaacson, Luke Pickering
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Steven Gardiner |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2310.13211v2 (pdf) |
| Code repository: | https://github.com/NuHepMC/Spec |
| Code version: | 0.9.0 |
| Code license: | CC BY 4.0 |
| Date submitted: | June 27, 2024, 8:59 p.m. |
| Submitted by: | Steven Gardiner |
| Submitted to: | SciPost Physics Codebases |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approach: | Computational |
Abstract
Simulations of neutrino interactions are playing an increasingly important role in the pursuit of high-priority measurements for the field of particle physics. A significant technical barrier for efficient development of these simulations is the lack of a standard data format for representing individual neutrino scattering events. We propose and define such a universal format, named NuHepMC, as a common standard for the output of neutrino event generators. The NuHepMC format uses data structures and concepts from the HepMC3 event record library adopted by other subfields of high-energy physics. These are supplemented with an original set of conventions for generically representing neutrino interaction physics within the HepMC3 infrastructure.
Reports on this Submission
Report #2 by Anonymous (Referee 2) on 2024-10-7 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2310.13211v2, delivered 2024-10-07, doi: 10.21468/SciPost.Report.9871
Strengths
Weaknesses
Report
Requested changes
- To deal with the fact that only in some cases the averaged cross section is known at the beginning of the simulation, the mixed solution of G.C.5 and E.C.4 is proposed. However, the association of unrealistic <sigma> to events generated early in the run could lead to confusion and mishandling among the users. I wonder if <sigma> could be stored by a procedure analogous to G.C.5 but at the end of the run, rather than at the beginning. Probably, there are reasons why a solution of this kind is not possible/feasible/convenient but perhaps a short explanation, not just for this referee but for the readers and potential users would be useful.
- In G.R.8, the authors refer to “generator-specific quasiparticles”, presumably with unresolved nuclear remnants in mind. However, the name “quasiparticle” has a different meaning in physics, which could lead to misunderstanding. It is therefore advisable to find a different name. For example, if no “quasiparticles” other than nuclear remnants are expected, the latter term could be used directly.
- In connection with the previous point, P.C.2 defines “the new particle number 2009900000 to correspond to a nuclear remnant pseudo-particle”. However, a given simulation might contain more than one kind of such remnant, for instance when there are different targets. Wouldn’t it then be convenient to have a range rather than a single number associated with these states?
- The authors strongly recommend adopting picobarns as the cross-section unit, but they are certainly aware that its use is practically nonexistent in the neutrino cross-section literature.
- In Table 1, ID ranges are associated with process categories. These categories correspond to the common classification adopted in the field, but this is not necessarily a good choice because it is model dependent and, in some cases, there is no consensus about the kinematic boundaries of each region. Furthermore, not all generators provide information in these terms in their output. In my opinion, it would be wise to be able to redefine this structure in the future.
- The beginning of Sec. 4 refers to “preliminary tools for converting proprietary neutrino event formats to the NuHepMC standard”. Such tools would be highly valuable for potential users. I understand the authors are not yet in the position to include these tools in this release but a sentence promising them (soon) would be encouraging.
- In connection with the MARLEY prediction in Fig. 1 (right), it is stated that “adding future data points to the plot would be easily achieved with the present NuHepMC-based workflow.” It is not clear how the new format would help in this comparison. I would naively say that once the results of the simulation have been obtained, adding (future) data to the plot should be straightforward regardless of the format.
- Finally, with the font chosen for Fig. 1, characters “I” and “1” look strange.
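The unit question raised in the list above comes down to a fixed conversion factor: 1 barn is 10^-24 cm^2, so 1 pb is 10^-36 cm^2, or 100 in units of 10^-38 cm^2 (a scale often used when quoting neutrino cross sections). A minimal sketch of the conversion follows; the function names are illustrative assumptions and are not defined by NuHepMC or HepMC3.

```python
# Illustrative helpers only; these names are hypothetical and are not
# part of the NuHepMC specification or the HepMC3 library.

CM2_PER_PB = 1.0e-36  # 1 barn = 1e-24 cm^2 and 1 pb = 1e-12 barn


def pb_to_cm2(sigma_pb):
    """Convert a cross section from picobarns to cm^2."""
    return sigma_pb * CM2_PER_PB


def cm2_to_pb(sigma_cm2):
    """Convert a cross section from cm^2 to picobarns."""
    return sigma_cm2 / CM2_PER_PB


def pb_to_1e38_cm2(sigma_pb):
    """Express a cross section given in picobarns in units of 1e-38 cm^2."""
    return sigma_pb * 100.0
```

With these helpers, a cross section quoted as 1 in units of 10^-38 cm^2 corresponds to 0.01 pb.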
Recommendation
Ask for minor revision
Report #1 by Anonymous (Referee 1) on 2024-9-11 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2310.13211v2, delivered 2024-09-11, doi: 10.21468/SciPost.Report.9745
Strengths
Weaknesses
Report
Requested changes
General comments:
Repository: It would be highly beneficial to include a link to a repository. Specifically, I recommend that the authors provide the scripts used to generate Figure 1. This will greatly enhance the utility of the manuscript for readers.
Dependencies: The authors should clarify the dependencies that the proposed format introduces for event generators. For instance, does this imply that generators need to install HepMC3? Does this limit the format to C++-based generators, or are there options for Python-based codes? Are there Python bindings for the NuHepMC structures? A detailed explanation of how to integrate this format into existing generators and any associated limitations would be helpful.
Specific comments:
- Page 8: The attributes for "Beam Energy Distribution Description" are not well-suited for atmospheric neutrino experiments, which involve multiple energies and baselines. The manuscript should clarify how such information can be incorporated.
- Page 9: The term "Nevents" should be clarified: does it refer to the number of generated events or the number of interactions?
- Page 9: Currently, "ParticleStatusInfo" is labeled as a "Suggestion". Why not formalize this as a "Convention"?
- Page 9: It might be useful to include a label that informs users about the reference frame used for all the provided data.
- Page 11: Table 1 lists several processes. The difference between SIS and DIS should be clearly explained for better understanding.
- Page 13: The link between "outgoing real particle" and "observable" is unclear. For instance, particles like taus, which can decay but are observable at high energy, could fit this category. This distinction should be explained in more detail.
- Page 21-22: Figures 2, 3, and 4 are difficult to read. Please enlarge these figures for better legibility.
Recommendation
Publish (meets expectations and criteria for this Journal)

Author: Steven Gardiner on 2025-06-20 [id 5587]
(in reply to Report 1 on 2024-09-11)
We thank the referee for the review of the manuscript and the helpful comments. Replies and explanations of changes made before resubmission are given below.
We thank the referee for this suggestion and agree that this addition would be useful. The location of the code to generate the right-hand plot from Figure 1 has now been referenced in the footnotes. We have also added a link in the main text to a supplemental repository (https://github.com/NuHepMC/cpputils) that provides example C++ scripts for manipulating events stored in the NuHepMC format. The MicroBooNE data set shown in Fig. 1 is available for use in automated NuHepMC-based comparisons within the NUISANCE software package.
We agree that this is an important clarification, and we have added some text to the paper accordingly.
Because NuHepMC represents a set of guidelines for using existing data structures from HepMC3 to represent neutrino scattering events, there are no extra dependencies required beyond anything needed to process generic HepMC3 events. Since the HepMC3 data format can be represented in simple ASCII text files, an event generator may produce output compliant with the NuHepMC standard without recourse to any external code whatsoever. Exactly this approach was used in draft code provided by the authors for proposed inclusion in a release of the GiBUU event generator; the NuHepMC output implementation was written in regular Fortran with no external dependency.
That being said, the HepMC3 reference library provides a standard implementation of the event format with many convenient tools that facilitate reading and writing the metadata recommended in the NuHepMC standard. We have created some example NuHepMC tools (C++ and Python) that depend on the reference library. In particular, more advanced applications of NuHepMC, such as interoperability between multiple event generators, will likely be much simpler to implement by relying on the reference HepMC3 library.
We agree with the referee that these attributes are not well-suited for atmospheric neutrino experiments. Standardizing representations of general neutrino fluxes is a complex topic that we leave to future work, but we wanted to provide an initial convention for representing some relevant information in the specific context of the accelerator neutrino community.
We have therefore added text to the paper highlighting that this is a preliminary definition of how to store flux metadata in the NuHepMC event record. We emphasize the need for additional work on standardizing flux formats across multiple kinds of neutrino experiments.
We thank the referee for pointing out the need for greater clarity on this point. The "Nevents" attribute was intended to assist users in reproducing the events from a particular event generator run; we consider its exact interpretation to be an implementation detail. However, we have tidied up this section, and this example no longer appears in the manuscript.
We believe that the referee is referring to G.S.2 here, but we are not completely sure. Note that event generators are required (G.R.6 in the original submission, now G.R.10) to provide definitions of all particle status codes used beyond those already standardized in HepMC3. Suggestion G.S.2 merely asks that explicit definitions of the standard HepMC3 codes be included as well. Since an unambiguous interpretation of the output is possible without this addition, for now we prefer to leave the inclusion of the standard codes as only a suggestion.
There is a de facto standard among event generators that the lab frame is used to record particle momenta and vertex positions. This choice is also most convenient for use in experimental production workflows and in comparing predictions to cross-section measurements. We therefore adopt the lab frame in keeping with standard practice in the field.
However, we believe all information needed to transform to other reference frames can be calculated from the 4-vectors stored in the event. If it is desirable to report observables in other frames in the context of a specific model, these may be added as metadata to relevant particles and vertices as appropriate. We have added some text to the manuscript to discuss this issue in light of this comment from the referee. In particular, a new requirement (E.R.9) makes the choice of the lab frame explicit.
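As an illustration of the point that other frames can be reached from the stored lab-frame 4-vectors, the following minimal sketch boosts a 4-vector into the rest frame of a reference 4-vector (for example, the struck target or a hadronic system). It uses plain Python tuples ordered as (E, px, py, pz) in natural units (c = 1) and does not call the HepMC3 API; the function name and tuple layout are assumptions made for this example.

```python
import math


def boost_to_rest_frame(p, ref):
    """Boost 4-vector p into the rest frame of the 4-vector ref.

    Both arguments are (E, px, py, pz) tuples given in the lab frame,
    in natural units (c = 1). Hypothetical helper, not a HepMC3 call.
    """
    e_ref, rx, ry, rz = ref
    # Velocity of the reference system as seen in the lab frame.
    bx, by, bz = rx / e_ref, ry / e_ref, rz / e_ref
    b2 = bx * bx + by * by + bz * bz
    if b2 == 0.0:
        return tuple(p)  # ref is already at rest in the lab frame
    if b2 >= 1.0:
        raise ValueError("reference 4-vector must be time-like")
    gamma = 1.0 / math.sqrt(1.0 - b2)

    e, px, py, pz = p
    bp = bx * px + by * py + bz * pz  # beta . p
    # Standard Lorentz boost along an arbitrary direction.
    coeff = (gamma - 1.0) * bp / b2 - gamma * e
    return (
        gamma * (e - bp),
        px + coeff * bx,
        py + coeff * by,
        pz + coeff * bz,
    )
```

Boosting the reference 4-vector itself returns (m, 0, 0, 0), where m is its invariant mass, which gives a quick sanity check on any implementation of this kind.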
We agree that the boundary between SIS and DIS is ill-defined, and we have updated the text to add an explanation. Ultimately, we emphasize that these process ID ranges represent rough guidelines for generator authors rather than strict definitions.
We initially intended these phrases to be synonyms in this context. To improve clarity, we have decided to simply remove the relevant sentence.
The figures have been enlarged as requested.