
SciPost Submission Page

Making Digital Objects FAIR in High Energy Physics: An Implementation for Universal FeynRules Output (UFO) Models

by Mark S. Neubauer, Avik Roy, Zijun Wang

This is not the latest submitted version.

Submission summary

Authors (as registered SciPost users): Avik Roy
Submission information
Preprint Link: https://arxiv.org/abs/2209.09752v3  (pdf)
Code repository: https://github.com/Neubauer-Group/UFOManager
Data repository: https://github.com/Neubauer-Group/UFOMetadata/
Date submitted: 2023-02-03 03:52
Submitted by: Roy, Avik
Submitted to: SciPost Physics Codebases
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approaches: Experimental, Computational

Abstract

Research in the data-intensive discipline of high energy physics (HEP) often relies on domain-specific digital content. Reproducibility of research relies on proper preservation of these digital objects. This paper reflects on the interpretation of the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR) in this context and demonstrates their implementation by describing the development of an end-to-end support infrastructure for preserving and accessing Universal FeynRules Output (UFO) models guided by the FAIR principles. UFO models are custom-made Python libraries used by the HEP community for Monte Carlo simulation of collider physics events. Our framework provides simple but robust tools to preserve and access the UFO models and corresponding metadata in accordance with the FAIR principles.

List of changes

- The code repository UFOManager has been modified to be usable as a Python package; the corresponding discussion in the manuscript about how to use this package has been modified accordingly
- Instructions to set up the necessary environments to run this package have been updated
- Validation checks have been improved. Additional checks to verify the validity of PDG IDs have been added. The enriched metadata also contains additional dictionaries to separately accommodate SM particles, BSM particles with valid PDG IDs, and BSM particles with PDG-like IDs
- A demonstration of using UFOManager to search for models has been added in Table B2
- The list of FAIRified UFO models in Table C3 (previously B2) has been standardized
- Some of the text and references in Sections 1 and 2 have been modified following the referee's suggestions
- Some typos and grammatical errors have been fixed

Current status:
Has been resubmitted

Reports on this Submission

Report #1 by Anonymous (Referee 2) on 2023-03-15 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2209.09752v3, delivered 2023-03-15, doi: 10.21468/SciPost.Report.6903

Report

The authors have addressed most of the points of the first referee report to satisfaction. A few minor points remain to be corrected in the manuscript as listed below. Once this is taken care of, I recommend publication in SciPost Physics Codebases.

- Clickable links are provided throughout the text for some repositories or tools, but not for all. For example, links are given for the NNPDF Collaboration, GitHub and Zenodo, but not for HEPData or MC simulation tools. This should be done systematically.

- The introduction states that "While collaborations facilitate the internal preservation of their data and common software frameworks for collaboration-wide usage, preservation of digital resources independently developed by smaller research groups or individuals is equally important to be able to reproduce the results from HEP research." This gives the impression that data and software preservation by large experimental collaborations is for collaboration-internal use only. This is utterly misleading to my mind. There is huge benefit of making this material open to the public; results from HEP research shouldn't be reproducible only within the collaborations but by the whole HEP community. Please revise!

- Refs. [12–14] are still incomplete, but since the text says "for example" it can be acceptable.

- The acronym BSM is defined more than once.

  • validity: low
  • significance: low
  • originality: low
  • clarity: ok
  • formatting: good
  • grammar: good



Comments

Anonymous on 2023-02-03  [id 3306]

We would like to thank the editor for arranging the review of this paper and the reviewer for their insightful comments and suggestions. Based on the comments we received during review, this revised version is submitted to SciPost Physics Codebases. The following is a detailed list of the changes made in the current submission in response to those comments:

provide the UFOManager as a python tool where upload and download functionalities as well as python2 or python3 can be given as arguments; clearly describe requirements and dependencies, and make the UFOManager importable as a library;

Response: We have implemented UFOManager as a Python package (while retaining the simple, standalone use of the Upload and Download scripts). It internally determines the Python version being used and uses the scripts accordingly. The dependencies listed in requirements_Python2.txt and requirements_Python3.txt can be used to build dedicated conda environments (or virtual environments) based on the needs of the user. Some of the specific dependencies are described in a separate section of the current README.
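As a rough illustration of the version handling described above, the following minimal sketch picks the requirements file that matches the running interpreter. Only the two file names come from the response; the function name `requirements_file` is hypothetical and is not claimed to exist in UFOManager.

```python
import sys


def requirements_file(version_info=sys.version_info):
    """Return the requirements file matching the interpreter's major version.

    Hypothetical helper for illustration only; not part of UFOManager.
    """
    major = version_info[0]
    if major == 2:
        return "requirements_Python2.txt"
    if major == 3:
        return "requirements_Python3.txt"
    raise RuntimeError("Unsupported Python major version: %d" % major)


if __name__ == "__main__":
    # Prints the file name that applies to the interpreter running this script.
    print(requirements_file())
```

The printed name could then be passed to `pip install -r` inside a freshly created conda or virtual environment.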

improve validation checks

Response: Following the recommendations above, the validation checks now include checking the validity of the PDG code assigned to each particle, as well as checking that the particle’s spin and charge conform to the spin and charge expected for the assigned PDG code. Besides the dictionary of “All particles”, the metadata now contains dictionaries of “SM particles”, “BSM particles with valid PDG codes” and “BSM particles with non-standard PDG codes”. It also includes an additional flag to indicate whether a model allows NLO calculations.
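The checks described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the UFOManager implementation: the reference table `EXPECTED`, the set `SM_CODES`, the record layout (plain dictionaries with `name`, `pdg_code`, `spin`, `charge`) and both function names are hypothetical. Spin is written as 2s+1, following the UFO convention, and "non-standard" is approximated here as "absent from the small reference table". Only the four dictionary labels are taken from the response.

```python
# Hypothetical reference table: |PDG code| -> (spin as 2s+1, charge in units of e).
EXPECTED = {
    11: (2, -1.0),      # electron
    22: (3, 0.0),       # photon
    25: (1, 0.0),       # Higgs boson
    1000022: (2, 0.0),  # lightest neutralino (BSM, standard code)
}
SM_CODES = {11, 22, 25}  # assumption: a tiny subset, for illustration only


def check_particle(p):
    """Return the list of attributes that disagree with the reference table."""
    expected = EXPECTED.get(abs(p["pdg_code"]))
    if expected is None:
        return []  # nothing to compare against
    spin, charge = expected
    if p["pdg_code"] < 0:
        charge = -charge  # antiparticles carry the opposite charge
    problems = []
    if p["spin"] != spin:
        problems.append("spin")
    if abs(p["charge"] - charge) > 1e-9:
        problems.append("charge")
    return problems


def classify(particles):
    """Split particle records into the four metadata dictionaries (name -> PDG code)."""
    groups = {
        "All particles": {},
        "SM particles": {},
        "BSM particles with valid PDG codes": {},
        "BSM particles with non-standard PDG codes": {},
    }
    for p in particles:
        code = p["pdg_code"]
        groups["All particles"][p["name"]] = code
        if abs(code) in SM_CODES:
            groups["SM particles"][p["name"]] = code
        elif abs(code) in EXPECTED:
            groups["BSM particles with valid PDG codes"][p["name"]] = code
        else:
            groups["BSM particles with non-standard PDG codes"][p["name"]] = code
    return groups
```

A real check would need a complete PDG table rather than the four entries used here.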

extend and standardise the list of models in Table B2;

Response: We have standardized the model names in Table C3 (which was Table B2 in the previous version). The model names also serve as the titles of the corresponding digital entries in Zenodo. Hence, we keep the prefix phrase “UFO Model for” in these titles so that the content type of these entries is clear to any user who comes across them. To avoid redundancy in the paper, we have omitted this prefix from the entries in Table C3. Our implementation serves as a demonstrative example of FAIRifying UFO models. Since we do not own the individual models, we cannot arbitrarily FAIRify models authored by others. Table C3 therefore includes only the FAIRified UFO models for which we have received permission to use our tools to demonstrate the FAIRification of models.

include a demonstration of how to query for models with specific properties and/or particle content;

Response: We have added a demonstration of searching for models by PDG ID in Table B2.
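To give a sense of what a query by PDG ID involves, here is a minimal sketch that filters in-memory metadata records. It does not reproduce the UFOManager search interface: the function `find_models`, the `"Model name"` field and the sample records are hypothetical, and the `"All particles"` dictionary is assumed to map particle names to PDG codes.

```python
def find_models(records, pdg_ids):
    """Return the sorted names of models whose particle content includes every requested PDG ID."""
    wanted = set(pdg_ids)
    return sorted(
        r["Model name"]
        for r in records
        if wanted <= set(r["All particles"].values())
    )


if __name__ == "__main__":
    # Made-up records for illustration; real metadata files may be laid out differently.
    records = [
        {"Model name": "ModelA", "All particles": {"h": 25, "zp": 9000001}},
        {"Model name": "ModelB", "All particles": {"h": 25, "n1": 1000022}},
    ]
    print(find_models(records, [25, 1000022]))
```

Requiring all requested IDs (set inclusion) is a design choice made for this sketch; a search that matches any one of the IDs would test for a non-empty intersection instead.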

revise sections 1 and 2 as indicated in the report;

Response: We have made the revisions requested in the report. These include:

Modified description of HEPData: We have rephrased the description of HEPData as “For instance, the Durham High-Energy Physics Database (HEPData)~\cite{hepdata} is an open-access repository established for preserving and sharing scattering data from HEP experiment, containing digitized details of plots and tables from thousands of physics analysis publications.” This description closely follows the “About HEPData” section on the HEPData website. In the following line, “dedicated data analysis frameworks” has been replaced by “data analysis code and frameworks” to reflect that it refers to the RECAST implementation of analysis codes and frameworks developed for individual analyses.

Added/Modified references: Ref. 13 (previously 12) has been added to the reference list of [15,16] (previously [14,15]); the Snowmass report has also been cited (Ref. 17).

Removed confusing statement about UFO compatibility: We have removed the sentence "UFO models have been demonstrated to be compatible with other MC generators as well" to avoid any confusion. The immediately preceding line has been rephrased as "Developing new physics models as python libraries enabled using the same digital format to be interoperable across different MC generation platforms like MadGraph, Herwig, and Sherpa [6,7,28]."

Other: Changed “simulated MC events” to “MC events” in Section 2.

there are a number of typos to be corrected.

Response: The entire manuscript has been carefully checked once more for typos and grammatical mistakes and those issues have been fixed.