Data Reduction for Low Energy Nuclear Physics Experiments Using Data Frames

Caleb Marshall

SciPost Submission Page

Submission summary

Authors (as registered SciPost users):

Caleb Marshall

Submission information
Preprint Link:	https://arxiv.org/abs/2406.15404v2
Code repository:	https://github.com/camarsha/sauce
Code version:	2.0.0
Code license:	MIT License
Data repository:	https://github.com/camarsha/sauce/releases/tag/v2.0.0
Date accepted:	Oct. 14, 2024
Date submitted:	Oct. 1, 2024, 4:46 p.m.
Submitted by:	Caleb Marshall
Submitted to:	SciPost Physics Codebases

Ontological classification
Academic field:	Physics
Specialties:	Nuclear Physics - Experiment
Approach:	Computational

Abstract

Low energy nuclear physics experiments are transitioning towards fully digital data acquisition systems. Realizing the gains in flexibility afforded by these systems relies on equally flexible data reduction techniques. In this paper, methods utilizing data frames and in-memory techniques to work with data, including data from self-triggering, digital data acquisition systems, are discussed within the context of a Python package, \texttt{sauce}. It is shown that data frame operations can encompass common analysis needs and allow interactive data analysis. Two event building techniques, dubbed referenced and referenceless event building, are shown to provide a means to transform raw list mode data into correlated multi-detector events. These techniques are demonstrated in the analysis of two example data sets.

Published as SciPost Phys. Codebases 37 (2024), SciPost Phys. Codebases 37-r2.0 (2024)

Reports on this Submission

Report #1 by Jim Pivarski (Referee 2) on 2024-10-02 (Invited Report)

Cite as: Jim Pivarski, Report on arXiv:2406.15404v2, delivered 2024-10-02, doi: 10.21468/SciPost.Report.9838

Strengths

The author addressed the issues I raised.

Weaknesses

The text is patched, rather than integrated.

Report

The text about CMS and ATLAS event sizes was appended after the text about CDF and DØ; really, you only need the modern numbers. Also, the "size of an event" in HEP is ambiguous: it depends a great deal on the stage of processing.

About ragged array implementations, the motivating section still compares the paper's technique against a strawman ragged array implementation, with a shout-out to Coffea and Awkward Array as "superior implementations." That's not what I was asking for in my first review. This paper's implementation and the Coffea/Awkward/Apache Arrow/ROOT RNTuple implementation are both ragged arrays, but indexed in complementary ways. (This paper's implementation is similar to, but not exactly the same as, Apache Parquet.)
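
To illustrate what "indexed in complementary ways" can mean, here is a minimal NumPy sketch of one ragged data set described by two different indices: an offsets array (the layout used by Arrow-style list arrays) and a per-hit event index (the kind of column a flat data frame would carry). The variable names, the toy values, and the mapping of the second layout onto this paper's technique are assumptions made for illustration; this is not taken from sauce, Awkward Array, or Parquet internals.

```python
import numpy as np

# Hypothetical toy data: three events with 2, 0, and 3 hits (made-up energies).
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

# Layout A (offsets): event i owns content[offsets[i]:offsets[i + 1]].
offsets = np.array([0, 2, 2, 5])

# Layout B (per-hit index): each hit records the event it belongs to,
# as a column in a flat table would. Assumed sorted by event.
event_index = np.array([0, 0, 2, 2, 2])


def offsets_to_event_index(offsets):
    """Expand an offsets array into one event label per hit."""
    counts = np.diff(offsets)
    return np.repeat(np.arange(len(counts)), counts)


def event_index_to_offsets(event_index, n_events):
    """Collapse sorted per-hit event labels back into offsets."""
    counts = np.bincount(event_index, minlength=n_events)
    return np.concatenate(([0], np.cumsum(counts)))


# Hits of the last event, selected through each index.
hits_via_offsets = content[offsets[2]:offsets[3]]
hits_via_index = content[event_index == 2]
```

Both indices describe the same ragged structure and can be converted into one another. The offsets form makes slicing out a single event cheap, while the per-hit form keeps every hit as an ordinary table row that can be filtered or joined. Note that the empty event appears only implicitly in the per-hit form, which is why the number of events has to be supplied when converting back.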

But I won't request any further changes.

Requested changes

None.

Recommendation

Publish (meets expectations and criteria for this Journal)

validity: high
significance: high
originality: good
clarity: good
formatting: good
grammar: excellent
