SciPost Submission Page
Data Reduction for Low Energy Nuclear Physics Experiments Using Data Frames
by Caleb Marshall
Submission summary
Authors (as registered SciPost users): | Caleb Marshall |
Submission information | |
---|---|
Preprint Link: | https://arxiv.org/abs/2406.15404v2 (pdf) |
Code repository: | https://github.com/camarsha/sauce |
Data repository: | https://github.com/camarsha/sauce/releases/tag/v2.0.0 |
Date accepted: | 2024-10-14 |
Date submitted: | 2024-10-01 16:46 |
Submitted by: | Marshall, Caleb |
Submitted to: | SciPost Physics Codebases |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: | |
Approach: | Computational |
Abstract
Low energy nuclear physics experiments are transitioning towards fully digital data acquisition systems. Realizing the gains in flexibility afforded by these systems relies on equally flexible data reduction techniques. In this paper, methods utilizing data frames and in-memory techniques to work with data, including data from self-triggering, digital data acquisition systems, are discussed within the context of a Python package, \texttt{sauce}. It is shown that data frame operations can encompass common analysis needs and allow interactive data analysis. Two event building techniques, dubbed referenced and referenceless event building, are shown to provide a means to transform raw list mode data into correlated multi-detector events. These techniques are demonstrated in the analysis of two example data sets.
Published as SciPost Phys. Codebases 37-r2.0 (2024), SciPost Phys. Codebases 37 (2024)
Reports on this Submission
Report #1 by Jim Pivarski (Referee 2) on 2024-10-02 (Invited Report)
- Cite as: Jim Pivarski, Report on arXiv:2406.15404v2, delivered 2024-10-02, doi: 10.21468/SciPost.Report.9838
Strengths
1. The author addressed the issues I raised.
Weaknesses
1. The text is patched, rather than integrated.
Report
The text about CMS and ATLAS event sizes was appended after the text about CDF and DØ; really, you only need the modern numbers. Also, the "size of an event" in HEP is ambiguous: it depends a great deal on the stage of processing.
About ragged array implementations, the motivating section still compares the paper's technique against a strawman ragged array implementation, with a shout-out to Coffea and Awkward Array as "superior implementations." That's not what I was asking for in my first review. This paper's implementation and the Coffea/Awkward/Apache Arrow/ROOT RNTuple implementation are both ragged arrays, but indexed in complementary ways. (This paper's implementation is similar to, but not exactly the same as, Apache Parquet.)
But I won't request any further changes.
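As an illustration of the ragged-array remark above, here is a minimal sketch of two complementary ways to index the same variable-length events. It is not taken from the paper, from `sauce`, or from any of the libraries named in the report. The offsets layout is the one used by Arrow-style list arrays. The per-hit event id column is an assumed stand-in for a flat data frame with one row per hit. All names and values are hypothetical.

```python
from itertools import accumulate

# Hypothetical data: three events with 2, 0, and 3 hits respectively.
energies = [10.0, 12.5, 7.0, 8.5, 9.0]

# Offsets-based indexing: event i owns energies[offsets[i]:offsets[i + 1]].
offsets = [0, 2, 2, 5]


def ids_from_offsets(offsets):
    """Expand offsets into one event id per hit (flat, row-wise indexing)."""
    ids = []
    for event, (start, stop) in enumerate(zip(offsets[:-1], offsets[1:])):
        ids.extend([event] * (stop - start))
    return ids


def offsets_from_ids(ids, n_events):
    """Recover offsets from per-hit event ids (ids assumed sorted)."""
    counts = [0] * n_events
    for event in ids:
        counts[event] += 1
    return [0] + list(accumulate(counts))


event_id = ids_from_offsets(offsets)

# The same per-event reduction, computed from each layout.
sums_from_offsets = [
    sum(energies[start:stop]) for start, stop in zip(offsets[:-1], offsets[1:])
]
sums_from_ids = [0.0] * (len(offsets) - 1)
for event, energy in zip(event_id, energies):
    sums_from_ids[event] += energy
```

Both layouts carry the same information and convert into each other, provided the number of events is stored alongside the id column so that empty events are not lost. The offsets form gives direct access to a single event, while the id column fits group-by style operations on a flat table.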
Requested changes
None.
Recommendation
Publish (meets expectations and criteria for this Journal)