SciPost Submission Page
Data-parallel leading-order event generation in MadGraph5_aMC@NLO
by Stephan Hageböck, Daniele Massaro, Olivier Mattelaer, Stefan Roiser, Andrea Valassi, Zenny Wettersten
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Zenny Wettersten |
| Preprint Link: | https://arxiv.org/abs/2507.21039v2 |
| Code repository: | https://github.com/madgraph5/madgraph4gpu |
| Date submitted: | Aug. 4, 2025, 2:28 p.m. |
| Submitted by: | Wettersten, Zenny |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Computational, Phenomenological |
Abstract
The CUDACPP plugin for MadGraph5_aMC@NLO aims to accelerate leading-order tree-level event generation by providing the MadEvent event generator with data-parallel helicity amplitudes. These amplitudes are written in templated C++ and CUDA, allowing them to be compiled for CPUs supporting the SSE4, AVX2, and AVX-512 instruction sets as well as for CUDA- and HIP-enabled GPUs. Using SIMD instruction sets, CUDACPP-generated amplitude routines are shown to speed up linearly with SIMD register size, and GPU offloading is shown to provide acceleration beyond that of SIMD instructions. Additionally, the resulting speed-up in event generation perfectly aligns with predictions from measured runtime fractions spent in amplitude routines, and proper GPU utilisation can speed up high-multiplicity QCD processes by an order of magnitude when compared to optimal CPU usage in server-grade CPUs.
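The abstract's prediction of the overall event-generation speed-up from the measured runtime fraction spent in amplitude routines is conventionally expressed through Amdahl's law. The block below is a sketch under that assumption; the symbols f, S_ME and S_tot are introduced here for illustration and may not match the paper's notation, and the numerical example uses hypothetical values rather than measured ones.

```latex
% Assumption: Amdahl's law; symbols are illustrative, not the paper's notation.
% f      : fraction of the baseline (scalar CPU) runtime spent in amplitude routines
% S_ME   : speed-up of the amplitude routines alone (SIMD or GPU)
% S_tot  : resulting speed-up of the full event-generation workflow
S_{\mathrm{tot}} = \frac{1}{(1 - f) + f / S_{\mathrm{ME}}},
\qquad
\lim_{S_{\mathrm{ME}} \to \infty} S_{\mathrm{tot}} = \frac{1}{1 - f}
% Hypothetical example: f = 0.9, S_ME = 8
% S_tot = 1 / (0.1 + 0.1125) = 1 / 0.2125 \approx 4.7, bounded above by 10
```

For context on the linear SIMD scaling: in double precision, a 128-bit (SSE4), 256-bit (AVX2) and 512-bit (AVX-512) register holds 2, 4 and 8 values respectively, so ideal vectorisation would give S_ME of roughly 2, 4 and 8. The bound 1/(1 - f) shows why accelerating the amplitudes alone cannot speed up the whole workflow beyond the limit set by the remaining non-amplitude work.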
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Reports on this Submission
Strengths
2 - Relevant benchmarks of the event generator in realistic use cases
3 - Reproducible runtimes and scaling tests
Weaknesses
2 - Some data only contained in figures should be made publicly available
3 - The GPU development targets only NVIDIA hardware
Report
The manuscript is well written and most of the data are presented in a reproducible fashion. Only a few minor changes are needed.
Requested changes
1- In addition to Refs. [13,14], I would ask the authors to cite similar developments from collaborations other than their own, in particular arXiv:1905.05120, arXiv:2107.06625, arXiv:2109.11964, arXiv:2112.09588, arXiv:2203.07460, arXiv:2209.00843, arXiv:2302.10449, arXiv:2309.13154, arXiv:2506.06203 and arXiv:2505.13608.
2- In addition to Refs. [10-12], arXiv:2110.15211 should be cited.
3- Page 3, 3rd paragraph: "[..] never reached production" should read either "[..] never reached production quality" or "[..] was never used in production", depending on which scenario the sentence is supposed to describe.
4- Page 4, 1st paragraph: "gridpacks" is technical jargon and should not be used in the introduction without an explanation.
5- Page 11, Sec. 4, 2nd paragraph: It would be helpful if the authors could discuss, or at least mention, the possible effects of roundoff error on higher-order calculations, and whether their FP32 code base could still be of use as a component of the MadGraph NLO event generation framework.
6- Page 12, 4th paragraph: It would be helpful if the authors briefly explained why large gauge cancellations arise in the VBF process.
7- Page 14, 1st paragraph: It is not clear what the statement on the washing out of roundoff errors means. I would argue that the roundoff error should always be smaller than the statistical precision of the event sample, and in many cases much smaller. This statement is entirely independent of the parametric precision of the calculation. Take, for example, the production of Z+b at the LHC. Even though the theory precision is no better than 10%, a 10% roundoff error on the mass of the b-quark in the final state would be detrimental, as it would change dead-cone effects and the spectrum of the B hadrons, which can be resolved through vertexing.
8- Figs. 11 and 22 should be made available in their original format, i.e. as the searchable and clickable flame graphs, which allow the reader to investigate the entire call stack.
Recommendation
Ask for minor revision
