An algorithm to parallelise parton showers on a GPU

Seymour, Michael H.; Sule, Siddharth

doi:10.21468/SciPostPhysCodeb.33

SciPost Physics Codebases

SciPost Phys. Codebases 33 (2024) · published 12 August 2024

This Publication is part of a bundle

When citing, cite all relevant items (e.g. for a Codebase, cite both the article and the release you used).

	DOI	Type
	10.21468/SciPostPhysCodeb.33	Article
	10.21468/SciPostPhysCodeb.33-r1.1	Codebase release

Abstract

By definition, the Single Instruction, Multiple Thread (SIMT) paradigm of GPU programming does not support the branching nature of a parton shower algorithm. However, modern GPUs are designed to schedule threads with diverging processes independently, allowing them to handle such branches. With regular thread synchronisation and careful treatment of the individual steps, one can simulate a parton shower on a GPU. We present a Sudakov veto algorithm designed to simulate parton branching on multiple events in parallel. We also release a CUDA C++ program that generates matrix elements, showers partons and computes jet rates and event shapes for LEP at 91.2 GeV on a GPU. To benchmark its performance, we also provide a near-identical C++ program designed to simulate events serially on a CPU. While the consequences of branching are not absent, we demonstrate that a GPU can provide the throughput of a many-core CPU. As an example, we show that the time taken to shower $10^6$ events on one NVIDIA Tesla V100 GPU is equivalent to that of 295 Intel Xeon E5-2620 CPU cores.
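To illustrate the idea of a veto algorithm advanced over many events in step, the following is a minimal serial C++ sketch. It is not taken from the released code. The type `ToyEvent`, the functions `trialStep` and `showerAll`, the overestimate $g(t) = c/t$ and the constant acceptance ratio `accept` are all assumptions chosen for brevity. Each pass of the outer loop gives every unfinished event exactly one trial, which loosely mimics one thread per event with a synchronisation point after each step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-event state for this sketch (not a type from the released code).
struct ToyEvent {
    double t = 0.0;     // current evolution scale
    int emissions = 0;  // accepted branchings so far
    bool done = false;  // set once the scale drops below the cutoff
};

// One trial of the veto algorithm for a single event.
// Assumed toy kernels: overestimate g(t) = c / t, true kernel f(t) = accept * g(t).
void trialStep(ToyEvent& ev, std::mt19937_64& rng, double c, double accept,
               double tCut) {
    if (ev.done) return;  // finished events idle, like masked-off GPU threads
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    // Sample the next trial scale by solving r = (t' / t)^c, the no-emission
    // probability of the overestimate between t and t'.
    ev.t *= std::pow(uniform(rng), 1.0 / c);
    if (ev.t < tCut) {
        ev.done = true;
        return;
    }
    // Accept with probability f/g; a vetoed trial still keeps the lowered scale.
    if (uniform(rng) < accept) ++ev.emissions;
}

// Advance all events in lockstep until every one has reached the cutoff.
std::vector<ToyEvent> showerAll(std::size_t nEvents, double tStart, double tCut,
                                double c, double accept, std::uint64_t seed) {
    assert(c > 0.0 && tCut > 0.0 && tStart > tCut);
    std::vector<ToyEvent> events(nEvents);
    std::vector<std::mt19937_64> rngs;  // one independent generator per event
    rngs.reserve(nEvents);
    for (std::size_t i = 0; i < nEvents; ++i) {
        events[i].t = tStart;
        rngs.emplace_back(seed + i);
    }
    bool anyActive = nEvents > 0;
    while (anyActive) {
        anyActive = false;
        // On a GPU this inner loop would be one thread per event.
        for (std::size_t i = 0; i < nEvents; ++i) {
            trialStep(events[i], rngs[i], c, accept, tCut);
            if (!events[i].done) anyActive = true;
        }
    }
    return events;
}
```

A call such as `showerAll(1000, 91.2 * 91.2, 1.0, 0.5, 0.5, 42)` returns one `ToyEvent` per event. Events need different numbers of trials to reach the cutoff, so in later passes some entries are already idle while others are still evolving; this is the serial analogue of the thread divergence the abstract refers to.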

TY  - JOUR
PB  - SciPost Foundation
DO  - 10.21468/SciPostPhysCodeb.33
TI  - An algorithm to parallelise parton showers on a GPU
PY  - 2024/08/12
UR  - https://scipost.org/SciPostPhysCodeb.33
JF  - SciPost Physics Codebases
JA  - SciPost Phys. Codebases
SP  - 33
AU  - Seymour, Michael H.
AU  - Sule, Siddharth
ER  -

@Article{10.21468/SciPostPhysCodeb.33,
	title={{An algorithm to parallelise parton showers on a GPU}},
	author={Michael H. Seymour and Siddharth Sule},
	journal={SciPost Phys. Codebases},
	pages={33},
	year={2024},
	publisher={SciPost},
	doi={10.21468/SciPostPhysCodeb.33},
	url={https://scipost.org/10.21468/SciPostPhysCodeb.33},
}

@Article{10.21468/SciPostPhysCodeb.33-r1.1,
	title={{Codebase release 1.1 for GAPS}},
	author={Michael H. Seymour and Siddharth Sule},
	journal={SciPost Phys. Codebases},
	pages={33-r1.1},
	year={2024},
	publisher={SciPost},
	doi={10.21468/SciPostPhysCodeb.33-r1.1},
	url={https://scipost.org/10.21468/SciPostPhysCodeb.33-r1.1},
}

Cited by 1

Authors / Affiliations

¹ Michael H. Seymour,
¹ Siddharth Sule

¹ University of Manchester

Funder for the research work leading to this publication

Science and Technology Facilities Council [STFC]