SciPost Submission Page

Modeling Hadronization using Machine Learning

by Phil Ilten, Tony Menzo, Ahmed Youssef, and Jure Zupan

This Submission thread is now published as SciPost Phys. 14, 027 (2023)

Submission summary

Authors (as registered SciPost users): Tony Menzo · Ahmed Youssef
Submission information
Preprint Link: scipost_202204_00035v2 (pdf)
Code repository: https://gitlab.com/uchep/mlhad
Date accepted: 2022-09-09
Date submitted: 2022-08-26 19:02
Submitted by: Youssef, Ahmed
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology

Abstract

We present the first steps in the development of a new class of hadronization models utilizing machine learning techniques. We successfully implement, validate, and train a conditional sliced-Wasserstein autoencoder to replicate the Pythia generated kinematic distributions of first-hadron emissions, when the Lund string model of hadronization implemented in Pythia is restricted to the emissions of pions only. The trained models are then used to generate the full hadronization chains, with an IR cutoff energy imposed externally. The hadron multiplicities and cumulative kinematic distributions are shown to match the Pythia generated ones. We also discuss possible future generalizations of our results.
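The sliced-Wasserstein distance that gives the autoencoder its name can be illustrated with a short sketch. This is not the authors' implementation (that lives in the code repository listed above); it is a minimal NumPy estimate under stated assumptions: both samples have the same number of points, the projection directions are drawn uniformly on the unit sphere, and the function name and its parameters (`n_projections`, `p`, `rng`) are hypothetical choices made for this example.

```python
import numpy as np


def sliced_wasserstein_distance(x, y, n_projections=128, p=2, rng=None):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n_samples, n_features) with identical shapes
    (an assumption of this sketch, which keeps the 1D matching trivial).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must be 2D arrays of identical shape")

    # Draw random directions and normalize them to unit length.
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction, then sort each projection.
    # In 1D, the optimal transport plan between equal-size samples pairs
    # the i-th smallest point of one sample with the i-th smallest of the other.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # Average |difference|^p over points and directions, then take the p-th root.
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    a = gen.normal(size=(1000, 2))
    b = gen.normal(loc=1.0, size=(1000, 2))
    print(sliced_wasserstein_distance(a, b, rng=0))
```

Sorting the one-dimensional projections is what makes the estimate cheap: each slice costs only a sort, rather than a full optimal-transport solve in the original feature space.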

Author comments upon resubmission

We thank the referees for their careful reading of the manuscript and constructive comments. Below please find our answers to the referees, as well as the list of changes made.

List of changes

General changes:
We have updated the description of the figures.
Section 2.2 has been split into two subsections (2.2 and 2.3). Section 2.2 details the architecture while section 2.3 focuses on training.
We have changed “Conclusion” to “Conclusion and Outlook”.

Changes in the introduction on page 2:
We have added references to recent automated NLO work and updated parton showers in the first paragraph of the introduction.
We have added references for ML-based simulations for the parton shower and detector simulation.
We reformulated “The present manuscript represents a proof of principle that building a full-fledged ML based hadronization framework is possible.” to “The present manuscript represents the first step toward building a full-fledged ML based hadronization framework.”
We added normalizing flows as a commonly used generative model.
We added the sentence “In addition, conditional generative models give more flexibility and control of the output” with corresponding references to introduce the concept of conditional generative models.
In the last paragraph of page 2, we removed the word “unique” for the challenges.

Changes on page 3:
We added the last sentence in the first paragraph to point the reader to further steps.
We added the first paragraph in sec. 2.1 to explain that both parton shower and hadronization can be well described using Markov chain frameworks. We also added a sentence to the end of the second paragraph of section 2.1, to emphasize what the goal of the ML framework for hadronization is.
We changed “QCD string” to “QCD color flux tubes, or strings”.

Changes on page 4:
We added a sentence at the beginning of page 4 highlighting the goal of our ML framework.

Changes on page 5:
We have added a clarification sentence above eq. 3 on how eq. 3 relates to the variables we train on.

Changes on page 6:
We have added two explanatory sentences at the end of Section 2.1 about what we want to achieve with the ML model of hadronization.
We have added extra explanatory sentences for the SWD at the end of the first paragraph of Section 2.2.

Changes on page 8:
We have added a comment about the norm used in the loss function below eq. (9) and expanded the discussion of the sliced Wasserstein distance.

Changes on page 11:
We have added a description of the variables in table 1.

Changes on page 18:
We extended the discussion about possible observables and further steps in the last paragraph in “Conclusion and Outlook”.

Published as SciPost Phys. 14, 027 (2023)


Reports on this Submission

Report #1 by Anonymous (Referee 3) on 2022-9-6 (Invited Report)

Report

Thank you for taking into account my feedback. I still do not really understand why you need to generate 100 configurations and only pick one (it does seem to be an inefficiency and not merely a "quirk" of the approach), but I won't insist further.

