SciPost Submission Page
PCJeDi: Diffusion for Particle Cloud Generation in High Energy Physics
by Matthew Leigh, Debajyoti Sengupta, Guillaume Quétant, John Andrew Raine, Knut Zoch, Tobias Golling
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users):  John Raine · Debajyoti Sengupta 
Submission information  

Preprint Link:  scipost_202309_00013v1 (pdf) 
Code repository:  https://github.com/rodemhep/PCJeDi 
Date submitted:  2023-09-12 15:01 
Submitted by:  Raine, John 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approach:  Phenomenological 
Abstract
In this paper, we present a new method to efficiently generate jets in High Energy Physics called PC-JeDi. This method utilises score-based diffusion models in conjunction with transformers which are well suited to the task of generating jets as particle clouds due to their permutation equivariance. PC-JeDi achieves competitive performance with current state-of-the-art methods across several metrics that evaluate the quality of the generated jets. Although slower than other models, due to the large number of forward passes required by diffusion models, it is still substantially faster than traditional detailed simulation. Furthermore, PC-JeDi uses conditional generation to produce jets with a desired mass and transverse momentum for two different particles, top quarks and gluons.
Author comments upon resubmission
Please find detailed replies to each point raised below, and a note where the text has been changed.
Best regards,
The authors
List of changes
Reviewer 1
1 Fix the choice of MLE weighting \lambda(t) = \beta(t) and not \beta(t)^2 in the text and Eq.7
>> Thank you for pointing out this error; it should indeed be without the square. We can confirm that this was a typo in the paper only, and the correct formula for the MLE weighting was used in the code.
2 Explicitly mention the choice of parameters used as they are introduced in the text: beta parameterization and upper boundary used in Eq. 2, choice of alpha in Eq.7 , resulting choice of \sigma(t) and \gamma(t).
>> We have moved the details on the choice of the diffusion schedule from the appendix to the main body. As explained in the text, we treat sigma and gamma as first-class citizens and use them to derive beta. The section therefore now contains explicit forms for all of these terms, but they are introduced only after Eq. 5 (where gamma and beta are introduced).
3 Directly after Eq.7 to mention the use of Huber loss instead of MSE or add that directly to the loss function.
>> Added the following sentence after the equation: "In practice we find that using Huber loss instead of the Frobenius norm results in faster training and better generation quality."
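For readers unfamiliar with the substitution, a minimal NumPy sketch of the Huber loss is given below. The element-wise form and the `delta` threshold are illustrative assumptions, not the paper's exact hyperparameters; the point is that large residuals contribute linearly rather than quadratically, which damps gradients from outlier constituents.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails.

    Unlike a plain squared (Frobenius-norm) loss, residuals larger
    than `delta` grow linearly, reducing the influence of outliers.
    """
    abs_r = np.abs(residual)
    quadratic = 0.5 * residual**2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

# Small residuals match 0.5 * r^2; large residuals grow linearly.
r = np.array([0.1, 0.5, 2.0, 10.0])
losses = huber_loss(r)
```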
4 Add an explanation on how \alpha is chosen and the impact of this choice on the results presented. Good but not necessary: comparison of the results when \alpha=0 or at least a comment on the impact of this choice in the results presented.
>> Optimising alpha was part of our model grid search. We added a sentence explaining that values from 0.0001 to 0.01 were scanned and the best-performing value was selected.
Reviewer 2
 There are a few more references that used point clouds for jet physics, for example 2102.05073. Maybe the authors could check if they missed more related literature and expand section 2. The HEPML living review could be helpful for this.
>> We have cross-checked against the living review and added a few more citations here and in section 2.
 For the comparison to the MPGAN, did you train your own MPGAN? Was the performance of that as in the respective paper?
>> We used the publicly provided weights of the MPGAN network from the publication and generated the same number of events as with PC-JeDi. We use higher statistics for the calculation of metrics and uncertainties in order to obtain the fairest comparison; therefore the numbers do not necessarily align with those in the original MPGAN publication (where fewer bootstraps were run with a different sample size).
 How does equation 5 link to the selection made on page 25? Inserting the definition of β, I cannot reproduce σ and γ.
>> We thank the reviewer for bringing this inconsistency to our attention. Firstly, there were missing minus signs in the exponents of equation 5, which have been added. Secondly, the form we used for beta is an approximate solution that is exact only if sigma_max = 1. We chose sigma_max to be 0.999 and found in our tests that the approximation was sufficient for good results, as beta is only used in the framework for slight loss reweighting.
 In several places (for example in the timing section), the authors discuss the 'speed of inference'. I think 'speed of generation' would be a better description for this. (In other setups, like normalizing flows, 'inference' and 'generation' refer to different things with different timings and that might confuse the reader here, since it is really about the time needed for generation.)
>> We are happy to make this change, and have done so throughout the publication
 The architecture is said to be permutation equivariant in section 4.3. However, figs. 1 and 2 as well as the text on p. 8 say that the point cloud is passed through a dense layer before entering the TE blocks. Is that a small dense layer for all jet constituents (like in a deep set)? Or is that a large dense layer taking in all constituents, therefore breaking the permutation equivariance?
>> Yes, the MLPs have shared weights and are applied per constituent, as in a deep set. This is also standard for transformers.
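The deep-set-style weight sharing can be sketched in a few lines of NumPy. The feature and hidden dimensions below are hypothetical; the check at the end demonstrates the property at issue, namely that a dense layer applied per constituent commutes with any permutation of the point cloud.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "dense layer" shared across constituents: the same
# (n_features -> n_hidden) map is applied to every point,
# as in a deep set. Sizes here are illustrative only.
W = rng.normal(size=(3, 8))
b = rng.normal(size=8)

def shared_dense(cloud):
    # cloud: (n_constituents, n_features); the matmul acts row-wise,
    # so each constituent is transformed independently.
    return np.maximum(cloud @ W + b, 0.0)

cloud = rng.normal(size=(30, 3))   # 30 constituents, 3 features each
perm = rng.permutation(30)

out = shared_dense(cloud)
out_perm = shared_dense(cloud[perm])
# Permutation equivariance: permuting inputs just permutes outputs.
assert np.allclose(out_perm, out[perm])
```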
 What is e in the M-dimensional time-encoding vector ν_t?
>> Euler's number
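For illustration, one common transformer-style construction in which Euler's number sets the base of the frequency spacing is sketched below. This is an assumption about the general form, not necessarily the paper's exact encoding.

```python
import numpy as np

def time_encoding(t, M=8):
    """Hypothetical M-dimensional time encoding.

    Frequencies decay geometrically as e^{-i}, so e (Euler's number)
    appears as the base of the exponential spacing; the paper's exact
    formula may differ.
    """
    i = np.arange(M // 2)
    freqs = np.exp(-i)  # e^0, e^-1, e^-2, ...
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

v = time_encoding(0.5, M=8)
```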
 How Gaussian is the distribution in the penultimate layer of ParticleNet? Are the assumptions of FPND fulfilled?
>> We use the ParticleNet network and FPND calculation provided in the JetNet package and introduced in the MPGAN paper. As this is a common metric across several papers we have not reoptimised or trained it, and we refer to the MPGAN paper for more information and studies regarding its definition.
 How stable are the results in tables 1 and 2? How much do the values change when a new sample is generated from the same generator? How much do they change when the generator is retrained? Where do the errorbars for W1 come from? Please estimate the errors for the other metrics.
>> The uncertainties come from bootstrapping as provided in the JetNet package; we increase the number of samples used in order to gain higher precision but keep their prescription to facilitate comparisons with other works using this dataset. FPND, Coverage, and MMD are provided without uncertainties by the package. We now explain the origin of the uncertainties, which cover the generation of different samples (batch size of 10k with 40 batches), and have added the following to the text:
>> "Following the procedure defined by Ref. [31], uncertainties for the Wasserstein-based metrics are derived using bootstrap sampling; however, we increase the number of bootstrapped batches from 5 to 40 to reduce the run-to-run variance. Metrics such as FPND, Cov, and MMD do not use bootstrapping, and we do not quote an uncertainty."
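The bootstrapping procedure can be sketched as follows. This is a NumPy-only illustration of the idea (resampling batches and reporting mean and spread of the 1D Wasserstein-1 distance), not the JetNet implementation; the batch counts mirror the numbers quoted in the reply.

```python
import numpy as np

def w1(a, b):
    """Exact 1D Wasserstein-1 distance for equal-size samples:
    the mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def bootstrap_w1(real, gen, n_batches=40, batch_size=10_000, seed=0):
    """Bootstrap estimate of W1: resample batches with replacement
    and report the mean and standard deviation across batches."""
    rng = np.random.default_rng(seed)
    vals = [
        w1(rng.choice(real, batch_size), rng.choice(gen, batch_size))
        for _ in range(n_batches)
    ]
    return float(np.mean(vals)), float(np.std(vals))

rng = np.random.default_rng(1)
real = rng.normal(size=50_000)   # stand-in for a reference observable
gen = rng.normal(size=50_000)    # stand-in for generated samples
mean, std = bootstrap_w1(real, gen)
```

Increasing `n_batches` from 5 to 40 reduces the variance of the reported mean, which is the motivation given in the reply.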
 I don't understand the sentence "Although the time required for a single pass through the network is similar between MPGAN and PCJeDi for a single jet, the benefits of the transformer architecture become apparent as the number of jets in a batch increases." Isn't the MPGAN as a GAN also able to generate batches?
>> We have clarified this in the manuscript. For a batch size of 1, both the MP-GNN in MPGAN and the transformer in PC-JeDi have similar computation times. However, for a batch size of 1000 the transformer is twice as fast as the GNN architecture for a single network evaluation. This speed benefit is what we were referring to in terms of architecture, with the next sentence then highlighting the drawback of diffusion requiring several network evaluations.
 What is the timing for the standard MC generation for the jets (as a reference for table 3).
>> 46.2 ms according to the MPGAN reference; added to the table.
 For the RK solver in Appendix C.2, are the number of NN passes and the number of integration steps the same, or do they differ by a factor of 4 because lines 3-6 of algorithm 6 each call the NN once? Does that mean that the number of integration steps in fig. 18 do not match between methods?
>> Exactly, for higher-order methods additional network passes are required: RK4 has 4, EM has 2, and Euler has 1. This is why the number of network passes (rather than timesteps) is compared in fig. 18 in order to select the solver; here we choose to compare performance relative to computation time, not integration steps, as the relevant comparison.
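The pass counts can be verified with a toy integrator that counts network calls. The vector field and step implementations below are generic sketches (a second-order Heun step is shown as the two-pass example), not the paper's solvers, but they make the 1/2/4 bookkeeping concrete.

```python
import numpy as np

calls = {"n": 0}

def network(t, x):
    """Stand-in for the score network; counts evaluations."""
    calls["n"] += 1
    return -x  # toy vector field dx/dt = -x

def euler_step(t, x, h):   # 1 network pass per step
    return x + h * network(t, x)

def heun_step(t, x, h):    # 2 network passes per step
    k1 = network(t, x)
    k2 = network(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def rk4_step(t, x, h):     # 4 network passes per step
    k1 = network(t, x)
    k2 = network(t + h / 2, x + h / 2 * k1)
    k3 = network(t + h / 2, x + h / 2 * k2)
    k4 = network(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# For the same number of integration steps, the cost differs by the
# number of passes per step: the reason to compare network passes.
for step, passes in [(euler_step, 1), (heun_step, 2), (rk4_step, 4)]:
    calls["n"] = 0
    x = np.ones(3)
    for i in range(10):
        x = step(i * 0.1, x, 0.1)
    assert calls["n"] == 10 * passes
```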
 How many runs are the error bars of fig. 18 based on?
>> Here the error bars come from the metric evaluation provided by the JetNet package. For the Wasserstein distances, bootstrapping is used to evaluate the uncertainty, though this is not provided for FPND, MMD, or Coverage. For consistency with other publications we choose not to reimplement these measures and present them, as provided, without uncertainty.
 As an optional suggestion, since it came out after this manuscript: The authors mention it could be worth looking at a metric that is more sensitive to tails on p. 13, would a classifierbased weight study as suggested in 2305.16774 be an option?
>> We are happy to refer to this publication as a potential set of metrics, and have mentioned classifier-based weights in the manuscript.
Current status:
Reports on this Submission
Strengths
Unchanged compared to first submission.
Weaknesses
The uncertainty on evaluation metrics is missing.
Report
I'm very happy to see that almost all my points were addressed to my satisfaction. My only remaining concern is about the presentation of the results in tables 1, 2, 4, and 5. I understand now that the errors quoted for the Wasserstein distance come from the bootstrapping of a noisy estimator, whereas coverage etc. are deterministic. This, however, does not address my initial question about the stability and significance of the results. To be precise: Table 1 suggests that the coverage of PCJeDi is 0.01 better than the coverage of MPGAN. Is that a significant difference, or within the expected noise? Please generate additional datasets with PCJeDi, look at the metrics for these datasets, and report the mean/std of a few (3, 5, or 10) runs. Also, I couldn't find how many jets the evaluations in section 5 were based on.
Requested changes
See above:
Please generate additional datasets with PCJeDi, look at the metrics for these datasets, and report the mean/std of a few (3, 5, or 10) runs.