BROOD: Bilevel and Robust Optimization and Outlier Detection for Efficient Tuning of High-Energy Physics Event Generators

Wenjing Wang; Mohan Krishnamoorthy; Juliane Muller; Stephen Mrenna; Holger Schulz; Xiangyang Ju; Sven Leyffer; Zachary Marshall

SciPost Submission Page

This is not the latest submitted version.

This Submission thread is now published as

SciPost Phys. Core 5, 001 (2022)

Submission summary

Authors (as registered SciPost users):

Stephen Mrenna · Wenjing Wang

Submission information
Preprint Link:	scipost_202103_00005v2 (pdf)
Date submitted:	July 19, 2021, 6:46 a.m.
Submitted by:	Wang, Wenjing
Submitted to:	SciPost Physics

Ontological classification
Academic field:	Physics
Specialties:	High-Energy Physics - Experiment
Approaches:	Experimental, Computational

Abstract

The parameters in Monte Carlo (MC) event generators are tuned on experimental measurements by evaluating the goodness of fit between the data and the MC predictions. The relative importance of each measurement is adjusted manually in an often time-consuming, iterative process to meet different experimental needs. In this work, we introduce several optimization formulations and algorithms with new decision criteria for streamlining and automating this process. These algorithms are designed for two formulations: bilevel optimization and robust optimization. Both formulations are applied to the datasets used in the ATLAS A14 tune and to the dedicated hadronization datasets generated by the Sherpa generator, respectively. The corresponding tuned generator parameters are compared using three metrics. We compare the quality of our automatic tunes to the published ATLAS A14 tune. Moreover, we analyze the impact of a pre-processing step that excludes data that cannot be described by the physics models used in the MC event generators.

Current status:

Has been resubmitted

Reports on this Submission

Report #2 by Anonymous (Referee 5) on 2021-8-20 (Invited Report)

Cite as: Anonymous, Report on arXiv:scipost_202103_00005v2, delivered 2021-08-20, doi: 10.21468/SciPost.Report.3419

Strengths

Same as in previous review

Weaknesses

In this round of review, the authors made it clear that the improvement they obtain is practical (automatised workflow) but not conceptual. In this respect, my previous assessment of the interest of this paper should be reduced (still, the paper is interesting enough to meet publication requirements).

Report

I went through the new version of the manuscript and the provided answers to my comments. I am in general very satisfied with the effort made by the authors to improve the paper.

I still have a few follow up comments:

To comment iii) The authors insist that removing points is conservative. I don’t see it, sorry. It would be nice if they could show what they say. From my experience, removing outliers in a fit (particularly a chi-squared fit) has an effect on the fit (bias and underestimated uncertainty). I have the impression that what the authors say is true under the assumption that the model used to fit the data is correct. But they comment at length that this is not the case (maybe because of the underlying assumptions, e.g., fixed-order perturbative calculations, soft-physics modeling, etc.). So I am not very convinced of all this and I would like to see a clearer demonstration.

To comment e) I disagree. NP requires an alternative hypothesis. It is not clear what that is in your method. A quick Google search gives a paper that spells this out: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347431/
This is not a super relevant point. I would just avoid saying "hypothesis test" and would say "null-hypothesis test" or some such.

To comment l) I acknowledge the fact that speed is not the main aspect. But I think the question stands. I find this choice quite odd, also because even if time is not the main concern of the paper, it is the main concern of this section (see the title of 4.9). At least some explanation of the reasoning behind this choice should be given.
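
The concern raised under comment iii) can be illustrated with a toy example. The sketch below is not taken from the manuscript: the data values, the uncertainty `SIGMA`, the cut `PULL_CUT`, and the helper `fit_constant` are all hypothetical. It fits a constant to points where the last two genuinely deviate, i.e. the model is wrong for them, and then refits after discarding points with a large pull.

```python
# Toy illustration only: hypothetical data, not the paper's observables or method.

def fit_constant(ys, sigma):
    """Least-squares fit of a constant to points sharing one uncertainty sigma.

    Returns the fitted constant, the chi-squared, and the degrees of freedom.
    """
    c = sum(ys) / len(ys)  # for equal uncertainties the best fit is the mean
    chi2 = sum(((y - c) / sigma) ** 2 for y in ys)
    ndf = len(ys) - 1      # one fitted parameter
    return c, chi2, ndf

SIGMA = 0.2      # assumed common uncertainty of every point
PULL_CUT = 5.0   # assumed outlier criterion: drop points with |pull| > 5

# The last two points are "real" but not describable by a constant model.
ys = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 3.0, 3.5]

# Fit to all points.
c_full, chi2_full, ndf_full = fit_constant(ys, SIGMA)

# Remove points whose pull with respect to the first fit exceeds the cut, refit.
kept = [y for y in ys if abs(y - c_full) / SIGMA <= PULL_CUT]
c_trim, chi2_trim, ndf_trim = fit_constant(kept, SIGMA)

print(f"all points : c = {c_full:.2f}, chi2/ndf = {chi2_full / ndf_full:.2f}")
print(f"after cut  : c = {c_trim:.2f}, chi2/ndf = {chi2_trim / ndf_trim:.2f}")
```

In this toy setup the fitted constant moves from 1.49 to 1.05 and chi2/ndf drops from roughly 22 to roughly 0.64, so the trimmed fit looks good even though the model fails to describe part of the data. Whether the same happens with the authors' filtering procedure is exactly what the requested demonstration would need to show.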

validity: -
significance: -
originality: -
clarity: -
formatting: -
grammar: -

Author: Wenjing Wang on 2021-09-01 [id 1722]

(in reply to Report 2 on 2021-08-20)

Please find attached our response to your comments.

Attachment:

Response_to_Reviews_2.pdf

Report #1 by Anonymous (Referee 4) on 2021-7-24 (Invited Report)

Report

Thank you to the authors for their detailed responses to my comments and for integrating my suggestions into the manuscript. I am satisfied with the responses and I think the paper is now ready for publication.

validity: -
significance: -
originality: -
clarity: -
formatting: -
grammar: -

Author: Wenjing Wang on 2021-09-01 [id 1723]

(in reply to Report 1 on 2021-07-24)

Thank you for your positive review.

Attachment:

Response_to_Reviews_2_qKCzTu4.pdf
