SciPost Submission Page
OASIS: Optimal Analysis-Specific Importance Sampling for event generation
by Konstantin T. Matchev, Prasanth Shyamsundar
This is not the current version.
As Contributors: Konstantin Matchev · Prasanth Shyamsundar
arXiv Link: https://arxiv.org/abs/2006.16972v1 (pdf)
Date submitted: 2020-07-01 02:00
Submitted by: Shyamsundar, Prasanth
Submitted to: SciPost Physics
We propose a technique called Optimal Analysis-Specific Importance Sampling (OASIS) to reduce the number of simulated events required for a high-energy experimental analysis to reach a target sensitivity. We provide recipes to obtain the optimal sampling distributions which preferentially focus the event generation on the regions of phase space with high utility to the experimental analyses. OASIS leads to a conservation of resources at all stages of the Monte Carlo pipeline, including full-detector simulation, and is complementary to approaches which seek to speed-up the simulation pipeline.
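The core mechanism behind the abstract above is importance sampling: generating events from a biased distribution focused on the analysis-relevant region of phase space and reweighting to keep estimates unbiased. The following is a minimal illustrative sketch of that generic idea, not the paper's actual OASIS recipe; the densities `f`, `g` and the "utility" `h` are toy choices invented here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative choices, not from the paper): the true event
# density f(x) is a standard Gaussian, and the analysis only cares about
# h(x), which is nonzero in the tail region x > 2.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
h = lambda x: (x > 2).astype(float)  # "utility": tail-region indicator

N = 100_000

# Plain sampling from f: most events land where h(x) = 0 and are "wasted".
x_f = rng.normal(0.0, 1.0, N)
est_plain = h(x_f).mean()

# Importance sampling: draw from a biased g concentrated near the useful
# region (a Gaussian shifted to x = 2), then reweight each event by f/g
# so the estimate of the same quantity stays unbiased.
g = lambda x: np.exp(-(x - 2.0)**2 / 2) / np.sqrt(2 * np.pi)
x_g = rng.normal(2.0, 1.0, N)
w = f(x_g) / g(x_g)
est_is = (w * h(x_g)).mean()

exact = 0.02275  # P(X > 2) for a standard normal, for reference
print(f"plain MC  : {est_plain:.5f}")
print(f"importance: {est_is:.5f}  (exact ~ {exact})")
```

At equal sample size, the reweighted estimate concentrates the generated events where they matter for the analysis, which is the variance-reduction effect the paper optimizes over.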
Submission & Refereeing History
Reports on this Submission
Anonymous Report 2 on 2020-8-21 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2006.16972v1, delivered 2020-08-21, doi: 10.21468/SciPost.Report.1930
Strengths

1. The paper is timely, as analyzing the expected large data sets at the HL-LHC will require very large Monte Carlo (MC) samples. Optimizing the use of computational resources to generate such samples is important to maximize the physics potential of the HL-LHC.
2. The paper proposes an elegant idea to achieve this task, and discusses it in a clear, transparent, and mathematically rigorous way.
3. The paper describes in detail how this idea can be implemented in realistic experimental conditions, and as an example points to a concrete CMS analysis that can clearly benefit from it.
4. The idea is broadly applicable, not just in collider physics but in any area where comparison of experimental data and MC is employed.
Weaknesses

1. A few questions regarding trade-offs inherent in the proposed method, and the range of analyses that can benefit from it, have not yet been addressed. (See report for details.)
Report

1. In the example considered in Sec. 2, it is shown that the proposed method improves the measurement uncertainty of the parameter $\theta$ achievable for fixed $N_s$. The uncertainty is effectively computed with the prior that the data is correctly described by the function $f$. In general, theory/data comparison needs to address two questions - how well the data fits the underlying theoretical model, and what are the theory parameter(s) that fit the data best. It seems to me that the proposed technique may optimize the power to address the second question at the expense of reducing the power to address the first, and that the optimization would look different (and probably closer to the usual goal of $g(x)=f(x)$) if both questions are optimized together.
2. In applications to searches for BSM physics, the "optimal" region of phase space usually strongly depends on the parameters of the underlying BSM model. (For example, in a bump hunt, one would want to focus on the invariant masses around the mass of the hypothetical new particle.) The proposed method would require different optimization for each assumed set of BSM parameters, and since they are typically not known a priori, it is not clear that an overall reduction in MC sample sizes could be achieved. (An exception may be a model with very specific predictions, or if a signal already seen in one channel is being searched for in other channels.)
3. On p. 29, the authors state that "the event variable $v$ for a given value of parton-level event $x$ typically fall[s] within a small window of possibilities". However in some cases we may need to model tails of distributions which may contain significant contributions from rare events with "highly atypical" $x \mapsto v$ maps, e.g. large mis-measurements of MET etc. Would the proposed method still apply and be useful?
Requested changes

The authors should consider the questions listed above and add discussion of these points to the paper as they see fit.
Anonymous Report 1 on 2020-8-13 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2006.16972v1, delivered 2020-08-13, doi: 10.21468/SciPost.Report.1914
Strengths

1. covers both how the method can be applied in parton-level and detector-level analyses
2. the paper is well written and pedagogical
3. the paper is particularly relevant for the HL-LHC, where such an approach can be key for event generation
Weaknesses

1. weak conclusion
2. the impact of systematic uncertainties is missing in the parton-level analysis
Report

The paper focuses on how one can optimize the event generation for a given analysis in order to reduce as much as possible the impact of statistical uncertainty on that analysis. The main idea behind such a method is to preferentially generate events in the regions of phase space with the highest utility to the analysis at hand.
The paper is well written and very pedagogical in the way it presents the various concepts used.
The paper also presents two different types of analyses: the first at parton level, which is particularly interesting for theorists and phenomenologists, and the second at reconstructed level, which is particularly interesting for experimentalists. I would have been keen to recommend publication for each of them independently.
The main drawback of the paper is that the search for an optimal event generation for one given analysis might not lead to an optimal event set for another analysis. While this might not be a real concern for theorists/phenomenologists, it can be a huge issue for the experimental community, where event samples are primarily produced centrally and the same samples are used by many groups and an even higher number of analyses. That being said, there is nothing the authors can do about it; it merely reduces the significance of their work in my view (and the issue is already briefly mentioned in the paper).
A final minor criticism concerns the conclusion of the paper. My personal feeling is that the conclusion reads a bit too much like an introduction, with its comparison to more basic methods (for example biasing, slicing, ...). Those comparisons are valid but should be part of the introduction rather than the conclusion. The conclusion would also benefit from recapping in a bit more detail how the optimality is defined/reached, as well as some of the practical implications (such as the fact that even for an infinite budget the optimal distribution is not the unweighted one).
As a final suggestion to the authors, I believe that the paper can be significantly improved by adding, at the partonic level, the effect of theoretical uncertainty. The authors include such effects in the detector-level analysis, while they are also important in the partonic case; seeing how Figure 6 is impacted would be a nice plot to look at (even if I can guess in advance what the plot will look like). But I leave it to the authors to decide whether to include this or not.
Finally, I want to point out to the authors a minor typo in the paper: on page 23, last line, the word "the" is repeated.
My final recommendation for this paper is a minor revision, so that the authors can at minimum improve the conclusion and consider the other points that I have raised. After that small change, my recommendation will change to "publish (top 10%)".
Requested changes

1. rewrite the conclusion
2. consider adding a description of the impact of theoretical uncertainties at parton level (optional)