## SciPost Submission Page

### Submission summary

- As Contributors: Nicolo De Groot
- Arxiv Link: https://arxiv.org/abs/1910.05334v2
- Date submitted: 2019-10-22
- Submitted by: De Groot, Nicolo
- Submitted to: SciPost Physics
- Discipline: Physics
- Subject area: High-Energy Physics - Experiment
- Approach: Experimental

### Abstract

Identification of charmonium states at hadron colliders has mostly been limited to leptonic decays of the J/{\Psi}. In this paper we present and algorithm to identify hadronic decays of charmonium states (J/{\Psi}, {\Psi}(2S), {\chi_{c0,1,2}}) which make up the large majority of all decays.

###### Current status:
Editor-in-charge assigned

### Submission & Refereeing History

Submission 1910.05334v2 on 22 October 2019

## Reports on this Submission

### Strengths

1- The paper demonstrates that hadronic decays of the J/Psi and other charmonia can be distinguished from quark and gluon jets under a number of assumptions, and it quantifies the performance of an algorithm that does this. This can serve as useful input for phenomenological studies of potential analyses that consider these decays, or as a template for implementing such an algorithm in an experiment.

2- While not without flaws (see below), the writing is succinct and to the point, and hence quite clear.

3- The algorithm itself seems reasonably solid.

### Weaknesses

1- Motivation too vague. While the general motivation is clearly intriguing, it is not clear what performance the identification algorithm must reach to be useful in an actual analysis of the suggested H-> J/Psi gamma search. The photon + jet production cross section is much larger than the Higgs production cross section, and the branching fraction is only about 3 x 10^-6. Without at least a rough study, it therefore remains unclear whether the results presented here will be of any practical use.

2- Doubts about the performance: While the performance seems reasonably solid, some doubts remain about how it holds up under realistic conditions (in particular concurrent pp interactions) and about the assumptions made, e.g. on the pT spectra.

3- Complexity of hadronic J/Psi decays not discussed and potentially not addressed: It is mentioned that J/Psi decays are similar to tau decays, but this is only partly true. A deeper discussion is missing.

4- Presentation: There are a number of obvious typos and grammatical mistakes.

### Report

The presented study aims to distinguish hadronic charmonium decays from quark and gluon jets. It discusses an algorithm that feeds a number of higher-level input variables into a Deep Neural Network, similar to how tau lepton decays are identified in the ATLAS and CMS experiments. A recent trend has been to include more low-level input variables to increase the performance of such taggers. Even so, the presented approach seems adequate for the purpose at hand, for two reasons: the gain from lower-level inputs is typically significant but far from an order of magnitude, and similar approaches are in use by ATLAS and CMS and hence well understood. General comments on the content are listed below, and comments on the language are included in the "Requested changes" section. While the draft often reads smoothly, the number of grammatical mistakes and typographical errors is large enough to be distracting.

Motivation:
Increasing the acceptance for H-> J/Psi gamma makes sense on paper as a motivation. However, it is not clear a priori what reconstruction and identification efficiencies are needed to make the channels with hadronic J/Psi decays competitive with the clean dimuon and dielectron final states. It is doubtful that a jet rejection factor of 100 is enough to reduce the background to acceptable levels. A more thorough investigation of the H->J/Psi gamma search, or at least a back-of-the-envelope calculation, would strengthen the motivation for this paper. Alternatively, the motivation could be extended to other processes, with a similar demonstration of potential usefulness as requested for the H->J/Psi gamma decay.
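A minimal back-of-the-envelope sketch of the kind requested above. It assumes a simple counting experiment with Z = S/sqrt(B), which is valid only for B >> S and ignores systematic uncertainties. The function name and all yields are hypothetical placeholders and are not taken from the paper. The sketch only shows how the tagger's signal efficiency and jet rejection enter: Z scales as eff_sig * sqrt(rejection).

```python
import math


def counting_significance(n_sig: float, n_bkg: float, eff_sig: float, rejection: float) -> float:
    """Approximate Z = S / sqrt(B) for a simple counting experiment (B >> S assumed).

    n_sig, n_bkg : expected signal and jet-background yields *before* the tagger
    eff_sig      : tagger efficiency for hadronic charmonium decays
    rejection    : jet rejection factor, i.e. 1 / (background efficiency)
    """
    s = eff_sig * n_sig      # signal surviving the tagger
    b = n_bkg / rejection    # background surviving the tagger
    return s / math.sqrt(b)


# Placeholder yields, NOT taken from the paper: only the ratio below is meaningful.
z_100 = counting_significance(n_sig=10.0, n_bkg=1.0e6, eff_sig=0.5, rejection=100.0)
z_1000 = counting_significance(n_sig=10.0, n_bkg=1.0e6, eff_sig=0.5, rejection=1000.0)
print(f"gain from 10x better rejection: {z_1000 / z_100:.2f}")  # sqrt(10), about 3.16
```

Inserting realistic Higgs and photon + jet yields for a given luminosity would turn this into the rough estimate the referee asks for.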

Pileup:
With the search for H->J/Psi gamma in mind, which I assume is to be performed at the High-Luminosity LHC, the impact of pileup (concurrent pp interactions) is very relevant for the identification of hadronic charmonium decays, as it is for the reconstruction and identification of tau decays. Can an estimate of the possible impact of pileup be given?

The J/Psi is not like the tau. Its decays most often do not proceed via hadronic resonances, whereas in the vast majority of cases the tau decays either to a single charged particle plus neutrino(s) or via hadronic resonances (see the PDG 2018 particle listing on J/Psi http://pdg.lbl.gov/2018/listings/rpp2018-list-J-psi-1S.pdf, or https://www.sciencedirect.com/science/article/pii/0370157389900744):
- Less than 2% of decays involve hadronic resonances
- Around 13% into stable hadrons
- To be contrasted to around 88% total decays to hadrons
- On the other hand, the J/Psi typically has no invisible decay products
- This should be elaborated on in more detail; one could even make use of the information from the preferred decay modes, as done for tau decays.

pT and eta spectra:
- It should be demonstrated explicitly that the pT and eta spectra of the signal charmonium decays and of the quark and gluon jets (separately) are identical or sufficiently similar. Ideally, they should be made identical. (Section 2 says "similar", but this statement is too vague: how similar?)
- In addition to making the eta distributions identical, an upper cutoff on eta should also be introduced.
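One common way to make the spectra identical, as requested above, is to reweight the background to the signal spectrum bin by bin. The sketch below uses Python with NumPy and assumes the per-jet pT values are available as arrays. The function name `spectrum_weights` and the exponential toy spectra are hypothetical placeholders and are not taken from the paper. The same procedure applies to eta, or to a two-dimensional (pT, eta) histogram built with `np.histogram2d`.

```python
import numpy as np


def spectrum_weights(sig_pt, bkg_pt, bins):
    """Per-event weights that reshape the background pT spectrum to the signal one.

    Both histograms are normalised to unit area, so only the shapes are matched.
    Events outside `bins` and bins without background entries get weight 0.
    """
    sig_pt = np.asarray(sig_pt, dtype=float)
    bkg_pt = np.asarray(bkg_pt, dtype=float)
    bins = np.asarray(bins, dtype=float)
    sig_hist, _ = np.histogram(sig_pt, bins=bins, density=True)
    bkg_hist, _ = np.histogram(bkg_pt, bins=bins, density=True)
    # Shape ratio per bin; empty background bins stay at 0 (no division by zero).
    ratio = np.divide(sig_hist, bkg_hist, out=np.zeros_like(sig_hist), where=bkg_hist > 0)
    # Bin index of every background event (np.histogram includes the upper edge).
    idx = np.clip(np.digitize(bkg_pt, bins) - 1, 0, len(ratio) - 1)
    inside = (bkg_pt >= bins[0]) & (bkg_pt <= bins[-1])
    return np.where(inside, ratio[idx], 0.0)


# Toy spectra for illustration only (not the paper's samples).
rng = np.random.default_rng(seed=1)
sig_pt = 20.0 + rng.exponential(50.0, size=10_000)
bkg_pt = 20.0 + rng.exponential(30.0, size=10_000)
bins = np.linspace(20.0, 150.0, 14)
w = spectrum_weights(sig_pt, bkg_pt, bins)
```

The weights `w` would then be passed as per-event sample weights to the training and to the performance evaluation.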

Size of simulated event samples:
- The tau taggers in ATLAS https://cds.cern.ch/record/2688062/files/ATL-PHYS-PUB-2019-033.pdf and CMS https://arxiv.org/abs/1809.02816 are trained with millions of simulated events, so the 17k events used for training here seem to be at the low end. An investigation of the effect of increasing the sample size would help to understand whether a potentially large gain in performance is still possible with the method at hand. The other parameters of the deep neural network and the optimisation procedure seem adequate.

### Requested changes

1- Motivate what performance the tagger needs to reach for its inclusion in the H-> J/Psi gamma search to be useful, or alternatively find additional motivation.

2- Discuss and investigate the impact of concurrent pp interactions.

3- Elaborate on and possibly investigate in more detail the various hadronic J/Psi decays.

4- Make pT/eta spectra of signal and background identical or show that these are indeed sufficiently similar.

5- Address doubts on the size of the event samples being sufficient.

6- Improve Figure 1 and address the questions below in the text:
- Figure 1 caption, 2nd sentence: The J/Psi signal distributions are shown in blue, whereas the background from quark and gluon jets is shown in orange.
- Figure 1: It seems like the distributions are normalised to the same area, but it would be better to clarify in the caption.
- Figure 1: There seem to be two populations of J/Psi candidates in DeltaEta and DeltaPhi. It would be interesting to understand how they arise.
- Figure 1: Are there any structures in the jet mass that are exploited by the network that are invisible because of the coarse binning?
- Figure 1: What causes the spikes in the n_ch distribution (in particular around 8 for quark and gluon jets)?

7- Improve the language: run a spellchecker and fix grammatical mistakes, in particular those listed below:
General
- Run a spellchecker
- Various full stops omitted after citations (e.g. end of section 2)

Abstract:
- we present and algorithm -> we present an algorithm

Introduction
- uur ->our
- overal -> overall

Observables
- in the distinguishing -> in distinguishing
- a background samples -> a background sample

Machine Learning
- with and equal amount -> with an equal amount

Results
- curve.s -> curves
- as against -> than against

8- Citations: Cite Keras in the way its authors request: https://keras.io/getting-started/faq/#how-should-i-cite-keras

• validity: ok
• significance: ok
• originality: good
• clarity: good
• formatting: reasonable
• grammar: acceptable

### Strengths

The idea of tagging hadronic charmonium decays is experimentally quite interesting and a unique attempt in this area. If it works as the author claims, some physics analyses could gain significantly (e.g. $H\to c\bar{c}$).

### Weaknesses

I expect the performance of your tagger to depend strongly on $p_\text{T}$, but this aspect is not discussed at all. Your study is based on the Z resonance, so the performance is rather good. However, there are many cases where one wants to tag a $J/\psi$ from B or D meson decays, and there the performance degrades considerably. (Admittedly, you use the output of a b-jet identification algorithm and thus will not be sensitive to inclusively produced B or D mesons, which have finite lifetimes, but this requirement could certainly be removed.)

### Report

First of all, please run a spellchecker: there are a great many typos and incomplete sentences. The referee is _not_ supposed to correct your typos, but to review the physics content, and spellchecking is the minimum courtesy expected before submitting a draft. For typos, please see "Requested changes". Here, I mainly focus on physics questions and comments.

Section 2:
- please clarify what 3S1 and 3PJ stand for
- Why was gg2ccbar(3PJ)gm not considered, as was done for $J/\psi$ and $\psi(2S)$?
- Why were qg2ccbar(3S1)q and qq2ccbar(3S1)g not considered, as was done for $\chi_c$?
- L7: "same sample" -> which sample? At this point, the background samples have not yet been introduced (they are first mentioned in L8). Please rephrase. Also, how do you obtain "gluon" jets from the $Z^0\to q\bar{q}$ sample?

Section 3:
- L1: "tau-like" is not really accurate, since the term also covers leptonically decaying taus. Please use more precise phrasing.
- L8, concerning the b-tag output: according to Table 1, you seem to use binary information, but which working point is used, and what are the corresponding signal efficiency and fake rate?
- 2nd paragraph: Please explain how $\kappa$ and $\beta$ are determined.
- Are these variables collinear safe?

Section 4:
- 2nd paragraph: what do you mean by "gluon events"?

Section 5:
- First paragraph: the notation "100x" should be avoided.
- L2: define "Auc" (area under the curve), since you use it later
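As an illustration of the definitions requested above, here is a minimal NumPy sketch. It computes the ROC curve (signal efficiency vs. background efficiency) for cuts of the form "score >= t", the AUC via the trapezoidal rule, and the background rejection (1 / background efficiency) at a chosen signal efficiency. The function names and toy scores are hypothetical. The axis convention is an assumption: the paper may instead plot rejection against signal efficiency. For labelled data, `sklearn.metrics.roc_auc_score` gives the same AUC.

```python
import numpy as np


def roc_points(sig_scores, bkg_scores):
    """Background and signal efficiencies for every cut 'score >= t', plus the (0, 0) point."""
    sig = np.asarray(sig_scores, dtype=float)
    bkg = np.asarray(bkg_scores, dtype=float)
    thresholds = np.unique(np.concatenate([sig, bkg]))[::-1]  # tightest cut first
    eff_sig = np.array([(sig >= t).mean() for t in thresholds])
    eff_bkg = np.array([(bkg >= t).mean() for t in thresholds])
    return np.concatenate([[0.0], eff_bkg]), np.concatenate([[0.0], eff_sig])


def auc(eff_bkg, eff_sig):
    """Area under the ROC curve (signal eff. vs background eff.), trapezoidal rule."""
    return float(np.sum(np.diff(eff_bkg) * (eff_sig[1:] + eff_sig[:-1]) / 2.0))


def rejection_at(eff_bkg, eff_sig, target_sig_eff):
    """Rejection 1/eff_bkg at the first working point with eff_sig >= target (0 < target <= 1)."""
    i = int(np.argmax(eff_sig >= target_sig_eff))
    return 1.0 / eff_bkg[i] if eff_bkg[i] > 0 else float("inf")


fpr, tpr = roc_points([0.9, 0.4], [0.6, 0.1])  # toy scores
print(auc(fpr, tpr))  # 0.75
```

An AUC of 0.5 corresponds to no separation and 1.0 to perfect separation. Quoting the rejection at a stated signal efficiency would avoid the ambiguous "100x" notation.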

### Requested changes

Below is what I spotted, but there may be more. Please double-check all sentences again.

Abstract:
- L2: Here, $J/\psi$ is set in non-italic font, unlike in the text. I am not sure about the convention for abstracts, but I would prefer the same font throughout the paper. Please check.
- L2: and algorithm -> an algorithm

Section 1:
- page 1, 3rd line from the bottom: uur -> our
- page 1, last line: overal -> overall

Section 2:
- L4: extra space between "qg2ccbar(3PJ)" and "q"
- L4: extra space between "qq2ccbar(3PJ)" and "g"
- L5: a process does not have a "minimum transverse momentum"; please rephrase more clearly
- L5: a process does not have an "invariant mass"; please rephrase more clearly
- L2 from the bottom: $\Delta R$ is not defined.
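For reference, the conventional hadron-collider definition, which the authors presumably intend (this should be confirmed in the paper), is given below. Some analyses use rapidity $y$ instead of pseudorapidity $\eta$, and $\Delta\phi$ is wrapped into $[-\pi, \pi]$:

$$\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$$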

Section 3:
- L2: missing full stop after [5]
- L4: you are missing ")" after $R_{track}$
- L6: you need to define "pt"
- L12: ... demonstrated to be efficient in "the" distinguishing -> remove "the"
- L15: use italic for "i"
- L16: I would not use short-hand "w.r.t"
- Figure 1 caption: a parenthesis is not closed. Also, add units. Please also add basic information, such as that the distributions are normalized to unity and that you use 50% quark jets and 50% gluon jets ... The caption should be more self-explanatory.
- Table 1, $R_{em}$: $\Delta_{R}$ -> $\Delta R$. Same for $R_{track}$

Section 4:
- L6: as a measure "of" the separation power ...
- 2nd paragraph: with and equal -> with "an" equal ...
- Table 2: The caption requires more explanation. For an outside reader (like me), none of the items is understandable. You could also consider moving the table to an appendix.

Section 5:
- L2: curve.s -> curves.
- L5: extra space after "background rejection"
- 2nd paragraph: "is" given in Table3
- Table 3, caption: Overview "of the" training results. Please also explain in the caption what "mixed" means.
- These variations cover "the a" range of ... -> please make it grammatically correct
- Table 4: Please add more explanation to the caption, e.g. what UE, ISR and FSR are (they are not defined in the text), and what Var 3a, 3b, 3c etc. are.

Acknowledgement:
I am not sure whether abbreviations such as "NdG" and "SC" can be used in the paper: maybe the authors would like to ... ?

• validity: good
• significance: good
• originality: good
• clarity: good
• formatting: acceptable
• grammar: acceptable