Exploring Jet Substructure in Semi-visible jets

Semi-visible jets arise in strongly interacting dark sectors, where parton evolution includes dark sector emissions, resulting in jets overlapping with missing transverse momentum. Semi-visible jets are implemented using the Pythia Hidden Valley module, which duplicates the QCD sector showering. In this work, several jet substructure observables are examined to compare semi-visible jets with light quark/gluon jets. These comparisons were performed using different dark hadron fractions in the semi-visible jets (signal). The extreme scenarios, where the signal consists entirely of either dark hadrons or visible hadrons, offer a chance to understand the effect of the specific dark shower model employed in these comparisons. We attempt to decouple the behaviour of jet substructure observables due to inherent semi-visible jet properties from the model dependence that arises because only one dark shower model, the one mentioned above, exists.


Introduction
Searches for dark matter (DM) particles at colliders have remained unsuccessful so far [1]. Consequently, in recent years some focus has shifted to unusual final states that are not covered by typical searches at the Large Hadron Collider (LHC). One such final state is termed semi-visible jets (svj), where parton evolution includes dark sector emissions, resulting in jets interspersed with DM particles [2,3]. While searches for semi-visible jets are underway in the LHC experiments, the focus of this paper is to probe the viability of such searches in boosted topologies. Rather than making any specific assumptions about the cross-section of the processes potentially resulting in semi-visible jets, the idea here is to examine whether the jet substructure of semi-visible jets can be used to discriminate them from ordinary jets produced by light quarks or gluons. It is therefore important to note that we are not focusing on the event selection criteria needed to obtain a reasonable signal-to-background ratio for large-radius semi-visible jets. We focus on the more challenging scenario of the t-channel production mode of semi-visible jets, where the absence of a resonance mass peak makes identifying the substructure difference more critical.
a e-mail: deepak.kar@cern.ch
b e-mail: sukanya.sinha@cern.ch
We start by briefly summarising the idea of semi-visible jets in Sec. 2. Comparisons between semi-visible jets and ordinary jets, based on their substructure, are then presented in Sec. 3. The robustness of these differences and the underlying reasons are investigated in Sec. 4, before concluding in Sec. 5. For these studies, the Rivet [4] analysis toolkit was used, with the FastJet package [5] for jet clustering.

Semi-visible jets
Semi-visible jets [6] are hypothetical reconstructed collider objects where the visible states in the shower are Standard Model (SM) hadrons. These scenarios assume that the strongly coupled hidden sector contains some families of dark quarks, which bind into dark hadrons at energies below a dark confinement scale Λ_d. In scenarios where the mediator mass is less than the collision energy, simplified descriptions of collider phenomenology can be constructed, setting aside the additional details of the theory that appear only at energies above collider reach and are irrelevant at the LHC. In t-channel setups (Fig. 1), the mediator interacts with DM and one of the SM quarks. Simplified models usually consider a fermionic DM particle that interacts with SM particles via a scalar mediator coupling only to right-handed quarks. A generic t-channel DM simplified model extends the SM by two additional fields: a DM candidate (χ) and a mediator (φ) that is in the fundamental representation of both SU(3)_c and the dark gauge group considered in the theory. The dark quarks are the DM degrees of freedom at LHC energies, but at the scales probed by direct detection (DD) experiments the DM degrees of freedom are dark mesons. DD rates are therefore highly suppressed for such models, since they fall below the neutrino background [3]. When the BSM effects are parameterised in an EFT expansion, an effective Lagrangian captures all possible interactions through terms L_i that are constructed from Standard Model operators obeying the SU(3)_C × SU(2)_L × U(1) gauge symmetries; the higher-dimensional terms, which represent effective (i.e. non-fundamental) couplings, are suppressed by powers of Λ.
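For orientation, an EFT expansion of this kind is conventionally written as below. This is the textbook form, given here as an assumption about the intended expression rather than a quotation from this work; each L_i is taken to collect operators of mass dimension 4 + i.

```latex
\mathcal{L}_{\mathrm{eff}} \;=\; \mathcal{L}_{\mathrm{SM}} \;+\; \sum_{i \geq 1} \frac{1}{\Lambda^{i}} \, \mathcal{L}_{i}
```

In this form the SM is recovered in the limit Λ → ∞, and each additional term is suppressed by one more power of 1/Λ.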
As the dark matter particles can appear in the final state of this model, a Dark Matter Effective Field Theory (DMEFT) approach is used, in which the DM is the only additional degree of freedom beyond the SM accessible to current experiments. The interactions of the DM particle with SM particles are therefore described by effective operators of dimension six or higher. Here, the dark sector is an SU(2)_D gauge theory with coupling α_d = g_d^2/4π, containing two fermionic states χ_a = χ_1,2. The operator coefficients c_ijab are O(1) couplings that encode the possible flavour structures and, assuming minimal flavour violation, heavy-flavour production channels dominate. The fermions act as dark quarks which interact strongly with coupling strength α_d, similar to QCD. If the mediator is assumed to be unstable, it will decay to a particle which is charged under a new strong group but is a singlet under all the SM groups. At the confinement scale Λ_d, where α_d becomes non-perturbative, these dark quarks form bound states [3,6].
The spectrum of these dark hadronic states depends on non-perturbative physics, and most of the details of the spectrum are inconsequential for collider observables. If dark mesons exist, their evolution and hadronisation are currently poorly constrained. They could decay promptly and result in a jet structure very similar to SM QCD, even though the original decaying particles belong to the dark sector; they could behave as semi-visible jets; or they could behave as completely detector-stable hadrons, in which case the final state is just missing transverse momentum. Apart from the last case, which resembles a conventional BSM MET signature, the modelling of these scenarios is an unexplored area.
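As an illustration of what such an operator can look like, a dimension-six contact interaction between SM quarks and dark quarks is often written in the semi-visible jet literature in the form below. It is shown as an assumption, not as the expression used in this work; the indices i, j are taken to label SM quark flavours and a, b the dark flavours.

```latex
\mathcal{L}_{\mathrm{contact}} \;\supset\; \frac{c_{ijab}}{\Lambda^{2}}
  \left( \bar{q}_{i} \gamma^{\mu} q_{j} \right)
  \left( \bar{\chi}_{a} \gamma_{\mu} \chi_{b} \right),
\qquad
\alpha_{d} = \frac{g_{d}^{2}}{4\pi}
```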
In this paper, we consider the case where the final state consists of a jet interspersed with missing transverse momentum (usually referred to as MET), due to a mixture of stable, invisible dark hadrons (with proper decay length cτ > 10 mm) and visible hadrons from the unstable subset of dark hadrons that promptly decay back to SM particles. The model discussed in [3] uses a simplified parameterisation, since a direct mapping of the Lagrangian parameters to physical observables is not possible: some of the dark sector observables depend on non-perturbative physics. The three parameters of this model are:
- Mass of the scalar bi-fundamental mediator, φ
- Dark hadron mass, m_D
- Ratio of the rate of stable dark hadrons over the total rate of hadrons, r_inv
The third parameter, in its intermediate regime, leads to the appearance of semi-visible jets (Fig. 2).
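As a purely illustrative toy, and not the Pythia implementation, r_inv can be read as the probability that any single dark hadron is stable and invisible. The sketch below uses hypothetical function names and an arbitrary hadron multiplicity to show that the mean invisible fraction then tracks r_inv.

```python
import random


def sample_invisible_flags(n_dark_hadrons, r_inv, rng):
    """Return one boolean per dark hadron: True means stable and invisible."""
    if not 0.0 <= r_inv <= 1.0:
        raise ValueError("r_inv must lie in [0, 1]")
    # Each hadron is independently invisible with probability r_inv (toy assumption).
    return [rng.random() < r_inv for _ in range(n_dark_hadrons)]


def invisible_fraction(flags):
    """Fraction of dark hadrons that escape detection."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    rng = random.Random(1234)
    for r_inv in (0.0, 0.25, 0.5, 0.75, 1.0):
        flags = sample_invisible_flags(10000, r_inv, rng)
        print(f"r_inv = {r_inv:.2f}: invisible fraction = {invisible_fraction(flags):.3f}")
```

The limiting values r_inv = 0 and r_inv = 1 give a fully visible and a fully invisible set of hadrons respectively, while intermediate values give the mixture that characterises a semi-visible jet.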
The modelling of such unique final-state signatures is done using the Hidden Valley (HV) [8] module of Pythia8 [9], which was designed to study a sector that is decoupled from the Standard Model (SM). The basic motivation was the simple idea that one could start with a large number of gauge groups in the high-energy limit and break them down to fewer groups as the energy decreases, while maintaining the observed cosmological bound. The module aims to provide a reasonably generic framework for studying BSM models, and so the normal time-like QCD and QED showering has been extended by the addition of the HV sector. Since the HV is a light hidden sector, the associated particles may have masses as low as 10 GeV. The spectrum of the valley particles and their dynamics depend on the valley gauge group G_v, on their spin and on the number of particles contained in the theory, along with their group representations. There are 12 particles which are charged under both the SM and HV symmetry groups; each couples flavour-diagonally to the corresponding SM state but is also in the fundamental representation of the HV colour symmetry. The HV particles with no SM couplings are invisible, and their presence can only be detected through the missing transverse momentum in an event. For SU(N) symmetries, the gauge group remains unbroken, leading to massless gauge bosons g_v and to confinement of partons. In this scenario, the HV quarks q_v and antiquarks q̄_v combine into states that can either decay back to the SM or remain stable, depending on the mixing of the states. Off-diagonal, flavour-charged states can exist as stable and invisible states, whereas diagonal ones can decay back to the SM and contribute to the formation of visible hadrons, leading to semi-visible jets.
Studies discussed in [7,10,11] have shown that the decay of dark hadrons also depends on the mediator to the visible sector. Two different dark quark flavours combine to form the dark pions π+, π−, π0 and the dark ρ mesons ρ+, ρ−, ρ0, where the dark ρ mesons are assumed to be produced three times as often as the dark pions. The dark ρ mesons tend to decay promptly via the channel ρ_d → π_d π_d, except for the ρ0 meson, which decays into SM particles through the portal interactions of the mediator coupling the SM sector to the dark sector. Hence, for the jet substructure studies, the ρ+ and ρ− mesons can be treated as intermediate dark states, which subsequently decay to the π+, π−, π0 mesons that constitute the final dark states, while the ρ0 meson contributes to the visible fraction of the semi-visible jet.

Analysis strategy
The signal samples, at √s = 13 TeV, are generated using a t-channel simplified dark-matter model in the MadGraph5 [12] matrix element (ME) generator, with xqcut = 100 (footnote 1) and the NNPDF2.3 LO PDF set [13], a mediator mass of 1500 GeV and a dark-matter candidate mass of 10 GeV. Different r_inv fractions result in somewhat different kinematics, so r_inv values of 0.25, 0.50 and 0.75 are studied, as well as the limiting values of 0 (no dark component) and 1 (fully dark jet). Up to two extra jets were simulated and MLM matched [14] in order to have a reasonable cross-section and obtain a signal that is not swamped by the QCD background. The background sample, consisting of generic HardQCD processes, was generated in Pythia8 as well.
In this study we use large-radius jets, more specifically anti-k_t [15] jets with R = 1.0, trimmed (with R_sub = 0.2 and f_cut = 0.05) [16], in order to stay close to a potential experimental analysis. The large-radius jets are required to have p_T > 250 GeV. As stated in Sec. 2, the identifying signature of semi-visible jets is the alignment of the event missing transverse momentum with the direction of such a jet. We therefore require the presence of at least one large-radius jet within ∆R < 1.0 of the missing transverse momentum direction, and that jet is tagged as an svj. Additionally, we require at least 200 GeV of missing transverse momentum, since an actual search using a missing transverse energy trigger would require that threshold.
It is, however, interesting to note that in a majority of events the subleading jet in transverse momentum is tagged as the svj, as can be seen from the distribution of ∆φ between the leading and subleading jets and the missing transverse momentum direction in Fig. 3.
In Fig. 4, we show that events with an svj have high missing transverse momentum compared to background events, as expected, and we also compare the p_T distribution of the svj with that of the background jets. We pick the leading large-radius jet, without any requirement on missing transverse momentum, as the background jet. Here we note that even though the svj is more often than not the subleading jet, we are mostly interested in differentiating svj from standard quark/gluon-initiated jets, so we can use the leading jet from the background without any loss of generality. It was observed that using only quark-initiated or only gluon-initiated background jets made no difference.
1 Defines the minimum distance in phase space that is allowed between the extra partons.
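The tagging requirements above can be sketched as follows. This is a minimal illustration and not the Rivet analysis code: the function names and the dictionary layout are hypothetical, the missing momentum is assumed to carry a pseudorapidity (for example from the vector sum of invisible particles at truth level), and picking the closest jet when several are aligned is an assumption.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2, wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def tag_svj(jets, met, min_jet_pt=250.0, min_met=200.0, max_dr=1.0):
    """Return the selected jet closest to the missing-momentum direction, or None.

    jets: list of dicts with keys "pt" (GeV), "eta", "phi".
    met:  dict with the same keys for the missing momentum (eta is an assumption).
    """
    if met["pt"] < min_met:
        return None
    aligned = []
    for jet in jets:
        if jet["pt"] <= min_jet_pt:
            continue
        dr = delta_r(jet["eta"], jet["phi"], met["eta"], met["phi"])
        if dr < max_dr:
            aligned.append((dr, jet))
    if not aligned:
        return None
    # Assumption: if several jets are aligned, take the one closest in Delta R.
    return min(aligned, key=lambda pair: pair[0])[1]
```

With these cuts an event in which the missing momentum points along the subleading jet returns that jet, which is the configuration found in the majority of signal events.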

JSS observables
Many jet substructure (JSS) observables have been designed over the last decade or so [17,18], with different sensitivities to different signal jets. Recent work has focused on energy correlation observables [19] and discussed the non-trivial theoretical uncertainties associated with jet substructure. In this study we looked at a broad array of observables: the Les Houches angularity (LHA) [20], the splitting variables r_g and z_g [21,22], the N-subjettiness ratios τ_21 and τ_32 [23], and the ratios of energy correlation functions, C_2, D_2, ECF_2 and ECF_3 [24]. However, we found that C_2, LHA, τ_21 and τ_32 were enough to explain most of the features, so we focus on these observables in particular. In general, D_2 and ECF_2 were fairly similar but less sensitive than C_2, while ECF_3, r_g and z_g were mostly insensitive to the effect we are probing.
In order to compare signal and background large-radius jets with similar kinematics, we look at two different jet p_T ranges, 400-600 GeV and 800-1000 GeV, motivated by Fig. 4.
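For concreteness, two of these observables can be computed from jet constituents as sketched below. The definitions are the commonly used ones, LHA = Σ z_i (∆R_i/R)^0.5 and C_2 = ECF(3) ECF(1) / ECF(2)^2 with angular exponent β, and are stated here as assumptions since the exact conventions of the analysis are not spelled out; the function names and the (pt, eta, phi) tuple layout are hypothetical.

```python
import math
from itertools import combinations


def _delta_r(a, b):
    """Angular distance between two (pt, eta, phi) tuples."""
    dphi = abs(a[2] - b[2]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a[1] - b[1], dphi)


def les_houches_angularity(constituents, jet_axis, radius=1.0):
    """LHA = sum_i z_i * sqrt(DeltaR_i / R), with z_i the constituent pT fraction."""
    total_pt = sum(c[0] for c in constituents)
    axis = (0.0, jet_axis[0], jet_axis[1])
    return sum((c[0] / total_pt) * math.sqrt(_delta_r(c, axis) / radius)
               for c in constituents)


def ecf(n, constituents, beta=1.0):
    """n-point energy correlation function built from pT and pairwise DeltaR."""
    total = 0.0
    for group in combinations(constituents, n):
        pt_product = math.prod(c[0] for c in group)
        # For n = 1 there are no pairs and the empty product is 1.
        angle_product = math.prod(_delta_r(a, b) ** beta
                                  for a, b in combinations(group, 2))
        total += pt_product * angle_product
    return total


def c2(constituents, beta=1.0):
    """C_2 = ECF(3) * ECF(1) / ECF(2)^2; returns 0 if ECF(2) vanishes."""
    ecf1, ecf2, ecf3 = (ecf(n, constituents, beta) for n in (1, 2, 3))
    return ecf3 * ecf1 / ecf2 ** 2 if ecf2 > 0.0 else 0.0
```

The triple sum in ECF(3) scales as the cube of the constituent multiplicity, so this direct form is only practical for small toy inputs.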

Results
Distributions of several jet substructure observables are compared between semi-visible and ordinary jets in Fig. 5. The results in the p_T range of 400-600 GeV are shown; the results in the 800-1000 GeV range exhibit the same features, albeit with limited statistics. The distributions are normalised to unit area, not to cross-section, as we are interested in probing the shape differences.
The overall interpretation is that semi-visible jets have a more multi-pronged substructure, as evidenced by higher values of C_2 and LHA. For τ_21 and τ_32, the lower values for signal indicate that the signal jets are more two- and three-pronged respectively, whereas the background is more single-pronged. LHA, surprisingly, does not show any difference when changing r_inv. For the τ ratios, lower r_inv values seem closer to background, indicating that a lower dark hadron fraction is indeed more background-like. The results here are shown without any theoretical systematic uncertainty. Based on the recent study [19], we can very conservatively assume a flat 30-40% uncertainty on these substructure variables. That would not invalidate the general conclusions of this article, but for certain observables, such as τ_21 at lower r_inv values, the discrimination power would be degraded. Detector effects can also degrade the performance, but a quick check using parametrised smearing [25] showed that the results we obtain are robust.
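Normalising to unit area means that each histogram integrates to one, so that only shapes are compared. A minimal sketch, with a hypothetical function name and plain Python lists standing in for the histogram objects of the analysis toolkit:

```python
def normalise_to_unit_area(counts, edges):
    """Convert bin counts to a density whose integral over all bins is 1.

    counts: list of bin contents.
    edges:  list of bin edges, one element longer than counts.
    """
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    if len(widths) != len(counts):
        raise ValueError("need exactly one more edge than bins")
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    # Dividing by the bin width keeps the area at 1 for non-uniform binning too.
    return [c / (total * w) for c, w in zip(counts, widths)]
```

Two samples with the same shape but different cross-sections give the same normalised distribution, which is the property needed for the shape comparison.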
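The reading of low τ_21 as two-pronged can be illustrated with a toy calculation. The sketch assumes the usual definition τ_N = Σ p_T,i min_k ∆R(i, axis_k) / (R Σ p_T,i) and takes the subjet axes as given inputs; in a real analysis the axes come from a dedicated axis-finding step, and the function names here are hypothetical.

```python
import math


def _delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)


def n_subjettiness(constituents, axes, radius=1.0):
    """tau_N for (pt, eta, phi) constituents and a list of (eta, phi) axes."""
    norm = radius * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(_delta_r(eta, phi, ax_eta, ax_phi) for ax_eta, ax_phi in axes)
        for pt, eta, phi in constituents
    )
    return weighted / norm


def tau_ratio(constituents, axes_n, axes_n_minus_1, radius=1.0):
    """tau_N / tau_(N-1); returns 0 if the denominator vanishes."""
    denom = n_subjettiness(constituents, axes_n_minus_1, radius)
    return n_subjettiness(constituents, axes_n, radius) / denom if denom > 0.0 else 0.0


# Toy jets (illustrative numbers only): two hard prongs versus one hard core.
two_prong = [(100.0, 0.3, 0.0), (100.0, -0.3, 0.0), (5.0, 0.0, 0.2)]
one_prong = [(100.0, 0.0, 0.0), (5.0, 0.3, 0.0), (5.0, -0.2, 0.2), (5.0, 0.0, -0.3)]

tau21_two = tau_ratio(two_prong, [(0.3, 0.0), (-0.3, 0.0)], [(0.0, 0.0)])
tau21_one = tau_ratio(one_prong, [(0.0, 0.0), (0.3, 0.0)], [(0.0, 0.0)])
```

In this toy the two-prong jet gives a much smaller τ_21 than the single-core jet, because a second axis absorbs most of the momentum-weighted angular spread.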

Model dependence
Currently the only dark shower model that can be used to simulate semi-visible jets is the Pythia8 Hidden Valley module, as discussed in Sec. 2. An obvious concern is therefore the extent to which the differences seen between signal and background in the previous section are model-dependent. In the absence of another model, an unambiguous answer to this question is difficult to arrive at, but considering the extreme scenario of r_inv = 0 might offer some clues. Imposing this condition implies that our signal large-radius jets consist entirely of visible hadrons, so the behaviour is expected to be like that of background jets, with low missing transverse momentum, as seen in Fig. 4.
However, in this case requirements on the magnitude and direction of the missing transverse momentum do not really make sense for signal, so for these comparisons a background-like event selection is employed, assuming that the leading large-radius jet is the svj.
If the substructure of the signal jets in this case resembles that of the background jets, that would give us some confidence that the differences seen for non-zero r_inv values are not caused only by the model specifications. Among the handles we have on the HV shower, HiddenValley:alphaFSR and HiddenValley:pTMin can be expected to be the most consequential. We have found minimal dependence on the latter, but in Fig. 6 we see how the substructure variables change significantly with the variation of HiddenValley:alphaFSR; other intermediate values were also probed but are not shown.
The takeaway message is that, in signal jets, C_2 can be made to look similar to background jets for HiddenValley:alphaFSR = 0.1. The trend for LHA is less clear, and the τ variables are potentially the most sensitive to the HV model implementation, so they will be looked into more carefully in what follows.
While this HiddenValley:alphaFSR value is the closest to the QCD α_FSR value used in generators, one must note that they cannot be treated on the same footing, as the QCD coupling is run at two loops. However, based on these results, we will use this HiddenValley:alphaFSR value in the rest of the comparisons. It is interesting to note that a signal with r_inv = 0 is not necessarily equivalent to the background.
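A scan of this kind amounts to regenerating the signal with different Pythia8 setting strings. The helper below is a hypothetical sketch that only builds those strings; it assumes that the dark shower is switched on with HiddenValley:FSR, and the two values shown are the ones quoted in this paper, not the full set that was probed.

```python
def hidden_valley_fsr_settings(alpha_fsr):
    """Return Pythia8 setting strings for one dark-shower coupling value."""
    if alpha_fsr <= 0.0:
        raise ValueError("alpha_fsr must be positive")
    return [
        "HiddenValley:FSR = on",                 # enable showering in the HV sector
        f"HiddenValley:alphaFSR = {alpha_fsr}",  # fixed dark coupling in the shower
    ]


if __name__ == "__main__":
    for alpha in (0.1, 1.0):
        # Each line could be written to a command file or passed to pythia.readString(...).
        print("\n".join(hidden_valley_fsr_settings(alpha)))
        print()
```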

Origin of the differences
An understanding of the observed behaviour of jet substructure observables in semi-visible jets is the last piece of the puzzle. In order to investigate this, we asked three questions:
1. What is the effect of initial state radiation (ISR) and extra radiation on jet substructure?
2. Does the decay from intermediate to final dark hadrons affect the substructure?
3. How does grooming affect jet substructure in svj?
In order to answer these, we turn to the other extreme scenario of r_inv = 1, which corresponds to the case where the signal jet consists entirely of dark hadrons. Evidently the signal jet itself is ill-defined in this case, but by considering the unphysical scenario of using dark hadrons in jet clustering, we can try to disentangle several effects.
First, the dark hadrons can be used to form signal jets, either together with visible hadrons or without them. The extra ME jets and the ISR can be turned off in either case. In each case the leading large-radius jet is taken and, unless otherwise mentioned, comparisons are performed in the p_T range of 400-600 GeV. We look at the same observables as before in Fig. 7.
Clustering only dark hadrons in jets is indicative of the shape an ideal semi-visible jet may have. The more realistic scenario is of course clustering the visible hadrons. In the r_inv = 1 scenario considered here, the visible hadrons come almost exclusively from ME-level extra jets and ISR. Looking at C_2 and the τ observables, it is clear that adding visible hadrons makes the signal jets more multi-pronged. It is interesting to see how the visible hadrons coming from ISR and from ME extra jets affect the substructure differently. Turning off the ISR affects C_2 more than the τ observables, perhaps indicating that C_2 is more sensitive to softer radiation. Additionally turning off the ME extra jets has the opposite behaviour: it does not affect C_2, but makes the τ ratios indicative of a slightly more two/three-pronged substructure. This also implies that ISR adds more activity to semi-visible jets than ME extra jets do, making them slightly more multi-pronged. Turning off ME extra jets produces svj with lower p_T, which implies that we are not comparing the same jets in these cases. Surprisingly, LHA seems rather insensitive.
An interesting feature can be seen in the bottom-left τ_21 distribution of Fig. 7, where two peaks appear. This feature is enhanced in the higher p_T range and also appears for lower values of HiddenValley:alphaFSR, as discussed in Sec. 4.1, which can be seen in Fig. 8. It is independent of adding SM hadrons, except when ME extra jets are turned off. This observation is consistent with the occurrence of this feature at higher p_T, where jets can be more collimated and two-pronged. The lower values of HiddenValley:alphaFSR similarly indicate less radiation.
Another sanity check is to examine whether the decay from intermediate dark hadrons to the final dark hadrons considered above is responsible for creating or enhancing the substructure. We make the intermediate dark hadrons stable and cluster them in jets, with and without visible hadrons. In Fig. 9, the comparison with the previous results shows essentially no difference, except for a slightly flatter shape at lower values of τ in the current case. So it is safe to say that the observed substructure is not due to the HV decay structure.
The next test was how grooming affects the substructure of semi-visible jets, as grooming preferentially cuts out soft or wide angle radiation. We test the effect of trimming here.
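The p_T-fraction step of trimming can be sketched as below. The sketch assumes that the jet constituents have already been reclustered into subjets of radius R_sub, and it approximates the ungroomed jet p_T by the scalar sum of the subjet p_T; both the function names and this approximation are illustrative assumptions rather than the FastJet implementation.

```python
def trim(subjets, f_cut=0.05):
    """Keep subjets carrying at least a fraction f_cut of the jet pT.

    subjets: list of (pt, eta, phi) tuples, assumed already reclustered with R_sub.
    """
    jet_pt = sum(pt for pt, _, _ in subjets)  # scalar-sum approximation of the jet pT
    return [s for s in subjets if s[0] >= f_cut * jet_pt]


def trimmed_pt_fraction(subjets, f_cut=0.05):
    """Fraction of the (scalar-sum) jet pT that survives trimming."""
    jet_pt = sum(pt for pt, _, _ in subjets)
    if jet_pt == 0.0:
        return 0.0
    return sum(pt for pt, _, _ in trim(subjets, f_cut)) / jet_pt
```

Soft subjets, which typically come from ISR or other diffuse radiation, fall below the threshold and are removed, while the hard prongs are kept.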
In Fig. 10, we compare different configurations with and without trimming. Trimming in general moves τ_21 to the left, indicating a cleaner two-pronged substructure. This is least pronounced for the 'only dark hadron' case, slightly more so when visible hadrons are also clustered, and most pronounced for the case with no extra ME jets or ISR. A comparison between the scenarios with no extra ME jets and with no ISR indicates that ISR is more affected by trimming. The same conclusion could also have been reached by looking at C_2, but the effect was less pronounced. Trimming did not affect the p_T spectra of the signal jets.
Last but not least, having explored which effects are not responsible for the specific substructure of semi-visible jets, we are ready to address what is actually responsible. For finite r_inv values, only the visible hadrons are clustered in jets in Sec. 3.3, and slightly different substructure was seen for different r_inv values. Now, as in Sec. 4.2, if the final dark hadrons are also clustered in the jets, we should expect this difference to go away, as the different amounts of missing hadrons in each case were presumably responsible for it. Indeed, in Fig. 11 we see the expected behaviour. For C_2, the lines corresponding to the cases where dark hadrons are clustered are almost identical, and while they are not identical for τ_21, they lie in between the two original lines. This indicates that the substructure becomes less two-pronged when both visible and dark hadrons are clustered, and that the absence of the dark hadrons creates the two-pronged structure. These distributions were made with HiddenValley:alphaFSR = 1 to have the maximum possible dark radiation.

Conclusions
A comprehensive study of the substructure of semi-visible jets has been performed. We demonstrated that a specific Hidden Valley parameter choice can reduce the dependence on the Pythia HV module when comparing signal and background jets. The substructure of semi-visible jets is caused neither by the decay of intermediate dark hadrons nor by extra ME jets or ISR, although the latter two affect the substructure. The substructure is created by the interspersing of visible hadrons with dark hadrons. The substructure observables that are least affected by model dependence can be used in searches, and also as inputs to machine learning algorithms trying to identify semi-visible jets via anomaly detection [26]. It must be noted, though, that the current model leads to a scenario where kinematic selections, such as those on missing transverse momentum or leading jet p_T, are far more effective in suppressing the background than the substructure observables considered here. However, it is not out of the question to have a scenario where kinematic selections still leave roughly similar signal and background contributions, in which case the potential discriminating power of these observables will be more important.