# Using Machine Learning to disentangle LHC signatures of Dark Matter candidates

### Submission summary

As Contributors: Charanjit Kaur Khosa · Veronica Sanz
Arxiv Link: https://arxiv.org/abs/1910.06058v3 (pdf)
Date submitted: 2021-04-14 10:29
Submitted by: Khosa, Charanjit Kaur
Submitted to: SciPost Physics
Academic field: Physics
Specialties: High-Energy Physics - Theory · High-Energy Physics - Phenomenology
Approaches: Computational, Phenomenological

### Abstract

We study the prospects of characterising Dark Matter at colliders using Machine Learning (ML) techniques. We focus on the monojet and missing transverse energy (MET) channel and propose a set of benchmark models for the study: a typical WIMP Dark Matter candidate in the form of a SUSY neutralino, a pseudo-Goldstone impostor in the shape of an Axion-Like Particle, and a light Dark Matter impostor whose interactions are mediated by a heavy particle. All these benchmarks are tensioned against each other, and against the main SM background ($Z$+jets). Our analysis uses both the leading-order kinematic features as well as the information of an additional hard jet. We explore different representations of the data, from a simple event data sample with values of kinematic variables fed into a Logistic Regression algorithm or a Fully Connected Neural Network, to a transformation of the data into images related to probability distributions, fed to Deep and Convolutional Neural Networks. We also study the robustness of our method against including detector effects, dropping kinematic variables, or changing the number of events per image. In the case of signals with more combinatorial possibilities (events with more than one hard jet), the most crucial data features are selected by performing a Principal Component Analysis. We compare the performance of all these methods, and find that using the 2D images of the combined information of multiple events significantly improves the discrimination performance.

### Current status
Editor-in-charge assigned

### Submission & Refereeing History

Resubmission 1910.06058v3 on 14 April 2021
Resubmission 1910.06058v2 on 29 January 2021
Submission scipost_202009_00017v1 on 21 September 2020

## Reports on this Submission

### Report

Thank you for clarifying. I am now quite confused. The r = 2 classifier has different information than the r = 1 classifier, and yet the r = 2 classifier is a suboptimal use of that information. If you use the r = 1 classifier but apply it to 2 events at a time, my guess is that its accuracy will be much better than the r = 2 classifier's performance. In fact, as you consider n → ∞ events at a time, the accuracy should converge to 100% as long as the r = 1 classifier is not random.

This makes me wonder how you made Fig. 15. You reference [33], which says "We have then derived the signal significance for each signal region from the number of signal events (S), after having imposed the cuts above, and the number of background events (B)". However, in your case, you have a classifier that only operates at the level of 20 events at a time. Can you please clarify? Also, do you have a baseline for Fig. 15? Are these results better than a baseline?
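The referee's point that aggregating many events drives accuracy toward 100% can be illustrated with a toy simulation (this is an editor's sketch, not the paper's method: the Gaussian score model, the threshold at zero, and all function names are illustrative assumptions). A weak per-event classifier is applied to n events of the same class, the scores are summed (equivalent to a log-likelihood-ratio product for this toy model), and the batch is classified by the sign of the sum:

```python
import random

random.seed(0)

def event_score(label):
    # Weak single-event classifier: score is Gaussian with mean +0.5 for
    # class 1 and -0.5 for class 0, so one event alone is poorly separated.
    mu = 0.5 if label == 1 else -0.5
    return random.gauss(mu, 1.0)

def classify_batch(label, n):
    # Aggregate n independent single-event scores by summing them and
    # thresholding at zero (a log-likelihood-ratio test in this toy model).
    total = sum(event_score(label) for _ in range(n))
    return 1 if total > 0 else 0

def accuracy(n, trials=20000):
    # Monte Carlo estimate of the batch classifier's accuracy.
    correct = 0
    for _ in range(trials):
        label = random.randint(0, 1)
        correct += classify_batch(label, n) == label
    return correct / trials

for n in (1, 2, 20):
    print(f"n = {n:2d} events per decision: accuracy ≈ {accuracy(n):.3f}")
```

With these assumed parameters the single-event accuracy is only about 0.69, but grouping 20 events per decision pushes it close to 0.99, consistent with the referee's expectation that any better-than-random per-event classifier approaches perfect accuracy as n grows.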

I appreciate that you would like to make minimal changes to the paper at this point and I would support this. Perhaps you could add a couple of sentences to further clarify the above points?

• validity: -
• significance: -
• originality: -
• clarity: -
• formatting: -
• grammar: -