SciPost Submission Page
Using Machine Learning to disentangle LHC signatures of Dark Matter candidates
by C. K. Khosa, V. Sanz, M. Soughton
As Contributors: Charanjit Kaur Khosa · Veronica Sanz
Arxiv Link: https://arxiv.org/abs/1910.06058v3 (pdf)
Date submitted: 2021-04-14 10:29
Submitted by: Khosa, Charanjit Kaur
Submitted to: SciPost Physics
We study the prospects of characterising Dark Matter at colliders using Machine Learning (ML) techniques. We focus on the monojet and missing transverse energy (MET) channel and propose a set of benchmark models for the study: a typical WIMP Dark Matter candidate in the form of a SUSY neutralino, a pseudo-Goldstone impostor in the shape of an Axion-Like Particle, and a light Dark Matter impostor whose interactions are mediated by a heavy particle. All these benchmarks are tensioned against each other, and against the main SM background ($Z$+jets). Our analysis uses both the leading-order kinematic features as well as the information of an additional hard jet. We explore different representations of the data, from a simple event data sample with values of kinematic variables fed into a Logistic Regression algorithm or a Fully Connected Neural Network, to a transformation of the data into images related to probability distributions, fed to Deep and Convolutional Neural Networks. We also study the robustness of our method against including detector effects, dropping kinematic variables, or changing the number of events per image. In the case of signals with more combinatorial possibilities (events with more than one hard jet), the most crucial data features are selected by performing a Principal Component Analysis. We compare the performance of all these methods, and find that using the 2D images of the combined information of multiple events significantly improves the discrimination performance.
Submission & Refereeing History
Reports on this Submission
Anonymous Report 1 on 2021-04-28 (Invited Report)
Thank you for clarifying. I am now quite confused. The r = 2 classifier has different information from the r = 1 classifier, and yet the r = 2 classifier is a suboptimal use of that information. If you use the r = 1 classifier but apply it to 2 events at a time, my guess is that its accuracy will be much better than the r = 2 classifier's performance. In fact, as you consider n -> \infty events at a time, the accuracy should converge to 100% as long as the r = 1 classifier is better than random. This makes me wonder how you made Fig. 15. You reference a work which says "We have then derived the signal significance for each signal region from the number of signal events (S), after having imposed the cuts above, and the number of background events (B)". However, in your case, you have a classifier that only operates at the level of 20 events at a time. Can you please clarify? Also, do you have a baseline for Fig. 15? Are these results better than a baseline?
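The referee's convergence claim can be illustrated with a toy numerical sketch (not from the paper under review; the score distributions and decision rule below are assumptions for illustration only): a weak per-event classifier, applied independently to n events whose aggregated score is then thresholded, becomes increasingly accurate as n grows.

```python
import random

random.seed(0)

def event_score(is_signal):
    # Hypothetical per-event classifier score: signal events score
    # slightly higher on average than background, so the r = 1
    # classifier is only marginally better than random.
    mu = 0.6 if is_signal else 0.4
    return random.gauss(mu, 0.5)

def batch_correct(n, is_signal):
    # Aggregate n independent per-event scores by averaging, then
    # decide "signal" if the mean score exceeds the midpoint 0.5.
    mean = sum(event_score(is_signal) for _ in range(n)) / n
    return (mean > 0.5) == is_signal

def accuracy(n, trials=2000):
    # Empirical accuracy over balanced signal/background trials.
    correct = sum(batch_correct(n, bool(t % 2)) for t in range(trials))
    return correct / trials

acc_1, acc_20 = accuracy(1), accuracy(20)
print(f"n=1:  {acc_1:.3f}")
print(f"n=20: {acc_20:.3f}")
```

Because the mean of n independent scores has its spread reduced by a factor of sqrt(n), the two classes separate more cleanly as n grows, and the batch accuracy approaches 100% for any per-event classifier that beats random guessing.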
I appreciate that you would like to make minimal changes to the paper at this point and I would support this. Perhaps you could add a couple of sentences to further clarify the above points?