SciPost Submission Page

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

by T. Aarrestad, M. van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A. De Simone, C. Doglioni, J. M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek, M. Pierini, B. Ravina, R. Ruiz de Austri, S. Sekmen, M. Touranakou, M. Vaškevičiūte, R. Vilalta, J. R. Vlimant, R. Verheyen, M. White, E. Wulff, E. Wallin, K. A. Wozniak, Z. Zhang

This is not the latest submitted version.

This Submission thread is now published as

Submission summary

Authors (as registered SciPost users): Polina Moskvitina · Bryan Ostdiek · Rob Verheyen · Melissa van Beekveld
Submission information
Preprint Link:  (pdf)
Date submitted: 2021-10-08 15:18
Submitted by: Ostdiek, Bryan
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approaches: Experimental, Phenomenological


We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to $10~\rm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at Code to reproduce the analysis is provided at

Current status:
Has been resubmitted

Reports on this Submission

Anonymous Report 2 on 2021-11-19 (Invited Report)


Thanks for the revised document. I see that most of my comments have been addressed sufficiently. I have no further comments.

  • validity: -
  • significance: -
  • originality: -
  • clarity: -
  • formatting: -
  • grammar: -

Anonymous Report 1 on 2021-11-18 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2105.14027v2, delivered 2021-11-18, doi: 10.21468/SciPost.Report.3866


Strengths

1- This paper describes different methods to search for new physics, in the context of the data collected at the LHC, using a model-independent approach.
2- The methods and algorithms described and studied are based on novel machine learning techniques, and a comparison of the effectiveness of the different approaches is presented.
3- The paper uses multiple methods to compare the algorithms studied, providing an overview of how the different approaches perform under each of the evaluation criteria. This gives greater insight into the effectiveness of the approaches studied.
4- The paper also gives readers scope for future development and participation by making the code and datasets available for further studies.

Weaknesses

1- The paper does not explore in detail why certain methods perform better than others, nor does it explain why performance differs between datasets. These questions are left for further study.

Report

This paper is suitable for publication, as it provides a detailed overview of novel methods based on machine learning algorithms to search for new physics in experimental data at the LHC using a model-independent approach. It also compares the effectiveness of the different algorithms and tells interested readers how to access the code and datasets for further studies.

The authors have addressed the changes requested in the previous round of review. Some minor follow-up is suggested in the requested changes below, to help readers who do not have access to the authors' reply.

Requested changes

1- The authors state in their response: "As the Dark Machines groups, we are interested in exploring matters of dark matter with machine learning. With that in mind, all of the models explored had some component of dark matter escaping the detector. While these are certainly not all of the interesting BSM models that could be discovered, it is what led to the model selection." I suggest stating this motivation for the choice of BSM models explicitly in the paper.

2- In addressing the suggestion to include the size of the dataset, the number given for Channel 1 in the authors' reply ("size of the datasets are 214,000 SM events for Channel 1 ") differs from the number stated in the text of the paper. The authors should resolve this minor discrepancy.

3- Since the first draft of the paper was submitted for review, an updated version of Reference 16 (CMS Collaboration, MUSiC ... ) has been published: Eur. Phys. J. C 81, 629 (2021). This reference can be updated.

  • validity: high
  • significance: high
  • originality: high
  • clarity: good
  • formatting: excellent
  • grammar: good

Author:  Bryan Ostdiek  on 2021-12-10  [id 2021]

(in reply to Report 1 on 2021-11-18)

Thank you for the final suggestions. We have fixed these in the new version.
