Searching for new physics in large data sets requires balancing two competing effects: signal identification and background distortion. In this work, we perform a systematic study of both single-variable and multivariate jet tagging methods that aim for this balance. The methods preserve the shape of the background distribution by augmenting either the training procedure or the data itself. We compare the methods using multiple quantitative metrics, for tagging 2-, 3-, or 4-prong jets against the QCD background. This is the first study to show that the data augmentation techniques of Planing and PCA-based scaling deliver performance similar to that of the augmented training techniques of Adversarial NN and uBoost, while being both easier to implement and computationally cheaper.
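As an illustration of what augmenting the data itself can mean, the following is a minimal sketch of histogram-based planing, assuming the variable to be protected is the jet mass and that per-event weights are passed to the classifier during training. The function name `planing_weights`, the binning, and the normalization are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def planing_weights(mass, bins=50):
    """Per-event weights that flatten the binned distribution of `mass`.

    Hypothetical helper: each event is weighted by the inverse of the
    occupancy of its mass bin, so the weighted histogram is uniform.
    """
    mass = np.asarray(mass, dtype=float)
    # Histogram over the full range of the sample, so every event lands
    # in a bin with at least one entry (no division by zero below).
    counts, edges = np.histogram(mass, bins=bins)
    # Map each event to its bin index; clip so that events sitting on
    # the right-most edge are assigned to the last bin, as in np.histogram.
    idx = np.clip(np.digitize(mass, edges) - 1, 0, len(counts) - 1)
    weights = 1.0 / counts[idx]
    # Normalize to unit mean so the effective sample size scale is kept.
    return weights * len(mass) / weights.sum()
```

In such a setup the weights would typically be computed separately for the signal and background samples and supplied as sample weights to the training loss, so that the classifier cannot use the mass distribution itself to discriminate.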