SciPost Submission Page
Semi-visible jets, energy-based models, and self-supervision
by Luigi Favaro, Michael Krämer, Tanmoy Modak, Tilman Plehn, Jan Rüschkamp
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users): Luigi Favaro · Tilman Plehn

Submission information
Preprint Link: scipost_202312_00024v2 (pdf)
Code repository: https://github.com/luigifvr/dark-clr
Data repository: https://zenodo.org/records/12801842
Date submitted: 2024-10-23 11:12
Submitted by: Favaro, Luigi
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Abstract
We present DarkCLR, a novel framework for detecting semi-visible jets at the LHC. DarkCLR uses a self-supervised contrastive-learning approach to create observables that are approximately invariant under relevant transformations. We use background-enhanced data to create a sensitive representation and evaluate the representations using a normalized autoencoder as a density estimator. Our results show a remarkable sensitivity for a wide range of semi-visible jets and are more robust than a supervised classifier trained on a specific signal.
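For readers who want a concrete picture of the contrastive step summarized in the abstract, here is a minimal sketch of an NT-Xent-style contrastive loss acting on batches of original and positively augmented jet representations. The function name, tensor shapes, and temperature value are illustrative assumptions and are not taken from the DarkCLR code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_orig, z_aug, temperature=0.5):
    """Sketch of an NT-Xent-style contrastive loss: each jet representation
    is pulled towards its positively augmented partner and pushed away from
    all other jets in the batch. Shapes (N, d) and the temperature value
    are assumptions for illustration only."""
    z = F.normalize(torch.cat([z_orig, z_aug], dim=0), dim=1)  # (2N, d), unit norm
    sim = z @ z.T / temperature                                # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                          # exclude self-similarity
    n = z_orig.shape[0]
    # the positive partner of jet i sits at index i + n (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# usage sketch: z_orig, z_aug = encoder(jets), encoder(augment(jets))
```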
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Author comments upon resubmission
In response to the main points raised by the referees, we have significantly improved the clarity of the paper. The main changes are a new schematic diagram of the network and a discussion of the transformations in each block. We have also added a summary of the strategy for the NAE in Section 4, specifying that the anomaly score is now approximately invariant under the positive transformations.
We found it difficult to study the bias of a non-uniform decay to the Standard Model without a specific benchmark signal. We tested the trained models on mock datasets in which we dropped a large fraction of the constituents from a single reclustered sub-prong of the QCD dataset, and found that the network was sensitive to this. However, this test uses data already seen during training and should be complemented with a test-case signal model. We note that this is not a limitation of our framework: DarkCLR allows the implementation of additional augmentations, positive or anomalous, covering other modes in the missing-energy distribution within the jet. For example, Ref. [36] used four anomalous augmentations and retained robustness over a wide range of signals. We have added a comment in the conclusions.
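As a rough illustration of the mock test described above, the sketch below removes a large fraction of the constituents assigned to a single reclustered sub-prong. The array layout, the sub-prong labels, and the drop fraction are assumptions for illustration and do not reproduce the exact procedure used for the tests.

```python
import numpy as np

def drop_subprong(constituents, subprong_labels, prong_id, drop_frac=0.8, rng=None):
    """Mock semi-visible jet: randomly drop a fraction of the constituents
    belonging to a single reclustered sub-prong.

    constituents    : array of shape (n_const, n_features), e.g. (pT, eta, phi)
    subprong_labels : array of shape (n_const,) assigning each constituent to a sub-prong
    (layout and drop fraction are illustrative assumptions)
    """
    rng = rng or np.random.default_rng()
    idx = np.flatnonzero(subprong_labels == prong_id)
    n_drop = int(drop_frac * len(idx))
    keep = np.ones(len(constituents), dtype=bool)
    keep[rng.choice(idx, size=n_drop, replace=False)] = False
    return constituents[keep]
```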
To improve reproducibility and support future developments, we have published the dataset on Zenodo, DOI: 10.5281/zenodo.12801842.
List of changes
Report 1
1. See main reply
2. We observed with the linear classifier test that the representations before and after the head network have similar performance. However, we think that the behaviour of the final representation is easier to understand because it is directly related to the CLR loss, i.e. the learned invariances. We also observe that a cut-based analysis on the norm of the representation vector, a very simple scalar function, can detect outliers (see the sketch after this list of replies). We expect the full vector information to improve the sensitivity; for example, this information is included in the autoencoder, which works on the full vector. We have modified the sentence to better reflect this idea and added a reference to the linear classifier test appendix.
3. We have clarified this issue in the summary of the NAE in Section 4.
4. We have changed the labels to "DarkCLR" and "JetCLR". This is now consistent with the text, and we use the same signal efficiency threshold throughout the paper.
5. We have corrected this typo.
6. The numbers in parentheses are the one-sigma deviations from the ensemble of trained networks. The other entries in the table are copied from previous papers in which no error was reported. This is now explicit in the text, and we have included error bars on the classifier in Section 5.2.
7. We have removed the label for Section 2.1.
8. This is indeed a reconstruction cut. We have moved the discussion of the cut to after the discussion of the clustering details.
9. The rescaling is done to ensure that the selection cuts on the augmented dataset are the same as on the original data. If this were not the case, the network would have an easy variable with which to discriminate between the two samples, which is not consistent with the construction of powerful observables. We have added a comment to the description of the augmentation; a short sketch of this rescaling is given after this list of replies.
10. This is the standard notation for "distributed according to".
11. We have defined this acronym and a few others the first time they are used in the text.
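Referring to point 2 above, the following is a minimal sketch of a cut-based analysis on the norm of the representation vector. The function names, input shape, and background-quantile working point are assumptions for illustration and do not reproduce the analysis performed in the paper.

```python
import numpy as np

def norm_score(representations):
    """CLR-based anomaly score: the Euclidean norm of each (un-normalized)
    representation vector, with larger norms treated as more anomalous.
    The input shape (N, d) is an assumption."""
    return np.linalg.norm(representations, axis=1)

def cut_based_selection(scores, background_scores, quantile=0.99):
    """Flag jets whose score exceeds a chosen quantile of the background
    score distribution (the 0.99 working point is illustrative)."""
    return scores > np.quantile(background_scores, quantile)
```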
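Referring to point 9 above, this is a minimal sketch of one way such a rescaling can be implemented: after constituents are dropped, the remaining constituent transverse momenta are scaled so the augmented jet retains the original jet pT and therefore still passes the same generation-level selection. The column convention (column 0 holding the constituent pT) and the pT-matching choice are assumptions for illustration.

```python
import numpy as np

def rescale_augmented_jet(aug_constituents, original_jet_pt):
    """Rescale the constituent pT's of an augmented jet so its scalar-summed
    pT matches that of the original jet, keeping the selection cuts identical
    for original and augmented samples (column 0 = pT is an assumption)."""
    aug = aug_constituents.copy()
    aug[:, 0] *= original_jet_pt / aug[:, 0].sum()
    return aug
```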
Report 2
1. We have included a figure showing the positive augmentations in Figure 1.
2. See main reply
3&4. See main reply
5. We have defined this acronym and a few others the first time they are used in the text.
Reports on this Submission
Strengths
1- New state-of-the-art technique to search for new signals producing semi-visible jets
2- Method robust against model parameters of the signal
3- Tests the sensitivity of the method to hyperparameter choices
4- Method potentially insensitive to simulation choice
Weaknesses
1- Not clear enough yet
Report
The paper introduces an innovative method for tagging semi-visible jets.
The results are shown to be insensitive to variations in BSM parameters compared to supervised or unsupervised learning techniques.
However, some textual enhancements are necessary to improve the paper's accessibility and robustness.
Requested changes
1- Clarify what is a jet constituent in your context (likely a calorimeter tower?)
2- Figure 1: top and bottom are swapped
3- About "The latter step ensures that the augmented jets fulfill the selection cuts applied in the generation process.": you could also add that this avoids the pT (or variables related to it) becoming a discriminating variable.
4- Equation 4: why does the Softmax carry the index "j"? In the following text, should d_r and a_j not be d_z and a_ij?
5- About "Therefore, we expect the norm of the representation vector to be a discriminative scalar quantity and propose it as a CLR-based anomaly score that can show the effect of the DarkCLR pretraining": you did not answer my previous question. Thinking more about this, I believe the reason why anomalies should have a large norm is the additional penalty term that you explain in the following paragraph. In any case, the reason has to be made crystal clear, as it is an important ingredient of your result.
6- Equation 8: explaining the meaning of "z ~ p_Z" and "p_Z" would help the reader follow.
7- The "fat jet" expression is increasingly deprecated in favor of "large-R jet". In your case, the jets do not have a large mass, and this point is not very important, so I suggest you simply use "jet".
8- Table 1: it is confusing to have two variables with the same name ("d_z"). Please provide two different names and propagate this change to the text; that will help the reader.
Recommendation
Ask for minor revision
Strengths
The paper describes a new tagging algorithm for identifying semi-visible jets based on a contrastive-learning representation.
1 - The algorithm presented is interesting and relevant, relying on minimal physical features of the signal.
2 - The background rejection is superior to supervised classifiers and more stable with respect to changes in the model parameters.
3 - The text is well written and the results presented strongly support the paper's claims.
4 - The paper is clear and accessible to non-experts on current machine learning developments.
Report
The paper's clarity has been improved, making it accessible to non-experts in the latest machine learning developments.
Requested changes
1- The top/bottom plots in Figure 1 are switched with respect to their descriptions in the caption. Also, the last sentence of Sec. 3.2 should read "Fig. 1 (bottom) ....".
Recommendation
Publish (surpasses expectations and criteria for this Journal; among top 10%)