SciPost Submission Page
MACK: Mismodeling Addressed with Contrastive Knowledge
by Liam Rankin Sheldon, Dylan Sheldon Rankin, Philip Harris
Submission summary
Authors (as registered SciPost users): Dylan Rankin
Submission information
Preprint Link: https://arxiv.org/abs/2410.13947v1 (pdf)
Date submitted: 2024-10-23 14:25
Submitted by: Rankin, Dylan
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
Approaches: Computational, Phenomenological
Abstract
The use of machine learning methods in high energy physics typically relies on large volumes of precise simulation for training. As machine learning models become more complex they can become increasingly sensitive to differences between this simulation and the real data collected by experiments. We present a generic methodology based on contrastive learning which is able to greatly mitigate this negative effect. Crucially, the method does not require prior knowledge of the specifics of the mismodeling. While we demonstrate the efficacy of this technique using the task of jet-tagging at the Large Hadron Collider, it is applicable to a wide array of different tasks both in and out of the field of high energy physics.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Strengths
1- New methodology to reduce simulation bias
2- Realistic test to evaluate the performance
Weaknesses
1- Lack of clarity in the methodology: besides the figure, there is no explanation of how the contrastive loss is used. The figure indicates that only background is used in the alternative samples for the featurizer, but the text seems to indicate that signal + background is used.
2- No code is provided for reproducibility, nor is the realistic sample made available.
Report
The authors introduce a new methodology to tackle the issue of differences between data and simulation when training classifiers. The core idea is to create pairs based on the EMD distance and add a contrastive loss term to align the pairs in a joint embedding space. The idea is novel and merits publication, but a few clarifications are needed to make the paper clearer; see the suggestions in the requested changes below. To fix ideas, my reading of the pairing-plus-alignment scheme is sketched in the code that follows.
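The following is only a schematic sketch of my understanding of the method, not the authors' code: each nominal-simulation jet is matched to its nearest alternate-sample jet (by EMD in the paper; a plain Euclidean distance stands in here), and a contrastive term pulls the paired embeddings together. All names (pair_by_distance, alignment_loss, lambda_align, the featurizer f) are placeholders.

```python
import numpy as np
import torch
import torch.nn.functional as F

def pair_by_distance(nominal_feats, alternate_feats):
    """Index of the closest alternate jet for every nominal jet.
    In the paper this distance is the EMD; a Euclidean distance is
    used here only as a stand-in."""
    d = np.linalg.norm(nominal_feats[:, None, :] - alternate_feats[None, :, :], axis=-1)
    return d.argmin(axis=1)

def alignment_loss(z_nominal, z_alternate):
    """Contrastive-style alignment term: maximise the cosine similarity
    of paired embeddings after L2 normalisation."""
    z1 = F.normalize(z_nominal, dim=-1)
    z2 = F.normalize(z_alternate, dim=-1)
    return 1.0 - (z1 * z2).sum(dim=-1).mean()

# Hypothetical use inside a training step, with f the featurizer and
# cls_loss the usual classification loss on nominal simulation:
#   idx  = pair_by_distance(nominal_feats, alternate_feats)
#   loss = cls_loss + lambda_align * alignment_loss(f(nominal_jets), f(alternate_jets[idx]))
```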
Requested changes
1- As mentioned in the weaknesses, the authors should clarify the motivation for the specific contrastive loss used. Since it is a major portion of the work, the authors should at least write down the loss in the text and explain each term instead of simply giving a citation (a generic example of the level of detail I have in mind is sketched after this list).
2- How are the alternate samples constructed? Are they background only, as Fig. 1 suggests, or a mix of both? I would expect the alternate sample to follow the training samples as closely as possible, i.e. the same expected composition and fraction of signal/background events, in order to have a sensible EMD pairing. Is this correct? If so, is this an issue with the current implementation?
3- Fig. 2: The ROC curves are very hard to read, and impossible if you are colorblind. The figure label helps, but the legend needs improvement. Use two boxes: one that simply indicates that testing on the nominal sample is a solid line and testing on the alternate sample is a dashed line. In the other box, make clear, possibly with different colors, what each curve means: one line for MACK without augmentations, one for MACK with augmentations, one for training on the nominal sample with augmentations, and one for training on the alternate sample with augmentations (a schematic matplotlib arrangement of such a two-box legend is sketched after this list).
4- In the same figure, I do not understand why training on nominal and evaluating on nominal has such low performance compared to MACK. Worse, the performance on both the nominal and alternate samples seems to be equally bad. Where does the additional performance come from? Maybe I am reading the plot wrong, which reinforces my point that the labels are not clear.
5- Table 1: it would be helpful to add uncertainties to this table to show the stability of MACK across different runs.
6- The extreme example is interesting, but also highly unrealistic. However, I would be curious to see what happens if you keep the training strategy on JetNet as is, with the alternate samples coming from a mixture of other physics processes, but evaluate the classifier on the same physics processes used for training, with some modification: either a different tune, as you already did in the previous exercise, or a different generator. The main reason I ask is that I would like to know the impact of the choice of "data" used when calculating the pairs and aligning the embedding. If my data contain other processes, or if the fraction of signal and background is different, does MACK hurt or still help?
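Regarding point 1: purely as an illustration of the level of detail I have in mind (the loss actually used in the paper may well differ), a standard contrastive loss of the NT-Xent / SimCLR type reads

```latex
% Illustrative only; not necessarily the loss used in the paper.
% z_i, z_j are the embeddings of a positive (paired) example,
% sim(u,v) is the cosine similarity, tau a temperature, and the sum
% runs over the 2N embeddings in the batch.
\ell_{i,j} = -\log
  \frac{\exp\!\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
       {\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp\!\left(\mathrm{sim}(z_i, z_k)/\tau\right)}
```

Writing the paper's actual loss out in this explicit form, with each term explained, would address the point.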
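Regarding point 3: a schematic matplotlib arrangement of the two-box legend I am suggesting (curve colors and labels are placeholders, not the authors' plotting code):

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
# ... ax.plot(fpr, tpr, color=..., linestyle=...) for each ROC curve ...

# Box 1: line style encodes the test sample.
style_box = [Line2D([0], [0], color="k", linestyle="-",  label="tested on nominal"),
             Line2D([0], [0], color="k", linestyle="--", label="tested on alternate")]
# Box 2: color encodes the training configuration.
model_box = [Line2D([0], [0], color="C0", label="MACK, no augmentations"),
             Line2D([0], [0], color="C1", label="MACK, with augmentations"),
             Line2D([0], [0], color="C2", label="trained on nominal (aug.)"),
             Line2D([0], [0], color="C3", label="trained on alternate (aug.)")]

first = ax.legend(handles=style_box, loc="lower right")
ax.add_artist(first)                 # keep the first legend when the second is added
ax.legend(handles=model_box, loc="upper left")
plt.show()
```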
Recommendation
Publish (meets expectations and criteria for this Journal)