SciPost Submission Page
jBOT: Semantic Jet Representation Clustering Emerges from Self-Distillation
by Ho Fung Tsoi, Dylan Rankin
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Ho Fung Tsoi |
| Preprint Link: | scipost_202601_00045v1 (pdf) |
| Code repository: | https://github.com/hftsoi/jbot |
| Date submitted: | Jan. 19, 2026, 10:57 p.m. |
| Submitted by: | Ho Fung Tsoi |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Experimental, Computational |
Abstract
Self-supervised learning is a powerful pre-training approach that learns feature representations without labels; these representations often capture generic underlying semantics of the data and can later be fine-tuned for downstream tasks. In this work, we introduce jBOT, a self-distillation-based pre-training method for jet data from the CERN Large Hadron Collider that combines local particle-level distillation with global jet-level distillation to learn jet representations supporting downstream tasks such as anomaly detection and classification. We observe that pre-training on unlabeled jets leads to emergent clustering of semantic classes in the representation space. When the model is pre-trained on background jets only, this clustering in the frozen embedding enables anomaly detection via simple distance-based metrics, and the learned embedding can be fine-tuned for classification with improved performance compared to supervised models trained from scratch.
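To make the anomaly-detection claim concrete, below is a minimal sketch of distance-based scoring on a frozen embedding, as described in the abstract. It is not the authors' implementation: the function name, the choice of k-nearest-neighbour Euclidean distance, and the random stand-ins for pre-computed embeddings are all illustrative assumptions.

```python
import numpy as np

def knn_anomaly_scores(background_emb: np.ndarray,
                       test_emb: np.ndarray,
                       k: int = 5) -> np.ndarray:
    """Score each test jet by its mean distance to the k nearest
    background jets in the frozen representation space; larger
    scores indicate more anomalous jets.

    Hypothetical helper illustrating one simple distance-based metric;
    the paper may use a different metric or neighbourhood definition.
    """
    # Pairwise squared Euclidean distances between test and background jets.
    d2 = (np.sum(test_emb**2, axis=1, keepdims=True)
          - 2.0 * test_emb @ background_emb.T
          + np.sum(background_emb**2, axis=1))
    dists = np.sqrt(np.maximum(d2, 0.0))
    # Mean distance to the k nearest background neighbours.
    knn = np.sort(dists, axis=1)[:, :k]
    return knn.mean(axis=1)

# Usage with random placeholders for embeddings produced by a frozen,
# background-only pre-trained encoder (not real jBOT outputs):
rng = np.random.default_rng(0)
background_emb = rng.normal(size=(1000, 128))  # background-jet embeddings
test_emb = rng.normal(size=(10, 128))          # jets to score
scores = knn_anomaly_scores(background_emb, test_emb, k=5)
```

Because the encoder stays frozen, such a score needs no labels at evaluation time: jets far from the background cluster in embedding space are flagged as anomalous.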
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
