
SciPost Submission Page

jBOT: Semantic Jet Representation Clustering Emerges from Self-Distillation

by Ho Fung Tsoi, Dylan Rankin

Submission summary

Authors (as registered SciPost users): Ho Fung Tsoi
Submission information
Preprint Link: scipost_202601_00045v1  (pdf)
Code repository: https://github.com/hftsoi/jbot
Date submitted: Jan. 19, 2026, 10:57 p.m.
Submitted by: Ho Fung Tsoi
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
Approaches: Experimental, Computational

Abstract

Self-supervised learning is a powerful pre-training method for learning feature representations without labels; these representations often capture generic underlying semantics of the data and can later be fine-tuned for downstream tasks. In this work, we introduce jBOT, a pre-training method based on self-distillation for jet data from the CERN Large Hadron Collider, which combines local particle-level distillation with global jet-level distillation to learn jet representations that support downstream tasks such as anomaly detection and classification. We observe that pre-training on unlabeled jets leads to emergent semantic class clustering in the representation space. When the model is pre-trained on background jets only, the clustering in the frozen embedding enables anomaly detection via simple distance-based metrics, and the learned embedding can be fine-tuned for classification with improved performance compared to supervised models trained from scratch.
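The distance-based anomaly detection mentioned in the abstract can be sketched as follows. This is a minimal illustration only: it assumes a frozen encoder has already produced jet embeddings, and the array names, centroid-distance metric, and quantile threshold are illustrative choices, not details taken from the paper.

```python
import numpy as np

# Hypothetical frozen embeddings: each row is a jet representation from a
# pre-trained encoder. Shapes and values are placeholders for illustration.
rng = np.random.default_rng(0)
background_emb = rng.normal(0.0, 1.0, size=(1000, 64))  # background-only jets
test_emb = rng.normal(0.5, 1.0, size=(10, 64))          # jets to be scored

# One simple distance-based metric: Euclidean distance to the centroid of the
# background embeddings. Larger distance => more anomalous.
centroid = background_emb.mean(axis=0)
scores = np.linalg.norm(test_emb - centroid, axis=1)

# Flag jets whose distance exceeds a high quantile of the background's own
# self-distances (threshold choice is an assumption for this sketch).
background_scores = np.linalg.norm(background_emb - centroid, axis=1)
threshold = np.quantile(background_scores, 0.99)
flags = scores > threshold
```

Other distance-based metrics (e.g. k-nearest-neighbor distance in the embedding space) would slot into the same scoring step; the key point is that no labels or further training are needed once the embedding is frozen.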

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
In refereeing
