SciPost Submission Page
Scaling Laws in Jet Classification
by Joshua Batson, Yonatan Kahn
This is not the latest submitted version; this Submission thread has since been published.
Submission summary
Authors (as registered SciPost users): Yonatan Frederick Kahn

| Submission information | |
|---|---|
| Preprint Link: | scipost_202412_00008v1 (pdf) |
| Date submitted: | 2024-12-04 02:12 |
| Submitted by: | Kahn, Yonatan Frederick |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational, Phenomenological |
Abstract
We demonstrate the emergence of scaling laws in the benchmark top versus QCD jet classification problem in collider physics. Six distinct physically-motivated classifiers exhibit power-law scaling of the binary cross-entropy test loss as a function of training set size, with distinct power law indices. This result highlights the importance of comparing classifiers as a function of dataset size rather than for a fixed training set, as the optimal classifier may change considerably as the dataset is scaled up. We speculate on the interpretation of our results in terms of previous models of scaling laws observed in natural language and image datasets.
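The abstract's main message, that the optimal classifier can change as the dataset is scaled up when test losses follow power laws with different indices, can be illustrated with a small sketch. The amplitudes and indices below are hypothetical, chosen for illustration, and are not values from the paper:

```python
import numpy as np

# Illustrative sketch (not the paper's code): two classifiers whose test
# losses follow power laws L(n) = A * n**(-alpha) with different amplitudes
# and indices can swap rank as the training set size n grows.
def power_law_loss(n, A, alpha):
    return A * n ** (-alpha)

n = np.logspace(3, 7, 5)                 # training set sizes 1e3 .. 1e7
loss_a = power_law_loss(n, 0.5, 0.05)    # hypothetical classifier A: shallow slope
loss_b = power_law_loss(n, 2.0, 0.20)    # hypothetical classifier B: steeper slope

# Setting A1 * n**(-a1) = A2 * n**(-a2) gives the crossover point
# n* = (A2 / A1) ** (1 / (a2 - a1)), beyond which B overtakes A.
crossover = (2.0 / 0.5) ** (1 / (0.20 - 0.05))
print(f"crossover near n = {crossover:.0f}")
for ni, la, lb in zip(n, loss_a, loss_b):
    print(f"n={ni:10.0f}  A={la:.4f}  B={lb:.4f}  best={'A' if la < lb else 'B'}")
```

A benchmark at a single fixed training set size below (or above) the crossover would pick a different winner than a benchmark at the other end of the range, which is why the paper argues for comparing classifiers as a function of dataset size.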
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Strengths
The article discusses the occurrence of scaling laws in classification problems in particle physics. The numerical experiments show that different classifiers exhibit an approximate power-law scaling of test loss as a function of training set size with different power-law indices. This result emphasises the importance of comparing classifiers as a function of the size of the data set, rather than for a fixed training set, since the optimal classifier may change significantly as the data set increases.
The work is well presented and comprehensible even for non-particle physicists.
Weaknesses
1) The work is largely exploratory and is based on established methods and data sets. An explanation of the numerical results is missing in many cases; this is of course partly due to the complexity of the research question, but it reduces the relevance of the results.
2) The discussion of the data covariance matrix and the relevance of the corresponding eigenvalues (see Eq. (8) and Fig. 4) is difficult for non-experts to understand. It is based on A. Maloney, D. A. Roberts and J. Sully, A Solvable Model of Neural Scaling Laws (2022) (Ref [4]); I would appreciate it if the authors could expand this discussion a bit to make the paper self-contained.
Report
Although the paper is mostly exploratory and lacks an explanation of the numerical results, I find the results interesting and the main message, i.e. that classifiers should be compared as a function of the size of the data set, relevant. The paper does not meet the acceptance criteria for SciPost Physics, but I recommend the paper for publication in SciPost Physics Core.
Requested changes
1) Include a brief discussion of the relevance of the data covariance and associated eigenvalues.
Recommendation
Accept in alternative Journal (see Report)
Report #1 by Anonymous (Referee 1) on 2025-1-21 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202412_00008v1, delivered 2025-01-21, doi: 10.21468/SciPost.Report.10518
Report
The article explores scaling laws in jet classification, focusing on the discrimination between top and QCD jets. The authors present a series of classifiers utilizing various architectures and input features, highlighting that all exhibit power-law scaling — a phenomenon widely observed in numerous machine learning applications.
The discussion on power scaling behaviour is compelling. The authors’ conclusion that meaningful comparisons between different classifiers require testing across various training set sizes is particularly relevant for collider physics. However, the methods for data preprocessing and classification draw heavily on established literature, which somewhat limits the article’s originality. Additionally, the interpretation of certain results, especially in Fig. 4, lacks clarity — an issue the authors themselves acknowledge — raising doubts about the generality of the findings.
In summary, I recommend the article for publication in SciPost Physics Core, provided the minor issues listed below are addressed.
Requested changes
- The significance of the spectrum of the data-data covariance matrix needs to be explained better. How is the spectrum related to the performance of the classifiers? In this context, the meaning of the index i used as the x-axis label in the left panel of Fig. 4 should also be clarified.
- On page 12, the statement "We note that including C != 0 ... in a much poorer fit." needs to be better explained. Does "much poorer" refer to the fit with C = 0, or to a comparison with the fits in Fig. 5? Is there any explanation for the observed worse fits?
- Regarding Fig. 5: Would it be possible to run one point with an even larger training set size (e.g. 10^7) to test the prediction of the fit curves?
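The fit ansatz implied by the requested changes above, a power law with an optional constant offset, L(n) = A n^(-alpha) + C, can be compared with and without the offset in a minimal NumPy sketch. The data below are synthetic and the grid search over C is purely illustrative; this is not the authors' fitting code:

```python
import numpy as np

# Hedged sketch: model the test loss as L(n) = A * n**(-alpha) + C.
# With C = 0 the model is a straight line in log-log space and can be fit
# with np.polyfit; a nonzero offset C bends the log-log curve, and here we
# find it by a simple grid search (illustrative, not a production fitter).
def fit_power_law(n, L, C_grid=None):
    if C_grid is None:
        C_grid = [0.0]
    best = None
    for C in C_grid:
        resid = L - C
        if np.any(resid <= 0):
            continue  # log of non-positive residual is undefined; skip this C
        slope, intercept = np.polyfit(np.log(n), np.log(resid), 1)
        pred = np.exp(intercept) * n ** slope + C
        sse = np.sum((L - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, np.exp(intercept), -slope, C)
    return best  # (sse, A, alpha, C)

# Synthetic data generated from a known power law with a small offset
n = np.logspace(3, 6, 10)
L = 3.0 * n ** -0.3 + 0.05

res0 = fit_power_law(n, L)                                   # forced C = 0
resC = fit_power_law(n, L, np.linspace(0.0, 0.1, 101))       # free offset C
print(f"C = 0 fit:  alpha={res0[2]:.3f}, SSE={res0[0]:.2e}")
print(f"free C fit: alpha={resC[2]:.3f}, C={resC[3]:.3f}, SSE={resC[0]:.2e}")
```

When the true loss curve has a nonzero floor, forcing C = 0 biases the fitted index alpha, which is one reason the choice of fit form matters when extrapolating such curves to larger training sets.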
Recommendation
Ask for minor revision