SciPost Submission Page

Generating particle physics Lagrangians with transformers

by Yong Sheng Koay, Rikard Enberg, Stefano Moretti, Eliel Camargo-Molina

Submission summary

Authors (as registered SciPost users): Rikard Enberg · Yong Sheng Koay
Submission information
Preprint Link: scipost_202507_00009v1  (pdf)
Code repository: https://huggingface.co/JoseEliel/BART-Lagrangian
Data repository: https://huggingface.co/datasets/JoseEliel/lagrangian_generation
Date submitted: July 2, 2025, 3:27 p.m.
Submitted by: Yong Sheng Koay
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Theory
Approach: Computational

Abstract

In physics, Lagrangians provide a systematic way to describe laws governing physical systems. In the context of particle physics, they encode the interactions and behavior of the fundamental building blocks of our universe. By treating Lagrangians as complex, rule-based constructs similar to linguistic expressions, we trained a transformer model, an architecture proven to be effective in natural language tasks, to predict the Lagrangian corresponding to a given list of particles. We report on the transformer's performance in constructing Lagrangians respecting the Standard Model $\mbox{SU}(3)\times \mbox{SU}(2)\times \mbox{U}(1)$ gauge symmetries. The resulting model is shown to achieve high accuracies (over 90\%) for Lagrangians with up to six matter fields, with the capacity to generalize beyond the training distribution, albeit within architectural constraints. We show through an analysis of input embeddings that the model has internalized concepts such as group representations and conjugation operations as it learned to generate Lagrangians. We make the model and training datasets available to the community. An interactive demonstration can be found at: \url{https://huggingface.co/spaces/JoseEliel/generate-lagrangians}.
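For readers who want to try the released model programmatically, here is a minimal sketch, assuming the Hugging Face repository hosts a standard BART checkpoint and tokenizer loadable with the transformers API; the particle-list input string is hypothetical and may not match the model's actual token format (the interactive demonstration shows the exact interface).

```python
# Minimal usage sketch (not from the paper). Assumes the repository
# hosts a standard BART seq2seq model and tokenizer; the particle-list
# string below is hypothetical and may differ from the real format.
from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("JoseEliel/BART-Lagrangian")
model = BartForConditionalGeneration.from_pretrained("JoseEliel/BART-Lagrangian")

# Hypothetical input: one scalar and one fermion, each specified by its
# SU(3), SU(2) and U(1) quantum numbers.
particle_list = "SCALAR SU3:1 SU2:2 U1:1/2 ; FERMION SU3:3 SU2:1 U1:2/3"

inputs = tokenizer(particle_list, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```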

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2025-11-28 (Invited Report)

Strengths

This is a well-written paper with substantial content, including several side-studies based on the results from the main study. The authors come up with a sophisticated tokenization scheme for Lagrangians, and train a transformer to generate them. The presented research is of very high quality; the authors have covered more than enough of what could be expected to be within the scope of this topic for a single paper.

Weaknesses

  1. The authors say that this is not meant to replace current mathematical tools that can generate flawless Lagrangians, but that it should rather be seen as a first step on the way to a foundation model that can include not only theory but also experimental data. I find the discussion about this quite vague, and would have liked to see more concrete suggestions for how this would fit into such a framework. As it is, the reader is left with the impression that while this is quite an impressive study, there is no immediate plan for where to “plug it in” in a larger framework.

  2. This comment concerns what is in the main text and what is relegated to appendices. I think the textbook-style chapter on Lagrangians in particle physics is nice to have, but I would definitely suggest putting it in an appendix and having a much shorter section that only summarizes the main points needed to understand the paper. In contrast, several things that are now in the appendices could be moved to the main text. For example, I found Section 3.3 on the encoding of the contents of the Lagrangians quite confusing: it was absolutely necessary to refer to Appendix C to understand how all of this worked. Some of Appendix C could thus be moved here; in particular, I would suggest bringing in a representative example. I also think some of the discussion in Appendix E is interesting enough to deserve a place in the main text, in particular the discussion of how the model's performance in certain cases is affected by the fact that the invariance of Lagrangians under the ordering of the input was not enforced. The discussion of Yukawa interactions and the idea of fine-tuning the trained model also fit in the main text.

Report

Yes, the article meets the Journal's criteria.

Requested changes

1) Fig. 2: The plots have the same axis labels and the same legend. I suggest giving the subplots titles, so one can see directly at a glance what they show without having to read the caption.
2) Table 2: you have added a “conjugate of a singlet” for the fermion
3) I think you should explain what the ID tokens are a bit better around line 494. It is quite confusing. Perhaps it would be clearer if you spelled out what indices the ID tokens correspond to. Table 8 in the appendix makes it much clearer, but it’s annoying to have to go to the appendix.
4) Lines 512 and 536: Something weird has happened to the formatting here.
5) Are the object and contraction scores explicitly invariant with respect to the order of the terms and of the individual fields within the terms, as was discussed starting from line 554? (A sketch of what such an invariant comparison could look like is given after this list.)
6) Is S_object not included in S_Lagrangian? Then what is its purpose?
7) Table 3: Is the uniform model trained on the uniform dataset, and similarly for the sampled one? Perhaps clarify this in the paragraph starting on line 579.
8) Are the histograms in Fig. 3 not normalized? In the right plot, it looks like the orange line contains fewer Lagrangians than the blue line; why is that?
9) Fig. 6, right: please make the legend larger.
10) Fig. 7: please make the axis labels larger. It could also be good to add a line showing what “perfect” would be, so it’s easier for the eye to follow.
11) What do the blue arrows correspond to in Fig. 9? In the gender-and-royalty example, the model has learned to associate royalty with the gender duality, but I don’t see such a correspondence in the conjugation example. Rather, there it is just single dualities.
12) Fig. 10: the text in the legends needs to be much larger.
13) Appendix C: why does Table 6 come before Tables 7-11, when it is discussed after them?
14) Appendix C: I don’t understand the discussion of the double indices of Table 10. Also, it’s quite confusing for the reader to have to scroll back and forth between the text and Table 10; it would be better if the text were where the table is. I don’t see Tables 8, 9 and 11 discussed in the text.
15) Appendix D: Table 13 is mentioned before Table 12, which is a bit backwards.
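As an aside on point 5 above: a score is explicitly order-invariant if it compares the generated and reference Lagrangians as multisets of terms rather than as token sequences. The sketch below illustrates one way such a comparison could be defined; the term format and function names are hypothetical, not the authors' actual scoring code.

```python
# Hypothetical illustration of an order-invariant score (point 5 above).
# Terms are canonicalized and compared as multisets, so permuting the
# terms, or the fields within a term, cannot change the score.
from collections import Counter

def canonical_term(term: str) -> tuple:
    # Sort the fields inside a term so their order is irrelevant.
    return tuple(sorted(term.split()))

def lagrangian_score(generated: list[str], reference: list[str]) -> float:
    gen = Counter(canonical_term(t) for t in generated)
    ref = Counter(canonical_term(t) for t in reference)
    matched = sum((gen & ref).values())         # multiset intersection
    return matched / max(sum(ref.values()), 1)  # fraction of reference terms reproduced

# Reordering terms or the fields within a term leaves the score unchanged:
assert lagrangian_score(["H phi", "phi phi"], ["phi phi", "phi H"]) == 1.0
```

A score built on such canonical multisets is invariant by construction, rather than relying on the model to emit terms in any fixed order.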

Recommendation

Ask for minor revision

  • validity: top
  • significance: top
  • originality: top
  • clarity: high
  • formatting: perfect
  • grammar: perfect

Report #1 by Anonymous (Referee 1) on 2025-08-24 (Invited Report)

Strengths

I like the clear explanation of how LLM concepts are applied to a particle physics theory problem.

In particular, the out-of-domain section is clearly written and explains the method well. The quantification of abstract similarity is particularly nicely carried out.

Weaknesses

1 - While interesting in its own right, I am wondering how, in the longer term, the Lagrangian-generating AI might be applied: to my understanding it generates EFT Lagrangians, and would it be imaginable that subsequent processing determines, for instance, S-matrix elements and scattering amplitudes?

2 - Many intellectual achievements in the construction of the Standard Model of particle physics seem to be implicitly assumed in the proposed generative AI, or are beyond the scope of what the algorithm would be able to provide: could you comment on this, in particular with the following two points in mind?

2a - Many choices in the construction of Lagrangians are implicit: the restriction to squares of first derivatives in the kinetic terms disallows derivative couplings or Ostrogradsky-unstable theories, and the explicitly constructed differences between left- and right-handed fermion doublets already codify P-violation. Can you please be more explicit about what implicit or hidden assumptions are at the basis of the Lagrangian construction?

2b - Similarly, adding more particles, for instance with SU(3) symmetries, would have them co-exist with possible interactions, but there would not be a larger symmetry group accommodating them, right?

3 - There are studies on the automatic generation of Lagrange functions from data through symbolic regression. Can you please add a comment on how the two are related? To my understanding, your method looks for mathematical consistency and, tying in to my first remark, does not take direct input from experimental data?

4 - Similarly, it'd be good to emphasise that the gauge particle content and its group structure in this work are fixed, while the matter particle content is variable. On a related note: there is no concept of quark mixing, right?

5 - Would the generative AI find Lagrange densities with V-A coupling, i.e. explicit parity violation?

Report

The paper demonstrates parallels between the construction of viable Lagrange densities for application in particle physics by generative AI and language models. I like how concepts like tokenization are applied to the construction of mathematical objects. I find the investigations presented interesting, as they demonstrate well what the authors had in mind.

The decision to relegate the technical parts to the appendices is understandable, but it did not help me access the paper. It was only after working through the appendices that things became apparent. I'd argue that the appendices are vital to understanding the paper.

Requested changes

Please take account of the points listed under "Weaknesses" in the evaluation. Further points would be:

  1. I find it difficult to assess whether the number of Lagrange densities in, for instance, the training data set is large or not. Can you make a combinatorial argument for what fraction of the possible Lagrange densities is contained? (A sketch of such a counting argument is given after this list.)

  2. I've been playing with the online tool for Lagrangian generation, which I find a great addition to the paper and a prime example of accessibility. Would it be imaginable that the coupling terms are non-polynomial?

  3. Could you please explain how you quantify “hallucination” and what measures you take against it?

  4. Would the boundary between well-constructed Lagrange densities and faulty ones as a function of the number of fields (discussed in the OOD section) shift towards more fields if more fields were contained in the training data? In other words, is there anything peculiar about six fields?
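On point 1, the kind of counting argument requested might, for illustration, go as follows (a sketch assuming each input is a multiset of matter fields drawn from $n$ distinct representation choices; the value of $n$ for the released dataset is not stated here). The number of distinct particle lists with up to six fields is
$$ N(n) \;=\; \sum_{k=1}^{6} \binom{n+k-1}{k}, $$
since a list of $k$ fields chosen with repetition from $n$ types is a multiset of size $k$, counted by the multiset coefficient $\binom{n+k-1}{k}$. Comparing the number of training Lagrangians to $N(n)$ would then give the requested fraction.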

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: high
  • clarity: good
  • formatting: excellent
  • grammar: excellent
