SciPost Submission Page
Instanton Density Operator in Lattice QCD from Higher Category Theory
by Jing-Yuan Chen
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users): Jing-Yuan Chen
Submission information
Preprint Link: scipost_202407_00023v1 (pdf)
Date submitted: 2024-07-14 16:02
Submitted by: Chen, Jing-Yuan
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Approach: Theoretical
Abstract
A natural definition for the instanton density operator in lattice QCD has long been desired. We show this problem is, and has to be, resolved by higher category theory. The problem is resolved by refining, at a conceptual level, Yang-Mills theory on the lattice, in order to recover the homotopy information of the continuum, which would have been lost if we put the theory on the lattice in the traditional way. The refinement needed is a generalization---through the lens of higher category theory---of the familiar process of Villainization that captures winding in the lattice XY model and Dirac quantization in lattice Maxwell theory. The apparent difference is that Villainization is in the end described by principal bundles, hence familiar, but more general topological operators can only be captured on the lattice by more flexible structures beyond the usual group theory and fibre bundles, hence the language of categories becomes natural and necessary. The key structure we need for our particular problem is called a multiplicative bundle gerbe, based upon which we can construct suitable structures to naturally define the 2d Wess-Zumino-Witten term, 3d skyrmion density operator and 4d hedgehog defect for the lattice $S^3$ (pion vacua) non-linear sigma model, and the 3d Chern-Simons term, 4d instanton density operator and 5d Yang monopole defect for lattice $SU(N)$ Yang-Mills theory. In a broader perspective, higher category theory enables us to rethink more systematically the relation between continuum quantum field theory and lattice quantum field theory. We sketch a proposal towards a general machinery that constructs the suitably refined lattice degrees of freedom for a given non-linear sigma model or gauge theory in the continuum, realizing the desired topological operators on the lattice.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Reports on this Submission
Strengths
1. Well-written motivation.
2. Detailed review of the status of the problem.
3. Novel math tools.
Weaknesses
1. Lack of clear statement of results.
2. The validity of the proposal is not tested.
Report
A well-known problem is that the most popular lattice discretizations of gauge theories and sigma-models do not respect the topology of these models. In the continuum the space of classical configurations has several topologically distinct components, while on the lattice the space of configurations is connected. This disagreement is supposed to be resolved only after the continuum limit is taken. It would be desirable to have a lattice discretization which correctly reflects the expected topological classification before taking such a limit. For some models such discretizations exist, but the most interesting case (non-abelian Yang-Mills in 4d) remains open. The paper under review proposes a lattice discretization of 4d Yang-Mills with a microscopically defined instanton number, as well as of other related models (the sigma-model with target S^3). The proposal is quite complicated, and it is not clear to the reviewer whether it "works". I am not even sure what this would mean. The only way to test the proposal, as far as I can see, is to perform numerical simulations of the model and verify that it behaves qualitatively the same as the usual discretization of 4d Yang-Mills. Section 5 provides a sort of mathematical motivation for the proposal, but it is really hard for me to see the connection between these abstract considerations and the concrete proposal in Section 4. I would also add that the proposal itself is spread over many pages. It would be a good idea to formulate it more concisely.
Requested changes
1. Formulate the proposed discretization of 4d Yang-Mills clearly and concisely.
2. If possible, provide some tests of the proposal.
3. The mathematical part of the paper does not really help to understand the proposal. While it informs the author's thinking, it does not make the paper more convincing and just adds to its length. I suggest dropping it altogether.
Recommendation
Ask for major revision
Strengths
1- intriguing motivation
2- detailed motivation and review of background
3- potential high impact, potential high novelty
4- potentially relevant sophisticated mathematical tools identified correctly
Weaknesses
1- key definitions remain hard to parse
2- consistency checks of key definitions seem lacking
3- relation between key sections 4 and 5 remains vague
Report
+++ General statement +++
The author's motivation and broad proposal are intriguing, and success of this ambitious approach would have high impact. From the introductory sections, I was highly motivated to see the proposed solution.
However, the main Section 4 is still largely motivational in style, and the key fine print of the proposed definitions remains unclear to me after spending some time mulling over it.
I would urge the author to restate the key definitions (64) and (71) in something closer to math style, where every ingredient is explicitly declared. This concerns particularly the arguments of the key term $W_2(..)$ and $W_3(...)$, respectively. Define them, concretely.
Moreover, consistency checks are missing: after introducing all the new fields, there should be an argument that the resulting lattice models are indeed locally still suitably equivalent to the ordinary ones which they are meant to enhance. This seems far from obvious and needs an argument. (Note that a broad appeal to Elitzur's theorem advertised in the introduction has little bearing on this, as it concerns neither the higher gauge fields nor the various constraints introduced by the author.)
Section 5 is a review of sophisticated mathematics that the author plausibly argues to be necessary for coming up with these definitions, and it may serve a purpose in exposing some of this math to a math-remote pure-physics community -- but its actual relation to the proposal in Section 4 is left quite unclear.
It may be worth going for a much shorter article which just defines the proposed lattice models concretely. If the category/homotopy theory helps with establishing their intended behaviour then add that proof explicitly, but if the category/homotopy theory is just a vague motivation for the author, then leave it away and instead focus on tangible results.
+++ List of comments +++
Here is a list of random comments going linearly through the document.
general comment:
The text speaks throughout about "higher category theory", but what it really uses is just "higher groupoid theory", also known as (simplicial) homotopy theory. This conflation is common in the literature, but it may be worthwhile to beware of it. In simplicial homotopy theory, the notion of "anafunctors" is much further developed, known as maps out of cofibration resolutions.
p. 4 "theoretical appeal[s]"
p. 4: appeal to Elitzur's theorem:
However, Elitzur's theorem applies to ordinary gauge transformation, while the lattice models that the author is about to introduce have (also) higher gauge symmetries. It is maybe not a priori clear that the analog of Elitzur's theorem will still apply to these.
p. 9 "only [a] sketched"
p. 9 "that we care [about]"
p. 12 "in the below"
either: "below" or "in the following"
p. 12 "we do not say ... is constant"
the mathematical term is: it is *locally constant*
p. 13:
Figure below (7): It is hard to see from the figure on the right what is being illustrated.
p. 21 equation (18):
probably "f_c" and "s_c" should be "f_p" and "s_p"
p. 39: "at the 2d boundary between two patches"
What must be meant instead is "the 2d overlap of two patches"
p. 39, 40: "Stoke's " must be "Stokes' "
p. 41 Section 4.1
Just to note that only now, over 40 pages into the text, does the first promised definition begin to slowly emerge. This lack of conciseness not only puts a burden on the reader at this point but may also not be helpful for the author's own thought development.
At some point the motivational commentary must be set aside and an actual, concrete, detailed definition must be written down and checked to make sense.
p. 41 "To understand why..."
To understand this one may simply and immediately observe that the fiber over +1 or -1 is a singleton but the fiber over any other point is a set of two elements.
p. 42 "What this extra S^2 does... will be explained later."
Why not just say where it will be explained.
p. 42: "Note that while m_l is a two-valued label it by no means forms a Z_2 group"
The technical term for this is that m_l is an element of a *torsor* over Z_2.
And the words "by no means" are misleading: A simple means makes any torsor a group: namely the choice of any one element.
p. 42: "each patch of Y now has an open boundary"
Strictly speaking, since the components of Y are open balls (as opposed to closed balls), they do not have any boundary in the technical sense, much less an open boundary.
p. 43 "the contributions from all other continuum paths"
This does not seem to be true: there are paths that stay neither entirely in the patch SU(2)\{+1} nor entirely in the patch SU(2)\{-1}.
This is the beginning of me feeling increasingly unsure about the definition that is incrementally being sketched here.
p. 43 "pick a representative path"
It is unclear why to pick any representative paths at all.
p. 46 "it is easy to picture the following desired properties for mu"
I find the logic now hard to follow. Easier than incrementally motivating a definition would be to just state it and then discuss that it satisfies desired properties.
It looks like a definition of the argument of W_2 in (59) is being indicated. And this is indeed needed to make sense of the key claim (64) to follow. But what exactly the definition of (59) is, I am not sure from reading the text. This ought to be clarified.
p. 50 equation (64)
This seems to be the statement of the first main new proposal of the article -- it might want to be highlighted as such.
I am left wondering whether definition (64) is consistent. Apart from it reproducing the intended topological charges -- which the author has motivated but maybe not proven -- one also needs to check backwards compatibility, namely that all the new degrees of freedom added (notably the hat-n_l) do not locally change the intended dynamics of the sigma model.
This may be clear to the author, but it is far from clear to this referee at this point.
p. 54 equation (71)
This seems to be the statement of the second main new proposal of the article -- it might want to be highlighted as such.
The same comments apply as to the analogous statement (64) on p. 50 above, only more urgently so, since the actual definition of the term $W_3(...)$ is now even less clear to me than the previous $W_2$ term.
Also, reading ahead I get the impression that discussion around (126) on p. 106 is meant to be relevant here in giving this definition of (64). If so, this ought to be said. If not, it needs to be said what else (126) is about.
p. 52: "Recall in the case... patches were chosen to be invariant under conjugation..."
Better to give an explicit equation number from which the reader can specifically recall this, since it is not easy to find.
p. 53:
At this point I'd like to see a concise definition of the construction. The discussion of the ingredients has been spread out over many pages -- which may be good for motivation --- but it makes it hard to know at a glance what all the symbols mean.
Maybe one could point ahead to such a definition, around (126).
p. 53, equation (71):
The first integral sign seems to be lacking its "dg"
p. 71, item 2: This is maybe the first point where the notation "BEG" appears, which is used again many pages later (p. 106) without re-definition.
Clarification is needed for what is meant, and also for this choice of notation.
Note that usually, "EG" denotes the universal G-bundle, or else the simplicial complex
$(WG)_n = G_n \times G_{n-1} \times \cdots \times G_0$
This happens to be a simplicial group (arXiv:1204.4886), hence has a further delooping via a bi-simplicial construction, which would deserve to be called B E G. But this does not appear to be what the author is after here.
For one, EG has contractible homotopy type, and hence so does its delooping, which would make it unsuitable for the author's purpose.
p. 106, equation (126):
Best to (re-)state the definition of "Y"
an open cover of G?
a more general surjective submersion of G?
subject to which conditions?
This is crucially important now to make sense of the discussion, and leaving it unclear casts doubt on the whole edifice.
Requested changes
1- state precisely the actual definitions of the lattice models
2- add consistency checks that these definitions are indeed backwards compatible with the ordinary ones.
3- clarify the claimed relation of the lattice models to homotopy theory and gerbes
Recommendation
Ask for major revision
Author: Jing-Yuan Chen on 2025-02-18 [id 5228]
(in reply to Report 1 on 2024-08-28)
I would like to thank Referee 2 for their recognition of the originality and the potential impact of this work. Regarding their major concerns of weakness:
1 & 2: In the present paper, the construction of the new models, especially that for the refined Yang-Mills theory in Section 4.2, was indeed only a sketch. In the original version, I said more details would be contained in a follow-up paper. The follow-up paper is now completed as arXiv:2411.07195, which will be cited in the updated version of the current paper. The follow-up paper is self-contained for an audience who only cares about the model and its physical intuition but not the overall big picture and mathematics, complementary to the present paper.
The reason why I decided to present the details in a separate paper is that the detailed construction itself is highly technical and takes tens of pages to explain. On the other hand, this is just one particular construction---there is no single "canonical construction"; only the principles are crucial. My prospect is that in the future, numerics will help us find better optimized constructions, different from the one described in 2411.07195, but still following the same principles. So I decided that the present paper should focus on the principles, while the highly technical details of one particular construction are left to a separate paper, 2411.07195.
3 & 2: I will update the version so that at the beginning of Section 5, the relation to Section 4 is more clearly stated: The category theory description is not only motivational; more crucially, we kept saying our motivation is to ``capture the $\pi_3$ physics into the refined model'', and what that really means can be given a precise mathematical meaning. Having such a precise mathematical principle for ``refining a lattice model to capture the topology'', I believe, is very important for the development of the field. In the updated version, I will split the original Section 5.4 into 5.4 and 5.5, and in the new Section 5.5 state a more exact mathematical principle for what this ``refinement that captures topology onto lattice'' means.
Before further replying to the detailed comments, I would also like to respond to the general comment that the writing of this paper is in a largely motivational style. This is indeed true and intentional. A relevant reality in the community of theoretical physics is, while some groups of people are excited about category theory, some other groups are skeptical about its usefulness in the so-called "real physics" (in the sense of describing the real physical world). Since this work is to solve a physics problem using category theory in a necessary manner, I believe it is important to convince a broad audience that the appearance of category theory here is natural, not being anything fancy or artificial. More particularly, there are ideas that used to be familiar to different groups: lattice QCD, condensed matter, TQFT and homotopy theory, and the motivational style serves the purpose of illustrating that if one digs deeper into these existing ideas, they really do come to a confluence in terms of category theory, leading to a natural and useful picture. I hope this motivational style can help with building more consensus in our community. And, finally, again, a detailed technical description is separately presented in the follow-up 2411.07195, serving the complementary purpose.
Now I reply to the more detailed comments:
-- Typos will be corrected and various unclear or inaccurate explanations will be improved per the referee's suggestions. I would like to thank the referee for pointing them out.
-- Indeed, the higher categories involved are higher (simplicial) groupoids that appear in homotopy theory, since homotopy is indeed what we want to capture in physics. Should we use the general term "anafunctor", or more field-specific names such as bibundle, H-S morphism, (and as the referee brings up) cofibration resolution, etc.? My inclination is to use the more general name, because I suppose category theory might be new to a certain proportion of the audience, and I want to show the natural progression of function-->functor-->anafunctor as generally useful concepts without bringing up too much specific knowledge which might create barriers for readers. In the end, unifying seemingly different concepts in different branches of mathematics is what category theory is good for. That said, in a footnote, I did mention that anafunctor in specific fields has specific names; in the update I will add "cofibration resolution" brought up by the referee, since it is particularly relevant.
-- Those (generalized) higher gauge redundancies in this paper are actually dealt with in a gauge-fixed manner. In the Villain model, we discussed why we can fix e.g. $\theta_v\in (-\pi, \pi]$ instead of $\mathbb{R}$, in the spinon decomposition we lifted $S^2$ to $SU(2)$ in a certain gauge convention based on the north pole, and likewise for our new constructions. In principle one could also use an arbitrary gauge, and the redundancy will be manifestly local. But in this paper we just fixed the (generalized) higher gauge, and each time the independence of the choice of gauge has been explained in the text.
-- The m_l label in the refined non-linear sigma model is not only not a $Z_2$ group, it is not even naturally a $Z_2$ torsor, because the $W_1$ weights for the two choices of $m_l$ are not symmetric under exchange, i.e. there is no natural $Z_2$ symmetry acting on these two labels. Also, topologically, one could in principle have used more patches, which obviously then does not admit a $Z_2$ action. I am not aware of mathematical literature referring to the choice of patches that we described as naturally forming a torsor.
-- When considering "the contributions from all other continuum paths", it is unnecessary for the entirety of the path to stay within one patch. This is just a schematic way of understanding what the $m_l$ label represents, just like in Kadanoff's block spin procedure there is some choice for what a renormalized spin represents.
-- I thank the referee for stressing this important problem: do the new d.o.f. added along with the new weights back-react on the dynamics and ruin the ``original theory''?
The same question could have been asked of the Villain model (after the vortex fugacity weight is introduced), so in the update I will add a paragraph elaborating on this in Section 2.1, and re-emphasize this point in the later sections. In the Villain model, which has been well-studied for decades, it is well-known that adding a suitable vortex fugacity term weighing the new d.o.f. actually facilitates the renormalization convergence of the theory, rather than ruining the theory (see e.g. the cited Kogut's review paper). The physical idea is that, even if one does not include those higher d.o.f. and higher weights to begin with, along the coarse-graining procedure, those effects will be effectively generated anyways (since those higher d.o.f. are to capture how the fields interpolate, which is reminiscent of the d.o.f. on a finer lattice), so having those higher d.o.f. and weights in the action actually helps keep track of such effects along the renormalization flow. This has been analyzed in the Villain model. Since the new constructions are based on the same physical idea, it is physically intuitive to expect the same advantage (although the reasonable range of choices for the new weights should be determined numerically). Discussions along this line appeared in the final section in the original version, but in the updated version I will have the discussion throughout the text.
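For concreteness, a schematic form of the Villain XY model with a vortex fugacity term, as it appears in the standard literature (this is my paraphrase for the reader's convenience, not an equation quoted from the paper; $\lambda$ denotes the fugacity-like coupling weighing the vortices), reads:

```latex
% 2d Villain XY model with vortex fugacity (schematic, standard form).
% \theta_v \in (-\pi,\pi] lives on vertices, m_l \in \mathbb{Z} on links;
% the vortex number on a plaquette p is the lattice curl v_p = (dm)_p.
Z \;=\; \prod_v \int_{-\pi}^{\pi} \frac{d\theta_v}{2\pi}
        \sum_{\{m_l \in \mathbb{Z}\}}
        \exp\!\Big[ -\frac{\beta}{2} \sum_l \big( (d\theta)_l - 2\pi m_l \big)^2
                    \;-\; \lambda \sum_p \big( (dm)_p \big)^2 \Big]
```

The $\lambda$ term weighs the new integer d.o.f. without breaking the local structure of the model, which is the sense in which a suitably chosen fugacity can aid, rather than ruin, the renormalization convergence.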
-- I thank the referee for bringing up the important issue that, given that EX and (when X=|G|) its delooping BEG are naturally equivalent to the trivial category, why the theories can still have non-trivial topology.
In the original version (mainly Section 5.2), I have explained the physical reason for this: The path integral weight in a dynamical QFT does not respect natural transformations, and this is a crucial difference from TQFT. So one cannot conclude the triviality of the theory just based on the triviality of EX or BEG.
While this explanation is correct, I myself also found this not satisfactory enough, both in terms of the way it was written and in terms of the sharpness of the statement. So in the updated version I will greatly improve the explanation on this point:
First, I will move certain discussions in the original Section 5.2 to Section 5.1 where EX and BEG were first introduced, emphasizing the physical reason mentioned above, and emphasizing the crucial difference between dynamical QFT vs TQFT (where the category theory foundation of the latter is much more familiar to certain groups of readers).
Second, in the new Section 5.5 (which used to be the second half of the original Section 5.4), I will make it clear that while EX is trivial, what we are looking at is the inclusion of X into EX, because the path integral weight is able to distinguish a locally constant configuration from a non-constant one; likewise, while BEG is trivial, what we are looking at is the inclusion of BG into BEG, because the path integral weight is able to discern a flat connection from a non-flat one. This will lead to relative cohomology classification. This is a substantial improvement in the scientific content of the updated version.
Author: Jing-Yuan Chen on 2025-02-18 [id 5230]
(in reply to Report 2 on 2024-10-04)
The main purpose of this work, as the referee points out, is indeed that the lattice discretization should correctly reflect the expected topological classification before taking a continuum limit---this serves both the fundamental purpose (since we want a lattice QFT with good enough properties) and the practical purpose (since in practice there is always a finite lattice length) of lattice QFT.
The first main concern of the referee is that the model construction is not explicit enough in this paper. Indeed, the purpose of this paper is to introduce the big picture, the principles and the mathematics required to solve this problem. A follow-up work, which was mentioned in the original version, has now appeared as arXiv:2411.07195, in which a more technical explicit construction is given. The follow-up paper is self-contained for an audience who only cares about the model and its physical intuition but not the overall big picture and mathematics, complementary to the present paper. Perhaps the follow-up paper 2411.07195 is closer to what the referee has in mind, according to the report.
I will then take this opportunity to explain why I decided to have the present paper and the follow-up paper 2411.07195 serve such complementary roles. First of all, the detailed model construction itself is highly technical and takes tens of pages to explain. On the other hand, this is just one particular construction---there is no single "canonical construction"; only the principles (see below) are crucial. My prospect is that in the future, numerics will help us find better optimized constructions, different from the one described in 2411.07195, but still following the same principles. So I decided that it is important to have a first paper (i.e. the present paper) stating the principles (see below), which are correct in a model-independent manner, and sketching the model construction, and then have a follow-up paper (i.e. 2411.07195) presenting the highly technical details of one particular model construction.
A second key question brought up by the referee is, what does it mean for our proposal "to work"? There are two layers of requirements for our proposal "to work". Along the way we will respond to the referee's third question, on why the mathematical Sections 5&6 must not be dropped.
The first layer is to serve our motivation of (in the referee's terms) "having a lattice discretization which correctly reflects the expected topological classification before taking the continuum limit". Thus, we need to make it precise what it means for a lattice theory to "correctly reflect" the topology.
In the past, this used to be only a matter of intuition. There has been no precise principle for this (except for TQFT, but here we are working with dynamical QFT). One major contribution of the present paper is to uncover a precise mathematical principle for what this really means---this is indeed why we wrote the mathematical Sections 5 & 6, and why these sections must not be dropped. (On the other hand, in the complementary paper 2411.07195, whose goal is to introduce a detailed construction, there is no discussion of these mathematics but only the physical intuition behind them. So this is along the lines of what the referee suggests.)
In the updated version of the present paper, I will state it clearly at the beginning of Section 5 that the mathematics to be introduced is not only motivational; it serves the purpose of uncovering the precise meaning of ``having the lattice theory correctly reflect the topology'', hence indeed making the paper convincing. In particular, in the new Section 5.5 of the updated version (which used to be the second half of the original Section 5.4), I will give a sharpened mathematical statement for what it means to ``correctly capture the topology'', and explain why the construction in Section 4 (and in 2411.07195 in more detail) indeed fulfills this principle by design. This explains the first layer of "the theory works"---that the construction indeed correctly captures the topology.
The second layer is that the dynamics of the lattice theory, in particular including the dynamics of the topological configurations, should be right as we approach the continuum limit. Surely this ultimately can only be tested numerically, which we have not yet done, but we have some good reasons to argue that this is highly likely the case:
We introduced some new degrees of freedom and some new weights. If the new weights are chosen in certain limits, then integrating out the new d.o.f. will just reduce the theory back to the traditional Wilson's definition. So first of all, our refinement at least would not make things worse.
On the contrary, it is likely that suitably chosen new weights will make the renormalization convergence of the theory better. In the Villain model, which has been well-studied for decades, it is well-known that adding a suitable vortex fugacity term weighing the new d.o.f. actually facilitates the renormalization convergence of the theory, rather than ruining the theory (see e.g. the cited Kogut's review paper). The physical idea is that, even if one does not include those higher d.o.f. and higher weights to begin with, along the coarse-graining procedure, those effects will be effectively generated anyways (since those higher d.o.f. are to capture how the fields interpolate, which is reminiscent of the d.o.f. on a finer lattice), so having those higher d.o.f. and weights in the action actually helps keep track of such effects along the renormalization flow. This has been analyzed in the Villain model. Since the new constructions are based on the same physical idea, it is physically intuitive to expect the same advantage (although the reasonable range of choices for the new weights should be determined numerically). Discussions along this line appeared in the final section in the original version, but in the updated version I will present the discussions more clearly throughout the text.
Therefore, this second layer of the meaning of "the theory works" is an educated expectation, which awaits numerical checks as a next step (following our follow-up explicit construction paper 2411.07195). I believe, nonetheless, it is good to write theoretical papers introducing the important new concepts before the numerical implementation is carried out. As far as I am aware, Wilson's original 1974 lattice gauge theory paper was motivated by the formal purpose of making good sense of what a QFT really means; only a few years later did Creutz implement it numerically and find it to be numerically useful. Also, Luscher's 1982 geometrical construction of the instanton was a theoretical proposal which was only implemented a few years later. I hope this can justify that the present theoretical paper, which introduced the mathematical principles of "capturing topology onto lattice" (which are novel and physically intuitive), is of important value in its own right.