SciPost Submission Page
Scalable hybrid quantum Monte Carlo simulation of U(1) gauge field coupled to fermions on GPU
by Kexin Feng, Chuang Chen, Zi Yang Meng
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Kexin Feng |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2508.16298v3 (pdf) |
| Code repository: | https://github.com/KexinFeng/qed_fermion |
| Date submitted: | Nov. 7, 2025, 4:26 a.m. |
| Submitted by: | Kexin Feng |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational |
Abstract
We develop a GPU-accelerated hybrid quantum Monte Carlo (QMC) algorithm to solve the fundamental yet difficult problem of a $U(1)$ gauge field coupled to fermions, which gives rise to a $U(1)$ Dirac spin liquid state described by (2+1)d quantum electrodynamics (QED$_3$). The algorithm yields a good acceptance rate and, more importantly, nearly linear space-time volume scaling in computational complexity, $O(N_{\tau} V_s)$, where $N_\tau$ is the imaginary-time dimension and $V_s$ is the spatial volume; this is much more efficient than determinant QMC, which scales as $O(N_\tau V_s^3)$. The acceleration is achieved via a collection of technical improvements, including (i) the design of an efficient problem-specific preconditioner, (ii) a customized CUDA kernel for matrix-vector multiplication, and (iii) a CUDA Graph implementation on the GPU. These advances allow us to simulate the $U(1)$ Dirac spin liquid state at unprecedentedly large system sizes, up to $N_\tau\times L\times L = 660\times66\times66$, and to reveal its novel properties. With these technical improvements, we observe asymptotic convergence in the scaling dimensions of various fermion bilinear operators and of the conserved current operator when approaching the thermodynamic limit. The scaling dimensions are in good agreement with field-theoretical expectations, which provides supporting evidence for the conformal nature of the $U(1)$ Dirac spin liquid state in QED$_3$. Our technical advancements open an avenue to study the Dirac spin liquid state and its transitions towards symmetry-breaking phases at larger system sizes and with less computational burden.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Report
The paper uses hybrid quantum Monte Carlo (HQMC) simulations to study lattice quantum electrodynamics in 2+1 space-time dimensions (QED3). Several optimization techniques adapted to GPU clusters are employed, including problem-specific preconditioning, customized CUDA kernels, and CUDA graph acceleration. These techniques allow the authors to reach unprecedented system sizes up to $N_τ \times L^2 = 660 \times 66^2$. From the decay of the spin-spin and bond-bond correlation functions, the scaling dimension of the adjoint fermion bilinear in the field theory is extracted and is shown to be consistent with previous predictions based on the 4-ε expansion. Further, the decay of the flux-flux correlation function is shown to be consistent with the scaling dimension of a conserved current, as expected. These results demonstrate the existence of a stable conformal ground state of QED3.
Both the methodological progress and the results for the correlation functions represent important and significant advances that are of interest to a broad audience and warrant publication in SciPost Physics.
To further improve the paper, the authors may consider the following optional suggestions:
(1) It may be useful to explicitly define the fermion flavor number Nf mentioned on p.4. I assume Nf counts the number of four-component Dirac fermion flavors, with Nf = 2 corresponding to two spin-degenerate Dirac cones in the first Brillouin zone. Is it correct?
(2) Wording:
- "In this paper, we focus on a dubbed non-compact non-compact U(1) gauge field ..." (p.2): word doubling
- "systematic results of such interesting phase and rich diagram" (p.6): confusing, maybe adapt word ordering?
Recommendation
Publish (surpasses expectations and criteria for this Journal; among top 10%)
Report #2 by Anonymous (Referee 2) on 2026-1-4 (Invited Report)
The referee discloses that the following generative AI tools have been used in the preparation of this report:
Improved the wording and grammar of the report.
Strengths
1- Demonstrates a major computational breakthrough by reducing QMC complexity to near-linear scaling through GPU-accelerated hybrid Monte Carlo.
2- Provides clear physical validation of U(1) Dirac spin liquid behavior via consistent scaling dimensions across spin, bond, and flux correlators.
Weaknesses
1- Algorithmic performance still relies on careful parameter tuning (e.g., Δτ constraints and preconditioner validity), which may limit portability to other models.
2- Ergodicity risks in determinant-sign sectors are noted but not fully resolved in all coupling regimes, meaning broader applicability requires caution.
Report
Requested changes
1- It could be useful to provide quantitative comparison tables summarizing performance vs. DQMC across sizes and hardware to clarify cost-benefit scaling.
2- Given the technical and methodological advance, it would be helpful for the community if the authors share access to the repository.
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)
We thank the referee for their assessment of the work and their valuable questions. We will respond to the comments below and make the corresponding changes in the revised manuscript.
**Comment 1**: It could be useful to provide quantitative comparison tables summarizing performance vs. DQMC across sizes and hardware to clarify cost-benefit scaling.

**Reply 1**: The quantitative comparison between the HQMC and DQMC algorithms is in fact already shown in Fig. 2(c). The time complexity of the HQMC algorithm indeed displays near-linear scaling with respect to the space-time volume, i.e. $T \sim O(L^3)$, while the time complexity of DQMC scales as $T \sim O(L^7)$. This result shows that the HQMC simulation can be scaled to lattice sizes that DQMC cannot reach, which is already a clear advantage. The difference in time complexity is mainly algorithmic, so the result is agnostic to hardware.
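To illustrate how such scaling exponents can be read off from timing data, the sketch below fits the slope of wall-clock time versus $L$ on a log-log scale. All numbers are stand-in, hypothetical timings (not the measurements behind Fig. 2(c)); only the fitting procedure is the point.

```python
import numpy as np

# Hypothetical wall-clock times (seconds per sweep) at several linear sizes L.
# These arrays are illustrative placeholders, not data from the paper.
L = np.array([12, 20, 28, 36, 48, 60], dtype=float)
t_hqmc = 2.0e-4 * L**3.1   # stand-in for HQMC timings, roughly O(L^3)
t_dqmc = 5.0e-9 * L**7.0   # stand-in for DQMC timings, roughly O(L^7)

def scaling_exponent(L, t):
    """Least-squares slope of log t vs log L, i.e. the exponent p in t ~ L^p."""
    p, _ = np.polyfit(np.log(L), np.log(t), 1)
    return p

print(scaling_exponent(L, t_hqmc))  # recovers 3.1 for this exact power law
print(scaling_exponent(L, t_dqmc))  # recovers 7.0 for this exact power law
```

The gap between the two fitted exponents is what makes the cost-benefit comparison hardware-independent: any constant prefactor from a different machine shifts the intercept, not the slope.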
**Comment 2**: Given the technical and methodological advance, it would be helpful for the community if the authors share access to the repository.
**Reply 2**: Agreed. The repository was actually public when the manuscript was submitted.
Strengths
(ii) Original and innovative methodological developments enabling record large simulations
(iii) Results substantially improve our understanding of the fate of U(1) gauge field coupled to fermions
Weaknesses
(ii) Complementary and additional quantities need to be analyzed to conclusively establish the obtained results
Report
The manuscript is well-written and I would recommend its publication in SciPost Physics after the authors have answered the following concerns/questions and addressed these issues in an appropriately revised manuscript:
(i) The manuscript would benefit from quantitative measurements of autocorrelation times for gauge fields, fermionic bilinears, and energy. Could the authors provide τ_L or acceptance-rate scaling with L, and comment on whether critical slowing down is suppressed or only reduced?
(ii) The authors fix N_τ = 10 L (thus β = L).
Questions: Why is this particular scaling chosen?
Has the dependence on Δτ been checked?
Are the power-law exponents stable as Δτ → 0?
Could the authors provide a small-Δτ extrapolation or a consistency check?
(iii) While Fig. 2(b) shows small-L comparisons, no quantitative error analysis is given.
Could the authors provide explicit differences between HQMC and exact/ED or DQMC observables (e.g., energies, correlation functions) to demonstrate correctness and absence of systematic biases?
(iv) The scaling fits are done on a small number of large-distance points due to noise limitations.
Can the authors quantify the fitting range and systematic drift of exponents as the fitting window is varied? Are the statistical uncertainties (including covariance between data points) properly propagated?
(v) Since DSL/QED3 physics emerges only in the infrared, the accessible L may still be in a crossover regime. Are there indicators (e.g., scaling collapse, monotonic drift of exponents with 1/L) that the simulations have indeed reached the asymptotic scaling regime?
(vi) The paper interprets matching exponents for spin and bond operators as evidence for emergent SU(4). Can the authors provide a quantitative comparison (e.g., exponent difference Δ_spin − Δ_bond with uncertainty)? Is the agreement within error bars expected theoretically at these lattice sizes? Are there other operators (e.g., adjoint vs singlet channels) that could strengthen the SU(4) conclusion?
(vii) The τ⁻⁴ behaviour is associated with conserved current scaling Δ_J = 2. Can the authors distinguish τ⁻⁴ from possible τ⁻³ (free photon) or τ⁻⁵ behaviors within error bars? Are the results sensitive to gauge coupling J or to anisotropy in temporal vs spatial discretization?
Requested changes
Please see points raised in the Report
Recommendation
Publish (surpasses expectations and criteria for this Journal; among top 10%)
We would like to thank the referee for their positive assessment of our work. We will respond to the comments below and make the corresponding changes in the revised manuscript.
Response to Referee 1
We thank the referee for the careful reading and valuable suggestions, which have greatly helped us in the revision of our manuscript.
**Comment 1**: The manuscript would benefit from quantitative measurements of autocorrelation times for gauge fields, fermionic bilinears, and energy. Could the authors provide $\tau_L$ or acceptance-rate scaling with $L$, and comment on whether critical slowing down is suppressed or only reduced?
**Reply 1**: Yes, we have made new plots showing the autocorrelation data of $S_\tau$, the time-derivative energy, and $S_{\text{plaq}}$, the plaquette energy. By exponential fitting, we extracted the autocorrelation time $\tau_L$ for each lattice size, which gives dynamical critical exponents of $z \approx 1.34$ and $z \approx 1.96$, respectively, for lattice sizes up to $L = 60$ ($N_\tau = 10 L$). The critical slowing down is thus reduced rather than fully suppressed, compared with DQMC applied to a Holstein model at its critical point, where $z \approx 5.1$ [Physical Review B 98, 041102 (2018)]. This is due to the advantage of the global update scheme in HQMC over the local update scheme in DQMC.

These plots and the analysis will be appended as a new section of the supplementary material. We did not plot the autocorrelation of the fermion bilinear correlations, since we found that the time series of this observable decorrelates from the very beginning and, as a result, does not show significant autocorrelation.
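For concreteness, the procedure described above (normalized autocorrelation function followed by an exponential fit for $\tau_L$) can be sketched as below. The input is a synthetic AR(1) time series with a known autocorrelation time, not simulation data; the function names and the cutoff `tmax` are illustrative choices.

```python
import numpy as np

def autocorr(x):
    """Normalized autocorrelation function C(t) of a 1D time series (FFT-based)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    f = np.fft.rfft(x, 2 * n)               # zero-pad to avoid circular wrap-around
    c = np.fft.irfft(f * np.conj(f))[:n]
    return c / c[0]

def tau_exp(x, tmax=30):
    """Exponential autocorrelation time from a log-linear fit of C(t) ~ exp(-t/tau)."""
    c = autocorr(x)[:tmax]
    t = np.arange(tmax)
    mask = c > 0                             # keep only positive values for the log fit
    slope, _ = np.polyfit(t[mask], np.log(c[mask]), 1)
    return -1.0 / slope

# Synthetic AR(1) chain x[i] = 0.9 x[i-1] + noise, whose exact autocorrelation
# time is -1/log(0.9) ~ 9.5; used here only to illustrate the extraction.
rng = np.random.default_rng(0)
x = np.zeros(200_000)
for i in range(1, len(x)):
    x[i] = 0.9 * x[i - 1] + rng.normal()
print(tau_exp(x))
```

The dynamical exponent $z$ then follows from a second log-log fit of $\tau_L$ against $L$ across the available lattice sizes.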
**Comment 2**: The authors fix $N_\tau = 10 L$ (thus $\beta = L$). Questions: Why is this particular scaling chosen? Has the dependence on $\Delta\tau$ been checked? Are the power-law exponents stable as $\Delta\tau \to 0$? Could the authors provide a small-$\Delta\tau$ extrapolation or a consistency check?
**Reply 2**: It is true that the aspect ratio $N_\tau = 10 L$ is used throughout the paper, as mentioned in the "Model" section of the main text. The reason is mainly the requirement that the Trotter decomposition step $\Delta\tau$ be small. We chose $\Delta\tau = 0.1$: a larger value would introduce a systematic Trotter error, and, from a numerical perspective, the CG preconditioner fails for $\Delta\tau > 0.12$, as mentioned in the "CG method and preconditioner" section. On the other hand, too small a $\Delta\tau$ increases the space-time lattice size and, most importantly, makes the space-time lattice more skewed, which significantly increases the difficulty of the simulation. As a tradeoff we chose $\Delta\tau = 0.1$. Then $N_\tau = \beta/\Delta\tau = 10 \beta$, where $\beta$ is the inverse temperature. Since $\beta$ goes to infinity in the thermodynamic limit in the same way as $L$, we set $\beta = L$, hence $N_\tau = 10 L$.
On the other hand, varying $\Delta\tau$ only changes the aspect ratio of $N_\tau$ versus $L$. We eventually perform a thermodynamic-limit extrapolation, which assumes that differences in the aspect ratio of the space-time lattice are eliminated in the thermodynamic limit. Note that such differences arise not only from the choice of $\Delta\tau$ but also from the choices of $\beta$, $L_x$, and $L_y$. We therefore rely on the quality of the extrapolation to assess the choice of the series of finite-size lattices.
Based on the thermodynamic extrapolation analysis, the fermion bilinear correlations $C^{\uparrow\downarrow}_S(r,0)$ and $C_B(r,0)$ converge well, with fair extrapolation quality. The extrapolation quality of the flux dynamic correlation $C_{\text{flux}}(\tau)$ is not as good and displays finite-size oscillations, which indicates that the simulation needs to be pushed to larger lattice sizes for a more accurate extrapolation.

Yes, we will add the extrapolation analysis mentioned above to the main text. Note that the Trotter decomposition step $\Delta\tau$ is held fixed in this process; we extrapolate $1/L \to 0$, which implies $N_\tau = \beta/\Delta\tau \to \infty$.
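The $1/L \to 0$ extrapolation referred to above can be sketched as a weighted linear fit of a finite-size exponent against $1/L$. The values and error bars below are illustrative placeholders, not results from the manuscript; only the fitting recipe is meant.

```python
import numpy as np

# Hypothetical finite-size exponent estimates Delta(L) with uncertainties,
# illustrating a linear extrapolation Delta(L) = Delta_inf + a/L.
L = np.array([20, 28, 36, 48, 60], dtype=float)
delta = np.array([1.42, 1.38, 1.35, 1.33, 1.32])   # stand-in values
sigma = np.array([0.03, 0.03, 0.02, 0.02, 0.02])   # stand-in error bars

# Weighted least-squares fit in the variable 1/L; with cov="unscaled" the
# covariance is taken at face value from the supplied per-point errors.
coeffs, cov = np.polyfit(1.0 / L, delta, 1, w=1.0 / sigma, cov="unscaled")
a, delta_inf = coeffs
err_inf = np.sqrt(cov[1, 1])
print(delta_inf, err_inf)   # intercept = thermodynamic-limit estimate
```

The same recipe applies to each exponent separately ($\Delta_{\text{spin}}$, $\Delta_{\text{bond}}$, $\Delta_J$); a poor fit quality, as for the flux channel, then shows up directly as a large intercept uncertainty.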
**Comment 3**: While Fig. 2(b) shows small-$L$ comparisons, no quantitative error analysis is given. Could the authors provide explicit differences between HQMC and exact/ED or DQMC observables (e.g., energies, correlation functions) to demonstrate correctness and absence of systematic biases?
**Reply 3**: The quantitative error analysis was in fact given, shown as the error bars on both the HQMC and DQMC results, but they were probably too small to be discernible in the previous figures. We will add a zoomed-in view of the comparison between the HQMC and DQMC results in Fig. 3(a), which shows that the two agree within error bars.
**Comment 4**: The scaling fits are done on a small number of large-distance points due to noise limitations. Can the authors quantify the fitting range and systematic drift of exponents as the fitting window is varied? Are the statistical uncertainties (including covariance between data points) properly propagated?
**Reply 4**: Indeed, the number of effective data points of the fermion bilinear correlations $C^{\uparrow\downarrow}_S(r,0)$ and $C_B(r,0)$ is small. The fitting range therefore starts from the first data point ($r = 1$) and extends up to the noise level. Because of the scarcity of effective data points, we did not vary the fitting window systematically. Moreover, the fermion bilinear correlation data do not show a clear downward curvature like the $C_{\text{flux}}(\tau)$ data in Fig. 3(c). We therefore take the whole window of effective data points for the fit, while the variation resulting from different choices of the fitting window is counted towards the error of the fitted slopes. These error bars on the slopes will be shown in the extrapolation plot, to be added in the updated version.
Yes, in the analysis of the error bars of the slopes as well as the analysis of the thermodynamic limit extrapolation, the statistical uncertainties (including covariance between data points) are properly propagated. This will be added in the updated version.
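One standard way to propagate a full data covariance into the fitted exponent is generalized least squares on the log-log form of the correlator, $\log C(r) = \log A - 2\Delta \log r$. The sketch below uses an exact toy power law with an assumed diagonal covariance purely to show the mechanics; none of the numbers come from the manuscript.

```python
import numpy as np

# Toy correlator C(r) = A r^{-2 Delta} with 2*Delta = 2.6 (illustrative only),
# and an assumed covariance matrix of the C(r) measurements.
r = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
C = 0.5 * r**-2.6
cov_C = np.diag((0.02 * C) ** 2)          # toy: 2% errors, uncorrelated here

# Propagate the covariance to y = log C (first-order: cov_y = cov_C / (C_i C_j)),
# then solve the GLS normal equations beta = (X^T W X)^{-1} X^T W y.
y = np.log(C)
cov_y = cov_C / np.outer(C, C)
X = np.column_stack([np.ones_like(r), np.log(r)])
W = np.linalg.inv(cov_y)
cov_beta = np.linalg.inv(X.T @ W @ X)
beta = cov_beta @ X.T @ W @ y

two_delta = -beta[1]
err = np.sqrt(cov_beta[1, 1])
print(two_delta, err)   # recovers 2.6 for this noiseless toy input
```

With a non-diagonal `cov_C` (e.g. estimated by jackknife over bins), the same formulas automatically account for correlations between distances, which is the propagation referred to above.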
**Comment 5**: Since DSL/QED3 physics emerges only in the infrared, the accessible $L$ may still be in a crossover regime. Are there indicators (e.g., scaling collapse, monotonic drift of exponents with $1/L$) that the simulations have indeed reached the asymptotic scaling regime?
**Reply 5**: Whether or not the asymptotic scaling regime is reached is mainly checked by the extrapolation process from finite-size lattices to thermodynamic limit. In the current version of Fig. 3, we have seen that the series of curves of the fermion bilinear correlations $C^{\uparrow\downarrow}_S(r,0)$ and $C_B(r,0)$ fairly converge near $L = 60$, which indicates that the lattice size is sufficiently large. However, the series of the $C_{\text{flux}}(\tau)$ curves still display clear oscillation, which indicates that the simulation might need to be pushed to larger lattice sizes.
In the updated version, we will add the extrapolation plots, which is another indicator to estimate whether the asymptotic scaling regime is reached. The extrapolation analysis will show the asymptotic behavior more clearly and give similar conclusions to the ones above.
**Comment 6**: The paper interprets matching exponents for spin and bond operators as evidence for emergent SU(4). Can the authors provide a quantitative comparison (e.g., exponent difference $\Delta_{\text{spin}} - \Delta_{\text{bond}}$ with uncertainty)? Is the agreement within error bars expected theoretically at these lattice sizes? Are there other operators (e.g., adjoint vs singlet channels) that could strengthen the SU(4) conclusion?
**Reply 6**: Yes, as mentioned above, in the updated version we have added a quantitative analysis of $\Delta_{\text{spin}}$ and $\Delta_{\text{bond}}$, including the thermodynamic extrapolation and systematic error estimation, and they do agree with each other within the estimated error bars. We did not analyze the difference $\Delta_{\text{spin}} - \Delta_{\text{bond}}$ directly, since this quantity is a difference of two noisy estimates, which would make the result noisier and harder to extrapolate. In the extrapolation process we therefore work with $\Delta_{\text{spin}}$ and $\Delta_{\text{bond}}$ separately and obtain their respective extrapolated values along with error bars.
Another operator that in theory should have the same scaling dimension $\Delta_{\text{adj}}$ is the dimer operator, numerically studied in Physical Review X 9, 021022 (2019), which could strengthen the SU(4) conclusion. However, we did not show the dimer-dimer correlation, because the stochastic estimation of this eight-fermion correlation function is too costly: the cost of the stochastic estimation increases significantly with the number of fermion operators in the correlator. This computation is left for future studies at larger simulation scale, for example with multiple GPUs.
**Comment 7**: The $\tau^{-4}$ behaviour is associated with conserved current scaling $\Delta_J = 2$. Can the authors distinguish $\tau^{-4}$ from possible $\tau^{-3}$ (free photon) or $\tau^{-5}$ behaviors within error bars? Are the results sensitive to gauge coupling $J$ or to anisotropy in temporal vs spatial discretization?
**Reply 7**: We have updated Fig. 3 and added the extrapolation analysis of $\Delta_J$ in the $1/L \to 0$ limit, which gives $2\Delta_{J,\infty} = 3.8 \pm 0.7$. With this error, the result distinguishes $2\Delta_J = 4.0$ from $2\Delta_J = 3.0$ and $2\Delta_J = 5.0$, though only marginally. As mentioned above and in the updated main text, the convergence of $\Delta_J$ with increasing $L$ is not as clear as that of $\Delta_{\text{spin}}$ or $\Delta_{\text{bond}}$: it shows clear oscillatory behavior at finite sizes, and as a result the error bar is large. This indicates that the current simulation sizes may not have reached the asymptotic regime and need to be pushed further. Regarding how the choice of space-time aspect ratio changes the resulting scaling dimension: as mentioned in the reply above, we instead rely on the quality of the extrapolation to estimate finite-size effects, given that either increasing or decreasing the Trotter step $\Delta\tau$ (effectively decreasing or increasing $N_\tau$) leads to a significant increase in numerical difficulty.
Regarding the effect of different choices of $J$, we refer to the previous numerical study in Physical Review X 9, 021022 (2019), which shows that $J = 1.25$ is already deep in the U1D phase. Based on that study, we did not worry much about the effect of the gauge coupling $J$.

Author: Kexin Feng on 2026-01-10 [id 6223]
(in reply to Report 4 on 2026-01-08) We thank the referee for their careful assessment of the work. Here are the replies to each comment.
**Comment 1**: It may be useful to explicitly define the fermion flavor number Nf mentioned on p.4. I assume Nf counts the number of four-component Dirac fermion flavors, with Nf = 2 corresponding to two spin-degenerate Dirac cones in the first Brillouin zone. Is it correct?

**Reply 1**: Yes, that is correct. We have clarified the definition in the updated version.
**Comment 2**: Wording:
- "In this paper, we focus on a dubbed non-compact non-compact U(1) gauge field ..." (p.2): word doubling
- "systematic results of such interesting phase and rich diagram" (p.6): confusing, maybe adapt word ordering?

**Reply 2**: We thank the referee for the careful reading. The adaptations have been made.