
SciPost Submission Page

Neural Quantum State Study of Fracton Models

by Marc Machaczek, Lode Pollet, Ke Liu

This is not the latest submitted version.


Submission summary

Authors (as registered SciPost users): Ke Liu · Marc Machaczek · Lode Pollet
Submission information
Preprint Link: https://arxiv.org/abs/2406.11677v1  (pdf)
Code repository: https://github.com/MarcMachaczek/FractonNQS/tree/v1.0
Date submitted: 2024-06-18 18:57
Submitted by: Machaczek, Marc
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • Condensed Matter Physics - Computational
  • Quantum Physics
Approaches: Theoretical, Computational

Abstract

Fracton models host unconventional topological orders in three and higher dimensions and provide promising candidates for quantum memory platforms. Understanding their robustness against quantum fluctuations is an important task but also poses great challenges due to the lack of efficient numerical tools. In this work, we establish neural quantum states (NQS) as new tools to study phase transitions in these models. Exact and efficient parametrizations are derived for three prototypical fracton codes - the checkerboard and X-cube model, as well as Haah's code - both in terms of a restricted Boltzmann machine (RBM) and a correlation-enhanced RBM. We then adapt the correlation-enhanced RBM architecture to a perturbed checkerboard model and reveal its strong first-order phase transition between the fracton phase and a trivial field-polarizing phase. To this end, we simulate this highly entangled system on lattices of up to 512 qubits with high accuracy, representing a cutting-edge application of variational neural-network methods. Our work demonstrates the remarkable potential of NQS in studying complicated three-dimensional problems and highlights physics-oriented constructions of NQS architectures.

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Has been resubmitted

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2024-7-30 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2406.11677v1, delivered 2024-07-30, doi: 10.21468/SciPost.Report.9502

Strengths

1. The objectives and methods introduced in this study are well-justified

2. The correlation-enhanced RBM introduced here seems to be a fast, reliable, quite general, and easy-to-implement method.

3. The paper is well written and easy to follow, with enough additional information in the appendices and online to reproduce the work.

Weaknesses

1. The role of the hysteresis is overstated in the paper.

2. Other learning protocols and NQS can and should have been used in the search for a lower variational energy or to test some of the statements.

3. The estimation of the critical field and the discussion of the type of the phase transition are unconvincing.

Report

The work under consideration “Neural Quantum State Study of Fracton Models” employs Neural Quantum States (NQS) as variational wave functions within variational Monte Carlo (VMC) methods to explore the properties of strongly entangled fracton order and associated phase transitions. Motivated by unsatisfactory results from the standard restricted Boltzmann machine (and other typical NQS applications), the authors have developed a physically motivated extension known as the correlation-enhanced RBM (cRBM). Through the use of cRBM, the authors demonstrate that all three fracton models examined exhibit a first-order phase transition from a fracton phase to a trivial field-polarizing phase.

The objectives and methods introduced in this study are well-justified. The fractons under investigation necessitate long-range entanglement and a lattice dimensionality of three or higher, conditions that are a big challenge to standard techniques such as Tensor Networks. The authors effectively show that NQS can provide a solution in these cases. Overall, this is a very good study which, after addressing certain issues, could be confidently recommended for publication in SciPost Physics Core. However, to meet the standards of SciPost Physics as a flagship journal, the paper would need substantial enhancement.

Further comments and questions:

1- "NQS can be considered a subclass of variational Monte Carlo (VMC) [34] techniques." VMC is a broad framework where the variational principle is used to approximate the ground state of quantum systems. NQS, on the other hand, represent a clever variational wavefunction characterized by parameters that are optimized using machine learning techniques. So, in my understanding, NQSs aren't a subclass of VMC, but rather a type of wavefunction used within the VMC framework.

2- I am somewhat confused by the correlators. Admittedly, this confusion stems from my gaps in understanding fracton systems. Nevertheless, are the conditions imposed on the correlators satisfied for ANY ground states of the models, or only for some? Additionally, what is the difference between this technique and Group Convolutional Neural Networks?

3- Before Equation (14), the visible biases are set to zero. Why is this the case? As mentioned in the paper, visible biases are often critical for setting the correct sign structure of the wave function. Why can this aspect be ignored here?

4- Figure 9 is used to demonstrate the significant advantage of incorporating custom correlator features into RBM. However, this figure raises several questions. First, the alpha value of ¼ used here is very small. This is probably to keep the number of parameters approximately the same as in other networks compared to cRBM, where already 52 parameters suffice. Yet, this approach seems a bit self-serving, and the number of parameters for sRBM is still significantly smaller than for cRBM. So, how stable is this effect where other networks remain stuck in the initial local minimum for a significant number of training iterations? How does it depend on alpha, learning rate, and the distribution of initial parameters? I would guess that there is a learning protocol that could allow the RBM to learn the ground state as well. Or am I mistaken, and is the RBM used too small to capture the exact ground state?

5- Figure 11 demonstrates that the transfer learning approach, where one starts at a high field and then gradually decreases it, has clear advantages over the other two protocols. However, since it is a variational method, why not retrace the steps in the opposite direction? That is, start with any "blue" point in Figure 11 that has a small relative error and perform a left-right sweep from there. This approach might be irrelevant for the 4x2x2 lattice where the agreement with the exact solution reaches the level of numerical (or Monte Carlo) precision for the right-left sweep. However, it could lower the energies for larger systems. And in the end, that is what the variational methods are all about. Such there-and-back techniques have previously been used for magnetic systems, for example, to retrace step changes in magnetization [SciPost Phys. Core 6, 088 (2023)]. The results could also enhance the understanding of the role of transfer learning here.

6- I think that the role of the hysteresis is overstated in the paper. The hysteresis is related to the details of the learning, there is no physical time involved. I therefore don’t know what is the significance of the statements like “... our work disclosed that the checkerboard model experiences a strong first-order phase transition with a large hysteresis when subjected to uniform magnetic fields.” , “ … the method detects hysteresis effects and provides insights into the nature of the underlying phase transitions…”. The hysteresis might completely disappear with different learning protocols or even different Monte Carlo sampling methods.

7- What is meant by “We estimate a critical field hcrit ≈ 0.445 from the intersection points of energy curves and the middle of the magnetization hysteresis.” The “hysteresis” is huge for L=8,6, actually the second branch is not even in the figure and it is not clear at all that the energies for systems with different lattice sizes cross each other. The magnetisation in Figure 13 seems to get more smooth with increasing lattice size and shifts to smaller values of h_x. On top of that the discontinuities in the relative energy contributions in Figure 14 also point to a much smaller critical value of h_x. Isn’t there a more reliable way to estimate the critical point from your data?

8- What is actually meant by “strong first order phase transition”? Is there some other way to test the order of the phase transition? Why not use some version of Binder's cumulant?

9- The technique of splitting the network into one that encodes the phase and the other for amplitude is quite standard in the context of NQS and should be tested at least with the other methods in Figure 9.

Requested changes

1- Figure 9: Discuss and demonstrate the stability of the result with respect to changing learning rate, initial parameters distribution, standard deviation and alpha. Add results for RBM with phase and amplitude encoded by separated networks and preferably also something even simpler than RBM, e.g., Jastrow. Also show results for symmetric RBM with at least as many parameters as for cRBM.

2- Reevaluate or explain the validity of the estimation of the critical field as well as the significance of the hysteresis.

3- Show at least some tests for transfer learning where the direction of the learning was reversed at finite fields.

4- For further suggestions see the report.

Recommendation

Accept in alternative Journal (see Report)

  • validity: high
  • significance: good
  • originality: good
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Author:  Marc Machaczek  on 2025-02-02  [id 5179]

(in reply to Report 2 on 2024-07-30)
Category:
remark
answer to question
correction

We thank the referee for the extensive feedback. We believe the comments and questions raised led to an improved manuscript by helping us resolve possible confusion about our results and clarifying key aspects of our work. We'd like to address the questions and feedback point-by-point in the following:

1) "NQS can be considered a subclass of variational Monte Carlo (VMC) [34] techniques." VMC is a broad framework where the variational principle is used to approximate the ground state of quantum systems. NQS, on the other hand, represent a clever variational wavefunction characterized by parameters that are optimized using machine learning techniques. So, in my understanding, NQSs aren't a subclass of VMC, but rather a type of wavefunction used within the VMC framework.

We thank the referee for bringing attention to this formulation. Accordingly, we have changed the phrasing in the second paragraph of the Introduction and Section 3 to clarify the relationship between NQS and VMC, avoiding the statement about NQS being a subclass of VMC.

2) I am somewhat confused by the correlators. Admittedly, this confusion stems from my gaps in understanding fracton systems. Nevertheless, are the conditions imposed on the correlators satisfied for ANY ground states of the models, or only for some? Additionally, what is the difference between this technique and Group Convolutional Neural Networks?

These are indeed two very interesting aspects, which we would like to clarify in the following:

1. The purpose of the correlators is to inform the network about certain properties of the state. In particular, we do not enforce any specific value for any of the correlators. In the case of the checkerboard model, for instance, any ground state (in the absence of magnetic fields) satisfies BC = +1, as described at the end of Section 3, and similarly for the X-cube model with the star stabilizer generators. We calculate these values explicitly and feed these correlator features into the network. The network then learns by itself that configurations with BC = −1 do not correspond to the ground state, as they lead to increased energy, and suppresses them by tuning the weights accordingly. The loop correlators can take any value; the different values of loop correlators label different sectors in the ground-state manifold. The pairwise correlators are a generic extension that can in principle take any value in the ground-state manifold.

2. While the inclusion of correlators and the implementation of symmetries are a priori independent concepts, a symmetrized cRBM can also be viewed as a GCNN (in log space). The difference from a symmetrized RBM, which is well known to be a GCNN with one hidden layer and a logcosh activation function, is that the GCNN is now defined on the extended feature space. The latter can be modeled as the disjoint union of the actual lattice and suitable domains to which the correlator features are assigned (for instance, a copy of the lattice for cube correlators in the checkerboard model). The (translation) group then acts accordingly on each feature domain; this is possible because lattice symmetries are graph automorphisms that conserve the correlator type, i.e., a translation induces a permutation of the correlators.
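To make the construction concrete, the following is a minimal, self-contained sketch of how precomputed correlator features extend the visible layer of an RBM. This is our illustration, not the actual FractonNQS implementation: the cube indexing, weight shapes, and function name are hypothetical placeholders.

```python
import numpy as np

def crbm_log_amplitude(spins, cubes, W, b):
    """Toy sketch of a correlation-enhanced RBM log-amplitude.

    spins : (N,) array of +/-1 qubit values
    cubes : (M, k) integer array; each row lists the spins forming one
            cube correlator (hypothetical indexing, for illustration)
    W     : (H, N + M) weight matrix acting on the extended feature
            space (spins concatenated with correlator features)
    b     : (H,) hidden biases
    """
    # Correlator features: products of spins over each cube, B_C in {+1, -1}.
    features = np.prod(spins[cubes], axis=1)
    # Extended input vector: raw spins concatenated with correlator values.
    v = np.concatenate([spins, features])
    # Standard RBM form with visible biases omitted, as in the paper:
    # log psi(v) = sum_h log cosh((W v + b)_h)
    return np.sum(np.log(np.cosh(W @ v + b)))
```

The correlator values are not constrained to any fixed value; the network is free to up- or down-weight configurations with, say, BC = −1 by adjusting the columns of W acting on the feature block.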

3) Before Equation (14), the visible biases are set to zero. Why is this the case? As mentioned in the paper, visible biases are often critical for setting the correct sign structure of the wave function. Why can this aspect be ignored here?

As the referee suggested, the visible biases can be used to model phase factors of the wave function, which might be helpful when dealing with the X-type stabilizer generators. Even when only considering RBM-type wave functions there are many possible parametrizations, some of which might include non-zero visible biases. However, for the fracton models considered here, it is just not necessary to include a visible bias in order to parameterize their ground states exactly, as demonstrated in the main text. And, because their Hamiltonians are stoquastic, it should not be necessary to model a sign structure. We also want to point out that we do not enforce any value for the visible biases in the numerical simulations.

4) Figure 9 is used to demonstrate the significant advantage of incorporating custom correlator features into RBM. However, this figure raises several questions. First, the alpha value of ¼ used here is very small. This is probably to keep the number of parameters approximately the same as in other networks compared to cRBM, where already 52 parameters suffice. Yet, this approach seems a bit self-serving, and the number of parameters for sRBM is still significantly smaller than for cRBM. So, how stable is this effect where other networks remain stuck in the initial local minimum for a significant number of training iterations? How does it depend on alpha, learning rate, and the distribution of initial parameters? I would guess that there is a learning protocol that could allow the RBM to learn the ground state as well. Or am I mistaken, and is the RBM used too small to capture the exact ground state?

The referee's valuable comment on this issue allowed us to refine our manuscript and present a more nuanced picture of this matter. We have expanded the information contained in Figure 9 taking into account the referee's feedback, which led to a better presentation of the impact of different hyperparameter choices. It now contains a Jastrow wave function for comparison, and several different combinations of standard deviations, learning rates, and feature densities for RBM architectures. Moreover, we have included an additional Figure 20, which contains training curves for uniform parameter initialization. We have also made clear which architectures are able to represent the ground state exactly by means of the parametrization derived in Section 4. In short, some combinations of hyperparameters allow the other RBM architectures to approximate the ground state too. α=1/4 is sufficient for symmetric RBMs to parameterize the ground state in theory, but even for larger feature densities they converge more slowly (or not at all) compared to the cRBM. The latter shows strong robustness with respect to the different hyperparameters and the choice of initialization, converges the fastest, and reaches lower variances than other network candidates with similar (but still higher) parameter counts. We hope that this demonstrates the advantage of the correlators more clearly. Regarding the splitting of the network into a phase and an amplitude network, please refer to point 9 of our reply.

5) Figure 11 demonstrates that the transfer learning approach, where one starts at a high field and then gradually decreases it, has clear advantages over the other two protocols. However, since it is a variational method, why not retrace the steps in the opposite direction? That is, start with any "blue" point in Figure 11 that has a small relative error and perform a left-right sweep from there. This approach might be irrelevant for the 4x2x2 lattice where the agreement with the exact solution reaches the level of numerical (or Monte Carlo) precision for the right-left sweep. However, it could lower the energies for larger systems. And in the end, that is what the variational methods are all about. Such there-and-back techniques have previously been used for magnetic systems, for example, to retrace step changes in magnetization [SciPost Phys. Core 6, 088 (2023)]. The results could also enhance the understanding of the role of transfer learning here.

We thank the referee for proposing an interesting adaptation of the transfer learning protocol used in this work, and we appreciate their comment for bringing attention to the work [SciPost Phys. Core 6, 088 (2023)]. Simulating the "there-and-back" approach on a small (4x2x2) checkerboard model, starting from the left and turning around at some intermediate value (0.2), resulted only in minor energy improvements. These stem from the additional training iterations performed in a narrow range of the phase diagram; however, the ground-state error remained significantly larger compared to a full right-left sweep. We made a similar observation for the L=6 lattice. Additionally, turning around in the critical region this time (e.g., at 0.42) did result in larger energy fluctuations due to Markov chains spreading over the metastable configurations. We obtained analogous results when initializing the "there-and-back" approach in the strong-field limit.

All these observations are consistent with tracing the evolution of the ground state under a varying field in the presence of a first-order phase transition (also see variational neural annealing, Ref. [60], and Appendix B), and, at least in our testing, no notable energy improvements were achieved. The goal of this work is to identify the strong hysteresis and first-order transition in the checkerboard model, which requires a full scan from both directions over the whole critical region. While the "there-and-back" approach did not add any additional insights, we thank the referee for suggesting it as a solid consistency check for our simulations. Nonetheless, we added a reference to [SciPost Phys. Core 6, 088 (2023)] in the manuscript to inform the reader about other work which employs similar ideas.
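For concreteness, the sweep protocols discussed here can be sketched schematically as follows. This is our illustration, not code from the FractonNQS repository: `train` stands in for the per-field VMC optimization, and all field values and step sizes are placeholders.

```python
import numpy as np

def sweep(fields, train, params):
    """Transfer-learning sweep: optimize at each field value, reusing the
    converged parameters as the warm start for the next field."""
    energies = {}
    for h in fields:
        params, energy = train(h, params)  # warm start from previous field
        energies[h] = energy
    return params, energies

def there_and_back(h_max, h_turn, step, train, params):
    """'There-and-back' protocol: sweep right-to-left from h_max down to
    h_turn, then turn around and retrace toward larger fields."""
    down = np.arange(h_max, h_turn - 1e-12, -step)
    up = np.arange(h_turn, h_max + 1e-12, step)
    params, e_down = sweep(down, train, params)
    params, e_up = sweep(up, train, params)
    return e_down, e_up
```

Comparing `e_down` and `e_up` over the retraced interval is the consistency check described above: near a first-order transition the two passes can land on different (stable vs. metastable) branches.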

6) I think that the role of the hysteresis is overstated in the paper. The hysteresis is related to the details of the learning, there is no physical time involved. I therefore don’t know what is the significance of the statements like "... our work disclosed that the checkerboard model experiences a strong first-order phase transition with a large hysteresis when subjected to uniform magnetic fields", "the method detects hysteresis effects and provides insights into the nature of the underlying phase transitions". The hysteresis might completely disappear with different learning protocols or even different Monte Carlo sampling methods.

We thank the referee for their question. The hysteresis phenomenon reflects the tunneling between stable and metastable branches of the states, indicating a discontinuity at the phase transition. It is a commonly used method to detect first-order phase transitions for both classical and quantum systems, independent of the algorithm used. To exclude numerical artifacts, as emphasized in the manuscript, one has to ensure the hysteresis is robust and becomes more pronounced with increasing system size. This is indeed the case in our Figure 13 for the checkerboard model. There is no need to simulate the time evolution of the system, as an equilibrium phase transition can be characterized by static quantities, like the energy and order parameters. Moreover, strong first-order phase transitions and large hysteresis are typically easier to detect; see the reply to Question 8.

In addition, as a cross-check, we have performed extensive new simulations for the field-perturbed X-cube model, whose phase transition is known from quantum Monte Carlo (QMC) simulations (Ref. 44) and high-order series expansions (Ref. 46). These two works employed different methods, but both relied on energy hysteresis to determine the field-driven first-order phase transition; see Figure 1 in the SM of Ref. 44 and Figure 12 of Ref. 46. We evaluate both energy hysteresis and magnetization hysteresis and find a strong first-order phase transition at hx,c ≈ 0.91, as shown in the new Figure 21 in the revised manuscript. This agrees well with the QMC result hx,c ≈ 0.9 (Ref. 44) and the high-order series-expansion result hx,c ≈ 0.916 (Ref. 46). This further strengthens our NQS results and the use of hysteresis.

7) What is meant by "We estimate a critical field hcrit ≈ 0.445 from the intersection points of energy curves and the middle of the magnetization hysteresis."? The hysteresis is huge for L=8,6, actually the second branch is not even in the figure and it is not clear at all that the energies for systems with different lattice sizes cross each other. The magnetisation in Figure 13 seems to get more smooth with increasing lattice size and shifts to smaller values of hx. On top of that the discontinuities in the relative energy contributions in Figure 14 also point to a much smaller critical value of hx. Isn't there a more reliable way to estimate the critical point from your data?

We thank the referee for this pertinent question. The transition point of a first-order phase transition is often inferred from the energy intersection of two sweeping procedures, such as in Ref. 44 and Ref. 46, or from the middle of the hysteresis loop, such as the midpoint between supercooling and superheating of a classical magnet. The underlying reason is that, at such points, the two branches are expected to have approximately the same weight in the density of states. We followed this protocol.
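As a sketch of this estimate (our illustration, not code from the paper), the intersection of the two energy branches can be located by interpolating the sign change of their difference:

```python
import numpy as np

def crossing_field(h, e_down, e_up):
    """Estimate the transition field as the point where the energies of
    the two sweep branches intersect, via linear interpolation of their
    difference. h must be sorted ascending; e_down / e_up are the
    per-field energies of the two branches."""
    diff = np.asarray(e_down) - np.asarray(e_up)
    sign_change = np.where(np.diff(np.sign(diff)) != 0)[0]
    if sign_change.size == 0:
        raise ValueError("branches do not cross in the sampled range")
    i = sign_change[0]
    # Linear interpolation between the two bracketing field values.
    return h[i] - diff[i] * (h[i + 1] - h[i]) / (diff[i + 1] - diff[i])
```

Since the two branches are variational upper bounds, this crossing only approximates the true transition point up to finite-size and optimization effects, which is why we quote hc with an uncertainty below.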

To better estimate the transition point, one should ideally perform finite-size scaling with several sufficiently large system sizes to extrapolate the thermodynamic result. However, this is typically not feasible for 3D quantum systems due to the simulation complexity. In such a situation, one has to compromise and instead use the largest available size to approximate the critical field. The results still suffer from finite-size effects but are sufficient for our purpose of determining the order of the phase transition.

The size of the hysteresis loop, which is identified by the collapse of the metastable branch onto the stable branch, is actually more prominent for larger lattices, both in Figure 13 for the checkerboard model and in Figure 21 for the X-cube model. Since we are dealing with first-order phase transitions, we do not expect the energy curves for different lattice sizes to cross at one point. Moreover, Figure 14 is merely used to visualize how the different interactions in the Hamiltonian contribute to the total energy. It treats those interactions as independent terms, and one should not use them to estimate the transition point.

Nevertheless, we admit that the three-digit value is beyond the precision of our data and have changed the critical field to hc ≈ 0.44(1). In addition, since the magnetization hysteresis is too large for L=6,8, we mainly used the energy intersection to estimate hc in the current problem, which we have clarified in the revised manuscript.

8) What is actually meant by "strong first-order phase transition"? Is there some other way to test the order of the phase transition? Why not use some version of Binder's cumulant?

We thank the referee for this question. A strong first-order phase transition means that the discontinuity in thermodynamic quantities is particularly strong, which typically shows up as a large hysteresis. It is associated with a small correlation length and can manifest itself already at small or medium system sizes. This makes a strong first-order phase transition easier to distinguish from a continuous phase transition, compared to a weak first-order transition, whose correlation length can be very large.

In principle, there are several quantities that can be used to characterize a first-order phase transition, including histograms and Binder cumulants of the energy and order parameters. When ergodicity can be guaranteed, such quantities can locate the transition point more precisely than direct uses of energy and order-parameter curves. Specifically, the histogram will exhibit a double-peak behavior, while the Binder cumulant will show a diverging dip at a first-order phase transition. Hence, they are commonly used in classical Monte Carlo simulations. However, quantum algorithms, including VMC, tensor networks, and also NQS, typically aim to represent individual wavefunctions in an accurate manner, instead of exploring the entire configuration space or Hilbert space. This sampling is not ergodic; hence we cannot reliably use techniques based on histograms and Binder cumulants. In addition, while one can still use derivatives of the energy and order parameters, they are more expensive and require many more samples to converge. Overall, the hysteresis diagnosis is more efficient and gives reliable results, particularly in cases of strong first-order phase transitions.
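For reference, the Binder cumulant of the magnetization takes the standard form U = 1 − ⟨m⁴⟩/(3⟨m²⟩²); a minimal sketch follows, with the same caveat as above that the estimate is only meaningful for samples that are ergodic across both coexisting phases.

```python
import numpy as np

def binder_cumulant(m):
    """Binder cumulant U = 1 - <m^4> / (3 <m^2>^2) of magnetization
    samples. At a first-order transition U develops a pronounced negative
    dip, but only if the samples cover both coexisting phases, which
    standard VMC/NQS sampling of a single branch does not guarantee."""
    m = np.asarray(m, dtype=float)
    m2 = np.mean(m**2)
    return 1.0 - np.mean(m**4) / (3.0 * m2**2)
```

For a symmetric two-peak distribution concentrated at m = ±m0, the cumulant approaches 2/3, while a single Gaussian peak centered at zero gives U → 0; the dip below zero between these limits signals phase coexistence.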

9) The technique of splitting the network into one that encodes the phase and the other for amplitude is quite standard in the context of NQS and should be tested at least with the other methods in Figure 9.

We acknowledge the valuable suggestion made by the referee regarding the network architecture. Since the Hamiltonians are stoquastic (except for finite hy magnetic fields), there is no need to explicitly model a sign structure. Furthermore, modeling the phase and amplitude by two separate neural networks is usually done when working with autoregressive architectures. In other contexts, this approach was found to be inferior; see for instance [Viteritti, Ferrari and Becca 2022, https://scipost.org/10.21468/SciPostPhys.12.5.166]. In our testing, we also found that two symmetric RBMs modeling the phase and amplitude separately performed worse than a single complex RBM already for the two-dimensional Toric Code. Hence, we do not expect any additional insights from including simulations using this architecture.

Request change 1. Figure 9: Discuss and demonstrate the stability of the result with respect to changing learning rate, initial parameters distribution, standard deviation and alpha. Add results for RBM with phase and amplitude encoded by separated networks and preferably also something even simpler than RBM, e.g., Jastrow. Also show results for symmetric RBM with at least as many parameters as for cRBM.

See the replies to Questions 4, 5 and 9, and also the List of changes.

Request change 2. Reevaluate or explain the validity of the estimation of the critical field as well as the significance of the hysteresis. See the replies to Questions 6 and 7.

Request change 3. Show at least some tests for transfer learning where the direction of the learning was reversed at finite fields. See the reply to Question 5.

Request change 4. For further suggestions see the report.

All suggestions from the referee have been addressed. We thank the referee again for their extensive feedback and valuable comments and believe that the revised manuscript is ready for publication in SciPost Physics.

Report #1 by Anonymous (Referee 1) on 2024-7-23 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2406.11677v1, delivered 2024-07-23, doi: 10.21468/SciPost.Report.9452

Strengths

1- clear presentation, 2- proposes a specific NQS architecture change which performs better in studied task, 3- studies a relevant physical system

Weaknesses

1- Does not demonstrate that earlier findings about phase transitions of the X-cube model and Haah’s code under uniform magnetic fields can be replicated with NQS, which would show that the analysis done with NQS is trustworthy

Report

This work studies the potential of neural quantum states (NQS) in representing 3D lattice systems with long-range entanglement, specifically the checkerboard, X-cube, and Haah's code models. The authors show that unspecialized NQS architectures, such as multilayer perceptrons and restricted Boltzmann machines, do not perform well in this task. They propose a more specialized architecture, the correlation-enhanced RBM, to address this issue. The proposed approach is tested by studying a phase transition in the checkerboard model.

I believe this work is relevant and substantial enough for publication in SciPost Physics. However, it could be improved by including an analysis of the X-cube and Haah’s code models and demonstrating that previous results obtained with other methods can be replicated using NQS. This comparison would be very interesting and would more clearly indicate the advantages or disadvantages of NQS, strengthening the claim that NQS methods can serve as a new "workhorse" in studying such systems.

Recommendation

Publish (meets expectations and criteria for this Journal)

  • validity: high
  • significance: good
  • originality: high
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Author:  Marc Machaczek  on 2025-02-02  [id 5178]

(in reply to Report 1 on 2024-07-23)
Category:
remark
correction

We thank the referee for their recommendation and valuable suggestions. We agree that we could further improve our analysis by including a comparison with the existing results for the X-cube model and Haah's code obtained by other methods. Therefore, in the revised manuscript we performed intensive additional simulations of the X-cube model, with a focus on hx fields and system sizes up to N = 375 qubits, as shown in Fig. 1. We obtain a critical field hx,c ≈ 0.91, in good agreement with the previous results from quantum Monte Carlo simulations (hx,c ≈ 0.9, Ref. 44) and from high-order series expansions (hx,c ≈ 0.92, Ref. 46). This strengthens our claim that NQS are powerful tools to study fracton models, which was also recognized by the referee.

We could perform the same analysis for Haah's code, but that would again require resource-intensive simulations and take a long time. Nevertheless, we believe our original results for the checkerboard model and our reproduction of the X-cube phase transition are sufficient to demonstrate NQS methods as a new "workhorse" in studying such systems.
