
SciPost Submission Page

Learning the ground state of a non-stoquastic quantum Hamiltonian in a rugged neural network landscape

by Marin Bukov, Markus Schmitt, Maxime Dupont

This Submission thread is now published as SciPost Phys. 10, 147 (2021)

Submission summary

Authors (as registered SciPost users): Marin Bukov · Markus Schmitt
Submission information
Preprint Link: https://arxiv.org/abs/2011.11214v2
Date accepted: 2021-06-15
Date submitted: 2021-04-13 08:51
Submitted by: Bukov, Marin
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • Condensed Matter Physics - Computational
Approaches: Theoretical, Computational

Abstract

Strongly interacting quantum systems described by non-stoquastic Hamiltonians exhibit rich low-temperature physics. Yet, their study poses a formidable challenge, even for state-of-the-art numerical techniques. Here, we systematically investigate the performance of a class of universal variational wave functions based on artificial neural networks, by considering the frustrated spin-$1/2$ $J_1-J_2$ Heisenberg model on the square lattice. Focusing on neural network architectures without physics-informed input, we argue in favor of using an ansatz consisting of two decoupled real-valued networks, one for the amplitude and the other for the phase of the variational wavefunction. By introducing concrete mitigation strategies against inherent numerical instabilities in the stochastic reconfiguration algorithm, we obtain a variational energy comparable to that reported recently with neural networks that incorporate knowledge about the physical system. Through a detailed analysis of the individual components of the algorithm, we conclude that the rugged nature of the energy landscape constitutes the major obstacle to finding a satisfactory approximation to the ground-state wave function, and prevents learning the correct sign structure. In particular, we show that in the present setup the neural network expressivity and Monte Carlo sampling are not the primary limiting factors.
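
To make the advocated ansatz concrete, the following is a minimal, illustrative sketch (not the authors' code; the layer widths, tanh activations, and random initialization are assumptions) of a variational wave function built from two decoupled real-valued networks, one returning the log-amplitude and the other the phase:

    import numpy as np

    rng = np.random.default_rng(0)

    def init_net(n_in, n_hidden):
        # one tanh hidden layer, linear scalar output (illustrative choice)
        return [(0.1 * rng.standard_normal((n_hidden, n_in)), np.zeros(n_hidden)),
                (0.1 * rng.standard_normal((1, n_hidden)), np.zeros(1))]

    def net(params, s):
        (W1, b1), (W2, b2) = params
        h = np.tanh(W1 @ s + b1)
        return (W2 @ h + b2)[0]

    n_spins, n_hidden = 16, 32
    amp_params = init_net(n_spins, n_hidden)    # real-valued network for log|psi|
    phase_params = init_net(n_spins, n_hidden)  # real-valued network for arg(psi)

    def log_psi(s):
        # log psi(s) = log|psi(s)| + i * arg psi(s), with two decoupled real networks
        return net(amp_params, s) + 1j * net(phase_params, s)

    s = rng.choice([-1.0, 1.0], size=n_spins)   # a spin-1/2 configuration
    print(log_psi(s))

The point of the decoupling is that both networks are ordinary real-valued functions, so no holomorphic non-linearity (and none of the associated instabilities discussed in the reports below) is needed.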

Author comments upon resubmission

Dear Editor,

Thank you for considering our manuscript for review. We have revised the manuscript, taking into account all points of criticism raised by the referees.

We hope that with the revisions made, our work meets the publication criteria of SciPost.

Sincerely,
the authors

List of changes

- clarified sentences throughout the text
- added a new Fig. 4, and a corresponding paragraph, to explain the occurrence of a potential instability with holomorphic neural networks, as requested by Ref. B
- rewrote Sec. 4 to (i) clearly state its purpose in the introductory paragraph, (ii) visually simplify the text layout through bullet points, (iii) relegate extra information to a footnote, (iv) add a new Fig. 7d and a corresponding discussion in the text, as requested by Ref. B, and (v) formulate a clear take-home message for the section
- moved Fig. 9 to Sec. 5.2, as requested by Ref. B
- re-organized Sec. 6
- added Fig. 14 to Sec. 7.2 to provide a visualization that clarifies the text
- added a new Sec. C3 in the appendix
- added references suggested by the referees, as well as references brought to our attention in private communication with colleagues working in the field
- responded to all points of criticism raised by the referees

Published as SciPost Phys. 10, 147 (2021)


Reports on this Submission

Report 2 by Giuseppe Carleo on 2021-5-29 (Invited Report)

Report

I thank the authors for having addressed my comments; I feel that the work is now publishable as it is.


Anonymous Report 1 on 2021-5-9 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2011.11214v2, delivered 2021-05-09, doi: 10.21468/SciPost.Report.2890

Report

I have read the new version of the manuscript and the responses to my previous questions. I think that the presentation has been improved, even though it is not optimal. I still have a few points that are not completely clear:

1) According to the text (page 10), Fig. 2 has been obtained with a single-layer NN. However, in App. H, it seems that the NN has 2 layers. Am I wrong?

2) I am not sure I understand the whole construction of Fig. 4: why are we looking at z_i^(1)? By the way, how many layers are there? In addition, I do not understand the location of the poles in Fig. 4: is there a way to obtain them?

3) I do not fully understand the discussion in Sec. 5: according to the results of Sec. 4, the 50% of configurations not satisfying the Marshall sign rule give only a tiny contribution to the ground-state wave function. It could be that this is the best that can be done with the chosen number of parameters and that, in order to improve the sign structure, a larger number of parameters is necessary. Could the authors comment on that?

In summary, I think that the paper is interesting and could be published once these additional points have been addressed.

  • validity: good
  • significance: good
  • originality: good
  • clarity: ok
  • formatting: good
  • grammar: perfect

Author:  Marin Bukov  on 2021-05-31  [id 1480]

(in reply to Report 1 on 2021-05-09)

1) According to the text (page 10), Fig. 2 has been obtained with a single-layer NN. However, in App. H, it seems that the NN has 2 layers. Am I wrong?

  • We thank the Referee for spotting this typo in the text, which we have corrected. We double-checked that App. H shows the correct network architectures: the network in Fig. 2 has two layers and the one in Fig. 3 has a single layer (as was originally stated in the caption).

2) I am not sure I understand the whole construction of Fig. 4: why are we looking at z_i^(1)?

  • The pre-activations z_i^(l) are the values that are fed into the non-linear function (or into its derivative during back-propagation). Therefore, if any z_i^(l) lies close enough to a pole of the non-linearity, the algorithm blows up. Here, "any z_i^(l)" means that it is sufficient for a single pre-activation in the network, for a single input configuration in the sample, to lie close to a pole. In Fig. 4 we show the distribution of the pre-activation values of just one of the network layers (hence z_i^(1)) in the complex plane, for a sample of 1000 input configurations. We find that the distribution spreads out as training progresses, but it remains dense, meaning that poles are inevitably encountered eventually. We reformulated the discussion of Fig. 4 at the top of page 12 to better clarify this point.
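
To make this concrete, here is a minimal numerical sketch (purely illustrative: a randomly initialized complex layer stands in for the trained network of Fig. 4, and the layer width is an arbitrary choice) that computes first-layer pre-activations z_i^(1) for a sample of configurations and their distance to the nearest singularity of log(cosh):

    import numpy as np

    rng = np.random.default_rng(1)
    n_spins, n_hidden, n_samples = 16, 32, 1000

    # complex-valued first-layer weights of a holomorphic network (illustrative initialization)
    W1 = 0.05 * (rng.standard_normal((n_hidden, n_spins))
                 + 1j * rng.standard_normal((n_hidden, n_spins)))
    b1 = np.zeros(n_hidden, dtype=complex)

    # a sample of spin configurations, one per row
    S = rng.choice([-1.0, 1.0], size=(n_samples, n_spins))

    # first-layer pre-activations z_i^(1): the arguments fed into log(cosh(.))
    Z1 = S @ W1.T + b1

    # log(cosh(z)) is singular where cosh(z) = 0, i.e. at z = i*pi*(n + 1/2)
    nearest_pole = 1j * np.pi * (np.round(Z1.imag / np.pi - 0.5) + 0.5)
    dist = np.abs(Z1 - nearest_pole)

    print("closest approach to a singularity of log(cosh):", dist.min())

As the weights grow during training, the cloud of pre-activations spreads out and this minimal distance shrinks, which is the mechanism behind the instability.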

By the way, how many layers are there?

  • As we state in the caption of Fig. 4, this is the same network as in Fig. 2. In other words, there are two dense layers.

In addition, I do not understand the location of the poles in Fig. 4: is there a way to obtain them?

  • The locations indicated in Fig. 4 are the first poles of the non-linearity log(cosh(z)), at z=\pm i\pi/2. This is stated in the discussion at the top of page 12. Note that there are further poles on the y-axis which fall outside the plotted range. Our point is that any non-constant holomorphic layer will necessarily exhibit a pole which, if hit, causes an instability.
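
As a quick check of where these singularities sit: since \cosh(iy) = \cos(y), the non-linearity \log\cosh(z) is singular exactly where \cosh vanishes,

    \cosh(z) = 0 \quad\Longleftrightarrow\quad z = i\pi\left(n + \tfrac{1}{2}\right), \qquad n \in \mathbb{Z},

so the first singularities lie at z = \pm i\pi/2, and all further ones lie higher up on the imaginary (y) axis.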

3) I do not fully understand the discussion in Sec. 5: according to the results of Sec. 4, the 50% of configurations not satisfying the Marshall sign rule give only a tiny contribution to the ground-state wave function.

  • We confirm that this is correct.

It could be that this is the best that can be done with the chosen number of parameters and that, in order to improve the sign structure, a larger number of parameters is necessary. Could the authors comment on that?

  • Such a case is indeed plausible, but we rule it out in Fig. 11, which demonstrates numerically that the optimization dynamics gets stuck in a saddle point, not in a minimum: there is more room for improvement within the current architecture. As we discuss in the main text, the existence of the saddle does not prove that the network architecture is expressive enough to capture the global minimum, but it does prove that there exists a solution within the current architecture with a lower energy than the present saddle.
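
As a generic illustration of the distinction (a toy sketch with a placeholder energy function, not the actual procedure behind Fig. 11), one can probe whether any direction in parameter space lowers the energy at the stationary point; at a minimum no direction does, while at a saddle a finite fraction of random directions does:

    import numpy as np

    rng = np.random.default_rng(2)

    def energy(theta):
        # placeholder variational energy; in practice a Monte Carlo estimate
        # a toy saddle: minimum along theta[0], maximum along theta[1]
        return theta[0]**2 - theta[1]**2

    theta0 = np.zeros(8)   # candidate stationary point
    eps = 1e-3             # perturbation strength
    e0 = energy(theta0)

    descending = 0
    for _ in range(200):
        d = rng.standard_normal(theta0.size)
        d /= np.linalg.norm(d)
        if energy(theta0 + eps * d) < e0 - 1e-12:
            descending += 1

    print(f"{descending}/200 random directions lower the energy")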

On a separate note, notice that the number of network parameters cannot be enlarged indefinitely: beyond a certain size (several thousand parameters, depending on the computing architecture), computing the S-matrix becomes infeasible.
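
To make this limitation concrete, here is a minimal sketch of the covariance form of the S-matrix used in stochastic reconfiguration, S_{kk'} = <O_k^* O_{k'}> - <O_k^*><O_{k'}>, with random placeholders standing in for the actual log-derivatives O_k = d log(psi)/d theta_k; its memory footprint and the cost of assembling it grow quadratically with the number of parameters (and solving the associated linear system is more expensive still):

    import numpy as np

    rng = np.random.default_rng(3)
    n_samples, n_params = 500, 4000   # a few thousand parameters already yields a large S

    # O[s, k] = d log(psi(s)) / d theta_k; random placeholders instead of real log-derivatives
    O = (rng.standard_normal((n_samples, n_params))
         + 1j * rng.standard_normal((n_samples, n_params)))

    O_mean = O.mean(axis=0)
    S = (O.conj().T @ O) / n_samples - np.outer(O_mean.conj(), O_mean)

    # storage alone grows as (number of parameters)^2
    print(f"S-matrix shape: {S.shape}, memory: {S.nbytes / 1e9:.2f} GB")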
