SciPost Submission Page
Learning the ground state of a non-stoquastic quantum Hamiltonian in a rugged neural network landscape
by Marin Bukov, Markus Schmitt, Maxime Dupont
This is not the latest submitted version. This Submission thread has since been published.
Authors (as Contributors): Marin Bukov · Markus Schmitt
arXiv link: https://arxiv.org/abs/2011.11214v1 (pdf)
Date submitted: 2020-11-27 10:59
Submitted by: Bukov, Marin
Submitted to: SciPost Physics
Strongly interacting quantum systems described by non-stoquastic Hamiltonians exhibit rich low-temperature physics. Yet, their study poses a formidable challenge, even for state-of-the-art numerical techniques. Here, we investigate systematically the performance of a class of universal variational wave functions based on artificial neural networks, by considering the frustrated spin-$1/2$ $J_1{-}J_2$ Heisenberg model on the square lattice. Focusing on neural network architectures without physics-informed input, we argue in favor of using an ansatz consisting of two decoupled real-valued networks, one for the amplitude and one for the phase of the variational wave function. By introducing concrete mitigation strategies against inherent numerical instabilities in the stochastic reconfiguration algorithm, we obtain a variational energy comparable to that reported recently with neural networks that incorporate knowledge about the physical system. Through a detailed analysis of the individual components of the algorithm, we conclude that the rugged nature of the energy landscape constitutes the major obstacle in finding a satisfactory approximation to the ground state wave function, and prevents learning the correct sign structure. In particular, we show that in the present setup the neural network expressivity and Monte Carlo sampling are not primary limiting factors.
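For illustration, the decoupled two-network ansatz described in the abstract can be sketched as follows. This is a minimal reconstruction, not the authors' code: the feed-forward layout, layer sizes, and names (`mlp`, `log_psi`) are placeholder assumptions; only the structure $\log\psi(s) = \rho_\theta(s) + i\,\phi_\mu(s)$, with two independent real-valued networks, follows the abstract.

```python
import jax
import jax.numpy as jnp

def mlp(params, s):
    """Simple real-valued feed-forward network: spin configuration -> scalar."""
    x = s
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return jnp.sum(W @ x + b)

def log_psi(amp_params, phase_params, s):
    """log psi(s) = rho(s) + i * phi(s), with two decoupled real networks."""
    return mlp(amp_params, s) + 1j * mlp(phase_params, s)

def init_mlp(key, sizes):
    """Random initialization; sizes = [n_in, n_hidden, ..., n_out]."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, k1, k2 = jax.random.split(key, 3)
        params.append((0.1 * jax.random.normal(k1, (n_out, n_in)),
                       0.1 * jax.random.normal(k2, (n_out,))))
    return params

# Example: 16 spins (a 4x4 lattice), with fully separate parameter sets
# for the amplitude and the phase network.
amp_params = init_mlp(jax.random.PRNGKey(0), [16, 32, 1])
phase_params = init_mlp(jax.random.PRNGKey(1), [16, 32, 1])
s = jnp.array([1.0, -1.0] * 8)  # a spin configuration in +/-1 encoding
print(log_psi(amp_params, phase_params, s))
```

Because the two parameter sets never mix, gradients with respect to the amplitude network are independent of the phase network; this decoupling is what makes strategies such as training only one of the two parts ("partial learning", discussed in the reports below) possible.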
Reports on this Submission
Report 2 by Giuseppe Carleo on 2021-2-9 (Invited Report)
- Cite as: Giuseppe Carleo, Report on arXiv:2011.11214v1, delivered 2021-02-09, doi: 10.21468/SciPost.Report.2523
Strengths:
1. Clear and detailed account of numerical experiments
2. Innovative techniques to improve the optimization of neural quantum states
3. Insight into numerical instability

Weaknesses:
None to report
This work is a very interesting study of frustrated models with neural quantum states, and highlights some very important technical aspects and challenges of these variational studies.
The paper is very well written and it is a trove of precious technical details and novel parameterization strategies that will help the community move forward in this fast-developing field.
I do not have any "blocking" remarks and I recommend publishing the paper essentially as it is. I just have a few optional remarks the authors might want to address:
1. Recent works (starting from https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.2.033075 ) have shown that separately optimizing the sign and the amplitudes is beneficial for obtaining robust results (see also this recent work with an alternated optimization strategy: https://arxiv.org/abs/2101.08787 ). In this regard, I find the "partial learning" experiments done in this work quite interesting but also hard to understand (at least, for me!). What is the neural network architecture/size used in the partial learning? Maybe it is written in the text, but I could not find it. If the exact sign structure is taken, is there a significant dependence on the network size?
2. Something I believe would be nice to discuss more is the role of symmetries in improving the optimization landscape. For example, it is quite remarkable that a very simple, shallow RBM (purely holomorphic!) has obtained significantly improved results over the CNN (and DMRG) at the infamous J_2/J_1=0.5: https://iopscience.iop.org/article/10.1088/1361-648X/abe268/meta . Do the authors have insight into why this might be the case, in light also of their more general understanding?
3. A very small remark, which I also already mailed privately to the authors, concerns the statement in the conclusion: "Directly utilizing the Hessian matrix appears straightforward, but it is, unfortunately, prohibitively expensive." I am not sure about this conclusion, because one can use the same kind of tricks that are used to apply the quantum geometric tensor (S matrix) to the gradients, without ever forming the actual matrix. So I believe it should be possible to achieve a relatively decent scaling for the Hessian as well. I would leave this point a bit more open than it is currently written.
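For concreteness, the matrix-free trick alluded to in remark 3 can be sketched with forward-over-reverse automatic differentiation. This is a generic illustration, not code from the paper; `loss` is a stand-in for the variational energy, not the paper's actual objective.

```python
import jax
import jax.numpy as jnp

def hvp(loss, params, v):
    """Hessian-vector product H @ v without ever forming H:
    forward-mode differentiation of the (reverse-mode) gradient,
    costing only a constant factor more than one gradient evaluation."""
    return jax.jvp(jax.grad(loss), (params,), (v,))[1]

# Toy example with a stand-in scalar "loss".
def loss(p):
    return jnp.sum(jnp.sin(p) ** 2)

p = jnp.arange(4.0)
v = jnp.ones(4)
print(hvp(loss, p, v))  # equals jax.hessian(loss)(p) @ v, but matrix-free
```

This is the same scaling argument that makes iterative (e.g., conjugate-gradient) solution of the S-matrix equations in stochastic reconfiguration feasible without building the matrix explicitly.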
Anonymous Report 1 on 2021-1-16 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2011.11214v1, delivered 2021-01-16, doi: 10.21468/SciPost.Report.2417
Strengths:
1. Discusses important issues concerning neural-network wave functions, which are often used to study strongly correlated systems.

Weaknesses:
1. The presentation is confused and it is not easy to follow the logic of the argument.
In this work Bukov and collaborators assess the accuracy of neural-network wave functions for the frustrated J1-J2 model on the square lattice.
I think that the paper is quite hard to read and the thread of the discussion is not very linear. The general remark is that several different problems are shown, but the reader may get lost in all the details.
I will try to make some suggestions to improve the presentation, possibly by cutting/rearranging the discussion.
1) Section 3: I do not really understand why the discussion of the complex (holomorphic) state is so long, since, at the end of the day, it does not give sensible results. Still, I do not really understand the reason for the numerical instabilities. The argument that a complex function must diverge somewhere in the complex plane does not convince me too much. Anyway, since the parametrization with two real functions is eventually more stable, I suggest cutting the part on the complex wave function and just showing the most important results.
2) Section 4: This section looks useful, even though I would have preferred something a bit more quantitative. For the case they show, the exact sign is almost given by the Marshall rule, but what is the relevance of configurations violating the Marshall sign? Namely, what is the amplitude of these configurations (exact vs. variational)? (A small sketch of such a check is given after these points.) I am not sure about the real lesson we learn from this section.
3) Sections 5 and 6: I think that this is the most confusing part. For example, I would have discussed the "full basis simulation" in tight connection with "Varying the size of the Monte Carlo fluctuations", since I think they are directly related.
My main concern is the claim that the machine precision may really affect the results. In my experience, this suggests that there is some problem in the numerical code. The fact that different random number sequences may affect the results is also very suspicious. There is no very convincing discussion of these points.
4) Section 7: I would just shorten the discussion.
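To make the quantitative question in point 2 concrete, one possible check is sketched below. This is not from the paper under review: `marshall_sign` and `violating_weight` are hypothetical helpers, and comparing via the sign of the real part assumes approximately real amplitudes. The Marshall rule for the bipartite square lattice assigns the sign $(-1)^{N_\uparrow^A}$, where $N_\uparrow^A$ counts the up spins on one checkerboard sublattice.

```python
import numpy as np

def marshall_sign(config):
    """Marshall sign (-1)^(number of up spins on sublattice A) for a
    spin configuration on an L x L square lattice (entries +/-1)."""
    L = config.shape[0]
    x, y = np.indices((L, L))
    sublattice_A = (x + y) % 2 == 0  # checkerboard sublattice
    n_up_A = np.sum(config[sublattice_A] == 1)
    return (-1) ** n_up_A

def violating_weight(configs, psi):
    """Total probability weight carried by configurations whose
    (approximately real) amplitude violates the Marshall sign."""
    signs = np.array([marshall_sign(c) for c in configs])
    viol = np.sign(psi.real) != signs
    return np.sum(np.abs(psi[viol]) ** 2) / np.sum(np.abs(psi) ** 2)
```

Reporting this weight for the exact and the variational state would answer the referee's question directly: a small weight means the violations are confined to configurations of negligible amplitude.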
Now, let me give a general comment: the whole paper is about small lattices (e.g., 4x4 and 6x6), for which exact diagonalizations are possible. What is the usefulness of these neural-network states if 1) they provide more or less the same accuracy as much simpler states (e.g., Gutzwiller-projected fermionic states) and 2) the calculations are limited to small systems?
A minor point: it would be better to specify the value of J_2/J_1 throughout the text and in the figure captions.
In summary, I would strongly suggest a thorough revision of the paper, which should make the presentation more linear.