SciPost Submission Page
Quantum-Inspired Tempering for Ground State Approximation using Artificial Neural Networks
by Tameem Albash, Conor Smith, Quinn Campbell, Andrew D. Baczewski
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Tameem Albash |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2210.11405v2 (pdf) |
| Date submitted: | Nov. 17, 2022, 2:05 p.m. |
| Submitted by: | Tameem Albash |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational |
Abstract
A large body of work has demonstrated that parameterized artificial neural networks (ANNs) can efficiently describe ground states of numerous interesting quantum many-body Hamiltonians. However, the standard variational algorithms used to update or train the ANN parameters can get trapped in local minima, especially for frustrated systems and even if the representation is sufficiently expressive. We propose a parallel tempering method that facilitates escape from such local minima. This method involves training multiple ANNs independently, with each simulation governed by a Hamiltonian with a different "driver" strength, in analogy to quantum parallel tempering, and it incorporates an update step into the training that allows for the exchange of neighboring ANN configurations. We study instances from two classes of Hamiltonians to demonstrate the utility of our approach. The first instance is based on a permutation-invariant Hamiltonian whose landscape stymies the standard training algorithm by drawing it increasingly to a false local minimum. The second instance is four hydrogen atoms arranged in a rectangle, which is an instance of the second quantized electronic structure Hamiltonian discretized using Gaussian basis functions. We study this problem in a minimal basis set, which exhibits false minima that can trap the standard variational algorithm despite the problem's small size. We show that augmenting the training with quantum parallel tempering becomes useful for finding good approximations to the ground states of these problem instances.
Reports on this Submission
Report #2 by Anonymous (Referee 2) on 2022-12-21 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2210.11405v2, delivered 2022-12-21, doi: 10.21468/SciPost.Report.6362
Strengths
1- Introduction of a new and potentially widely applicable approach to escape local minima in variational Monte Carlo
2- Systematic benchmarking of the new approach that demonstrates its merit
3- Very clear and structured presentation
Weaknesses
1- Study restricted to system sizes where exact methods are applicable
2- Role of the choice of the replica swap probability remains unclear
3- Source code not available
Report
As neural network quantum states become more and more widely used, this work addresses a very timely topic and introduces an original and promising new component to the toolbox. This tool can be used by follow-up works as presented in the manuscript, but, as the authors point out, understanding it better and optimizing it further also open different directions for future work. The manuscript thereby meets the expectations for acceptance in SciPost Physics.
The paper also meets the general acceptance criteria 1-4 and 6. Regarding criterion 5 (reproducibility), the manuscript seems to contain all necessary information. Ideally, however, the source code used should be made available, as is common practice in the machine learning community.
The presented study is very well organized, the data is convincing, and the presentation is clear. There are only two points of criticism: First, the work is restricted to small system sizes and tailored model systems, where exact solutions are available. It would be interesting to see whether the parallel tempering approach yields an advantage in a larger system of physical interest, e.g. the J1-J2 model that has become a common testing ground in the community. Second, the replica swap probability used by the authors is motivated by physical intuition, but the choice is still arbitrary. A comparison with some trivial choice for the swap probability, e.g. uniform, could give an idea of how strongly this choice influences the result.
Finally, one suggestion for first additional insights into the efficiency of the parallel tempering would be to look at trajectories of individual replicas during the optimization. This would reveal how often replicas are swapped and whether they are passed all the way between the target and the mixer Hamiltonian.
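The bookkeeping behind this suggestion can be sketched in a few lines. The following is only an illustration, not the authors' implementation: the function names are hypothetical, and a constant `swap_prob` (the "trivial" uniform choice mentioned above) stands in for the manuscript's actual swap criterion. It records which position on the ladder of driver strengths each replica occupies after every step, and checks whether a given replica has visited both ends of the ladder.

```python
import random


def track_replica_trajectories(num_slots, num_steps, swap_prob, seed=0):
    """Return history[t][r] = ladder slot occupied by replica r after t steps.

    Assumption: neighbouring slots are swapped with a constant probability
    `swap_prob`; this is a placeholder for a real acceptance rule.
    """
    rng = random.Random(seed)
    slot_of = list(range(num_slots))     # slot_of[r]: slot holding replica r
    replica_at = list(range(num_slots))  # replica_at[k]: replica in slot k
    history = [slot_of.copy()]
    for step in range(num_steps):
        # Alternate between even and odd neighbouring pairs.
        for k in range(step % 2, num_slots - 1, 2):
            if rng.random() < swap_prob:
                a, b = replica_at[k], replica_at[k + 1]
                replica_at[k], replica_at[k + 1] = b, a
                slot_of[a], slot_of[b] = k + 1, k
        history.append(slot_of.copy())
    return history


def spans_full_ladder(history, replica):
    """True if the replica visited both the first and the last slot."""
    visited = {row[replica] for row in history}
    return 0 in visited and (len(history[0]) - 1) in visited


if __name__ == "__main__":
    hist = track_replica_trajectories(num_slots=6, num_steps=200, swap_prob=0.3)
    for r in range(6):
        print(r, spans_full_ladder(hist, r))
```

Plotting `history[t][r]` against `t` for each `r` gives the replica trajectories; replacing the constant probability with the actual acceptance rule would show how often replicas travel between the target and the mixer ends.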
In my opinion, the manuscript is in good shape to be published in SciPost Physics after a minor revision.
Requested changes
1- Formatting needs to be changed to SciPost template.
2- Create a public repository containing the source code.
3- Consider the two main points of criticism mentioned above for a revised version.
4- Although it appeared only two days earlier on the preprint server, I would suggest also adding https://arxiv.org/abs/2211.07749 in the introduction alongside Refs [16,17], because it constitutes a significant step forward beyond Ref. [16].
Report #1 by Pranay Patil (Referee 1) on 2022-12-18 (Invited Report)
- Cite as: Pranay Patil, Report on arXiv:2210.11405v2, delivered 2022-12-18, doi: 10.21468/SciPost.Report.6328
Strengths
2-The method developed is general and can be applied to a wide variety of optimization problems, and thus has a significant scope of application.
3-The paper is presented in a pedagogical way, with a clear flow of content, and appropriate relegation of details and derivations to appendices. The overarching theme is easy to understand.
Weaknesses
2-The text below Eq.10 is lacking in clarity (I believe you mean "over 10 updates of SR by $\bar{O}$" and not $\langle\bar{O}\rangle$). The presentation here would benefit from a discussion of the performance when using just $O$ instead of the average. This would also help with a better understanding of Eq.12.
3-The discussion below Eq.14 would benefit from an example illustrating the procedure of calculating the probabilities associated with $u$. This is not completely necessary as it may be out of the scope of this work, but a quick illustration in terms of the classical Ising chain as a target Hamiltonian would be quite illuminating for readers who are amateurs in this area.
4-Four figures mention resampling using bootstrap. A brief explanation of this procedure before Sec.3 would add valuable clarity to the numerical results.
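As a pointer for readers unfamiliar with the bootstrap mentioned in point 4, a generic percentile bootstrap for the mean can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the manuscript: the function name, the number of resamples, and the percentile-interval convention are all hypothetical choices.

```python
import random
import statistics


def bootstrap_mean_ci(samples, num_resamples=1000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap.

    Each resample draws len(samples) values with replacement; the
    interval is read off the sorted resampled means.
    """
    rng = random.Random(seed)
    n = len(samples)
    means = []
    for _ in range(num_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lo_idx = int((alpha / 2) * num_resamples)
    hi_idx = min(num_resamples - 1, int((1 - alpha / 2) * num_resamples))
    return statistics.fmean(samples), means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Hypothetical final energies from independent training runs.
    energies = [-1.92, -1.88, -1.95, -1.60, -1.93, -1.91]
    print(bootstrap_mean_ci(energies))
```

The spread between the lower and upper values is what an error bar "obtained by bootstrap" typically reports; a statistic other than the mean can be substituted inside the loop.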
Report
The presentation is clear and concise, and all the included content is relevant for the point which the authors wish to make. Given that there is no constraint on page length, the manuscript would benefit from a more detailed description of the parallel tempering method, and from a short example which illustrates the changes in energy for each replica as a function of simulation time.
The manuscript meets all of the general acceptance criteria, and meets the expectation of "Open a new pathway in an existing or a new research direction, with clear potential for multipronged follow-up work". The method is of interest to the adiabatic quantum computing community as a whole, and one can expect that many researchers in the field will attempt to build on it.
Requested changes
1-Given that the entire manuscript deals with an RBM architecture, it is essential that this be mentioned in the abstract.
2-Typo: Above Eq.8, "linearly" twice
3-An intuition about the performance of the method can be provided by studying the behavior of the energy and variance during simulation; this will help the general reader understand the technique better.
4-The discussion below Eq.14 is missing the definition of $n$, and the description of selecting the appropriate $u$ is too terse. If possible, a short example (maybe the classical Ising chain) can be included.
5-The resampling procedure using bootstrap is not described in the main text; this should be included as a short note.
6-A brief discussion about the effect of the number of replicas and how much the ground states of various replicas differ, and how this affects the performance should be included. Perhaps this is best done in Sec. 3A using the language of the precipice problem.
7-After Eq.15, one should use $H_T(s)$ instead of $H(s)$.
8-In the conclusion section, the authors equate stoquasticity with ease of representation. This should be discussed in a bit more detail, as it is not immediately apparent why this is so from the RBM architecture, which is built in terms of complex parameters.
9-In Eq.A6, $S_{k,k^{\prime}}$ instead of $s_{k,k^{\prime}}$.
