SciPost Submission Page
Quantum-Inspired Tempering for Ground State Approximation using Artificial Neural Networks
by Tameem Albash, Conor Smith, Quinn Campbell, Andrew D. Baczewski
This is not the latest submitted version; this Submission thread has since been published.
Submission summary
Authors (as registered SciPost users): Tameem Albash

Submission information
Preprint Link: https://arxiv.org/abs/2210.11405v2 (pdf)
Date submitted: 2022-11-17 14:05
Submitted by: Albash, Tameem
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Approaches: Theoretical, Computational
Abstract
A large body of work has demonstrated that parameterized artificial neural networks (ANNs) can efficiently describe ground states of numerous interesting quantum many-body Hamiltonians. However, the standard variational algorithms used to update or train the ANN parameters can get trapped in local minima, especially for frustrated systems, even if the representation is sufficiently expressive. We propose a parallel tempering method that facilitates escape from such local minima. This method involves training multiple ANNs independently, with each simulation governed by a Hamiltonian with a different "driver" strength, in analogy to quantum parallel tempering, and it incorporates an update step into the training that allows for the exchange of neighboring ANN configurations. We study instances from two classes of Hamiltonians to demonstrate the utility of our approach. The first instance is based on a permutation-invariant Hamiltonian whose landscape stymies the standard training algorithm by drawing it increasingly to a false local minimum. The second instance is four hydrogen atoms arranged in a rectangle, which is an instance of the second-quantized electronic structure Hamiltonian discretized using Gaussian basis functions. We study this problem in a minimal basis set, which exhibits false minima that can trap the standard variational algorithm despite the problem's small size. We show that augmenting the training with quantum parallel tempering is useful for finding good approximations to the ground states of these problem instances.
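For readers unfamiliar with parallel tempering in this variational setting, the following minimal sketch illustrates the general structure described above. It is not the authors' code: the "ANN" is reduced to a real vector, the training step is plain gradient descent on the Rayleigh quotient, and the Metropolis-style swap rule is only an illustrative stand-in (the manuscript uses an RBM ansatz, stochastic reconfiguration, and its own swap probability).

```python
import numpy as np

# Minimal sketch of a quantum-inspired parallel tempering loop (illustrative only).
# Each replica is trained against its own interpolated Hamiltonian H(s), and
# neighboring replica configurations are periodically exchanged.

def interpolated_hamiltonian(h_driver, h_target, s):
    """H(s) = (1 - s) * H_driver + s * H_target, with s the "driver" strength knob."""
    return (1.0 - s) * h_driver + s * h_target

def energy(params, hamiltonian):
    """Rayleigh quotient of the toy 'ansatz' vector (stand-in for <psi|H|psi>/<psi|psi>)."""
    return float(params @ hamiltonian @ params) / float(params @ params)

def train_step(params, hamiltonian, lr=0.05):
    """One gradient-descent step on the Rayleigh quotient (stand-in for VMC + SR)."""
    e = energy(params, hamiltonian)
    grad = 2.0 * (hamiltonian @ params - e * params) / float(params @ params)
    return params - lr * grad

def quantum_parallel_tempering(h_driver, h_target, n_replicas=8, n_epochs=200, beta=10.0):
    dim = h_target.shape[0]
    s_values = np.linspace(0.0, 1.0, n_replicas)   # one driver strength per replica
    replicas = [np.random.standard_normal(dim) for _ in s_values]
    for _ in range(n_epochs):
        # 1) Train each replica independently against its own interpolated Hamiltonian.
        for r, s in enumerate(s_values):
            replicas[r] = train_step(replicas[r], interpolated_hamiltonian(h_driver, h_target, s))
        # 2) Attempt to exchange neighboring configurations (Metropolis-style
        #    acceptance used here purely as a placeholder for the paper's swap rule).
        for r in range(n_replicas - 1):
            h_lo = interpolated_hamiltonian(h_driver, h_target, s_values[r])
            h_hi = interpolated_hamiltonian(h_driver, h_target, s_values[r + 1])
            delta = (energy(replicas[r], h_hi) + energy(replicas[r + 1], h_lo)
                     - energy(replicas[r], h_lo) - energy(replicas[r + 1], h_hi))
            if np.random.rand() < np.exp(-beta * delta):
                replicas[r], replicas[r + 1] = replicas[r + 1], replicas[r]
    return replicas[-1]   # the s = 1 replica targets the physical Hamiltonian
```

The point of the sketch is only the two-phase structure, independent training of each replica followed by neighbor swap attempts, not the details of the ansatz or the acceptance rule.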
Reports on this Submission
Report #2 by Anonymous (Referee 3) on 2022-12-21 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2210.11405v2, delivered 2022-12-21, doi: 10.21468/SciPost.Report.6362
Strengths
1- Introduction of a new and potentially widely applicable approach to escape local minima in variational Monte Carlo
2- Systematic benchmarking of the new approach that demonstrates its merit
3- Very clear and structured presentation
Weaknesses
1- Study restricted to system sizes where exact methods are applicable
2- Role of the choice of the replica swap probability remains unclear
3- Source code not available
Report
The authors address the issue of getting stuck in local minima that arises in variational ground state searches with neural network ansatz functions. In order to remedy this problem, they introduce a new optimization method inspired by quantum parallel tempering: Multiple replicas of the ansatz are optimized with varying admixtures of a "mixer Hamiltonian" and the replicas are regularly swapped during the procedure. Using two model Hamiltonians, where it is known that finding ground states is difficult due to false minima, they demonstrate that the parallel tempering approach can have a clear advantage over conventional Stochastic Reconfiguration.
As neural network quantum states become more and more widely used, this work addresses a very timely topic and introduces an original and promising new component to the toolbox. This tool can be used by follow-up works as presented in the manuscript, but, as the authors point out, a better understanding and further optimization of the method also open up directions for future work. The manuscript thereby meets the expectations for acceptance in SciPost Physics.
The paper also meets the general acceptance criteria 1-4 and 6. Regarding criterion 5 (reproducibility), the manuscript seems to contain all the necessary information. Ideally, however, the source code used should also be made available, as is common practice in the machine learning community.
The presented study is very well organized, the data is convincing, and the presentation is clear. There are only two points of criticism: First, the work is restricted to small system sizes and tailored model systems, where exact solutions are available. It would be interesting to see whether the parallel tempering approach yields an advantage in a larger system of physical interest, e.g. the $J_1$-$J_2$ model, which has become a common testing ground in the community. Second, the replica swap probability used by the authors is motivated by physical intuition, but the choice is still arbitrary. A comparison with some trivial choice for the swap probability, e.g. uniform, could give an idea of how large an influence this choice has on the result.
Finally, one suggestion for first additional insights into the efficiency of the parallel tempering would be to look at trajectories of individual replicas during the optimization. This would reveal how often replicas are swapped and whether they are passed all the way between the target and the mixer Hamiltonian.
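As a concrete, purely hypothetical illustration of such a diagnostic, one could label each replica at the start of training and record which rung of the driver-strength ladder it occupies after every accepted swap, e.g.:

```python
import numpy as np

# Hypothetical replica-trajectory diagnostic: given the accepted neighbor swaps
# (as rung indices, in the order they occurred), reconstruct which rung of the
# driver-strength ladder each labeled replica occupies over time.

def replica_trajectories(accepted_swaps, n_replicas):
    rung_of = list(range(n_replicas))      # rung_of[label] = current rung of that replica
    history = [rung_of.copy()]
    for rung in accepted_swaps:            # an accepted swap exchanges rungs r and r+1
        a = rung_of.index(rung)            # label currently at rung r
        b = rung_of.index(rung + 1)        # label currently at rung r+1
        rung_of[a], rung_of[b] = rung_of[b], rung_of[a]
        history.append(rung_of.copy())
    return np.array(history)               # shape: (n_accepted_swaps + 1, n_replicas)

# A replica that diffuses from rung 0 (mixer end) to the last rung (target end)
# and back indicates efficient round trips; trajectories stuck at one end do not.
```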
In my opinion, the manuscript is in good shape to be published in SciPost Physics after a minor revision.
Requested changes
1- Formatting needs to be changed to SciPost template.
2- Create a public repository containing the source code.
3- Consider the two main points of criticism mentioned above for a revised version.
4- Although it appeared on the preprint server only two days earlier, I would suggest also adding https://arxiv.org/abs/2211.07749 to the introduction alongside Refs. [16,17], because it constitutes a significant step forward beyond Ref. [16].
Report #1 by Pranay Patil (Referee 1) on 2022-12-18 (Invited Report)
- Cite as: Pranay Patil, Report on arXiv:2210.11405v2, delivered 2022-12-18, doi: 10.21468/SciPost.Report.6328
Strengths
1-Addresses an important aspect of the variational approach to quantum many-body systems. The topic discussed is of broad interest for adiabatic quantum computing and statistical mechanics communities.
2-The method developed is general and can be applied to a wide variety of optimization problems, and thus has a significant scope of application.
3-The paper is presented in a pedagogical way, with a clear flow of content, and appropriate relegation of details and derivations to appendices. The overarching theme is easy to understand.
Weaknesses
1-Given that the parallel tempering apparatus is the heart of the method, detailed information about its performance is required. This has not been included by the authors. A suggestion for this would be studies of $E_r(t)$ and $\sigma_r(t)$ throughout the training period. Additional discussion of the training performance for all replicas would also be beneficial, as it helps provide intuition about the optimization process.
2-The text below Eq.10 is lacking in clarity (I believe you mean "over 10 updates of SR by $\bar{O}$" and not $\langle\bar{O}\rangle$). The presentation here would benefit from a discussion of the performance when using just $O$ instead of the average. This would also help with a better understanding of Eq.12.
3-The discussion below Eq.14 would benefit from an example illustrating the procedure for calculating the probabilities associated with $u$. This is not strictly necessary, as it may be beyond the scope of this work, but a quick illustration in terms of the classical Ising chain as a target Hamiltonian would be quite illuminating for readers who are new to this area.
4-Four figures mention resampling using bootstrap. A brief explanation of this procedure before Sec.3 would add valuable clarity to the numerical results.
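For context, the bootstrap referred to here is the standard resampling-with-replacement estimate of statistical uncertainty; a generic sketch (not the authors' implementation, with hypothetical data) is:

```python
import numpy as np

def bootstrap_error(samples, n_resamples=1000, statistic=np.mean, seed=0):
    """Estimate the uncertainty of a statistic by resampling the data with replacement."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    estimates = [statistic(rng.choice(samples, size=samples.size, replace=True))
                 for _ in range(n_resamples)]
    return statistic(samples), float(np.std(estimates))

# Hypothetical example: success probabilities from independent training runs.
runs = [0.82, 0.91, 0.78, 0.88, 0.85, 0.90, 0.79, 0.86]
mean, err = bootstrap_error(runs)   # quoted value and its bootstrap error bar
```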
Report
The manuscript discusses a method of solving generic systems of constraints using variational states. This is done by building a target Hamiltonian whose ground state encodes the solution with the least number of broken constraints, and learning the desired state by optimizing the parameters of a variational wavefunction. Although the general technique is well established, the machinery often does not work well in practice for problems of interest due to a lack of powerful optimization protocols. The authors attempt to resolve this problem by adopting the technique of parallel tempering, which is borrowed from the field of Monte Carlo simulations of spin glasses. They expand the system of interest to a set of Hamiltonians which interpolate between the target Hamiltonian and one that has an easy-to-train solution. This allows a high degree of exploration in the wavefunction space, thus leading to a higher chance of hitting the true ground state. They illustrate the performance of the algorithm using two representative problems, the first of which is chosen specifically to stymie the traditional approach, and the second of which is a toy problem from quantum chemistry.
The presentation is clear and concise, and all the included content is relevant for the point which the authors wish to make. Given that there is no constraint on page length, the manuscript would benefit from a more detailed description of the parallel tempering method, and from a short example which illustrates the changes in energy for each replica as a function of simulation time.
The manuscript meets all of the general acceptance criteria, and meets the expectation of "Open a new pathway in an existing or a new research direction, with clear potential for multipronged follow-up work". The method is of interest to the adiabatic quantum computing community as a whole, and one can expect that many researchers in the field will attempt to build on it.
Requested changes
1-Given that the entire manuscript deals with an RBM architecture, it is essential that this be mentioned in the abstract.
2-Typo: Above Eq.8, "linearly" appears twice.
3-An intuition about the performance of the method can be provided by studying the behavior of the energy and variance during the simulation; this will help the general reader understand the technique better.
4-The discussion below Eq.14 is missing the definition of $n$, and the description of selecting the appropriate $u$ is too terse. If possible, a short example (maybe the classical Ising chain) can be included.
5-The resampling procedure using bootstrap is not described in the main text; this should be included as a short note.
6-A brief discussion of the effect of the number of replicas, of how much the ground states of the various replicas differ, and of how this affects the performance should be included. Perhaps this is best done in Sec. 3A using the language of the precipice problem.
7-After Eq.15, one should use $H_T(s)$ instead of $H(s)$.
8-In the conclusion section, the authors equate stoquasticity with ease of representation. This should be discussed in a bit more detail, as it is not immediately apparent why this is so from the RBM architecture, which is built in terms of complex parameters.
9-In Eq.A6, $S_{k,k^{\prime}}$ instead of $s_{k,k^{\prime}}$.