
SciPost Submission Page

On the descriptive power of Neural-Networks as constrained Tensor Networks with exponentially large bond dimension

by Mario Collura, Luca Dell'Anna, Timo Felser, Simone Montangero

This Submission thread is now published as SciPost Phys. Core 4, 001 (2021)

Submission summary

Authors (as registered SciPost users): Mario Collura · Luca Dell'Anna · Timo Felser
Submission information
Preprint Link: https://arxiv.org/abs/1905.11351v4  (pdf)
Date accepted: 2021-01-29
Date submitted: 2020-09-29 10:59
Submitted by: Collura, Mario
Submitted to: SciPost Physics Core
Ontological classification
Academic field: Physics
Specialties:
  • Atomic, Molecular and Optical Physics - Theory
  • Quantum Physics
Approaches: Theoretical, Computational

Abstract

In many cases, neural networks can be mapped into tensor networks with an exponentially large bond dimension. Here, we compare different sub-classes of neural network states with their mapped tensor network counterparts for studying the ground state of short-range Hamiltonians. We show that when mapping a neural network, the resulting tensor network is highly constrained, and thus the neural network states do not, in general, deliver the naively expected drastic improvement over state-of-the-art tensor network methods. We explicitly show this result in two paradigmatic examples, the 1D ferromagnetic Ising model and the 2D antiferromagnetic Heisenberg model, addressing the lack of a detailed comparison of the expressiveness of these increasingly popular variational ansätze.

Author comments upon resubmission

REPLY TO REFEREE 1
WEAKNESS:
We thank the referee for pointing out these weaknesses when comparing different variational methods. However, one central message of the manuscript is precisely to make the scientific community aware of the difficulties that arise when naively comparing these ansätze. In particular, many references in this community compare neural network states to exponentially large tensor networks without performing a fundamental analysis of the actual descriptive power of those methods. With our manuscript, we address the lack of a detailed comparison of the expressiveness of these increasingly popular variational ansätze. We therefore see it as a strong point of our manuscript that the referee identified these difficulties in comparing the representations, which can start a proper discussion of a more detailed comparison.

In particular, for point 1) we stress that the number of variational parameters is often far from being a proper indicator of the accuracy in a comparison. For instance, in the AKLT model, the MPS is exact with bond dimension 2, whereas the same model would require $O(N^2)$ hidden units in the RBM. On the other hand, an RBM is well suited for representing the Laughlin wave function with $N(N-1)/2$ hidden units, in contrast to the exponentially large bond dimension that tensor networks require in this case. We are therefore not convinced that fixing the number of variational parameters leads, in general, to a fair comparison.
For point 2), we point out that the different optimisation techniques used in the comparison are state-of-the-art for each of the variational ansätze. Thus, for each ansatz, the chosen optimisation technique delivers state-of-the-art results. We are convinced that a fair comparison should be exactly this: comparing the best possible outcomes of both approaches.

Given that we explicitly state the procedure on how we perform the analysis, we are convinced that the results (i) can be used for a proper comparison and (ii) most certainly can help to encourage a scientific discussion about the differences of the methods and thereby inspire novel insights on the yet unknown connections between the approaches.

For point 3), we thank the referee for pointing out the slightly misleading parts in the manuscript. For the sake of clarity, we changed critical parts in the abstract and introduction, and minor parts in the conclusion.

REPORT:
Point 1) and 2): We thank the referee for pointing out these issues and refer to the answer above.

Moreover, we thank the referee for pointing out the slightly misleading parts in the manuscript. For the sake of clarity, we changed critical parts in the abstract, introduction and main text, and minor parts in the conclusion; see the revised version of the manuscript.

Finally, we thank the referee for pointing out the scientific importance of our manuscript, and we would like to note that we performed a proper revision based on the valuable comments and suggestions of both referees. As both referees suggested, we will transfer our manuscript for publication in SciPost Physics Core.



REPLY TO REFEREE 2
We thank the referee for pointing out this strength of our manuscript. We would like to add that the mentioned comparison is not only performed in 1D with an MPS against NNs but also, as a further strong point of our manuscript, in 2D with a TTN against NNs.

REPORT:
We thank the referee for pointing out the issue when comparing different variational methods. However, we point out that the number of variational parameters is often far from being a good indicator of the accuracy in a comparison. For instance, in the AKLT model, the MPS is exact with bond dimension 2, whereas the same model would require $O(N^2)$ hidden units in the RBM. On the other hand, an RBM is well suited for representing the Laughlin wave function with $N(N-1)/2$ hidden units, in contrast to the exponentially large bond dimension that tensor networks require in this case. We are therefore not convinced that fixing the number of variational parameters is, in general, a completely appropriate metric for a fair comparison either. Finding a completely fair way of comparing the methods thus remains an open question, as we point out in our manuscript.
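
The AKLT statement above can be checked numerically. The following is a minimal sketch, not code from the manuscript: it assumes one common gauge choice for the AKLT site tensors, and the helper names (`is_canonical`, `transfer_spectrum`, `correlation_length`) are illustrative. It builds the three $2\times 2$ matrices of the bond-dimension-2 MPS, checks the normalisation condition, and reads off the correlation length from the transfer-matrix spectrum.

```python
import numpy as np

# Spin-1/2 operators acting on the two-dimensional bond (virtual) space.
sigma_plus = np.array([[0.0, 1.0], [0.0, 0.0]])
sigma_minus = sigma_plus.T
sigma_z = np.diag([1.0, -1.0])

# Assumed gauge: site matrices A^s for the physical states S^z = +1, 0, -1.
aklt = np.stack([
    np.sqrt(2.0 / 3.0) * sigma_plus,
    -np.sqrt(1.0 / 3.0) * sigma_z,
    -np.sqrt(2.0 / 3.0) * sigma_minus,
])  # shape (d=3, chi=2, chi=2)


def is_canonical(tensors):
    """Check the normalisation condition sum_s A^s (A^s)^dagger = identity."""
    chi = tensors.shape[1]
    total = sum(A @ A.conj().T for A in tensors)
    return np.allclose(total, np.eye(chi))


def transfer_spectrum(tensors):
    """Eigenvalues of the transfer matrix E = sum_s A^s (x) conj(A^s), sorted ascending."""
    E = sum(np.kron(A, A.conj()) for A in tensors)
    return np.sort(np.linalg.eigvals(E).real)


def correlation_length(tensors):
    """xi = -1 / ln|lambda_2|, with lambda_2 the second-largest eigenvalue in magnitude."""
    mags = np.sort(np.abs(transfer_spectrum(tensors)))[::-1]
    return -1.0 / np.log(mags[1])


print(is_canonical(aklt))        # True
print(transfer_spectrum(aklt))   # one eigenvalue 1 and three eigenvalues -1/3
print(correlation_length(aklt))  # 1 / ln(3), roughly 0.91
```

Each site carries only $3 \times 2 \times 2 = 12$ numbers, independent of the system size $N$, which makes concrete why a raw parameter count is a poor proxy for expressiveness in this example.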

Given that we explicitly state the procedure on how we perform the analysis, we are convinced that the results (i) can be used for a proper comparison and (ii) most certainly can help to encourage a scientific discussion about the differences of the methods and thereby inspire novel insights on the yet unknown connections between the approaches.

While we agree with the statement that a different choice of solver might in principle alter the outcome, we point out that the different optimisation techniques used in the comparison are state-of-the-art for each of the variational ansätze. Thus, for each ansatz, the chosen optimisation technique delivers state-of-the-art results. We are convinced that a fair comparison should be exactly this: comparing the best possible outcomes of both approaches. Moreover, the referee's critique applies to any numerical minimisation result; this does not, however, prevent the scientific community from building new physics on those possibly flawed results.

We thank the referee for pointing out the parts to be addressed for more clarity and transparency. Thus, for the sake of clarity, we changed critical parts in the abstract, introduction, main text and minor parts in the conclusion. We are confident that our manuscript has been improved and is now more transparent in our comparison and its limitations. Thus, we thank the referees for their support in improving our manuscript.

The formulation used to describe the CPU time was misleading for the reader. The main message was that the coMPS is more efficient in calculating the expectation values (which is a part of the optimisation). The fact that they are based on matrix-matrix multiplications was independent of the above statement and was used to clarify that the required resources scale with $O(\chi^3)$.
However, we realised that both messages are already mentioned elsewhere, and we chose to remove them to reduce redundancy.
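
To illustrate where the $O(\chi^3)$ scaling comes from, the following is a minimal sketch of a local expectation value in a generic open-boundary MPS. It is not the coMPS implementation of the manuscript; the function names and the tensor layout `(chi_left, d, chi_right)` are assumptions made for this example. Each site update consists of products of $\chi \times \chi$ matrices, which is the source of the cubic cost.

```python
import numpy as np


def transfer_step(env, A, op=None):
    """Push the left environment `env` through one site tensor A.

    A has shape (chi_left, d, chi_right); `op` is an optional d x d local
    operator applied on the ket side.
    """
    B = A if op is None else np.einsum("st,atb->asb", op, A)
    # Two matrix-matrix products per physical index: O(d * chi^3) per site.
    return sum(A[:, s, :].conj().T @ env @ B[:, s, :] for s in range(A.shape[1]))


def expectation(mps, op, site):
    """Return <psi|op_site|psi> / <psi|psi> for an open-boundary MPS."""
    num = np.ones((1, 1))
    den = np.ones((1, 1))
    for j, A in enumerate(mps):
        num = transfer_step(num, A, op if j == site else None)
        den = transfer_step(den, A)
    return (num[0, 0] / den[0, 0]).real


# Usage: a product state with all spins up has <S^z> = 1/2 on every site.
up = np.zeros((1, 2, 1))
up[0, 0, 0] = 1.0
sz = np.diag([0.5, -0.5])
print(expectation([up] * 4, sz, 2))  # 0.5
```

The norm is recomputed alongside the numerator for simplicity; an MPS kept in canonical form would make the denominator trivially equal to one.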

We do not compare the expectation values of the AFM Heisenberg model in the thermodynamic limit. Instead, we compare our results with Monte Carlo simulations of exactly the same finite-size systems (for $L=8$ and $L=10$) with the same boundary conditions. The paper in which these results were published aims to calculate the expectation values of the AFM Heisenberg model in the thermodynamic limit by extrapolating the results for finite system sizes $L=\{4,6,8,10,12,14,16\}$. In both cases, for $L=8$ and $L=10$, the errors of the Quantum Monte Carlo results are below $10^{-5}$ and therefore negligible compared to those of both the TTN and the RBM, which we now state in our manuscript.

We changed Figure 6 accordingly and thank the referee for their comment.

We improved the wording for the comparison between RBM and TTN, i.e., we now explicitly state that our findings apply to the models we analysed, but that the generalisation of our statements remains an open question, as the outcome for the different methods might be highly model-specific.

The referee is correct that additional symmetry constraints can also be implemented in the method of Ref. 13. However, while encoding symmetries in RBMs will indeed improve the connected correlations, since they enforce $\langle\sigma^\gamma_j\rangle = 0$ and make $\langle\sigma^\gamma_i \sigma^\gamma_j\rangle$ independent of $\gamma$, the actual computational gain is higher in the case of a TTN. Thus, for RBMs, we would expect to gain a higher precision in the final results when exploiting the $SU(2)$ symmetry as a result of the improved correlations. However, we would not expect a dramatic gain in computational time, in contrast to the case of the TTN. There, we explicitly reduce the effective bond dimension by changing the basis to work only within the symmetry multiplet space. This drastically decreases the computational resources required, which in turn enables the TTN to reach higher bond dimensions and thereby further increase the accuracy, in addition to the gains resulting from the enforced expectation values.
We added a paragraph to clarify this benefit as indeed this message might not have been clear in our prior version of the manuscript.
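
As a rough illustration of this reduction (a sketch in notation assumed here, not taken from the manuscript): if an $SU(2)$-symmetric bond of dimension $\chi$ decomposes into spin-$j$ multiplets, each appearing $t_j$ times, then the tensors only need to be stored and contracted in the degeneracy spaces of dimension $t_j$, so that

$$\chi = \sum_j t_j\,(2j+1), \qquad \chi_{\mathrm{eff}} = \sum_j t_j .$$

For a hypothetical bond with $t_{1/2} = 10$ and $t_{3/2} = 5$, this gives $\chi = 10\cdot 2 + 5\cdot 4 = 40$ but only $\chi_{\mathrm{eff}} = 15$. Since contraction costs grow polynomially in the bond dimension, working with $\chi_{\mathrm{eff}}$ instead of $\chi$ can free substantial resources for larger bond dimensions.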

We added a statement on page 7 to clarify that the coMPS approach drastically improves the evaluation of expectation values only and not the optimisation itself.

Finally, we thank the referee for pointing out the scientific importance of our manuscript: we performed a proper revision based on the valuable comments and suggestions of both referees. In conclusion, following the referees' suggestion, we are going to submit our manuscript for publication in SciPost Physics Core.

Published as SciPost Phys. Core 4, 001 (2021)


Reports on this Submission

Report #1 by Anonymous (Referee 3) on 2021-01-13 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:1905.11351v4, delivered 2021-01-13, doi: 10.21468/SciPost.Report.2401

Strengths

1- honest evaluation of the expressiveness of different variational states together with their appropriate optimization approach

2- Interesting mapping from

Weaknesses

2- the use of different optimization approaches entangles expressiveness with optimizability

Report

The authors have sufficiently addressed the concerns I raised.

There are not many works seriously comparing the descriptive power of neural networks with other variational ansätze. Even though the use of different optimization approaches makes the comparison slightly more obscure, I still do believe that the work deserves a publication in SciPost Physics Core.

  • validity: good
  • significance: good
  • originality: good
  • clarity: high
  • formatting: good
  • grammar: excellent
