SciPost Submission Page
Residual Matrix Product State for Machine Learning
by Ye-Ming Meng, Jing Zhang, Peng Zhang, Chao Gao, Shi-Ju Ran
This is not the latest submitted version.
This Submission thread is now published as
Submission summary
Authors (as registered SciPost users): | Chao Gao · Yeming Meng |
Submission information | |
---|---|
Preprint Link: | scipost_202209_00017v1 (pdf) |
Code repository: | https://github.com/YemingMeng/ResMPS |
Date submitted: | 2022-09-09 04:00 |
Submitted by: | Meng, Yeming |
Submitted to: | SciPost Physics |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: | |
Approach: | Computational |
Abstract
Tensor networks, which originate from quantum physics, are emerging as an efficient tool for classical and quantum machine learning. Nevertheless, there still exists a considerable accuracy gap between tensor networks and sophisticated neural network models for classical machine learning. In this work, we combine the ideas of the matrix product state (MPS), the simplest tensor network structure, and the residual neural network, and propose the residual matrix product state (ResMPS). The ResMPS can be treated as a network whose layers map the "hidden" features to the outputs (e.g., classifications), and the variational parameters of the layers are functions of the features of the samples (e.g., pixels of images). This differs from a neural network, whose layers map the features feed-forwardly to the output. The ResMPS can be equipped with non-linear activations and dropout layers, and it outperforms state-of-the-art tensor network models in terms of efficiency, stability, and representation power. Moreover, ResMPS is interpretable from the perspective of polynomial expansion, where the factorization and exponential machines naturally emerge. Our work contributes to connecting and hybridizing neural and tensor networks, which is crucial for further enhancing our understanding of the working mechanisms and improving the performance of both models.
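The idea that the layer parameters are functions of the sample features can be illustrated with a minimal NumPy sketch. It assumes a simple residual update of the form $h \leftarrow h + x_n W^{[n]} h$ and the feature map $\phi(x) = (1, x)$. Both are illustrative assumptions, not necessarily the manuscript's exact definitions, and the function names `sresmps_forward` and `mps_forward` are hypothetical. Under these assumptions, the residual update and an ordinary MPS contraction give the same hidden vector.

```python
import numpy as np


def sresmps_forward(x, weights, h0):
    """Propagate a hidden vector through a simple residual chain (assumed form).

    x       : (N,) array of scalar features, e.g. pixel values
    weights : (N, chi, chi) array, one chi-by-chi matrix per feature
    h0      : (chi,) initial hidden vector
    """
    h = h0.copy()
    for x_n, w_n in zip(x, weights):
        # Residual update: the effective layer matrix (x_n * w_n)
        # depends on the sample feature x_n.
        h = h + x_n * (w_n @ h)
    return h


def mps_forward(x, weights, h0):
    """Same map written as an MPS contraction with phi(x) = (1, x) (assumed)."""
    chi = h0.shape[0]
    h = h0.copy()
    for x_n, w_n in zip(x, weights):
        # Local tensor with physical index s in {0, 1}:
        # slice 0 is the identity, slice 1 is the weight matrix.
        a_n = np.stack([np.eye(chi), w_n])  # shape (2, chi, chi)
        phi = np.array([1.0, x_n])          # feature map of the scalar x_n
        h = np.einsum("s,sab,b->a", phi, a_n, h)
    return h
```

In this sketch the identity slice of the local tensor plays the role of the skip connection, which is why the two functions agree for any input.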
Author comments upon resubmission
Detailed answers to all issues raised by the referees have been submitted separately as replies.
List of changes
1. Lines 76-81. We added a description of the structure of the whole paper.
2. Lines 97-101. We rewrote these sentences to make them clearer.
3. Lines 107-111. We added descriptions of an output linear layer that maps the final hidden $\hat{h}^{[N]}$ to classifications.
4. Lines 127-131 & Fig.3. We added visualization and descriptions of t-SNE intermediate data.
5. Lines 189-191. We added more formulas to state the equivalence of sResMPS and regular MPS.
6. Lines 222-242. We added Sec. 2.5 to compare ResMPS, RNN, and transformer network.
7. Appendix A. Detailed training information which helps to reproduce our result.
8. We uploaded the code to a GitHub repository (see [YemingMeng/ResMPS](https://github.com/YemingMeng/ResMPS)).
9. Appendix B. Comparison of different paths which map 2D figures to 1D sequence.
10. Appendix C & Tab. 2. Additional benchmark 1, which compares all combinations of the different components (channels, activation, and dropout).
11. Appendix C. Additional benchmark 2, which compares performance under small virtual feature dimensions $\chi<10$.
12. Several typos were fixed.
Reports on this Submission
Report #1 by Anonymous (Referee 3) on 2022-10-07 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202209_00017v1, delivered 2022-10-07, doi: 10.21468/SciPost.Report.5846
Strengths
See previous report.
Weaknesses
1- Conclusions drawn from the experiments on expressivity (Section 2.4.3 and Appendix C) are not supported by the data.
Report
The revised version of the manuscript is substantially improved over the previous one. In particular, the three weaknesses pointed out in my previous report were addressed and resolved.
However, the new data generated in response to my previously requested change 9) reveal another issue, which also touches on the claim made in the reply that "some physical concepts fails in machine learning MPS", in particular "the failure of bound dimension $\chi$ to measure the representation ability". The data in Appendix C show that the expressivity of the model is systematically improved as $\chi$ is increased. This is exactly what one expects from a parameter that controls the representation ability. Moreover, Appendix C reveals that the performance does not increase further beyond $\chi=6$. The data shown in Fig. 4 were produced with much larger dimensions, up to $\chi=40$. This corresponds to an increase in the number of parameters by two orders of magnitude. Therefore, it does not seem particularly contradictory to me that Fig. 4b) shows that one can again reduce the number of parameters by two orders of magnitude using random elimination without any effect. Clearly, the two ways of varying the model size are inequivalent, but I do not think that the data support the claim that the bond dimension fails to be a measure of the representation power. In that regard, I find the discussion in Section 2.4.3 quite misleading.
Furthermore, I still think that a more machine learning oriented journal would be a better fit for the manuscript, because it has hardly any relation to physics, but this should be an editorial decision.
Overall, the manuscript will meet the acceptance criteria of SciPost Physics if the requested changes below are addressed.
Requested changes
1- Reconsider the discussion in 2.4.3 according to my comment above.
2- Line 182: The last sentence of 2.4.1 is misleading. The data show no evidence on the basis of which one could say that "It seems that the ResMPS models eventually surpass ResNet...". Since this is speculation and the same thought is repeated in the Discussion, I suggest removing the sentence from 2.4.1.
3- Line 48: "TN itself represents a linear map between quantum states." This formulation seems wrong. Tensor networks are not a map between quantum states. They do not map one state to another as this formulation implies. Please clarify.
4- Line 107: Should the sentence read "... the hidden variable is *not* equal to the dimension of the label index..."? If the two were equal, one could directly use the hidden state as output.
5- Line 122: "sResMPS" appears before it was introduced. Maybe swap the order of sections 2.2 and 2.3?
6- Line 134: Fig. 1(f) does not exist. Should be 1(e).
7- Line 364: The first sentence of this paragraph is wrong: There is no "exponential decay behavior of entanglement entropy of MPS". MPS always have a finite correlation length, i.e., correlations decay exponentially. But there is no exponential decay of entanglement entropy.
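For reference, the distinction drawn in the last point can be stated with two standard MPS facts. The symbols $\rho_A$, $\lambda_1$, $\lambda_2$, and $\xi$ are notation introduced here for illustration, not taken from the manuscript, and the second relation assumes a translation-invariant, injective MPS.

$$
S(\rho_A) = -\mathrm{Tr}\,\rho_A \ln \rho_A \;\le\; \ln \chi
$$

$$
\langle O_i O_{i+r} \rangle - \langle O_i \rangle \langle O_{i+r} \rangle \;\sim\; e^{-r/\xi},
\qquad
\xi = -\frac{1}{\ln \left| \lambda_2 / \lambda_1 \right|}
$$

Here $\rho_A$ is the reduced density matrix of a block $A$ obtained by cutting a single bond of dimension $\chi$, and $\lambda_1$, $\lambda_2$ are the eigenvalues of the MPS transfer matrix with the largest and second-largest magnitude. The entanglement entropy is therefore bounded by a constant set by $\chi$, while it is the connected correlations that decay exponentially with distance $r$.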