SciPost Submission Page
Introduction to dynamical mean-field theory of generic random neural networks
by Wenxuan Zou and Haiping Huang
This is not the latest submitted version.
This Submission thread is now published as
Submission summary
Authors (as registered SciPost users): | Haiping Huang |
Submission information | |
---|---|
Preprint Link: | scipost_202305_00028v1 (pdf) |
Code repository: | https://github.com/ZouWenXuan/Dynamical-Mean-Field-Theory |
Date submitted: | 2023-05-18 09:19 |
Submitted by: | Huang, Haiping |
Submitted to: | SciPost Physics Lecture Notes |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: | |
Approaches: | Theoretical, Computational |
Abstract
Dynamical mean-field theory is a powerful physics tool used to analyze the typical behavior of neural networks, where neurons can be recurrently connected, or multiple layers of neurons can be stacked. However, it is not easy for beginners to access the essence of this tool and the underlying physics. Here, we give a pedagogical introduction to this method through the particular example of generic random neural networks, where neurons are randomly and fully connected by correlated synapses, so that the network exhibits rich emergent collective dynamics. We also review important past and recent works applying this tool. In addition, an alternative, physically transparent method, namely the dynamical cavity method, is introduced and shown to yield exactly the same results. The numerical implementation for solving the integro-differential mean-field equations is also detailed, with an illustration exploring the fluctuation-dissipation theorem.
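To make the numerical scheme mentioned above more concrete, here is a minimal sketch (not the authors' code, which is in the repository linked above) of a sampling-based iteration for the single-site DMFT process. It is restricted to the fully asymmetric case, where the correlation $\eta$ between reciprocal couplings $J_{ij}$ and $J_{ji}$ vanishes and the memory term involving the response function drops out. Assumptions, all hypothetical: the network obeys $\dot x_i = -x_i + \sum_j J_{ij}\tanh(x_j) + \xi_i$ with $\mathrm{Var}(J_{ij}) = g^2/N$, the noise has strength $2T$, the initial condition is a standard Gaussian, and the function name and defaults are illustrative. All trajectories are integrated in parallel as array rows, and a damping factor mixes the old and new kernels.

```python
import numpy as np


def solve_dmft_eta0(g=1.5, T=0.1, dt=0.1, n_steps=80, n_traj=4000,
                    n_iter=30, damping=0.5, seed=0):
    """Iteratively solve the eta = 0 single-site DMFT process (a sketch):
        dx/dt = -x + gamma(t) + xi(t),
        <gamma(t) gamma(s)> = g^2 C(t, s),  C(t, s) = <tanh x(t) tanh x(s)>,
        <xi(t) xi(s)> = 2 T delta(t - s)   (noise normalisation is assumed).
    Returns the damped kernel C and the history of update sizes.
    """
    rng = np.random.default_rng(seed)
    C = np.eye(n_steps)                 # initial guess for the kernel
    jitter = 1e-8 * np.eye(n_steps)     # keeps the Cholesky factorisation stable
    history = []
    for _ in range(n_iter):
        # 1) draw the Gaussian fields gamma for all trajectories at once
        L = np.linalg.cholesky(g**2 * C + jitter)
        gamma = rng.standard_normal((n_traj, n_steps)) @ L.T
        # 2) Euler-Maruyama step; the fresh noise is independent of x[:, t] (Ito)
        x = np.empty((n_traj, n_steps))
        x[:, 0] = rng.standard_normal(n_traj)   # assumed initial condition
        for t in range(n_steps - 1):
            xi = np.sqrt(2 * T * dt) * rng.standard_normal(n_traj)
            x[:, t + 1] = x[:, t] + dt * (-x[:, t] + gamma[:, t]) + xi
        # 3) re-estimate the kernel and mix it with the previous one (damping)
        phi = np.tanh(x)
        C_new = phi.T @ phi / n_traj
        history.append(np.max(np.abs(C_new - C)))
        C = (1 - damping) * C + damping * C_new
    return C, history


C, history = solve_dmft_eta0()
```

Setting `damping=1` gives the undamped update; smaller values trade speed for stability. The `history` list is a quick convergence check, and its floor is set by the Monte Carlo error, of order $1/\sqrt{n_{\rm traj}}$.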
Reports on this Submission
Report #2 by Anonymous (Referee 2) on 2023-9-11 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202305_00028v1, delivered 2023-09-11, doi: 10.21468/SciPost.Report.7806
Report
The Authors consider a random fully connected network with Gaussian interactions of zero mean and tunable asymmetric correlations, following a stochastic dynamics with Gaussian white noise. They review the derivation of the dynamical mean-field equations that track the evolution of the system in the thermodynamic limit, both via the MSRJD path-integral formulation and via the dynamical cavity method. They discuss the numerical integration of the effective “single-spin” self-consistent stochastic process and an analytical approach to close the equations on the fixed-point solution. Finally, they show how to relate the correlation and response functions obtained from DMFT to the fluctuation-dissipation relation, with a focus on the concept of effective temperature.
The topics reviewed by the Authors have already been covered extensively in previous literature, notably in references [11,21,22,23,24]. I find that the manuscript fails to place its specific contribution and goals appropriately in the context of these previous works.
The paper contains significant flaws in the presentation. My major concerns are:
(i) Previous literature is cited inappropriately and sometimes overlooked, both regarding original work and review papers.
(ii) Some technical derivations and comments on the results lack clarity.
I believe that the current version of the manuscript does not meet SciPost’s acceptance criteria. I list below detailed comments and questions.
- The Authors should deemphasize the generality of their derivations. "Generic random neural network" (Sec. 2) is too broad a label for the model actually studied here. For example, the couplings $J_{ij}$ ($i<j$) are i.i.d., whereas a more generic covariance structure could be considered, as well as a more complicated noise structure or interactions (see [27] and the missing references [d,e]).
- The introduction fails to correctly refer to previous review literature: [21] and [23] are only introduced as references in the technical section, while these works already cover most of the topics of the manuscript and therefore should be discussed in the introduction.
- The missing reference [a] (see Chap. VI) should be cited when introducing the dynamical cavity method. Similarly, I think that [a] is a more appropriate citation for "the replica analysis in equilibrium spin glass theory" (page 8, below Eq. 25).
- The Authors should clarify a crucial difference between their derivation and learning-theory works such as [14,27]: in the latter, the interaction weights are learned.
- I find the discussion confusing and sloppy regarding: the difference between relaxation to equilibrium and relaxation to a non-equilibrium steady state; the closure of the dynamical equations on fixed-point solutions, and the cases where this connection is missing or only partially understood (e.g., glassy systems); and the physical meaning of the (effective) FDT. In particular, the following sentences are not precise:
Page 8: "This relation *bears the similarity* with the linear response relation in equilibrium statistical physics"
Page 9: "Note that the two point correlation *could* relax to the EA order parameter"
- Missing reference: [b] for the numerical integration of the DMFT self-consistent process.
- The gap between the measured temperature and the noise correlation when $\eta=1$ suggests that there is a problem in the extrapolation of $T_{\rm eff}$: these two quantities should match. Is this value of $T_{\rm eff}$ stable as $t'$ is increased in Fig. 3? The Authors should clarify this extrapolation procedure: they attribute the gap to "numerical errors by simulations", but aren't these numbers extrapolated from DMFT?
I do not understand the need to specify that "time is measured in units of milliseconds" in this case. Missing reference: [c], where a very similar analysis of the effective temperature was carried out for the stochastic gradient descent algorithm.
- The Authors adopt Ito's convention, which leads to convenient simplifications in the computation. This choice should be stated more clearly.
- There is a formatting problem on Page 8.
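The $T_{\rm eff}$ extrapolation questioned above can be illustrated with a small sketch of one common estimator. It assumes a stationary regime, where $C$ and $R$ depend only on the lag $\tau = t - t'$, and that $R$ is the response of the same observable whose autocorrelation is $C$. It fits the integrated response $\chi(\tau) = \int_0^\tau R(u)\,du$ against $C(0) - C(\tau)$ with a line through the origin; when the FDT holds, the slope is $1/T$. The function name and the optional fitting window `n_fit` are hypothetical choices, not the procedure used in the manuscript.

```python
import numpy as np


def effective_temperature(C, R, dt, n_fit=None):
    """Estimate T_eff from a stationary correlation C(tau) and response R(tau),
    both sampled on the uniform lag grid tau_k = k * dt (k = 0, 1, ...).

    Integrated FDT: chi(tau) = [C(0) - C(tau)] / T_eff, fitted through the origin.
    n_fit optionally restricts the fit to the first n_fit lags.
    """
    C = np.asarray(C, dtype=float)[:n_fit]
    R = np.asarray(R, dtype=float)[:n_fit]
    # cumulative trapezoidal rule for the integrated response chi(tau)
    chi = np.concatenate(([0.0], np.cumsum(0.5 * (R[1:] + R[:-1]) * dt)))
    dC = C[0] - C
    slope = np.dot(chi, dC) / np.dot(dC, dC)   # least-squares slope through origin
    return 1.0 / slope


# Check on an Ornstein-Uhlenbeck process dx = -x dt + xi with <xi xi> = 2T delta,
# for which C(tau) = T exp(-tau) and R(tau) = exp(-tau), so T_eff should equal T.
tau = np.arange(0.0, 5.0, 0.01)
T = 0.3
print(effective_temperature(T * np.exp(-tau), np.exp(-tau), dt=0.01))  # close to 0.3
```

Refitting over different windows (or, in a non-stationary setting, at different waiting times $t'$) is a cheap way to check whether an extracted value is stable.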
Missing references:
[a] M. Mézard, G. Parisi, and M. A. Virasoro, Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, Vol. 9 (World Scientific Publishing Company, 1987).
[b] H. Eissfeller and M. Opper, New method for studying the dynamics of disordered spin systems without finite-size effects, Phys. Rev. Lett. 68, 2094 (1992).
[c] F. Mignacco and P. Urbani, The effective noise of stochastic gradient descent, J. Stat. Mech.: Theory Exp. 2022, 083405 (2022).
[d] F. Schuessler, A. Dubreuil, F. Mastrogiuseppe, S. Ostojic, and O. Barak, Dynamics of random recurrent networks with correlated low-rank structure, Phys. Rev. Research 2, 013111 (2020).
[e] K. Krishnamurthy, T. Can, and D. J. Schwab, Theory of gating in recurrent neural networks, Phys. Rev. X 12, 011011 (2022).
Report #1 by Anonymous (Referee 3) on 2023-7-19 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202305_00028v1, delivered 2023-07-19, doi: 10.21468/SciPost.Report.7538
Strengths
The presentation of the DMFT method in simple models of recurrent neural networks is very pedagogical and sufficiently concise to condense the main points into a few pages.
Weaknesses
I believe that the bibliography could be improved and the introduction could be expanded.
Report
I recommend the publication of the manuscript since I believe that it fulfills the corresponding criteria.
Requested changes
- In Eq. (6) the noise term is evaluated at time $t$. However, since the authors are using the Ito convention, shouldn't it be evaluated at time $t-1$, under the convention that this noise is independent of $x[t-1]$?
- In Sec. 3.2, one could also refer to Chapter 6 of the book by Mézard, Parisi, and Virasoro (Spin Glass Theory and Beyond), where the very same line of reasoning was presented for a very similar model.
- One could also comment that a damping term is sometimes useful to help the kernels converge in the numerical algorithm that solves the DMFT equations (see Sec. 4.1). One could also add that running several stochastic trajectories is easily done in parallel.
- It may be useful to discuss how the response function is computed in the numerical simulations used in Fig. 1 for comparison with the numerical solution of the DMFT equations.
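For the point about Fig. 1, here is one common way to measure a response function in direct simulations of the $N$-neuron network. It is a sketch, not necessarily what the authors did. Two copies are run with identical couplings, initial state and noise; one copy receives a small field during a single time step, and the difference is divided by the field strength times `dt`. Assumptions, all hypothetical: the dynamics $\dot x_i = -x_i + \sum_j J_{ij}\tanh(x_j) + \xi_i$ with noise strength $2T$, $R$ defined as the response of $x$ (not of $\tanh x$), random-sign fields so that cross-responses between neurons average out, and illustrative helper names. The integration is Euler-Maruyama with the noise increment for step $t \to t+1$ drawn independently of $x[t]$, which is the Ito reading raised in the first requested change.

```python
import numpy as np


def coupling_matrix(N, g, eta, rng):
    """Gaussian couplings with Var(J_ij) = g^2/N and Corr(J_ij, J_ji) = eta (i != j)."""
    A = rng.standard_normal((N, N))
    B = rng.standard_normal((N, N))
    sym = (A + A.T) / np.sqrt(2.0)    # symmetric part, unit variance off the diagonal
    anti = (B - B.T) / np.sqrt(2.0)   # antisymmetric part, unit variance off the diagonal
    J = np.sqrt((1 + eta) / 2) * sym + np.sqrt((1 - eta) / 2) * anti
    J *= g / np.sqrt(N)
    np.fill_diagonal(J, 0.0)          # no self-coupling (an assumption)
    return J


def simulate(J, dt, x0, noise, kick=None):
    """Euler-Maruyama (Ito) run of dx/dt = -x + J tanh(x) + xi.

    noise[t] is the increment for step t -> t+1, drawn independently of x[t].
    kick = (t_kick, h_vec) adds the field h_vec to the drift during step t_kick only.
    """
    n_steps = noise.shape[0] + 1
    x = np.empty((n_steps, J.shape[0]))
    x[0] = x0
    for t in range(n_steps - 1):
        drift = -x[t] + J @ np.tanh(x[t])
        if kick is not None and t == kick[0]:
            drift = drift + kick[1]
        x[t + 1] = x[t] + dt * drift + noise[t]
    return x


def response_by_kick(N=400, g=1.5, eta=0.5, T=0.1, dt=0.05, n_steps=200,
                     t_kick=100, h=1e-3, seed=0):
    """Estimate R(t, t_kick) = (1/N) sum_i d x_i(t) / d h_i(t_kick) for one noise realization."""
    rng = np.random.default_rng(seed)
    J = coupling_matrix(N, g, eta, rng)
    x0 = rng.standard_normal(N)
    noise = np.sqrt(2 * T * dt) * rng.standard_normal((n_steps - 1, N))
    signs = rng.choice([-1.0, 1.0], size=N)   # random signs suppress cross-responses
    base = simulate(J, dt, x0, noise)
    pert = simulate(J, dt, x0, noise, kick=(t_kick, h * signs))
    # A field h acting for one step of length dt shifts x by h*dt, hence the normalization.
    return (pert - base) @ signs / (N * h * dt)


R = response_by_kick()   # R[t] vanishes for t <= t_kick (causality) and is about 1 at t_kick + 1
```

Averaging the returned curve over several noise realizations, which parallelizes trivially, gives an estimate to compare with the DMFT $R(t, t')$. In chaotic parameter regimes, `h` must be small enough for the two copies to stay in the linear regime over the window of interest.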