
SciPost Submission Page

Bayesian RG Flow in Neural Network Field Theories

by Jessica N. Howard, Marc S. Klinger, Anindita Maiti, Alexander G. Stapleton

This is not the latest submitted version.


Submission summary

Authors (as registered SciPost users): Jessica N. Howard
Submission information
Preprint Link: https://arxiv.org/abs/2405.17538v2  (pdf)
Code repository: https://github.com/xand-stapleton/bayes-nn-ft/tree/main
Date submitted: 2024-11-12 07:05
Submitted by: Howard, Jessica N.
Submitted to: SciPost Physics Core
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Theory
Approaches: Theoretical, Computational

Abstract

The Neural Network Field Theory correspondence (NNFT) is a mapping from neural network (NN) architectures into the space of statistical field theories (SFTs). The Bayesian renormalization group (BRG) is an information-theoretic coarse graining scheme that generalizes the principles of the exact renormalization group (ERG) to arbitrarily parameterized probability distributions, including those of NNs. In BRG, coarse graining is performed in parameter space with respect to an information-theoretic distinguishability scale set by the Fisher information metric. In this paper, we unify NNFT and BRG to form a powerful new framework for exploring the space of NNs and SFTs, which we coin BRG-NNFT. With BRG-NNFT, NN training dynamics can be interpreted as inducing a flow in the space of SFTs from the information-theoretic 'IR' → 'UV'. Conversely, applying an information-shell coarse graining to the trained network's parameters induces a flow in the space of SFTs from the information-theoretic 'UV' → 'IR'. When the information-theoretic cutoff scale coincides with a standard momentum scale, BRG is equivalent to ERG. We demonstrate the BRG-NNFT correspondence on two analytically tractable examples. First, we construct BRG flows for trained, infinite-width NNs, of arbitrary depth, with generic activation functions. As a special case, we then restrict to architectures with a single infinitely-wide layer, scalar outputs, and generalized cos-net activations. In this case, we show that BRG coarse-graining corresponds exactly to the momentum-shell ERG flow of a free scalar SFT. Our analytic results are corroborated by a numerical experiment in which an ensemble of asymptotically wide NNs are trained and subsequently renormalized using an information-shell BRG scheme.
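
As a rough illustration of the information-shell coarse graining described in the abstract, the sketch below (a minimal Python toy, not the authors' implementation; their code is in the linked repository, and the width, cutoff, and parameter distributions here are arbitrary choices) builds a small single-hidden-layer cos-net, estimates the Fisher information metric of its parameters empirically, and discards parameter directions whose Fisher eigenvalues fall below a cutoff Λ²:

    # Hypothetical sketch of information-shell coarse graining for a toy cos-net;
    # NOT the paper's implementation (see the linked repository for that).
    import numpy as np

    rng = np.random.default_rng(0)

    N = 16                                  # hidden-layer width (arbitrary toy value)
    w = rng.normal(size=N)                  # input weights
    b = rng.uniform(0, 2 * np.pi, size=N)   # biases
    a = rng.normal(size=N) / np.sqrt(N)     # output weights
    theta = np.concatenate([a, w, b])       # flattened parameter vector

    def grad_f(x):
        """Gradient of f(x) = sum_i a_i cos(w_i x + b_i) w.r.t. (a, w, b)."""
        phase = w * x + b
        return np.concatenate([np.cos(phase), -a * x * np.sin(phase), -a * np.sin(phase)])

    # Empirical Fisher information metric for a unit-noise Gaussian likelihood:
    # I(theta) ~ E_x[ grad f(x) grad f(x)^T ].
    xs = rng.uniform(-1, 1, size=512)
    fisher = np.mean([np.outer(grad_f(x), grad_f(x)) for x in xs], axis=0)

    # Information-shell step: keep only parameter directions whose Fisher eigenvalue
    # exceeds the distinguishability cutoff Lambda**2; project theta onto that subspace.
    Lambda = 0.1
    eigvals, eigvecs = np.linalg.eigh(fisher)
    keep = eigvals > Lambda**2
    theta_coarse = eigvecs[:, keep] @ (eigvecs[:, keep].T @ theta)

    print(f"kept {keep.sum()} of {theta.size} parameter directions at cutoff {Lambda}")

Lowering the cutoff Λ retains more parameter directions (the information-theoretic 'UV'), while raising it integrates out low-information directions (flowing toward the 'IR'); in the paper the resulting coarse-grained parameter distribution is then mapped to an SFT action via NNFT.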

Current status:
Has been resubmitted

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2025-1-27 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2405.17538v2, delivered 2025-01-27, doi: 10.21468/SciPost.Report.10550

Strengths

1. The paper is very well written and clear, and the logic is easy to follow.

2. The paper pedagogically introduces the necessary preliminaries, such as Bayesian inference, Bayesian RG, and the NNFT correspondence.

3. The paper demonstrates the proposed BRG-NNFT formalism through clearly explained analytical and numerical examples.

Weaknesses

1. The mathematical notation in the sections where Bayesian inference and Bayesian RG are discussed is a bit difficult to follow for a non-expert.

Report

The paper applies the formalism of the Bayesian Renormalization Group (BRG) to perform an information-theoretic coarse graining in the parameter space of neural networks. This leads to a flow in parameter space which, via the NN-FT formalism, is interpreted as a flow in the space of SFTs from the information-theoretic UV (fully trained network) to the IR (untrained network). The training of a neural network is, in this language, an inverse BRG flow. The proposed BRG-NNFT correspondence is then substantiated through concrete analytical and numerical examples. The paper is an important contribution to the topic of the NN-FT correspondence and can shed light on how neural networks learn. I strongly recommend that it be published in SciPost Physics Core.

Requested changes

1. No important change requested. Some more pedagogical explanation of the concepts of Bayesian inference and Bayesian RG would be useful for a non-expert reader.

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

  • validity: good
  • significance: good
  • originality: good
  • clarity: high
  • formatting: excellent
  • grammar: excellent

Author:  Jessica N. Howard  on 2025-02-11  [id 5213]

(in reply to Report 2 on 2025-01-27)
Category:
answer to question

Thank you very much for your positive review and helpful suggestion. Indeed, we hope that this work is accessible to a wide audience, as we believe there are many potential applications. We therefore appreciate your suggestion to make some of the abstract concepts more concrete. Along that line, we have added a concrete example of Bayesian inference at the end of Section 2.1, which we hope better illustrates some of the abstract concepts discussed.
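
For readers who would like a minimal numerical illustration of Bayesian updating before turning to Section 2.1 (this is a generic conjugate-Gaussian toy, not the specific example added to the paper), the sketch below updates a Gaussian prior on an unknown mean using observations with known noise:

    # Minimal conjugate-Gaussian illustration of Bayesian updating (toy example,
    # not the example added to Section 2.1 of the paper).
    import numpy as np

    rng = np.random.default_rng(1)

    mu0, sigma0 = 0.0, 1.0          # prior N(mu0, sigma0^2) on the unknown mean
    sigma = 0.5                     # known observation noise
    data = rng.normal(loc=0.8, scale=sigma, size=20)

    # Conjugate update: posterior precision is the sum of prior and data precisions.
    post_prec = 1 / sigma0**2 + len(data) / sigma**2
    post_var = 1 / post_prec
    post_mean = post_var * (mu0 / sigma0**2 + data.sum() / sigma**2)

    print(f"posterior mean {post_mean:.3f}, posterior std {post_var**0.5:.3f}")

Each observation adds its precision to the posterior, sharpening the distribution over the unknown parameter; loosely speaking, this accumulation of information during inference is the process that the BRG coarse graining in the paper runs in reverse.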

Report #1 by Anonymous (Referee 1) on 2024-12-30 (Invited Report)

  • Cite as: Anonymous, Report on arXiv:2405.17538v2, delivered 2024-12-30, doi: 10.21468/SciPost.Report.10390

Strengths

1. The concept is well presented. The paper is well organised and nicely written.

2. A pedagogical exposition of the Bayesian renormalisation group and neural network field theory is given in the paper, which is helpful for readers, including those who are not closely familiar with this field of research.

3. The results of the paper are summarized well. The authors have also elucidated the proposed NNFT and BRG correspondence using illustrative examples.

Weaknesses

Some mathematical expressions and notations could be better explained to improve the clarity of the paper.

Report

The authors have explored the connection between the Neural Network Field Theory correspondence and Bayesian Renormalization Group flow in the space of statistical field theories. The authors have shown that a class of SFT actions, constructed from a NN architecture with its parameter distribution coarse grained to a cut-off scale Λ, can be interpreted as an information-theoretic BRG flow. The results of the paper are significant, and the paper also contains a good amount of pedagogical detail about NNFT and BRG. I recommend that this paper be published in SciPost Physics Core.

Requested changes

For the convenience of the general readers it will be better if the authors can clarify some notations, which are mentioned below.

1. The target function ϕ mentioned in Table 1 of Section 2.2 is not explained in the text. It would be helpful if some details of this function were discussed in the main text.

2. Could the authors comment on how G̃^(n)(x_1, …, x_n) in Eq. (31) is related to G^(n)(x_1, …, x_n)?

3. In Eq.(32) product index should be i instead of j.

4. Can some more details be given about the N-scaling of the couplings, mentioned in the paragraph below Table 2? Are these couplings related to the g^(n) appearing in Eq. (35)?

5. Is the N appearing in Eq. (54) the same as the normal density given in Eq. (21) of footnote 11? It would be better if this were mentioned explicitly in the paragraph containing Eq. (54).

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

  • validity: good
  • significance: good
  • originality: good
  • clarity: good
  • formatting: excellent
  • grammar: perfect

Author:  Jessica N. Howard  on 2025-02-11  [id 5212]

(in reply to Report 1 on 2024-12-30)
Category:
answer to question

Thank you very much for your positive review and helpful feedback. Based on your suggestions, we have implemented the following changes:

  1. Thank you for catching this; we have added a sentence explaining this in Section 2.2: "Here, ϕ_i is the value of the target function ϕ which we would like our NN to approximate evaluated at x_i, i.e. ϕ_i = ϕ(x_i)". We have also standardized our notation to use ϕ when referring to the target function throughout the text.
  2. Indeed, the tilde indicates that this is a sampled (empirical) estimate of G^(n) in Eq. (31) (which is now Eq. (37)). In the limit of an infinite number of samples, G̃^(n) should approach G^(n); a minimal numerical illustration of this convergence is sketched after this list. We added text clarifying this.
  3. Thank you for catching this typo; it has been fixed.
  4. Yes, exactly. We have added a short description (below what is now Eq. (43)) expanding on this connection further and discussing where in the literature more details on this topic can be found.
  5. Thank you for pointing this out. Indeed, in Eq. (54) we were using a short-hand version of the same normal distribution notation seen in Eq. (21). Namely, if y is a normally distributed random variable with mean μ and variance σ, the distribution can be written as N(y | μ, σ). In Eq. (54) we suppressed the "y |" in this expression since the random variable being sampled was indicated, e.g. y ∼ N(μ, σ). There is another minor difference between the two instances that we should also mention: in Eq. (21) this is assumed to be a multivariate normal distribution, whereas in Eq. (54) it is a univariate normal distribution. However, since the univariate version is a special case of the multivariate version, the same N notation is used. Which version is meant can often be gleaned from context, for example, from whether one mentions a covariance Σ (multivariate case) or a variance σ (univariate case).

    In any case, we agree that defining the notation used in Eq. (54) (now Eq. (61)) in the text (as opposed to implicitly in a previous footnote) is an excellent suggestion. We have added text around Eq. (61) to clarify what is meant there.
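
As promised in point 2 above, here is a minimal, hypothetical sketch (a generic ensemble of random single-hidden-layer cos-nets with arbitrary toy parameter distributions, not the paper's trained networks or notation) of an empirical two-point function approaching its exact ensemble value as the number of samples grows:

    # Toy illustration: empirical two-point function of an ensemble of random
    # cos-nets vs. its exact ensemble average (the RBF kernel for these choices).
    import numpy as np

    rng = np.random.default_rng(2)
    N, M = 64, 20000                  # hidden width, number of sampled networks
    x1, x2 = 0.3, 1.1                 # two input points

    a = rng.normal(size=(M, N))                  # output weights
    w = rng.normal(size=(M, N))                  # input weights
    b = rng.uniform(0, 2 * np.pi, size=(M, N))   # biases

    def f(x):
        """Evaluate each of the M sampled networks at input x."""
        return np.sqrt(2 / N) * np.sum(a * np.cos(w * x + b), axis=1)

    # Empirical (sampled) two-point function and its exact ensemble value,
    # E[f(x1) f(x2)] = exp(-(x1 - x2)^2 / 2) for w ~ N(0,1), b ~ U[0, 2*pi).
    G2_empirical = np.mean(f(x1) * f(x2))
    G2_exact = np.exp(-0.5 * (x1 - x2) ** 2)

    print(f"empirical G2 = {G2_empirical:.4f}, exact G2 = {G2_exact:.4f}")

Increasing the number of sampled networks M drives the empirical estimate toward the exact value, which is the sense in which the sampled correlator approaches the exact one in the infinite-sample limit.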

We hope that the above changes have addressed your concerns. Again, we greatly appreciate your suggestions and believe they have improved the clarity of our work.
