Cavity and replica methods for the spectral density of sparse symmetric random matrices

We review the problem of how to compute the spectral density of sparse symmetric random matrices, i.e. weighted adjacency matrices of undirected graphs. Starting from the Edwards-Jones formula, we illustrate the milestones of this line of research, including the pioneering work of Bray and Rodgers using replicas. We focus first on the cavity method, showing that it quickly provides the correct recursion equations both for single instances and at the ensemble level. We also describe an alternative replica solution that proves to be equivalent to the cavity method. Both the cavity and the replica derivations allow us to obtain the spectral density via the solution of an integral equation for an auxiliary probability density function. We show that this equation can be solved using a stochastic population dynamics algorithm, and we provide its implementation. In this formalism, the spectral density is naturally written in terms of a superposition of local contributions from nodes of given degree, whose role is thoroughly elucidated. This paper does not contain original material, but rather gives a pedagogical overview of the topic. It is indeed addressed to students and researchers who consider entering the field. Both the theoretical tools and the numerical algorithms are discussed in detail, highlighting conceptual subtleties and practical aspects.

3.3 Thermodynamic limit within the cavity framework 12 3.4 The c → ∞ limit in the cavity formalism 13 3.5 The spectral density and the resolvent 14 4 Replica method: the Bray-Rodgers equation

Introduction
The calculation of the average spectral density of eigenvalues of random matrices belonging to a certain ensemble has traditionally been one the fundamental problems in Random Matrix Theory (RMT), ever since the application of RMT to the statistics of energy levels of heavy nuclei [1]. The spectral problem has retained its centrality in RMT with diverse applications in physics [2], computer science [3], finance [4][5][6] and statistics [7,8]. The most celebrated results about the density of states such as the Wigner semicircle law [9] for Wigner matrices (including Gaussian ensembles) and the Marčenko-Pastur law [10] for covariance matrices refer to "dense" matrix ensembles, i.e. those for which most of the matrix entries are non-zero.
On the other hand, the spectral problem is very relevant also for "sparse" matrix models, i.e. when most of the entries are zero. Indeed, the spectral properties of (weighted) adjacency matrices of sparse graphs encode the structural and topological features of many complex systems [11,12]. For random walks and dynamical processes on graphs, the eigenvalue spectrum is directly connected to the relaxation time spectrum [13,14]. Moreover, sparsely connected matrix models provide a test ground for physical systems described by Hamiltonians with finite-range interactions. In particular, tight-binding Hamiltonian operators with a kinetic term and an on-site random potential translate into matrix models that involve discrete graph Laplacians with additional random contributions to diagonals [15]. The spectra of such matrices have been used for the characterisation of many physical systems in condensed matter such as the studying of gelation transition in polymers [16]. Moreover, the behaviour of supercooled liquids can be described in terms of the spectrum a random sparse matrix representing the Hessian of those systems, within the framework of instantaneous normal modes [17].
Spectra of sparse random matrices and trees have also been employed as the simplest model to study Anderson localisation [18], i.e. the phenomenon by which a metal becomes an insulator due to disorder, such as impurities. The metallic phase corresponds to a spatially extended electronic wave functions, allowing transport. On the other hand, a high level of disorder leads to localised wave functions, which prevent conduction. A model for this phase separation is represented by the localisation transition characterising the spectra of sparse random matrices, where eigenvalues related to delocalised eigenvectors are separated at the mobility edge from those related to localised eigenvectors (see Section 5). Localisation phenomena have been analysed on Bethe lattices 1 (see [15,19] and the seminal paper of Abou-Chacra and collaborators [20], in which the cavity method that will be discussed below was also used) and on sparse random graphs [21][22][23][24][25].
In this paper, we describe the various strategies to compute the average spectral density (also known as density of states) for ensembles of sparse symmetric random matrices, i.e. weighted adjacency matrices of undirected graphs. Given a N × N random matrix J with eigenvalues {λ i } i=1,...,N , the average spectral density is defined as where the limit N → ∞ is understood and ... J denotes the average over the matrix ensemble to which J belongs. The latter is also referred to as "disorder" average. For a given (large) N , ρ(λ) can be numerically obtained by first diagonalising a large number M of N × N matrices drawn from the ensemble, collecting all their N · M eigenvalues and organising them into a normalised histogram. Our analysis is rooted in the statistical mechanics of disordered systems, with the main technical tools being the cavity (Section 3) and replica methods (Section 4 and 5).

A historical perspective on the spectral problem for sparse matrices
Our analysis will follow the historical developments that led to the solution of the problem. We start from the celebrated Edward-Jones formula [26], which is a key result linking the spectral problem to statistical mechanics. Indeed, the formula recasts the determination of the average spectral density (1) in terms of the average free energy logZ(λ) J of a disordered system with partition function Z(λ). Edward and Jones were the first to use the replica method, extensively employed in spin-glass physics [27], to perform averages of this type in the context of random matrices. Historically, the application of the Edwards-Jones recipe to sparse symmetric random matrices (in particular Erdős-Rényi adjacency matrices of graphs with finite mean degree c and the non-zero entries drawn from a Bernoulli distribution) was pioneered by Bray and Rodgers in [28] (and in a similar context in [29] and later on in [30]). However, in their formulation the evaluation of the average spectral density ρ(λ) relies on the solution of a very complicated integral equation. The same integral equation has been derived independently with a supersymmetric approach in [31] and later obtained in a rigorous manner in [32], thus confirming the exactness of the symmetry assumptions in [28]. A full analytical solution of this equation is still unavailable. A numerical solution for large average connectivity c was found in [17], whereas a solution in the form of an expansion for small c was proposed in [16]. The difficulties in dealing with this equation stimulated the search for a variety of approximation schemes, such as large average connectivity expansions [28], the single defect approximation (SDA) [33] and the effective medium approximation (EMA) [34,35]. Alongside approximation schemes, results from numerical diagonalisation such as in [36] have been employed to investigate the spectral properties of sparse random matrices.
A different approach to the spectral problem of sparse symmetric random matrices was proposed in [37]. There, the order parameters of the replica calculation are represented as uncountably infinite superpositions of Gaussians with random variances, as suggested by earlier solutions of models for finitely coordinated harmonically coupled systems [38]. A replica-symmetric Gaussian ansatz for the order parameter had appeared earlier in the random matrix context in [39], but was evaluated only within the SDA approximation. In [37], the intractable Bray-Rodgers integral equation is replaced by non-linear fixed-point equations for probability density functions, which are solved by a stochastic population dynamics algorithm. We will review both approaches in Sections 4 and 5 below.
Almost in parallel to [37], the cavity method [40] started to be employed for the determination of the spectral density of sparse symmetric random matrices by Rogers and collaborators in [41]. The cavity method, also known as belief propagation, represents a much simpler alternative to replicas and was originally introduced for the study of spectra of dense Lévy matrices in [42] and for diluted systems in [17]. The exactness of the cavity method for locally tree-like graphs with finite mean degree c was proved in [43]. In [41], building on the Edwards-Jones setup, the authors used the cavity method to compute the spectrum of large single instances of sparse symmetric random matrices. The ensemble average spectral density (1) is then obtained building on the single-instance results, circumventing the calculation of the average "free energy" logZ(λ) J altogether. The cavity treatment produces non-linear fixed-point integral equations that are completely equivalent to those obtained in [37] within the replica framework.
It has been shown in [44] that both the cavity and the replica method yield the same results concerning the spectral density of graphs. Both approaches in [37] and [41] recover known results such as the Kesten-McKay law for the spectra of random regular graphs [45,46], the Marčenko-Pastur law and Wigner's semicircle law respectively for sparse covariance matrices and for Erdős-Rényi adjacency matrices in the large mean degree limit. Moreover, both methods allow one to characterise the spectral density of sparse Markov matrices [47,48] and graphs with modular [49] and small-world [50] structure and with topological constraints [51].
In a similar manner, both methods have also been employed to study the statistics of the top and second largest eigenpair of sparse symmetric random matrices [52][53][54]. The two methods have also been extended to the case of sparse non-Hermitian matrices [55][56][57][58]. A particular attention has been devoted to the spectral properties of the Hashimoto non-backtracking operator on random graphs [59,60]. Both cavity and replica methods have been recently used to characterise the dense (c → ∞) limit of the spectral density of adjacency matrices of undirected graphs within the configuration model, which reveals that the behaviour of the limiting spectral density is not universal but actually depends on degree fluctuations. Indeed the expected Wigner semicircle is recovered when the degree distribution tightly concentrates around the mean degree c for c → ∞, whereas non trivial deviations from the semicircle are found when degree fluctuations are stronger [61].
Moreover, thanks to the extension of the replica method to the analysis of sparse loopy random graphs, the influence of loops on the spectra of sparse matrices has been lately investigated in [62,63]. There is also a recent cavity analysis of the problem of loopy graphs by Newman and collaborators in [64].

Paper organisation
In this paper, we will retrace the main milestones in the determination of the spectral density of sparse symmetric random matrices. We start with the analysis of the Edwards-Jones formula in Section 2, providing its proof in Section 2.1 and discussing how to deal with the average logZ(λ) J in Section 2.2. For clarity and simplicity, we will first illustrate the cavity approach in Section 3. We outline the cavity setup in Section 3.1, then we deal with the spectrum of large instances of sparse symmetric random matrices in Section 3.2. In Section 3.3 we show how the single-instance approach can be extended to the N → ∞ limit to recover the ensemble average spectral density. Besides, in 3.4 we evaluate the large c limit of the average spectral density obtained within the cavity formalism, showing that it converges to the Wigner semicircle. We will then follow the historical development of the subject by documenting the Bray-Rodgers replica approach in Section 4. We will derive the Bray-Rodgers integral equation in Section 4.3, while in Section 4.4 we will obtain its large c asymptotic expansion, showing that its leading order gives rise to the Wigner semicircle, as expected. In Section 5 we will deal with the alternative replica solution proposed in [37], showing in Section 5.1 that the solution obtained with this approach coincides with that found by the cavity treatment in Section 3.3. In Section 6, we outline the stochastic population dynamics algorithm employed to solve the non-linear fixed-point integral equations that are found within both the cavity and replica frameworks respectively in Section 3.3 and Section 5.1.

Edwards-Jones formula
Edwards and Jones in [26] provide a formula to express the average spectral density of N × N random matrices (1) as with where again the ... J denotes the average over the matrix ensemble to which J belongs. In (2), which is valid for any N , Im indicates the imaginary part and log is the branch of the complex logarithm for which log e z = z. In (3), the symbol 1 represents the N × N identity matrix, the symbol v describes a vector in R N and the integral extends over R N . Moreover, λ = λ − iε, where ε is a positive parameter ensuring that the integral (3) is convergent, since the absolute value of the integrand has the leading behaviour e − ε (3) can be interpreted as the canonical partition function of the Gibbs-Boltzmann distribution of N harmonically coupled particles with an imaginary (inverse) temperature, viz.
with a complex "Hamiltonian" In this framework, the computation of (1) requires to evaluate logZ(λ) J , which is the canonical free energy of the associated N particles system, averaged over the random couplings.

Proof of the Edwards-Jones formula
The starting point is the definition (1). Looking for a representation of the Dirac delta, one considers the Sokhotski-Plemelj identity (see A for a proof), viz.
where x ∈ R and Pr denotes the Cauchy principal value. The imaginary part of the identity, namely provides the desired representation. Therefore, inserting (6) into (1) results in where the minus sign has been made explicit.
One would now express the ratio in the angle brackets as the derivative of the principal branch of the complex logarithm, denoted by Log. Unlike other properties, its derivative behaves exactly like that of the real logarithm, therefore entailing for the average spectral density the formula The sum of logarithms in (10) can be related to the partition function Z(λ) in (3) by exploiting the following identity [65,66], (11) Caution is needed when taking the logarithm on both sides of (11), as in general Log(e z ) = z (see B). Indeed, using the property (172) and taking the principal logarithm on both sides of (11), one would obtain where is the imaginary part of the exponent in (11) and the symbol ... denotes the floor operation, i.e. x is the integer such that Note that this branch choice would make the r.h.s. not everywhere differentiable for λ ∈ R. Therefore, it is convenient to pick the branch of the complex logarithm such that log e z = z instead, i.e. for which the extra (non-differentiable) phase term in (12) is killed. This choice yields where the constant terms on the r.h.s. depend on N , but not on λ. Taking the derivative, one eventually finds ∂ ∂λ therefore the Edwards-Jones formula (2) is recovered.

Tackling the average in the Edwards-Jones formula
In order to obtain the spectral density, the average logZ(λ) J must be computed. It explicitly reads where P ({J ij }) is the joint distribution of the matrix entries. The presence of the logarithm in (16) prevents a factorisation of averages over edges (i, j) even for a factorised pdf of the J ij . The only available strategy seems to perform the inner N -fold integral over v first, compute the logarithm, and then average over the random matrix disorder. However, this sequence of operations would simply run the Edwards-Jones formula (2) backwards, leading to the useless identity ρ(λ) = ρ(λ). The only chance to make some progress therefore relies on performing the disorder average first. However, the two integrations in (16) cannot be directly exchanged due to the presence of the logarithm in between.
Disorder averages such as (16) are called quenched averages. The technique to handle such averages is the replica trick. It is a well established method employed in the statistical mechanics of disordered systems that allows one to bypass the logarithm in (16) in favour of the computation of integers moments of Z(λ) (see Section 4) 1 .
The replica method for the calculation of the spectral density of dense random matrices was employed by Edwards and Jones in [26]. The same replica calculation for sparse ensembles was pioneered by Bray and Rodgers in [28]. However, we prefer to start with the cavity approach because it is technically much less involved and allows one to circumvent the direct computation of logZ(λ) J . We will then follow the historical path traced in [28] in Section 4.

Cavity method for the spectral density
The cavity method as implemented in [41] makes it possible to derive the spectral density for a single instance of large sparse symmetric matrices. According to the physical interpretation of the Edwards-Jones formula, the calculation of the spectral density can be recast as a problem of interacting particles on a sparse graph. The basic idea behind the cavity method [40] is that observables related to a certain node of a network in which cycles are scarce (thereby called tree-like) can be determined from the same network where the node in question is removed. Due to the sparse structure, the removal of a node makes its neighbouring sites (as well as the signals coming from them) uncorrelated.

Definition of the sparse matrix ensemble
We consider a large N × N sparse symmetric random matrix J. It represents the weighted adjacency matrix of an undirected graph G, i.e. each entry can be expressed as J ij = c ij K ij , where the c ij = c ji ∈ {0, 1} represent the pure adjacency matrix and the K ij encode the bond weights. When two nodes i and j are connected by a link, then c ij = 1, otherwise c ij = 0. We consider simple graphs, in which self-loops are not present, entailing that c ii = 0 for any node i. In an undirected graph, the degree k i of the node i is defined as the number of nodes in its neighbourhood ∂i = {j : c ij = 1}, viz.
We define c = 1 N N i=1 k i as the mean degree. We consider locally tree-like sparse matrices, in which the probability of finding a cycle vanishes as ln N/N when N → ∞. Alternatively, this property is implied by the requirement that the mean degree c does not increase with the matrix size N , hence c/N → 0 as N → ∞. In this very sparse regime, the cavity method predictions are approximate for sparse graphs of finite size N , whereas they are exact for finite trees. However, the cavity results become asymptotically exact on finitely connected networks in the limit N → ∞ (i.e. in the thermodynamic limit). This has been rigorously proved in [43].
Following the statistical mechanics analogy, in the sparse case the N particles described by the variables v i interact on the graph G where an edge is defined for any pair (i, j) of interacting particles. While the replica formalism analyses the partition function (3) in the limit N → ∞, the cavity method focusses on the associated Gibbs-Boltzmann distribution (4) with imaginary inverse temperature i and complex Hamiltonian (5), as shown in the section below.

Cavity derivation for single instances
The spectral density of J is obtained from the Edwards-Jones formula (2) for finite N as where Z(λ) is defined in (3). The subscript indicates that ρ J (λ) refers to a single, specific instance J. For the same reason, no averaging is needed. Performing explicitly the λ-derivative in (18) with Z(λ) defined in (3), one obtains where is the Gibbs-Boltzmann distribution defined in (4). For any given i, the average w.r.t. the joint pdf (20) in (19) reduces to the average w.r.t. the single-site marginal P i (v i ), viz.
where the v 2 i represent the single-site variances of each of the N marginal pdfs P i (v i ). Using (19) and (21), the spectral density in (18) can thus be written as Therefore, it is sufficient to determine the N single-site variances to calculate the spectral density using (22). In order to find the v 2 i , one looks at each marginal pdf P i (v i ). Due to the sparse nature of J, the variable v i is coupled (through J ij ) only to those v j associated to nodes that are neighbours of i. Hence, the single-site marginal of the node i can be expressed as In (23), the integration is over the "particles" interacting with particle i, i.e. those sitting on the neighbouring sites ∂i. The distribution P (i) (v ∂i ) collects the contributions coming  Figure 1: Graphs sketches. Graph 1: a tree-like graph where the indices refer to the notation used in Section 3.1 to derive cavity single-instance equations. Graph 2: example of the decorrelation occurring to the nodes j 1 , j 2 and j 3 , neighbours of the node i, after the removal of i.
from the interaction of each of the v j (j ∈ ∂i) with particles sitting on nodes that are not neighbours of i themselves (see Graph 1 on the l.h.s. of Fig. 1). The contributions to the integral defining P i (v i ) coming from nodes further away generate a constant term that is absorbed in the normalisation constant Z i . The distribution P (i) (v ∂i ) is called the cavity distribution, since it refers to a graph in which the node i has been removed. In a tree-like structure, the neighbouring sites of each node i are correlated mainly through the node i. Hence, when the node i is removed, its neighbours become uncorrelated (see Graph 2 on the r.h.s. of Fig. 1). Therefore, the joint cavity pdf P (i) (v ∂i ) factorises into the product of independent cavity marginals P We remark that the condition (24) is exact only as N → ∞, while being only approximate for finite N . From Eq. (24), it follows that the single-site marginal (23) can be expressed as Eq. (25) shows that the marginal P i (v i ) is defined in terms of the cavity marginals P  j (v j ) can be obtained by iterating the same reasoning as above. Indeed, one can now choose one of the nodes j ∈ ∂i and define the marginal pdf associated to that node in the same way as in eq. (25). However, the network one is considering at this stage is that where the node i has already been removed, therefore eventually obtaining the cavity marginal P where the symbol ∂j\i denotes the set of neighbours of node j excluding i (see again Graph 1 on the l.h.s. of Fig. 1 ). In turn, the cavity marginals P (j) (v ) are defined on the graph where also the node j ∈ ∂i has been removed. Eq. (26) defines a set of recursion equations for any pair of interacting nodes (i, j). The set of recursion equations (26) is solved exactly by a zero-mean Gaussian ansatz for the cavity marginals P (i) j (v j ). Indeed, assuming that and performing the Gaussian integrals on the r.h.s. of (26), one gets The comparison between the exponents of (27) and (28) entails Therefore, the set of equations (26) translates into a set of self-consistency equations for the cavity inverse variances ω (i) j . Similarly, the Gaussian ansatz (27) can be inserted in the single-site marginal expression (25), yielding a Gaussian structure for P i (v i ), viz.
with single-site inverse variances given by Once the cavity inverse variances are determined as the solution of (29), the single-site inverse variances v 2 i = 1 ω i are found from (31), and the spectral density is readily obtained from (22) as The formula (32) is exact in the limit N → ∞: however for a finite but sufficiently large N , it provides an approximation for the average spectral density of the ensemble. As a concluding remark, it can be noticed that the set of self-consistency equations for the cavity inverse variances (29) only depends on the square of matrix entries, thus entailing that the spectrum of the matrix J is equal to that of the matrix −J and therefore is perfectly symmetric around λ = 0. This property indeed holds exactly for trees, since every tree is a bipartite graph (see [67] or G for a simple proof of this property). This check further corroborates that cavity equations are exact on trees, but only approximate on tree-like structures as long as cycles are negligible.
The set of cavity recursions (29) can be solved by a forward iteration algorithm. A working example of a code that allows one to determine the average spectral density on a single instance is available upon request.

Thermodynamic limit within the cavity framework
In this section we depart from [41] and show that the ensemble average of the spectral density (1) can be recovered from the single-instance spectral density (32) as obtained through the cavity method. Indeed, by invoking the law of large number in (32), in the large N limit one gets whereπ(ω) is the pdf of the inverse variances ω i taking values aroundω. In the r.h.s. of Eq. (33) the subscript J has been dropped, as the quantity ρ(λ) characterises the ensemble of J, rather than a single matrix. Eq. (33) implicitly assumes that the spectral density enjoys the self-averaging property, meaning that a large single instance of the ensemble faithfully represents the average behaviour over many instances.
The task now is to find the pdf of the inverse variancesπ(ω). Recalling the single-instance relation (31) between the single-site inverse variances ω i and the cavity inverse variances ω (i) j , the pdfπ(ω) will be determined in terms of the probability density π(ω) of ω In order to find the pdf π(ω), one observes that the set of self-consistency equations for the cavity inverse variances (29) refers to the links of the underlying graph. In an infinitely large network, links can be distinguished from one another by the degree of the node they are pointing to. Therefore, considering a link (i, j) pointing to a node j of degree k, the value ω of the cavity inverse variance ω (i) j living on this link is determined by the set {ω } k−1 of the k − 1 values of the cavity inverse variances ω (j) living on each of the edges connecting j with its neighbours ∈ ∂j\i. In an infinite system, these values can be regarded as k − 1 independent realisations of the random variables of type ω (j) , each drawn from the same pdf π(ω). The entries of J appearing in (29) are replaced by a set {K } k−1 of k − 1 independent realisations of the random variables K j , each distributed according to the bond weights pdf p K (K). The distribution π(ω) is then obtained by averaging the contributions coming from every link w.r.t. the probability k c p(k) of having a link pointing to a node of degree k 2 . This reasoning leads to the self-consistency equation where {dπ} k−1 = k−1 =1 dω π(ω ) and the angle brackets · {K} k−1 denote the average over k − 1 independent realisations of the random variable K. Eq. (34) is generally solved via a population dynamics algorithm (see Section 6).
The same reasoning can be applied to find the pdfπ(ω) of inverse variances. Recalling (31), it can be noticed that the ω i are variables related to nodes, rather than links. Since in the infinite size limit the nodes can be distinguished from one another by their degree, the pdfπ(ω) can be written in terms of (34) as where p(k) is the degree distribution. Inserting (35) into (33) gives (at the ensemble level) Eq. (36) is the ensemble generalisation of the single-instance formula (32) and provides the ensemble average of the spectral density (1). The average spectral density as expressed in (36) can be interpreted as a weighted sum of local densities, each pertaining to sites of degree k. As shown by (36), the solution of the spectral problem is completely determined by the distribution π satisfying the self-consistency equation (34). Once π has been obtained, the average spectral density (36) is evaluated by sampling from a large population representing the distribution π(ω). Section 6 illustrates the algorithm that produces the solution of selfconsistency equations of this type as well as the details of the sampling procedure.

The c → ∞ limit in the cavity formalism
One can easily show that taking the c → ∞ limit in Eq. (34), (35) and then eventually (33), the Wigner semicircle law is recovered. This has been first shown in [41]. According to [61], we consider graphs in the configuration model having a degree distribution such that Here, the symbol σ k denotes the standard deviation of the degree distribution p(k). For example, σ k = √ c for Erdős-Rényi graphs. A meaningful large-c limit is obtained for Eq. (34) or equivalently (35) by rescaling each instance of the bond random weights as K ij = K ij / √ c. Therefore, considering (34) one obtains For large c, the sum over the degrees in Eq. (37) receives contributions only from k = c±O(σ k ). As c → ∞, the degree distribution p(k) becomes highly concentrated around k = c, thus the argument of the δ-function on the r.h.s of Eq. (37) can be evaluated using the Law of Large Numbers (LLN). Indeed, one finds that the r.h.s. of the condition does not fluctuate, hence ω itself is fixed and determined by the algebraic equation For large c, the quantity represents the second moment of the pdf of the rescaled bond weights.
The very same reasoning can be applied to the argument of the δ function in (35), entailing that in the limit c → ∞ theω are non-fluctuating as well, and take the same constant values given by the solutions of Eq. (39), viz.
Therefore, inserting Eq. (39) and (40) in Eq. (33), one finds that in the limit c → ∞ which in the ε → 0 + limit eventually reduces to where the plus sign has been chosen to get a physical solution. The latter expression corresponds to the Wigner's semicircle.

The spectral density and the resolvent
Before dealing with the replica derivation of the average spectral density, it is worth remarking that the average spectral density can be obtained in an alternative way considering the resolvent. Given a N × N matrix J, its resolvent is defined as where z ∈ C and the matrix 1 is the N × N identity matrix. Setting z = λ ε = λ − iε, the average spectral density is obtained from the imaginary part of the trace of the resolvent matrix, i.e.
where the thermodynamic limit N → ∞ is understood. Eq. (44) can be explained by observing that the resolvent provides a regularised version of the Dirac delta appearing in Eq. (1). Indeed, the matrix G shares the same eigenvector basis with J, {u α } with α = 1, . . . , N . Then, using the spectral theorem, the resolvent (43) can be written as entailing that where the normalisation property of the eigenvectors 1 = N i=1 u 2 iα for any α = 1, ..., N and the Sokhotski-Plemelj identity (7) have been used. Given the connection with the eigenvectors, the resolvent allows for the study of localisation properties (see for instance [15,23] and Section 7 for a further application).
The cavity method can be directly applied to the resolvent, as originally suggested in [42]. A set of self-consistency equations for the cavity diagonal entries of the resolvent G(z) (i) jj (i.e. the diagonal entries of the resolvent matrix from which the i-th row and the i-th column have been removed) can be obtained thanks to a Schur decomposition procedure or alternatively by representing the G(z) ii and in turn the G(z) (i) jj as Gaussian integrals and then applying the cavity method in the same fashion as Section 3.2 (see e.g. [23] for the details).
A correspondence between the cavity formalism as developed in Section 3.2 and the cavity method applied to the resolvent can be easily established. Indeed, comparing respectively Eq. (27) and (21) with Eq. (16) and (13) in [23], it follows that for any λ ∈ R and for any where the ω

Replica method: the Bray-Rodgers equation
In this section, we illustrate the replica calculation for the average spectral density, as originally proposed by Bray and Rodgers. Following [28], the goal is to evaluate the average spectral density (1) of an ensemble of N × N real symmetric sparse matrices. Leveraging on the notation of section 3, given a matrix J, each matrix entry can be written as J ij = c ij K ij , where the c ij = {0, 1} represent the pure adjacency matrix of the underlying graph and the K ij encode the bond weights. In particular, the matrix model considered in [28] is the Erdős-Rényi (ER) model: the probability of having a non-zero entry is given by p = c/N , where c represents the mean degree of the nodes of the underlying graph. For more details on the ER model, see C. The joint distribution of the matrix entries is given by where p C (c ij ) represents the ER connectivity distribution, viz.
while p K (K ij ) represents the bond weight pdf, which will be kept unspecified until the very end of the calculation.

Replica derivation
The Edwards-Jones formula (2) is used. As anticipated in Section 2, in order to deal with the quenched average (16) the replica identity will be employed, viz.
where n is initially taken as an integer 3 . The replica identity is easily obtained considering that in the limit n → 0 The average replicated version of the partition function (3) reads The ensemble average ... J splits into the connectivity average w.r.t. the c ij and the disorder average w.r.t. the K ij . The connectivity average can be performed explicitly exploiting the large N scaling, yielding where ... K denotes averaging over p K (K). The details of the calculation leading to (54) can be found in D.
In order to decouple sites, the following functional order parameter is introduced, via the path integral identity where v ∈ R n represents a n-dimensional vector in the replica space. Eq. (56) is the functional analogue of In terms of the order parameter (55), the average replicated partition function becomes The multiple integral I in the last line of (58) factorises into N identical copies of the same n-dimensional integral over R n . Indeed, one finds where Log denotes again the principal branch of the complex logarithm. The replicated partition function (58) can then be written in the form where with Eq. (60) is amenable to a saddle-point evaluation for large N , yielding where the star denotes the saddle-point value of the order parameter and its conjugate. The stationarity conditions of the action (61) w.r.t. the functional order parameter ϕ and its conjugateφ give The two stationarity conditions (66) and (67) can be combined. Indeed, by calling iφ ( v) = cg( v) and inserting (67) in (66), one obtains where f (x) = e iKx K − 1. A numerical solution for coupled saddle-point equations that are analogous to Eq. (66) and (67) has been recently proposed in [68].

Average spectral density: replica symmetry assumption
The function g( v) defined by (68) fully determines the average spectral density. Indeed, recalling (2) and using (65), one gets The λ-derivative acts only on the terms of the action S n explicitly depending on λ, due to the stationarity of S n w.r.t. ϕ andφ . Indeed, one obtains where g( v) solves (68).
In order to perform the n → 0 limit in (69), an assumption on the symmetries of the function g( v), or equivalently of both ϕ ( v) andφ ( v), under permutations of replica indices needs to be made. It is known that a replica-symmetric "high-temperature" solution, preserving both permutational and rotational symmetry in the replica space, is exact in the random matrix context 4 . Hence, following [28], one can assume where v = | v| = a v 2 a . Therefore, taking into account the ratio (71), the replica symmetric ansatz (72) and that Im(ix) = Re(x) for any x ∈ C, the average spectral density reads To further simplify the ratio of integrals in (73), n-dimensional spherical coordinates are introduced. The symbol v represents the radial coordinate and In these coordinates, the factor arising from the integration over the angular degrees of freedom, appears both in the numerator and denominator of (73). Hence it cancels, yielding eventually The integral in the denominator can be further simplified integrating by parts, i.e.
since the boundary contribution vanishes. Therefore, taking the n → 0 limit, the average spectral density reads .
The expression (77) shows that the function g(v) is the only ingredient needed to compute the average spectral density. The search for a (replica-symmetric) solution of (68) will be addressed in Section 4.3.

Replica symmetry assumption for the Bray-Rodgers integral equation
At this stage, in order to get a replica-symmetric version of (68), the replica-symmetric ansatz (72) is applied and spherical coordinates are again introduced. Assuming that φ 1 = φ ∈ [0, π] is the angle between the vectors v and v and | v | = r one finds since all the remaining angular integrations cancel between numerator and denominator. We recall that f (z) = e iKz K − 1. In order to proceed, a specific bond weight distribution must be introduced. The choice made by Rodgers and Bray in [28], entails that f (z) = cos z − 1. This gives The angular integral in the numerator in (80) where J α (x) indicates the Bessel function of the first kind, defined by the series The angular integral in the denominator of (80) is independent of the radial one and gives thus canceling the divergent factor for n < 1 appearing in (81). The radial integral in the denominator in (80) can be simplified integrating by parts. By calling G(r) = exp − i 2 λ ε r 2 + cg(r) , one obtains as the boundary contribution vanishes. Collecting the results, one finds Lastly, the n → 0 limit in (85) is taken. Recalling the definition of G(r) and noticing that 3.
Given the structure of Eq. (89), it follows that g(0) = 0, therefore eventually which is equivalent to Eq. (18) in [28]. This equation, also known as the Bray-Rodgers integral equation, fully defines the quantity g(v). Despite numerous attempts, an exact analytical solution for (90) is currently not available. The numerical evaluation of Eq. (90) in the case of c = 20 has been carried out in [17], to obtain the average spectral density of Laplacians of ER graphs with bimodal weights. In this high connectivity regime, the quality of the numerical solution was comparable with the SDA approximation (see [33]). On the other hand, the authors in [16] describe a procedure for the solution of a similar integral equation employing a series expansion which is valid only for c < 1/2. Nonetheless, for values of c that are in between these two extremal cases, Eq. (90) has proved to be hard to tackle even numerically, due to the exponential non-linearity and the oscillatory Bessel term. Taking into account Eq. (90), the average spectral density (77) can be further simplified as follows. Indeed, recalling that G(v) = exp −i λε 2 v 2 + cg(v) , the denominator in Eq. (77) can be expressed as therefore entailing that the average spectral density reduces to

The average spectral density in the c → ∞ limit
It can be shown that Eq. (90) can be solved perturbatively in powers of 1/c in the limit c → ∞.
In this framework, the average spectral density (92) is in turn expressed as a perturbative expansion, whose leading term is the Wigner semicircular law. We will provide a derivation inspired by [28,30].
First, the following changes of variables are introduced, where x δ = x − iδ, with δ > 0. Moreover, it is assumed that g(v) = 1 c γ(s). Eq. (95) implies rescaling the spectral density such that a meaningful c → ∞ limit can be taken. This rescaling is equivalent to normalising the matrix entries by √ c. After the change of variables, the spectral density will be expressed in terms of x δ = x − iδ, which does not scale with c. In this setting, the limit ε → 0 + is replaced by the limit δ → 0 + . Taking into account (95), for the l.h.s. of Eq. (92) before taking the ε → 0 + limit one finds entailing that One then rewrites Eq. (90) in terms of the new variables. The differential in (90) transforms as while the integration boundaries are unchanged. Indeed, from (94), one finds In terms of the new variables, after some algebra Eq. (90) converts to corresponding to Eq. (23) in [28]. With these choices, it is natural to expand γ(s) as a power series in s/c. High powers of s are related to high powers of 1/c. Therefore, one expects to find a solution of the form where in turn the coefficients b r are defined via the expansion Eq. (101) and (102) allow one to obtain all possible combinations of powers of s r c r+ . The target is to determine the coefficients b ( ) r , by solving Eq. (100) order by order. This would permit a complete representation of γ(s) as a power series. The following steps will be followed for the solution.
• Express γ via the expansions (101) and (102) in both the l.h.s. and the exponent of the r.h.s. of (100).
• Integrate w.r.t. u term by term in the r.h.s. of (100).
• Equate the coefficients of the powers s r c r+ .
The expansion of γ(s) will be stopped at O 1 c . Hence, the l.h.s. of (100) reads Looking at the r.h.s., one notices that only the terms m = 0 and m = 1 of the sum in (100) are needed to match the powers s r c r+ in (103). Indeed, one finds The first and second integral on the r.h.s. of (104) can be denoted respectively as I A and I B . Considering first the integral I A , it has a pre-factor of order O(s), therefore it may yield contributions of any order in powers of 1/c depending on the order at which we stop the expansion of γ(u) in the exponent. Expanding γ(u) as in (103) is sufficient to obtain all the O(1) and O 1 c contributions. Indeed, we have where In order to express (105) as a power series, an asymptotic expansion of erfc(z) is employed. For large real z it is given by The series is divergent for any finite z. However, few terms of it are sufficient to approximate erfc(z) well for any finite z. Using (108) in (105), one obtains Eq. (109) can be further simplified recalling that (1 + βx) α ≈ 1 + αβx for x 1, entailing that the first term of (109) becomes Since the expansion is stopped at O 1 c , only the n = 1 term in the sum in (109) needs to be considered, as the contributions for n ≥ 2 are at least O 1 c 2 . Moreover, since the n = 1 term exhibits the O 1 c scaling explicitly, only the O(1) contribution arising from the round brackets in the denominator of the general term of the sum in (109) must be taken into account. Indeed, using (110) the n = 1 term in (109) becomes Collecting all the leading contributions to the integral I A one obtains Considering now the second integral on the r.h.s. of (104), denoted by I B , its pre-factor has already the O 1 c scaling. Therefore, only the O(1) term arising from the 1 c expansion of I B is needed. To this purpose, it is sufficient to consider the expansion of γ(u) no further than O u c . Indeed, one finds with In conclusion, using the expansions of I A and I B , respectively given by Eq. (112) and (113), Eq. (104) representing the r.h.s. of (100) becomes γ(s) = 1 (114) Equating term by term the expansion in Eq. (103) and (114), one gets a closed set of equations to determine the three coefficients b corresponding to Eq. (25), (26) and (27) in [28]. The latter system of equations can be easily solved, yielding where A = 1 − 4 Finally, the average spectral density (92) can be evaluated perturbatively. Indeed, taking into account Eq. (97) and applying the change of variables (93), (95) and g(v) = 1 c γ(s) to the r.h.s. of Eq. (92), one finds where the dependence on c is only through the power series (101) expressing γ(s). Therefore, using (112) in (121) one gets where and the coefficients b 1 and b 2 have been defined respectively in (118), (119) and (120). The O(1) leading term ρ 0 (x) in Eq. (123) is simply obtained by observing that Using (125), considering the imaginary part and taking the δ → 0 + limit, the O(1) leading term in Eq. (122) becomes where we have used that and the plus sign in front of the square root has been chosen in order to get a physical (nonnegative) solution. This sign choice amounts to selecting the top alternative for the signs appearing in the expansion coefficients (118), (119) and (120). Eq. (126) corresponds to the Wigner's semicircle as expected. Similarly, using the definitions (118), (119) and (120), taking the δ → 0 + limit and employing the property (127), after some algebra the O (1/c) correction (124) is obtained as where the sign is determined by the same sign convention for the coefficients of the expansion adopted for the evaluation of ρ 0 (x). The correction (128) is non-zero only in the interval −2 < x < 2 and diverges at the edges. Moreover, one can notice that the total average spectral density (122) is correctly normalised up to order O(1/c), given that the integral of the correction (128) over its domain −2 < x < 2 is zero. In Fig. 2, we compare the analytical expression for ρ 1 (x) against the data from direct diagonalisation of sparse ER matrices, with bond weights distribution p K (K) = 1 in the case of c = 50. We obtained the eigenvalues of 10000 matrices of size N = 1000 and discarded the isolated contribution due to the top eigenvalue of such matrices, which lies outside the spectrum. We then organised the remaining 9.99 × 10 6 eigenvalues in a normalised histogram and removed from the data the semicircular contribution ρ 0 (x) evaluated at the mid-point of each histogram bin. We found a very good agreement between the analytical curve representing ρ 1 (x) (red line) and the numerical diagonalisation data (blue dotted line).
As a final remark, we observe that another possible way to extract the large c limit of the average spectral density in the replica formalism would be considering K = K/ √ c in Eq. (66), with the distribution of the rescaled weights being p K (K) = 1 2 δ K,1 + 1 2 δ K,−1 , and expanding the exponential in a Taylor series. This choice would result in the conjugate order parameter being expressed as an expansion in powers of 1 c , given that the odd powers of 1 √ c are cancelled by the fact that the odd moments of p K (K) are zero. Assuming that the order parameter (67) at the saddle point is expressed at the saddle-point as a multivariate factorised zero-mean Gaussian, i.e.
the leading order of its conjugate results in a quadratic polynomial in the v a , namely where σ 2 is determined by the condition 1 σ 2 = iλ ε + K 2 K σ 2 . Using (130) in Eq. (69), one easily obtains the Wigner semicircle. The expansion could also be continued to obtain the corrections in powers of O(1/c).

Alternative Replica solution: uncountably infinite superposition of Gaussians
Kühn in [37] suggested a different approach for the average spectral density problem, which completely bypasses (90). At the outset, the treatment in [37] is the same as in [28], but departs from the Bray-Rodgers original derivation at the level of the stationarity conditions (66) and (67). The order parameter ϕ( v) and its conjugate iφ( v) are represented as uncountably infinite superpositions of complex Gaussians, i.e.
Eq. (131) and (132) are expansions of ϕ andφ in an over-complete function system. The structure of this ansatz derives from the study of models for amorphous systems. In that context, it was noticed that harmonically coupled systems -such as the model defined by our "Hamiltonian" (5) -admit a solution in terms of superpositions of Gaussians [38,70]. This ansatz exhibits permutation symmetry among replicas as well as rotational symmetry in replica space, therefore sharing the same symmetries assumed in [28] (see Eq. (72)). The advantage of the ansätze (131) and (132) is that they allow us to extract the leading contribution to the saddle-point of (60) in the limits N → ∞ and n → 0.
The stationarity conditions (66) and (67) are replaced by the stationarity conditions w.r.t. π andπ. They are and where γ andγ are two Lagrange multipliers to enforce the normalisation condition of π and π. The equality (138) is realised by making sure that γ satisfies the equality while the remaining part (which is a function of ω) should satisfŷ where Z 2 (ω, ω , K) = Z(ω )Z(ω + K 2 ω ) has been used. Since eq. (141) must hold for any value of ω in order for (138) to be satisfied, one notices that the following definition once inserted in the l.h.s. of (142), indeed produces the r.h.s. Likewise, also in (139) a constant part can be isolated, and a part that is a function ofω, viz.
As before, since (144) must hold for anyω, it follows that In order for bothπ and π to be normalised to 1, the conditionĉ = c must be imposed.

Average spectral density unfolded
The solutions of the two coupled functional equationŝ represent the saddle-point evaluation of (133). The symbol p c (k) represents the Poisson degree distribution, which is expected for ER sparse graphs. However, it has been shown in [38,50,53] that the above equations hold unmodified also for any non-Poissonian degree distributions p(k) within the configuration model framework, as long as the mean degree k = c is a finite constant, i.e. does not scale with N . Unlike (90), the equations (146) and (147) can be very efficiently solved numerically by a population dynamics algorithm (see Section 6). Some remarks are in order.
• Inserting (146) into (147) yields a unique self-consistency equation for π that is exactly identical to (34), obtained using the cavity method in the thermodynamic limit. This fact demonstrates once more the equivalence between the replica and cavity methods.
• Alternatively, one could insert (147) into (146), obtaining a single self-consistency equation forπ that readŝ The solution of the latter equation via a population dynamics algorithm will be described below in Section 6. While the two approaches are equivalent, here we choose to work with the {ω} since the final equation for the spectral density is more naturally expressed in terms of those, as shown in the following.
The pdfπ(ω) defined in (148) fully determines the average spectral density. Indeed, recalling (2) one gets where the latter expression corresponds to Eq. (33) in [37]. We notice that (149) is completely equivalent to (36) ifπ(ω) is expressed in terms of π(ω) according to (146). All the observations made about (36) hold here as well. The average spectral density as expressed in (149) is evaluated by sampling from a large population distributed according toπ(ω): this procedure will also be illustrated in Section 6.

The presence of localised states and the role of ε
The average spectral density (149) can be rewritten in order to isolate singular pure-point contributions from the continuous spectrum. Indeed, defining one finds the identity The integrand in (151) becomes singular as ε → 0 for a = 0. These singular contributions can be isolated representing P (a, b) as yielding for the spectral density Here, L ε (λ + b) is a Cauchy distribution with scale (half-width at half-maximum) parameter ε, viz.
that reduces to a delta-peak in b = −λ for any value of λ as ε → 0 + . The spectral density ρ(λ) can then be easily evaluated by sampling (see Section 6) from the population of the a and b (i.e. theω). Relying on the law of large numbers and calling M the number of samples {(a i , b i )}, the two integrals in (153) can indeed be rewritten as where ρ S (λ) and ρ C (λ) indicate respectively the singular and the continuous part of the average spectral density. Eq. (155) is equivalent to (162) and corresponds to Eq. (40) in [37]. Figure 3 shows the spectral density obtained for ER matrices with mean degree c = 2 and Gaussian weights with zero mean and variance 1/c. The tails of the distribution and the central peak in λ = 0 are dominated by localised states, i.e. the eigenvectors corresponding to those values of λ have most of their components equal to zero. Given that there is a one-to-one matching between the eigenvectors of graphs and their nodes, a localised state can be also described as an eigenvector that is concentrated on few sites of the graph. Quantitatively, the presence and location of localised states in the spectrum is confirmed by the numerical analysis of the Inverse Participation Ratio (IPR) of the eigenvectors in [37]. Given an eigenvector v of a N × N matrix, its IPR is defined as The above definition is independent of the eigenvector's normalisation. The IPR of localised states is O(1), as opposed to the O(N −1 ) scaling for delocalised states. Indeed, the numerical study in [36,37] shows that the O(1) scaling for IPR is found in correspondence of the tails and around the peak in λ = 0. Moreover, the IPR analysis makes it possible to relate localised states with the singular contributions to the overall spectrum ρ(λ). Indeed, the regions of the spectrum where the IPR is O(1) are also those where the singular contribution ρ S (λ) dominates over ρ C (λ). Moreover, when comparing numerical direct diagonalisation and population dynamics results two fundamental aspects must be taken into account: • On the one hand, the eigenvalues obtained by direct diagonalisation must be suitably binned in order to produce the numerical spectral density profile. The binning procedure smoothens the localised peaks and makes them harder to detect.
• On the other hand, the parameter ε plays an essential role in highlighting the singular contributions to the spectrum. In the evaluation of (155) only the samples such that for any given value of λ contribute to ρ S (λ). Therefore, in order to have enough data for a reliable evaluation of the singular contribution ρ S (λ), one must refrain from using a very small ε > 0 (such as ε = O(10 −300 )), but rather use a relatively large value ε > 0 (such as ε = O(10 −3 )), to ensure that Mερ S (λ) 1. Here M indicates the number of samples used to evaluate the sums in (155).
The effects of the choice of the regulariser ε are evident in the case c = 2 (see Fig. 3), since for such low c localised states prevail in the spectrum. Indeed, this is due to the structure of the graph itself, which is made of a giant cluster component and a collection of isolated finite connected clusters of nodes of any size (see C). An excellent agreement between direct diagonalisation and population dynamics results is achieved for ε = 10 −3 (top left panel). Indeed, in the ε = 10 −3 case, the Cauchy peaks related to the singular contributions to the spectral density are broadened into "wider" Cauchy pdfs. On the other hand, when using a smaller regulariser such as ε = 10 −300 (top right panel), the spectral density exhibits large fluctuations mainly due to errors that occur when sampling isolated states, as the condition Mερ S (λ) 1 cannot be satisfied. The peaks are superimposed on the continuous curve representing ρ C (λ).
However, the curve ρ C (λ) is unable to capture the tails of the spectral density: this effect is highlighted on a logarithmic scale (bottom panel). This is the typical signature of a localisation transition: the tails of the spectral density are dominated by localised states, hence they cannot be represented by the continuous part of the spectrum. At the same time, using too small values for ε makes it impossible to observe the localised state contributions in the tails. If a larger regulariser (hence better local statistics) were employed (top left panel), then the tails could be revealed. For an extensive discussion of these phenomena, see [37].
These effects are less evident in the c = 4 case, shown in Fig. 4, since localised states are much less relevant as the mean degree c increases. Indeed it has been shown in [71,72] that the weight of the delta-peaks related to localised states is an exponentially decreasing function of c, hence the peaks tend to disappear and merge into the continuous part of the spectrum as c grows. Moreover, the proportion of isolated nodes and isolated tree-like clusters of nodes in the graph is strongly reduced (see again C). Therefore, in the c = 4 case the choice of the regulariser is of lesser importance, and population dynamics simulations run with different values of ε yield very similar results in the continuous part of the spectrum. This is shown  in the left panel of Fig. 4, where population dynamics results obtained with ε = 10 −3 (solid blue line) and ε = 10 −300 (solid green line) are compared with the numerical diagonalisation of 10000 matrices of size N = 1000 (red circles). The peak at λ = 0 due to isolated nodes can be noticed. The log scale plot reveals that the solid green curve obtained for ε = 10 −300 still departs from the blue one obtained for ε = 10 −3 at the mobility edge, i.e. the value of λ at which ρ C (λ) vanishes, hence separating the delocalised from the localised phase. However, the mobility edge for c = 4 is located at a larger λ than in the c = 2 case, in agreement with previous observations [24,37]. The case of the spectral density of ER matrices with Gaussian couplings with c = 4 is also used to show that finite size effects are barely present in the spectral problem, away from the localisation transition. This is confirmed by results in the right panel of Fig. 4 where we compare the numerical diagonalisation of matrices of different size (in particular N = 1000 and N = 4000) with the same population dynamics simulation run with a large regulariser ε = 10 −3 , in order to generate sufficient statistics in the tails.
Corroborating the observations of [71,72], the left panel of Fig. 5 shows the average spectral density for adjacency matrices of ER graphs with Gaussian bond weights with zero mean and variance 1/c, for growing c. The plots are obtained using the population dynamics algorithm with ε = 10 −300 , in order to highlight the continuous part of the spectrum ρ C (λ). As c grows from 2 to 4, the number of peaks superimposed on the continuous curve is strongly reduced and the location of the mobility edge moves to larger values of λ, entailing that the relevance of localised states is reduced. As c is further increased, the edge of the continuous spectrum approaches λ = ±2, which are the edges of the Wigner semicircle that would be obtained as c → ∞ with that choice for bond weights statistics.
Finally, as an illustration of the fact that the formalism presented here can be used to obtain the spectral density for other finite mean degree ensembles in the configuration model class, we show in right panel of Fig. 5 the spectral density of the ensemble of adjacency matrices of random regular graphs (RRG), having degree distribution p(k) = δ k,c . We consider the c = 4 case. For RRGs adjacency matrices, there are no localised states for any c > 2. Conversely, there are mainly localised states for c = 2 (see again [37] for a detailed Figure 5: Left: spectral density of adjacency matrices of ER graphs with Gaussian bond weights with zero mean and variance 1/c, for growing values of c, obtained with population dynamics algorithm using a regulariser ε = 10 −300 to highlight the continuous part of the spectrum. Right: spectral density of adjacency matrices of random regular graphs with coordination c = 4. The population dynamics result (solid blue line) is compared to the analytical expression of the spectral density, found in [45,46], known as Kesten-McKay pdf (red circles). discussion). The population dynamics algorithm perfectly reproduces the Kesten-McKay distribution [45,46], given by the analytical formula We remark that (157) can be derived analytically within the formalism of Section 5.1 employing a "peaked" ansatz for the distribution of inverse variances asπ(ω) = δ(ω −ω). This is shown in F.

Population dynamics algorithm
In this section, we sketch the stochastic population dynamics algorithm that allows us to solve the self-consistency equation (148) and the sampling procedure to evaluate (149). This kind of algorithm is widely used in the study of spin glasses [73,74]. This procedure is general and allows to solve every equation having the same structure as (148) (including for instance (34)). In order to solve (148), one represents the pdfπ(ω) in terms of a population of N P complex values {(ω i )} 1≤i≤N P , which are assumed to be sampled from that pdf. Given that the true pdf is initially unknown, a starting population is randomly initialised with Re[ω i ] > 0. Then, a stochastic algorithm for which the solution of Eq. (148) is the unique stationary solution is constructed.
To start, we fix ε = 10 −300 . Indeed, when solving (148), we may choose ε to be as small as possible in order not to bias the values of theω 6 . Moreover, we define a set I of equally spaced real positive numbers, starting at zero. The parameter λ will take values in I. The distance between two consecutive values in I, denoted by ∆λ, represents the λ-scan. For the plots shown in this paper, we have employed ∆λ = O(10 −3 ). However, the mesh precision can be tuned depending on the desired resolution of the spectrum. Since the average spectral density as expressed by Eq. (151) (or equivalently Eq. (155)) is an even function of λ, it can be evaluated in the interval I and then simply mirrored w.r.t. 0 to obtain the full spectral density shape. As a last remark, it is convenient to normalise the bond weights drawn from p K (K) by √ c, where c is the mean degree. Given these initial remarks, the stochastic algorithm consists in iterating the following steps until a statistically stationary population is obtained, for any given λ ∈ I.
1. Generate a random k according to the distribution k c p(k), where p(k) is the degree distribution of interest and c = k .
2. Generate K from the bond weights pdf p K (K).
3. Select k − 1 elementsω from the population at random, then computê which is the equality enforced by the delta function in (148) . Replace a randomly selected population memberω j (where j = 1, ..., N P ) withω (new) .

Return to (i).
A sweep is completed when every member of the populationω j with j = 1, ..., N P has been updated once according to the previous steps. We denote the i-th sweep for a given λ as S i (λ). A sufficient number N eq of sweeps is needed to equilibrate the population. Stationarity can be assessed by looking at the sample estimate of the first moments of theω variables. The population dynamics algorithm can also be employed for the sampling procedure that allows one to numerically evaluate (149) (and in a similar fashion (36)). Once (for a given value of λ) the population {(ω i )} 1≤i≤N P has been brought to convergence after N eq equilibration sweeps, a number N meas of so-called measurement sweeps M (λ) is performed. Here, N meas is the number of the measurement sweeps.
Each measurement sweep M j (λ) (j = 1, ..., N meas ) can be divided into two parts. In the first part, the population in equilibrium is updated via a sweep S j (λ), as described before. The second part is the actual measurement part m j (λ), involving the following steps. Two (real) empty arrays {a i } {1≤i≤Nsam} and {b i } {1≤i≤Nsam} of size N sam are initialised. Here, N sam is the number of samples per measurement sweep. Each of the a i and the b i will eventually play the role of the real and the imaginary parts of sums of theω , respectively. Then, for i = 1, ..., N sam : 1. Generate a random k according to the degree distribution p c (k) (or in general p(k)) 2. Select k elementsω from the population {ω i } 1≤i≤N P at random and compute 3. Compute which represents the contribution to the average spectral density for a given value of λ. It should be noticed that the sample size M that one employs to compute the sum (162) is completely unrelated to the population size N P . The evaluation of the sample average (162) requires a careful choice of ε, since the value of ε in (162) will determine the width of the Cauchy distributions approximating the delta-peaks in the spectrum (see the discussion in Section 5.2). In general, the value of ε employed in (162) can be larger (up to ε = O(10 −3 )) than the value ε = 10 −300 chosen for the equilibration sweeps in (158), in order to have sufficient statistics to faithfully represent the singular part of the spectrum. Eq. (162), corresponding to eq. (40) in [37], is a discretised version of (151). This step concludes the sampling algorithm for a given value of λ. The full sampling algorithm can be summarised as follows. Compute (162) for given λ 10: end for .
An example of population dynamics algorithm for the spectra of ER matrices is available upon request.

Conclusions
In summary, we have provided a pedagogical and comprehensive overview of the computation of the average spectral density of sparse symmetric random matrices. We started with the celebrated Edwards-Jones formula (2) and outlined its proof. The formula allows to recast the determination of the density of states of a N × N matrix into the calculation of the average free energy of a system of N interacting particles at equilibrium, described by a Gibbs-Boltzmann distribution at imaginary inverse temperature. Therefore, techniques from the statistical physics of disordered systems, such as the replica method, can be employed to correctly deal with the calculation of the average free energy (see Eq. (16)) that features in the Edwards-Jones formula. The replica method was indeed the strategy used in the seminal work of Bray and Rodgers, which represents the first attempt to obtain the spectral density for matrices with ER connectivity. We have reproduced their calculations in detail, showing how to derive the integral equation (90) whose solution still represents a challenging open problem. We also described how to obtain the Wigner semicircle as the leading order of the large mean degree expansion of Eq. (90), as well as the first 1/c correction.
Considering sparse tree-like graphs within the Edwards-Jones framework, we have described how to apply the cavity method to the spectral problem for single instances. The cavity method circumvents the averaging of the free energy by making the associated Gibbs-Boltzmann distribution the target of its analysis. We have demonstrated that in this context the only ingredients needed to compute the spectral density are the inverse variances of each of the N marginal pdfs of the Gibbs-Boltzmann distribution (see Eq. (22)). These inverse variances are easily obtained in terms of a set of self-consistency equations (29) for the cavity inverse variances. Moreover, we have explained how in the thermodynamic limit the cavity single-instance recursions give rise to a self-consistency integral equation (34) for the pdf of the inverse cavity variances, in terms of which the average spectral density (36) is fully determined at the ensemble level.
We have also illustrated an alternative replica derivation, where the high-temperature replica symmetry ansatz employed by Bray and Rodgers is realised by assuming that the order parameter and its conjugate are expressed through an infinite superposition of zeromean complex Gaussians, with random inverse variances (see Eq. (131) and (132)). We showed that the two coupled integral equations (146) and (147) that define the pdfs of the aforementioned inverse variances reduce to a unique self-consistency equation (148), which is completely equivalent to Eq. (34) found within the cavity treatment in the thermodynamic limit. In the replica framework, too, the average spectral density (149) depends only on this single pdf defined in (148). Therefore, once again the equivalence between the cavity and replica approaches is confirmed. Indeed, both methods permit to express the average spectral density as a weighted sum of local contributions coming from nodes of different degree k. On the practical side, the average spectral density is obtained by sampling from a large population of complex numbers distributed according to the pdf of the inverse variances (148) (or equivalently (34)). We remark that both approaches are not restricted to ER graphs, but can handle any degree distribution with finite mean degree. The essential tool for solving self-consistency equations of the kind of Eq. (148) and performing the sampling procedure to evaluate the spectral density (149) is a stochastic population dynamics algorithm. We give a detailed description of the algorithm along with pratical tips to implement it. Results obtained with the population dynamics algorithm are in excellent agreement with the numerical diagonalisation of large weighted adjacency matrices of tree-like graphs, provided that the correct choice of the value of the regulariser ε used in the algorithm is made. Indeed, we thoroughly describe the important role of ε in unveiling the contribution of the localised states to the spectral density, also in connection with the mean degree and structure of the graph. We are also able to show that the spectral density does not significantly suffer from finite size effects, away from any localisation transition.
In line with the pedagogical gist of this work, we include a number of appendices where we provide background information on graphs, discussion of technical aspects, and quick proofs of the identities that have been used in the main text.
We believe that there are still open pathways for further research in this field. One might be the investigation of the properties of the eigenvectors of sparse random matrices through the analysis of the statistics of the local resolvent using the cavity method. As mentioned in Section 3.5, there is a connection between the resolvent of a matrix J (43), its eigenvectors, and the marginal inverse variances ω i defined in Eq. (31). Indeed, given a N × N matrix J with eigenpairs {(λ α , u α )} α=1,...,N , whose resolvent is G(z) = (z1 − J) −1 , from Eq. (46), the following identity holds for λ ∈ R and for any i = 1, . . . , N . The function δ ε (x − x 0 ) = 1 π ε (x−x 0 ) 2 +ε 2 approximates the Dirac delta as ε → 0. When λ = λ α , then ε ImG(λ − iε) ii u 2 iα + O(ε 2 ). On the other hand, considering Eq. (46) for just a single instance (i.e. omitting the ensemble average) and comparing it with Eq. (32), one eventually observes that where the ω i are defined in Eq. (31). Eq. (164) suggests that the distribution of the squares of the eigenvectors can be obtained in terms of that of the ω i . While this suggestion appears to be warranted for eigenvectors corresponding to isolated eigenvalues, preliminary results indicate that it may be less reliable for the continuous component of the spectrum, which exists above the percolation threshold. It is likely that a combination of two facts is responsible for this observation. First, cycles will appear in random graph ensembles above the percolation threshold, as a consequence of which the decorrelation assumptions used in the cavity method will be only approximate and no longer exact. For a wide variety of problems in fields as diverse as the physics of spin-glasses, optimisation or percolation, this aspect is well known not to cause any problems in applications of the cavity method. This is generally attributed to the fact that the tree-like approximation becomes asymptotically exact in the thermodynamic limit for networks in the configuration model class that are on average finitely connected. However, the problem of eigenvector components may be different in this respect, and this is where the second fact comes into play. Due to the occurrence of quasi-degeneracies in the continuous spectrum, it is conceivable that hybridisations of near degenerate eigenstates rather than the true eigenstates may appear in any approximate evaluation of Eq. (164). Indeed, if the u iα appearing in Eq. (164) were in an approximate evaluation in fact replaced by linear combinations of components of eigenvectors corresponding to several nearly degenerate eigenvalues, then individual components would be modified, whereas sums of squares of components would remain unaffected due to orthonormality of the underlying true eigenvectors. This could naturally reconcile the seemingly contradictory observations that eigenvalue densities can be evaluated with great precision using the cavity method, whereas this may not be true for individual eigenvector components. Another topic of great importance would be the understanding of finite-size effects in the immediate vicinity of localisation transitions, which are expected to affect results as in other continuous phase transitions. The study of these phenomena for spectra of sparse random matrices also provides a framework for the investigation of Anderson localisation, which has recently regained new interest in view of its connection to systems exhibiting many-body localisation (see [75] for a recent review). Despite a variety of works concerning the localisation transition in spectra of sparse symmetric random matrices [15,23,24] and studies on the localisation of their top eigenvector [76,77], a systematic approach for the analysis of the finite size effects at the transition is still needed. Steps in this direction have been done in [78,79], where the authors focus on finite size effects concerning Anderson localisation on RRGs.
In connection with the localisation transition, another unexplored field that deserves further investigation is the behaviour of the population dynamics algorithm at the transition. Indeed, as pointed out in [37] it exhibits critical slowing down and long autocorrelation time, which in turn affect the quality of the averages. Moreover, as highlighted in [80] for glassy systems on networks, the population dynamics results at the transition are also affected by finite population size effects. Therefore a systematic analysis of how the algorithm is influenced by the crossover between the delocalised and the localised phase would be extremely insightful not only for the spectral problem per se but in general whenever this kind of algorithm is employed.

B The principal branch of the complex logarithm
The logarithm in the complex plane is in general a multi-valued function. Whenever a well defined, single-valued function is needed, the principal branch of the complex logarithm can be considered. It is denoted by "Log" and defined such that for any z ∈ C with r = |z|, The function Arg(z) denotes the principal value of the argument of the complex number z.
In particular, given z = re iθ ∈ C, the argument of z is given by arg(z) = θ and is in general a multi-valued function. The single-valued principal argument Arg(z) is related to arg(z) via the following relation, where the symbol ... denotes the floor operation, i.e. x is the integer number such that In general Log e z = z for z ∈ C. Indeed, for any z = x + iy ∈ C the following property holds: Log(e z ) = Log|e z | + iArg(e z ) = Log(e x ) + iArg(e iy ) = x + i arg(e iy ) + 2π where Log(x) = ln(x) for x ∈ R and the definition (171) has been used for the principal value of the argument Arg(z).

C Erdős-Rényi graphs
The Erdős-Rényi (ER) graph is the prototypical example of a random graph, introduced by Erdős and Rényi in [81,82]. It is the simplest and most studied uncorrelated undirected random network. It can be denoted by G (N, p), where N is the number of nodes and p ∈ [0, 1] is the probability that any two nodes (there are N (N − 1)/2 possible pairs, hence possible links) are connected. In other words, p is the probability that a link exists independently from the others. In formulae, the probability that a link exists between nodes i and j is All properties of the ER model depend on the two parameters N and p. Its degree distribution is binomial, viz.
Pr[a random node has degree k] = p(k) Indeed, a node has degree k if it is connected to k nodes (the probability of this event being p k ) and at the same it is not connected to all the remaining N − 1 − k nodes (the probability of this event being (1 − p) N −1−k ). The binomial coefficient accounts for the fact that the specific subset of k nodes we choose out of the remaining N − 1 does not matter. The mean degree is then c = p(N − 1). In the limit N → ∞ where N − 1 N and keeping c = N p constant, the binomial distribution in (174) converges to the Poisson distribution, The condition for this limit to hold is exactly verified in the sparse ER ensemble that we consider in our analysis. Indeed, we explicitly ask that the mean degree c be a finite constant, hence ensuring that p = c N → 0 as N → ∞. The Poisson distribution in (175) is decaying exponentially for large degree k.
The structure of an ER graph and in particular the existence of a giant component depend on the value of p [82]. The giant component of a graph is the largest connected component (i.e. cluster of nodes) in the graph, containing a finite fraction of the total N nodes. In a connected component, every two nodes are connected by a path, whereas there are no connections between two nodes belonging to two different components. We have the following properties [83,84]: • For p < 1 N (i.e. c < 1), the probability of having a giant component is zero. Indeed, almost surely there are no connected components with size larger than O(ln(N )). The graph can be described as a disjoint union of trees and unicycle components, i.e. trees with an extra link forming a cycle.
• For p > 1 N (i.e. c > 1), the probability of having a giant component is 1. Almost surely, the graph will have a unique giant component whose size is O(N ) and contains cycles of any length, while the remaining smaller components (typically trees and unicycles) have at most size O(ln(N )).
• p = 1 N = p c represents the percolation threshold as it separates the two regimes: indeed at p = p c (i.e. c = 1) most of the isolated components for c < 1 merge together, giving rise to a giant component of size O(N 2/3 ). As the (constant) mean degree c > 1 increases, the smaller components join the giant component, which then becomes O(N ) in size. The smaller the size of the isolated components, the longer they will survive the merging process.
• For p ≤ ln(N ) N , the graph contains isolated nodes almost surely, hence it is disconnected. As soon as p > ln(N ) N , the graph becomes connected, as the isolated nodes attach to the giant component entailing that every pair of nodes in the graph is connected by a path. The value p = ln(N ) N is then a threshold for the connectivity of the graph.
The structural properties of the graph are reflected in the spectrum. Indeed, the variety of peaks in the spectrum related to singular contributions are due to isolated nodes and isolated finite clusters of nodes that are still present for finite constant c > 1, alongside with the giant component.
The ER graph can also be seen as a model of link percolation [85]. Indeed, ER graphs can be generated also starting from a fully connected graph and removing links at random with constant probability 1 − p.
An algorithm for the generation of the adjacency matrix of any generic random graphs within the configuration model is described in Section 8.1 and detailed in Appendix J.5 (Algorithm 27) of [86]. A simple code for the generation of single instances of adjacency matrices of ER graphs is available upon request.

D How to perform the average (54)
The goal is to perform the average w.r.t. the joint distribution of the matrix entries where represents the ER connectivity distribution, and p K (K ij ) is the bond weight pdf. The average is computed for large N as follows, where the subscripts {c} and {K} respectively denote averaging w.r.t. the joint pdfs of the {c ij } and the bond weights {K ij }, whereas the non-bracketed subscripts c and K refers to the average over a single random variable drawn from p C (c) and p K (K), respectively. Moreover, in the second line we have used independence of the random variables and in the last line we have re-exponentiated the product and the factor 1/2 prevents from over-counting symmetric terms in the double sum.

F The Kesten-McKay distribution from a peakedπ
We analytically derive the spectral density of the ensemble of adjacency matrices of random regular graphs (RRGs), using the formalism of Section 5. We employ eq. (148) and (149), specialised to the RRG case where p(k) = δ k,c and p K (K) = δ(K − 1). Therefore, we obtain for the self-consistency equation forπ whereas for the spectral density we get Eq. (184) can be solved by a degenerate pdf of the form provided thatω ε solves Therefore, the spectral density reads Taking the real part and thereafter the ε → 0 + limit in (188), one obtains where the minus sign has been chosen in order to have a physical solution. The latter expression is the Kesten-McKay pdf in Eq. (157).

G Trees have a symmetric spectrum
A tree is a connected acyclic undirected graph. Acyclic means that it contains no cycles. In a tree, any two nodes are connected via a unique path [87]. In particular, trees are examples of bipartite graphs, in which nodes can be divided into two disjoint subgraphs S 1 and S 2 , such that every node in subgraph S 1 only has neighbours in the complementary subgraph S 2 and vice versa.
Here, we show that the N × N adjacency matrix A (whether it is weighted or not) of a tree with N nodes has a spectrum that is symmetric around λ = 0. In other words, for any eigenvalue λ of A, then −λ is also an eigenvalue of A. This result is encoded in the fact that in the set of recursion equations for the cavity inverse variances (29) and single-site inverse variances (31), the matrix entries appear only through their square.
Let x be the eigenvector of A corresponding to the eigenvalue λ. Given x and λ, we will be able to construct a vector y that is an eigenvector of A corresponding to the eigenvalue −λ. Indeed, considering the eigenvalue equation for the component x i , the node i contributing to the l.h.s. of (190) and the nodes {j : j ∈ ∂i} contributing to the r.h.s. of (190) always belong to different subgraphs. Therefore, the signs of all the components x j with j ∈ ∂i all belonging to one of the two subgraphs (S 1 or S 2 ) can be changed, giving rise to This reasoning can be iterated for any i = 1, . . . , N . Therefore, given the eigenvector x corresponding to λ, one can construct a vector y such that Of course, the choice of inverting the sign of the components of x on the set S 2 is arbitrary. The same result is achieved by changing the sign of the components living on nodes in S 1 while leaving the components defined on nodes in S 2 unchanged. The vector y is thus an eigenvector of the matrix A corresponding to −λ.