Spiking neuromorphic chip learns entangled quantum states

The approximation of quantum states with artificial neural networks has attracted much attention in recent years. Meanwhile, analog neuromorphic chips, inspired by structural and dynamical properties of the biological brain, show high energy efficiency when running artificial neural-network architectures for generative applications. This encourages employing such hardware systems as platforms for simulations of quantum systems. Here we report on the realization of a prototype using the latest spike-based BrainScaleS hardware that allows us to represent few-qubit maximally entangled quantum states with high fidelities. Bell correlations of pure and mixed two-qubit states are well captured by the analog hardware, demonstrating an important building block for simulating quantum systems with spiking neuromorphic hardware.


Introduction
As von-Neumann computers rapidly approach fundamental physical limitations of conventional semiconductor technology, a number of alternative computing architectures are currently being explored. Among them, neuromorphic devices [1,2], which take inspiration from the way the human brain works, hold the promise of a wide range of applications, in particular in machine learning and artificial intelligence [3][4][5][6][7][8][9][10][11]. Here we focus on using them as a sampling device to emulate measurement outcomes in quantum physics [12], which are inherently probabilistic in nature. The BrainScaleS neuromorphic system [11] is ideally suited for this task: the accelerated analog circuit dynamics and the inherently parallel nature of the neuromorphic substrate enable a rapid generation of samples, which carries the potential of scaling benefits as compared to von-Neumann devices (App. A).
We use neuronal spikes (action potentials) to mark transitions between discrete states and thereby effectively carry out the sampling process. The all-or-nothing nature of spikes is a blessing in disguise. On the one hand, it has an apparent drawback: it makes the computation of gradients, and thus training, more demanding than in classical deep neural networks [2]. On the other hand, it is precisely what allows us to use a spiking neuromorphic substrate in the first place, whose speed-up we harness for efficient Hebbian learning [13].
Since any quantum state can be mapped to a probability distribution [14,15], it can, in turn, be represented using networks of leaky integrate-and-fire (LIF) neurons [16][17][18]. Here, we use the BrainScaleS-2 chip [11] as a physical substrate to emulate such networks. This mixed-signal neuromorphic platform is centered around an analog core: neuro-synaptic states are represented as voltages and currents in integrated electronic circuits and evolve in continuous time. Its configurable connectivity of neurons allows us to explore various different network topologies, including shallow, as well as deep and densely connected ones. With this substrate, we demonstrate an approximate representation of quantum states with classical spiking neural networks that is sufficiently precise for encoding states with genuine quantum correlations.

Neuromorphic encoding of quantum states
In classical machine learning, generative models based on artificial neural networks are used to encode and sample from probability distributions [13]. Similarly, spiking neural networks can be viewed as approximating Markov-chain Monte-Carlo sampling, albeit with dynamics that differ fundamentally from standard statistical methods [19]. Here, we encode quantum states in the distribution sampled by a network of LIF neurons emulated on the BrainScaleS-2 chip [11]. Similar to biological neurons in the human brain, LIF neurons communicate via spikes. Each neuron can be viewed as a capacitor integrating the currents it receives from its synaptic inputs to generate a membrane potential. Whenever this membrane potential crosses a threshold from below, the neuron sends a spike to the synaptic inputs of its efferent partners (Fig. 1c, top panel). After sending a spike, the neuron enters an inactive state in which no additional spike can be triggered for a certain time, referred to as the refractory period τ_ref. In the spike-based sampling framework, a neuron in this refractory state encodes the state z = 1, and z = 0 at all other times (Fig. 1c, lower panel). The stochasticity required for sampling is induced by adding a random component to the generation of spikes; for LIF networks, this can be ensured by sufficiently noisy membrane potentials [16,18]. To this end, we used on-chip sources to inject pseudo-Poisson spike trains into the network (see App. A).
As an experimental result, the BrainScaleS-2 chip returns a list of all spike times and associated neuron IDs. This information is sufficient to reconstruct the network state at any point in time. We estimated the distribution sampled by the network by observing its state at regular intervals, as visualized in Fig. 1c. To ensure an optimal estimate, the observation frequency needs to be at least (τ_ref/2)^-1 (see App. A). For our analysis, we used (τ_ref/5)^-1, thereby guaranteeing a large safety margin. The resulting binary configurations are collected in a histogram as shown in Fig. 1d.
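The state-assignment scheme above can be sketched in code. The function below is our own illustration, not the BrainScaleS-2 software API, and assumes spike times and neuron IDs are given as flat lists:

```python
import numpy as np

def states_from_spikes(spike_times, neuron_ids, n_neurons, tau_ref, t_end, dt):
    """Reconstruct binary network states from a list of recorded spikes.

    A neuron is in state z = 1 at time t if it spiked within the preceding
    refractory period tau_ref, and z = 0 otherwise; states are read out
    every dt (the text uses dt = tau_ref / 5).
    """
    read_times = np.arange(0.0, t_end, dt)
    states = np.zeros((len(read_times), n_neurons), dtype=int)
    for t_spk, nid in zip(spike_times, neuron_ids):
        # readout points falling inside the refractory window [t_spk, t_spk + tau_ref)
        hit = (read_times >= t_spk) & (read_times < t_spk + tau_ref)
        states[hit, nid] = 1
    return states
```

Histogramming the rows of the returned array yields the sampled distribution visualized in Fig. 1d.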
A pure quantum state is described by a vector in Hilbert space and can be represented by a hermitian density matrix with complex entries. Density matrices can also encode mixed states and thus account for a possible coupling to an environment, which is relevant for a realistic description of experiments. Fig. 1e shows an example of a density matrix for a system of two spin-1/2 degrees of freedom (qubits) corresponding to a Hilbert-space dimension d = 4. The corresponding probability distribution which we encode in our network is obtained from a so-called tomographically complete measurement [14]. Such a measurement has d^2 possible outcomes. Mathematically, these outcomes are represented by a set of operators {M_a}_a, forming a so-called positive-operator-valued measure (POVM). The density matrix can be reconstructed uniquely as ρ = Σ_a P(a) Q_a from the probabilities P(a) = Tr[ρ M_a] for obtaining outcome a according to Born's rule. The operators Q_a are given by Q_a = Σ_{a'} (T^-1)_{a,a'} M_{a'}, with T_{a,a'} = Tr[M_a M_{a'}] [15]. Hence, any density matrix ρ can be mapped to a probability distribution P(a), and the information contained in the quantum state can be retrieved from that distribution. In our two-qubit example (Fig. 1d) we chose M_a = M_{a_1} ⊗ M_{a_2}, where the M_{a_i} (a_i = 0, ..., 3) are projection operators (up to normalization) onto the single-qubit states represented by the four corners of a tetrahedron on the Bloch sphere. As each a_i can take four different values, the encoding of the probabilities P(a) by a spiking network is realized by representing each qubit by a pair of binary neurons in the visible layer (cf. gray shadings in Fig. 1a). This results in the distribution p*(v) over the visible neurons (see App. C).
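As a minimal numerical sketch of this POVM formalism, the following reconstructs a single-qubit density matrix from the outcome probabilities of a tetrahedral POVM. The particular tetrahedron orientation is a conventional choice for illustration, not necessarily the one used in the experiment:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# four tetrahedron vertices on the Bloch sphere (they sum to zero)
verts = np.array([
    [0, 0, 1],
    [2 * np.sqrt(2) / 3, 0, -1 / 3],
    [-np.sqrt(2) / 3,  np.sqrt(2 / 3), -1 / 3],
    [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3],
])

# tetrahedral POVM elements M_a = (I + s_a . sigma) / 4, summing to the identity
M = [(I2 + s[0] * sx + s[1] * sy + s[2] * sz) / 4 for s in verts]

# overlap matrix T_{a,a'} = Tr[M_a M_a'] and dual operators Q_a
T = np.array([[np.trace(Ma @ Mb).real for Mb in M] for Ma in M])
Tinv = np.linalg.inv(T)
Q = [sum(Tinv[a, b] * M[b] for b in range(4)) for a in range(4)]

def povm_probs(rho):
    """Born-rule probabilities P(a) = Tr[rho M_a]."""
    return np.array([np.trace(rho @ Ma).real for Ma in M])

def reconstruct(P):
    """Linear reconstruction rho = sum_a P(a) Q_a."""
    return sum(P[a] * Q[a] for a in range(4))

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # arbitrary valid test state
```

Since the tetrahedral POVM is informationally complete, `reconstruct(povm_probs(rho))` recovers `rho` exactly.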
To approximate p*(v) through spike-based sampling, the parameters of the spiking network were adjusted in an iterative training procedure. We used the Kullback-Leibler divergence D_KL to measure the quality of the sampled marginal p(v; W). In each training epoch, the synaptic weights were updated along the gradient of the D_KL (see App. D), ΔW_ij ∝ ⟨v_i h_j⟩_target − ⟨v_i h_j⟩_model, which is derived assuming that the distribution p(v; W) is given by Eq. (1). While the dynamical behavior of the spiking hardware approximates this probability distribution, the exact relation between the network parameters and the encoded distribution cannot be given in closed form [18]. Instead, the pairwise model correlations ⟨v_i h_j⟩_model were measured directly from the sampled distribution p(v, h; W). The target correlations ⟨v_i h_j⟩_target were also obtained from the sampled distribution, by renormalizing it to the target marginal distribution: ⟨v_i h_j⟩_target = Σ_{v,h} v_i h_j p(v, h; W) p*(v)/p(v; W). A similar scheme was used for the neuronal biases b_j and d_i. The performance characteristics of the neuromorphic hardware make computing additional samples cheap compared to reconfiguration and reinitialization. Hence we can take into account the complete sampled distribution for the update calculation, rather than relying on few-sample approximations as in contrastive divergence [13]. This enables a much better estimation of the D_KL gradient and does not rely on layer-wise conditional independence, allowing the exploration of network topologies other than bipartite graphs. See App. D for an extended discussion of the learning scheme.
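A schematic software version of this update rule might look as follows. The function name and data layout are our own, and on hardware the samples come from the chip rather than from a simulator:

```python
import numpy as np

def correlation_updates(samples_v, samples_h, p_target, eta=0.1):
    """One gradient step of the reweighting scheme sketched above.

    samples_v, samples_h : binary arrays of shape (n_samples, N_v) / (n_samples, N_h)
    p_target             : dict mapping visible configurations (tuples) to p*(v)

    Model correlations <v_i h_j>_model are plain sample averages; target
    correlations reweight every sample by p*(v) / p(v), i.e. renormalize
    the sampled joint distribution to the target marginal.
    """
    n = len(samples_v)
    keys = [tuple(v) for v in samples_v]
    counts = {}
    for k in keys:                       # empirical marginal p(v)
        counts[k] = counts.get(k, 0) + 1
    w = np.array([p_target.get(k, 0.0) / (counts[k] / n) for k in keys])
    corr_model = samples_v.T @ samples_h / n
    corr_target = (samples_v * w[:, None]).T @ samples_h / n
    return eta * (corr_target - corr_model)
```

When the empirical marginal already equals the target, all weights w are 1 and the update vanishes, as it should at convergence.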

Encoding an entangled Bell state
To demonstrate that a spiking neural network can learn to represent entangled quantum states, we focus on a maximally entangled two-qubit state, the Bell state |Ψ+⟩ = (|↑↑⟩ + |↓↓⟩)/√2. This state is a prototypical example exhibiting quantum mechanical correlations [20,21]. We trained a network of four visible and 20 hidden neurons to encode the POVM probability distribution corresponding to ρ_B = |Ψ+⟩⟨Ψ+|. For calculating the weight updates in each epoch of the training procedure, as well as for evaluating expectation values, we drew 125000 samples of neuron states. This number is sufficient for the saturation of the D_KL, as can be seen in Fig. 3b, and was used for all experiments unless specified otherwise.
To characterize the learned quantum state, we used the observable B(Θ), which can signal genuine quantum correlations and is experimentally accessible via measurements as illustrated and defined in Fig. 2a: The two qubits are distributed to two parties who independently perform one of two possible measurements on their respective qubit. We choose the standard parametrization of the different measurements by a single angle Θ. For a Bell state this procedure yields correlations violating the inequality |B(Θ)| ≤ 2, which is obeyed by classical systems [21]. At Θ = π/4 this inequality is maximally violated for the Bell state ρ B and thus yields an experimentally accessible witness for Bell correlations [20,22].
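The witness can be checked numerically. The single-angle parametrization below (σ_z, σ_x on one side, cos Θ σ_z ± sin Θ σ_x on the other) is one conventional choice, since the exact settings are defined in Fig. 2a; it reproduces the maximal violation 2√2 at Θ = π/4:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def chsh(rho, theta):
    """CHSH value B(theta) for one conventional single-angle parametrization."""
    A1, A2 = sz, sx
    B1 = np.cos(theta) * sz + np.sin(theta) * sx
    B2 = np.cos(theta) * sz - np.sin(theta) * sx
    obs = np.kron(A1, B1) + np.kron(A1, B2) + np.kron(A2, B1) - np.kron(A2, B2)
    return np.trace(rho @ obs).real

# Bell state |Psi+> = (|up,up> + |down,down>) / sqrt(2)
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_B = np.outer(psi, psi.conj())
```

For this state B(Θ) = 2(cos Θ + sin Θ), exceeding the classical bound 2 for a range of angles around Θ = π/4.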
The correlations in the quantum states encoded as probability distributions by the trained spiking network clearly exceed the classicality bound |B(Θ)| = 2 (red points in Fig. 2b) and are in agreement with their exact Θ-dependence (black line). The inset shows how the Bell correlation witness B(Θ = π/4) develops during the training, converging after less than 1000 iterations.
To illustrate the generality of our neuromorphic encoding scheme, we consider mixed quantum states obtained by adding white noise to the pure Bell state, resulting in the Werner state [23]. Increasing the noise reduces |B(Θ)| and eventually confines it within the classical regime (cf. green data in Fig. 2b). For 1 − r > 1 − 1/√2 the Bell correlation witness fails to detect entanglement, and for 1 − r > 2/3 the state becomes separable (unentangled). The resulting mixed states are faithfully represented by our system for any value of r, as shown in Fig. 2c. The fluctuations in the experimental data decrease with increasing noise contribution, allowing a more accurate learning of mixed states. This counterintuitive effect arises because additional noise increases the entropy, which corresponds to sampling from more uniform distributions. These, in turn, are realized by weaker weights, thus decreasing the influence of imperfect synaptic interactions in the neuromorphic substrate.

Learning performance
We analyzed in detail the convergence of the learning algorithm using the classical Kullback-Leibler divergence D_KL as defined in Eq. (2). In addition, we use the quantum fidelity to quantify the distance between the target state ρ_B and the network-encoded state ρ_N, which, for pure states, reduces to the state overlap. As shown in Fig. 3a, the learning converges after 1000 training epochs. Increasing the number of hidden neurons, we find that the fidelity reaches ≈ 98% (correspondingly D_KL ≲ 10^-2) for M ≳ 20 hidden neurons. The limited reachable fidelity results from many different factors of the physical implementation of the spiking neural network on the BrainScaleS-2 platform. The synaptic connections are implemented with 6-bit resolution, limiting the achievable precision of approximating the probability distribution. Also, uncontrolled environmental changes such as temperature variations or host-to-system effects influence the performance of the hardware. This manifests in the jumps of fidelity occurring during learning, as well as in strong noise in the fidelity after the learning process has saturated, as can be seen in Fig. 3a. These instabilities exceed the anticipated noise level due to the finite sample statistics used for evaluating observables and calculating gradients in each epoch. These factors degrade the correspondence between the model assumption underlying the employed learning rule and the actual dynamics of the hardware. Many of the issues mentioned above can be resolved in future hardware generations.
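For reference, the quantum fidelity here is the (squared) Uhlmann fidelity F(ρ, σ) = (Tr√(√ρ σ √ρ))²; conventions differ in the literature, but with this squared convention F reduces to the overlap ⟨ψ|σ|ψ⟩ for a pure target, consistent with the statement above. A sketch using an eigendecomposition-based matrix square root:

```python
import numpy as np

def sqrtm_psd(A):
    """Matrix square root of a positive semi-definite Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.conj().T

def fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = sqrtm_psd(rho)
    return np.trace(sqrtm_psd(s @ sigma @ s)).real ** 2

# example: pure Bell target state
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_B = np.outer(psi, psi.conj())
```

For the maximally mixed state, fidelity(rho_B, I/4) gives the overlap 1/4, as expected for a pure target.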
To ensure that the learning performance is not limited by finite sample statistics, we evaluated the Kullback-Leibler divergence as a function of the number of samples in a trained network with fixed network parameters. Figure 3b shows the expected convergence towards a minimum value determined by the quality with which the spiking network approximates the POVM distribution. Typically, for >10^5 samples the statistical error is negligible compared to the errors due to hardware noise and limited representational power of the network, causing the saturation of the D_KL observed in Fig. 3b. This justifies our choice of training with 125000 samples per epoch.
Having demonstrated high-fidelity emulation of two-qubit entangled states, we investigated whether states of more qubits can also be encoded by our spiking sampling network. Figure 3c shows the fidelity achieved in learning Greenberger-Horne-Zeilinger (GHZ) states [24], i.e. N-qubit generalizations of the Bell state, as a function of the number of hidden neurons M. The underlying probability distribution covers a larger state space of the visible neurons, requiring us to increase the number of samples to 225000 to reach convergence of the D_KL. In all cases the fidelity of the learned state with respect to the perfect GHZ state increases with M, reaching values close to 90% and about 70% for three and four qubits, respectively. As layered network architectures are known to require a large number of neurons to represent GHZ states [15], we expect that larger chip sizes will allow these values to be increased further. Note that a GHZ-state fidelity above F = 1/√2 ≈ 70% means that the state exhibits genuine N-partite entanglement (cf. dashed line in Fig. 3c) [25].
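A quick numerical illustration of the GHZ fidelity threshold, using a hypothetical noisy state (constructed here for illustration, not experimental data):

```python
import numpy as np

def ghz(n):
    """N-qubit GHZ state vector (|0...0> + |1...1>) / sqrt(2)."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = psi[-1] = 1 / np.sqrt(2)
    return psi

def fidelity_with_pure(psi, rho):
    """For a pure target, the fidelity reduces to the overlap <psi|rho|psi>."""
    return (psi.conj() @ rho @ psi).real

# example: 3-qubit GHZ target mixed with 20% white noise
psi3 = ghz(3)
rho_noisy = 0.8 * np.outer(psi3, psi3.conj()) + 0.2 * np.eye(8) / 8
```

Here the fidelity is 0.8 + 0.2/8 = 0.825, above the 1/√2 threshold quoted above.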

Deep and partially restricted networks
Our flexible learning scheme allows the training of network architectures beyond simple bipartite graphs. To explore network architectures with potentially larger representational power, we added connections between the visible neurons, resulting in a more densely connected network. Figure 4a shows that a Bell state can be encoded successfully with this architecture, reaching fidelities similar to those of the two-layer fully restricted spiking network. We also explored deeper network architectures by adding an additional hidden layer, see Fig. 4b. Again, the Bell state was learned successfully, reaching fidelities similar to the bipartite case. We note that the learning performance is not monotonic at small M for M_2 = 10 neurons in the second hidden layer. This is expected, since the intermediate layer constitutes an information bottleneck towards the visible layer, which makes learning more difficult. Therefore, the greater representational power offered by additional depth [26] does not necessarily translate into a higher fidelity for M < M_2. The overall non-monotonic dependence of the fidelity on the number of hidden neurons is caused by hardware noise leading to fluctuating training performance.
The fact that the learning performance does not improve when using different architectures indicates that the reachable fidelity is currently limited by technical imperfections rather than the representational power of the ansatz. Larger-scale systems may be able to exploit the greater representational power of these deeper and more complex architectures.

Conclusion
We have shown that a spiking neural network implemented on a classical neuromorphic chip can approximate entangled quantum states of few particles with high quantum fidelity. In particular, states with non-classical Bell correlations can be encoded faithfully, demonstrating that the representation of quantum states on a classical spiking network can capture their intrinsic quantum features. The fidelities and system sizes achieved in this first study on neuromorphic quantum state encoding should be regarded as a proof of principle. The restrictions we encountered are mainly technical in nature and can be improved in future generations of spiking neuromorphic devices. Specifically for the BrainScaleS-2 system, both the hardware and its surrounding software framework are in an ongoing maturation process. The size and fidelity of the approximated quantum states can be significantly improved by optimizing the usage of hardware real estate, the signal-to-noise ratio of the analog circuitry and the calibration of the chip. Judging from the current pace of progress in neuromorphic engineering, significantly larger systems, both digital and analog, can be expected to become available in the near future [1].
Furthermore, runtime improvements are anticipated, as the current bottleneck is the calculation of the weight updates of the network parameters, which is done "offline" on a conventional computer and only the sampling itself is performed on the chip (see App. A). Using the on-chip plasticity processor to update synaptic weights has the potential of drastically reducing the training time by removing the cumbersome chip-host loop [27].
One key advantage of this neuromorphic system as compared with simulated generative models is that scaling to larger network sizes does not increase the time needed to collect a desired number of samples. We illustrate this property by comparing the sampling time on a neuromorphic chip with sampling times achieved in CPU implementations in App. B showing a gain through neuromorphic sampling already at moderate system sizes. Given the efficient learnability [28] and representability of important classes of quantum states [29][30][31], and the availability of sampling schemes for neuromorphic devices [32,33], we thus expect favorable scaling properties for our approach. Thus our work opens up a path towards applications of neuromorphic hardware in quantum many-body physics.

-SFB 1225 (ISOQUANT), by the European Union 7th and Horizon-2020 Framework Programmes, through the ERC Advanced Grant EntangleGen (Project-ID 694561) and under grant agreements 604102, 720270, 785907, 945539 (HBP), and by the Manfred Stärk Foundation.
Data availability The data that support the findings of this study, as well as the code to generate the presented results using the BrainScaleS-2 system, and the scripts to analyze the data are available at https://github.com/sCzischek128/SpikingNeuromorphicChipLearnsEntangledQuantumStates

A Implementation details of BrainScaleS-2
The BrainScaleS-2 system is a mixed-signal neuromorphic platform. Its analog core is composed of neuron and synapse circuits with inherent time constants of the order of microseconds. An application-specific integrated circuit (ASIC) for the BrainScaleS-2 system features 512 neuron circuits, which emulate the adaptive exponential integrate-and-fire model. These individual compartments can be wired to resemble more complex structured neurons. An on-chip analog parameter memory as well as integrated static random-access memory (SRAM) cells allow us to individually configure and optimize the dynamics of each circuit. Each neuron integrates input from 256 dedicated synapses, which carry a 6-bit weight and can be either excitatory or inhibitory.
The analog core is accompanied by supporting logic, including circuitry for communication and configuration. Further functionality is provided by high-bandwidth spike sources, which can emit either regular or Poissonian spike trains of configurable frequency. A routing module allows mixing these spikes with external stimuli and recurrent events. It allows, in combination with in-synapse event filtering, the implementation of arbitrary network topologies.
Custom embedded processors allow the modification of the entire configuration space during the runtime of an experiment. Tightly coupled to the synaptic arrays, they allow the efficient and flexible implementation of learning rules based on observables such as neuronal potentials, firing rates, and synaptic correlations.
A network of leaky integrate-and-fire (LIF) neurons can implement a sampling spiking network (SSN) if the neurons are subject to stochastic noise, their membrane time constant is sufficiently small, and the synaptic and refractory time constants roughly match [18]. A system-specific calibration is required to configure the analog core of BrainScaleS-2, shown in Fig. 5a, according to these requirements. For ease of implementation we use a simple routing scheme in which the on-chip network appears as 128 unique sources that can be arbitrarily connected. This allows the association of each of the 128 synapse drivers with one spike source, while using the double line to implement signed synapses (cf. Fig. 5b).
The stochastic input spikes are generated via two of the eight on-chip linear-feedback shift registers (LFSRs). We assign the spike source IDs 0-63 to the network neurons and split the spike trains from the LFSRs among the IDs 64-127. For networks smaller than 64 neurons, the upper part of the network ID range (0-63) remains unused. Again simplifying the implementation, we use the first half of the noise IDs (64-95) as excitatory and the second half (96-127) as inhibitory sources (cf. Fig. 5c, lower part). This scheme in principle allows all-to-all connectivity within the network. Choosing a layered network structure results in a block structure of the upper part of the synapse array (cf. Fig. 5c).
Each sampling neuron is connected to 5 randomly chosen excitatory and 5 randomly chosen inhibitory noise sources. This introduces correlations between neurons even without synaptic connections, but in general does not hinder training [16,34]. Synaptic connections on BrainScaleS-2 are 6-bit-valued circuits. The dynamical impact of a single network spike (used to mediate the stochastic response of the receiving sampling unit) onto another neuron is given by its own strength relative to the total strength of the input provided by the background sources. The latter defines the transfer function and thereby the excitability of the neurons (cf. Fig. 5g). The noise parameters (weight and number of sources) are chosen to balance the competing goals of allowing the network neurons to drive each other significantly while still permitting small weight changes within the 6-bit resolution limit. The particular choice is, in general, problem dependent.
Having chosen the noise parameters, the sampling interface of BrainScaleS-2 becomes a black box that takes a 6-bit weight matrix and a 10-bit bias vector and returns a set of spike trains. Neurons are assigned a state of z = 1 at time t if they emitted a spike within their effective refractory period τ_ref^eff prior to t (cf. Fig. 1c in the main text). We determine τ_ref^eff by setting the leak potential of the neurons to its maximum value and measuring the resulting inter-spike intervals (cf. Fig. 5e). The effective refractory time consists of a clamped part, which is digitally driven and therefore does not vary between neurons, followed by a drift back to the spiking threshold. Due to the variability of the analog circuits (e.g. different membrane time constants) we see some modest variation in τ_ref^eff (cf. Fig. 5f). Using the measured τ_ref^eff, we assign a state every 2 µs and use the set of these states for the evaluation and the update calculation. Fig. 5h demonstrates the accuracy of an approximated distribution for a simulated sampling spiking network (using [35]) as a function of the number of samples for different state-assignment intervals dt (cf. Fig. 1 in the main text). For more than two samples per refractory period τ_ref, the number of samples required to achieve a given performance level increases due to correlated states, as expected from the Nyquist-Shannon theorem. Both the noise parameters and the sample frequency were chosen such that they enable sufficiently accurate sampling, but without performing an exhaustive optimization.
As discussed above, a chip-specific calibration is required but can be reused for each training. For each experiment the chip needs to be initialized (blue period in Fig. 5d) once. This ensures that the correct calibration is loaded and the routing is configured correctly before the training iterations (orange period in Fig. 5d) can start. After the initialization only the synapse array (weights) and the leak potentials of the neurons (biases) are reconfigured once per epoch (green period in Fig. 5d). Each training epoch consists of 26 sampling runs (red section in Fig. 5d) and a single calculation of the parameter update (purple in Fig. 5d). In each hardware run we build a program for the FPGA to execute (dark red in Fig. 5d), transfer it to the FPGA with some initial buffering (yellow in Fig. 5d) in order to compensate for network latencies, perform the actual execution on chip (light blue in Fig. 5d) and transfer the spikes back to the host computer (grey in Fig. 5d).
In total, an epoch takes about 1.9 s, of which roughly half is spent on the sampling and the other half on calculating the parameter updates. While some effort was spent on improving performance, both parts can still be optimized. For example, the gradient calculation is implemented in Python, and most of the sampling time is spent buffering and reading back the results; the actual hardware runtime is only 30% of the time marked as HW-run in Fig. 5d. Using a more complex routing setup, an increase to at least 256 spike sources is possible, and since BrainScaleS-2 is a physical system, the runtime of the hardware part is not affected by the network size.

B Computation time benchmark for sampling from neural networks
In this section, we provide a speed comparison between the BrainScaleS-2 neuromorphic chip and a C++-implemented software solution for sampling from binary Boltzmann machines. The software implements standard Gibbs sampling, i.e. it sequentially calculates the "membrane potential" u_i = b_i + Σ_k W_ki z_k for each neuron and assigns a new state z_i = 1 with probability σ(u_i) = 1/(1 + exp[−u_i]) and z_i = 0 otherwise. This implementation, while fairly optimized in single-thread performance, does not exploit the potential parallelism of a layered structure. Since the simulator is optimized for large-scale systems, it drops all terms with W_ki = 0, at the price of an additional indirection: the sum then runs over a list of indices, which is harder to optimize than a simple sequential iteration. We executed this on the bwForCluster NEMO cluster [36], which uses Intel Xeon E5-2630v4 (Broadwell) CPUs. Generating a new state requires the update of all neurons, and each update of a single neuron requires the calculation of u_i plus a comparison with a random number for the probabilistic update. For the architecture used in the main manuscript, i.e. layered networks with 2N visible and M hidden neurons, and assuming a perfect implementation without additional cost for memory accesses, generating a new sample takes 2(2N)M evaluations and additions of the terms W_ki z_k, plus (2N) + M additions of b_i, and (2N) + M comparisons to a random number. Assuming further that each of these steps takes one clock cycle, we can estimate the expected time required.
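The Gibbs-sampling procedure described above can be sketched as follows; this is a plain Python rendition of the sequential update, not the optimized C++ simulator used for the benchmark:

```python
import numpy as np

def gibbs_sample(W, b, n_samples, rng):
    """Sequential Gibbs sampling from a binary Boltzmann machine.

    For each new sample, every neuron i recomputes u_i = b_i + sum_k W[k, i] z_k
    and redraws z_i = 1 with probability sigma(u_i) = 1 / (1 + exp(-u_i)).
    W is assumed symmetric with zero diagonal; b holds the biases.
    """
    n = len(b)
    z = rng.integers(0, 2, n)            # random initial state
    out = np.empty((n_samples, n), dtype=int)
    for s in range(n_samples):
        for i in range(n):
            u = b[i] + W[:, i] @ z
            z[i] = rng.random() < 1 / (1 + np.exp(-u))
        out[s] = z
    return out
```

The doubly nested loop makes the per-sample cost directly visible: one membrane-potential evaluation plus one random comparison per neuron, dominated by the W[:, i] @ z products.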
In order to reduce the impact of the initialization of the software sampler (loading of the network configuration and initialization), we measure the time to generate 10^6 samples. We note that the number of operations per update is dominated by the number of connections (synapses), 2(2N)M. As such, the time required scales linearly in the number of hidden units only for a fixed number of visible units, which is given by the size of the physical system (cf. Fig. 6).
On the other hand, the BrainScaleS-2 implementation, due to its inherently parallel architecture, requires a sample-generation time that is independent of the size of the sampled network. With τ_ref/2 = 5 µs per sample (cf. Fig. 5h), this leads to a constant time of 5 s. This constant scaling only holds as long as the network fits onto the system (up to 256 sampling neurons). Since the number of visible neurons is given by the size of the physical system that is represented (N spins), larger physical systems give a greater speedup. Already for the case of 8 spins (16 visible units and 180 hidden units), the fixed runtime of the BrainScaleS-2 system is exceeded by the idealized software estimate (cf. Fig. 6). Larger system sizes will skew this comparison further in favor of BrainScaleS-2, which can even implement more densely connected network topologies without incurring a performance penalty. We also note that the BrainScaleS-2 chip requires less than 500 mW [37,38], while the Intel Xeon E5-2630v4 has a thermal design power (TDP) of 85 W for 10 cores. As such, BrainScaleS-2 uses comparable energy even for the smallest systems we implemented in the prototype system used in the main manuscript. While the system size at which the BrainScaleS-2 chip outperforms CPU implementations may shift to larger values when comparing to the fastest currently available CPUs, the fundamental difference in scaling behavior, i.e. constant vs. linear, persists.
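Under the one-cycle-per-operation assumption above, the idealized CPU time can be estimated with a few lines; with these numbers the crossover with the constant 5 s neuromorphic runtime indeed falls near the 8-spin network (16 visible, 180 hidden units) quoted above:

```python
def cpu_time_estimate(N, M, n_samples, clock_hz=1.2e9):
    """Idealized CPU time for Gibbs sampling a 2N-visible, M-hidden network,
    assuming one clock cycle per elementary operation (as in the text)."""
    ops_per_sample = 2 * (2 * N) * M + 2 * ((2 * N) + M)
    return n_samples * ops_per_sample / clock_hz
```

For N = 8, M = 180 and 10^6 samples this gives about 5.1 s, just above the 5 s constant of the chip.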
To reconstruct the density matrix from this probability distribution, the inverse of the full-system overlap matrix T is needed, which can be constructed as the product of the single-qubit overlap matrices, T = T_1 ⊗ T_2. Each single-qubit overlap matrix consists of the elements T_{a_i, a_i'} = Tr[M_{a_i} M_{a_i'}]. For the tetrahedral POVM the inverse T_i^{-1} of the single-qubit overlap matrix takes the simple form (T_i^{-1})_{a_i, a_i'} = 6 δ_{a_i, a_i'} − 1. The density matrix can then be reconstructed linearly as ρ = Σ_{a_1, a_2} P(a_1, a_2) Q_{a_1, a_2}, with operators Q_{a_1, a_2} = Σ_{a_1', a_2'} (T^{-1})_{(a_1, a_2),(a_1', a_2')} M_{a_1'} ⊗ M_{a_2'}. This enables an efficient evaluation of expectation values by sampling configurations from P(a_1, a_2) in the POVM representation, where the density matrix does not need to be calculated explicitly. The POVM representations of important classes of quantum states can be approximated well and in a scalable way by generative modelling approaches [15]. The computational bottleneck of these methods is the generation of samples from the model distribution, which can potentially be alleviated using neuromorphic devices.
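The product structure of the dual operators can be verified numerically. The sketch below uses a conventional tetrahedron orientation (an illustrative choice) and evaluates Bell-state expectation values from the POVM probabilities without ever forming the density matrix inside the estimator:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
verts = np.array([[0, 0, 1],
                  [2 * np.sqrt(2) / 3, 0, -1 / 3],
                  [-np.sqrt(2) / 3,  np.sqrt(2 / 3), -1 / 3],
                  [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3]])

# single-qubit tetrahedral POVM, overlap matrix, and dual operators
M1 = [(np.eye(2) + s[0] * sx + s[1] * sy + s[2] * sz) / 4 for s in verts]
T1 = np.array([[np.trace(A @ B).real for B in M1] for A in M1])
T1inv = np.linalg.inv(T1)
Q1 = [sum(T1inv[a, b] * M1[b] for b in range(4)) for a in range(4)]

# two-qubit operators inherit the product structure T = T1 (x) T2
M2 = {(a1, a2): np.kron(M1[a1], M1[a2]) for a1 in range(4) for a2 in range(4)}
Q2 = {(a1, a2): np.kron(Q1[a1], Q1[a2]) for a1 in range(4) for a2 in range(4)}

def expectation(O, P):
    """<O> = sum_a P(a) Tr[O Q_a] -- no explicit density matrix needed."""
    return sum(P[a] * np.trace(O @ Q2[a]).real for a in P)

# POVM probabilities for the Bell state |Psi+> under the product POVM
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_B = np.outer(psi, psi.conj())
P = {a: np.trace(rho_B @ M2[a]).real for a in M2}
```

Replacing the exact probabilities P by empirical frequencies from (neuromorphic) samples turns `expectation` into the Monte-Carlo estimator described in the text.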
The Bell state is encoded in a sampling spiking network as follows. The visible neurons v are identified with the POVM outcomes a. The network parameters are trained such that the distribution P_B(a_1, a_2) is represented by the network. To achieve this, we need to translate the variables a_1, a_2, which can take four possible values each, into binary neurons v, where each neuron can take the values 0 or 1. The mapping to four binary visible neurons v_1, ..., v_4 is accomplished by encoding each a_i in a pair of neurons, e.g. a_1 = 2v_1 + v_2 and a_2 = 2v_3 + v_4. From this we can derive the distribution p*_B(v) over the states of the visible neurons and have all ingredients to encode the Bell state in our spiking network.
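Any fixed bijection between outcome pairs and four binary neurons works; the two-bits-per-qubit encoding below is an illustrative choice and need not match the exact convention used in the experiments:

```python
def visible_from_outcomes(a1, a2):
    """Encode two POVM outcomes a_i in {0,...,3} into four binary neurons
    (two bits per qubit; an illustrative convention)."""
    return (a1 // 2, a1 % 2, a2 // 2, a2 % 2)

def outcomes_from_visible(v):
    """Invert the encoding: recombine two bits into each outcome."""
    return (2 * v[0] + v[1], 2 * v[2] + v[3])
```

Pushing P_B(a_1, a_2) through `visible_from_outcomes` yields the target distribution p*_B(v) over the visible neurons.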
Analogously, the probability distribution for the two-qubit Werner state with noise contribution 1 − r can be derived from its density matrix [23,39].

As noted in the main text, the performance characteristics of the neuromorphic hardware make drawing additional samples cheap compared to reconfiguration and reinitialization. An implementation of the preparation-dominated contrastive divergence scheme on the spiking neuromorphic hardware hence does not provide any of the benefits observed in software simulations. In contrast, we take advantage of these hardware characteristics by using the full model distribution to calculate network-parameter updates, which improves the quality of the stochastic gradient estimation. We further optimize the hardware training implementation by reconstructing the correlations between the visible and hidden layers from the encoded distribution, reweighting all samples p(v, h) according to the target probability p*(v), see Eq. (14) and Eq. (15). This is in contrast to contrastive-divergence learning, where the distribution of the visible layer is explicitly enforced to match the target distribution and only the correlations with the hidden layer are sampled. Beyond the optimized implementation on the spiking neuromorphic system, our proposed training algorithm can be used to obtain network-parameter updates for arbitrarily connected networks, whereas contrastive divergence is limited to strictly layered network structures.