Strictly local one-dimensional topological quantum error correction with symmetry-constrained cellular automata

Active quantum error correction on topological codes is one of the most promising routes to long-term qubit storage. In view of future applications, the scalability of the decoding algorithms used in physical implementations is crucial. In this work, we focus on the one-dimensional Majorana chain and construct a strictly local decoder based on a self-dual cellular automaton. We study its performance numerically and analytically and exploit these results to contrive a scalable decoder with exponentially growing decoherence times in the presence of noise. Our results pave the way for scalable and modular designs of actively corrected one-dimensional topological quantum memories.


Introduction
Storing quantum information in a noisy, classical environment is essential for scalable quantum computation and communication [1]. Kick-started by Shor's 9-qubit code [2], quantum error correction comes to the rescue: Logical qubits are stored in virtual subsystems [3] that decouple from typical environmental perturbations and allow for error detection and correction [4,5]. Quantum error correction codes come in two flavors: The conventional ones (e.g., Shor's code) have no physical interpretation and are treated as abstract entities, isolated from the underlying computational architecture (much like classical error correction codes).
Topological quantum codes, in contrast, are tied to the real world in that they are realized as ground state manifolds of local Hamiltonians and thereby inherit the geometry of their environment. Familiar examples are the Majorana chain (a p-wave superconductor) in one [6,7] and the toric code in two spatial dimensions [8,9], both of which have seen experimental progress in recent years, see e.g. [10,11,12,13] and references therein. In this manuscript, we are interested in such topological codes in one dimension and present a method to stabilize them using only strictly local resources. Topological codes allow, in principle, for two modes of operation: Taking their realization as ground states seriously entails the intriguing concept of self-correction where errors appear as excitations that are energetically suppressed by the parent Hamiltonian [14,15,16]. In contrast, active error correction adopts the algorithmic scheme of conventional codes, i.e., an external decoder is fed with measured syndromes and computes compatible corrections. The fragility of low-dimensional topological order to thermal excitations [17,18,19], and the so-far unsettled quest for realizable self-correcting codes [20], make active error correction on topological codes one of the most promising routes to long-term qubit coherence [21,22,23]. As realizable quantum architectures loom on the horizon [24], convenient abstractions face the intricacies of reality: Can active error correction be implemented efficiently? How can it be scaled up when it is cast into hardware? Since space and time constraints can rule out implementations of otherwise promising algorithms, it is a crucial question whether and how topological quantum codes can be stabilized by manifestly local decoders.
For the toric code, this has been tackled with a completely local but hierarchical decoder in [25] (inspired by [26]), with translationally invariant cellular automata [27,28], with a modular setup of simple units connected by noisy links in [29], and with optimized versions of minimum-weight perfect matching [30,31,32,33]. Prolonging the lifetime of certain stabilizer codes by local unitary operations (instead of full-fledged error correction) may be a viable alternative [34]. However, rigorous results on the performance of decoders with strict space and time constraints are scarce.
In this work, we focus on the simplest case of a one-dimensional topological quantum code, defined by the ground state space of the Majorana chain [6], and remodel a known (classical) cellular automaton [35,36] to contrive a convenient, strictly local quantum decoder. We prove that both the probability for successful decoding and the time required to do so scale favorably with the chain length, surpassing conventional global decoding schemes. For realistic error rates, this allows for the stabilization of logical qubits in the presence of continuous (uncorrelated) noise using shallow, translationally invariant circuits with local wiring only. This paves the way for scalable and modular on-chip realizations of actively corrected topological quantum memories based on one-dimensional p-wave superconductors.
In the following, we provide a detailed outline of the methods and approaches used to derive these results: In Subsec. 2.1 we start with a description of the quantum code defined by the degenerate ground state space of the Majorana chain, where dephasing is topologically suppressed and depolarizing errors are forbidden by fermionic parity superselection (which can be violated in real setups due to quasiparticle poisoning [37,38]). This paradigmatic model exemplifies topological quantum error correction and relates to the familiar toric code via Jordan-Wigner transformation in the degenerate case of an $L \times 1$ square lattice with open boundaries. The syndromes of the Majorana chain quantum code (MCQC) are fermionic quasiparticles flanking strings of parity-preserving errors. Maximum-likelihood decoding therefore requires pairing quasiparticles with minimum-length error strings; this scheme is known as minimum-weight perfect matching (MWPM) [39] for the toric code and reduces in one dimension to simple majority voting, the decoding scheme used for classical repetition codes. In Subsec. 2.2, we review the known result that applying majority voting at a fixed rate to the MCQC leads to an exponentially growing lifetime of the encoded logical qubit with the chain length $L$. This is true for continuous, uncorrelated (Bernoulli) noise on the physical qubits with arbitrary on-site error probability $p_0$, except for the singular, completely mixing channel with $p_0 = 1/2$; there is no non-trivial error threshold, in contrast to "true" two-dimensional MWPM for the toric code [40,30]. However, global majority voting violates locality as it requires space for each logic gate and time for communication between them. This raises the question whether this extraordinary robustness of majority voting survives in realistic setups. In Subsec. 2.3 we argue that low-level decoders of quantum memories must be realized in hardware and close to the coherent subsystem (here the Majorana chain) to allow for modularity and scalability, both in the number of chains and their length. Then, collecting the syndromes of an extended chain in a central processing unit, and distributing corrections afterwards, requires time which scales with the system size $L$. We demonstrate that this important feature of global majority voting precludes its application at a fixed rate for $L \to \infty$, and thereby spoils the favorable scaling of decay times. This line of thought motivates our search for a manifestly local decoder of the MCQC, taking finite communication speed and spatial extent seriously. Then, locality implies that restrictions on the time granted for decoding translate into restrictions on the syndromes that can influence a local correction. We derive a generic upper bound on the success probability for decoding the MCQC with local decoders and discuss implications for the scaling of the decoding time with the chain length.
After setting the scene (and sketching what we can expect and what we cannot), we aim for a feasible local decoder of the MCQC. To this end, Subsec. 3.1 introduces the concept of cellular automata (CA) as a well-developed prime example of physically realistic local computation. The natural invariance of local CA rules in space narrows down the choice of local decoders but allows for implementations that can be scaled up easily. While CAs naturally operate on classical bits, the physical qubits of the MCQC are not accessible: only the syndromes can be measured without perturbing the state (we call this the "quantum handicap"). We argue that only CAs featuring a particular symmetry (called self-duality) can be employed as MCQC-decoders.
To decode the MCQC by means of a CA, implementing a global majority vote by local rules seems a good approach. This task is known as the density classification problem [41,42] and has been shown to be unsolvable for binary CAs in any dimension [43]. In Subsec. 3.2 we review some of the results on approximate density classifiers which could provide viable replacements for perfect majority voting if error rates are small (i.e., away from $p_0 = 1/2$). We present two binary CAs that are known to perform well on density classification, one of which (called TLV) is self-dual; it can be rewritten in a form that complies with the "quantum handicap": It naturally takes syndromes as input and produces correction operations as output. Before we can explore the performance of TLV as MCQC-decoder, the question of boundary conditions has to be addressed. It is common to place CAs on finite chains with periodic boundaries. In Subsec. 3.3 we point out that this is not compatible with locality of classical computations on the one hand and the necessity of a stretched quantum chain on the other. Hence a modification of TLV at the boundaries is required (denoted by $\overline{\mathrm{TLV}}$). We demonstrate that for MCQC-decoding, mirrored boundary conditions are the way to go: The CA operates in a cavity-like geometry to pair quasiparticles with partners in the edge modes of the MCQC.
In Subsec. 4.1 we start our analysis of TLV with a numerical evaluation of its decoding capabilities. Sampling uncorrelated Bernoulli random configurations with on-site error probability $p_0$ and subsequent evolution with TLV allows us to gauge the possible downsides of performing only approximate majority voting. Despite the existence of periodic cycles that cannot be decoded, numerics suggests that for $p_0 < 1/2$ only an exponentially (in $L$) small fraction of error patterns fails to be corrected successfully. Moreover, the typical time needed to rotate an error-afflicted instance of the MCQC back into the codespace grows sublinearly with the chain length $L$ (in contrast to global majority voting). To substantiate these claims, we apply the concept of sparse errors to the particular case of TLV. In Subsec. 4.2 we derive a central statement of this work: The probability to decode a length-$L$ MCQC successfully with TLV after $t \propto L^\kappa$ time steps (with $\kappa > 0$ arbitrary) tends to 1 exponentially fast for $L \to \infty$ and small but finite error probabilities $p_0 \ll 1/2$. This provides us with a much simpler and faster decoder than global majority voting (the decoding time of which scales linearly with $L$), and especially implies that for these error probabilities the "expensive" global nature of majority voting is not required for efficient decoding.
In the remainder, we shift our focus from the decoding of static error patterns to the protection of the MCQC in the presence of continuous (Bernoulli) noise. In Subsec. 5.1 we realize this scenario by applying the local rules of TLV and on-site errors with probability $p_0$ alternately. We demonstrate numerically that TLV cannot cope with such perturbations of its evolution in that the lifetime of the logical qubit grows only subexponentially with the chain length. In the light of known results on the behavior of one-dimensional CAs, this is unfortunate but not surprising: Simple, one-dimensional CAs subject to noise are expected to be ergodic; this is known as the positive rates conjecture [44]. It is well established that this conjecture is incorrect, but the only known counterexample is extraordinarily complex [45,26,46], and we cannot expect that our setup is a simpler one.
If one abandons strictly one-dimensional decoders (with circuit complexity $\sim L$), there is a rather generic solution to this problem: Any given decoder can be employed to counter continuous noise by repetitive applications with a fixed rate. If the time required to decode a fixed error pattern grows with the code size $L$, so does the number of required instances running in parallel to prevent errors from accumulating. Thus the additional hardware overhead due to continuous noise correlates with the decoding time for fixed error patterns. In Subsec. 5.2 we follow this idea and stack copies of TLV in the second dimension perpendicular to the quantum chain. The depth of this classical circuit quantifies the hardware overhead required for the retention of the logical qubit in the presence of noise; as it directly relates to the decoding time of TLV, it grows sublinearly with the chain length, so that shallow circuits suffice for reasonably low error rates. Indeed, the complexity of these circuits scales with $L^{1+\kappa}$ for $0 < \kappa < 1$, in contrast to the typical $L^2$ scaling of global majority voting.

1D topological quantum codes
We start this section with a description of the Majorana chain and thereby review the realization of a topological quantum code as degenerate ground state space of a local Hamiltonian. In particular, we revisit the procedure of quantum error correction using syndrome measurements and demonstrate that it reduces to global majority voting in this particular case. This decoding scheme features an exponentially growing lifetime of the encoded logical qubit with the chain length L. However, global majority voting violates locality as it relies on the evaluation of a function of spatially distributed syndrome measurements. But the time required for collecting the syndromes of an extended chain in a central processing unit (and distributing corrections afterwards) scales with the system size L. We conclude by demonstrating that taking into account this processing time eliminates the exponential scaling of the qubit lifetime. This sets the stage for the construction and study of a strictly local, inherently scalable replacement for global majority voting.

Majorana chain
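The simplest example of a topological quantum error correction code in one dimension is given by the degenerate ground state manifold of the paradigmatic Majorana chain [6], described by the Hamiltonian in Eq. (1), where $c_i$, $c_i^\dagger$ denote fermionic annihilation and creation operators, $w_i$ is the tunneling amplitude, $\Delta_i$ the superconducting gap parameter, and $\mu_i$ denotes the chemical potential. At the "sweet spot", $\mu_i = 0$ and $w_i = -\Delta_i = 1$, the Hamiltonian takes the form $H = i\sum_{j=1}^{L-1}\gamma_{2j}\gamma_{2j+1} = -\sum_{j=1}^{L-1} S_j$ in terms of the Majorana operators $\gamma_{2j-1} = c_j^\dagger + c_j$ and $\gamma_{2j} = i(c_j^\dagger - c_j)$. The operators $S_j = -i\gamma_{2j}\gamma_{2j+1} = (-1)^{\tilde c_j^\dagger \tilde c_j}$ measure the parity of the localized quasiparticle modes $\tilde c_j^\dagger$ above the superconducting condensate and play the role of stabilizers known from quantum information theory [7,47]: $[S_i, S_j] = 0$, $S_j^\dagger = S_j$, and $S_j^2 = 1$. Let $\mathcal{S}$ be the (Abelian) stabilizer group generated by $\{S_1, \dots, S_{L-1}\}$. The codespace $\mathcal{C} \equiv \{\, |\Psi\rangle \in \mathcal{H} \mid \mathcal{S}|\Psi\rangle = |\Psi\rangle \,\}$ is the $\mathcal{S}$-invariant subspace and encodes a single logical qubit, $\dim \mathcal{C} = 2$; this space is equivalent to the degenerate ground state space of Eq. (1). This observation can be understood as follows: Since the edge Majorana modes $\gamma_{\mathrm{L}} \equiv \gamma_1$ and $\gamma_{\mathrm{R}} \equiv \gamma_{2L}$ are missing in Hamiltonian (1), one finds that $\Sigma^z = S_{\mathrm{edges}} \equiv -i\gamma_{\mathrm{L}}\gamma_{\mathrm{R}}$ commutes with all stabilizers, $[\Sigma^z, S_j] = 0$, and therefore acts within the ground state space $\mathcal{C}$. Furthermore, it allows for the definition of a convenient basis of the code space, namely its two eigenstates with eigenvalues $\pm 1$. Flipping the encoded qubit is possible via the edge modes, e.g., $\Sigma^x \equiv \gamma_{\mathrm{L}}$ and $\Sigma^y \equiv \gamma_{\mathrm{R}}$, without violating any stabilizer constraint, $[\Sigma^{x,y}, S_j] = 0$. The operators $\Sigma^\alpha$ characterize the logical qubit completely as they realize the Pauli algebra $[\Sigma^\alpha, \Sigma^\beta] = 2i\varepsilon^{\alpha\beta\gamma}\Sigma^\gamma$ on $\mathcal{C}$.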
Crucial for realizing a quantum memory is its resilience against depolarizing and dephasing noise. The Majorana chain fights these types differently: Depolarizing (bit-flip) noise cannot be suppressed by the Hamiltonian since the logical operators $\Sigma^x$ and $\Sigma^y$ are perfectly local in any embedding of the open chain and energetically not penalized by the Hamiltonian in Eq. (1). However, in terms of the fermions it is $\Sigma^x = c_1^\dagger + c_1$ and $\Sigma^y = i(c_L^\dagger - c_L)$, operators which break the fermionic parity symmetry of the superconducting Hamiltonian. In superconducting systems, fermionic parity is considered a natural symmetry that can be enforced to high precision because fermions are created by breaking Cooper pairs (though it can be violated by quasiparticle poisoning [37,38]). In that sense, the fermionic nature of the physical realization is exploited to suppress depolarizing errors. Strictly speaking, this is just symmetry protection.
In contrast, dephasing noise operates as $\Sigma^z \propto \gamma_{\mathrm{L}}\gamma_{\mathrm{R}}$ on the chain, which is a non-local operator that cannot be induced directly by a noisy environment respecting locality. Indeed, the generic form of environmental noise that is both local and parity-symmetric is $E_j = -i\gamma_{2j-1}\gamma_{2j}$ (note that pairs shifted by a single site act trivially on the code space). Since $\{E_i, S_j\} = 0$ if and only if $j = i$ or $j = i-1$ (otherwise $[E_i, S_j] = 0$), a single error $E_i$ is flanked by a pair of syndromes or charges with $S_i = -1 = S_{i-1}$. From the condensed matter point of view, this accounts for breaking a Cooper pair and lifting the localized quasiparticles above the superconducting gap. Subsequent errors can move and/or create additional quasiparticle pairs that can, as time goes on, traverse the macroscopic chain in a noise-driven, diffusive process. Once a pair of charges traverses the whole chain, it is described by $\prod_{j=1}^{L} E_j = \Sigma^z \prod_{j=1}^{L-1} S_j = \Sigma^z$ [Eq. (4)], where we used $S_i = 1$ on the code space. Thus dephasing noise on the logical qubit is only possible if quasiparticles travel freely through the system. Unfortunately, save for the energy gap which penalizes the creation of charges, there is no cost for moving them. This deconfinement renders the Hamiltonian theory unstable at finite temperatures (this is related to the fact that there is no phase transition for the one-dimensional classical Ising model).
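The operator identities quoted in this subsection can be checked mechanically. The following is a minimal sketch in plain Python, assuming only the Majorana anticommutation rule $\{\gamma_m, \gamma_n\} = 2\delta_{mn}$ and the definitions of $S_j$, $E_j$, and $\Sigma^z$ given in the text; the representation of operators as a phase plus a bitmask and all function names are illustrative choices, not part of the original work.

```python
def mul(a, b):
    """Multiply two Majorana monomials.

    A monomial is a pair (k, mask) meaning i**k times the product of
    gamma_m over all set bits m of mask, taken in ascending order.
    """
    ka, ma = a
    kb, mb = b
    swaps = 0
    m = mb
    while m:
        low = m & -m                       # lowest set bit of the right factor
        idx = low.bit_length() - 1
        # gamma_idx has to hop over every gamma in `a` with a larger index
        swaps += bin(ma >> (idx + 1)).count("1")
        m ^= low
    # each hop contributes -1 = i**2; equal indices square to 1 (XOR of masks)
    return ((ka + kb + 2 * swaps) % 4, ma ^ mb)


def gamma(m):
    return (0, 1 << m)


def bilinear(m, n):
    """The operator -i * gamma_m * gamma_n."""
    return mul((3, 0), mul(gamma(m), gamma(n)))


def S(j):
    return bilinear(2 * j, 2 * j + 1)      # stabilizer S_j


def E(j):
    return bilinear(2 * j - 1, 2 * j)      # local error E_j


def sigma_z(L):
    return bilinear(1, 2 * L)              # logical operator built from edge modes


def anticommute(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return ab[1] == ba[1] and (ab[0] - ba[0]) % 4 == 2


def product(ops):
    out = (0, 0)                           # identity
    for op in ops:
        out = mul(out, op)
    return out


L = 5
# E_i anticommutes with S_j exactly for j = i and j = i - 1
flanking_ok = all(
    anticommute(E(i), S(j)) == (j in (i, i - 1))
    for i in range(1, L + 1)
    for j in range(1, L)
)
# a fully traversed chain equals Sigma^z times all stabilizers
full_string = product(E(j) for j in range(1, L + 1))
logical = mul(sigma_z(L), product(S(j) for j in range(1, L)))
print(flanking_ok, full_string == logical)
```

Tracking only a phase and a set of Majorana indices keeps the check exact and avoids building $2^L$-dimensional matrices; the price is that it only handles monomials, which is all the identities above require.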
To protect the logical qubit from dephasing, active error correction must be employed. Assume the system is initialized in state $|\Psi\rangle \in \mathcal{C}$ and subsequently has been affected by the error $E(\mathbf{x}) = \prod_j E_j^{x_j}$, encoded by the binary vector $\mathbf{x} = (x_1, \dots, x_L) \in \mathbb{Z}_2^L$. Because $E_i^2 = 1$, applying the same error twice cancels the latter and removes all syndromes, rotating the system's state back into the code space $\mathcal{C}$: $E(\mathbf{x})E(\mathbf{x})|\Psi\rangle = E(\mathbf{x} \oplus \mathbf{x})|\Psi\rangle = |\Psi\rangle$. Here, $\oplus$ denotes the (element-wise) modulo-2 addition. To infer $\mathbf{x}$, the local stabilizers $\{S_1, \dots, S_{L-1}\}$ are measured periodically to yield a binary syndrome pattern $\mathbf{s} = (s_{1+\frac{1}{2}}, \dots, s_{L-\frac{1}{2}})$ that indicates the boundaries of $E(\mathbf{x})$. (Recall that only projective measurements that leave $\mathcal{C}$ invariant do not destroy the logical qubit; these are exactly the stabilizer generators.) In terms of binary vectors, this reads $s_{i+\frac{1}{2}} = x_i \oplus x_{i+1}$, or $\mathbf{s} = \partial\mathbf{x}$ for short. The index shift by $\frac{1}{2}$ for syndromes is purely formal to distinguish them from error patterns $x_i$. Inferring $\mathbf{x}$ from $\mathbf{s}$ is complicated by the fact that $\partial\mathbf{x} = \mathbf{s} = \partial\mathbf{x}^c$, where $(\mathbf{x}^c)_i = x_i \oplus 1$ is the element-wise binary complement. The decoding problem is therefore not unique as complementary error strings share the same syndrome. If $\mathbf{x}^c$ is chosen to specify the correction, one has $E(\mathbf{x}^c)E(\mathbf{x})|\Psi\rangle = \prod_{j=1}^{L} E_j\,|\Psi\rangle = \Sigma^z|\Psi\rangle$ and thereby (unknowingly) applies a quantum gate on the stored qubit; this follows from Eq. (4). Thus it is of paramount importance to choose the correct error pattern. The optimal decoding strategy depends on the error channel that gave rise to $E(\mathbf{x})$. Here we will always assume $\mathbf{x}$ to be a sequence of uncorrelated Bernoulli random variables $x_i$ with parameter $0 \le p_0 \le 1/2$ so that $\Pr(x_i = 1) = p_0$ for all $i = 1, \dots, L$. Then, the provably best decoder $\Delta$ is (global) majority voting, $\Delta(\mathbf{s}) \equiv \mathbf{y}$ with $\partial\mathbf{y} = \mathbf{s}$ and $|\mathbf{y}| < |\mathbf{y}^c|$ [Eq. (8)], which realizes maximum-likelihood decoding for repetition codes, i.e., $\mathbf{y}$ is preferred over $\mathbf{y}^c$ because the former requires fewer errors ($x_i = 1$) and this makes it more probable with respect to a Bernoulli distribution with $p_0 < 1/2$.
In the context of quantum codes (in particular the toric code), the prescription (8) is also called minimum-weight perfect matching, which is equivalent to majority voting in one dimension (see below). Here, the weight $|\mathbf{x}|$ is the number of non-zero components $x_i = 1$ and we require $L$ to be odd to avoid ties ($|\mathbf{x}| = |\mathbf{x}^c|$). Note that $\Delta$ indeed performs majority voting on $\mathbf{x}$ in the sense that $\mathbf{x} \oplus \Delta(\partial\mathbf{x}) = \mathbf{maj}[\mathbf{x}]$. The majority function on $L$ binary inputs $x_i$ is defined as $\mathrm{maj}[x_1, \dots, x_L] = \left\lfloor \frac{1}{2} + \frac{1}{L}\left(\sum_{i=1}^{L} x_i - \frac{1}{2}\right) \right\rfloor$ [Eq. (10); ties evaluate to 0 with this definition], and the bold version $\mathbf{maj}[\bullet]$ indicates a vectorized result with each entry given by $\mathrm{maj}[\bullet]$; $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. We conclude that the "quantum handicap" of having only access to the syndrome $\mathbf{s}$ for decoding the topological code does not change the decoding strategy as compared to a classical repetition code. Indeed, for a correctable binary error pattern $\mathbf{x}$ ($|\mathbf{x}| < |\mathbf{x}^c| \Leftrightarrow \mathrm{maj}[\mathbf{x}] = 0$), the classical repetition codeword $\Psi \in \{0, 1\}$, and the quantum codeword $|\Psi\rangle \in \mathcal{C}$, we have Eq. (11) for the classical code and Eq. (12) for the quantum analogue, where $E$ ($C$) denotes the application of errors (corrections).
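The decoding prescription described above can be illustrated with a few lines of Python. This is a minimal sketch under the conventions stated in the text (syndromes as the boundary of the error pattern, odd $L$, ties resolved to 0); the function names `boundary`, `complement`, `maj`, and `decode` are illustrative and not taken from the original work.

```python
from itertools import product


def boundary(x):
    """Syndrome map: one syndrome bit between each pair of neighboring sites."""
    return [a ^ b for a, b in zip(x, x[1:])]


def complement(x):
    return [bit ^ 1 for bit in x]


def maj(x):
    """Majority function; ties evaluate to 0."""
    return 1 if 2 * sum(x) > len(x) else 0


def decode(s):
    """Global majority decoder: the lighter of the two patterns compatible with s."""
    y = [0]
    for bit in s:
        y.append(y[-1] ^ bit)      # integrate the syndrome, starting from y_1 = 0
    yc = complement(y)
    return y if sum(y) < sum(yc) else yc


def residual(x):
    """Error pattern that remains after applying the correction decode(boundary(x))."""
    return [a ^ b for a, b in zip(x, decode(boundary(x)))]


light = [0, 1, 1, 0, 0, 0, 0]      # fewer than L/2 errors: corrected
heavy = [1, 1, 0, 1, 1, 0, 1]      # more than L/2 errors: logical flip
print(residual(light), residual(heavy))

# complementary patterns are indistinguishable from their syndromes alone
same_syndrome = boundary(light) == boundary(complement(light))

# exhaustive check for L = 5: the residual is the (vectorized) majority of x
L = 5
majority_ok = all(
    residual(list(x)) == [maj(x)] * L for x in product((0, 1), repeat=L)
)
print(same_syndrome, majority_ok)
```

The decoder never sees `x` itself: it reconstructs one compatible pattern from the syndrome and then only has to decide between that pattern and its complement, which is the single bit of ambiguity discussed in the text.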
To make connection with another well-known topological quantum code, let us, just for a second, peek into the second dimension: There, the simplest model is given by the toric code [9,48,8] which features a four-fold degenerate ground state manifold (the code space) and is defined by the Hamiltonian in Eq. (13), with stabilizer operators $A_s$ and $B_p$ living on an $L_x \times L_y$ square lattice with periodic boundary conditions and spin-$\frac{1}{2}$ representations $\sigma_e$ (physical qubits) on the edges. While the toroidal geometry of Hamiltonian (13) is crucial for its four-fold ground state degeneracy, it also renders the model experimentally challenging (even more than it already is due to its four-spin interactions). However, if the above Hamiltonian is adapted to a planar square lattice with appropriately chosen ("rough" and "smooth") open boundaries [49], the experimental implementation becomes more attractive while the ground state manifold is still two-fold degenerate and constitutes a topological quantum memory with the two Abelian anyonic excitations $A_s = -1$ and $B_p = -1$, see Fig. 1. Reducing this code to the degenerate, one-dimensional case $L_x = L$ and $L_y = 1$ yields a 1D spin system which maps directly to the Majorana chain under the Jordan-Wigner transformation. Note that in one dimension there are no faces and the $B_p$ stabilizers are absent. Then, Hamiltonian (13) describes the 1D Ising model and local errors $E_j = \sigma_j^z$ correspond to spin flips in the $\sigma^x$-basis; syndromes $S_j = \sigma_j^x \sigma_{j+1}^x = -1$ can be associated with domain walls.
The physical distinction between Majorana chain and 1D toric code/Ising chain becomes evident if one realizes that error strings $E(\mathbf{x})$ can be directly measured by $\sigma_j^x = \pm 1$ in the 1D toric code/Ising chain (and not only their endpoints by $S_j = \pm 1$). The analogous operator for the Majorana chain error strings violates the fermionic parity and is thereby suppressed. Similarly, while $\Sigma^x = \gamma_{\mathrm{L}}$ is forbidden in the fermionic setting of the Majorana chain due to parity superselection, there is no natural symmetry in the spin chain preventing $\Sigma^x = \sigma_1^x$ from depolarizing the logical qubit. This is why it is legitimate to call the Majorana chain a 1D topological quantum memory whereas the mathematically equivalent 1D toric code/Ising chain only protects a classical bit by realizing a repetition code.
Nevertheless, from the algorithmic point of view, both theories carry the same syndromes and therefore can be corrected with the same algorithms. In particular, MWPM on the toric code degenerates into majority voting on the Majorana chain. For the degenerate toric code, this active error correction procedure has already been demonstrated experimentally with transmon qubits [11,12].

Global majority voting
We proceed with a brief analysis of global majority voting. As we argued above, we can ignore the "quantum handicap" that restricts our knowledge to the endpoints of error strings (the syndromes) and instead work with the actual error patterns.
Assume a classical bit $x$, initialized as $x = 0$, is flipped by an (unbiased) Bernoulli process with probability $0 \le p_0 \le 1/2$ per time step $\delta t$. If we think of the state $x = 0$ as the "clean" one while $x = 1$ indicates a site that is error-afflicted, the probability to find $x = 1$ after $t$ time steps is given by $p(t) = \frac{1}{2}\left[1 - (1 - 2p_0)^t\right]$, which renormalizes to the completely mixed state $p(t) \to 1/2$ exponentially fast whenever $0 < p_0 < 1$. If we copy the bit $L$ times, $(x_1, \dots, x_L)$, and encode a logical bit $X$ via a simple repetition code, then, for uncorrelated Bernoulli error processes on the physical bits $x_i$, the best decoder is given by the global majority vote $X = \mathrm{maj}[x_1, \dots, x_L]$. An erroneous logical bit $X = 1$ occurs whenever the majority is altered by the local errors. Formally, the probability to find $X = 1$ after $t$ time steps of accumulating errors is given by $P_{L,p_0}(t) = \sum_{k=(L+1)/2}^{L} \binom{L}{k}\, p(t)^k\, [1 - p(t)]^{L-k} = I_{p(t)}\!\left(\frac{L+1}{2}, \frac{L+1}{2}\right)$ [Eqs. (20a) and (20b)], where in the last step we used the regularized incomplete beta function $I_x(a, b)$ to express the cumulative Binomial distribution in a closed form [50] (see Appendix A for details). We illustrate $\bar P_{L,p_0}(t) = 1 - P_{L,p_0}(t)$ as a function of $t$ in Fig. 1 (d) for different system sizes $L$ and fixed continuous noise $p_0$. Note that the decay time grows very slowly with the system size $L$. After a single time step, we have the logical failure probability $P_{L,p_0} \equiv P_{L,p_0}(t = 1)$. Assume that after each time step $\delta t$ the errors that occurred during $\delta t$ are immediately countered by majority voting, i.e., following Eq. (11) for classical and Eq. (12) for quantum codes. The probability of the logical (qu)bit to be in its original state after $t$ time steps is then $\tilde P_{L,p_0}(t)$, which yields the timescale $T_{L,p_0}$ for the logical information loss if $\lim_{L\to\infty} P_{L,p_0} = 0$. Moreover, it is straightforward to show that it diverges exponentially with the system size for any non-trivial (and non-critical) microscopic error probability $0 < p_0 < 1/2$. Indeed, we can use Eq. (20b) to derive the upper bound in Eq. (24), with $q = \sqrt{p_0(1 - p_0)} < 1/2$ for $p_0 < 1/2$; in the last step, we used an asymptotic approximation. This is illustrated in Fig. 1 (e) by plotting $\tilde P_{L,p_0}(t)$ over time for different system sizes and fixed continuous error rate. Eq. (24) is a quantitative manifestation of the perfect decoding properties of global majority voting on a repetition code. Note that the only error rate for which decoding fails in the thermodynamic limit is the singular point $p_0 = 1/2$ for which $2q = 1 \Rightarrow \log\frac{1}{2q} = 0$.
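The exponential suppression of the single-step failure probability can be made tangible numerically. The sketch below evaluates the binomial tail directly and compares it with an elementary bound, $P_{L,p_0} \le \frac{1}{2}(2q)^L$ with $q = \sqrt{p_0(1-p_0)}$. This bound follows because every term with $k > L/2$ obeys $p_0^k (1-p_0)^{L-k} \le q^L$ for $p_0 \le 1/2$, and the binomial coefficients with $k > L/2$ sum to $2^{L-1}$ for odd $L$. It is an assumption of this sketch that such a simple bound is adequate for illustration; it is not claimed to coincide with the bound of Eq. (24). Function names are illustrative.

```python
from math import comb, sqrt


def p_single(t, p0):
    """Probability that a single bit is flipped after t noisy steps
    (an odd number of flips occurred)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** t)


def p_logical(L, p):
    """Probability that more than half of L independent bits are flipped (L odd)."""
    return sum(
        comb(L, k) * p**k * (1.0 - p) ** (L - k)
        for k in range((L + 1) // 2, L + 1)
    )


def elementary_bound(L, p0):
    """Upper bound (2q)^L / 2 on the single-step failure probability (L odd)."""
    q = sqrt(p0 * (1.0 - p0))
    return 0.5 * (2.0 * q) ** L


p0 = 0.1
for L in (3, 11, 21, 41):
    exact = p_logical(L, p_single(1, p0))
    print(L, exact, elementary_bound(L, p0))
```

Since $2q < 1$ for every $p_0 < 1/2$, the bound decays exponentially in $L$ for all non-critical error rates, which is the mechanism behind the absence of a non-trivial threshold discussed above.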

Constraints by locality
Eq. (24) tells us that global majority voting is a very powerful decoding scheme for the 1D quantum code realized by the Majorana chain: its critical error rate $p_c = 1/2$ is optimal. In physics, nothing is for free. This raises the question what it is that we are paying with by employing the global majority decoder $\Delta$ (or, equivalently, the function $\mathrm{maj}[\dots]$). One particularly expensive feature of $\mathrm{maj}[x_1, \dots, x_L]$ is its global nature: It depends non-trivially on all $L$ inputs while their number grows with the system size [see Eq. (10)]. Indeed, one needs to take into account at least $\frac{L+1}{2}$ of the inputs to be sure about the majority; for generic inputs even more. This makes the evaluation of $\mathrm{maj}[\dots]$ a relevant factor that has to be taken into account when the scaling of the quantum code with $L$ is addressed.

Figure 2: Locality constraints. (a) A global correction scheme on a quantum code of linear size $L$ requires the syndrome data to be merged, processed, and afterwards the results to be distributed again. The finite communication speed makes the time between syndrome measurements (blue squares) and error corrections (red circles) scale with $L$ in the best case. (b) With constrained hardware overhead, only one instance of the syndrome processing runs at once, giving rise to intervals without correction growing with $\sim L$. Repeating syndrome measurements and corrections with a period independent of $L$ requires $\sim L$ instances running in parallel, thus increasing the hardware overhead dramatically. (c) Local schemes can be used to keep hardware overhead in check by reducing the computation time needed to sublinear (or even logarithmic) scaling if $D \sim L^\kappa$ with $\kappa < 1$. This restricts the syndromes a local correction operation depends on to a subsystem inside the past light cone of diameter $2D$, and allows syndromes to influence only corrections in their future light cone.
A generic global function, depending on $\sim L$ spatially distributed inputs (syndromes), requires at least $\sim c^{-1} L$ time steps to gather its input data (for a 1D geometry). Here $c$ denotes the speed of classical information propagation in the auxiliary systems framing the quantum chain. This is illustrated by the light cone in Fig. 2 (a). In addition, the evaluation itself (shaded region) also requires at least $O(L)$ time steps because every input has to be read at least once, see e.g., Eq. (10). Depending on the decoder, the latter may be improved by parallelization (which, in turn, is paid for by additional hardware overhead), whereas the former argument remains valid as it is based on physical constraints alone.
An immediate consequence for the global majority decoder ∆ is sketched in the left panel of Fig. 2 (b): The time between syndrome measurement (blue square) and correction (red disk) scales with the system size L. Depending on the relevant velocity c (which should be henceforth thought of as comprising both information propagation and computations) and chain length L, this upper-bounds the rate at which ∆ can be applied to fight continuous noise on the quantum code.
This has important consequences: The probability that $\Delta$ flips the logical (qu)bit after accumulating errors for $t = c^{-1} L$ time steps is given by Eq. (20b) and obeys $\lim_{L\to\infty} P_{L,p_0}(t = c^{-1} L) = 1/2$, where the limit holds for all $0 < p_0 \le 1/2$, see Appendix A for the derivation. This is in contrast to $\lim_{L\to\infty} P_{L,p_0}(t = 1) = 0$, which led to the exponential growth of $T_{L,p_0}$ if the correction rate is independent of the system size, see Eq. (23). We conclude that the exponential growth of $T_{L,p_0}$ (which depends on the exponential vanishing of the logical error probability $P_{L,p_0}$) is lost if we take into account the time needed to evaluate the global majority vote.
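The effect of a size-dependent waiting time can be reproduced with a short numerical experiment. The sketch below assumes the single-bit flip probability $p(t) = \frac{1}{2}[1 - (1-2p_0)^t]$ and the binomial-tail expression for the majority failure probability; the parameter `c` (sites traversed per time step) and all function names are illustrative assumptions.

```python
from math import comb


def p_single(t, p0):
    """Probability that a single bit is flipped after t noisy steps."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** t)


def p_logical(L, p):
    """Probability that the majority of L independent bits is flipped (L odd)."""
    return sum(
        comb(L, k) * p**k * (1.0 - p) ** (L - k)
        for k in range((L + 1) // 2, L + 1)
    )


def failure_after_gathering(L, p0, c=1.0):
    """Majority-vote failure probability if errors pile up for t = L / c steps
    before the global correction can be applied."""
    return p_logical(L, p_single(L / c, p0))


p0 = 0.05
for L in (5, 21, 101):
    fixed_rate = p_logical(L, p0)                  # correction after every step
    delayed = failure_after_gathering(L, p0)       # correction after L / c steps
    print(L, fixed_rate, delayed)
```

With a fixed correction rate the failure probability shrinks rapidly with $L$, whereas with the delayed correction it grows towards $1/2$: longer chains buy more redundancy but also more time for errors to accumulate, and the latter wins.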
A possibility to keep both a size-independent correction rate and the global majority vote decoder $\Delta$ is illustrated in the right panel of Fig. 2 (b): Multiple copies of $\Delta$ running in parallel can keep up with continuous noise if after each time step $\delta t$ a new instance of $\Delta$ is fed with the syndrome $\partial(\mathbf{x}_{t-1} \oplus \mathbf{x}_t) = \partial\mathbf{x}_{t-1} \oplus \partial\mathbf{x}_t$ that encodes only errors accumulated during $\delta t$. Note that intertwining corrections and errors is acceptable as both commute. The obvious downside of this approach is its hardware overhead: The number of parallel instances required (the "depth" of the decoder) scales with the time needed for a single instance to finish, that is, with $L$.
If we retrace our line of thought, it is obvious that the global nature of $\Delta$ is responsible for the $L$-scaling of the depth in the presence of continuous noise. This motivates the question whether the global decoder $\Delta$ can be replaced by a local version $\Delta_D$ which requires only syndrome data within a radius $D$ of each site to compute the correction at this very site; the corresponding spacetime diagram is shown in Fig. 2 (c). The benefits of such a local decoder would be less hardware overhead, simpler implementation, and thus better scaling properties. It cannot implement $\mathrm{maj}[\dots]$ perfectly and one has to expect decoding errors in some cases (where $\Delta$ would have succeeded). However, if these cases are rare for low error rates $p_0 < p_c$ with finite critical rate $0 < p_c \le 1/2$, and the relaxation time $T_{L,p_0}$ still scales exponentially with $L$, this would be perfectly acceptable. It is such a "$D$-local" decoder that we describe and analyze in this manuscript. Let $\pi_i^D \mathbf{x} = (x_{i-D}, \dots, x_{i+D})$ denote the neighborhood of site $i$ with radius $D$. A decoder $\Delta_D$ is called $D$-local if $[\Delta_D(\partial\mathbf{x})]_i = f_i(\partial\, \pi_i^D \mathbf{x})$ for some family of functions $\{f_i\}$: Its correction at site $i$ only depends on (syndromes of) error patterns within distance $D$ of $i$.
Since D-local decoders finish after ∼ D time steps (if the f i can be evaluated efficiently), the required depth [cf. Fig. 2 (c)] also scales with D.
In the remainder of this subsection, and before we zoom in on our particular decoder, we discuss a constraint that follows quite generally for the class of D-local decoders $\Delta_D$, namely a (weak) upper bound on the probability $P_{\mathrm{dec}}$ of successfully decoding Bernoulli samples with a D-local decoder, Eq. (29). Note that this result is generic in the sense that it holds for all decoders of the MCQC where the correction of site i depends only on nearby syndromes in the neighborhood $\pi^D_i x$, irrespective of their local functions $\{f_i\}$. We call Eq. (29) the light cone constraint; its proof can be found in Appendix B.1.
Here we discuss some scaling limits of Eq. (29) and their implications for potential decoders replacing ∆. We assume D = D(L) to be a function of the linear size L of the code (the length of the quantum chain). We stress that the interpretation of the radius D can be either a spatial depth of a feed-forward physical circuit or a time-like depth in a spacetime diagram of a truly one-dimensional physical automaton. In the first case, scaling D with L means growing the system into the second dimension; in the second case, it accounts for a longer runtime of the decoder. Note that the class of decoders with at least D ∼ L comprises exactly the global ones (e.g., ∆).
We discuss two important cases. The first is constant depth, $D = \mathrm{const}$: This describes a truly one-dimensional feed-forward circuit of finite depth D. We find that the upper bound vanishes in the thermodynamic limit, i.e., there is no successful decoding possible for any finite microscopic error rate $p_0 > 0$.
The second is growing depth, $D \sim L^{\kappa}$ with $\kappa > 0$: This describes a truly two-dimensional feed-forward circuit, possibly slowly growing in the second dimension if $\kappa \approx 0$. We find in the thermodynamic limit that, except for the critical point $p_0 = \tfrac{1}{2}$, there is no constraint on $P_{\mathrm{dec}}$. At the critical point, the upper bounds depend on whether the second dimension scales slower or faster than the length of the chain. For faster scaling depth, there is no constraint, whereas for slower scaling depth, non-trivial upper bounds arise. Note that $P_{\mathrm{dec}} \le \tfrac{1}{2}$ actually follows for $p_0 = \tfrac{1}{2}$ for all decoders (not only D-local ones) since a completely mixing Bernoulli process destroys all encoded information. $P_{\mathrm{dec}} < \tfrac{1}{2}$ arises whenever the decoder fails to get rid of all syndromes (this is in contrast to the global decoder ∆, which always succeeds in removing all syndromes). $P_{\mathrm{dec}} = \tfrac{1}{2}$ can be realized if the decoder succeeds in removing all syndromes but still fails to recover the original state in 50% of the cases. A detailed derivation of these results is presented in Appendix B.2.
In conclusion, decoding the Majorana chain in a single step with a constant-D decoder is impossible in the thermodynamic limit. However, while $D \sim L^{\kappa}$ with $\kappa \ge 1$ describes only global decoders (in particular, the global majority vote ∆), there is also no restriction on $P_{\mathrm{dec}}$ for the larger class of local decoders with $0 < \kappa < 1$. This leaves open the possibility of local decoders with less hardware overhead than ∆.
One of the main results of this manuscript is a lower bound on $P_{\mathrm{dec}}$ for a class of local decoders that allows D to be scaled at will. In particular, we find that Eq. (29) is saturated in the thermodynamic limit for $D \sim L^{\kappa}$ with arbitrary $\kappa > 0$ below a critical error rate $p_c > 0$.

Cellular automata
In this section, we introduce a strictly local decoder for the MCQC. Our approach is based on cellular automata (CAs), so we start with a description of this framework and its relevant properties. In particular, we demonstrate that for the MCQC, where only the syndromes can be measured, we have to resort to CAs that are characterized by self-duality, a symmetry of the local evolution rules. The natural choice is then to focus on those CAs which additionally approximate global majority voting. This task is known as density classification, and we present two CAs that are known to perform well as density classifiers, one of which (called TLV) exhibits self-duality. Since the quantum code is embedded on a finite chain with open boundaries, it is essential to modify TLV at the edges; this new CA is denoted as $\overline{\mathrm{TLV}}$. We argue that it acts as a self-dual density classifier on finite chains, and thereby qualifies as a promising local replacement for global majority voting on the MCQC.

Properties of cellular automata
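To describe our local decoder, we make use of the well-known framework of one-dimensional, binary cellular automata [51,52]: discrete dynamical systems defined on a 1D lattice $\mathcal{L}$ of binary cells $i \in \mathcal{L}$ with indices in $\mathcal{L} = \mathbb{Z}$ (infinite), $\mathbb{N}$ (semi-infinite), or $\{1, \dots, L\}$ (finite).
A state $x \in \mathbb{Z}_2^{\mathcal{L}}$ is formally a map $x : \mathcal{L} \to \mathbb{Z}_2$ assigning a state $x_i$ to each cell $i \in \mathcal{L}$. Equivalently, $x \subseteq \mathcal{L}$ may be read as the subset of lattice indices $i \in \mathcal{L}$ where $x_i = 1$. A cellular automaton $\Gamma_{\mathcal{L}} : \mathbb{Z}_2^{\mathcal{L}} \to \mathbb{Z}_2^{\mathcal{L}}$ is defined by local rules $\gamma_i$ that compute the new state $x'_i$ of cell i from the states of nearby cells. We write $x' = \Gamma_{\mathcal{L}}(x)$ for short. If $\gamma_i = \gamma$ for all $i \in \mathcal{L}$, $\Gamma_{\mathcal{L}}$ is called translationally invariant. We write $x(t) = \Gamma_{\mathcal{L}}^t(x(0))$ for the state $x(t)$ that is produced by t consecutive applications of $\Gamma_{\mathcal{L}}$ on the initial state $x(0)$. A state $x^*$ with $x^* = \Gamma_{\mathcal{L}}(x^*)$ is called a fixed point of $\Gamma_{\mathcal{L}}$. More generally, a finite subset of states $C \subseteq \mathbb{Z}_2^{\mathcal{L}}$ which is invariant, $\Gamma_{\mathcal{L}}(C) = C$, and does not contain a proper invariant subset is called a cycle (a fixed point is a cycle with one element). On finite chains $\mathcal{L} = \{1, \dots, L\}$, the CA always ends up in a cycle after a finite relaxation time due to the finiteness of the state space $\mathbb{Z}_2^{\mathcal{L}}$. For a given cycle C, the attractor $A_C$ is the set of states that evolve into C. We will be interested in CAs with the homogeneous fixed points $x^* = \mathbf{0}$ and $\mathbf{1}$ (characterized by the absence of syndromes) and their corresponding attractors $A_{\mathbf{0}}$ and $A_{\mathbf{1}}$.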
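The notions of relaxation time, fixed point, and cycle on a finite chain translate into a short search routine. The sketch below is illustrative only: the radius-1 majority rule with periodic boundaries (`majority3_step`) is a stand-in chosen for brevity and is not one of the automata studied in this manuscript, and the function name `relax` is hypothetical.

```python
def majority3_step(x):
    """Radius-1 local majority vote on a ring (illustrative rule only)."""
    n = len(x)
    return tuple(int(x[(i - 1) % n] + x[i] + x[(i + 1) % n] >= 2) for i in range(n))

def relax(step, x0):
    """Iterate `step` until a state repeats; return (relaxation_time, cycle)."""
    seen = {}        # state -> time of first visit
    history = []     # visited states in order
    x = tuple(x0)
    while x not in seen:
        seen[x] = len(history)
        history.append(x)
        x = step(x)
    start = seen[x]  # first time the recurring state was visited
    return start, history[start:]

t_relax, cycle = relax(majority3_step, (0, 0, 1, 0, 0, 0))
print(t_relax, cycle)
```

Termination is guaranteed because the state space of a finite chain is finite; a fixed point shows up as a cycle of length one.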
The dynamics of a CA can be strongly influenced and restricted by symmetries of the transition rules $\Gamma_{\mathcal{L}}$. In the following, we are particularly interested in the special class of self-dual CAs, i.e., CAs that commute with the inversion $x \mapsto \bar{x}$ of all cells, $\Gamma_{\mathcal{L}}(\bar{x}) = \overline{\Gamma_{\mathcal{L}}(x)}$. Self-duality is therefore a symmetry satisfied only by particular CA rules $\Gamma_{\mathcal{L}}$. For example, local rules based on majority votes of adjacent cells are automatically self-dual because the binary majority function is, whereas logical disjunctions and conjunctions violate the symmetry, e.g., $\overline{\bar{a} \wedge \bar{b}} = a \vee b \neq a \wedge b$. The importance of self-dual CAs in the context of quantum error correcting the Majorana chain stems from the following observation (Lemma 1): Let $\Gamma_{\mathcal{L}}$ be a self-dual CA acting on a one-dimensional chain $\mathcal{L}$ (infinite, semi-infinite, open or periodic boundaries); let $s = \partial x$ denote the syndromes.
Then there are two equivalent representations of $\Gamma_{\mathcal{L}}$: • The state-state representation is given by the conventional transformation rule $x' = \Gamma_{\mathcal{L}}(x)$, which transforms the current state x into the new state $x'$. It operates on the states of cells $\mathbb{Z}_2^{\mathcal{L}}$ on the lattice $\mathcal{L}$.
• The syndrome-delta representation is given by the two-step process $\Delta = \partial\Gamma_{\mathcal{L}}(s)$, $s' = s \oplus \partial\Delta$, and transforms the current syndrome s into the new syndrome $s'$ via the intermediate result (delta) ∆. It operates on states of syndromes $\mathbb{Z}_2^{\partial\mathcal{L}}$ on the dual lattice $\partial\mathcal{L}$. The derived rule $\partial\Gamma_{\mathcal{L}}$ is defined for $i \in \mathcal{L}$ in terms of the sets $\overline{ki} = \overline{ik}$ of sites in $\partial\mathcal{L}$ (edges in $\mathcal{L}$) between two sites $k, i \in \mathcal{L}$ (for periodic boundary conditions, this is well-defined due to the constraint $\bigoplus_{\sigma \in \partial\mathcal{L}} s_\sigma = 0$).
The following (rather technical) proof can be skipped on first reading if the existence of the syndrome-delta representation is intuitively understood and/or accepted as a fact.
Proof. We show that there is a one-to-one correspondence between the two descriptions by constructing them explicitly. To this end, consider an arbitrary self-dual binary function f. Self-duality allows us to express f through the differences of its arguments relative to an arbitrary reference cell k, and therefore, for any $k \in \mathcal{L}$ and $s = \partial x$, through the syndromes (for fixed but arbitrary k). For a self-dual CA in state-state representation, the same holds for every local rule $\Gamma_i$ and arbitrary $k \in \mathcal{L}$. If we set $k = i$ and define the state change as $\Delta \equiv x \oplus x'$ and $\partial\Gamma_i(s) \equiv \Gamma_i|_i(s)$, we arrive at $\Delta = \partial\Gamma_{\mathcal{L}}(s)$. On the other hand, $s' = \partial x' = s \oplus \partial\Delta$ indeed describes the evolution of the syndrome $s = \partial x$ given by the action of $\Gamma_{\mathcal{L}}$ on the states x. Thus we have provided a procedure to derive a syndrome-delta representation from a given state-state representation. Conversely, it is enough to realize that the knowledge of $\Delta = \partial\Gamma_{\mathcal{L}}(\partial x)$ allows for the computation of $x'$ via $x' = x \oplus \Delta$. This concludes the proof.
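The central step of the proof can be written out as a short derivation. The following is a sketch in notation introduced here for illustration (the flip bit $c$, the all-ones state $\mathbf{1}$, and the path sum over $\overline{kj}$); it is not a verbatim reconstruction of the original equations.

```latex
% Self-duality of a binary function f, written with a global flip bit c:
f(x \oplus c\,\mathbf{1}) = f(x) \oplus c , \qquad c \in \mathbb{Z}_2 .

% Choose c = x_k for a fixed reference site k; the shifted arguments are syndrome sums:
f(x) = x_k \oplus f(x \oplus x_k \mathbf{1}) ,
\qquad
(x \oplus x_k \mathbf{1})_j = x_j \oplus x_k = \bigoplus_{\sigma \in \overline{kj}} s_\sigma .

% Apply this to the local rule at site i and set k = i:
x'_i = x_i \oplus \Gamma_i(x \oplus x_i \mathbf{1})
\quad\Longrightarrow\quad
\Delta_i = x_i \oplus x'_i = \Gamma_i(x \oplus x_i \mathbf{1}) .
```

The right-hand side of the last line depends on x only through differences $x_j \oplus x_i$, hence only through the syndrome $s = \partial x$.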
It should be clear that it is exactly the syndrome-delta representation of a self-dual CA that makes it suited for decoding the Majorana chain and comply with the "quantum-handicap": It operates on the measured syndromes s via the correction operations ∆ that can be applied directly to the quantum chain.
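Both representations can be compared explicitly for a simple self-dual rule. The sketch below uses the radius-1 majority vote on a ring as a stand-in (it is not the TLV rule of this manuscript); for this particular rule the delta reduces to "flip cell i if and only if both adjacent bonds carry a syndrome". All function names are chosen here for illustration.

```python
from itertools import product

def maj(a, b, c):
    return int(a + b + c >= 2)

def is_self_dual(f, arity):
    """f is self-dual if complementing all inputs complements the output."""
    return all(f(*[1 - b for b in bits]) == 1 - f(*bits)
               for bits in product((0, 1), repeat=arity))

def step_state(x):
    """State-state representation: x -> x' (radius-1 majority on a ring)."""
    n = len(x)
    return [maj(x[i - 1], x[i], x[(i + 1) % n]) for i in range(n)]

def syndrome(x):
    """s[i] lives on the bond (i, i+1) of the ring."""
    n = len(x)
    return [x[i] ^ x[(i + 1) % n] for i in range(n)]

def delta_from_syndrome(s):
    """Syndrome-delta representation of the same rule: uses syndromes only."""
    n = len(s)
    return [s[i - 1] & s[i] for i in range(n)]

def step_via_syndrome(x):
    """Apply the delta computed from the measured syndrome to the state."""
    d = delta_from_syndrome(syndrome(x))
    return [xi ^ di for xi, di in zip(x, d)]

print(is_self_dual(maj, 3), is_self_dual(lambda a, b: a & b, 2))
print(all(step_state(list(x)) == step_via_syndrome(list(x))
          for x in product((0, 1), repeat=6)))
```

Note that `step_via_syndrome` reads the state only to compute the syndrome and to apply the flips, which mirrors the situation on the quantum chain where syndromes are measured and corrections are applied without knowledge of x.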

Density classification in 1D
We seek to apply a simple, one-dimensional binary CA as a local decoder for the MCQC. If we upper bound the allowed runtime of a radius-R CA $\Gamma_{\mathcal{L}}$ by T time steps, the map $\Gamma_{\mathcal{L}}^T$ is $D = RT$-local by construction (since information spreads over at most R sites per time step under CA evolution). Then the depth scaling discussed previously becomes a matter of the required runtime for a specific CA.
As we know that (global) majority voting is a perfect decoder for the Majorana chain, it is natural to ask whether one can implement the function $\mathrm{maj}[x_1, \dots, x_L]$ by a hypothetical CA $\mathrm{MAJ}_{\mathcal{L}}$ that relaxes every initial state to the homogeneous state selected by its majority; this is known as the density classification problem [41,42]. Unfortunately, it can be rigorously shown that perfect majority voting cannot be achieved with binary CAs in any dimension [43]. This, however, is not a deal breaker for majority-based error correction (both classical and quantum) as long as the erroneously classified instances are rare with respect to the noise channel in question. Motivated by applications for classical error correction, a lively field has evolved that is concerned with the construction of approximate density classifiers (e.g., [53,35,54,55,56]) and extensions capable of performing density classification exactly (e.g., [57,58,59,60]), see [42] for a review. This is how we address the problem of finding a local decoder for the MCQC: Lemma 1 allows us to filter the literature of one-dimensional binary CAs for self-dual density classifiers; rewritten in syndrome-delta representation, these could be directly applied as potential Majorana chain decoders. The first, most famous, and best-studied (approximate) density classifier is dubbed "soldiers rule" and was introduced by Gács, Kurdyumov, and Levin (GKL) [53,61]. On $\mathcal{L} = \mathbb{Z}$, it is defined by a transition rule with radius R = 3 [see Fig. 3 (a)]; in its standard form, $\mathrm{GKL}_i(x) = \mathrm{maj}[x_i, x_{i-1}, x_{i-3}]$ for $x_i = 0$ and $\mathrm{GKL}_i(x) = \mathrm{maj}[x_i, x_{i+1}, x_{i+3}]$ for $x_i = 1$. Unfortunately, it is easy to check that it violates self-duality, due to the dependence of the evaluated sites in the local majority vote on the state of site i. This can also be seen from the exemplary time evolution of a three-cluster configuration under GKL shown in Fig. 3 (d): The emerging patterns are different for the left and right boundaries of clusters. This cannot be interpreted in terms of syndromes because on this level both boundaries are indistinguishable and hence must give rise to the same pattern.
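The violation of self-duality by GKL is easy to reproduce numerically. The sketch below assumes the standard form of the GKL rule (zeros consult their left neighbors at distance 1 and 3, ones consult their right neighbors) and periodic boundaries; the ring size and the test configuration are arbitrary choices made here.

```python
def maj(a, b, c):
    return int(a + b + c >= 2)

def gkl_step(x):
    """One synchronous GKL update on a ring: zeros look left, ones look right."""
    n = len(x)
    out = []
    for i in range(n):
        if x[i] == 0:
            out.append(maj(x[i], x[(i - 1) % n], x[(i - 3) % n]))
        else:
            out.append(maj(x[i], x[(i + 1) % n], x[(i + 3) % n]))
    return out

def complement(x):
    return [1 - b for b in x]

# A cluster of three ones on a ring of 20 cells.
x = [0] * 8 + [1] * 3 + [0] * 9

evolve_then_flip = complement(gkl_step(x))
flip_then_evolve = gkl_step(complement(x))
print(evolve_then_flip == flip_then_evolve)  # a self-dual rule would give True
```

The two results differ because the activity sits at the right boundary of a cluster of ones but at the left boundary of a cluster of zeros, which is exactly the left/right asymmetry described above.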
Note that most elementary CAs (one-dimensional binary CAs with radius R = 1) violate self-duality as well, and the few that do not are unsuited for (approximate) density classification [62]. Most generalizations capable of exact density classification are not self-dual either [57,58,63] and/or reformulate the task such that a solution is no longer applicable as Majorana chain decoder [57,63].
There are stochastic generalizations of density classifiers, some of which are self-dual [54,55] and some of which are not [56]. However, we prefer deterministic CAs due to their simpler realization in terms of elementary logic gates. We therefore resort to the less known "two-line voting" automaton (TLV) introduced by Toom [35]. Originally, it is defined on the extended state space $(\mathbb{Z}_2 \times \mathbb{Z}_2)^{\mathcal{L}}$ describing two parallel binary chains ("two lines"), with the transition rule depicted in Fig. 3 (b). In $x^{\alpha}_i$, the index $i = 1, \dots, L$ denotes the position along the chains while $\alpha = \pm 1$ selects the subchain (up or down).
The payoff of this more complicated geometry is the sought-after self-duality, which is easily checked to hold because the sites evaluated in the local majority vote no longer depend on the state of site i (in contrast to GKL). For our purpose, it is more convenient to rewrite TLV in its "stretched" form [Fig. 3 (c)] with state space $\mathbb{Z}_2^{\mathcal{L}}$ and the state-state representation Eq. (53) for even and odd sites i. Fig. 3 (e) depicts the evolution of the same three-cluster configuration as in (d). In contrast to GKL, left and right boundaries spawn symmetric patterns that eventually annihilate (initially, the majority of cells was white).
Despite the rather abstract rules (53), the spatio-temporal visualization reveals the simple functional principle of TLV [see Fig. 3 (e) and also Fig. 8 (c)]: Domain walls emit "slow signals" of the form . . . 010101 . . . symmetrically in both directions, seeking nearby domain walls to pair with. When two counter-propagating slow signals meet, they transmute into "fast signals" that head back and delete the 01-markers along the way. Since the velocity of the fast signal is twice that of the other slow signal traveling in the same direction, the latter is eventually overtaken by the returning fast signal. As a result, TLV fills the gaps between the pairs of domain walls which are closest; if errors are sparse, this implies convergence to the homogeneous state maj[x(0)].
We can now apply Lemma 1 to construct the syndrome-delta representation, Eq. (54); there, the upper (lower) signs correspond to i even (odd).
This describes the action of TLV completely in the quantum mechanically more suitable language of syndromes s (obtained by measurements) and deltas ∆ (applicable by local operations). Due to the equivalence of both representations, we can (and will) still use the "common" state-state representation Eq. (53) to discuss the properties of TLV. The implementation, however, requires Eq. (54) as a concession to the "quantum handicap".
Both GKL and TLV can be shown to share a property which is known to be responsible for their superior performance as approximate density classifiers [36]. Clearly, the homogeneous (syndrome-free) configurations $\mathbf{0}$ and $\mathbf{1}$ are fixed points (a necessary condition for density classifiers). What distinguishes them from most other CAs with these fixed points is the structure of the attractors $A_{\mathbf{0}}$ and $A_{\mathbf{1}}$, i.e., the perturbed states which are drawn towards the homogeneous fixed points: Every finite perturbation of diameter l on an infinite homogeneous background of zeros or ones is eroded after a time $t_{\mathrm{dec}} \le m\,l$, where $m \in \mathbb{R}^+$ is a CA-specific constant. Therefore GKL and TLV are called linear eroders, a crucial property for their use as approximate density classifiers (see below) and responsible for their stability close to the homogeneous fixed points. The time evolutions in Fig. 3 (d) and (e) are examples for the erosion of finite perturbations of ones (red/black) on a background of zeros (white).
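The erosion of finite perturbations can be observed in a few lines of code. The sketch below uses GKL in its standard form on a ring that is large compared to the perturbation, as a stand-in for the infinite chain; the helper `erosion_time`, the ring size, and the step limit are assumptions made here for illustration.

```python
def maj(a, b, c):
    return int(a + b + c >= 2)

def gkl_step(x):
    """One synchronous GKL update on a ring: zeros look left, ones look right."""
    n = len(x)
    return [maj(x[i], x[(i - 1) % n], x[(i - 3) % n]) if x[i] == 0
            else maj(x[i], x[(i + 1) % n], x[(i + 3) % n])
            for i in range(n)]

def erosion_time(island, ring=60, max_steps=1000):
    """Steps until a contiguous island of ones on a zero background has vanished.

    Returns None if the island has not vanished within max_steps.
    """
    x = [0] * ring
    for i in range(island):
        x[ring // 2 + i] = 1
    for t in range(max_steps + 1):
        if not any(x):
            return t
        x = gkl_step(x)
    return None

for l in range(1, 9):
    print(l, erosion_time(l))
```

For islands that are small compared to the ring, the perturbation never wraps around, so the finite ring reproduces the dynamics on the infinite chain.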

Boundary conditions
Often CAs are studied in the limit of infinite system size with state space $\mathbb{Z}_2^{\mathbb{Z}}$. However, we employ the CA for physical purposes, which requires finite systems. Finiteness, in turn, entails a choice of boundary conditions and complicates the analysis due to finite-size effects. Periodic boundary conditions (PBC) are common as they mimic the infinite case as closely as possible [see Fig. 4 (a)]: Translationally invariant CAs on $\mathbb{Z}$ remain translationally invariant on a finite system with PBC, and no modification of the rules is necessary.
Again, due to physical constraints we cannot use periodic boundaries: It is crucial that the quantum subsystem is an open chain with spatially separated endpoints (edge modes). Thus we are forced to modify TLV close to the endpoints to comply with open boundary conditions. Modifying the rather complicated rules of TLV can easily go amiss. It is therefore helpful to specify our goal: Since the edges of the quantum chain carry edge modes, they can host endpoints of error strings which do not show up in the syndrome, see Eq. (4); physically, this corresponds to a quasiparticle in the delocalized edge mode. It is therefore crucial that solitary quasiparticles close to the edges are transferred into the corresponding edge mode.

Figure 4: Boundary conditions. (b) Mirrored boundary conditions: every other copy is reversed, giving rise to a finite system bounded by two mirrors with modified rules $\overline{\mathrm{TLV}}_L$. (c) An unmodified TLV operating on an infinite chain and restricted to bond-inversion symmetric states (magenta/white sites for $x_i = 1/0$) is equivalent to a modified $\overrightarrow{\mathrm{TLV}}$ operating on a semi-infinite chain with MBC on the edge. This is a consequence of the bond-inversion symmetry of the unmodified TLV rules. (d) A finite cluster of errors can be effectively doubled in size if it is close to the mirror. Consequently, correction times close to the mirror can be larger than for a free cluster of the same size.
Thus we have to modify TLV so that syndromes are attracted by the edges (which do not emit signals themselves), while preserving self-duality and the eroder property (in a modified sense, see below). A neat trick to come up with the correct modifications is to put the finite chain in a "cavity" between two imaginary mirrors placed left (right) of the first (last) site, see Fig. 4 (b). Rules which traverse the edges use the mirrored cells to compute their local update. Formally, this is achieved by redefining these rules to use the corresponding "real" cells of the system (note that this is a local modification for a stretched open chain, in contrast to periodic boundaries); we call this mirrored boundary conditions (MBC). We denote the finite-size version of TLV on $\mathcal{L} = \{1, \dots, L\}$ with mirrored boundary conditions as $\overline{\mathrm{TLV}}$; its rules are modified on the left edge and on the right edge (L even), where we use the shorthand notation $\bar{k} \equiv L + 1 - k$ to index cells from the end of the chain (e.g., $\bar{1} = L$), see Fig. 4 (b). For all other sites it is $\overline{\mathrm{TLV}}_i = \mathrm{TLV}_i$. Clearly, $\mathbf{0}$ and $\mathbf{1}$ are still fixed points of $\overline{\mathrm{TLV}}$ (there are no static signal sources introduced) and self-duality is also preserved. By construction, a slow signal emitted from a solitary syndrome close to the edge will meet its mirror image at the edge, which sends it back as a fast signal to capture the other slow signal heading into the bulk and thereby initiates pairing towards the edge, see Fig. 4 (d). Note that this mechanism affects the time needed to erode a contiguous cluster of errors: Adjacent (or close) to the mirror, the number of errors is "doubled" artificially; correction time and affected territory double accordingly. In Fig. 4 (d) we illustrate this effect by comparing the same cluster far away from and close to the edge.
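The mirror construction amounts to an index map from virtual cells beyond the edges back to real cells. The sketch below assumes mirrors on the bonds (0, 1) and (L, L+1) and applies a generic local rule with these boundary conditions; the helper names and the placeholder rule (radius-1 majority) are assumptions made here and do not reproduce the TLV rules.

```python
def mirror_index(j, L):
    """Map a possibly virtual cell index j to a real site in 1..L.

    Mirrors sit on the bonds (0, 1) and (L, L+1); valid for -L < j <= 2L.
    """
    if j < 1:
        return 1 - j
    if j > L:
        return 2 * L + 1 - j
    return j

def step_with_mbc(x, offsets_for, rule):
    """Apply a local rule with mirrored boundary conditions.

    x           : list of bits, site i is stored at x[i - 1]
    offsets_for : function i -> tuple of neighbour offsets used at site i
    rule        : binary function applied to the collected cells
    """
    L = len(x)
    new = []
    for i in range(1, L + 1):
        cells = [x[mirror_index(i + d, L) - 1] for d in offsets_for(i)]
        new.append(rule(*cells))
    return new

maj = lambda a, b, c: int(a + b + c >= 2)      # placeholder rule
radius1 = lambda i: (-1, 0, 1)                 # placeholder neighbourhood

print([mirror_index(j, 6) for j in (-3, -1, 0, 7, 8)])
print(step_with_mbc([1, 0, 0, 0, 0, 0], radius1, maj))
```

Because `offsets_for` may depend on i, the same helper can accommodate rules that differ on even and odd sites.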
An important observation allows for the analysis of systems with mirrored boundary conditions in terms of the unmodified rules (on the infinite chain $\mathcal{L} = \mathbb{Z}$): Let $x \in \mathbb{Z}_2^{\mathbb{Z}}$ be an arbitrary state and define the bond-centered inversion $I_s$ as $(I_s x)_i = x_{2s+1-i}$; $I_s x$ describes the configuration that is obtained by inversion of x at the bond (s, s + 1). We define the set of invariant configurations, $K_s = \{x \mid I_s x = x\}$, and argue that $I_s$ is a symmetry of TLV for any $s \in \mathbb{Z}$, namely $\mathrm{TLV}(I_s x) = I_s\,\mathrm{TLV}(x)$. Indeed, this follows from the fact that $\mathrm{TLV}_i$ is related to $\mathrm{TLV}_{i+1}$ by a bond-centered inversion at (i, i + 1); this is true for both even and odd sites i, see Fig. 3 (c). Consequently, $K_s$ is invariant under the evolution of TLV, which hence can be restricted to $K_s$. Note that this is a special feature of TLV, in contrast to GKL, for instance. Without loss of generality, we set s = 0 in the following, i.e., $(I_0 x)_i = x_{1-i}$. Then we can describe a semi-infinite chain on $\mathcal{L} = \mathbb{N}$ with a single mirrored boundary (we write $\overrightarrow{\mathrm{TLV}}$) by the unmodified rules of TLV operating on the infinite chain $\mathcal{L} = \mathbb{Z}$ if we restrict the state space to $K_0$. Indeed, for $x \in K_0$ the unmodified rule at the edge reproduces the modified one, where we used $x_4 = x_{-3}$ and $x_2 = x_{-1}$; see Fig. 4 (c) for an example. This allows us to trade the rule modifications of $\overrightarrow{\mathrm{TLV}}$ for a restriction on the state space of TLV which, in turn, simplifies the analysis of the finite version $\overline{\mathrm{TLV}}$ (see below). As an immediate consequence, it follows that the semi-infinite $\overrightarrow{\mathrm{TLV}}$ is an eroder because TLV is one (and mirrored finite perturbations remain finite).
While the previously introduced definition of eroders carries over to semi-infinite chains, it cannot be applied to finite systems because there is no longer a qualitative difference between perturbation and background (both of which are necessarily finite). A possible finite-size modification reads as follows: A cellular automaton on a finite chain $\mathcal{L} = \{1, \dots, L\}$ is a finite-size linear eroder if there exist real constants $0 < a < 1$ and $m \in \mathbb{R}^+$ such that for any size $L < \infty$ and any finite perturbation of $\mathbf{0}$ ($\mathbf{1}$) with diameter $l \le aL$, the unperturbed state $\mathbf{0}$ ($\mathbf{1}$) is recovered at $t_{\mathrm{dec}} \le m\,l$. It is easy to check that $\overline{\mathrm{TLV}}$ is an eroder in this sense if one uses that $\overrightarrow{\mathrm{TLV}}$ is an eroder in the original sense (see Appendix C). Alternatively, note that the majority function is monotonic, i.e., changing an input bit from 0 to 1 cannot change the output bit from 1 to 0. Therefore the evolution of a generic, non-contiguous, finite cluster of errors under TLV/$\overrightarrow{\mathrm{TLV}}$/$\overline{\mathrm{TLV}}$ can be constructed from the evolution of a contiguous cluster of the same size by erasing errors in the spacetime diagram, and it is sufficient to consider contiguous intervals of errors to check for the eroder property (which is straightforward to verify).
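The monotonicity of the majority function invoked above can be confirmed by exhaustive enumeration. The following is a small sketch with helper names chosen here; XOR serves as a counterexample of a non-monotonic function.

```python
from itertools import product

def maj(*bits):
    """Majority of an odd number of bits."""
    return int(2 * sum(bits) > len(bits))

def is_monotone(f, arity):
    """True if raising any single input bit from 0 to 1 never lowers the output."""
    for bits in product((0, 1), repeat=arity):
        for k in range(arity):
            if bits[k] == 0:
                raised = bits[:k] + (1,) + bits[k + 1:]
                if f(*raised) < f(*bits):
                    return False
    return True

print(is_monotone(maj, 3), is_monotone(lambda a, b: a ^ b, 2))
```

Monotonicity of the local rule carries over to the global map: a configuration with fewer errors never evolves into one with more errors than its superset does, which is why contiguous clusters are the worst case.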

Decoding with a self-dual density classifier
In the previous section, we introduced $\overline{\mathrm{TLV}}$ and argued that it is a self-dual, finite-size linear eroder. These properties make $\overline{\mathrm{TLV}}$ a promising candidate for decoding errors E(x) that are small compared to L and/or sparse enough. In the following, we assess the decoding performance of $\overline{\mathrm{TLV}}$ by numerical and analytical means. We show that error patterns for which the erosion (viz. decoding) fails are rare for reasonably small error rates, and that the time $t_{\mathrm{dec}}$ required for decoding scales sublinearly with the chain length, which is one of the main benefits of locality.

Numerical results
We start with a qualitative discussion of possible evolutions under $\overline{\mathrm{TLV}}$: Apart from the two stable homogeneous fixed points $\mathbf{0}$ and $\mathbf{1}$, there are four additional (unstable) fixed points for TLV [35], two of which cannot be realized by $\overline{\mathrm{TLV}}$ on finite chains with MBC (see Appendix D). This leads to the four possible fixed points depicted in Fig. 5 (a) and (b). Note that their realization on finite chains with MBC (vertical lines) is only possible if their realization on $\mathcal{L} = \mathbb{Z}$ is consistent with the boundary conditions given by the mirrors. Whereas the homogeneous fixed points survive, independent of the bond where the mirror is placed (a), the two additional fixed points can only be realized if the leftmost (rightmost) site is denoted by an even (odd) index (b). Henceforth, we will take the first index to be odd (i.e., i = 1) and the last to be even (i.e., L), which eliminates these two additional fixed points. Note that this choice coincides with the default labeling of sites $\mathcal{L} = \{1, \dots, L\}$. We stress that the elimination of the two additional fixed points (which are not syndrome-free) is not crucial for the performance of the decoder: First, both are characterized by a density of set bits ρ = 0.5, which is far from the relevant error densities realistic for small $p_0$. Second, simulations suggest that their attractors are trivial, i.e., contain only the fixed points themselves (Appendix D).

Figure 5: (c) Three examples of random initial configurations (magenta) which relaxed into cycles of various lengths. The first recurring configurations are highlighted with the same color to separate the cycle from the relaxation path. We find only cycles close to criticality with ρ ≈ 0.5. (d) Sketch of the state space with dashed attractor paths, based on the results in (a)-(c). The two homogeneous fixed points are attractors of all states away from criticality; this motivates the application of $\overline{\mathrm{TLV}}$ as decoder. (e) Numerical results for the probability of erroneous decoding $\bar{P}_{\mathrm{dec}} = 1 - P_{\mathrm{dec}}$ vs. chain length L for different microscopic error probabilities $p_0$. $\bar{P}_{\mathrm{dec}}$ vanishes exponentially in the thermodynamic limit for any $p_0 < 0.5$. (f) Away from criticality (presumably for $p_0 < 0.5$) the probability to relax into a cycle vanishes exponentially. For realistic error rates ($p_0 \le 0.1$), cycles cannot be observed in reasonable sample sizes. (g) Averaged time needed to reach a homogeneous fixed point ($t_{\mathrm{dec}}$) as a function of the system size L for various error rates $p_0$. The growth is remarkably slow but unbounded for $p_0 > 0$. Whether $t_{\mathrm{dec}}$ grows algebraically or only logarithmically for small $p_0 > 0$ cannot be inferred from these results. (h) Distributions of the decoding times $t_{\mathrm{dec}}$ for two error rates $p_0 = 0.1/0.4$ and system sizes L = 50/500. For $p_0 = 0.1$, there is barely any difference between L = 50 and L = 500 visible (squares). We sampled $10^6$ random initial states for each data point in (e)-(h).
However, there are competing cycles of various lengths and with non-trivial attractors; three examples are shown in Fig. 5 (c). The longer a cycle and the larger its attractor, the more probable it is with respect to Bernoulli noise. This explains why the largest cycle in (c) is by far the most common in simulations. Note that it also illustrates the MBC nicely by "bouncing" a cluster of errors back and forth between the two mirrors. Note that, again, these cycles are characterized by densities ρ close to criticality, which renders them rare for $p_0 \ll 1$. In Fig. 5 (d) we sketch the attractor landscape of the total state space ordered by the density ρ: Close to the extreme densities ρ = 0/1, every configuration is drawn towards the corresponding homogeneous fixed point due to the eroder property. This is where $\overline{\mathrm{TLV}}$ effectively implements majority voting by local rules and therefore becomes a viable replacement for the global decoder ∆. Only close to criticality, ρ ≈ 0.5, does $\overline{\mathrm{TLV}}$ fail to decode a (still small) fraction of error patterns by evolving them into cycles instead of cleaning them according to a global majority vote. This underpins our previous statement that the impossibility of realizing global majority voting perfectly is not too much of an issue if it fails in regions of the state space which are exponentially suppressed by Bernoulli noise for physically realistic error rates.
In the remainder of this subsection, we will quantify these statements by sampling error patterns from a Bernoulli distribution with fixed rate $p_0$ and evolving them with $\overline{\mathrm{TLV}}$ until we can decide whether it reached a fixed point or entered a cycle. We interpret the empty state $\mathbf{0}$ as error-free and define the probability of successful decoding $P_{\mathrm{dec}}$ as the probability that $\lim_{t\to\infty} \overline{\mathrm{TLV}}_L^{\,t}(x) = \mathbf{0}$ for a Bernoulli sample x. We stress that in addition to $\lim_{t\to\infty} \overline{\mathrm{TLV}}_L^{\,t}(x) = \mathbf{1}$, and in contrast to the global decoder ∆, $\overline{\mathrm{TLV}}$ can also fail by evolving into cycles which are not syndrome-free. Both cases make up the failed decodings of $\overline{\mathrm{TLV}}$ and are measured by the probability $\bar{P}_{\mathrm{dec}} = 1 - P_{\mathrm{dec}}$. As a consequence, $\bar{P}_{\mathrm{dec}} > \tfrac{1}{2}$ is possible for $\overline{\mathrm{TLV}}$ even for $p_0 \le \tfrac{1}{2}$. In Fig. 5 (e) we plot estimates for $\bar{P}_{\mathrm{dec}}$ as a function of the system size for various error rates $0 < p_0 \le 0.5$. Except for the critical value $p_0 = 0.5$, the probability of unsuccessful decoding vanishes exponentially with the chain length L, confirming our hope that $\overline{\mathrm{TLV}}$ is a viable replacement for ∆. Note that this result already tells us that the measure of all attractors of cycles vanishes quickly for $L \to \infty$. Indeed, in Fig. 5 (f) we plot the probability of an error pattern to belong to the attractor of a non-trivial cycle, again as a function of L for the same error rates as in (e): For $p_0 < 0.5$ and $L \gtrsim 50$, there seems to be an exponential decay, which is in accordance with the results in (e). Whether at criticality $p_0 = 0.5$ the probability vanishes or saturates at a small but non-zero value cannot be inferred from (f). Interestingly, the results so far not only support the hope that $\overline{\mathrm{TLV}}$ can replace ∆ for $p_0 \le p_c$ with a non-trivial critical rate $0 < p_c < \tfrac{1}{2}$, but even suggest that $p_c = \tfrac{1}{2}$ is still optimal (at least $0.4 \lesssim p_c$). Now that we know that the decoding probability of $\overline{\mathrm{TLV}}$ approaches 1 exponentially for $L \to \infty$, we turn to a crucial question that we have avoided so far: How many steps $t_{\mathrm{dec}}$ does $\overline{\mathrm{TLV}}$ need, on average, to evolve an error pattern x into the error-free state $\mathbf{0}$?
If the decoding time scaled linearly, $t_{\mathrm{dec}} \sim L$, there would be barely any benefit from replacing the global decoder ∆ by the local one. Fortunately, Fig. 5 (g) reveals that the average decoding time grows linearly only at criticality, whereas the growth for $p_0 < 0.5$ is much slower. E.g., for L = 600 and $p_0 = 0.1$, on average only $t_{\mathrm{dec}} \approx 3$ steps are necessary to eliminate all errors correctly. We stress that due to the almost vanishing slope in (g) it is not possible to decide whether $t_{\mathrm{dec}} \propto L^{\kappa}$ for $0 < \kappa \ll 1$ or $t_{\mathrm{dec}} \sim \log L$, even though the very fact that $t_{\mathrm{dec}}$ grows so slowly hints at a logarithmic scaling. To describe the required decoding times in detail, we show the complete probability distribution in Fig. 5 (h) for two sizes L = 50/500 and error rates $p_0 = 0.1/0.4$ in bins of $\Delta t_{\mathrm{dec}} = 10$. Most strikingly, for the lower error rate $p_0 = 0.1$ there is no difference between L = 50 and a chain of tenfold length; again a manifestation of the extremely slow growth of $t_{\mathrm{dec}}$ for reasonable error rates.
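The sampling procedure of this subsection can be sketched as a small Monte Carlo harness. The version below is a hedged illustration and not the code behind Fig. 5: it uses GKL in its standard form on a ring as a stand-in automaton (the TLV rules with mirrored boundaries are not reproduced here), and the names `classify` and `estimate_failure` as well as the step limit are assumptions made for this example.

```python
import random

def maj(a, b, c):
    return int(a + b + c >= 2)

def gkl_step(x):
    """Stand-in automaton: standard GKL rule on a ring."""
    n = len(x)
    return tuple(maj(x[i], x[(i - 1) % n], x[(i - 3) % n]) if x[i] == 0
                 else maj(x[i], x[(i + 1) % n], x[(i + 3) % n])
                 for i in range(n))

def classify(x0, step, max_steps):
    """Evolve x0 and report 'zero', 'one', 'cycle' (incl. other fixed points) or 'timeout'."""
    seen = set()
    x = tuple(x0)
    for _ in range(max_steps):
        if not any(x):
            return "zero"
        if all(x):
            return "one"
        if x in seen:
            return "cycle"
        seen.add(x)
        x = step(x)
    return "timeout"

def estimate_failure(L, p0, samples, step, seed=0):
    """Fraction of Bernoulli(p0) samples that do not relax to the all-zero state."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(samples):
        x0 = [int(rng.random() < p0) for _ in range(L)]
        if classify(x0, step, max_steps=20 * L) != "zero":
            fails += 1
    return fails / samples

for p0 in (0.1, 0.3, 0.5):
    print(p0, estimate_failure(L=40, p0=p0, samples=200, step=gkl_step))
```

Replacing `gkl_step` by an implementation of the automaton of interest leaves the rest of the harness unchanged; recording the step count at which `classify` returns would give the decoding time distribution in the same run.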

Rigorous analytical results
In this subsection, we prove a central statement of this work: The probability for $\overline{\mathrm{TLV}}$ on a chain of length L with MBC to be in a non-empty state $x(t) \neq \mathbf{0}$ after $t \propto L^{\kappa}$ time steps vanishes exponentially with L for arbitrary $\kappa > 0$ if the initial state $x(0)$ is a Bernoulli random configuration with single-site error probability $p_0 < p_c$ for some critical value $0 < p_c \le \tfrac{1}{2}$. In Section 5 we will use this result to construct a completely local decoder for the Majorana chain with length L and depth $\propto L^{\kappa}$ that stabilizes a logical qubit for times that grow exponentially with L. Furthermore, it confirms the numerical results of Subsection 4.1: Neither competing fixed points nor cycles threaten the performance of $\overline{\mathrm{TLV}}$ as long as $p_0$ is small enough. To prove the claimed result, we follow the lines of [64] with modifications to account for the finiteness of $\overline{\mathrm{TLV}}$ and the mirrored boundaries. In the following, we present three crucial steps but provide only brief sketches of their proofs; the details are presented in Appendix E.
Before we can state our first result, we have to introduce the pivotal concepts of independence and sparseness [45,26,46,65,64]. Let $x \subseteq \mathbb{Z}$ be an arbitrary subset (error pattern). A finite subset $I \subseteq x$ is called a cluster of diameter $|I| = \max\{|x - y| : x, y \in I\}$. If we fix an integer $k > 0$ (the sparseness parameter, to be chosen later), the territory $T_k(I)$ is defined as the interval of integers with distance at most $k|I|$ from I. Two clusters $I_1$ and $I_2$ are called independent if at least one does not intersect the territory of the other, i.e., $I_2 \cap T_k(I_1) = \emptyset$ or $I_1 \cap T_k(I_2) = \emptyset$ (or both); since $I \subset T_k(I)$, this implies $I_1 \cap I_2 = \emptyset$. This concept is illustrated in the lower part of Fig. 6. If, in addition, there exists a partition of x into a family $\mathcal{I} = \{I_a\}$ of pairwise independent clusters $I_a$, $x = \bigcup_a I_a$, then x is called sparse.
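The definitions of diameter, territory, and independence translate directly into code. The sketch below represents clusters as Python sets of integers; the function names are chosen here for illustration.

```python
def diameter(cluster):
    """Largest distance between two sites of the cluster."""
    return max(cluster) - min(cluster)

def territory(cluster, k):
    """Interval of integers with distance at most k * diameter from the cluster."""
    reach = k * diameter(cluster)
    return range(min(cluster) - reach, max(cluster) + reach + 1)

def independent(c1, c2, k):
    """True if at least one cluster avoids the territory of the other."""
    t1, t2 = territory(c1, k), territory(c2, k)
    c2_avoids_t1 = not any(i in t1 for i in c2)
    c1_avoids_t2 = not any(i in t2 for i in c1)
    return c2_avoids_t1 or c1_avoids_t2

k = 2
big = {0, 1, 2}       # diameter 2, territory [-4, 6]
far = {20, 21}        # diameter 1, territory [18, 23]
single = {5}          # diameter 0, territory {5}
wide = {4, 8}         # diameter 4, territory [-4, 16]

print(independent(big, far, k), independent(big, single, k), independent(big, wide, k))
```

The pair `big`/`single` shows the asymmetry of the definition: the single site lies inside the territory of the larger cluster, but the larger cluster does not touch the (trivial) territory of the single site, so the two still count as independent.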
We need some additional terminology: First, $\mathcal{I}_{\le l}$ denotes the family of clusters $I \in \mathcal{I}$ with diameter $|I| \le l$, and $x \setminus \mathcal{I}_{\le l} \equiv x \setminus \bigcup_{I \in \mathcal{I}_{\le l}} I$ is the subset of sites for given x that remains after cleaning all independent clusters of diameter at most l. Second, an (infinite) mirrored Bernoulli random configuration $x \subseteq \mathbb{Z}$ is defined by the single-site probability $\Pr(x_i = 1) = p_0$ for sites $i > 0$ and the mirror constraint $x_{1-i} = x_i$. We can now state our variation of the main result of Ref. [64]: Consider infinite mirrored Bernoulli random configurations x with single-site probability $p_0$. Let $k \in \mathbb{N}$ be a given sparseness parameter. Then, for each instance x, there exists a constructive family $\mathcal{I}^x$ of pairwise independent clusters (it is not necessarily $x = \bigcup_{I \in \mathcal{I}^x} I$, i.e., x does not have to be sparse) such that the probability of a site $i \in \mathbb{Z}$ to be in x and remain uncovered by independent clusters of diameter l or less (write $\mathcal{I}^x_{\le l}$) is upper-bounded by Eq. (62), with $\beta = \ln(2)/\ln(4k + 3)$ (and therefore $0 < \beta < 1$) and $\alpha = (2k)(4k + 3)\,p_0$. If we define the critical value $\tilde{p}_c$ as the error rate at which $\alpha = 1$,
then for $p_0 < \tilde{p}_c$ it is $\alpha < 1$ and Eq. (62) becomes an exponentially decaying upper bound.

Figure 6: Independent clusters. A pattern of three clusters $I_i$ (i = 1, 2, 3) with $|I_3| < |I_1| < |I_2|$. The erosion process of TLV is sketched for $I_1$, whereas for $I_2$ and $I_3$ only the causal patches that cover the erosion are shaded gray (time runs upwards). For a linear eroder, the erasure of an independent cluster requires at most $m|I_i|$ time steps, the height of the shown trapezoids. During this time, signals can travel at most $Rm|I_i|$ sites from the boundary of a cluster $I_i$. Clusters that are independent with sparseness parameter $k = 2Rm$ do not interact; this is true for $I_1$ and $I_2$ since $I_2 \cap T_k(I_1) = \emptyset$ (dashed circle). Note that $I_{1,3} \cap T_k(I_2) \neq \emptyset$ (green intervals) has no effect on their (in-)dependence because $|I_2| \ge |I_1|, |I_3|$. In contrast, since $I_2 \cap T_k(I_3) \neq \emptyset$ (red interval), the causal trapezoids of $I_2$ and $I_3$ intersect (red triangle). Thus, $I_2$ and $I_3$ are dependent and may not be erased separately.
The proof can be divided roughly into three steps: First, the family $\mathcal{I}^x$ is constructed recursively in the cluster diameter $l$ for a given instance $x$. In a second step it is shown that this prescription always yields a family of pairwise independent clusters. In the crucial third step, an upper bound on the probability for a site $i \in x$ to be not covered by a cluster in $\mathcal{I}^x$ of diameter $l$ or less is derived. To do this, one constructs so-called explanation trees, hypothetical error patterns that explain why a given site $i$ could survive the construction of $\mathcal{I}^x$ without being covered by clusters up to diameter $l$. The probability for its survival is then estimated by finding an upper bound on the number of possible explanation trees and calculating their probability with respect to a mirrored Bernoulli distribution. One finds that the number of explanation trees grows exponentially (with cluster diameter $l$) while the probability for a single explanation tree to be realized by a Bernoulli process vanishes exponentially. The latter factor dominates for $p_0 < \tilde p_c$ so that the probability for the existence of at least one explanation tree vanishes exponentially for increasing $l$; this leads to Eq. (62).
The rationale behind TLV (or any other linear eroder) is the following: For an error to survive the erosion process, there must be other errors nearby that protect it; and these, in turn, require further errors in their neighborhood to survive and so forth. Such a structure of errors that protect each other from being eroded constitutes an explanation tree which prevents a global error pattern from decaying into independent clusters. Explanation trees are dense in a very specific sense, and this denseness renders their existence improbable for low error rates. In contrast, sparse error patterns are those without explanation trees that span the whole system. They are initial states of linear eroders such that the causal regions of correlated sites in the spacetime diagram do not percolate through the system, but instead separate into many local patches which are eroded independently. The initial seeds of these patches are the independent clusters from above: Linear eroders clean a single cluster $I$ after at most $m|I|$ time steps, and can therefore influence only sites with maximum distance $Rm|I|$ from $I$. Then, a collection of pairwise independent clusters is eroded independently if the sparseness parameter is set to $k = 2Rm$, see Fig. 6. It is this causal locality on sparse sets which results in the sublinear scaling of decoding times for TLV [recall Fig. 5 (g)].
Eventually we want to use Prop. 1 to derive an upper bound for the probability of errors to survive the first $t$ steps of TLV on a finite chain with mirrored boundaries. To this end, we first need an intermediate step (Lemma 2): The probability of $x(t) = \overrightarrow{\mathrm{TLV}}{}^{\,t}_{L}(x(0))$ to be non-empty on an interval $J$ is upper-bounded by an expression that involves $\gamma = -\log(\alpha)$ ($\gamma > 0$ for $p_0 < \tilde p_c$) and $0 < \beta < 1$ as in Prop. 1. Here the sparseness parameter is given by $k = 2Rm = 8$, where $m = 1$ and $R = 4$ are the eroder parameter and the radius of TLV, respectively. $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$.
The proof exploits that −−→ TLV is equivalent to TLV for symmetric states in K 0 . Then Prop. 1 provides us with a family I x that fails to cover errors in x with a probability that vanishes exponentially with increasing cluster diameter l. If an error x i = 1 belongs to an independent cluster of diameter l, the linear eroder property of TLV ensures that it is eroded after at most ml time steps. It is important to realize that this does not imply x i = 0 for all later times as signals from distant, larger clusters may enter the territory of smaller ones (e.g., I 1 and I 2 in Fig. 6). With R the radius of local rules, the neighborhood U tR (J ) includes all sites that potentially influence sites in J after t time steps, i.e., sites with distance at most tR from J . Therefore one has to demand that all sites in the growing neighborhood U tR (J ) belong to clusters of maximum diameter l in I x to guarantee that J is clean after t = ml time steps. Subadditivity of probability measures then leads to the upper bound of Lemma 2 where (2tR + |J |) is the size of U tR (J ).
With Lemma 2, we are now prepared to tackle the case of finite chains: Whereas the infinite TLV and the semi-infinite $\overrightarrow{\mathrm{TLV}}$ are qualitatively similar due to the discussed equivalence on $K_0$, there are fundamental differences to the finite TLV. This can be understood intuitively as follows: $\overrightarrow{\mathrm{TLV}}$ equals TLV with pairwise correlations between mirrored sites. These pairwise correlations lead to the square in the expression for $\tilde p_c$ (recall Prop. 1). In contrast, the finite TLV introduces an infinite number of perfectly correlated partners for each of the $L$ sites due to the cavity geometry (imagine standing in front of a single mirror vs. standing between two opposing mirrors). To avoid these complications, we use a trick: For times $t \le t^*_L = \lfloor L/(2R) \rfloor$, there is no site with both boundaries (mirrors) in its past light cone (the site(s) closest to the center of the chain can become "aware" of the cavity geometry at $t^*_L + 1$ at the earliest). Therefore, locally the finite TLV behaves exactly as the semi-infinite $\overrightarrow{\mathrm{TLV}}$ for $t \le t^*_L$ and the results of Lemma 2 apply. For $t > t^*_L$ we can exploit the finiteness of $L$: Recall that in the context of Lemma 2 we stressed that an empty interval does not necessarily remain empty on a (semi-)infinite chain because signals from outside the interval may interfere at later times. Now $L$ is finite and the argument no longer holds: if $x(t) = \emptyset$ at some time $t$, it follows $x(t') = \emptyset$ for all later times $t' > t$. Thus the probability of $x(t) \neq \emptyset$ is monotonically decreasing in $t$. This leads to the replacement $t \to \{t\} = \min\{t, t^*_L\}$ in Lemma 3.
Note that the lower-bounded decay of the probability in Lemma 3 is to be expected for finite systems: Due to the finite state space, there is an upper bound for t (depending on L) such that the system either (1) relaxed to the clean state, (2) to a non-clean fixed point, or (3) entered a non-trivial cycle. In the first case, it is clean forever, whereas in the latter two cases, it can never become clean. Therefore the probability to be not clean cannot decrease arbitrarily and must be bounded from below for fixed L and t → ∞. However, if we are interested in the limit L → ∞, we can ask how long one has to wait for TLV to clean the system almost surely.
This leads to our main result: Corollary 1. Consider a finite chain of length $L$ on $\mathbb{L} = \{1, \dots, L\}$ governed by TLV with mirrored boundaries and initial configurations $x(0) \subseteq \mathbb{L}$ drawn from a Bernoulli distribution with parameter $p_0$. For $\kappa \in \mathbb{R}$ with $0 < \kappa < 1$, the probability of $x(t) = \mathrm{TLV}^{t}_{\mathbb{L}}(x(0))$ to be non-empty after $t_{\max}(L) = \lfloor L^{\kappa} \rfloor$ time steps is upper-bounded by Eq. (67), which vanishes exponentially fast for $L \to \infty$ if $p_0 < \tilde p_c$. The parameters are the same as in Prop. 1 and Lemma 2.
To prove this, we use the result of Lemma 3 with $t_{\max}(L) < t^*_L$ for $L \ge L_R$ large enough, thus $\{t_{\max}(L)\} = \min\{t_{\max}(L), t^*_L\} = t_{\max}(L)$. With $\lfloor L^{\kappa} \rfloor \le L$ and $\lfloor \lfloor L^{\kappa} \rfloor / m \rfloor = \lfloor L^{\kappa}/m \rfloor$ (for $m \in \mathbb{N}$), Eq. (67) follows immediately.
The important message of Corollary 1 is that the probability of successful decoding approaches one for $L \to \infty$ even if the time $t_{\max}(L)$ granted to the decoder increases only sublinearly, as long as $p_0 < \tilde p_c$. We stress that there is no statement about $p_0 \ge \tilde p_c$; Corollary 1 only asserts that there is a finite range for $p_0$ where decoding with TLV is possible and that the required decoding time $t_{\mathrm{dec}}$ scales favorably with $L$ on average.

Figure 7: (b) Presumably there is still perfect performance for $p_0 < 0.5$ and $L \to \infty$. (c) Sublinear: $t_{\max} = L^{0.5}$. The results suggest $\lim_{L\to\infty} P_{\mathrm{dec}} = 1$ for $0 \le p_0 < p_c$. Whether $p_c < \tfrac{1}{2}$ cannot be inferred from numerics (note that the light cone constraint allows for $p_c = \tfrac{1}{2}$). (d) Constant: $t_{\max} = \mathrm{const}\,(= 20)$. As dictated by the light cone constraint, there is no decoding possible for $p_0 > 0$ and consequently $\lim_{L\to\infty} P_{\mathrm{dec}} = 0$.
To conclude this subsection, we present numerical results for $P_{\mathrm{dec}}$ as a function of the microscopic error probability $0.3 \le p_0 \le 0.5$ and for different chain lengths $L = 16, \dots, 784$ in Fig. 7. As constraints we use (a) $t_{\max} = \infty$, (b) $t_{\max} = L$, (c) $t_{\max} = L^{0.5}$, and (d) $t_{\max} = \mathrm{const} = 20$. Instances that are not empty after $t_{\max}(L)$ time steps count as failed decodings, even if they are cleaned eventually (for $t \to \infty$). In addition, forbidden regions due to the light cone constraint (29) with $D = R\,t_{\max}(L)$ ($R = 4$) are shaded for $L = 16$ (orange) and $L = 784$ (black). Note that for $t_{\max} = \infty$ the time-restricted and the unrestricted decoding probabilities coincide, compare Eq. (61) and Eq. (69). All numerical results satisfy the rigorous bounds of the light cone constraint, which manifests as a weak upper bound for $p_0$ close to criticality. Note the difference between (a), (b), and (c), where the light cone constraint does not rule out successful decoding for any noncritical $p_0$ and $L \to \infty$, and (d), where it does. The rigorous results from above complete this picture by providing lower bounds that imply $\lim_{L\to\infty} P_{\mathrm{dec}} = 1$ for $p_0 < \tilde p_c$. However, we do not know the true error threshold $p_c$ except that it is larger than $\tilde p_c \approx 3.2 \times 10^{-6}$ for TLV and therefore finite (see Appendix F). Fig. 7 (a) and (b) suggest that $p_c = \tfrac{1}{2}$ for (super-)linear $t_{\max}$, which matches the performance of global majority voting (but also requires at least the same runtime scaling). In contrast, Fig. 7 (c) is compatible with a non-optimal $0 < p_c < \tfrac{1}{2}$, even though we believe that still $p_c = \tfrac{1}{2}$ due to a (slow) tendency of the crossing point towards $\tfrac{1}{2}$ for $L \to \infty$. Finally, Fig. 7 (d) confirms that $p_c = 0$ for a fixed-depth decoder, in compliance with both Lemma 3 and the light cone constraint.

Error correction for continuous noise
So far, we focused on the decoding of initial error patterns $x(0)$, the probability of success $P_{\mathrm{dec}}$ (respectively its time-restricted variant), and the time $t_{\mathrm{dec}}$ needed to clean the system. The ultimate goal, however, is the preservation of the logical qubit in the presence of continuous noise: While we assume error-free processing of classical information, decoherence of the quantum chain is a constant source of errors, characterized by the microscopic error rate $p_0$ per time step $\delta t$. In this section, we first demonstrate numerically that TLV cannot cope with such perturbations of its evolution in that the lifetime of the logical qubit grows only subexponentially with the chain length. In the second part, we resolve this problem by extending the TLV decoder into the second dimension, and show that its depth grows weakly (sublinearly) with the chain length. We conclude that shallow circuits suffice for reasonably low error rates.

Continuous noise in strictly one dimension
As a first step, we evaluate the performance of TLV as follows: Starting from an error-free chain, $x(0) = 0$, we alternately apply errors $x'(t+1) = e(t+1) \oplus x(t)$ and TLV steps $x(t+1) = \mathrm{TLV}_L(x'(t+1))$. Here $e(t+1) \in \mathbb{Z}_2^L$ is drawn from a Bernoulli distribution with parameter $p_0$ and describes the accumulated errors on the quantum chain between time $t$ and $t+1$. To quantify the ability of TLV to prevent errors from accumulating, we introduce the time to the first majority flip $T_{\mathrm{ff}}$, i.e., $\mathrm{maj}[x(T_{\mathrm{ff}})] = 1$ and $\mathrm{maj}[x(t)] = 0$ for $t < T_{\mathrm{ff}}$. Sampling over many error histories $\{e(t)\}$ yields the average $\langle T_{\mathrm{ff}} \rangle$, which characterizes the time scale over which the logical qubit survives (decay time).
Numerical results are shown in Fig. 8. In (a), $\langle T_{\mathrm{ff}} \rangle$ is plotted versus $1/p_0$ for different lengths $L$, revealing a substantial growth of the decay time for $p_0 \to 0$. In contrast, the dependence of $\langle T_{\mathrm{ff}} \rangle$ on $L$ seems to be much less pronounced. This is confirmed in (b), where $\langle T_{\mathrm{ff}} \rangle$ is plotted as a function of the system size $L$ for two error rates $p_0 = 0.050$ and $0.125$: The growth with $L$ is clearly subexponential, although the absolute scale of $\langle T_{\mathrm{ff}} \rangle$ strongly depends on the error rate. To assess the gain in decay time by using TLV, we compare it with global majority voting ($\Delta$; complete correction after each time step) and no correction at all (accumulating errors without corrective actions). As shown in Fig. 8 (b), global majority voting exhibits perfectly exponential growth of $\langle T_{\mathrm{ff}} \rangle$ and clearly outperforms TLV. The comparison of a system without corrective actions and TLV reveals that the latter does not improve on the scaling but only increases the absolute values of $\langle T_{\mathrm{ff}} \rangle$ and their susceptibility to variations in $p_0$. In conclusion, continuous noise thwarts the benefits one expects from making the quantum chain longer. This is in contrast to the previous sections, where we considered the decoding of static error patterns and found an exponentially suppressed failure rate $\bar P_{\mathrm{dec}}$ for increasing chain length.

Figure 8: (b) The same data vs. the system size $L$ for different error rates $p_0 = 0.050$ and $0.125$ (joined bold crosses and bullets for TLV). The numerics clearly suggest that there is no exponential growth of $\langle T_{\mathrm{ff}} \rangle$ for $L \to \infty$, i.e., the storage time of the encoded qubit grows considerably slower than for global majority voting ($\Delta$) with constant correction rate; for comparison, we show simulations and theory for $\Delta$ with $p_0 = 0.125$ (disjoined bullets and circles). With no correction, $\langle T_{\mathrm{ff}} \rangle$ becomes almost constant (joined small crosses and bullets for $p_0 = 0.050$ and $0.125$, respectively). For statistics, we sampled $10^3$ evolutions per data point to measure $T_{\mathrm{ff}}$; the standard error of the shown sample mean is $\sim 3\%$ such that the error bars are not visible. (c) Spacetime diagram of a large cluster without continuous noise ($p_0 = 0$) and TLV-evolution. (d) Evolution of the same initial state with continuous noise $p_0 = 0.1$ and the same scale as in (c). Note that the messaging between left and right boundary of the cluster is jammed by the noise within the cluster, leading to an effective deconfinement of the charges at the cluster boundaries.
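The measurement protocol for the time to the first majority flip, together with the two reference decoders (no correction and global majority voting), can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the decoder is passed in as an arbitrary `decoder_step` callable (the TLV rule itself is not implemented here), and $L$ is assumed to be odd so that the majority is well defined.

```python
import random


def majority(x):
    """Return 1 if more than half of the bits are 1, else 0 (odd length assumed)."""
    return int(2 * sum(x) > len(x))


def no_correction(x):
    """Reference case: errors accumulate without corrective action."""
    return x


def global_majority_vote(x):
    """Reference case: complete correction to the majority state after each step."""
    return [majority(x)] * len(x)


def time_to_first_flip(L, p0, decoder_step, rng, t_cap=10**6):
    """Alternate Bernoulli noise and one decoder step until maj[x(t)] = 1.

    Returns the first such time step, or None if no flip occurs up to t_cap.
    """
    x = [0] * L
    for t in range(1, t_cap + 1):
        e = [int(rng.random() < p0) for _ in range(L)]  # errors e(t)
        x_noisy = [xi ^ ei for xi, ei in zip(x, e)]     # x'(t) = e(t) XOR x(t-1)
        x = decoder_step(x_noisy)                       # x(t)
        if majority(x) == 1:
            return t
    return None


def mean_time_to_first_flip(L, p0, decoder_step, samples=200, seed=0):
    """Sample mean of the first-flip time over independent error histories."""
    rng = random.Random(seed)
    times = [time_to_first_flip(L, p0, decoder_step, rng) for _ in range(samples)]
    if any(t is None for t in times):
        raise ValueError("some runs did not flip before t_cap")
    return sum(times) / len(times)
```

Calling `mean_time_to_first_flip(L, 0.125, global_majority_vote)` for increasing odd `L` should show the rapid growth of the decay time described above, whereas `no_correction` should saturate quickly. A local rule such as TLV can be plugged in through the same `decoder_step` interface.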
The susceptibility of TLV to continuous noise can be most easily understood by example: Fig. 8 (c) depicts the spacetime diagram encoding the evolution of an initial cluster of errors under TLV without continuous noise; as this is the decoding procedure discussed above, TLV erodes the cluster reliably. Since TLV effectively operates on the syndrome space, it is instructive to think of the evolution as attraction and subsequent annihilation of $\mathbb{Z}_2$-charges (the syndromes, red bullets). If continuous noise is switched on [Fig. 8 (d)], the attractive interaction is screened by a bath of noise-induced charge-anticharge pairs and the cluster's endpoints are governed by an undirected, diffusive process. From a renormalization-group perspective, there is a confinement-deconfinement transition at $p_0 = 0$ which prevents the erosion of large clusters of errors, supporting their proliferation throughout the system. The susceptibility of simple one-dimensional CAs to continuous noise is a well-known phenomenon, see e.g. [36] for TLV and GKL. Indeed, due to the lack of counterexamples, it was conjectured that all one-dimensional CAs subject to noise are ergodic, that is, they eventually forget their initial state; this is known as the positive rates conjecture [44]. Peter Gács proved it wrong by providing an extraordinarily complex counterexample that relies on self-simulation [45,26,46]. To the authors' knowledge, no simpler counterexample is known to this day, and it is widely believed that any non-ergodic CA in 1D must, in some form or another, implement the core mechanisms of Gács' automaton. It is therefore highly unlikely that a simple CA (such as TLV) can retain information about its initial state for $t \to \infty$ if continuous noise is switched on. This is exactly what our numerical results suggest: The timescale $\langle T_{\mathrm{ff}} \rangle$ after which TLV forgets the initial majority does not diverge exponentially in the thermodynamic limit, an indicator for ergodicity.

Evading noise with a two-dimensional extension
To protect the evolution of TLV in the face of continuous noise, we pay with (classical) hardware by unrolling the time evolution into the (spatial) second dimension, perpendicular to the quantum chain. Then, our previous discussions and results on the time required for decoding translate directly into statements about the scaling of the depth of this "overhead dimension". We start with a description of the envisioned setup in Fig. 9. Note that the details of its implementation, described in the next few paragraphs, serve as a proof of principle only and may be subject to optimizations depending on the physical setup chosen for its realization. The quantum chain is placed on top of a 2D substrate that hosts a classical two-layer circuit parallel and attached to the chain. The circuit has the topology of a cylinder glued to the quantum chain, see Fig. 9 (a); we illustrate both layers by slicing the cylinder along the chain and unfolding the circuit into the plane. The logical wiring of the circuit is sketched in Fig. 9.

Figure 9: 2D-evolved TLV. (a) The quantum chain is placed on top and framed by a 2D substrate that allows for the implementation of classical two-layer circuitry (e.g., by photolithography) that connects to projective (measurement) and unitary gates along the chain. The classical circuits are used to process measurement results and control unitary gates in an integrated, scalable fashion. For illustrative purposes, the two layers are drawn unfolded with a copy of the quantum chain at top and bottom. The length of the chain is $L$ and the depth of the (unfolded) circuit is denoted by $D_L$, the scaling of which is discussed in the text. (b) Detailed setup to fight continuous noise on the quantum chain. Information propagates in a feed-forward manner from top (syndrome measurements) to bottom (correction operations). At the beginning of each time step, the syndrome pattern from projective measurements is fed into the syndrome register (Substep 1, red arrows). Subsequently, the content of all horizontal registers is evolved to the next layer (Substep 2, black arrows). Finally, the results in the final correction register are applied to the quantum chain (Substep 3, red arrows). The initial syndrome register is fed by the parity of syndrome memory, syndrome register, and the syndrome of the last applied correction in the final correction register (indicated by the yellow box). The shading of classical bits (squares and circles) from black to white (or vice versa) illustrates the typical operation of the circuit: The syndrome register starts off in a non-empty state at the top whereas the cumulative correction register is initialized with all bits zero. Propagation to the bottom transforms the correction register into the non-trivial result of the decoding procedure while depleting the syndrome register. The latter reaches an empty state at the bottom (with high probability, see text). The elements marked by (*) are drawn twice for illustrative purposes and exist only once in hardware (see a). (c) Detailed logical flow in the scalable bulk that defines the new state of the next row in dependence of the state of the last row. The shown operations implement the evolution of TLV in syndrome-delta representation directly on the syndrome registers (squares) while accumulating all applied operations on the qubits in the correction registers (circles). Note that actual qubit rotations are only applied once at the bottom, defined by the state of the final correction register. For the sake of compactness, we write $c^t_i = c_i(t)$ etc. as compared to the text. (d) Besides the ubiquitous XOR-gates, the only additional gate required is the majority gate with three inputs (MAJ3-gate), which can be easily translated into a network of elementary AND- and OR-gates.

The rules defining the classical automaton (applied in Substep 2) can be divided roughly into two functional classes [see Fig. 9 (b) Substep 2]. The first is independent of the depth $D_L$ and located close to the chain. It consists of the syndrome register (upper layer), syndrome memory (upper layer), and the final correction register (lower layer) and computes the syndrome of errors that accumulated since the last syndrome measurement, taking into account the correction operations at the end of the last step. Formally,

$$s(t+1) \oplus s(t) \oplus \partial c(t) = \partial\left[x(t+1) \oplus x(t) \oplus c(t)\right] \qquad \text{(70a)}$$
$$\phantom{s(t+1) \oplus s(t) \oplus \partial c(t)} = \partial e(t+1), \qquad \text{(70b)}$$

where $s(t+1)$ denotes the newly measured syndrome (in the syndrome register), $s(t)$ is the previously measured syndrome (in the syndrome memory), and $c(t)$ encodes the previously applied correction (in the final correction register); the (inaccessible) error configuration is $x(t)$ and $e(t+1)$ denotes the accumulated errors during $[t, t+1]$. Eventually, the syndrome memory is overwritten with the values of the syndrome register.
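The bookkeeping of the first functional class can be checked numerically. The sketch below rests on two labeled assumptions: the syndrome map $\partial$ is taken to be the nearest-neighbor parity $s_i = x_i \oplus x_{i+1}$, and the error state is assumed to evolve as $x(t+1) = x(t) \oplus c(t) \oplus e(t+1)$. All function names are hypothetical.

```python
import random


def boundary(x):
    """Assumed syndrome map: s_i = x_i XOR x_{i+1} on an open chain."""
    return [a ^ b for a, b in zip(x, x[1:])]


def xor(u, v):
    """Bitwise XOR of two equally long bit lists."""
    return [a ^ b for a, b in zip(u, v)]


def new_error_syndrome(s_new, s_old, c_old):
    """Parity of syndrome register, syndrome memory, and syndrome of the last correction."""
    return xor(xor(s_new, s_old), boundary(c_old))


def check_bookkeeping(L, trials=100, seed=0):
    """Verify that the parity isolates the syndrome of the newly accumulated errors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x_old = [rng.randint(0, 1) for _ in range(L)]  # inaccessible error state x(t)
        c_old = [rng.randint(0, 1) for _ in range(L)]  # last applied correction c(t)
        e_new = [rng.randint(0, 1) for _ in range(L)]  # new errors e(t+1)
        x_new = xor(xor(x_old, c_old), e_new)          # assumed update of the error state
        result = new_error_syndrome(boundary(x_new), boundary(x_old), c_old)
        if result != boundary(e_new):
            return False
    return True
```

The check succeeds for any linear syndrome map, because linearity over $\mathbb{Z}_2$ is the only property used. This is why neither older errors nor previous corrections enter the initial syndrome register.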
The result (70b) describes only errors that occurred in the previous time interval $[t, t+1]$ and ignores both older errors (which are already taken into account) and previous corrections (which are not to be "corrected"). (70b) is fed into the first row (initial syndrome register) of the second sector, a translationally invariant 2D circuit (except for the boundaries) with freely adjustable depth $D_L$. Its purpose is to simulate TLV in the syndrome-delta representation in a feed-forward manner, from top to bottom in Fig. 9 (b), where $\Delta = \partial\mathrm{TLV}_L(s)$ is accumulated modulo 2 in the correction registers (circles) and $s' = \partial\Delta \oplus s$ is written into the syndrome register of the next row. $\partial\mathrm{TLV}_L$ is given by Eq. (54) and the MBC modifications in (55) and (56), and can be implemented as illustrated in Fig. 9 (c): Notably, there are only two types of logic gates required, both of which can be easily reduced to elementary gates. The XOR-gates ($\oplus$) are equivalent to $X \oplus Y = (X \vee Y) \wedge \neg(X \wedge Y)$, while the majority gates on three bits (MAJ3) can be rewritten in terms of AND- and OR-gates, see Fig. 9 (d). The XOR-gate can be realized in established CMOS technology with only 3 transistors [66,67], while the MAJ3-gate requires about 14 transistors. However, going beyond CMOS may be beneficial [68], depending on the environment preferred by the coherent subsystem (the quantum chain): Whereas the MAJ3-gate is rather complex in CMOS technology, it becomes an elementary logic gate in the framework of quantum-dot cellular automata [69]. The second sector evolves TLV on $\partial e(t+1)$ in space and thereby circumvents the noise-induced deconfinement because subsequent errors $e(t+2), \dots$ are tackled by a completely decoupled evolution of TLV. If TLV decodes the syndrome $\partial e(t+1)$ successfully in $t_{\mathrm{dec}} \le D_L$ time steps, the final syndrome register is empty at $t^* = t + 1 + D_L$ and the final correction register contains $c(t^*) = e(t+1)$, which is applied to the quantum chain in Substep 3 to cancel the errors $E(e(t+1))$.
We point out that the occurrence of errors e and the application of corresponding corrections c are separated by D L time steps, the depth of the circuit, which reflects the finite speed of information transfer in a spatially extended decoder [recall Fig. 2 (c)]. Since errors and correction operations commute, this is not an issue.
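The reduction of the two gate types to elementary gates can be verified by exhaustive truth tables. In the sketch below, the XOR identity is the one quoted in the text, whereas the AND/OR form of MAJ3 is the textbook decomposition and is an assumption about which form the circuit of Fig. 9 (d) uses; the function names are hypothetical.

```python
from itertools import product


def xor_gate(x, y):
    """XOR from elementary gates: (X OR Y) AND NOT (X AND Y)."""
    return (x or y) and not (x and y)


def maj3_gate(x, y, z):
    """Majority of three bits from AND/OR gates (assumed textbook form)."""
    return (x and y) or (y and z) or (x and z)


def gates_are_correct():
    """Compare both gates with their defining truth tables."""
    bits = (False, True)
    xor_ok = all(xor_gate(x, y) == (x != y) for x, y in product(bits, repeat=2))
    maj_ok = all(maj3_gate(x, y, z) == (x + y + z >= 2)
                 for x, y, z in product(bits, repeat=3))
    return xor_ok and maj_ok
```

Other equivalent decompositions exist, for example $(X \wedge Y) \vee (Z \wedge (X \vee Y))$, which saves one AND-gate; which one is preferable depends on the target technology.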
We demonstrate the evolution of the complete circuit for a single (minority) cluster of errors without continuous noise in Fig. 10 on a chain of length $L = 10$ with depth $D_L = 5$. Note how the syndrome memory prevents the automaton from issuing multiple instances of the same computation (Frames 4-6), and how the final correction register prevents the "correction of the correction" in Frame 9. Most importantly, the classical subsystem only uses syndrome information and is not aware of the actual error pattern (which we plot as blacked out qubits for convenience only).

Figure 10: (See Fig. 9 (b) for a description of the setup.) The initial syndrome register and the final correction register are highlighted yellow and green, respectively. A copy of the final correction register is reproduced between syndrome register and syndrome memory (gray) to emphasize that the initialization of the initial syndrome register depends on all of them. Shown are 6 time steps in total, each consisting of three substeps: syndrome measurement, a single step of the 2D CA, and a unitary correction. We omit trivial substeps (measurements and corrections), indicated by broken arrows. The first time step comprises frames 1-3 where the correction step is omitted at the end. Time steps 2-4 are shown in frames 4-6 where both measurements and corrections are omitted. The 5th time step starts in frame 7 and ends with the first (and last) non-trivial correction in frame 8. The 6th and final time step starts with a non-trivial syndrome measurement in frame 9 and resets the CA to the empty fixed point in frame 10. Details are given in the text.
Our setup fails to protect the qubit if, at some point in time, an error pattern accumulates during a single time step which cannot be successfully corrected by TLV within D L time steps. This may be because it is eroded to 1 instead of 0 or there are syndromes left after D L time steps so that residual errors survive. The last case splits into two subcases: First, TLV might have succeeded and reached 0 after t dec > D L time steps, or, second, the initial configuration was in the attractor of a cycle such that no correction was possible anyway (even for t → ∞).
The time that quantifies the performance of our setup is then the time-to-first-failure $T_{\mathrm{tff}}$ ("decay time"), i.e., the time after which the first uncorrectable (in the above sense) error pattern appears. Its expectation value is given by Eq. (74), the mean of a geometric distribution with parameter $\bar P_{\mathrm{dec}}$, where $\bar P_{\mathrm{dec}} = 1 - P_{\mathrm{dec}}$ denotes the restricted failure probability of TLV with $t_{\max}(L) = D_L$, as discussed previously. Eq. (74) follows because each time step corresponds to a Bernoulli sample independent of the previous error patterns, a consequence of the spatial evolution of TLV in our 2D circuit. In Fig. 11 we show simulations of $\bar P_{\mathrm{dec}}$ as a function of $L$ for fixed error rate $p_0 = 0.2$ and four different depth scalings $D_L$ [see also Fig. 7]. Decreasing failure probabilities $\bar P_{\mathrm{dec}}$ translate via Eq. (74) into growing decay times $\langle T_{\mathrm{tff}} \rangle$: A constant-depth decoder with $D_L = 20$ does not benefit from longer quantum chains, whereas both "infinite depth" and linear depth decoders perform similarly and yield exponentially increasing decay times $\langle T_{\mathrm{tff}} \rangle$. Decoders with slowly growing algebraic depths, such as the shown $D_L = L^{0.5}$, still exhibit exponential growth of $\langle T_{\mathrm{tff}} \rangle$, although weaker than that of global decoders with $D_L \gtrsim L$.

Figure 11: Failure probability for 2D-evolved TLV. Failure probability $\bar P_{\mathrm{dec}}$ of decoding error configurations with a 2D-evolved TLV-decoder of depth $D_L$ as a function of the system size $L$ for microscopic error rate $p_0 = 0.2$. Note that failed decodings include both syndrome-free states with corrupted logical qubit and states with residual syndromes. We sampled $5 \cdot 10^6$ instances per data point. The gray rectangle (dashed boundary) is shown as inset. The solid and dashed lines show the analytic functions $e^{-L^{0.32}}$, $1/L$, and $e^{-0.15L}$ as a guide to the eye; these are neither fits nor analytical results. We compare setups with constant depth $D_L = \mathrm{const} = 20$ (empty squares) and sublinear depth $D_L = L^{0.5}$ (filled squares). Note that the curves intersect for $L = 400$ because $\sqrt{400} = 20$. The inset depicts the much faster decreasing cases of unbounded depth $D_L = \infty$ (filled circles) and linear depth $D_L = L$ (empty circles). There is no qualitative difference between the two for the shown parameters.
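The translation from failure probabilities into decay times can be illustrated with a short numerical sketch. It assumes the convention that a failure in time step $t$ counts as $T_{\mathrm{tff}} = t$, so that independent steps give a geometric distribution with mean $1/\bar P_{\mathrm{dec}}$; whether Eq. (74) uses exactly this convention is an assumption. The function names are hypothetical, and the three failure laws are the guide-to-the-eye functions of Fig. 11, not fits.

```python
import math


def expected_decay_time(p_fail):
    """Mean of a geometric distribution: independent failures with probability p_fail."""
    return 1.0 / p_fail


def truncated_geometric_mean(p_fail, t_max):
    """Partial sum of t * (1 - p_fail)^(t-1) * p_fail, as a cross-check of the mean."""
    return sum(t * (1.0 - p_fail) ** (t - 1) * p_fail for t in range(1, t_max + 1))


# Guide-to-the-eye failure laws from Fig. 11 (illustrative only).
failure_laws = {
    "stretched exponential": lambda L: math.exp(-L ** 0.32),
    "algebraic": lambda L: 1.0 / L,
    "exponential": lambda L: math.exp(-0.15 * L),
}


def decay_times(L):
    """Decay times implied by each illustrative failure law at system size L."""
    return {name: expected_decay_time(law(L)) for name, law in failure_laws.items()}
```

For example, `decay_times(100)` compares the three laws at a single system size. A failure probability that decays like a stretched exponential in $L$ still yields a decay time that outgrows any power of $L$.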

Conclusion
Motivated by the requirement for scalable and modular decoders for topological quantum memories, we set out to construct a strictly local decoder for the one-dimensional Majorana chain quantum code. As the latter constitutes the quantum analogue of the classical repetition code, it can be efficiently decoded and stabilized by global majority voting if spatio-temporal constraints are ignored. Taking into account the time needed for classical syndrome processing and communication suggests the implementation of decoders as cellular automata. We argued that the decoding problem at hand translates into the problem of one-dimensional density classification with the additional symmetry constraint of self-duality; this led us to the two-line voting automaton TLV as a promising local decoder. We equipped the latter with mirrored boundaries (called TLV) to comply with the requirement of open boundaries on the level of the quantum chain.
Both numerics and rigorous analytical results showed that TLV succeeds in decoding Bernoulli random patterns with exponentially vanishing failure rate as $L \to \infty$. Whereas the rigorous results are restricted to small but finite microscopic error probabilities $p_0 < \tilde p_c \approx 3.2 \times 10^{-6}$, numerics suggest that $p_0 < \tfrac{1}{2}$ may be enough for successful decoding. In addition, the time needed for decoding scales sublinearly for $p_0 < \tfrac{1}{2}$. In particular, we proved that the failure rate for decoding a code of length $L$ in at most $t \propto L^{\kappa}$ time steps ($\kappa > 0$ arbitrary) vanishes exponentially with $L \to \infty$ for small but finite $p_0 < \tilde p_c$. In a nutshell: for low error rates, global majority voting is not required.
In the final section we investigated the performance of TLV in the presence of continuous noise. In accordance with the expected ergodicity of simple, one-dimensional cellular automata, we argued that TLV cannot fight continuous noise because long-range communication is cut off by locally created charge-anticharge pairs. As a consequence, we had to evolve TLV into the second dimension to prevent errors from accumulating during the syndrome processing. Thereby the superior (i.e., sublinear) scaling of decoding times for TLV-as opposed to the linear scaling of global majority voting-was turned into a modest scaling of classical hardware overhead: For reasonably low error rates, simple, shallow circuits, lacking the capability of global communication, can replace the hardware-expensive global majority voting. These results add to the quest of scalable and modular realizations of actively corrected topological quantum memories.

A Cumulative Bernoulli distribution
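Here we prove some of the statements about global majority voting used in Sections 2.2 and 2.3 of the main text. To this end, we start with the probability for more than half of $L$ (odd) binary sites $x_i$ to be error afflicted (i.e., in state $x_i = 1$) after $t$ rounds of additive, uncorrelated Bernoulli noise:

$$P_{L,p_0}(t) = \sum_{k=(L+1)/2}^{L} \binom{L}{k} \left[p^{(t)}\right]^{k} \left[1 - p^{(t)}\right]^{L-k}, \qquad \text{(75)}$$

where

$$p^{(t)} = \tfrac{1}{2}\left[1 - (1 - 2p_0)^{t}\right]$$

is the renormalized single-site probability for $X(t) = X_1 \oplus \dots \oplus X_t$ with $X_i$ Bernoulli random variables with parameter $p_0$, i.e., $\Pr(X(t) = 1) = p^{(t)}$. Eq. (75) is a special case of the cumulative Bernoulli distribution function, which is known to be expressible in closed form by the incomplete beta function,

$$P_{L,p_0}(t) = I_{p^{(t)}}\!\left(\tfrac{L+1}{2}, \tfrac{L+1}{2}\right), \qquad \text{(79)}$$

where $I_x(a, b) = B(x; a, b)/B(1; a, b)$ is called the regularized incomplete beta function. With $a, b \in \mathbb{N}$ we can use $B(1; a, b) = (a-1)!\,(b-1)!/(a+b-1)!$ to evaluate $B\!\left(1; \tfrac{L+1}{2}, \tfrac{L+1}{2}\right) = \left[\left(\tfrac{L-1}{2}\right)!\right]^2 / L!$. Then Eq. (79) reads

$$P_{L,p_0}(t) = \frac{L!}{\left[\left(\tfrac{L-1}{2}\right)!\right]^{2}} \int_0^{p^{(t)}} \left[u(1-u)\right]^{\frac{L-1}{2}} \mathrm{d}u,$$

which is a useful form to derive estimates and limits of Eq. (75). In particular, we can now derive the limit $\lim_{L\to\infty} P_{L,p_0}(c^{-1}L)$ for $0 < p_0 \le \tfrac{1}{2}$ and $0 < c < \infty$ if we use the asymptotic expression (Stirling formula) for the prefactor and a substitution of the form $x = \tfrac{L-1}{2} u$ in the integral. We find for $0 < p_0 \le \tfrac{1}{2}$ and $0 < c < \infty$ the final result $\lim_{L\to\infty} P_{L,p_0}(c^{-1}L) = \tfrac{1}{2}$. Note that $\lim_{L\to\infty} P_{L,p_0}(c^{-1}L) > 0$ only because $p^{(t)}$ renormalizes to $\tfrac{1}{2}$ exponentially fast with $t$, such that the upper bound of the integral converges to zero. Similarly, for $\lim_{L\to\infty} P_{L,p_0}(t = 1)$ one easily re-derives the exponential decay to zero (modified by $\sqrt{L}$ from the integral bounds).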
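The quantities of this appendix are easy to evaluate directly. The following sketch sums the binomial tail explicitly instead of using the incomplete beta function; the function names are hypothetical, and the closed form for the renormalized single-site probability is the standard result for the parity of $t$ independent Bernoulli bits.

```python
from math import comb


def renormalized_probability(p0, t):
    """Probability that the XOR of t independent Bernoulli(p0) bits equals 1."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** t)


def majority_flip_probability(L, p0, t):
    """Probability that more than half of L (odd) sites are flipped after t noise rounds."""
    p = renormalized_probability(p0, t)
    return sum(comb(L, k) * p ** k * (1.0 - p) ** (L - k)
               for k in range((L + 1) // 2, L + 1))


def xor_recursion_step(p, p0):
    """One more noise round: the bit is 1 if exactly one of (old bit, new error) is 1."""
    return p * (1.0 - p0) + (1.0 - p) * p0
```

For a single round, `majority_flip_probability(L, p0, 1)` decays quickly with growing odd `L` when `p0 < 0.5`. For a number of rounds proportional to `L`, the renormalized probability is numerically indistinguishable from 1/2 and the result approaches 1/2.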

B Light cone constraint

B.1 Derivation
Here we prove the following upper bound for the decoding probability P dec of a D-local physical decoder of linear size L: The microscopic error probability per qubit and time step is p 0 . Let the error pattern be described by the vector x of length L with syndrome s = ∂x. A given D-local decoder ∆ D then calculates a correction ∆ D (s) such that ∆ D (s) ⊕ x describes the new error state after the correction has been applied. For an arbitrary but fixed site 1 ≤ i ≤ L we define two sets: i and X i describe the sets of all error patterns that ∆ D (un)successfully corrects at site i, respectively. We define the local complement operator C D i such that for i.e., it inverts the error pattern in a region of radius D around site i. Clearly C D i • C D i = 1, such that C D i defines a bijection on the total error state space be the projector that slices the range on which C D i acts non-trivially from a state x. We have ∂x = ∂C D i x due to the boundaries of the partial complement. However, since the syndrome does not change inside the range of the local complement. We can now define the two setsX Since C D i is a bijection, we still have X =X i∪X i . Now comes a crucial step: Because ∆ D is D-local, its action on site i only depends on the syndromes within π D i x, i.e., ∂π D i x. Therefore we find that if x ∈ X i is successfully corrected at site i, then ∆ D fails to correct C D i x because its action on site i is the same. In a nutshell, Thus we haveX where is the change of errors in the light cone due to the local complement. Thus We start from the trivial relation and rewrite the first term which yields So far, all statements are exact and valid for 0 ≤ p 0 ≤ 1. Now we assume 0 ≤ p 0 ≤ 1 2 and estimate p where we used that |π D i x| ≤ 2D + 1 and p 0 /(1 − p 0 ) ≤ 1 for p 0 ≤ 1 2 . We have and therefore the lower bound on the error probability Note that this is the probability that an error at site i survives a single correction procedure with ∆ D . 
This lower bound can be easily recast as an upper bound on the probability of successful correction,

$$\Pr(X_i) \le \left[1 + \left(\frac{p_0}{1-p_0}\right)^{2D+1}\right]^{-1} .$$

The last step is to use this result for an upper bound on the global correction probability

$$P_\mathrm{dec} = \Pr\Big(\bigcap_{i=1}^{L} X_i\Big) = \mathbb{E}\Big[\prod_{i=1}^{L} 1_{X_i}\Big] \neq \prod_{i=1}^{L} \mathbb{E}\big[1_{X_i}\big] ,$$

where $1_{X_i}$ denotes the indicator function of $X_i$. The last inequality follows from the fact that $1_{X_i}$ and $1_{X_j}$ may be correlated random variables for $|i - j| < 2D + 1$, i.e., if their past light cones overlap and they depend on common syndrome measurements. This motivates the second estimate

$$P_\mathrm{dec} \le \mathbb{E}\Big[\prod_{k=1}^{L/(2D+1)} 1_{X_{k(2D+1)}}\Big] ,$$

where we assume for simplicity that $L$ is a multiple of $2D + 1$. We can separate the system into subsystems $x_k$ of length $2D + 1$ such that $1_{X_{k(2D+1)}}(x) = 1_{X_{k(2D+1)}}(x_k)$. Here we use the fact that the correctability of site $k(2D + 1)$ only depends on a causal region of radius $D$. The last step is to realize that $\Pr_{p_0}(x)$ is a product measure due to the uncorrelated Bernoulli process, so that the expectation value of the product factorizes. Using translational invariance and our result Eq. (105), the final result follows:

$$P_\mathrm{dec} \le \left[1 + \left(\frac{p_0}{1-p_0}\right)^{2D+1}\right]^{-L/(2D+1)} .$$

Note that this result is generic: we used only that (1) the decoder has access only to the syndrome $\partial x$, which is invariant under complementation of error patterns, and (2) the correction of site $i$ depends only on nearby syndromes in the neighborhood $\pi_i^D x$.
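To get a feeling for how restrictive the light cone constraint is, the bound can be evaluated numerically. The following Python sketch assumes the bound has the form $P_\mathrm{dec} \le [1 + (p_0/(1-p_0))^{2D+1}]^{-L/(2D+1)}$, which is the expression whose limits are analyzed in Appendix B.2; the function names are hypothetical and chosen only for illustration.

```python
def single_site_bound(p0: float, D: int) -> float:
    """Upper bound on the probability that a D-local decoder corrects one site.

    Assumes 0 <= p0 <= 1/2, as in the derivation of the bound.
    """
    if not 0.0 <= p0 <= 0.5:
        raise ValueError("the bound is derived for 0 <= p0 <= 1/2")
    q = p0 / (1.0 - p0)                 # odds ratio of a single-site error
    return 1.0 / (1.0 + q ** (2 * D + 1))


def decoding_bound(p0: float, D: int, L: int) -> float:
    """Upper bound on P_dec for a chain of L sites (assumed form, see lead-in).

    The chain is treated as L / (2D + 1) independent blocks of width 2D + 1.
    """
    blocks = L / (2 * D + 1)
    return single_site_bound(p0, D) ** blocks


if __name__ == "__main__":
    # Constant depth: the bound decays with the chain length for any p0 > 0.
    for L in (30, 300, 3000):
        print(L, decoding_bound(p0=0.1, D=1, L=L))
```

For fixed $D$ the printed values decrease with $L$, in line with the statement that a constant-depth local decoder cannot succeed in the thermodynamic limit.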

B.2 Scaling behavior
Here we derive some scaling limits of Eq. (110). To this end, we assume $D = D(L)$ to be a function of the linear size $L$ of the code. There are three major cases:
• $D = \mathrm{const}$. This describes a truly one-dimensional feed-forward circuit of finite depth $D$. We find in the thermodynamic limit $\lim_{L\to\infty} P_\mathrm{dec} = 0$ for $p_0 > 0$, i.e., there is no successful decoding possible for any finite microscopic error rate $p_0 > 0$.
• $2D + 1 \sim L^\kappa$ ($\kappa > 0$). This describes a truly two-dimensional feed-forward circuit, possibly slowly growing in the second dimension if $\kappa \approx 0$. We find in the thermodynamic limit

$$\lim_{L\to\infty} P_\mathrm{dec} \le \begin{cases} 0 & \text{for } p_0 = \tfrac{1}{2} \text{ and } \kappa < 1 \\ \tfrac{1}{2} & \text{for } p_0 = \tfrac{1}{2} \text{ and } \kappa = 1 \\ 1 & \text{otherwise,} \end{cases}$$

i.e., except for the critical point $p_0 = \tfrac{1}{2}$, there is no constraint on $P_\mathrm{dec}$ coming from Eq. (110). At the critical point, the upper bounds depend on whether the second dimension scales slower or faster than the length of the chain. For faster scaling depth, there is no constraint, whereas for slower scaling depth, non-trivial upper bounds arise. Note that $P_\mathrm{dec} \le \tfrac{1}{2}$ follows for $p_0 = \tfrac{1}{2}$ since a completely mixing Bernoulli process destroys all encoded information about the majority. $P_\mathrm{dec} < \tfrac{1}{2}$ arises whenever the decoder fails to get rid of all syndromes. $P_\mathrm{dec} = \tfrac{1}{2}$ can be realized if the decoder succeeds in removing all syndromes but still fails to recover the original state in 50% of the cases.
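To prove the result for $0 \le p_0 < \tfrac{1}{2}$, we write $p_0/(1 - p_0) = q$ with $0 \le q < 1$. First, note that $\lim_{L\to\infty} (1 + q^{L^\kappa})^{-L^{1-\kappa}} \le 1$ because of $q \ge 0$. Furthermore, $-L^\kappa \log q \ge \log L$, i.e., $q^{L^\kappa} \le 1/L$, for $q < 1$, $\kappa > 0$ and $L$ large enough. This allows us to estimate $(1 + q^{L^\kappa})^{-L} \ge (1 + 1/L)^{-L} \ge 1/e$, where we used that $(1 + 1/L)^L$ increases monotonically to $\lim_{L\to\infty} (1 + 1/L)^L = e$. With this result, we can find a lower bound as follows: $(1 + q^{L^\kappa})^{-L^{1-\kappa}} = [(1 + q^{L^\kappa})^{-L}]^{L^{-\kappa}} \ge e^{-L^{-\kappa}} \to 1$ for $\kappa > 0$. In conclusion, we have shown $\lim_{L\to\infty} (1 + q^{L^\kappa})^{-L^{1-\kappa}} = 1$ for $q < 1$ and $\kappa > 0$.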
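• $2D + 1 \sim \log L^\kappa = \kappa \log L$ ($\kappa > 0$). This still describes a two-dimensional feed-forward circuit, but with an exponentially smaller second dimension. In a certain sense, it interpolates between the one- and two-dimensional cases above. Indeed,

$$\lim_{L\to\infty} P_\mathrm{dec} \le \begin{cases} 1 & \text{for } p_0 \le p_c \\ 0 & \text{for } p_0 > p_c , \end{cases}$$

where the critical microscopic error rate is $p_c = 1/(1 + e^{1/\kappa})$. To show this, we write $(1 + q^{\kappa \log L})^{-L/(\kappa \log L)} = (1 + L^{-\eta})^{-L/(\kappa \log L)}$ with $q = p_0/(1-p_0)$ and $\eta = -\kappa \log q$. Using $\lim_{x\to 0} \log(1+x)/x = \lim_{x\to 0} 1/(1+x) = 1$, we find

$$\lim_{L\to\infty} \log\left[(1 + L^{-\eta})^{-L/(\kappa \log L)}\right] = -\lim_{L\to\infty} \frac{L^{1-\eta}}{\kappa \log L}$$

for $\eta > 0$. Hence the bound tends to 1 for $\eta \ge 1$ and to 0 for $\eta < 1$. The critical value $\eta_c = 1$ corresponds to $-\kappa \log q_c = 1 \Leftrightarrow q_c = e^{-1/\kappa}$ and therefore $p_c = 1/(1 + e^{1/\kappa})$. Whereas constant depth allows for no correction if $p_0 > 0$, and algebraically growing $D$, in principle, imposes no restriction at all (except for $p_0 = \tfrac{1}{2}$ of course), a logarithmically growing depth could still be sufficient for low enough error rates $p_0 \le p_c < \tfrac{1}{2}$.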
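The logarithmic case can be checked numerically. The sketch below assumes the bound $[1 + q^{2D+1}]^{-L/(2D+1)}$ with $q = p_0/(1-p_0)$ and treats $2D + 1 = \kappa \log L$ as a real number for simplicity; the function names are hypothetical. It works with the logarithm of the bound to avoid floating-point underflow for large $L$.

```python
import math


def critical_rate(kappa: float) -> float:
    """Critical error rate p_c = 1 / (1 + exp(1 / kappa)) for 2D + 1 ~ kappa log L."""
    return 1.0 / (1.0 + math.exp(1.0 / kappa))


def log_bound_log_depth(p0: float, kappa: float, L: float) -> float:
    """Logarithm of the assumed bound for a light cone of width kappa * log(L)."""
    q = p0 / (1.0 - p0)
    width = kappa * math.log(L)          # plays the role of 2D + 1
    return -(L / width) * math.log1p(q ** width)


if __name__ == "__main__":
    kappa = 1.0
    print("p_c =", critical_rate(kappa))
    for p0 in (0.1, 0.4):                # below and above p_c for kappa = 1
        print(p0, [log_bound_log_depth(p0, kappa, L) for L in (1e3, 1e6, 1e9)])
```

For $\kappa = 1$ the critical rate is $1/(1+e) \approx 0.269$. Below it, the logarithm of the bound approaches zero with growing $L$ (no constraint); above it, the logarithm diverges to $-\infty$ (no successful decoding).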

C Linear eroders on finite chains
Regarding the eroder property of a finite chain, we have to relax the definition to account for the finiteness of the system, as there is no qualitative difference between perturbation and background (both of which are finite). A possible modification reads as follows:
Definition 3. A cellular automaton on a finite chain $\mathcal{L} = \{1, \dots, L\}$ (with arbitrary boundary conditions) is a finite-size linear eroder if there exist real constants $0 < a < 1$ and $m \in \mathbb{R}^+$ such that for any size $L < \infty$ and any finite perturbation of $\mathbf{0}$ ($\mathbf{1}$) with diameter $l \le aL$, the unperturbed state $\mathbf{0}$ ($\mathbf{1}$) is recovered at $t_\mathrm{dec} \le m\,l$.
Here we focus on TLV. Clearly, a contiguous cluster of errors touching the mirror is eventually eroded by the modified TLV rules if it is small enough (so that no signal reaches the opposite boundary before dissolving). Since the majority function is monotonic (changing an input bit 0 → 1 never changes the output bit from 1 → 0), the evolution of TLV from a non-contiguous, finite cluster of errors can be constructed from the evolution of a contiguous cluster of the same size (its convex hull) by erasing errors in the spacetime diagram. This implies that TLV is an eroder in the above sense.
We can make this statement more rigorous [see Fig. 12 (a)]: It is straightforward to verify that a finite cluster $I$ of diameter $|I| = l$ (without loss of generality contiguous, due to monotonicity) is eroded by TLV in a spacetime rectangle of dimensions $[(2Rm + 1)\,l] \times (m\,l)$ for appropriately chosen $m \in \mathbb{R}^+$ (for TLV it is $m = 1$ and $R = 4$, see Appendix F). This also holds for the modified TLV on the finite chain if $I$ is separated from the edges by more than $\delta(l) \equiv Rml$ sites, since $(2Rm + 1)\,l = l + 2\delta(l)$ guarantees that the cluster is eroded before the boundaries can have any effect, see Fig. 12 (a-1). Necessary for this situation is $l + 2\delta(l) = (2Rm + 1)\,l \le L$ [Eq. (121)].

Figure 12: Proofs. a Eroder property for a finite system with mirrored boundary conditions. Errors in the cluster $I$ are marked red. Without loss of generality, $I$ can be made contiguous by padding holes with additional errors (black). An explanation is given in Appendix C. b Illustration of the implication in Eq. (130b): An element $i \in x \setminus \mathcal{I}^x_{\le l}$ requires the existence of another element $j^* \in x \setminus \mathcal{I}^x_{\le l-1}$ in a specific range given by $k$ and $l$. c Explanation tree on the semi-infinite chain with mirrored boundary condition. Cells in state 0 (1) are marked white (red). The probability to realize the shown explanation tree by a mirrored Bernoulli process with rate $p = p_0$ is labeled by Pr. Note that the shown error pattern (red) fails to realize this explanation tree (arrows) since some explanatory sites are empty. Details are given in Appendix E.

If, on the other hand, $I$ is closer than $\delta(l)$ to one of the edges, we can no longer guarantee that it can be eroded in the neighborhood given by $\delta(l)$ due to possible interactions with its mirror image. We define a padded interval $I' \supseteq I$ of length $l \le l' < l + \delta(l) = (Rm + 1)\,l$ that closes the gap between $I$ and the critical edge. Now we know that this interval is eroded in a spacetime box of dimensions $[(2Rm + 1)\,2l'] \times (m\,2l')$ due to the mirror. In the "real" chain, this accounts for an interval of length $l' + \delta(2l') = (2Rm + 1)\,l'$ adjacent to the corresponding edge. If the latter does not make contact with the opposite edge, the original cluster $I$ is guaranteed to be eroded, see Fig. 12 (a-2). We have the sufficient condition $(2Rm + 1)\,l' \le L$ [Eq. (122)]. If we require instead $(2Rm + 1)(Rm + 1)\,l \le L$, this implies Eq. (122) for all critical lengths $l' < l + \delta(l)$ and Eq. (121) trivially. We conclude that with $a \equiv (2Rm + 1)^{-1}(Rm + 1)^{-1}$ any cluster of diameter $l \le aL$ is eroded in finite time (linear in $l$). For TLV we find $a = 1/45 \approx 0.02$ (which is an extremely conservative lower bound; simulations suggest that TLV allows for much larger values of $a$). Note that if the interval $l' + \delta(2l')$ is larger than the system [Fig. 12 (a-3)], it is possible that the cluster relaxes into non-homogeneous fixed points or non-trivial cycles. This is a consequence of the two mirrors, which allow for the periodic reflection of messages in this "CA cavity".
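The value $a = 1/45$ follows directly from $a = (2Rm + 1)^{-1}(Rm + 1)^{-1}$ with $R = 4$ and $m = 1$. As a small illustration, the following Python sketch evaluates this expression exactly; the function names are hypothetical, and $m$ is restricted to integers here for simplicity (the text allows $m \in \mathbb{R}^+$).

```python
from fractions import Fraction


def eroder_fraction(R: int, m: int) -> Fraction:
    """Conservative fraction a = 1 / ((2Rm + 1)(Rm + 1)) of the chain length."""
    return Fraction(1, (2 * R * m + 1) * (R * m + 1))


def max_guaranteed_diameter(R: int, m: int, L: int) -> int:
    """Largest cluster diameter l <= a * L that is guaranteed to be eroded."""
    return L // ((2 * R * m + 1) * (R * m + 1))


if __name__ == "__main__":
    a = eroder_fraction(R=4, m=1)        # TLV parameters quoted in the text
    print(a, float(a))
    print(max_guaranteed_diameter(R=4, m=1, L=900))
```

For a chain of $L = 900$ sites this conservative estimate guarantees erosion only for clusters of diameter up to 20.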

D Fixed points
In addition to the two homogeneous fixed points (which are stable due to the eroder property), TLV-based automata can feature up to 4 unstable fixed points, depending on the boundary conditions imposed. As shown rigorously in Appendix E, these are no threat to the decoding capabilities because their occurrence in a Bernoulli random process is exponentially suppressed. For the sake of completeness, we discuss them in the following:
• Infinite chain. The original TLV features six fixed points [35]: the two homogeneous (stable) ones and, in addition, the four periodic (unstable) configurations shown in Fig. 13 (b).
• Semi-infinite chain with mirrored boundary. A modified $\overrightarrow{\mathrm{TLV}}$ with a single mirrored boundary features 2 of the 4 unstable fixed points of the infinite chain. Note that the first two patterns in Fig. 13 (b) are not bond-inversion symmetric and therefore cannot be interpreted as a valid configuration on the semi-infinite chain. However, the latter two are bond-inversion symmetric if the mirror is placed such that the first cell is even. In contrast, if the cell next to the mirror is odd [lower two patterns in Fig. 13 (c)], no additional fixed points survive.

Figure 13: Fixed points. a The homogeneous fixed points are always present and the only stable ones (due to the eroder property). Competing (unstable) fixed points are possible but depend on the boundary conditions: b Infinite chain (4 additional fixed points). c Semi-infinite chain with mirrored boundary condition (2 additional fixed points if the first cell is even, none otherwise). d Periodic boundary conditions (4 additional fixed points if $L$ is a multiple of 4, 2 otherwise). e Mirrored boundary conditions (2 additional fixed points if the first cell is even and the last is odd, none otherwise).
• Finite chain with periodic boundaries. If TLV is placed on a closed ring of length $L \in 2\mathbb{N}$, potential fixed points can be used to construct periodic ones on the infinite chain. Since there are only the four depicted in Fig. 13 (b), we have to check which of those remain invariant under periodic boundary conditions (PBCs). As illustrated in Fig. 13 (d), if $L$ is a multiple of 4, all four fixed points in (b) can be transferred to the finite chain with PBCs. However, if $L \notin 4\mathbb{N}$, the two 4-periodic patterns are no longer invariant and only the two 2-periodic patterns survive (compare the yellow patterns on the left with the colored patterns on the right).
• Finite chain and mirrored boundaries. If TLV is placed on a chain of length L ∈ 2N with mirrored boundaries, we can infer from the semi-infinite case in Fig. 13 (c) that only if the first (left) cell is even and the last (right) cell is odd, two additional fixed points survive. Otherwise the homogeneous configurations are the only ones, Fig. 13 (e). This is the modification TLV we use in this paper.
None of the additional fixed points are relevant for the correction of Bernoulli random patterns because the probability of their occurrence is exponentially suppressed with L. This follows directly from the fact that there are no non-trivial preimages of these fixed points, i.e., their attractors are trivial. The only way to end up in one of them is that the noise gives rise to its pattern by chance. We checked this for finite chains of TLV by solving the corresponding systems of boolean equations to determine all fixed points and their preimages.

E Sparse errors and correction time
Here we prove a central statement of this work: The probability for a chain of length $L$, ruled by TLV with MBC, to be in a non-empty state $x(t) \neq \emptyset$ after $t \propto L^\kappa$ time steps vanishes exponentially with $L$ for arbitrary $\kappa > 0$ if the initial state is a Bernoulli random configuration with single-site error probability $p_0 < p_c$ for some critical value $0 < p_c \le \tfrac{1}{2}$. For convenience, we reproduce the definitions from the main text: Let $x \subseteq \mathbb{Z}$ be an arbitrary subset (error pattern). A finite subset $I \subseteq x$ is called a cluster of diameter $|I| = \max\{|x - y| \,|\, x, y \in I\}$. If we fix an integer $k > 0$ (the sparseness parameter, to be chosen later), the territory $T_k(I)$ is defined as the interval of integers with distance at most $k|I|$ from $I$. Two clusters $I_1$ and $I_2$ are called independent if at least one does not intersect the territory of the other, i.e., $I_2 \cap T_k(I_1) = \emptyset$ or $I_1 \cap T_k(I_2) = \emptyset$ (or both); since $I \subseteq T_k(I)$, this implies $I_1 \cap I_2 = \emptyset$. If there exists a partition of $x$ into a family $\mathcal{I} = \{I_a\}$ of pairwise independent clusters $I_a$, $x = \bigcup_a I_a$, then $x$ is called sparse. A cluster $I \subseteq x$ with $T_k(I) \cap x = I$ is called independent in $x$ and we write I x.
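The definitions of diameter, territory and independence translate directly into code. The following Python sketch represents clusters and error patterns as sets of integers; all function names are hypothetical and serve only to illustrate the definitions.

```python
from typing import Iterable, Set


def diameter(cluster: Iterable[int]) -> int:
    """Diameter |I| = max{|x - y| : x, y in I} of a finite cluster."""
    sites = list(cluster)
    return max(sites) - min(sites)


def territory(cluster: Iterable[int], k: int) -> range:
    """Territory T_k(I): all integers within distance k * |I| of the cluster."""
    sites = list(cluster)
    reach = k * diameter(sites)
    return range(min(sites) - reach, max(sites) + reach + 1)


def are_independent(c1: Set[int], c2: Set[int], k: int) -> bool:
    """True if at least one cluster avoids the territory of the other."""
    t1, t2 = territory(c1, k), territory(c2, k)
    c2_avoids_t1 = not any(site in t1 for site in c2)
    c1_avoids_t2 = not any(site in t2 for site in c1)
    return c2_avoids_t1 or c1_avoids_t2


def is_independent_in(cluster: Set[int], x: Set[int], k: int) -> bool:
    """True if T_k(I) intersected with x equals I, i.e. I is independent in x."""
    t = territory(cluster, k)
    return {site for site in x if site in t} == set(cluster)


if __name__ == "__main__":
    x = {0, 2, 20}
    print(is_independent_in({0, 2}, x, k=2))   # territory is [-4, 6]
    print(is_independent_in({0, 2}, x, k=9))   # territory is [-18, 20]
```

Note that a single-site cluster has diameter 0, so its territory is just the site itself; small clusters therefore only need to be isolated on a correspondingly small scale.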
To state our first result, we need some additional terminology: First, I ≤l denotes the family of clusters I ∈ I with diameter I ≤ l and x\I ≤l ≡ x\ I∈I ≤l I is the subset of sites for given x that remains after cleaning all clusters of diameter at most l. Second, a (infinite) mirrored Bernoulli random configuration x ⊆ Z is defined by the single-site probability Pr(x i = 1) = p 0 for sites i > 0 and the mirror constraint We can now state our main result (an adaptation of Theorem 4 in Ref. [64] for mirrored Bernoulli random configurations): Proposition 1. Consider infinite mirrored Bernoulli random configurations x with singlesite probability p 0 . Let k ∈ N be a given sparseness parameter.
follows trivially from the construction of I x l (see step 1 above). To prove Eq. (126), we show that ∀ I (x\I x <l ) : I ≥ l, i.e., our construction never recreates clusters of smaller diameter (which could, in principle, happen because we are successively deleting clusters): Assume ∃ I * (x\I x <l ) : I * = l * < l. It must have been T k (I * ) ∩ (x \ I x <l * ) ⊃ I * because otherwise our prescription demands I * ∈ I x l * and we had I * ∩ (x \ I x <l ) = ∅. Since T k (I * ) ∩ (x \ I x <l ) = I * by assumption, there must have been a clusterĨ ∈ I x l with l * ≤l < l and T k (I * ) ∩Ĩ = ∅. But because Ĩ ≥ I * , this implies T k (Ĩ) ∩ I * = ∅. Since I * Ĩ and I * ⊆ x \ I x <l , this contradicts the independence ofĨ in x \ I x <l and we are done.

Explanation trees.
Clearly $(x \setminus \mathcal{I}^x_{\le l}) \subseteq (x \setminus \mathcal{I}^x_{< l})$, i.e., fewer and fewer errors in $x$ survive with increasing $l$ because on each level additional clusters are deleted from $x$. This monotonicity also holds for all monotonic sequences $(l_n) \in \mathbb{N}^{\mathbb{N}}$ with $l_n > l_{n-1}$ for all $n \in \mathbb{N}$: $(x \setminus \mathcal{I}^x_{\le l_n}) \subseteq (x \setminus \mathcal{I}^x_{\le l_{n-1}})$. In the end, we aim to upper-bound the probability of an arbitrary site $i \in \mathbb{Z}$ to belong to $x \setminus \mathcal{I}^x_{\le l}$. We first prove this for $x \setminus \mathcal{I}^x_{\le l_n}$ instead, where $(l_n)$ will be specified below, and generalize our result (with some tradeoff) to $l_n = n$ in the next (and last) step 5.
For the sake of simplicity, let l n be an odd integer for n ≥ 1 and define l 0 ≡ 0 in the following. To bound the probability for i ∈ x \ I x ≤ln from above, we start with a trivially true, sufficient condition: For an arbitrary configuration y with i ∈ y, we have (n ≥ 1) ∀ j∈y : |i − j| ≤ l n 2 ∨ |i − j| > k + 1 2 l n ⇒ ∃ I * y, I * ≤ln : i ∈ I * (128) (I * includes all j ∈ y with |i − j| ≤ l n /2; the strict ">" becomes important only for even l n ).
The (equivalent) contraposition reads If we now set y = x \ I x ≤ln−1 = x \ I x <ln and use Eq. (126) with l = l n (this is the crucial step that exploits the structure of I x ), we end up with the sequence of implications ⇒ ∃ j * ∈x\I x ≤ln−1 : where we used x \ I x ≤ln−1 ⊆ x \ I x ≤l n−1 (since l n−1 ≤ l n − 1) in the last line; this is illustrated in Fig. 12 (b). In combination with i ∈ x \ I x ≤ln ⇒ i ∈ x \ I x ≤l n−1 , Eq. (130c) gives rise to a binary tree of depth n (with sites as vertices) that explains the existence of i ∈ x \ I x ≤ln at its root if the sites at all its leafs belong to x \ I x ≤0 = x; it is aptly called explanation tree (ET) [65]. An example is shown in Fig. 12 (c).
By counting possible ETs and calculating their probability of being realized based on the (mirrored) Bernoulli distribution on their leafs, it is possible to upper bound the probability for i ∈ x \ I x ≤ln because the existence of at least one realized explanation tree is a necessary condition. Counting explanation trees and computing their probability (by counting their leafs) is complicated by the fact that for arbitrary l n > l n−1 > l n−2 > · · · > l 0 the allowed ranges for j * on different levels n intersect. Therefore the number of leafs is not fixed and only upper bounded by 2 n (reducing the number of leafs can be achieved by "reusing" a site to explain more than one other site). This complicates the derivation of the probability for the existence of a realized explanation tree considerably. If, in contrast, (l n ) is chosen so that different subtrees cannot intersect, the number of leafs for any ET is fixed at 2 n . This can be guaranteed if on each level 1 ≤ m ≤ n the distance between any site i and its explanatory site j * is larger than the maximum width of the subtrees emanating from each of them. Formally, where the factor of 2 is necessary because two subtrees (one at i and one at j * ) grow independently. Equating both sides yields the tightest solution via the recursion with g m = log N m and f m = m and initial conditions g 0 = 0 = f 0 (N 0 = 1). Diagonalization of the matrix yields with χ m = af m + g m and ϕ m = √ 1 + a 2 f m . Now we can use that recursions of the form X n = A X n−1 + B are generically solved by and X n = X 0 + B n for A = 1. With the initial conditions, we find immediately and we find N m ≤ [(2k)(4k + 3)] 2 m with e a+b = (2k)(4k + 3).
Now comes the only step where we use the mirror symmetry of the Bernoulli configuration $x$: The probability for all $2^n$ leaves of a particular ET to be occupied is $(p_0)^{2^n}$ for sites that are independent Bernoulli random variables. The mirror symmetry, however, introduces perfect correlations between pairs of sites. Since all leaves are distinct sites (on $\mathbb{Z}$), there are at least $2^n/2$ independent Bernoulli random variables associated to an ET (the worst case being a completely mirror-symmetric explanation tree). Therefore the probability for an arbitrary ET to be realized is upper bounded by $\sqrt{p_0}^{\,2^n}$ (as compared to $(p_0)^{2^n}$ in systems without mirror symmetry). This reflects the fact that mirrors "enlarge" error clusters artificially by their mirror images. This is illustrated in Fig. 12 (c).
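In conclusion, we find an upper bound for the probability of an arbitrary site $i \in \mathbb{Z}$ to be in $x$ but uncovered by clusters up to diameter $l_n$ of the constructive family $\mathcal{I}^x$:

$$\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l_n}\right) \le N_n \sqrt{p_0}^{\,2^n} \le \left[(2k)(4k+3)\sqrt{p_0}\right]^{2^n} .$$

This follows from the subadditivity of probability measures and the statement that $i \in x \setminus \mathcal{I}^x_{\le l_n}$ only if at least one of the $N_n$ possible ETs is realized by $x$. If we define $(2k)(4k + 3)\sqrt{\tilde p_c} = 1 \Leftrightarrow \tilde p_c \equiv [(2k)(4k + 3)]^{-2}$, it follows with $\alpha \equiv (2k)(4k + 3)\sqrt{p_0}$ that $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l_n}\right) \le \alpha^{2^n}$.
For on-site probabilities $p_0 < \tilde p_c \Leftrightarrow \alpha < 1$, this leads to a double-exponential decay of the probability to remain uncovered on level $l_n$.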
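To see how conservative this threshold is, one can plug in numbers. The Python sketch below assumes $\tilde p_c = [(2k)(4k+3)]^{-2}$ and $\alpha = (2k)(4k+3)\sqrt{p_0}$ as read from the text, and uses the sparseness parameter $k = 8$ quoted for TLV later in this appendix; the function names are hypothetical.

```python
import math


def branching_factor(k: int) -> int:
    """Factor (2k)(4k + 3) that bounds the growth of the number of ETs."""
    return (2 * k) * (4 * k + 3)


def p_c_tilde(k: int) -> float:
    """Lower bound on the critical rate, [(2k)(4k + 3)]^(-2)."""
    return 1.0 / branching_factor(k) ** 2


def alpha(p0: float, k: int) -> float:
    """alpha = (2k)(4k + 3) * sqrt(p0); the bound is non-trivial for alpha < 1."""
    return branching_factor(k) * math.sqrt(p0)


def uncovered_bound(p0: float, k: int, n: int) -> float:
    """Upper bound alpha^(2^n) on the probability to stay uncovered at level n."""
    a = alpha(p0, k)
    if a >= 1.0:
        return 1.0                       # bound is trivial above the threshold
    return a ** (2 ** n)


if __name__ == "__main__":
    k = 8                                # k = 2Rm with R = 4, m = 1
    print("p_c_tilde =", p_c_tilde(k))
    for n in range(1, 5):
        print(n, uncovered_bound(1e-8, k, n))
```

For $k = 8$ the factor is $560$ and $\tilde p_c = 1/313600 \approx 3.2 \times 10^{-6}$, which illustrates how weak this analytical lower bound is.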

Upper bound.
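Above we showed that $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l_n}\right) \le \alpha^{2^n}$ with $l_n = (4k + 3)^{n-1}$. The double-exponential decay of the probability with $n$ and the exponential growth of the level $l_n$ suggest that there is an exponentially decaying upper bound with $l$ (recall that our choice of $l_n$ was technically motivated: it is easier to count the leaves of ETs if the branches do not intersect).
Indeed, if we use the monotonicity $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l}\right) \le \Pr\left(i \in x \setminus \mathcal{I}^x_{\le l-1}\right)$, it follows that $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l}\right) \le \alpha^{l^\beta}$ if we require $\alpha^{l_n^\beta} \overset{!}{=} \alpha^{2^{n-1}}$ for $\beta > 0$, because for $l \in [l_{n-1}, l_n]$ we know that $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l}\right) \le \alpha^{2^{n-1}}$ and $\alpha^{2^{n-1}} \le \alpha^{l^\beta}$ per construction (because $\alpha < 1$).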
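The exponent $\beta$ is fixed by the matching condition. Assuming $l_n = (4k+3)^{n-1}$ and the requirement $l_n^\beta = 2^{n-1}$ as read from the text, one obtains $\beta = \log 2 / \log(4k + 3)$. The following Python sketch (hypothetical function names) evaluates this and checks the matching condition numerically.

```python
import math


def level(n: int, k: int) -> int:
    """Level diameter l_n = (4k + 3)^(n - 1) for n >= 1."""
    return (4 * k + 3) ** (n - 1)


def beta(k: int) -> float:
    """Exponent solving l_n^beta = 2^(n - 1), i.e. beta = log 2 / log(4k + 3)."""
    return math.log(2.0) / math.log(4 * k + 3)


if __name__ == "__main__":
    k = 8                                # sparseness parameter quoted for TLV
    b = beta(k)
    print("beta =", b)
    for n in range(1, 6):
        # Both columns should agree up to floating-point rounding.
        print(n, level(n, k) ** b, 2 ** (n - 1))
```

For $k = 8$ this gives $\beta \approx 0.195$, so the guaranteed decay is a stretched exponential in $l$ with a rather small exponent.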
This concludes the proof.
Note that for $p_0 > \tilde p_c$ the upper bounds become trivial, which still allows for an exponential decay of $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l}\right)$. Therefore we conclude that there is a critical value $p_c$ with $0 < \tilde p_c \le p_c$ such that $\Pr\left(i \in x \setminus \mathcal{I}^x_{\le l}\right)$ vanishes exponentially for $l \to \infty$ if $p_0 < p_c$. Simulations suggest that $p_c = \tfrac{1}{2}$, so that $\tilde p_c \ll 1$ is a rather weak lower bound on the true critical value, see Appendix F.
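Eventually we want to employ Prop. 1 to derive an upper bound for the probability of errors to survive the first $t$ steps of TLV on a finite chain with mirrored boundaries. To this end, we first need a consequence of Prop. 1 (Lemma 2): for a mirrored Bernoulli random initial configuration with $p_0 < \tilde p_c$, any finite interval $J \subset \mathbb{N}$ and $t \ge 0$,

$$\Pr\left(x(t) \cap J \neq \emptyset\right) \le (2tR + |J|) \exp\left(-\gamma \lfloor t/m \rfloor^\beta\right)$$

with $\gamma = -\log(\alpha)$ ($\gamma > 0$ for $p_0 < \tilde p_c$), and $0 < \beta < 1$ as in Prop. 1. Here the sparseness parameter is given by $k = 2Rm = 8$, where $m = 1$ and $R = 4$ are the eroder parameter and the radius of TLV, respectively.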
Proof. Because $\overrightarrow{\mathrm{TLV}}$ is an eroder, there is a constant $m$ such that any cluster of errors $I$ on a background of zeros is erased for $t \ge m|I|$. During this process, signals emitted beyond the boundaries of $I$ can travel at most $Rm|I|$ sites, where $R$ is the radius of the local rules (or the propagation speed of information). If we set $k = 2Rm$ as sparseness parameter, an error cluster $I$ that is independent in $x(0)$ is safely erased after at most $m|I|$ time steps without interfering with its environment. This follows because signals from $I$ and $x(0) \setminus I$ can meet only after traversing the void territory $T_k(I) \setminus I$, which takes at least $k|I|/(2R) = m|I|$ time steps, but the last trace of $I$ is erased after $m|I|$ time steps. Therefore the evolution of $x(t)$ for $t \ge m|I|$ is completely independent of the configuration within the boundaries of $I$ (this motivates the notion of independent clusters). See Fig. 6 of the main text for an illustration.
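If $x(0)$ is a mirrored Bernoulli random configuration with parameter $p_0 < \tilde p_c$ with $k$ set as above, we know from Prop. 1 that the probability of any site $i \in \mathbb{N}$ to be uncovered by clusters in $\mathcal{I}^{x(0)}$ of diameter at most $l$ is upper-bounded by $\Pr\big(i \in x(0) \setminus \mathcal{I}^{x(0)}_{\le l}\big) \le \alpha^{l^\beta}$ with $0 < \alpha, \beta < 1$. By subadditivity, an analogous bound holds for any finite subset $J' \subset \mathbb{N}$, $\Pr\big(J' \cap (x(0) \setminus \mathcal{I}^{x(0)}_{\le l}) \neq \emptyset\big) \le |J'|\,\alpha^{l^\beta}$. Let $U_r(J)$ be the interval of all sites within distance $r \ge 0$ of $J$ and set $J' = U_{tR}(J)$ for time $t \ge 0$. Then $\Pr\big(U_{tR}(J) \cap (x(0) \setminus \mathcal{I}^{x(0)}_{\le l}) \neq \emptyset\big) \le (2tR + |J|)\,\alpha^{l^\beta}$. This holds for all $l \in \mathbb{N}$, especially for $l = \lfloor t/m \rfloor$ ($\lfloor \bullet \rfloor$ is the floor function): $\Pr\big(U_{tR}(J) \cap (x(0) \setminus \mathcal{I}^{x(0)}_{\le \lfloor t/m \rfloor}) \neq \emptyset\big) \le (2tR + |J|)\,\alpha^{\lfloor t/m \rfloor^\beta}$. If we exploit that no signal from outside $U_{tR}(J)$ can reach $J$ up to time $t$ and that all errors that belong to independent clusters of diameter $l \le \lfloor t/m \rfloor \le t/m$ are erased at time $t$, we can conclude that $x(t) \cap J \neq \emptyset \Rightarrow U_{tR}(J) \cap (x(0) \setminus \mathcal{I}^{x(0)}_{\le \lfloor t/m \rfloor}) \neq \emptyset$ and consequently $\Pr\left(x(t) \cap J \neq \emptyset\right) \le \Pr\big(U_{tR}(J) \cap (x(0) \setminus \mathcal{I}^{x(0)}_{\le \lfloor t/m \rfloor}) \neq \emptyset\big)$. Therefore we find $\Pr\left(x(t) \cap J \neq \emptyset\right) \le (2tR + |J|)\,\alpha^{\lfloor t/m \rfloor^\beta}$.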
With Lemma 2, we are ready to tackle the case of finite chains. The relevant time scale is $t^*_L = L/2R$, which is set by the finite speed $R$ of information transfer. To put it in a nutshell: the leftmost half of a finite chain evolves exactly like the corresponding section of a half-infinite chain adjacent to the mirrored boundary for $t \le t^*_L$. This is obvious because these sites cannot be influenced by the existence/non-existence of the rightmost boundary as long as it does not enter their past light cone (which happens at $t \sim L/2R$ or later). If we combine this with the fact that, for Bernoulli distributed initial states, $x_0$ and $y_0$ are uncorrelated, it follows immediately that all results on $\overrightarrow{\mathrm{TLV}}$ also hold for TLV as long as only times $t \le t^*_L$ are considered. In particular, Lemma 2 applied to the leftmost half $[1, L/2]$ gives $\Pr\left(x(t) \cap [1, L/2] \neq \emptyset\right) \le (2tR + L/2) \exp\left(-\gamma \lfloor t/m \rfloor^\beta\right)$ for $t \le t^*_L$. On account of the reflection symmetry of TLV, all statements also hold for the rightmost half $[L/2 + 1, L]$ with a mirrored boundary to the right (then with a reflected, half-infinite set of rules $\overleftarrow{\mathrm{TLV}}$). Therefore subadditivity yields $\Pr\left(x(t) \neq \emptyset\right) \le (4tR + L) \exp\left(-\gamma \lfloor t/m \rfloor^\beta\right)$ for $t \le t^*_L$. Here comes the crucial step: Since the chain is finite and $x = \emptyset$ is a fixed point of TLV, we have $x(t^*_L) = \emptyset \Rightarrow x(t) = \emptyset$ for all $t > t^*_L$. It follows that $\Pr\left(x(t) \neq \emptyset\right) \le \Pr\left(x(t^*_L) \neq \emptyset\right)$ for $t \ge t^*_L$. This leads to $\Pr\left(x(t) \neq \emptyset\right) \le (4R\{t\} + L) \exp\left(-\gamma \lfloor \{t\}/m \rfloor^\beta\right)$ with $\{t\} \equiv \min\{t, t^*_L\}$ for all $t \ge 0$.
Note that the lower-bounded decay of the probability is to be expected for a finite system: Due to the finite state space, there is an upper bound for t (depending on L) such that the system either (1) relaxed to the clean state, (2) to a non-clean fixed point, or (3) entered a non-trivial cycle. In the first case, it is clean forever, whereas in the latter two cases, it can never become clean. Therefore the probability to be not clean cannot decrease arbitrarily and must be bounded from below for fixed L and t → ∞.
However, if we are interested in the thermodynamic limit, $L \to \infty$, we can ask how long one has to wait for TLV to clean the system almost surely. This leads us to our main result:
Corollary 1. Consider a finite chain of length $L$ on $\mathcal{L} = \{1, \dots, L\}$ governed by TLV with mirrored boundaries and initial configurations $x(0) \subseteq \mathcal{L}$ drawn from a Bernoulli distribution with parameter $p_0$. For $\kappa \in \mathbb{R}$ with $0 < \kappa < 1$, the probability of x(t) =

Figure 14: Eroder parameter. Time $t_\mathrm{dec}$ needed by TLV to erase a homogeneous cluster of diameter $l$ completely. The red bullets mark exact results from simulations, featuring a 4-periodic structure that derives from the rules of radius $R = 4$. The most stringent upper bound is given by $t_\mathrm{dec} \le \tfrac{3}{4} \cdot l + 1$ (dashed line), but we use $t_\mathrm{dec} \le 1 \cdot l$ (solid line) for the sake of simplicity (i.e., $m = 1$). These bounds are also valid for non-homogeneous clusters due to the monotonicity of TLV.

F Parameters
TLV is a linear eroder, i.e., clusters on a background of zeros/ones with diameter l are erased after at most ml time steps, where m ∈ R + is a rule-specific constant: for arbitrary (independent) clusters I. To determine m, it is easiest to simulate the evolution of homogeneous clusters of ones on a background of zeros for increasing diameter l. The