SciPost Submission Page
A NEAT Quantum Error Decoder
by Hugo Théveniaut, Evert van Nieuwenburg
This is not the current version.
|As Contributors:||Everard van Nieuwenburg|
|Arxiv Link:||https://arxiv.org/abs/2101.08093v1 (pdf)|
|Date submitted:||2021-02-22 16:48|
|Submitted by:||van Nieuwenburg, Everard|
|Submitted to:||SciPost Physics|
We investigate the use of the evolutionary NEAT algorithm for the optimization of a policy network that performs quantum error decoding on the toric code, with bitflip and depolarizing noise, one qubit at a time. We find that these NEAT-optimized network decoders have similar performance to previously reported machine-learning based decoders, but use roughly three to four orders of magnitude fewer parameters to do so.
Submission & Refereeing History
You are currently on this page
Reports on this Submission
Anonymous Report 3 on 2021-4-22(Invited Report)
- Cite as: Anonymous, Report on arXiv:2101.08093v1, delivered 2021-04-22, doi: 10.21468/SciPost.Report.2824
After a discussion with the editor-in-charge, due to my initial delay accepting this review request, I am only submitting short general opinion, not a full length report:
I find this manuscript represents a fascinating work. When using neural networks for physics motivated tasks we are generally lacking on good optimisation methods for their architectures. The introduced method, based on genetic optimisation, works very well and appears to scale very well. The introduced method is original and creatively addresses one aspect of scalability of error correction decoders. I am very impressed with the result!
Anonymous Report 2 on 2021-4-22 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2101.08093v1, delivered 2021-04-22, doi: 10.21468/SciPost.Report.2820
1. Using the NEAT algorithm, the authors can find much smaller neural networks compared to previous works. This can lead to faster decoding time.
1. Not clear how to scale to larger distance codes and multiple rounds of measurements
This work demonstrates that it is possible to automatically find smaller networks to decode surface codes compared to manually designed ones. I find it satisfies the acceptance criteria. (I'm not a good judge of writing and formatting, so I won't comment on these aspects)
I have a few questions for the authors. Please at least add some comments about question 1 and 2 in the paper.
1. In recent years, there are a lot of works on AutoML (https://en.wikipedia.org/wiki/Automated_machine_learning, https://en.wikipedia.org/wiki/Neural_architecture_search). It also cares about finding neural network structures automatically. Can you comment how Neat compared to some newer methods in AutoML?
2. Why is the gradient-free aspect of NEAT considered as an advantage? In the recent years, a main reason to use neural networks is that they work so well together with gradient optimization.
3. A related question is, since NEAT doesn't need gradient optimization, have you considered to use some other machine learning models such as decision trees instead of neural networks?
4. I feel a good way to scale-up might be concatenate your neural decoders with another easily scalable decoder. For example, see https://arxiv.org/abs/2101.07285.
Will this be possible? Do you think the end result can be better than this work?
Anonymous Report 1 on 2021-4-20 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2101.08093v1, delivered 2021-04-20, doi: 10.21468/SciPost.Report.2812
The paper by Théveniaut and van Nieuwenburg presents a neural network based decoder for the toric code built on the NEAT algorithm. Similarly to recent decoders based on deep reinforcement learning (DRL), the neural network takes as input an error syndrome and outputs a best qubit action for error correction. However, here, instead of using standard backpropagation and gradient descent to train the network, an ensemble of networks is evolved in an evolutionary fashion through genetic mutations that change weights or merge networks. I find the results novel and interesting, and the presentation is very accessible both of the decoder and the NEAT algorithm. In terms of results the performance is not very impressive, the NEAT decoder does not outperform standard minimum matching, but the paper is more exploratory in nature and presented as such. As one of many different machine learning approaches to decoding topological codes, this is certainly one of the most original, with potential to be developed further.
I have a few comments, of which I think the first one needs to be addressed properly.
1) There is too little details about the training (evolution) of the networks. What is the convergence criterion?
If I understand correctly the reward is formulated in a way that should in principle give an optimal (maximum likelihood) decoder (in contrast to what’s stated in the paper below algoritm 2):
The scheme takes as input a randomly generated error chain, based on the corresponding syndrome the decoder suggests a correction chain, and the complete chain is evaluated with respect to non-contractible loops corresponding to logical operators. A decoder that learns to maximize the return based on this reward should be optimal. (Although it may be difficult to converge since the reward signal will be obscured by the statistical fluctuations that follow from using randomly generated error chains which may or may not fall in the most likely equivalence class.) In the paper it is stated that the results are based on “best neural-network found by the NEAT algorithm after a few hundreds of generations”. What does this imply in terms of well the network actually manages to maximize the return? Given the suboptimal performance, especially for depolarizing noise, it seems likely that the evaluated networks are quite far from maximizing the return. This may also have implications for the small size of the networks as shown in Table 1. Wouldn’t they continue to grow if a more stringent criterion for the training convergence was used? How large the training set of syndromes is would probably also have implications on this growth of the network size.
2) I find that there is some confusion in the presentation about what code is used. In the actual calculations (as shown in Fig. 5 and 7) the standard toric code is used. In Fig 2 is shown the rotated code. If this is put on the torus as in Fig 2, the space of logical operators are not as neatly defined as for the standard toric code where logical X and Z are put on alternating bonds. I don’t think there is anything wrong in principle here, but it may have been better to stick to the standard toric code representation.
Also, for information, Figure 1 and 2 does not seem to display properly on Safari and Preview.
3) When referring to maximum likelihood decoders I suggest that the authors also cite the Markov chain Monte Carlo based decoder: J. R. Wootton and D. Loss, Phys. Rev. Lett. 109, 160503 (2012).
Typo: There is an extra “of”, first sentence of section IV.