The Quantum Cartpole: A benchmark environment for non-linear reinforcement learning

Kai Meinerz; Simon Trebst; Mark Rudner; Evert van Nieuwenburg

SciPost Submission Page


This is not the latest submitted version.

This Submission thread is now published as SciPost Phys. Core 7, 026 (2024).

Submission summary

Authors (as registered SciPost users):

Kai Meinerz

Submission information
Preprint Link:	https://arxiv.org/abs/2311.00756v1
Code repository:	https://zenodo.org/records/10060570
Data repository:	https://zenodo.org/records/10036492
Date submitted:	Nov. 8, 2023, 11:54 a.m.
Submitted by:	Meinerz, Kai
Submitted to:	SciPost Physics

Ontological classification
Academic field:	Physics
Specialties:	Quantum Physics

Abstract

Feedback-based control is the de facto standard when it comes to controlling classical stochastic systems and processes. However, standard feedback-based control methods are challenged by quantum systems due to measurement-induced backaction and partial observability. Here we remedy this by using weak quantum measurements and model-free reinforcement learning agents to perform quantum control. By comparing control algorithms with and without state estimators to stabilize a quantum particle in an unstable state near a local potential energy maximum, we show how a trade-off between state estimation and controllability arises. For the scenario where the classical analogue is highly nonlinear, the reinforcement learned controller has an advantage over the standard controller. Additionally, we demonstrate the feasibility of using transfer learning to develop a quantum control agent trained via reinforcement learning on a classical surrogate of the quantum control problem. Finally, we present results showing how the reinforcement learning control strategy differs from the classical controller in the non-linear scenarios.
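For readers unfamiliar with the standard controller the abstract compares against (presumably the linear-quadratic-Gaussian controller abbreviated LQGC in the reports), a minimal sketch of the idea may help: a Kalman filter estimates position and velocity from noisy position readings, and a linear-quadratic regulator acts only on that estimate. This is a classical, linearized toy model of a particle near the maximum of an inverted parabola $V(x) = -k x^2/2$; the Euler discretization, noise levels, cost weights, and function names are all hypothetical choices, not the authors' implementation (their code is in the Zenodo code repository listed above).

```python
import numpy as np

# Hypothetical toy model (not the authors' code): a classical particle
# linearized near the maximum of V(x) = -k x^2 / 2, so x'' = (k/m) x + u/m.
def discrete_system(k=1.0, m=1.0, dt=0.01):
    A = np.array([[1.0, dt], [k / m * dt, 1.0]])  # explicit Euler step
    B = np.array([[0.0], [dt / m]])
    C = np.array([[1.0, 0.0]])  # only the position is measured
    return A, B, C


def lqr_gain(A, B, Q, R, iters=5000):
    # Iterate the discrete algebraic Riccati equation until it settles.
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def simulate(steps=2000, dt=0.01, meas_std=0.05, proc_std=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = discrete_system(dt=dt)
    K = lqr_gain(A, B, Q=np.eye(2), R=np.array([[0.1]]))
    W = proc_std**2 * np.eye(2)        # process-noise covariance
    V = np.array([[meas_std**2]])      # measurement-noise covariance
    x = np.array([0.1, 0.0])           # true (position, velocity)
    x_hat, S = np.zeros(2), np.eye(2)  # estimate and its covariance
    for _ in range(steps):
        u = -K @ x_hat                 # control uses the estimate only
        x = A @ x + B @ u + rng.normal(0.0, proc_std, 2)
        y = C @ x + rng.normal(0.0, meas_std, 1)
        # Kalman filter: predict, then correct with the new measurement.
        x_hat = A @ x_hat + B @ u
        S = A @ S @ A.T + W
        G = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)
        x_hat = x_hat + G @ (y - C @ x_hat)
        S = (np.eye(2) - G @ C) @ S
    return x, x_hat


x, x_hat = simulate()
print("final position:", x[0], "estimate:", x_hat[0])
```

Because the control law only ever sees `x_hat`, the controller uses no information beyond the measurement record. For linear dynamics with Gaussian noise, this split into an estimator followed by a regulator is exactly what the LQG construction prescribes; the nonlinear potentials studied in the paper are where such a linear design is expected to struggle.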

Current status:

Has been resubmitted

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2024-2-16 (Invited Report)

Cite as: Anonymous, Report on arXiv:2311.00756v1, delivered 2024-02-16, doi: 10.21468/SciPost.Report.8570

Strengths

1- The results seem to be correct and the code is available.
2- Comprehensive comparison between LQGC and PPO on the quantum cartpole and its classical version.

Weaknesses

1- The results are not very impactful, given Ref. [7].
2- Some sections are not clearly written and some important details are missing.

Report

The authors consider the quantum cartpole problem and its classical analog (which is not the classical cartpole problem). They apply the LQGC algorithm and PPO (a reinforcement learning algorithm), with and without an estimator (the Kalman filter and another PPO agent, respectively), to solve these problems, i.e., to keep the quantum or classical particle from falling down the hill.

I do not recommend publication of this manuscript in SciPost Physics because it does not fulfill the acceptance criteria, in particular the expectations criteria: the system is quite simple, and I do not think the methods and results are sufficiently novel, given also Ref. [7]. However, the results are correct and interesting, and I suggest publication in SciPost Physics Core.

Requested changes

1) The explanation of the quantum cartpole is not clear:
1a) At the beginning of page 4 the authors use both $dt$ and $\Delta t$, and it is not clear whether this is a typo or whether they denote different quantities. Relatedly, it is not clear whether, when $N_{meas}$ is changed, the time between measurements is varied proportionally (so that the time between control actions stays constant) or kept constant (so that the time between control actions changes).
2) It is not clear how the estimator agent is used by the control agent. This detail is essential for a fair comparison between the performance of the control agents with and without state estimators. In particular, the authors should clarify that the "control agent + estimator agent" combination does not have access to more information about the system than the control agent alone; otherwise the improvement in performance is obvious.
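The two readings of the timing question in point 1a can be made concrete with a short calculation. In this sketch (a hypothetical illustration; the names and numbers are not taken from the manuscript), the controller acts once after every $N_{meas}$ weak measurements. Reading A keeps the control interval $\Delta t$ fixed and shortens each measurement to $dt = \Delta t / N_{meas}$; reading B keeps $dt$ fixed, so the control interval grows to $N_{meas}\,dt$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    meas_dt: float     # duration of a single weak measurement
    control_dt: float  # time between two consecutive control actions


def fixed_control_interval(delta_t: float, n_meas: int) -> Timing:
    # Reading A (hypothetical): the control interval Delta t is fixed,
    # so each of the n_meas measurements gets shorter as n_meas grows.
    return Timing(meas_dt=delta_t / n_meas, control_dt=delta_t)


def fixed_measurement_time(dt: float, n_meas: int) -> Timing:
    # Reading B (hypothetical): each measurement keeps its duration dt,
    # so control actions become rarer as n_meas grows.
    return Timing(meas_dt=dt, control_dt=n_meas * dt)


for n in (1, 2, 4):
    print(n, fixed_control_interval(0.1, n), fixed_measurement_time(0.1, n))
```

If, as assumed here, the information gathered by a weak measurement grows with its duration, reading A keeps the total measurement time between control actions equal to $\Delta t$ for every $N_{meas}$, while reading B lets it grow with $N_{meas}$. Results at different $N_{meas}$ therefore compare quite differently under the two readings.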

validity: good
significance: good
originality: ok
clarity: ok
formatting: excellent
grammar: excellent

Report #1 by Anonymous (Referee 1) on 2024-2-7 (Invited Report)

Cite as: Anonymous, Report on arXiv:2311.00756v1, delivered 2024-02-07, doi: 10.21468/SciPost.Report.8513

Strengths

Extensive analysis of the quantum cartpole stabilization using reinforcement learning

Weaknesses

Technical language
Missing information in main text on assessment criteria

Report

The paper is an interesting and detailed numerical study of the efficiency of a feedback-based control algorithm. The study is performed for the dynamics of the classical and quantum cartpole, with and without state estimator and for different potentials.

The language is quite technical: the authors use several acronyms and refer the reader to the appendix, including for important definitions without which the text is hard to understand.

The authors do not explain the criteria they use to assess performance. This concerns Figs. 3, 4, and 6 (in Fig. 6, $t_{termination}$, which is not defined, appears on another unspecified scale, different from that of Figs. 3 and 4). I could not find any explanation of the white line in Fig. 4.

This paper could be published in some form after the authors have revised the text. In view of the acceptance criteria of SciPost Physics (https://scipost.org/SciPostPhys/about#criteria), I cannot recommend this work: the advance with respect to the existing paper of Wang et al. (Ref. [7] of this paper) is mainly technical. On the other hand, the work presented here is useful and contains interesting ideas for future work. I therefore recommend considering the paper for publication in SciPost Physics Core.

Requested changes

See report

validity: good
significance: good
originality: good
clarity: good
formatting: excellent
grammar: excellent
