SciPost Submission Page
Krotov: A Python implementation of Krotov's method for quantum optimal control
by Michael H. Goerz, Daniel Basilewitsch, Fernando Gago-Encinas, Matthias G. Krauss, Karl P. Horn, Daniel M. Reich, Christiane P. Koch
This Submission thread is now published as
Submission summary
Authors (as registered SciPost users): | Michael Goerz · Christiane Koch |
Submission information | |
---|---|
Preprint Link: | https://arxiv.org/abs/1902.11284v4 (pdf) |
Date accepted: | 2019-12-04 |
Date submitted: | 2019-10-10 02:00 |
Submitted by: | Goerz, Michael |
Submitted to: | SciPost Physics |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: |
|
Approach: | Computational |
Abstract
We present a new open-source Python package, krotov, implementing the quantum optimal control method of that name. It allows to determine time-dependent external fields for a wide range of quantum control problems, including state-to-state transfer, quantum gate implementation and optimization towards an arbitrary perfect entangler. Krotov's method compares to other gradient-based optimization methods such as gradient-ascent and guarantees monotonic convergence for approximately time-continuous control fields. The user-friendly interface allows for combination with other Python packages, and thus high-level customization.
Author comments upon resubmission
We respond to the specific requested changes in the first referee's report as follows:
1.-2.: We agree that the manuscript should be self-contained in specifying the algorithm implemented in the package in full detail, to the point that the reader could implement Krotov's method on their own. However, it is not the aim of the paper to re-derive Krotov's method in full. This derivation is both lengthy and highly technical and is best found (apart from Krotov's original papers) in the cited Reich et al, J. Chem. Phys. 136, 104103 (2012).
We have restructured the manuscript to put more emphasis on the *usage* of the package, and to keep more technical aspects such as the discussion of Krotov's update equation and the full description of the algorithm in the Appendix. This now includes a complete pseudocode specification of the optimization method in Appendix B. This augments the less technical description of the algorithm in Appendix A3. Also, we now introduce the Schrödinger and Liouville equations explicitly at the beginning of Section 2.
3.: Reference [20] was meant to relate to the distinction between sequential and concurrent update schemes only. We have split the sentence to clarify this and generally revised the discussion.
4.: We agree that this sentence is confusing outside of the full derivation of Krotov's method (which is out of scope for the current manuscript). We have removed it. As a clarification to the referee: we were referring to Phi in Eq (7) of ReichJCP2012.
5.: We have added Eq. (26) in the manuscript that explains how the update equation follows from the choice of $g_a$.
6.: We have revised this section to specify that the chi-states depend on the forward-propagated states if they are more than linear in the overlaps for forward-propagated states and target states.
7.: We have written out the explicit time-discretized update equation, and revised the discussion.
8.: We now reference Appendix C before discussing the example, and mention the BSD license in Appendix D.
9.: Our scope is that of open-loop control methods, but within that scope, our aim is to give a discussion that is complete with respect to all methods that we are aware of. However, especially in Fig. 2, each method is meant to include a significant number of variants. We have extended the discussion of the different methods significantly. We have also included a discussion of the recent GROUP/GOAT methods.
10.: Monotonic convergence is still expected in the discretized scheme, assuming the controls are discretized from continuous fields with a time step sufficiently small that the discretization error in the propagation is negligible. It may require a sufficiently large value for the step width lambda: values that are too small may lead to numerical instabilities where monotonic convergence is lost, but typically this is easily detected (and, the resulting optimized controls would be "spiky", violating the requirement that the discretized controls must be a good approximation to a continuous fields. We have added this dependency on lambda to the text.
11.: The line search in GRAPE is performed automatically in each iteration to determine how far to change the field in the direction of the gradient. Without a line search, the algorithm will not converge. In contrast, the parameter lambda does not require a line search. In the continuous limit, monotonic convergence is guaranteed for *any* value of lambda. In the discretized scheme, the only limit on lambda is that very small values will lead to numerical instability. Beyond that, the choice of lambda has some effect on the smoothness of the optimized fields: the less the controls change in each iteration (large lambda), the smoother the result, typically. In practice, lambda is usually set at the beginning of the optimization and kept unchanged. Alternatively, as we discuss, one may start with a large value of lambda the first few iterations to counteract the typically large changes in the first few iterations, and then lower the value by an order of magnitude or so after that. However, it is not common to change lambda in each iteration, and we believe that doing so (by a line search) is unlikely to give an improvement that is worth the additional cost of the line search.
12.: We concur with the referee and have removed the paragraph.
13.: We have revised the section, and included a reference to Nevergrad.
14.: We did not mean to imply that CRAB is the only relevant gradient-free method. We have revised the discussion accordingly.
15.: While Krotov's method in its standard form does not contain bandwidth restrictions that would guarantee smoothness, assuming the guess pulses are approximately continuous, it is an inherent feature of the method that the optimized pulses will also be continuous. This is up to numerical instabilities due to a choice of a lambda that is too small (which is easily detected). That being said, in practice GRAPE can typically be applied to approximately time-continuous pulses as well without problems, so this is largely a matter of personal preference. The advantages of Krotov are those listed in section 4.3 (monotonic convergence, faster initial convergence, and no line search).
16.: We are currently exploring the use of Cython for an efficient propagator that we hope will extend the usability of the krotov package to significantly larger Hilbert space dimensions.
17.: The are no inherent bounds on the control field (bounds may be introduced through parametrization, as outlined in Section 5). The function S(t) is not multiplied with the control field, but with the *update* in each iteration. Thus, the only bound it may place on the optimized fields is that the optimization will not change the field amplitude for any value of t where S(t) is zero. In particular, S(t) commonly is zero at t=0 and t=T, which preserves the original values of the guess control at t=0 and t=T (which often are also zero). We have renamed the corresponding parameter from "shape" to "update_shape" in the package, and rewritten the discussion to clarify this.
18.: We have revised the section to explain the origin of the monotonic convergence.
19.: We agree with the referee that a comparison of different optimization algorithms applied to the same physical problems is interesting, but it is beyond the scope of the current paper. Moreover, the current implementation of GRAPE and CRAB is limited and currently does not allow for arbitrary time-continuous guess controls. A comparison would thus require extending QuTiP's capabilities. We note that there are some comparisons of the various method in the cited literature, e.g. Schirmer et al. New J. Phys. 13, 073029 (2011), Machnes et al. Phys. Rev. A 84, 022305 (2011), Jäger et al. Phys. Rev. A 90, 033628 (2014), and Sørensen et al. Phys. Rev. A 98, 022119 (2018).
20.: In most situations, H is indeed identical in all objectives. However, this is not a requirement. The particular example where H is different in each objective is the "ensemble optimization" discussed in Section 3.3. The idea here is to consider the same process under different Hamiltonians, each with a different set of "noise" parameters. By optimizing over all objectives (noise Hamiltonians) simultaneously, the result will ideally be "robust".
21.: We require the "propagator" to simulate the time dynamics over a single time step. The mesolve routine in QuTiP only supports propagation over an entire time grid. The overhead of restarting mesolve in each step would be prohibitive. We have experimented with refactoring the internals of mesolve into a more suitable form, but this is not without challenges.
22.: H[1][1] is the guess_control from the "hamiltonian" function. We have added a comment to explain this.
23.: The value of lambda may be modified in a callable passed as "modify_params_after_iter" to "optimized_pulses". We have clarified this in the text.
24.: We do not agree that mentioning Anaconda as a recommendation is problematic. QuTiP is non-trivial to install, and in our opinion Anaconda currently provides the easiest way to do so on systems that do not have the necessary compilers installed to build extension modules.
25.: We have restructured the manuscript according to the referee's suggestion.
26.: The shape (envelope) of the control field and the "update shape" that scales the update at each point in time, are not related. We have revised the krotov package, and the paper, in an attempt to make this more explicit.
27.: We have revised the discussion to explain all arguments to the optimized_pulses function, and chi_constructor in particular
In response to the second referee's requested changes,
1.: We have significantly expanded the comparison of Krotov's method and other relevant optimization methods.
1.1: We have added a sentence to the abstract defining Krotov's method as a gradient-based method, according to the referee's suggestion.
1.2: We have revised the discussion of Krotov's method substantially in a way that should clarify time discretization.
1.3: We are very interested in exploring the possibility of integrating Krotov's method with current quantum cloud-computing libraries. However, at this time we have insufficient experience with these libraries to enter into a meaningful discussion. We hope to rectify this in future publications.
1.4 We have revised the last paragraph of the introduction to include the information in the appendices.
2.: We have added a plot to the manuscript illustrating the result of the example. As the numerical effort is mainly in the propagation routine (which in general should be user-supplied), not in the implementation of Krotov's method, performance benchmarks would not be very meaningful. We may reconsider benchmarks once a Cython-based single-time-step propagator becomes available.
3.: We have added an appendix containing the full pseudocode for the method, augmenting the high-level description of the algorithm.
4.1: We have added the references to the manuscript.
4.2: We have added a discussion of the GROUP/GOAT method.
4.3: We now cite both QuTiP papers.
4.4: The scope of our discussion is only open-loop control schemes, which we have clarified in the text.
5.1: Some of the code snippets require considerable setup. We feel that including this additional code would detract from the clarity of the code snippets. However, the full scripts from which the snippets originate are available online as part of the package's source code repository. This is mentioned in the text and in Appendix D.
5.1.1: The code snippets are not intended to be run by copy/paste. The complete example script of Section 2.3 is available online. For the shorter snippets from Section 3, complete scripts are also available online, although we would recommend a reader to experiment with the equivalent interactive notebooks in the online documentation instead. The script files for the shorter snippets are intended mostly for automated testing, to verify their correctness.
5.1.2: "itertools" is a standard library package. We have clarified this with a comment.
5.1.3: Following the referee's recommendation, the example script is now available online
5.2: We are not quite clear what the referee has in mind here. In particular, the various extensions to gradient-descent like stochastic gradient descent or the inclusion of a "momentum" do not appear to be relevant to Krotov's method.
5.3: These errors would seem to come from a mismatch of example notebook based on the most recent released version of the krotov package, and subsequent modifications on the master branch. We have modified the online documentation to show the latest released version by default, and to show a warning when a user is viewing documentation for an unreleased development version. To the extent that any of these problems are still present in the most recent version of the krotov library, we would ask the reviewer to open an issue on Github.
5.4: The example notebooks use a variety of packages (like "watermark") that are not dependencies of the krotov package. This is by design. However, the dependencies for the notebooks are included in the package's "development" requirements, which may be installed through "pip install krotov[dev]". We have added this to the manuscript and will address it in the online example notebooks.
"oct" in the name of the variables is from "Optimal Control Theory", and has slipped through as an acronym that we commonly use in internal code. However, it is admittedly jargon, and we have replaced "oct" with "opt", as in "opt_result" for the optimization result.
5.5.: We cannot reproduce these issues in the most current version of the package. To the extent that this is still a problem, we would ask the reviewer to submit an issue to Github.
6.1.1: There is no fundamental reason Krotov's method hasn't been implemented so far, just that the number of groups with sufficient expertise to do so is relatively small, and nobody has gotten around to it until now.
6.1.3: The DYNAMO library implements a method we would call "sequential GRAPE", which some parts of the quantum control community label "Krotov-type". It is quite different from the method we present here. We have added a detailed discussion of this, and specified Krotov's method in far greater detail.
6.1.4: Cython would certainly be an option. In particular, it may be useful to improve the parallelization of the Krotov implementation by releasing the global-interpreter-lock. However, apart from parallelization, we suspect that the most effective place for Cython would be in implementing an efficient propagator. We are currently working on this. The numerical effort of the optimization is dominated by the propagation, and it is not clear whether cythonizing Krotov further than that will lead to a substantial improvement. We will explore this in more detail in the future.
6.1.5: We have added a sentence about open quantum systems and NISQ in the Introduction.
6.2.1: We now mention trapped ion quantum computers in the Introduction.
6.2.2: The DYNAMO package does not implement Krotov's method, but a "Krotov-style" version of sequential gradient ascent (see 6.1.3).
6.2.3: We now introduce the Schrödinger equation in more detail.
6.3.1: We discuss the control fields in the new section 2.1. This should also clarify the role of the index "l". Indeed, the number of controls is not related to the number of states, but to the physically independent control fields. In the case of complex-valued control, e.g. in a rotating frame, there are two controls (real and imaginary part) for each physical control.
6.4.1: We are not sure what constitutes "optimal use of classes", although we appreciate the assessment.
6.4.2.: We have spelled out the name for QuTiP in the revised version of the manuscript.
6.4.3.: We are aware of the usual conventions, but we feel that in numerical code it is sometimes best to relax them. Thus, we deem capitalized single-letter or Greek variable names acceptable if they correspond to uppercase mathematical variables, or lowercase class names for classes that represent lowercase mathematical variables (as is the case here).
6.4.4: We appreciate the referee's comment. We have extended the flowchart to include a wider range of methods.
6.4.5: The size of the Hilbert space that is still numerically feasible is largely determined by the efficiency of the propagation routine. Based on our experience with a Fortran implementation of Krotov's method, Hilbert space dimensions greater than 10000 become challenging. For the krotov package with a propagator in pure Python, this limit will be significantly lower, probably around 100. We are not sure yet how far the limit can be pushed by implementing the propagator in Cython.
6.4.6: We now mention the Appendix with the installation instructions early in the paper.
6.5: We have added a Conclusions section.
6.6: We have moved the example to the main text and expanded the discussion. For more advanced examples, we would refer the reader to the online documentation.
6.7: We intend the manuscript as the "citation object" for use of the package, and feel that assigning a DOI to a specific release of the package might compete with that.
List of changes
We have thoroughly revised the manuscript in its entirety, see the response to the referee reports.
Published as SciPost Phys. 7, 080 (2019)
Reports on this Submission
Report #1 by Nathan Shammah (Referee 2) on 2019-10-24 (Invited Report)
- Cite as: Nathan Shammah, Report on arXiv:1902.11284v4, delivered 2019-10-24, doi: 10.21468/SciPost.Report.1228
Report
The authors have greatly improved the already important manuscript, beginning from the abstract, from content extensions to clearer explanations, also thanks to an improved structure of the paper, and a stronger and more engaging narrative.
I appreciate their assessment of the points raised and I agree with their replies to my comments and questions. I am glad that point 6.4.4 and other suggestions have been implemented, to the benefit of the paper, I believe.
Section 2.3 considerably improves the clarity of the paper. The Conclusions make the article's perception more impactful than in the previous version.
Therefore, I strongly advise the paper for publication in SciPost Physics.
* Comments on replies and new comments*:
- I am convinced by all points made by the authors, and in retrospect also with 6.7 on the ambiguity of an additional DOI with Zenodo: pip versioning will serve the scope of code "versioning".
- A comment on why Blackman shape is a popular choice might be nice.
- When discussing Fig. 1, the methods (or attributes) on `opt_dynamics` and else used to plot both the converging and final results of the populations and control pulses should be mentioned explicitly for clarity.
- I think $N$ is used in different ways in Sec. 2.2 and Sec. 3.3, this could be uniformed.
- In the tree of Fig. 2 and main text, the fact that $n$ could be explicitly defined as the number of control parameters.
When I read in the first bullet point of Sec. 4.4 that
"Some potential benefits of Krotov’s method compared to GRAPE are [69]:
• Krotov’s method mathematically guarantees monotonic convergence in the continuous- time limit. There is no line-search required for the step width 1/λa,l."
it seemed like a big deal, a bit hidden at page 20.
This fact is indeed mentioned in the Introduction, "A popular representative of concurrent update methods is GRadient Ascent Pulse Engineering (GRAPE) [21], whereas Krotov’s method, which comes with the advantage of guaranteed monotonic convergence, requires sequential updates". It is only a suggestion, but I think that even only reshuffling the syntax of this introductory sentence may make this very nice feature stand out a bit more, engaging more the reader. Also non-experts in quantum optimal theory may know of GRAPE, so this may be a point to underscore when introducing the lesser known Krotov method. Just a suggestion.
Response to the Reply to 5.2:
In section 4.2 of the new manuscript the authors actually mention something similar to what I had in mind with that comment:
"In all its variants [5,22,46–48], Krotov’s method is a first-order gradient with respect to the control fields (even in the second-order construction which is second order only with respect to the states). As the optimization approaches the optimum, this gradient can become very small, resulting in slow convergence. It is possible to extend Krotov’s method to take into account information from the quasi-Hessian [69]. However, this “K-BFGS” variant of Krotov’s method is a substantial extension to the procedure as described in Appendix B, and is currently not supported by the krotov package."
I simply meant that it could be possible to have such kind of extensions (K-BFGS) where the user could choose addition techniques to get some advantage while running Krotov.
As they mention, it is beyond the scope of this paper and was merely a comment.
The code snippet in Section 2 run fine on my Mac.
Just a suggestion, or idea: in the table that is printed when running `krotov.optimize_pulses`, words could define the quantities at the beginning of the columns. Like, over
"J_T" have "final t. functional", in the next column "running cost", or similar, and so on. For an experienced krotov user this is trivial, but maybe for a newcomer it would help grasp what's happening.
Minor comments:
- The Liouvillian is not explicitly introduced in Eq. (3), and it could be mentioned it is not a Hermitian matrix.
- Just as a suggestion, maybe the initial and target states expressions, present in Eq. (6), could be introduced in the relevant main text discussion occurring after Eq. (3), for redundancy between the general concept of optimal control and Krotov's.
- Now, after the equation reshuffling of Sec. 2.1, the shape function $S(t)$ is not defined when it is first introduced, immediately after Eq. (8).
- the small equal sign under the braces is a bit confusing and maybe not necessary
- The frequency $\omega$ and control pulse $\epsilon(t)$ could be introduced in Section 2.3 also before the code block, even if the control pulse concept has been introduced in the previous sections.
- After Eq. (12) on page 10, it would be clearer to cite `Results` as `qutip.Results`, otherwise it seems the Results are a krotov class. I am a bit confused here, because looking at the code, there is a result.py file and a Result class defined therein, but running help(krotov.Result) prompts an error message; indeed help(opt_dynamics) tells it's a qutip.Result object from the mesolve dynamics.
- In references, Ref. 14 is missing volume and page, Ref. 29 "rydberg" needs capital R.
Just an idea (not something to address in resubmission): the print message during the optimization, instead of having start and end date and time, could just report the total running time.
Report #2 by Anonymous (Referee 3) on 2019-10-17 (Invited Report)
- Cite as: Anonymous, Report on arXiv:1902.11284v4, delivered 2019-10-16, doi: 10.21468/SciPost.Report.1237
Strengths
1. The Krotov method is well-known in the optimal control community.
2. The addition of a high-quality open-source implementation is very welcome and could be of use to a large audience of both optimal control theoreticians and experimentalists.
3. The manuscript has been significantly revised and improved.
Weaknesses
The manuscript is still short of a paragon of clarity.
Report
The Authors have made significant changes to the manuscript, and as a result it is significantly improved.
It now clears the bar for publication.
No further changes are required prior to publication.
Requested changes
I would suggest the Authors consider tweaking the first page of section 2.2 (the discussion prior to eq. 6), as it is still not as clear as it can be. For example eq. 5 and the surrounding discussion
would probably be better if they appear after eq. 8. This is optional, and left to the Authors' discretion.
Author: Michael Goerz on 2019-12-05 [id 666]
(in reply to Report 2 on 2019-10-17)We greatly appreciate and agree with the referee's suggestion to improve the clarity of the manuscript. In the final revision (<https://arxiv.org/abs/1902.11284v5>), we have rearranged the discussion, and split Section 2.2 into a new Section 2.2 and 2.3.
Author: Michael Goerz on 2019-12-05 [id 665]
(in reply to Report 1 by Nathan Shammah on 2019-10-24)We thank the reviewer for another detailed and constructive review, and for their recommendation to publish.
We have revised the manuscript on the ArXiV (https://arxiv.org/abs/1902.11284v5) addressing all remaining issues according to the reviewer's suggestion. The full list of changes (including those in response to the second referee report) is the following:
As a further comment towards specific points:
In this case, only the fact that the Blackman shape goes exactly to zero at the beginning and end is important. Without that, in each iteration, the initial and final value of the control might shift away from zero very slightly.
We now mention the attribute involved, but the full details (including the somewhat lengthy code to generate the plot) is best presented in the corresponding notebook in the online documentation (which we now point to in the paper).
Indeed, and we have now switched N to M in Sec 3.3.
We might consider this in a future version. For now, the output matches exactly what users of our existing Fortran QDYN package are used to.
We have added the running time in the latest release of the
krotov
package in addition to the start/end time stamps.