The paper is nice and well written. Just a few comments:
1) I do not like the nomenclature of ``reinforcement learning'' for the optimization technique. Even though this is a fancy name, the optimization itself is just standard within the Monte Carlo community (see ref. 26).
For the real-time evolution, a VMC approach was proposed in Scientific Reports 2, 243 (2012) for a Bose-Hubbard model. I think that this paper should be cited.
2) Are the variational parameters real or complex for the static calculations?
Is the Marshall sign imposed?
3) The standard way to define the energy accuracy is to normalize |E_vmc-E_0| by E_0 and not by E_vmc.
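Explicitly, assuming that E_0 denotes the exact (or best available reference) ground-state energy and E_vmc the variational estimate, the relative error I have in mind would read as follows. The symbol \epsilon_{\rm rel} is my own notation, not the manuscript's, and the absolute value in the denominator is only a convention to keep the ratio positive when E_0 is negative.

\[
  \epsilon_{\rm rel} = \frac{\left| E_{\rm vmc} - E_0 \right|}{\left| E_0 \right|}
\]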