Polynomial filter diagonalization of large Floquet unitary operators

Periodically driven quantum many-body systems play a central role in our understanding of nonequilibrium phenomena. For studies of quantum chaos, thermalization, many-body localization and time crystals, the properties of eigenvectors and eigenvalues of the unitary evolution operator, and their scaling with physical system size $L$, are of interest. While powerful methods for the partial diagonalization of the Hamiltonian were developed for static systems, the unitary eigenproblem remains daunting. In this paper, we introduce a Krylov space diagonalization method to obtain exact eigenpairs of the unitary Floquet operator with eigenvalue closest to a target on the unit circle. Our method is based on a complex polynomial spectral transformation given by the geometric sum, leading to rapid convergence of the Arnoldi algorithm. We demonstrate that our method is much more efficient than the shift-invert method in terms of both runtime and memory requirements, pushing the accessible system sizes to the realm of 20 qubits, with Hilbert space dimensions $\geq 10^6$.


Introduction
Periodically driven, Floquet quantum many-body systems host fascinating nonequilibrium phenomena [1,2]. While they generically relax to a featureless state [3,4] independent of the initial state, they can undergo nonequilibrium phase transitions, such as the many-body localization transition [5][6][7][8][9][10]. In this context, Floquet systems are often cleaner counterparts of Hamiltonian systems, capturing the essence of these phenomena, owing to the absence of an energy structure and their uniform density of states. This is for example useful for high-quality tests of the eigenstate thermalization hypothesis [4,[11][12][13][14][15][16][17].
Many questions, in particular in the context of the many-body localization transition, rely on studying the scaling of the properties of eigenvectors of the unitary one-period evolution operator U with system size. While in static systems powerful methods like shift-invert diagonalization [33,34] and polynomial filter diagonalization [35][36][37][38] were developed for finding interior eigenpairs of a large Hermitian and sparse Hamiltonian up to Hilbert space dimensions of $10^7$, the case of the dense unitary eigenproblem remains daunting.
Although progress was made for the special case deep in the many-body localized phase, based on matrix product state variants of the shift-invert technique [39,40] for the Floquet operator [41], a general-purpose method which can go beyond full diagonalization of the unitary matrix U, limited to system sizes of about L = 14...16 qubits, is still missing.
In this paper, we introduce a new method using a spectral transformation given by the geometric sum $g_k(U)$ of order k. The spectral transformation is a complex polynomial of U, and an efficient matrix-vector product $g_k(U)|\psi\rangle$ can be defined, provided there is an efficient matrix-vector product $U|\psi\rangle$. This is the case in local Floquet systems (e.g. in a matrix product operator formulation). The spectral transformation is designed to enhance the absolute value of eigenvalues of U close to an arbitrary target on the unit circle, and to reduce the absolute value of all other eigenvalues, thus allowing rapid convergence of the Arnoldi algorithm to the requested eigenpairs of $g_k(U)$.
We show that this procedure is effective and can be carried out with a low memory footprint compared to dense shift-invert or full diagonalization. Effectively, it gives access to system sizes of $L \geq 20$ qubits, while full diagonalization is limited to $L \approx 15$.
This computational advantage makes extensive finite size scaling studies of Floquet MBL systems possible, and can help to make progress on the recent debate on the stability of many-body localization in the thermodynamic limit [42][43][44][45][46][47][48][49], which highlights the importance of finite size effects.

Model
To investigate the performance of our method, we consider a simple generic model for a time periodic one dimensional quantum many-body system of L qubits, with a Hilbert space of dimension d = 2 L . The model is designed such that it is ergodic, with highly entangled eigenstates, and is not tractable by alternative methods, e.g. tensor network techniques [41].
The evolution operator U over one driving period is given by a two-layer brickwork circuit, composed of two-site unitaries.
Each of the boxes in Eq. (1) represents a unitary $u_{i,i+1} \in \mathbb{C}^{4\times 4}$, acting on qubit i and its right neighbor (i+1). At the boundaries, we fill in single-qubit unitaries $u_1, u_L \in \mathbb{C}^{2\times 2}$ if needed. We sample all unitaries randomly from the uniform measure on the unitary group [50]. Our construction ensures that each link in the chain of qubits is represented by a $4\times 4$ unitary, corresponding to generic 2-qubit interactions.
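Sampling from the uniform (Haar) measure on the unitary group can be sketched in a few lines via the standard QR-based recipe; this is a minimal illustration (the function name `haar_unitary` is ours, not from the paper):

```python
import numpy as np

def haar_unitary(n, rng=None):
    """Sample an n x n unitary from the Haar measure on U(n) via QR."""
    rng = rng or np.random.default_rng()
    # Complex Ginibre matrix: i.i.d. standard complex Gaussian entries.
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Normalize the phases of R's diagonal so that Q is Haar distributed,
    # not merely unitary (Mezzadri's prescription).
    d = np.diag(r)
    return q * (d / np.abs(d))
```

The phase normalization after the QR step matters: without it, the distribution of Q is biased by the sign/phase convention of the QR routine.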
We can express U in terms of the two layers $U_a$ and $U_b$ (for even L) as $U = U_b U_a$. It is important to note that the unitary matrix U is dense in the computational basis. To construct an efficient matrix-vector product $|\psi\rangle \leftarrow U|\psi\rangle$, we split the circuit into a left part $U_L \in \mathbb{C}^{d_L \times d_L}$ (red tensors in Eq. (1)) and a right part $U_R \in \mathbb{C}^{d_R \times d_R}$. The Floquet operator is then decomposed into $U = (U_L \otimes 1)(1 \otimes U_R)$, and we can calculate $U|\psi\rangle$ by two matrix products, first calculating $\psi' = \psi_{d/d_R \times d_R} U_R^T$ and then $U_L \psi'_{d_L \times d/d_L}$. Note that here $\psi_{d_L \times d/d_L}$ and $\psi_{d/d_R \times d_R}$ are two different reshapings of the vector $|\psi\rangle$ into matrices of the dimensions indicated in the subscript. The advantage of this procedure is that instead of the very large matrix $U \in \mathbb{C}^{d \times d}$, we only need to store the two much smaller matrices $U_L$ and $U_R^T$, and we have expressed the matrix-vector product in terms of two matrix products, with efficient memory access per floating point operation.
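The two reshaped matrix products can be illustrated in a few lines of numpy. This is a sketch under the assumption that $U_L$ and $U_R$ are given as dense arrays (the function name is ours):

```python
import numpy as np

def apply_brickwork(psi, U_L, U_R):
    """Apply U = (U_L x 1)(1 x U_R) to a state vector |psi>
    via two reshaped matrix-matrix products."""
    d = psi.size
    d_L, d_R = U_L.shape[0], U_R.shape[0]
    # Right part: reshape |psi> to (d/d_R, d_R) and multiply U_R^T from the right.
    psi = psi.reshape(d // d_R, d_R) @ U_R.T
    # Left part: reshape to (d_L, d/d_L) and multiply U_L from the left.
    psi = U_L @ psi.reshape(d_L, d // d_L)
    return psi.reshape(d)
```

Only $U_L$ and $U_R^T$ are stored, never the full $d \times d$ matrix U, and both multiplications are cache-friendly BLAS-3 calls.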
This decomposition is specific to brickwork circuits, but for general one-dimensional local Floquet problems, efficient matrix-free matrix-vector products can be formulated, for example based on a matrix product operator formulation of U [41]. The matrix product operator is guaranteed by locality to have a constant bond dimension, due to the area law of the operator entanglement entropy of U [51,52].

Method
Our goal is to calculate a subset of the eigenpairs $\{\omega_n, |n\rangle\}$ of the large unitary matrix U, such that we obtain all $n_{\rm ev}$ eigenpairs with eigenvalue $\omega_n$ closest to a target $z_{\rm tgt} \in \mathbb{C}$. The eigenvalues $\omega_n$ of U lie on the complex unit circle, $|\omega_n| = 1$, and can therefore not be separated by magnitude, which is necessary for the convergence of Krylov space diagonalization techniques. To achieve such a separation, spectral transformations $f(U)$ can be used to transform the eigenvalues $\omega_n \to f(\omega_n)$, while leaving the eigenvectors $|n\rangle$ invariant. If the magnitude $|f(\omega_n)|$ is large for eigenvalues $\omega_n$ close to the target and small otherwise, the Arnoldi algorithm, for example, can be used to calculate the eigenpairs of interest of $f(U)$, while only the matrix-vector product $|\psi\rangle \leftarrow f(U)|\psi\rangle$ is needed.
One of the most effective spectral transformations is the shift-and-invert transformation $f_{\rm sinvert}(z) = (z - z_{\rm tgt})^{-1}$, which provides excellent convergence of the Arnoldi algorithm if the target is chosen on or close to the unit circle. The downside of $f_{\rm sinvert}$ is that the matrix-vector product $(U - z_{\rm tgt} 1)^{-1}|\psi\rangle$ involves an inversion. Due to the dense spectrum of U, the condition number of $U - z_{\rm tgt} 1$ is exponentially large in system size, and therefore only a direct solution using an LU decomposition of the dense matrix $U - z_{\rm tgt} 1$ can be used, with a memory complexity $O(d^2)$ and runtime complexity $O(d^3)$. We provide benchmark results of this technique in Tab. 1 for comparison. Polynomial spectral transformations are useful because they do not suffer from large memory requirements, since any power of U can be applied to a vector $|\psi\rangle$ by repeated matrix-vector products, $U^k|\psi\rangle = U(U \ldots (U|\psi\rangle))$. Generally, a polynomial of degree k can be efficiently multiplied onto a vector to obtain $p_k(U)|\psi\rangle$.
We argue here that an effective complex polynomial spectral transformation to single out eigenpairs with eigenvalue closest to $z_{\rm tgt} = e^{i\phi_{\rm tgt}}$ is given by the geometric sum
$$g_k(U) = \sum_{j=0}^{k} \left(e^{-i\phi_{\rm tgt}} U\right)^j.$$
This polynomial maximizes the absolute value of eigenvalues close to the target and has minimal modulus outside the target region. We conjecture that it is in fact the optimal choice, although we have checked this only numerically. The phase factor can be understood by noticing that multiplication by $e^{-i\phi_{\rm tgt}}$ rotates the eigenvalues of U closest to $e^{i\phi_{\rm tgt}}$ to the proximity of 1.
The geometric sum $g_k(U)$ has the closed form
$$g_k(e^{i\phi}) = \frac{1 - e^{i(k+1)(\phi - \phi_{\rm tgt})}}{1 - e^{i(\phi - \phi_{\rm tgt})}} \qquad (e^{i\phi} \neq z_{\rm tgt}),$$
and has some similarity with $f_{\rm sinvert}$. Fig. 1 illustrates the mapping of the unit circle by $g_k(z)$: the left panel shows the transformed unit circle under $g_k(z)$ on the complex plane, while the right panel depicts $|g_k(e^{i\phi})|$ as a function of the phase angle $\phi$. In the limit $\phi \to \phi_{\rm tgt}$, $g_k(e^{i\phi})$ is on the real axis and given by $k+1$. This is the point of maximal modulus. If we tune $\phi$ away from $\phi_{\rm tgt}$, $g_k(e^{i\phi})$ moves away from this point and the modulus decreases, until we reach $g_k(e^{i\phi}) = 0$ for $\phi - \phi_{\rm tgt} = \pm\frac{2\pi}{k+1}$. If $\phi_{\rm tgt} - \frac{2\pi}{k+1} \leq \phi \leq \phi_{\rm tgt} + \frac{2\pi}{k+1}$, we therefore get the "apple"-shaped outer line in the left panel of Fig. 1, while the rest of the circle is compressed into the inner spirals. The target arc satisfying this condition is shown in red in Fig. 1. It is noteworthy that the target arc is not only strongly enhanced in magnitude by $g_k$, but also shows the largest separation of eigenvalues on the complex plane. These features are important for a rapid convergence of the Arnoldi algorithm.
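Applying the geometric sum to a vector requires only repeated matrix-vector products with U, accumulated term by term; a minimal numpy sketch (function names are ours):

```python
import numpy as np

def apply_geometric_sum(matvec, psi, k, phi_tgt=0.0):
    """Apply g_k(U) = sum_{j=0}^k (e^{-i phi_tgt} U)^j to |psi>,
    using only the matrix-vector product `matvec` with U."""
    phase = np.exp(-1j * phi_tgt)
    term = psi.astype(complex)        # j = 0 term of the sum
    acc = term.copy()
    for _ in range(k):
        term = phase * matvec(term)   # advance to the next power of U
        acc = acc + term
    return acc
```

Only two working vectors are needed, so the memory footprint stays at a few copies of $|\psi\rangle$ regardless of the polynomial order k.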
Quantum many-body Floquet systems typically have a uniform eigenvalue density on the unit circle. Therefore, on average, for a fixed polynomial order k, the $2d/(k+1)$ eigenvalues closest to $z_{\rm tgt}$ will be mapped to the outer "apple" line by $g_k$.

Arnoldi algorithm for $g_k(U)$
The Arnoldi algorithm [53] is a generalization of the Lanczos method [54] to nonhermitian matrices. It is a numerically stable variant of the power iteration, and iteratively builds an orthonormal basis $\{v_j\}$ of the Krylov space $\mathrm{span}\{|\psi\rangle, g_k(U)|\psi\rangle, g_k^2(U)|\psi\rangle, \ldots, g_k^{n_{\rm cv}-1}(U)|\psi\rangle\}$ of dimension $n_{\rm cv}$, starting from a random initial vector $|\psi\rangle$. Raising $g_k$ to high powers in this process filters out components of eigenvectors of $g_k$ which are in the target space (i.e. whose eigenvalues of $g_k$ have a large magnitude and are therefore enhanced). At the same time, $g_k(U)$ is projected into the Krylov space by the matrix $V \in \mathbb{C}^{d \times n_{\rm cv}}$ with columns given by the $v_j$, yielding the upper Hessenberg matrix $H = V^\dagger g_k(U) V$. The eigenvalues $\lambda_i$ and eigenvectors $x_i$ of H yield the Ritz pairs $(\lambda_i, V x_i)$, which are approximations of the eigenpairs of $g_k(U)$. If the dimension of the Krylov space $n_{\rm cv}$ is sufficiently large, the Ritz pairs will converge with the number of Arnoldi iterations. Typically, the $\lambda_i$ with the largest magnitude converge first, which is why an effective spectral transformation is important. One should not expect to converge all $n_{\rm cv}$ Ritz pairs, but rather choose a maximal size of the Krylov space $n_{\rm cv} > n_{\rm ev}$ if $n_{\rm ev}$ eigenvalues are required. Once $n_{\rm ev}$ Ritz pairs are converged, we obtain excellent approximations for eigenpairs of $g_k(U)$. The eigenvectors $|n\rangle$ of $g_k(U)$ are also eigenvectors of U. The eigenvalues $\lambda_n$ of $g_k(U)$ are related to the corresponding eigenvalues $\omega_n$ via $g_k(\omega_n) = \lambda_n$. Rather than solving this equation by root finding, we calculate them from $\omega_n = \langle n|U|n\rangle$.
This has the advantage that we can at the same time check the residuals $r_n = \|U|n\rangle - \omega_n|n\rangle\|_2$ as a measure of eigenpair quality. We note in passing that for very high orders of the filtering polynomial, the numerical precision of applying $g_k(U)$ may be insufficient. In this case, one can perform one iteration of the Ritz method by diagonalizing $A_{nm} = \langle\tilde{n}|U|\tilde{m}\rangle$ with approximate eigenvectors $|\tilde{n}\rangle$ of $g_k(U)$ to improve the precision of the Ritz pairs of U. We did, however, not observe such problems in practice.
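The steps above can be put together with SciPy's ARPACK wrapper. This is a toy sketch for a dense U (in production, only a matvec would be available; the paper uses ARPACK directly, and the function names here are ours):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigs

def floquet_eigenpairs(U, n_ev, k_poly, phi_tgt=0.0, n_cv=None):
    """Eigenpairs of a unitary U with eigenvalue closest to exp(i*phi_tgt),
    via Arnoldi iteration on the geometric-sum filter g_k(U)."""
    d = U.shape[0]
    phase = np.exp(-1j * phi_tgt)

    def gk_matvec(psi):
        # g_k(U)|psi> = sum_{j=0}^k (e^{-i phi_tgt} U)^j |psi>
        term = psi.astype(complex)
        acc = term.copy()
        for _ in range(k_poly):
            term = phase * (U @ term)
            acc = acc + term
        return acc

    op = LinearOperator((d, d), matvec=gk_matvec, dtype=complex)
    # Largest-magnitude eigenvalues of g_k(U) <-> eigenvalues of U near target.
    _, vecs = eigs(op, k=n_ev, ncv=n_cv, which='LM')
    # Recover eigenvalues of U as omega_n = <n|U|n> and check residuals.
    omegas = np.array([v.conj() @ (U @ v) for v in vecs.T])
    residuals = np.array([np.linalg.norm(U @ v - w * v)
                          for v, w in zip(vecs.T, omegas)])
    return omegas, vecs, residuals
```

Note that the Ritz values of $g_k(U)$ are discarded; the eigenvalues of U are read off directly from the converged eigenvectors, which also yields the residuals for free.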
We pursue here the following strategy: since the eigenvalues on the outer "apple" line of $g_k(z)$ are well separated from the rest of the spectrum, it is advisable to use a Krylov space dimension $n_{\rm cv} = 2d/(k+1)$ (in practice, one can take it about 20% smaller, as we will see below). If we want to calculate $n_{\rm ev}$ eigenpairs, we furthermore use $n_{\rm cv} > 2n_{\rm ev}$. We use the reference implementation of the implicitly restarted Arnoldi method as provided by ARPACK [55].
For large system sizes, the runtime of the algorithm is dominated by the matrix-vector products $g_k(U)|\psi\rangle$. We can therefore estimate the computational cost for the matrix-vector product proposed in Eq. (3). A single matrix-vector product, rephrased as two matrix-matrix products of matrices of dimension $2^{L/2}$, has a cost of $O(2^{3L/2})$. For a fixed number of required eigenpairs $n_{\rm ev}$, we use an exponentially large polynomial order $k = 0.8 \cdot 2^{L+1}/n_{\rm cv}$, and therefore a single matrix-vector product $g_k(U)|\psi\rangle$, requiring k matrix-vector products involving U, has an asymptotic cost of $O(2^{5L/2}/n_{\rm cv})$. In the Arnoldi algorithm, we need at least $n_{\rm cv}$ of these matrix-vector products, and hence the overall asymptotic runtime complexity is $O(2^{5L/2})$. We note that in the benchmarks in Sec. 5 this asymptotic complexity is only approximately visible, since even for the largest system sizes, CPU-specific effects like cache sizes play a role.

Benchmarks
To find the optimal algorithmic parameters k and $n_{\rm cv}$, we perform benchmark calculations to obtain $n_{\rm ev} = 50$ eigenpairs closest to $z_{\rm tgt} = 1$ of Floquet random circuits from Eq. (1) of length L = 12, 14, 16, for a range of Krylov space dimensions $n_{\rm cv}$ and polynomial orders k. The results are shown in Fig. 2. The black dashed line corresponds to $n_{\rm cv} = 2^{L+1}/k$, where the Krylov space dimension is equal to the number of eigenvalues on the outer apple line in Fig. 1. The colormap shows the obtained runtimes in seconds. There is a distinct minimum visible in the runtime when the Krylov space dimension is about $0.8 \cdot 2^{L+1}/k$ (red dashed) for large $n_{\rm cv}$. This corresponds to the number of eigenvalues with larger modulus than the rest of the spectrum and is therefore the optimal size of the Krylov space for each k. We observe a tendency that for larger systems, larger Krylov space dimensions are better, and we therefore use $n_{\rm cv} = 2^{L/2+1}$, $k = 0.8 \cdot 2^{L+1}/n_{\rm cv}$ in the following benchmarks for larger sizes.
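In code, this parameter choice amounts to a trivial helper (the 0.8 safety factor is the empirical optimum from Fig. 2; the function name is ours):

```python
def filter_parameters(L, safety=0.8):
    """Krylov dimension and polynomial order used for the large-L benchmarks:
    n_cv = 2^(L/2 + 1), k = safety * 2^(L+1) / n_cv (cf. Fig. 2)."""
    n_cv = 2 ** (L // 2 + 1)
    k = int(safety * 2 ** (L + 1) / n_cv)
    return n_cv, k
```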
For an assessment of the performance of our method in comparison with the state of the art, we carry out a calculation of 50 eigenpairs of U with eigenvalues closest to 1, using (i) full diagonalization of U with the zgeev routine from MKL, (ii) shift-invert diagonalization based on the LU decomposition of $U - 1$ using the zgetrf routine from MKL in combination with ARPACK's Arnoldi algorithm, and (iii) our new geometric sum filtered diagonalization using ARPACK's Arnoldi algorithm.
We measure the total runtime of the calculation as well as the memory footprint and show the results in Tab. 1 and Fig. 3. We also calculate the maximal residual $\max_{n=1\ldots50} \|U|n\rangle - \omega_n|n\rangle\|_2$ to check the quality of the obtained eigenpairs. Fig. 3 reveals that the scaling of all techniques is exponential in system size L, due to the nature of the problem. However, at a fixed runtime, geometric sum filter diagonalization yields eigenpairs of systems about 4 sites larger, i.e. for Hilbert spaces about 16 times larger. Due to the very small memory footprint, system sizes up to about L = 20 are therefore reachable, which would require about 50 TiB of memory with shift-invert. Despite the use of quite large polynomial orders, the algorithm is stable and yields eigenpairs of excellent quality, with residuals about one order of magnitude smaller than obtained from full diagonalization.
As an example of how the obtained eigenstates of the unitary U can be investigated, we show in Fig. 4 the entanglement entropy as a function of the subsystem size $L_A$. For this, we cut the system into a left half (subsystem A) consisting of $L_A$ qubits, and a right half (subsystem B) with $L - L_A$ qubits. Due to the open boundaries we use in this system, there is only one cut between the two subsystems. For each eigenstate $|\psi\rangle$, we can then calculate the entanglement entropy between the two subsystems, given by $S_A = -\sum_i s_i^2 \ln s_i^2$. Here, the $s_i^2$ are the eigenvalues of the reduced density matrix $\rho_A$ of subsystem A, which can also be calculated by determining the singular values $s_i$ of the wave function reshaped as a matrix $\psi_{2^{L_A} \times 2^{L-L_A}}$, where the numbers of rows and columns reflect the dimensions of the Hilbert spaces of the respective subsystems. It is clear in Fig. 4 that the entanglement entropy follows a volume law: it is extensive as a function of subsystem size almost up to $L_A = L$, at which point the entropy decreases due to the symmetry $S_A = S_B$.
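The entanglement entropy of an eigenvector can be evaluated in a few lines via the singular value decomposition of the reshaped state vector; a minimal sketch (function name ours):

```python
import numpy as np

def entanglement_entropy(psi, L_A, L):
    """Von Neumann entanglement entropy of the first L_A of L qubits,
    from the singular values of the reshaped state vector."""
    s = np.linalg.svd(psi.reshape(2 ** L_A, 2 ** (L - L_A)), compute_uv=False)
    p = s ** 2                 # Schmidt weights = eigenvalues of rho_A
    p = p[p > 1e-15]           # drop numerical zeros before taking the log
    return -np.sum(p * np.log(p))
```

Working with singular values of the $2^{L_A} \times 2^{L-L_A}$ matrix avoids ever forming the reduced density matrix $\rho_A$ explicitly.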
This clean scaling is expected, since our circuit (1) is designed to be ergodic. Since we are dealing with a Floquet system, all eigenstates are "infinite temperature" states with volume law entanglement. This is why this system is not amenable to tensor network techniques for calculating exact eigenstates: the required bond dimension would scale exponentially with system size. We can go one step further and compare these results quantitatively to the expected entanglement entropy for a cut of the system into two parts of size $L_A$ and $L_B$ ($L_A + L_B = L$) for random pure states. Page [56] conjectured this entropy to be $S_{\rm Page} = \ln d_A - \frac{d_A}{2 d_B}$, with $d_{A/B} = 2^{L_{A/B}}$, for $L_A \leq L_B$, and this was later proven to be correct [57,58]. The red dashed lines in Fig. 4 show the expected entropy (12), revealing a perfect match. This confirms the expectation that in the random circuit model (1), the eigenstates are maximally chaotic.
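For reference, the Page value quoted above can be evaluated and checked against the numerically computed entropy of Haar-random states (a sketch of the asymptotic formula; the function name is ours):

```python
import numpy as np

def page_entropy(L_A, L):
    """Average entanglement entropy of a Haar-random pure state of L qubits
    for a cut after L_A qubits: S = ln(d_A) - d_A / (2 d_B), d_A <= d_B."""
    d_A, d_B = 2 ** L_A, 2 ** (L - L_A)
    if d_A > d_B:
        d_A, d_B = d_B, d_A   # use the symmetry S_A = S_B
    return np.log(d_A) - d_A / (2 * d_B)
```

For $L_A \ll L_B$ this approaches the maximal value $L_A \ln 2$, which is why volume-law curves bend down only near $L_A = L/2$.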

Conclusion
We have shown that the geometric sum is an effective polynomial filter to obtain interior eigenpairs of local Floquet unitary operators. Due to locality, an efficient matrix-vector product $U|\psi\rangle$ can be defined, and the geometric sum can be efficiently applied to any wave function. This allows the application of the implicitly restarted Arnoldi algorithm for finding eigenpairs closest to an arbitrary target on the complex unit circle.
Although the overall exponential scaling of the problem remains, the method has a moderate memory footprint compared to full diagonalization and shift-invert, and is roughly one order of magnitude faster than shift-invert, making systems of L ≥ 20 qubits accessible.
We note that a large fraction of the runtime of the algorithm is spent in the matrix-vector product due to the high order of the polynomial, and that runtimes can be reduced significantly by optimizing it.

Table 1: Average runtime and memory consumption for the calculation of $n_{\rm ev} = 50$ eigenpairs close to $z_{\rm tgt} = 1$ with our proposed geometric sum polynomial filter method, with $n_{\rm cv} = 2^{L/2+1}$ and the optimal polynomial order $k = 0.8 \cdot 2^{L+1}/n_{\rm cv}$ (cf. Fig. 2), in comparison with full diagonalization using MKL's zgeev, as well as a custom shift-invert implementation using MKL's zgetrf. Runtimes were measured using 16 cores of an AMD EPYC 7H12 2.6 GHz CPU. Memory usage is estimated from the maximum resident set size (max RSS). Memory footprints are only indicative, since our simple benchmark codes were not optimized for memory usage.