Quantum hypothesis testing in many-body systems

One of the key tasks in physics is to perform measurements in order to determine the state of a system. Often, measurements are aimed at determining the values of physical parameters, but one can also ask simpler questions, such as "is the system in state A or state B?". In quantum mechanics, the latter type of measurements can be studied and optimized using the framework of quantum hypothesis testing. In many cases one can explicitly find the optimal measurement in the limit where one has simultaneous access to a large number $n$ of identical copies of the system, and estimate the expected error as $n$ becomes large. Interestingly, error estimates turn out to involve various quantum information theoretic quantities such as relative entropy, thereby giving these quantities operational meaning. In this paper we consider the application of quantum hypothesis testing to quantum many-body systems and quantum field theory. We review some of the necessary background material, and study in some detail the situation where the two states one wants to distinguish are parametrically close. The relevant error estimates involve quantities such as the variance of relative entropy, for which we prove a new inequality. We explore the optimal measurement strategy for spin chains and two-dimensional conformal field theory, focusing on the task of distinguishing reduced density matrices of subsystems. The optimal strategy turns out to be somewhat cumbersome to implement in practice, and we discuss a possible alternative strategy and the corresponding errors.


Introduction
The purpose of this work is to i) introduce and review quantum hypothesis testing for readers with a background in quantum field theory and many-body theory, ii) develop some new results in a perturbative setup, and then iii) apply the tools to distinguish in particular two reduced density matrices in a subsystem of a quantum many-body system.
We begin with some background motivation. An elementary quantum task is to distinguish between two quantum states. Recently there has been much effort to study this question in quantum field theory and many-body theory, and to develop methods to compute various quantum information theoretic distinguishing measures analytically. A particularly interesting case is a large or infinite system in two different global states viewed from a small subsystem. The problem is then to distinguish the two reduced density matrices (RDMs) resulting from a partial trace over the complement of the subsystem. For this problem, critical systems modeled by conformal field theories have offered a fruitful arena for analytic progress. Additional motivation for studying conformal field theories comes from the connections between quantum information and gravity. In this context, a famous issue is the state of Hawking radiation escaping from an evaporating black hole: how can one detect in subsystems the subtle quantum correlations between radiated quanta at different times, to distinguish a conjectured pure state of radiation from something resembling thermal radiation?
In quantum field theory and many-body theory, there has been much progress in studying well-known distinguishing measures both analytically and numerically. For example, in the context of conformal field theory and critical lattice models, there are studies of the fidelity $F(\rho,\sigma)$ [1,2], the relative entropy $S(\rho\|\sigma)$ [2][3][4][5][6][7], generalized divergences [8][9][10][11][12][13] and the trace distance $D(\rho,\sigma) = \frac12\|\rho-\sigma\|_1$ [14,15]. In this work, our focus is instead to distinguish two states by measurements. We begin with three remarks: i) a rigorous framework for the task is quantum hypothesis testing, ii) many results obtained for relative entropy and generalized divergences can be embedded in this framework, giving them an operational interpretation, and iii) hypothesis testing also suggests an optimal measurement protocol to minimize the error in distinguishing two states. We are thus led to study how quantum hypothesis testing can be implemented in many-body theory and quantum field theory.
Quantum hypothesis testing builds on the classical theory of hypothesis testing, which is a cornerstone of statistical analysis and the scientific method. Borrowing terminology from the classical theory, one may want to test whether the system is in a state ρ called the null hypothesis, thought of as the "background", or in another quantum state σ called the alternative hypothesis, which is the "signal" that one desires to detect. The framework of quantum hypothesis testing then provides rigorous estimates for the probabilities of the errors of mistaking the two states in an asymptotic limit of many measurements. Here, it is important that by "many measurements" we mean simultaneous measurements on many copies of the system, as opposed to performing a sequence of individual measurements on independent single copies of the system. The error probability estimates involve various quantum information theoretic quantities, which depend on the details of the quantum hypothesis testing protocol. For example, in the case of so-called asymmetric testing, the error estimate involves the relative entropy as well as the relative entropy variance between the two states; both measures can be obtained from generalized divergences.
Quantum hypothesis testing has numerous applications in quantum information science, such as quantum illumination [18][19][20], entanglement-assisted communication [21], and the analysis of environment-parametrized quantum channels [22,23], to name a few. In particular, there are rigorous studies of particular quantum hypothesis testing protocols to distinguish states in spin chains, see e.g. [17,24,25].
Here, we are interested in connecting various mathematical results about hypothesis testing to implementations and applications of hypothesis testing in models at criticality, with an emphasis on distinguishing reduced density matrices of subsystems associated with different global states. For example, subsystems of free fermion chains have been extensively studied in the context of entanglement, because subsystem reduced density matrices are determined analytically by two-point functions [26][27][28][29]. The analytic tractability allows one to study for example entanglement spectra [30,31] and entanglement entropies of subsystems [32] (see also [33,34] for reviews). Distance measures such as relative entropy and Rényi divergences have also been explored [35,36].
We now summarize the main results of this work, which is divided into two parts. In the first part of this paper, we consider quantum hypothesis testing for general systems and develop a perturbative approach to hypothesis testing. Many applications involve a setup where the two global states are parametrically close, as functions of one parameter (such as the ambient temperature). In that case it is natural to use a perturbative expansion to approximate the two neighboring states. After giving a general review of quantum hypothesis testing in section 2, we study error probability estimates combined with a perturbative approach in section 3. The relevant error estimates involve the perturbative expansions of relative entropy and relative entropy variance, with leading terms appearing at second order. To examine the behavior of the error estimate, we study the relative size of these leading terms. In doing so, we find a universal result, a lower bound for the ratio of the two terms, applicable for any system in the perturbative setting. The result also allows us to develop a new joint perturbative bound on the two types of errors.
In section 4, we discuss and compare different types of measurements. We argue that independent (i.e. factorized) measurements perform poorly in general. We review the optimal measurement described in [37], which saturates the theoretical error bound. This measurement turns out to be rather difficult to describe explicitly. As an alternative, we consider a simpler but suboptimal measurement, the likelihood ratio (or Neyman-Pearson) test, which is easier to describe and performs rather well.
In the second part of this work, we implement these measurement protocols in quantum systems of increasing complexity: a single qubit, Gaussian fermion chains and finally two-dimensional conformal field theories.
We consider the qubit in section 5 and we construct the optimal measurement. Surprisingly, an explicit description is difficult as it leads to a challenging combinatorial problem, involving Krawtchouk polynomials and related to the Terwilliger algebra of the Hamming cube. This motivates the simpler likelihood ratio test, which can be described explicitly, and implemented with a quantum circuit given in Figure 5. Using numerical methods, we study the optimal measurement and compare it to the likelihood ratio test.
In section 6, we move on to spinless fermion chains with quadratic Hamiltonians. Motivated by hypothesis testing, we derive formulas for the relative entropy and the relative entropy variance in subsystems of free fermions (with only hopping interactions) at different temperatures. Then we present a prescription to compute overlaps between eigenstates of two different modular Hamiltonians of the same subsystem. The main technical tool is a generalization of Wick's theorem to correlators that involve Bogoliubov transformations [38,39]. The resulting overlaps allow the construction of the optimal measurement that distinguishes two thermal states by a local measurement. We find that in the simplest single fermion subsystem, the likelihood ratio test is optimal for distinguishing any two reduced density matrices, whereas for a two-fermion subsystem, it is not sufficient in general. In the XY model at finite temperature, for a two-fermion subsystem, the likelihood ratio test is again optimal.
We finally consider two-dimensional CFTs in section 7. We focus on states for which the modular Hamiltonian can be written as an integral of the stress tensor [40]. We construct optimal measurement protocols for subregions, using techniques of boundary CFT [41] to compute the necessary ingredients. This general framework can be applied to distinguish two thermal states from a subregion, and we study explicitly the case of the free fermion. We explain how to implement the optimal measurement, which is difficult to describe explicitly, and the simpler likelihood ratio test. We also consider the detection of a primary excitation on top of the vacuum, for which the likelihood ratio test can be implemented with a relatively simple procedure: by measuring one-point functions of the lightest operator interacting with the primary excitation.
We conclude with a discussion and some open questions, and summarize various useful properties and technical results in the appendices.
After the completion of this paper, related work studying various properties and applications of relative entropy variance (there called "variance of relative surprisal") from an information theoretic point of view appeared in [42].

Review of quantum hypothesis testing
In this section, we give a brief review of quantum hypothesis testing, to provide background for readers unfamiliar with this theory. In (binary) hypothesis testing, we have to choose between two hypotheses, the null hypothesis H 0 and the alternative hypothesis H 1 .
In the classical theory, the two hypotheses are associated with two probability distributions $p(x)$ and $q(x)$ over a space $\Omega$, and the problem is to discriminate between the two with a test $T : \Omega \to I$. If $I = [0,1]$, the test is randomized; if $I = \{0,1\}$, the test is deterministic. The probability of detection for the hypothesis $H_1$ is then the expectation value $\mathbb{E}_q[T] = \sum_{x\in\Omega} q(x)\,T(x)$. If the test is deterministic, it is often expressed as an indicator function $T = \mathbb{1}_H$, equal to one exactly on an acceptance subset $H \subset \Omega$.
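As a toy numerical illustration of these definitions (our example, not from the paper), the following sketch evaluates a deterministic likelihood-ratio test and its two error probabilities for a pair of hand-picked distributions on a four-element space:

```python
import numpy as np

# Toy illustration (ours): a deterministic classical likelihood-ratio test
# between two hand-picked distributions on a 4-element space Omega.
p = np.array([0.4, 0.3, 0.2, 0.1])   # null hypothesis H_0
q = np.array([0.1, 0.1, 0.3, 0.5])   # alternative hypothesis H_1

# Accept H_1 on the set where the likelihood ratio q/p exceeds a threshold t;
# the test is then the indicator function of that acceptance set.
t = 1.0
T = (q / p > t).astype(float)        # deterministic test T : Omega -> {0, 1}

alpha = np.sum(p * T)                # type I error: accept H_1 under H_0
beta = np.sum(q * (1 - T))           # type II error: accept H_0 under H_1
detection = np.sum(q * T)            # detection probability E_q[T] = 1 - beta
```

For these numbers the acceptance set is the last two outcomes, giving $\alpha = 0.3$, $\beta = 0.2$ and detection probability $0.8$; varying the threshold $t$ trades one error against the other.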
In the quantum theory, $H_0$ and $H_1$ are two quantum states ρ and σ, and the test becomes an operator $T = E_1$. More precisely, the decision is made by measuring the observables $E_0 = A$ and $E_1 = 1 - A$, which form a positive operator-valued measure (POVM), i.e. $0 \le E_i \le 1$ and $\sum_{i=0,1} E_i = 1$. In making a measurement, the probabilities of identifying the two states correctly are $\mathrm{Tr}(\rho E_0)$ and $\mathrm{Tr}(\sigma E_1)$, the latter being the probability of detection of the hypothesis $H_1$. There are two types of error, called type I and type II. A type I error (false positive) corresponds to identifying $H_1$ while in fact $H_0$ is true; a type II error (false negative, or missed detection) corresponds to choosing $H_0$ while $H_1$ is true. The probabilities of the two errors are given by
$$\alpha = \mathrm{Tr}\,\rho(1-A) \;\;\text{(type I)}, \qquad \beta = \mathrm{Tr}\,\sigma A \;\;\text{(type II)}. \qquad (2.1)$$
The objective of hypothesis testing is to find the best measurement that jointly minimizes the two errors. In this work we focus on the independent and identically distributed (i.i.d.) setting and consider a joint measurement $A^{(n)}$ on $n$ identical copies of the system, to discriminate between the states $\rho^{\otimes n}$ and $\sigma^{\otimes n}$. The error probabilities then become $n$-dependent, $\alpha_n$ and $\beta_n$, given by
$$\alpha_n = \mathrm{Tr}\,\rho^{\otimes n}\big(1 - A^{(n)}\big) \;\;\text{(type I)}, \qquad \beta_n = \mathrm{Tr}\,\sigma^{\otimes n} A^{(n)} \;\;\text{(type II)}. \qquad (2.2)$$
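The following sketch (ours; the states and the test are arbitrary choices) evaluates the single-copy errors (2.1) and the $n$-copy errors (2.2) for the simple product test $A^{(n)} = A^{\otimes n}$, for which the traces factorize:

```python
import numpy as np

# Sketch (ours): error probabilities (2.1)-(2.2) for arbitrary example qubit
# states, with the n-copy test chosen as the simple product A^(n) = A x ... x A.
rho = np.array([[0.8, 0.0], [0.0, 0.2]])      # null hypothesis state
sigma = np.array([[0.3, 0.1], [0.1, 0.7]])    # alternative hypothesis state
A = np.diag([1.0, 0.0])                       # accept H_0 on the range of A

alpha = np.trace(rho @ (np.eye(2) - A)).real  # single-copy type I error
beta = np.trace(sigma @ A).real               # single-copy type II error

# For a product test the traces factorize over the n copies:
n = 10
alpha_n = 1 - np.trace(rho @ A).real ** n     # tends to 1 as n grows
beta_n = np.trace(sigma @ A).real ** n        # decays exponentially
```

Note how $\alpha_n$ approaches one for this factorized test even though $\beta_n$ decays; this anticipates the discussion in section 4, where independent (factorized) measurements are argued to perform poorly compared to joint measurements.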
Quantum hypothesis testing addresses the question of the optimality of a measurement $A^{(n)}$. The notion of optimality depends on the error optimization strategy. Symmetric testing optimizes the sum of the two errors, while asymmetric testing optimizes the type II error under the condition that the type I error remains bounded. We review these two cases below.

Symmetric testing
In symmetric hypothesis testing, we treat the two types of errors equally and define the symmetric error
$$P_n = \tfrac12\left(\alpha_n + \beta_n\right). \qquad (2.3)$$
The optimal measurement is obtained by minimizing $P_n$ over all possible measurements $A^{(n)}$, where $A^{(n)}$ is a Hermitian operator satisfying $0 \le A^{(n)} \le 1$. We can define the minimum error as
$$P_n^\ast = \min_{0 \le A^{(n)} \le 1} \tfrac12\,\mathrm{Tr}\left[\rho^{\otimes n}\big(1 - A^{(n)}\big) + \sigma^{\otimes n} A^{(n)}\right]. \qquad (2.4)$$
The asymptotic behavior of this quantity is given by the quantum Chernoff bound [43], which says that
$$\lim_{n\to\infty} -\frac{1}{n}\log P_n^\ast = -\log Q(\rho,\sigma), \qquad (2.5)$$
where the quantum Chernoff distance is defined as
$$Q(\rho,\sigma) = \min_{0\le s\le 1} Q_s(\rho,\sigma), \qquad Q_s(\rho,\sigma) = \mathrm{Tr}\,\rho^s\sigma^{1-s}. \qquad (2.6)$$
We can see that $-\log Q_s(\rho,\sigma)$ is proportional to the relative Rényi entropies defined by Petz [44]. As a result, symmetric hypothesis testing gives an operational meaning to these quantities. More precisely, their maximum for $0 \le s \le 1$ gives the asymptotic exponent of the symmetric error,
$$P_n^\ast \sim e^{-n\,\max_{0\le s\le 1}\left(-\log Q_s(\rho,\sigma)\right)}. \qquad (2.7)$$
It is also interesting that $Q(\rho,\sigma)$ is related to other information quantities [43]. We have
$$1 - T \le Q(\rho,\sigma), \qquad (2.8)$$
where $T = \frac12\|\rho - \sigma\|_1$ is the trace norm distance, and
$$Q \le Q_{s=1/2} = \mathrm{Tr}\,\rho^{1/2}\sigma^{1/2} \le F(\rho,\sigma), \qquad (2.9)$$
where $F(\rho,\sigma) = \|\rho^{1/2}\sigma^{1/2}\|_1$ is the Uhlmann fidelity. If one of the states is pure, we have $Q = \mathrm{Tr}\,\rho\,\sigma$. $Q$ also satisfies the data-processing inequality (B.17).
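As a numerical companion (ours), the quantum Chernoff quantity can be evaluated by a one-dimensional minimization over $s$; the states below are arbitrary qubit examples:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow
from scipy.optimize import minimize_scalar

# Sketch (ours): the quantum Chernoff quantity Q(rho, sigma) of (2.6),
# evaluated numerically for two arbitrary full-rank qubit states.
rho = np.array([[0.9, 0.1], [0.1, 0.1]])
sigma = np.array([[0.4, -0.2], [-0.2, 0.6]])

def Q_s(s):
    # Q_s = Tr rho^s sigma^{1-s}
    return np.trace(mpow(rho, s) @ mpow(sigma, 1.0 - s)).real

res = minimize_scalar(Q_s, bounds=(0.0, 1.0), method="bounded")
Q = res.fun                 # quantum Chernoff quantity (minimum over s)
xi = -np.log(Q)             # symmetric error exponent: P_n* ~ e^{-n xi}

# Uhlmann fidelity F = ||rho^{1/2} sigma^{1/2}||_1, for the chain (2.9).
F = np.linalg.svd(mpow(rho, 0.5) @ mpow(sigma, 0.5), compute_uv=False).sum()
```

By construction $Q \le Q_{1/2} \le F$, and the exponent $\xi$ controls how fast the minimal symmetric error decays with the number of copies.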

Asymmetric testing
In this work, we will be interested in the asymmetric treatment of the two types of errors, which is the setting that gives an operational meaning to the relative entropy. In asymmetric testing, we require that the type I error is bounded, $\alpha_n \le \varepsilon$, and examine the asymptotic behavior of the type II error $\beta_n$. More precisely, we estimate the asymptotic behavior of the quantity
$$\beta_n^\ast(\varepsilon) = \inf\left\{\mathrm{Tr}\,\sigma^{\otimes n} A^{(n)} \;:\; \mathrm{Tr}\,\rho^{\otimes n}\big(1 - A^{(n)}\big) \le \varepsilon\right\}, \qquad (2.10)$$
where the infimum is taken over Hermitian operators $A^{(n)}$ satisfying $0 \le A^{(n)} \le 1$.
The asymptotic behavior of this quantity is given by the quantum Stein's lemma [45,46], which is the statement
$$\lim_{n\to\infty} -\frac{1}{n}\log\beta_n^\ast(\varepsilon) = S(\rho\|\sigma) \qquad (2.11)$$
for any $0 < \varepsilon < 1$. The relative entropy $S(\rho\|\sigma)$ is defined as
$$S(\rho\|\sigma) = \mathrm{Tr}\,\rho\left(\log\rho - \log\sigma\right). \qquad (2.12)$$
The quantum Stein's lemma shows that the type II error decays exponentially at large $n$ with exponent given by the relative entropy,
$$\beta_n^\ast(\varepsilon) \sim e^{-n S(\rho\|\sigma)}. \qquad (2.13)$$
The asymptotic formula (2.11) was improved in [37,47] to subleading order. The refined quantum Stein's lemma says that
$$-\log\beta_n^\ast(\varepsilon) = n\,S(\rho\|\sigma) + \sqrt{n\,V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon) + O(\log n), \qquad (2.14)$$
and involves the relative entropy variance defined as
$$V(\rho\|\sigma) = \mathrm{Tr}\,\rho\left(\log\rho - \log\sigma - S(\rho\|\sigma)\right)^2, \qquad (2.15)$$
and the inverse $\Phi^{-1}$ of the cumulative distribution function of the normal distribution,
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt. \qquad (2.16)$$
In analogy with the quantum Chernoff distance, one can also define [48] the quantum hypothesis testing relative entropy
$$D_H^\varepsilon(\rho\|\sigma) = -\log\beta_1^\ast(\varepsilon) \qquad (2.17)$$
for $0 < \varepsilon < 1$. This quantity is another generalized divergence, satisfying the data-processing inequality [47]. In the rest of this work we will be focusing on asymmetric testing and the refinement of the quantum Stein's lemma (2.14).
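A minimal numerical sketch (ours) of these quantities: the relative entropy, the relative entropy variance (2.15), and the resulting refined-Stein estimate (2.14) for $-\log\beta_n^\ast(\varepsilon)$, for two arbitrary full-rank qubit states:

```python
import numpy as np
from scipy.linalg import logm
from scipy.stats import norm

# Sketch (ours): relative entropy, relative entropy variance (2.15), and the
# refined-Stein estimate (2.14), for two arbitrary full-rank qubit states.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

D = logm(rho) - logm(sigma)                  # operator log(rho) - log(sigma)
S = np.trace(rho @ D).real                   # relative entropy S(rho||sigma)
V = np.trace(rho @ D @ D).real - S**2        # relative entropy variance V(rho||sigma)

# Refined quantum Stein estimate: -log beta_n*(eps) ~ n S + sqrt(n V) Phi^{-1}(eps).
eps, n = 0.05, 100
minus_log_beta = n * S + np.sqrt(n * V) * norm.ppf(eps)
```

Since $\Phi^{-1}(\varepsilon) < 0$ for $\varepsilon < 1/2$, the second-order term lowers the achievable exponent compared to the leading estimate $nS(\rho\|\sigma)$.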
The refined quantum Stein's lemma should be understood as a refined estimate of the asymptotic error of an optimal measurement. Following [37], it is useful to define the quantity
$$\alpha_n^\ast(E_1, E_2) = \min\left\{\mathrm{Tr}\,\rho^{\otimes n}\big(1 - A^{(n)}\big) \;:\; \mathrm{Tr}\,\sigma^{\otimes n} A^{(n)} \le e^{-n E_1 - \sqrt{n}\,E_2}\right\}. \qquad (2.18)$$
This is the best type I error if we require that the type II error decays exponentially with leading exponent $E_1$ and subleading exponent $E_2$. It is similar to $\beta_n^\ast(\varepsilon)$ in that it measures the interdependence between the type II and type I errors. It is shown in [37] that an equivalent way to formulate the refined quantum Stein's lemma is to say that
$$\lim_{n\to\infty}\alpha_n^\ast(E_1, E_2) = \begin{cases} 1 & \text{if } E_1 > S(\rho\|\sigma), \\ 0 & \text{if } E_1 < S(\rho\|\sigma). \end{cases} \qquad (2.19)$$
We see that the relative entropy $S(\rho\|\sigma)$ acts as a threshold value for the leading exponent $E_1$. Above the threshold, the type I error becomes uncontrolled and goes to one, while below the threshold, it can be made to vanish. The refined asymptotics become relevant when we are exactly on the threshold. On the threshold, we define
$$\alpha_n^\ast(E_2) \equiv \alpha_n^\ast\big(S(\rho\|\sigma), E_2\big), \qquad (2.20)$$
and we have
$$\lim_{n\to\infty}\alpha_n^\ast(E_2) = \Phi\!\left(\frac{E_2}{\sqrt{V(\rho\|\sigma)}}\right), \qquad (2.21)$$
which varies smoothly from 0 to 1 as $E_2$ ranges from $-\infty$ to $+\infty$.

Single qubit example
We now consider a toy version of our problem: what would be the optimal measurement for a single qubit? This example gives a nice illustration of quantum hypothesis testing. Here, we only take a single copy of the system: we describe the "one-shot" measurement. As we will see, it can be formulated as a constrained optimization problem which has a simple geometrical interpretation.
We have a qubit in the two possible states ρ and σ and we would like to find the best Hermitian operator A with 0 ≤ A ≤ 1 to distinguish between these two states. In symmetric testing, we are minimizing the error 1 2 (α + β). In the asymmetric case, we are minimizing the type II error β under the condition that the type I error α is less than a given ε.
This can be formulated geometrically using a parametrization in terms of Pauli matrices. Defining the four-vector of $2\times2$ matrices $\vec\sigma = (\sigma_1, \sigma_2, \sigma_3, 1)$, we write
$$\rho = \tfrac12\,\vec a\cdot\vec\sigma, \qquad \sigma = \tfrac12\,\vec b\cdot\vec\sigma, \qquad (2.22)$$
in terms of two four-vectors $\vec a, \vec b$. From $\mathrm{Tr}\,\rho = \mathrm{Tr}\,\sigma = 1$, we have that $a_4 = b_4 = 1$. We parametrize the Hermitian operator $A$ using a four-vector $\vec c$ as
$$A = \vec c\cdot\vec\sigma. \qquad (2.23)$$

Figure 1: Geometrical problem for the one-shot optimal measurement of a qubit. We optimize over a vector $\vec c$ in $\mathbb{R}^4$ and plot here the coordinates $(c_1, c_2, c_4)$ (suppressing $c_3$). The condition $0 \le A \le 1$ restricts $\vec c$ to lie in the gray diamond. Left: symmetric testing. This corresponds to minimizing the inner product $(\vec b - \vec a)\cdot\vec c$. The optimal vector $\vec c$ is the point on the black circle that is most opposite to $\vec b - \vec a$. Right: asymmetric testing. This corresponds to minimizing $\beta = \vec b\cdot\vec c$ under the condition $\alpha = 1 - \vec a\cdot\vec c \le \varepsilon$, which restricts $\vec c$ to lie above the green plane. The intersection of this plane and the boundary of the diamond is the black circle, on which the optimal $\vec c$ must lie. In both cases, we show the optimal solution in red. The values chosen for these plots are $\vec a = (-0.3, 0.3, 0, 1)$, $\vec b = (0.5, 0, 0, 1)$ and $\varepsilon = 0.1$.
The type I and type II errors take the form
$$\alpha = 1 - \vec a\cdot\vec c, \qquad \beta = \vec b\cdot\vec c. \qquad (2.24)$$
The condition $0 \le A \le 1$ gives $0 \le c_4 \le 1$ and
$$\sqrt{c_1^2 + c_2^2 + c_3^2} \le \min\left(c_4,\, 1 - c_4\right). \qquad (2.25)$$
This defines a diamond in $\mathbb{R}^4$, depicted in gray in Figure 1. Then, we have two different optimization problems, corresponding to symmetric or asymmetric testing.
Symmetric testing. This is depicted in the left of Figure 1. Here, we have to find the vector $\vec c$ that minimizes $(\vec b - \vec a)\cdot\vec c$ under the condition that $\vec c$ lies inside the gray diamond. We can see that the optimal $\vec c$ lies on the circle corresponding to $c_4 = \frac12$ and $\sqrt{c_1^2 + c_2^2 + c_3^2} = \frac12$ (depicted in black). We can write down the solution explicitly as
$$c_i = -\frac{b_i - a_i}{2\sqrt{\sum_{j=1}^3 (b_j - a_j)^2}} \;\; (i = 1, 2, 3), \qquad c_4 = \frac12,$$
which is the point shown in red.
Asymmetric testing. This is depicted in the right of Figure 1. In this case, we have to find the vector $\vec c$ that minimizes $\beta = \vec b\cdot\vec c$ under two conditions: the requirement $0 \le A \le 1$ forces $\vec c$ to lie inside the gray diamond, and the constraint $\alpha \le \varepsilon$ implies that $\vec c$ must lie above the green plane. The optimal $\vec c$ lies on the black circle, where both inequalities are saturated, and is shown in red. It is also possible to write down explicit expressions for the optimal vector $\vec c$ by solving the quadratic equations that define it.
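The asymmetric one-shot problem can also be solved numerically. The sketch below (ours) uses the parametrization of Figure 1, in which $\alpha = 1 - \vec a\cdot\vec c$ and $\beta = \vec b\cdot\vec c$, together with a generic constrained optimizer; the values of $\vec a$, $\vec b$ and $\varepsilon$ are those quoted in the figure caption:

```python
import numpy as np
from scipy.optimize import minimize

# Numerical version (ours) of the asymmetric problem of Figure 1 (right):
# minimize beta = b.c subject to alpha = 1 - a.c <= eps and 0 <= A <= 1.
a = np.array([-0.3, 0.3, 0.0, 1.0])
b = np.array([0.5, 0.0, 0.0, 1.0])
eps = 0.1

constraints = [
    {"type": "ineq", "fun": lambda c: eps - (1.0 - a @ c)},                  # alpha <= eps
    {"type": "ineq", "fun": lambda c: c[3] - np.linalg.norm(c[:3])},         # A >= 0
    {"type": "ineq", "fun": lambda c: (1.0 - c[3]) - np.linalg.norm(c[:3])}  # A <= 1
]
x0 = np.array([-0.05, 0.05, 0.0, 0.9])      # a feasible starting point
res = minimize(lambda c: b @ c, x0, constraints=constraints)
c_opt, beta_min = res.x, res.fun
```

The optimizer converges to a point on the black circle of the figure, where both the diamond boundary and the plane $\alpha = \varepsilon$ are saturated.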

Perturbative hypothesis testing
In this section, we study quantum hypothesis testing in a perturbative regime. We consider the case where the alternative hypothesis and the null hypothesis states belong to a one-parameter family and are perturbatively close. This setting is natural in many applications. We will derive a new joint bound on the type I and type II errors, and a universal lower bound on the ratio of the relative entropy variance to the relative entropy, for systems with a finite dimensional Hilbert space.
We are interested in a one-parameter family of states, with the two states related by the series expansion
$$\rho(\lambda) = \sigma + \lambda\rho^{(1)} + \frac{\lambda^2}{2}\rho^{(2)} + O(\lambda^3), \qquad (3.1)$$
where λ is a small parameter. This setting is natural in many applications of hypothesis testing. For example, consider the analysis of environment-parametrized quantum channels [20], where a system interacts with an environment whose state depends on a parameter with unknown value.
As concrete examples, [20] studied thermal and amplifier channels, where the environment is a thermal state parametrized by the temperature. The problem then is to distinguish two channels with two nearby temperatures, differing by a small parameter λ.
Another motivation is to consider CFT reduced density matrices in subsystems in the limit where the subsystem size is perturbatively small. An example could be the eigenstate thermalization hypothesis, in which expectation values of reduced density matrices of high energy eigenstates appear close to thermal, and it is of interest to study how the system responds to changes in the ratio of the subsystem size to the global system size. Another setting is to study global thermal states reduced to a subsystem, and consider the dimensionless ratio of the subsystem size to the thermal wavelength as a parameter to vary. We study optimal measurements for such subsystems in section 7.

A perturbative bound on errors
The quantum Stein's lemma was derived by first proving a bound [45] and then showing that it can be achieved [46]. For the first part, the following bound was used:
$$(1 - \alpha_n)(-\log\beta_n) \le n\,S(\rho\|\sigma) + \log 2, \qquad (3.2)$$
which holds for a general measurement $A^{(n)}$ and any $n$; it follows from the data-processing inequality for the relative entropy applied to the binary measurement outcome. This can be seen as a bound on how good a measurement can be. It characterizes the trade-off between the two types of errors: $\alpha_n$ and $\beta_n$ cannot be made arbitrarily small at the same time.
The bound (3.2) can be seen as a "first order in $n$" bound that holds for a general measurement. We will now derive a "second order in $n$" bound that holds for a restricted set of measurements, namely those that are optimal at first order in $n$. This set consists of all the measurements with errors satisfying the two conditions
$$\alpha_n \le \varepsilon, \qquad \beta_n \le e^{-n S(\rho\|\sigma) - \sqrt{n}\,E_2}, \qquad (3.3)$$
for some fixed choice of ε and $E_2$. The refinement of the Stein's lemma implies that, for any measurement satisfying the second condition,
$$\liminf_{n\to\infty}\alpha_n \ge \Phi\!\left(\frac{E_2}{\sqrt{V(\rho\|\sigma)}}\right),$$
with saturation for the optimal measurement. In the notation of section 2.2, we have $\alpha_n \ge \alpha_n^\ast(E_2)$ and $\beta_n \ge \beta_n^\ast(\varepsilon)$, which implies that
$$\log\alpha_n\,\log\beta_n \le \log\alpha_n^\ast(E_2)\,\log\beta_n^\ast(\varepsilon). \qquad (3.6)$$
We can then use the asymptotic estimates for $\alpha_n^\ast(E_2)$ and for $\beta_n^\ast(\varepsilon)$ from (2.14) to obtain the bound
$$\log\alpha_n\,\log\beta_n \le -\log\Phi\!\left(\frac{E_2}{\sqrt{V(\rho\|\sigma)}}\right)\left[n\,S(\rho\|\sigma) + \sqrt{n\,V(\rho\|\sigma)}\,\Phi^{-1}(\varepsilon)\right]. \qquad (3.7)$$
This is a bound on the measurements satisfying (3.3) and can be interpreted as a second order in $n$ refinement of (3.2). It also characterizes the trade-off between the two types of errors, implying that we cannot make both $\alpha_n$ and $\beta_n$ too small. Note that this also gives a bound on the LHS of (3.2), since we have $(1 - \alpha_n)(-\log\beta_n) \le \log\alpha_n\log\beta_n$; it becomes stronger than (3.2) once $-\log\alpha_n$ is sufficiently large.

We now consider measurements satisfying (3.3) in the perturbative regime (3.1), taking ε and $E_2$ to be independent of λ, and we consider the perturbative version of the upper bound (3.7). As will be shown in the next subsection, the leading terms of both the relative entropy and the relative entropy variance are quadratic in λ:
$$S(\rho\|\sigma) = \frac{\lambda^2}{2}\,S^{(2)}(\rho\|\sigma) + O(\lambda^3), \qquad V(\rho\|\sigma) = \frac{\lambda^2}{2}\,V^{(2)}(\rho\|\sigma) + O(\lambda^3).$$
In the perturbative limit, we see that at leading order
$$\alpha_n^\ast(E_2) \approx \Phi\!\left(\frac{\sqrt{2}\,E_2}{\lambda\sqrt{V^{(2)}(\rho\|\sigma)}}\right),$$
where we have restricted to $E_2 < 0$ for $\alpha_n^\ast(E_2)$ to be close to zero rather than close to one. Note that $\alpha_n^\ast(E_2)$ is non-perturbative in λ, which is a consequence of the fact that the variance becomes small in the perturbative $\lambda\to0$ limit. Because the estimate for $\alpha_n$ is obtained using the central limit theorem, it has an error of order $n^{-1/2}$. As a result, we can trust the above result only in the regime where $n$ is non-perturbatively large,
$$n \ge e^{c/\lambda^2},$$
where $c$ is some positive constant. We can now consider the perturbative $\lambda\to0$ limit of (3.6): using $\log\Phi(x) \approx -x^2/2$ as $x\to-\infty$, we find at leading order
$$\log\alpha_n^\ast(E_2)\,\log\beta_n^\ast(\varepsilon) \;\longrightarrow\; \frac{n\,E_2^2}{2}\,\frac{S^{(2)}(\rho\|\sigma)}{V^{(2)}(\rho\|\sigma)}.$$
Interestingly, the factors of λ cancel, so this gives a finite answer in the $\lambda\to0$ limit.
This implies the bound
$$\log\alpha_n\,\log\beta_n \le \frac{n\,E_2^2}{2}\,\frac{S^{(2)}(\rho\|\sigma)}{V^{(2)}(\rho\|\sigma)},$$
which holds for all measurements satisfying the conditions (3.3).
In the next subsection, we will obtain a general lower bound $V^{(2)}(\rho\|\sigma) \ge 2\,S^{(2)}(\rho\|\sigma)$, which is saturated when ρ and σ commute at first order in λ. This implies that the above bound becomes
$$\log\alpha_n\,\log\beta_n \le \frac{n\,E_2^2}{4}.$$
It is interesting to note that this bound is universal in the sense that it is independent of the state. It is saturated for the optimal measurement if and only if ρ and σ commute at first order in λ.

Lower bound for the ratio
We will now prove a lower bound on the ratio $V(\rho\|\sigma)/S(\rho\|\sigma)$ in the perturbative regime (3.1).
The relative entropy has the perturbative expansion
$$S(\rho\|\sigma) = \frac{\lambda^2}{2}\,S^{(2)}(\rho\|\sigma) + O(\lambda^3), \qquad (3.14)$$
with no linear term, because $S(\rho\|\sigma) \ge 0$ with saturation at $\lambda = 0$. The perturbative relative entropy $S^{(2)}(\rho\|\sigma)$ is given by [49]
$$S^{(2)}(\rho\|\sigma) = \mathrm{Tr}\big(\rho^{(1)} L\big), \qquad (3.15)$$
where $L$ is the logarithmic derivative
$$L = \frac{d}{d\lambda}\log\big(\sigma + \lambda\rho^{(1)}\big)\Big|_{\lambda=0}. \qquad (3.16)$$
The relative entropy variance has a similar expansion, and the linear term vanishes again, since $V(\rho\|\sigma) \ge 0$ with saturation at $\lambda = 0$. Then,
$$V(\rho\|\sigma) = \frac{\lambda^2}{2}\,V^{(2)}(\rho\|\sigma) + O(\lambda^3), \qquad (3.17)$$
where the perturbative variance is given by
$$V^{(2)}(\rho\|\sigma) = 2\,\mathrm{Tr}\big(\sigma L^2\big). \qquad (3.18)$$
Since the perturbative relative entropy and variance have the same behavior at small λ, their ratio is finite in the limit $\lambda\to0$:
$$\lim_{\lambda\to0}\frac{V(\rho\|\sigma)}{S(\rho\|\sigma)} = \frac{V^{(2)}(\rho\|\sigma)}{S^{(2)}(\rho\|\sigma)}. \qquad (3.19)$$
Our main result is the following universal lower bound for this ratio:

Theorem 1. Let $\rho(\lambda)$ be a one-parameter family of density matrices over a finite dimensional Hilbert space. Given the expansion $\rho = \sigma + \lambda\rho^{(1)} + \frac{\lambda^2}{2}\rho^{(2)} + O(\lambda^3)$, the ratio obeys the lower bound
$$\frac{V^{(2)}(\rho\|\sigma)}{S^{(2)}(\rho\|\sigma)} \ge 2, \qquad (3.20)$$
with equality if and only if $[\sigma, \rho^{(1)}] = 0$.
To prove the theorem, we need an expression for $L$ in the eigenbasis of σ. Let the eigenvalues of σ be $\lambda_i$. Then a generic function $f(\sigma + X)$ has the following expansion in the eigenbasis of σ:
$$f(\sigma + X)_{ij} = f(\lambda_i)\,\delta_{ij} + X_{ij}\,\frac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j} + O(X^2), \qquad (3.21)$$
where the divided difference is understood as $f'(\lambda_i)$ when $\lambda_i = \lambda_j$. Applying this to $\log\big(\sigma + \lambda\rho^{(1)}\big)$, we can identify
$$L_{ij} = \rho^{(1)}_{ij}\,\frac{A(\lambda_i/\lambda_j)}{\lambda_j}, \qquad A(x) \equiv \frac{\log x}{x - 1}, \qquad (3.22)$$
with $A(1) = 1$ understood as the limiting value. If $[\sigma, \rho^{(1)}] = 0$, so that $\rho^{(1)}$ is also diagonal with eigenvalues $\lambda^{(1)}_i$, then $L$ is diagonal with eigenvalues $\lambda^{(1)}_i/\lambda_i$:
$$L_{ii} = \frac{\lambda^{(1)}_i}{\lambda_i},$$
where we used $A(1) = 1$. With these ingredients, we can prove Theorem 1. We prove that $\mathrm{Tr}(\sigma L^2) \ge \mathrm{Tr}(\rho^{(1)} L)$, with equality if and only if $[\sigma, \rho^{(1)}] = 0$. Applying this inequality to $V^{(2)}(\rho\|\sigma) = 2\,\mathrm{Tr}(\sigma L^2)$ then proves the lower bound. We emphasize that the proof is inherently finite dimensional and does not directly apply to infinite dimensional Hilbert spaces.
Proof. Assume $[\sigma, \rho^{(1)}] \ne 0$. In the eigenbasis of σ, we can write
$$\mathrm{Tr}(\sigma L^2) = \sum_{ij}\lambda_i\,L_{ij}L_{ji} = \sum_{ij}\lambda_i\,\big|\rho^{(1)}_{ij}\big|^2\left(\frac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j}\right)^2,$$
where in the second equality we used (3.22). Relabeling the dummy indices $i \leftrightarrow j$ and using that $|\rho^{(1)}_{ij}|^2$ and the divided difference are symmetric in $i, j$, this can be symmetrized to
$$\mathrm{Tr}(\sigma L^2) = \sum_{ij}\frac{\lambda_i + \lambda_j}{2}\,\big|\rho^{(1)}_{ij}\big|^2\left(\frac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j}\right)^2,$$
while
$$\mathrm{Tr}\big(\rho^{(1)} L\big) = \sum_{ij}\big|\rho^{(1)}_{ij}\big|^2\,\frac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j},$$
where we used $A(1) = 1$ in the diagonal terms. As illustrated in Figure 2, it can be shown that
$$\frac{\lambda_i + \lambda_j}{2}\,\frac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j} \ge 1,$$
with equality if and only if $\lambda_i = \lambda_j$; this is the statement that the logarithmic mean of two positive numbers is bounded above by their arithmetic mean. Comparing the two sums term by term, each term of $\mathrm{Tr}(\sigma L^2)$ dominates the corresponding term of $\mathrm{Tr}(\rho^{(1)} L)$, and the inequality is strict whenever $\rho^{(1)}_{ij} \ne 0$ for some pair with $\lambda_i \ne \lambda_j$, which is the case precisely when $[\sigma, \rho^{(1)}] \ne 0$. We finally get
$$\mathrm{Tr}(\sigma L^2) > \mathrm{Tr}\big(\rho^{(1)} L\big),$$
which, together with the equality in the commuting case, proves the theorem. □

An interesting question is whether there exist special classes of density matrices for which there is also a constant upper bound for the ratio (3.19). Such an upper bound would imply an upper bound on the perturbative variance in terms of the perturbative relative entropy. To gain more intuition, it is useful to study the lower bound (3.20) in explicit examples. At least in the simple examples studied next, no upper bound appears.
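Theorem 1 can be checked numerically. The sketch below (ours) builds the logarithmic derivative from the divided-difference formula (3.22) in the eigenbasis of a random state and verifies the inequality, together with its saturation for a commuting perturbation:

```python
import numpy as np

# Numerical check (ours) of Theorem 1: V^(2) = 2 Tr(sigma L^2) >= 2 S^(2),
# with S^(2) = Tr(rho1 L) and L from the divided-difference formula (3.22).
rng = np.random.default_rng(0)

d = 3
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = X @ X.conj().T
sigma /= np.trace(sigma).real                    # random full-rank state

Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho1 = (Y + Y.conj().T) / 2
rho1 -= (np.trace(rho1).real / d) * np.eye(d)    # Hermitian traceless perturbation

lam, U = np.linalg.eigh(sigma)
R = U.conj().T @ rho1 @ U                        # rho1 in the eigenbasis of sigma

# Divided differences (log lam_i - log lam_j)/(lam_i - lam_j); 1/lam_i on the diagonal.
dd = np.empty((d, d))
for i in range(d):
    for j in range(d):
        if np.isclose(lam[i], lam[j]):
            dd[i, j] = 1.0 / lam[i]
        else:
            dd[i, j] = (np.log(lam[i]) - np.log(lam[j])) / (lam[i] - lam[j])

L = R * dd                                       # L_{ij} = rho1_{ij} * dd_{ij}
S2 = np.einsum("ij,ji->", R, L).real             # S^(2) = Tr(rho1 L)
V2 = 2 * np.einsum("i,ij,ji->", lam, L, L).real  # V^(2) = 2 Tr(sigma L^2)
ratio = V2 / S2                                  # should be >= 2
```

Restricting the perturbation to its diagonal part in the σ eigenbasis (so that it commutes with σ) makes the ratio exactly 2, in agreement with the saturation condition.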

Single qubit
We consider a single qubit example, for which the Hilbert space is two dimensional. A general initial density matrix σ has two eigenvalues, which we parametrize as $\frac12 + a$ and $\frac12 - a$ with $-\frac12 < a < \frac12$. Working in the eigenbasis of σ, we consider the following one-parameter family of states $\rho(\lambda) = \sigma + \lambda\rho^{(1)}$:
$$\sigma = \begin{pmatrix} \frac12 + a & 0 \\ 0 & \frac12 - a \end{pmatrix}, \qquad \rho^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad (3.36)$$
where $a, \lambda \in \mathbb{R}$. The eigenvalues of $\rho(\lambda)$ are
$$p_\pm = \frac12 \pm \sqrt{a^2 + \lambda^2},$$
and the positivity of $p_-$ requires that $\sqrt{a^2 + \lambda^2} \le \frac12$. The perturbation commutes with σ only when $a = 0$; hence we expect saturation of the lower bound when $a = 0$. Relative entropy and its variance can be explicitly computed for the states (3.36), but the expressions are quite complicated. For $a = 0$ they are
$$S = \Big(\tfrac12 + \lambda\Big)\log(1 + 2\lambda) + \Big(\tfrac12 - \lambda\Big)\log(1 - 2\lambda), \qquad V = \Big(\tfrac14 - \lambda^2\Big)\left[\log\frac{1 + 2\lambda}{1 - 2\lambda}\right]^2,$$
with leading terms $S = 2\lambda^2 + O(\lambda^4)$ and $V = 4\lambda^2 + O(\lambda^4)$, so that the lower bound is saturated as expected in this case. For $a \ne 0$ we can expand the non-perturbative expressions of $S$, $V$ or use the perturbative formulas (3.15) and (3.18) directly.
The results agree and are given by
$$S^{(2)} = \frac{2\,\mathrm{artanh}(2a)}{a}, \qquad V^{(2)} = \frac{2\,\mathrm{artanh}(2a)^2}{a^2}.$$
We find that the ratio obeys the lower bound
$$\frac{V^{(2)}}{S^{(2)}} = \frac{\mathrm{artanh}(2a)}{a} \ge 2,$$
with equality if and only if $a = 0$, as required by Theorem 1. The ratio is depicted in Figure 3.

Maximally mixed initial state
In the above single qubit example, the lower bound is saturated when σ is proportional to the identity matrix, or in other words, when σ is maximally mixed. This should hold more generally for arbitrary perturbations $\rho^{(1)}$ in Hilbert spaces of dimension $N \ge 2$, because the identity matrix commutes with all matrices. So let $1_N$ be the $N$-dimensional identity matrix and let $\sigma = (1/N)\,1_N \equiv \sigma_{\max}$ be maximally mixed. To check saturation of the lower bound (3.20), we can use the fact that relative entropy and relative entropy variance generally reduce to the von Neumann entropy $S(\rho)$ and the capacity $C(\rho)$ when $\sigma = \sigma_{\max}$:
$$S(\rho\|\sigma_{\max}) = \log N - S(\rho), \qquad V(\rho\|\sigma_{\max}) = C(\rho), \qquad (3.48)$$
where ρ is arbitrary. By capacity we mean the quantity $C(\rho) = \mathrm{Tr}\big[\rho(\log\rho)^2\big] - S(\rho)^2$, which for a reduced density matrix is known as the capacity of entanglement (other names include variance of surprisal and varentropy), see Appendix B.1; for a thermal state $\rho_\beta$, it becomes the heat capacity $C(\beta)$. Computing the expansions of the von Neumann entropy and the capacity explicitly using $\rho = \sigma_{\max} + \lambda\rho^{(1)} + O(\lambda^2)$, we find
$$S^{(2)}(\rho\|\sigma_{\max}) = N\,\mathrm{Tr}\big[(\rho^{(1)})^2\big], \qquad V^{(2)}(\rho\|\sigma_{\max}) = 2N\,\mathrm{Tr}\big[(\rho^{(1)})^2\big].$$
Combining with (3.48), we get
$$V^{(2)}(\rho\|\sigma_{\max}) = 2\,S^{(2)}(\rho\|\sigma_{\max}),$$
as expected; this is of course in agreement with the general definitions (3.15) and (3.18) evaluated at $\sigma = \sigma_{\max}$.
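These identities are easy to verify numerically; the following sketch (ours) checks the reduction (3.48) to von Neumann entropy and capacity for a random four-dimensional state:

```python
import numpy as np
from scipy.linalg import logm

# Numerical check (ours) of the sigma_max identities (3.48):
# S(rho||sigma_max) = log N - S(rho)  and  V(rho||sigma_max) = C(rho).
rng = np.random.default_rng(1)
N = 4
X = rng.normal(size=(N, N))
rho = X @ X.T
rho /= np.trace(rho)                           # random full-rank state
sigma_max = np.eye(N) / N

D = logm(rho) - logm(sigma_max)
S_rel = np.trace(rho @ D).real                 # relative entropy
V_rel = np.trace(rho @ D @ D).real - S_rel**2  # relative entropy variance

log_rho = logm(rho)
S_vn = -np.trace(rho @ log_rho).real                  # von Neumann entropy
C = np.trace(rho @ log_rho @ log_rho).real - S_vn**2  # capacity C(rho)
```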

Two thermal states
Let us consider two thermal states $\rho_2$ and $\rho_1$ of the form
$$\rho_i = \frac{e^{-\beta_i H}}{\mathrm{Tr}\,e^{-\beta_i H}}.$$
When the Hamiltonian $H$ is quadratic in creation/annihilation operators, the states are Gaussian, so the result should reduce to the previously studied case in [20]. With a straightforward calculation, we obtain
$$V(\rho_2\|\rho_1) = (\beta_2 - \beta_1)^2\left(\langle H^2\rangle_{\rho_2} - \langle H\rangle_{\rho_2}^2\right),$$
where all the terms involving logarithms of traces have cancelled. From this equation we recognize the heat capacity $C(\beta_2) = \beta_2^2\big(\langle H^2\rangle_{\rho_2} - \langle H\rangle_{\rho_2}^2\big)$ of a thermal state, and we end up with a simple result
$$V(\rho_2\|\rho_1) = \left(1 - \frac{\beta_1}{\beta_2}\right)^2 C(\beta_2). \qquad (3.54)$$
In the limit $\beta_1 \to 0$, $\rho_1$ becomes a maximally mixed state, and the relative entropy variance reduces to the heat capacity,
$$V(\rho_2\|\rho_1) \to C(\beta_2).$$
On the other hand, in the limit $\beta_2 \to \infty$, $\rho_2$ reduces to the ground state, and the relative entropy variance vanishes (along with $C(\beta_2) \to 0$). Clearly, $[\rho_1, \rho_2] = 0$ for all temperatures $\beta_1$ and $\beta_2$, so the lower bound (3.20) should be saturated for temperature perturbations $\beta_2 = \beta_1 + \lambda\beta^{(1)} + O(\lambda^2)$. We can check this explicitly. The relative entropy is given by
$$S(\rho_2\|\rho_1) = (\beta_1 - \beta_2)\langle H\rangle_{\rho_2} + \log\frac{\mathrm{Tr}\,e^{-\beta_1 H}}{\mathrm{Tr}\,e^{-\beta_2 H}},$$
which expanded to second order in λ gives
$$S(\rho_2\|\rho_1) = \frac{\lambda^2\,(\beta^{(1)})^2}{2\,\beta_1^2}\,C(\beta_1) + O(\lambda^3),$$
where $C(\beta_1)$ is the heat capacity of the initial thermal state $\rho_1$. Because $(\beta_2 - \beta_1)^2$ is second order in λ, we can replace $C(\beta_2)$ by its initial value $C(\beta_1)$ to obtain the relative entropy variance (3.54) at order $O(\lambda^2)$. We get
$$V(\rho_2\|\rho_1) = \frac{\lambda^2\,(\beta^{(1)})^2}{\beta_1^2}\,C(\beta_1) + O(\lambda^3) = 2\,S(\rho_2\|\rho_1) + O(\lambda^3),$$
which saturates the bound (3.20). Interestingly, the non-perturbative relative entropy variance between two thermal states turns out to be proportional to the capacity of entanglement (3.54). This might have implications for the thermodynamics of AdS black holes in the AdS/CFT correspondence, where the holographic dual of the capacity of entanglement is known [50,51]. The holographic dual of the relative entropy variance is not yet known, but further results in this direction will be reported in upcoming work [52].
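The thermal-state formula can be verified directly; the sketch below (ours) checks the simple result (3.54) against a brute-force evaluation of the relative entropy variance for a random Hamiltonian:

```python
import numpy as np
from scipy.linalg import expm, logm

# Numerical check (ours) of V(rho_2||rho_1) = (1 - beta_1/beta_2)^2 C(beta_2)
# for thermal states of a random Hamiltonian, C(beta) = beta^2 (<H^2> - <H>^2).
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 4))
H = (X + X.T) / 2                              # random Hermitian Hamiltonian
beta1, beta2 = 0.5, 1.3

def thermal(beta):
    M = expm(-beta * H)
    return M / np.trace(M)

rho1, rho2 = thermal(beta1), thermal(beta2)
D = logm(rho2) - logm(rho1)
S = np.trace(rho2 @ D).real                    # relative entropy S(rho_2||rho_1)
V = np.trace(rho2 @ D @ D).real - S**2         # relative entropy variance

E = np.trace(rho2 @ H).real                    # <H> in rho_2
E2 = np.trace(rho2 @ H @ H).real               # <H^2> in rho_2
C2 = beta2**2 * (E2 - E**2)                    # heat capacity C(beta_2)
```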

Relation to parameter estimation
The framework of perturbative asymmetric hypothesis testing is related to parameter estimation and quantum Fisher information [53]. Quantum parameter estimation is the problem of determining the value of a parameter λ appearing in a density matrix ρ(λ) by performing $n$ independent measurements of an observable $E(x)$. For each measurement, the probability of the outcome $x$ is
$$p(x|\lambda) = \mathrm{Tr}\big[\rho(\lambda)\,E(x)\big].$$
Denoting the outcomes of the $n$ measurements by $x_i$, which are random variables, an estimator is a function $\lambda_{\rm est} = \lambda_{\rm est}(x_1, \dots, x_n)$ used to estimate λ from the data $\{x_i\}$. Suppose that the estimator is unbiased, $\mathbb{E}[\lambda_{\rm est}] = \lambda$; the quantum Cramér–Rao bound then states that
$$\mathrm{Var}(\lambda_{\rm est}) \ge \frac{1}{n\,F(\lambda)}, \qquad (3.64)$$
where
$$F(\lambda) = \mathrm{Tr}\big[\rho(\lambda)\,L_\lambda^2\big] \qquad (3.65)$$
is the quantum Fisher information [54]. Here, the symmetric logarithmic derivative operator $L_\lambda$ is defined implicitly via
$$\frac{d\rho}{d\lambda} = \frac12\left(L_\lambda\rho + \rho L_\lambda\right).$$
We focus on states $\rho(\lambda) = \sigma + \lambda\rho^{(1)}$ with $\lambda \ll 1$ that are perturbatively close to $\rho(0) = \sigma$, and we set $\lambda = 0$ in the above equations, writing $F \equiv F(0)$. The bound (3.64) gives the best accuracy for estimating the small parameter λ. The quantum Fisher information (3.65) is closely related to the perturbative relative entropy, which has a similar expression (3.15). In the eigenbasis of σ with eigenvalues $\lambda_i$, the symmetric logarithmic derivative has the expression
$$(L_\lambda)_{ij} = \frac{2\,\rho^{(1)}_{ij}}{\lambda_i + \lambda_j},$$
and can be compared with the expression (3.22) for the logarithmic derivative $L$. When $[\sigma, \rho^{(1)}] = 0$, the two expressions are equal: both are diagonal with entries $\lambda^{(1)}_i/\lambda_i$, where $\lambda^{(1)}_i$ are the eigenvalues of $\rho^{(1)}$ in the eigenbasis of σ. In general, we can prove the following inequality,
$$F \le S^{(2)}(\rho\|\sigma), \qquad (3.67)$$
with equality if and only if $[\sigma, \rho^{(1)}] = 0$, whose proof is similar to the proof of Theorem 1.
Proof. Assuming [σ, ρ^{(1)}] ≠ 0, we have and relabeling the dummy indices i ↔ j in the second term, we get where we used L̃_ii = L_ii in the last term. Applying the inequality which is displayed in Figure 4, we obtain where the inequality is strict. Assuming instead [σ, ρ^{(1)}] = 0, the cross-terms vanish in (3.68) and We can also combine (3.67) with the lower bound (3.20) to give with equality if and only if [σ, ρ^{(1)}] = 0. This shows that both S^{(2)} and V^{(2)}/2 give quantum Cramér–Rao bounds, although the quantum Fisher information F provides the tightest bound. The inequality (3.67) provides a heuristic connection between perturbative hypothesis testing and parameter estimation. Suppose that the estimator is asymptotically normal, that is, the probability distribution for the value of the estimator 16 is effectively described by a Gaussian distribution for large n. Then the Cramér–Rao bound (3.64) implies that the optimal probability distribution for the estimate is This distribution (3.74) is similar to the optimal type II error probability in asymmetric hypothesis testing (2.13) between two perturbatively close states σ and ρ(λ) = σ + λρ^{(1)}: where λ is fixed here. The inequality (3.67) then implies that This can be interpreted heuristically as follows: the binary problem of distinguishing ρ(λ) from σ is easier than estimating the exact value of λ.

Generalities on measurements
In this section, we compare different measurement protocols in a setting where we have a large number n of copies of a physical system. We begin by discussing independent measurements on the n copies, and explain why they fail to be optimal. We then turn to optimal measurements for distinguishing between two states ρ and σ in the context of asymmetric hypothesis testing. Following section 2.2, we call a measurement optimal if it saturates the refined quantum Stein's lemma in the asymptotic limit n → +∞. We would like to understand this optimal measurement in order to apply it in many-body systems in the remainder of this paper. We also consider the likelihood ratio test, which is optimal among the classical measurements. Simple examples where these measurements can be described and tested are then discussed. In Appendix A, we describe and discuss similar measurements for symmetric hypothesis testing. We recall that we take n copies of the system so that we have to distinguish between the states ρ ⊗n and σ ⊗n in the asymptotic limit n → +∞. More precisely, we look for a Hermitian operator A (n) with 0 ≤ A (n) ≤ 1 which minimizes the type II error β n = Tr σ ⊗n A (n) while ensuring that the type I error α n = Tr ρ ⊗n (1 − A (n) ) remains bounded.

Independent measurements
The likelihood ratio test and the optimal measurement, which are described below, use in a crucial way correlations between the n copies. In this section, we demonstrate that independent measurements perform badly. A trivial but notable exception is the case where ρ is a pure state, for which the optimal measurement is simply the projector onto this pure state on each copy. This example is discussed in section 4.4.1.
Let's consider an independent measurement, by which we mean a factorized measurement of the form and denote by a_i^{(n)}, b_i^{(n)} the corresponding single-copy acceptance probabilities, which satisfy 0 < a_i^{(n)} < 1 and 0 < b_i^{(n)} < 1. The type I and type II errors are then given by We see that the type I error α_n becomes uncontrolled in the asymptotic limit. To obtain a bounded type I error, we have to let the a_i^{(n)} tend to 1 as n → ∞, which means that the operators A_i^{(n)} must approach the identity. This makes the b_i^{(n)} also close to one and spoils the type II error β_n.
To illustrate this argument, consider the following example. Let's pick where B is some bounded positive Hermitian operator. This ensures that the type I error remains smaller than 1, since we have However, the type II error is Thus β_n tends to a finite limit as n → ∞, instead of decaying exponentially to zero as for an optimal measurement. Hence, we expect that independent measurements are in general far from optimal.
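Both failure modes are easy to see numerically for a pair of qubit states: a fixed factorized test has α_n → 1, while the rescued choice A_i^{(n)} = 1 − B/n keeps α_n bounded by 1 − e^{−Tr(ρB)} but makes β_n approach the finite limit e^{−Tr(σB)}. A minimal sketch (all matrices are illustrative choices):

```python
import numpy as np

# Two full-rank qubit states (illustrative values)
rho = np.array([[0.8, 0.3], [0.3, 0.2]])
sigma = np.diag([0.5, 0.5])

# A fixed single-copy test element with 0 < A < 1
A = np.diag([0.9, 0.1])
a = np.trace(rho @ A).real       # per-copy acceptance probability under rho
b = np.trace(sigma @ A).real     # per-copy acceptance probability under sigma

ns = np.array([1, 10, 100, 1000])
alpha = 1 - a**ns                # type I error of the factorized test -> 1
beta = b**ns                     # type II error

# "Rescued" choice A_i = 1 - B/n: alpha_n stays bounded by 1 - e^{-Tr(rho B)},
# but beta_n tends to the finite limit e^{-Tr(sigma B)} instead of decaying
B = np.diag([0.2, 0.9])
alpha_resc = [1 - np.trace(rho @ (np.eye(2) - B/n)).real**n for n in ns]
beta_resc = [np.trace(sigma @ (np.eye(2) - B/n)).real**n for n in ns]
```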
We can reformulate the optimization over independent measurements as follows. Denote We then have to impose a constraint on the v_i^{(n)}. This leads us to consider the function If the function f(x) is convex, the optimal choice is to take one of the v_i^{(n)} equal to −log(1 − ε) and the others equal to zero. In other words, in this case multiple measurements yield no improvement over a single measurement.
If, on the other hand, f(x) is concave, then the optimal choice is to take all the v_i^{(n)} equal to each other, and the resulting error is whose detailed form for large n depends on the small-ε behavior of β*_1(ε). Of course, if f(x) is neither concave nor convex, a more detailed analysis is required.

Optimal measurement
Let's now describe an optimal measurement, which was used in [37] to prove the quantum Stein's lemma. Although we will often refer to it as the optimal measurement, it is important to note that it is not unique. 17 We define the modular Hamiltonians K and K̃ by We consider n copies of the system with the states σ^⊗n and ρ^⊗n, labeled by i = 1, . . . , n. We denote by {|E⟩} and {|Ẽ⟩} the sets of normalized eigenstates of σ^⊗n and ρ^⊗n. They are of the form and are labeled by their eigenvalues of K and K̃ respectively. We can define the average modular operators We will use the notation |E| and |Ẽ| to denote the eigenvalues of the states |E⟩ and |Ẽ⟩ for the average modular operators. In other words, To describe the optimal measurement, we decompose the state |Ẽ⟩ in the {|E⟩} basis We then restrict the sum to the states |E⟩ satisfying the acceptance condition |E| − |Ẽ| ≥ E for some fixed E that we will call the acceptance threshold. This defines the states We define the acceptance subspace The optimal measurement is then the projection onto this subspace: Unfortunately, explicit construction of the acceptance subspace and the projection is non-trivial even in simple applications, as we will see.
To obtain the optimal type II error β n for a bounded type I error α n ≤ ε, the optimal acceptance threshold is As explained in section 2.2, this measurement leads to a bounded type I error α n ≤ ε and a type II error exponent The proof of optimality of this measurement is given in [37].

Likelihood ratio test
The optimal measurement described above is in general rather complicated to implement. In this section, we review a simpler measurement, which is efficient and becomes optimal in the classical case, when ρ and σ commute [25]. When ρ and σ are viewed as classical probability distributions, this measurement is the likelihood ratio (Neyman-Pearson) test which is known to be optimal in classical hypothesis testing.
In this setup, we consider two probability distributions P and Q on the same probability space Ω, and we would like to distinguish them by making a test modeled as a function A : Ω → [0, 1]. Let's consider n copies of the system. The task is then to distinguish between the probability distributions P^{(n)} and Q^{(n)} on Ω^n defined as with a function A^{(n)} : Ω^n → [0, 1]. The optimal type II error is defined as where E_P denotes the expected value in the probability distribution P. We are interested in the asymptotic limit n → +∞. We have the estimate (4.21) The first-order-in-n result was originally obtained by Chernoff and Stein, and the second-order correction by Strassen [56] (see [57] for a review). In the above expression, the relative entropy and its variance are defined as the first and second cumulants, in the probability distribution P, of the log-likelihood ratio log(P(x)/Q(x)). The measurement that achieves optimality (in this classical setting) is the likelihood ratio test. It is a deterministic test, choosing the function A^{(n)} to be an indicator function which takes the value 1 on an acceptance subspace, the subset of x ∈ Ω^n satisfying the acceptance condition. To apply this measurement to quantum systems, we need to express it in quantum mechanical language using the setup described in the previous section. We take the probability space Ω = {|E⟩} to be a basis of eigenstates of K^{(n)} = −(1/n) log σ^⊗n. The probability distributions are the ensemble probabilities given by and we have −(1/n) log Q^{(n)}(E) = |E| from the definition (4.12).
The acceptance condition is which can also be written more transparently as We note that this measurement only involves the diagonal part of ρ (defined with respect to the eigenbasis of σ), which we denote ρ_D. We can then define the "classical" acceptance subspace To implement the likelihood ratio test, we replace the indicator function of the acceptance subspace by an operator, the projector onto H_C: When ρ and σ commute, one can see that H_C = H_Q, so this is actually the optimal measurement described in the previous subsection. From the relation with classical quantities S(P‖Q) = S(ρ_D‖σ) and V(P‖Q) = V(ρ_D‖σ), we see that the optimal choice of threshold is and leads to a bounded type I error α_n ≤ ε and a type II error exponent In general, this measurement is less efficient than the optimal measurement, because the monotonicity of relative entropy implies that since the map ρ → ρ_D is (completely) positive and trace preserving [58]. Nonetheless, this measurement achieves an exponentially decreasing type II error for bounded type I error. The likelihood ratio test with n_LRT copies of the system achieves the same accuracy to leading order as the optimal measurement with n_opt copies with In the simple example of a qubit, the likelihood ratio test can be implemented using a quantum circuit, displayed in Figure 5, and a comparison between the likelihood ratio test and the optimal measurement is shown in Figure 6.
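The monotonicity inequality S(ρ_D‖σ) ≤ S(ρ‖σ) underlying this efficiency comparison is easy to verify numerically. A minimal sketch for a qubit, computing both relative entropies by exact diagonalization (the states are illustrative choices):

```python
import numpy as np

def rel_entropy(r, s):
    # S(r||s) = Tr r (log r - log s), via eigendecompositions
    lr, vr = np.linalg.eigh(r)
    ls, vs = np.linalg.eigh(s)
    log_r = vr @ np.diag(np.log(lr)) @ vr.conj().T
    log_s = vs @ np.diag(np.log(ls)) @ vs.conj().T
    return np.trace(r @ (log_r - log_s)).real

sigma = np.diag([0.8, 0.2])                   # diagonal reference state
rho = np.array([[0.6, 0.25], [0.25, 0.4]])    # full rank, non-commuting

rho_D = np.diag(np.diag(rho))   # pinching in the eigenbasis of sigma

S_full = rel_entropy(rho, sigma)     # optimal (quantum) Stein exponent
S_diag = rel_entropy(rho_D, sigma)   # exponent achieved by the LRT
```

Since ρ and σ do not commute here, the inequality is strict, illustrating the gap between n_LRT and n_opt.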

Examples
In this section, we describe the optimal measurement in some simple cases.

Pure versus mixed
We consider the simplest possible example. We take ρ to be a pure state and σ to be a general mixed state. In this case, an optimal measurement is simply the projector A = |ψ⟩⟨ψ|. On n copies of the system, we take the factorized measurement A^{(n)} = A^⊗n. The type I error is α_n = 0, and the type II error is given by which indeed saturates the quantum Stein's lemma. The second-order asymptotics in n do not play a role because according to the proposition explained in section B.2.

Global thermal states
We consider two thermal states with different temperatures and we would like to distinguish between them. The modular Hamiltonians are where the free energy is defined as The relative entropy and variance are We are in a situation where ρ and σ commute, so the likelihood ratio test is actually the optimal measurement. It can be described as follows. We consider n copies of the system and we define the average Let {|E⟩} be a basis of eigenstates of σ^⊗n, formed from eigenstates of H. Notice that we are using the actual energies to label the states, as opposed to using the eigenvalues of the modular Hamiltonian. In particular, we denote by |E| the average energy of the corresponding state The measurement is simply the projection onto the states in this basis with the acceptance condition This translates into the condition We have to distinguish two cases depending on the sign of β_1 − β_2. The acceptance condition is where the threshold energy is The measurement is then a projection onto the states satisfying the condition It is interesting to note that the optimal measurement does not depend on the value of β_1, but only on whether it is bigger or smaller than β_2.
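For a two-level system, this energy-threshold test can be simulated exactly: the number of excited copies is binomially distributed, the threshold is fixed by the type I error constraint, and the resulting type II error decays exponentially with a rate approaching S(ρ‖σ) (from below at finite n, consistent with the second-order correction). A sketch with illustrative temperatures, taking ρ to be the hotter state:

```python
import math

# Two-level system with energies {0, omega}; rho is the hotter state
omega = 1.0
beta_rho, beta_sigma = 0.5, 2.0
p_r = math.exp(-beta_rho*omega) / (1 + math.exp(-beta_rho*omega))
p_s = math.exp(-beta_sigma*omega) / (1 + math.exp(-beta_sigma*omega))

def tail(n, p, kmin):
    # P(k >= kmin) for k ~ Binomial(n, p)
    return sum(math.comb(n, k) * p**k * (1-p)**(n-k) for k in range(kmin, n+1))

eps, n = 0.05, 200
# declare "rho" iff at least kstar of the n copies are excited;
# kstar is the largest threshold keeping the type I error below eps
kstar = max(k for k in range(n+1) if 1 - tail(n, p_r, k) <= eps)
alpha = 1 - tail(n, p_r, kstar)
beta = tail(n, p_s, kstar)

# single-copy relative entropy S(rho||sigma) (classical, since states commute)
S = p_r*math.log(p_r/p_s) + (1-p_r)*math.log((1-p_r)/(1-p_s))
rate = -math.log(beta) / n   # below S at finite n, approaches it as n grows
```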

Measurements of a qubit
In this section, we consider a simple system to illustrate the measurements that we have been discussing. The system is just a single qubit in two possible states ρ or σ. We are interested in the optimal measurement on n copies of the system in the asymptotic limit where n is large.

Likelihood ratio test
The best classical measurement is the likelihood ratio test, discussed in section 4.3. In this section, we write it out explicitly for the case of a qubit and give a quantum circuit that realizes it.

Setup
Let {|0⟩, |1⟩} denote the basis which diagonalizes σ, with 0 ≤ p ≤ 1. The likelihood ratio test only involves the diagonal part ρ_D of ρ, which we can write as A basis of the Hilbert space for the n copies is given by the states labeled by the bit strings a_1 a_2 . . . a_n. The acceptance condition for the likelihood ratio test takes the form Denoting by n(E) the number of 1s in E (the Hamming weight of the bit string), this is where we use the ceiling function ⌈·⌉ so that n* is an integer. The optimal value for E is given in (4.31) in terms of the relative entropy and its variance, and leads to the acceptance threshold The acceptance subspace is (5.8) and the measurement is the projection onto H_C. We can also identify H_C with a subset of {0, 1}^n, the complement of the Hamming sphere of radius n* − 1 centered at the zero string.
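The identification of H_C with the complement of a Hamming sphere can be checked by direct enumeration for small n. A minimal sketch (n and n* are illustrative values):

```python
from itertools import product
import math

# Illustrative number of copies and acceptance threshold
n, nstar = 6, 4

# acceptance subspace: basis bit strings with Hamming weight >= nstar
accepted = [bits for bits in product((0, 1), repeat=n) if sum(bits) >= nstar]
dim_HC = len(accepted)

# complement of the Hamming sphere of radius nstar - 1 around the zero string
dim_formula = 2**n - sum(math.comb(n, k) for k in range(nstar))
```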

Quantum circuit for the likelihood ratio test
We now describe a quantum circuit that implements the likelihood ratio test. In the language of quantum computing, our problem can be posed as follows. We are given a black-box gate V acting on a pair of qubits, producing a state we wish to identify. More explicitly, acting with V on |00⟩ and tracing over the second qubit gives a density matrix ρ_V for the first qubit, and we assume that there can be only two possibilities: where ρ and σ are known a priori but we do not know the outcome. Our goal is to determine which alternative is true by making a measurement on n of these pairs of qubits, operating only on the first qubit of each pair. The likelihood ratio test is the best classical measurement and becomes the optimal measurement when ρ and σ commute. From the previous analysis, the measurement is a projection P_{H_C} onto the acceptance subspace (5.8). Hence, we would like to compute If this quantity is close to one, we declare that ρ_V = ρ, while if it is closer to zero, we declare that ρ_V = σ. Because the state V|00⟩ is a purification of ρ_V, we can rewrite (5.10) as the overlap where P_{H_C} only acts on the first qubit of each pair. This quantity can be computed using the quantum circuit depicted in Figure 5. We start with n pairs of qubits in the state |0⟩ together with a register of auxiliary qubits in the state |ψ⟩. We first act with V on each pair. We then use a controlled-I gate, where I is an "increment" gate which counts the number of 1s in the register while preserving the superposition.
The register is designed to incorporate the threshold condition associated with the projection P_{H_C} by measuring the overlap of some of its qubits with some fixed state. For example, we can take a register of n + 1 qubits and count the number of 1s as follows. We initialize the register in the state |ψ⟩ = |1⟩ ⊗ |0⟩^⊗n and define I to be the cyclic permutation i → i + 1 on the n + 1 qubits. If the number of 1s is k, all the qubits in the register are in the state |0⟩ except for a |1⟩ in the (k + 1)-th position. Then, we can see that by measuring the overlap of the first n* qubits of the register with |0⟩^⊗n*, we exactly implement the projection P_{H_C}. 18 Indeed, all the states with n(E) ≤ n* − 1 are projected out. Measuring at the same time the overlap of the n pairs of qubits with |00⟩^⊗n precisely gives (5.11). The remaining qubits of the register should remain unmeasured. This overlap operation should be implemented using a swap test between the 2n + n* qubits, consisting of our n qubit pairs and the first n* qubits of the register, and 2n + n* auxiliary qubits in the state |0⟩. This allows us to measure the overlap (5.11) to arbitrary precision by iterating the circuit. We note that the register can be optimized by using only log n qubits and storing the number of 1s in binary instead of unary.

Figure 5: Circuit that prepares the state |Ψ⟩, and swap test to compute the overlap.

Optimal measurement
We now investigate the optimal measurement for a qubit. When ρ and σ commute, the optimal measurement reduces to the likelihood ratio test, which was described in the previous section.
Here, we would like to study the optimal measurement more generally, in a setup where ρ and σ do not commute. We consider a very simple non-commuting example, taking Moreover, we assume that the change of basis is just a rotation matrix In the basis where |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T, we have and we have e^{−E_0} + e^{−E_1} = 1, so it is useful to define p such that and we have p ≤ 1/2. The relative entropy is The basis states of σ and ρ are defined as bit strings |E⟩ = |a_1 a_2 . . . a_n⟩, a_i ∈ {0, 1}, (5.18) We define n(E) to be the number of 1s in E and n(Ẽ) the number of 1s in Ẽ. The acceptance condition with threshold E takes the simple form This allows us to define the states that span the acceptance subspace. For every Ẽ, we define The optimal measurement is then the projector onto the acceptance subspace H_Q = span Formally, we first define the operator so that the acceptance subspace H_Q is the image of Q. The optimal measurement is the projector onto it, given by where G ≡ Q†Q is the Gram matrix of the vectors (5.20): the 2^n × 2^n matrix of the overlaps ⟨ξ(Ẽ_1)|ξ(Ẽ_2)⟩. This expression is well-defined because the restriction of G to the image of Q† is invertible, and P_{H_Q} can be extended by zero on the vectors that are annihilated by Q†. Note that the expression makes it clear that P²_{H_Q} = P_{H_Q}. We see that the explicit construction of the projector involves inverting the Gram matrix G, which is a challenging computational problem.
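The construction P_{H_Q} = Q G^{−1} Q† (with G^{−1} understood on the image of Q†) can be prototyped with a pseudo-inverse, which automatically handles linearly dependent spanning vectors. A minimal sketch with random stand-in vectors in place of the |ξ(Ẽ)⟩:

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns of Q stand in for the non-orthogonal, possibly linearly dependent
# vectors spanning the acceptance subspace
Q = rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))
Q[:, 5] = Q[:, 0] + Q[:, 1]      # force a linear dependence

G = Q.conj().T @ Q                       # Gram matrix of overlaps
P = Q @ np.linalg.pinv(G) @ Q.conj().T   # projector onto the span of Q
```

The pseudo-inverse inverts G on the image of Q† and is zero on its kernel, so P is Hermitian, idempotent, and acts as the identity on the spanning vectors.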
Complexity of measurements. It is intuitively clear that the optimal measurement is more complicated than the likelihood ratio test, since the former involves a more complicated construction of the acceptance space and the projector. It would be interesting to formalize this intuition by defining various notions of complexity of a measurement. The definitions of complexity could be based on different resources, and could also depend on the algorithm carrying out the measurement or computing the projector. A simple algorithm-independent characteristic resource is the size of the acceptance subspace, or more precisely, its dimension. If one of the states to be compared is pure, the optimal measurement involves the projection onto that state. In this simplest case, the acceptance space is as small as possible, containing just one state, while its complement is maximal. Hence, to compare the complexity of different measurements, it is helpful to define the minimum dimension of the acceptance space and its complement, This defines a complexity measure which depends on the predetermined maximum size ε of the type I error, the number n of identical copies, and the two states ρ, σ through the acceptance threshold n*. Once these are given, we can compare the minimum acceptance dimension dim H^<_acc of the optimal measurement and the likelihood ratio test. The latter depends on the volume of the Hamming sphere and its complement, so we have the analytical formula dim H^<_acc,C = min For the optimal measurement, finding an analytical formula, or at least an estimate, for the minimum acceptance dimension dim H^<_acc,Q is a mathematical challenge. We study it numerically for n up to 14, by performing the Gram–Schmidt orthogonalization of the vectors |ξ(Ẽ)⟩ that span the acceptance space and then counting the number of orthonormal basis vectors. This (very limited) investigation suggests that dim H^<_acc,Q grows exponentially with n at a faster rate than dim H^<_acc,C.
This indicates that already at the level of the acceptance spaces the optimal measurement is "more complex" than the likelihood ratio test. There are additional levels of complexity involved in computing the Gram matrix and finding its inverse; it would be interesting to develop rigorous complexity measures taking into account everything involved in constructing the projection.

Figure 6: Optimal quantum measurement (blue) vs. optimal classical measurement (yellow). We see (left plot) that the optimal measurement gives a type II error β that is one order of magnitude smaller for n ∼ 14. We also see (right plot) that the minimum acceptance dimension is much larger for the optimal quantum measurement than for the optimal classical measurement. The curves with the logarithmic y-axis indicate exponential growth in n, with a faster rate for the optimal measurement. These plots are made with the parameters θ = π/3, p = 0.015, ε = 0.2, using a Mathematica notebook that we have made publicly available [61].
Numerical results. The numerical implementation of the optimal measurement and the likelihood ratio test is done in a Mathematica notebook that we have made publicly available [61]. We analyze the measurements only up to n = 14, but this already proves sufficient to see some interesting features. For the threshold value E, we use the optimal value (4.17). Including the second-order term (in n) is necessary because n is not very large (the second-order term brings in the ε-dependence). Choosing parameter values such that the finite-n effects are not too strong, we see that the optimal measurement is better by an order of magnitude. This is depicted in Figure 6, and demonstrates that quantum hypothesis testing can be much more efficient than classical hypothesis testing. The tradeoff is that quantum hypothesis testing is more complex. The growth of the minimum acceptance dimension with n is exponential for both measurements, but the growth rate appears to be faster for the optimal quantum measurement. It would be interesting to carry out a more extensive numerical investigation and see how generic this feature is.
Some mathematical observations. We finish this section by providing some partial results on the more challenging problem of constructing the optimal measurement in the general case. These partial results illustrate interesting connections to combinatorics and coding theory, which should inspire further study. For the rest of this discussion, we restrict to the case θ = π/4, where many simplifications occur. In this case, the rotation matrix (5.14) is just the Hadamard matrix and we have |0̃⟩ = |−⟩ and |1̃⟩ = |+⟩. We then have a rather explicit description of the states |ξ(Ẽ)⟩: where n_01(E, Ẽ) is the number of pairs (a_i, ã_i) equal to (0, 1) in the notation (5.18). We now need to carry out the Gram–Schmidt procedure for these vectors to obtain a basis of H_Q. This requires computing the Gram matrix G of overlaps ⟨ξ(Ẽ_1)|ξ(Ẽ_2)⟩. The overlaps can be expressed as partial sums of products of binomial coefficients. Using a generalization of Vandermonde's identity, we can re-express the overlap as follows. Define the polynomial where n(Ẽ_1 + Ẽ_2) is the number of 1s in the boolean sum (i.e. the sum in the ring Z_2) of Ẽ_1 and Ẽ_2. The overlap is then obtained as a partial sum of the coefficients P_{n,k} where n*(Ẽ_1, Ẽ_2) = max(n*(Ẽ_1), n*(Ẽ_2)). We refer to Appendix C for details on the derivation of this formula. There, it is also shown that the P_{n,k} are related to binary Krawtchouk polynomials K_k(x; n), and the overlaps of the Gram matrix take the explicit form It is also interesting that this problem seems related to coding theory and combinatorics. In Appendix C, we show that the Gram matrix is an element of the Terwilliger algebra [62] of the Hamming cube H = {0, 1}^n (see [63,64]). This is done by identifying the labels ξ(Ẽ) with subsets of H, given by the supports of the bit strings Ẽ. In this way we obtain the explicit expansion in the basis {M^t_ij} of the Terwilliger algebra.
Identifying the expansion coefficients x^t_ij then allows at least a block diagonalization of G, exploiting the results of [63]. This may turn out to be a useful step towards finding G^{−1} and constructing the projector P_{H_Q}.

Measurements in fermion chains
In this section, we study subsystem measurements in spinless fermion chains. Our goal is to construct measurements that are optimal in distinguishing between two different states while acting only on a small subsystem. We take these two states to be thermal states at different temperatures. We mostly focus on simple hopping models, but some of our results also apply to fermion chains whose Hamiltonians are arbitrary bilinears in creation and annihilation operators. This setup is the discrete analog of the chiral fermion CFT that will be studied in section 7.2.2. For small subsystem sizes, we are able to give a more explicit description of the optimal measurement.

Spinless fermion chains
We consider spinless fermions on a chain of length L → +∞ with periodic boundary conditions. 20 The total Hamiltonian of the chain is and the fermion operators obey the anticommutation relations Here Â is real symmetric and B̂ is real antisymmetric to ensure Hermiticity. In addition, they are taken to be positive semi-definite so that the total energy is non-negative. The hats denote L × L matrices supported on the whole chain, to distinguish them from the matrices restricted to a subsystem that we study below.
As an example, the anisotropic XY model can be mapped to a Hamiltonian of the form (6.1) via a Jordan-Wigner transformation [65]. We will consider the simpler isotropic XY model in section 6.3.3 below.

Diagonalization of fermion Hamiltonians
The Hamiltonian (6.1) can be diagonalized by the Bogoliubov transformation where the vectors v̂_k, û_k are solutions of the equations where the constant sets the zero-point energy. 21 The operators η_k, η†_k generate a Fock space of positive-energy excitations. 20 In what follows, there is a possibility of an order-of-limits issue between the thermodynamic L → ∞ and the perturbative λ → 0 limits. To circumvent the issue, we simply take L to be larger than any scale in the problem and take the perturbative limit λ → 0 while keeping L fixed. We thank the referee for pointing out this subtlety. 21 The constant is explicitly 1 For fermion chains with B̂ = 0, the diagonalization procedure can be made more explicit. One first solves the eigenvalue problem Â v̂_k = Λ_k v̂_k, which allows us to write Â = v̂ D̂ v̂^T (6.8) where D̂ is the diagonal matrix with entries Λ_k. Then, performing the Bogoliubov transformation the Hamiltonian becomes where the Λ_k can be negative. The form (6.7) with absolute values is obtained by performing an additional particle-hole transformation on a_k, a†_k (which is automatically included in (6.4)). For our purposes, the form (6.10) is sufficient, and the Bogoliubov transformation (6.9) is a special case of (6.4) with û = v̂.
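For the B̂ = 0 case, the whole diagonalization reduces to an ordinary symmetric eigenvalue problem. A minimal sketch for an illustrative nearest-neighbour hopping chain with periodic boundaries, whose single-particle spectrum is Λ_k = −2J cos(2πk/L):

```python
import numpy as np

# Illustrative B = 0 chain: nearest-neighbour hopping, periodic boundaries
L, J = 8, 1.0
A = np.zeros((L, L))
for i in range(L):
    A[i, (i+1) % L] = A[(i+1) % L, i] = -J

Lam, v = np.linalg.eigh(A)               # A v_k = Lam_k v_k
A_rebuilt = v @ np.diag(Lam) @ v.T       # A = v D v^T as in (6.8)

# translation invariance gives the spectrum -2J cos(2 pi k / L)
expected = np.sort(-2*J*np.cos(2*np.pi*np.arange(L)/L))
```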

Reduced density matrix of a subsystem
We consider a subsystem V = {1, . . . , ℓ} containing ℓ fermions, and place the chain (6.1) in a global thermal state 22 σ̂ = e^{−βH}/Tr e^{−βH}. (6.11) The reduced density matrix (RDM) on V is obtained by tracing over its complement V^c and takes the form where the modular Hamiltonian 23 K takes the same form as the total Hamiltonian of the chain: The matrices A, B are different from the matrices Â, B̂. Indeed, the modular Hamiltonian K, which depends on the global state, is not equal to the Hamiltonian H|_V of the subsystem. The matrices A, B in the modular Hamiltonian can be obtained from the following equations [28,66] Tr which follow from the fact that expectation values of operators supported in the subsystem can be computed using either the global state or the reduced state. The two-point functions are sufficient, because higher-order correlators reduce to two-point functions by Gaudin's theorem (an extension of Wick's theorem). Since both σ and σ̂ are exponentials of one-body operators, these traces can be computed explicitly (see Appendix D) to write the equations in terms of the parameters appearing in K and H. For simplicity, we restrict to free fermion chains with B̂ = 0, so that the Hamiltonian is Due to the absence of pair creation/annihilation terms, the anomalous two-point function Tr(σψ†_i ψ†_j) vanishes. This is reflected in the modular Hamiltonian, which has B = 0 [28]: The partition function Z can now be obtained in terms of A as where the determinant is taken over the matrix indices. Let C denote the thermal two-point function restricted to the subsystem from which we also obtain an expression for Z in terms of C (6.20). Hence for free fermions, the reduced density matrix of a subsystem in a thermal state is simply given by the thermal two-point function C.
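The free-fermion dictionary between the modular Hamiltonian matrix A and the restricted two-point function C, namely C = (1 + e^A)^{−1}, or equivalently A = log((1 − C)/C), can be verified against a brute-force many-body computation for a few modes, building the fermion operators via a Jordan–Wigner construction. A minimal sketch (the matrix A is a random illustrative choice):

```python
import numpy as np
from functools import reduce

def kron(*ops):
    return reduce(np.kron, ops)

# Jordan-Wigner construction of 3 fermionic modes
I2, Z = np.eye(2), np.diag([1.0, -1.0])
sm = np.array([[0.0, 1.0], [0.0, 0.0]])          # |0><1|
c = [kron(sm, I2, I2), kron(Z, sm, I2), kron(Z, Z, sm)]

# One-body modular Hamiltonian K = sum_ij A_ij c_i^dag c_j  (B = 0)
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                                 # real symmetric
K = sum(A[i, j] * c[i].conj().T @ c[j] for i in range(3) for j in range(3))

# Gaussian state sigma = e^{-K}/Z and its two-point function C_ij
w, U = np.linalg.eigh(K)
expK = U @ np.diag(np.exp(-w)) @ U.conj().T
sigma = expK / np.trace(expK)
C = np.array([[np.trace(sigma @ c[i].conj().T @ c[j]).real
               for j in range(3)] for i in range(3)])

# Free-fermion relation: C = (1 + e^A)^{-1}, i.e. A = log((1 - C)/C)
lamA, vA = np.linalg.eigh(A)
C_expected = vA @ np.diag(1/(1 + np.exp(lamA))) @ vA.T
lamC, vC = np.linalg.eigh(C)
A_from_C = vC @ np.diag(np.log((1 - lamC)/lamC)) @ vC.T
```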

Relative entropy and its variance for free fermions
We introduce a second global thermal state ρ̂ with temperature β̃, which induces a different reduced density matrix ρ on the subsystem: Let us now compute the relative entropy and the relative entropy variance for the two reduced density matrices. The relative entropy is given by where we have and we used The partition functions are given by (6.20): As a result, we obtain for the relative entropy (6.26) and the relative entropy variance is given by which does not depend on the partition functions. The first term can be written as Because ρ is an exponential of one-body operators, we can use Gaudin's theorem to compute the four-point function [67] (see also Appendix D). The result is and we get The first term equals ⟨∆K⟩²_ρ, which cancels in (6.27) and leaves us with As far as the authors are aware, the expressions (6.26) and (6.33) for the relative entropy and its variance have not appeared in the literature before. However, the sandwiched Rényi relative entropy between RDMs of a free fermion chain was computed in [35] (see also [36]), and one can check that the relative entropy (6.26) matches the first derivative of their expression. Unfortunately, we did not manage to compute the second derivative to see whether the result matches the variance. As an independent consistency check of (6.33), we will see below that it obeys the lower bound (3.20).
The expressions for S(ρ‖σ) and V(ρ‖σ) can be written explicitly in terms of the eigenvalues and eigenvectors of A, Ã. We have so that where v_k · ṽ_l = Σ_i v_{ki} ṽ_{li} is the overlap between the eigenvectors. There is a similar expression for the variance. A further simplification occurs if A and Ã commute, so that their eigenvectors coincide: In this case, one obtains the simple expressions The vanishing of the commutator of A, Ã is equivalent to commutativity of the RDMs, [ρ, σ] = 0. This can be seen by performing Bogoliubov transformations on K and K̃ respectively. In the same way that the full Hamiltonian was diagonalized using (6.9), the modular Hamiltonians become If (6.36) holds, one finds from (6.38) that c̃_k = c_k and c̃†_k = c†_k, so that [K, K̃] = 0. In addition, one can check that for a perturbative entanglement spectrum of the form Ẽ_k = E_k + λE^{(1)}_k, the expressions (6.37) saturate the lower bound (3.20), as expected for commuting RDMs.
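In the commuting case, ρ and σ factorize over the common modes, so S and V reduce to sums of classical binary relative entropies and variances built from the occupations n_k = 1/(1 + e^{E_k}) and ñ_k = 1/(1 + e^{Ẽ_k}). This mode-sum form (our reconstruction of the structure of (6.37)) can be checked against a brute-force many-body computation; a minimal sketch with two modes and illustrative spectra:

```python
import numpy as np
from functools import reduce

def kron(*ops):
    return reduce(np.kron, ops)

I2, Z = np.eye(2), np.diag([1.0, -1.0])
sm = np.array([[0.0, 1.0], [0.0, 0.0]])
c = [kron(sm, I2), kron(Z, sm)]                  # two fermionic modes

def gaussian(A):
    # rho = e^{-K}/Z with K = sum_ij A_ij c_i^dag c_j
    K = sum(A[i, j] * c[i].conj().T @ c[j] for i in range(2) for j in range(2))
    w, U = np.linalg.eigh(K)
    expK = U @ np.diag(np.exp(-w)) @ U.conj().T
    return expK / np.trace(expK)

def logm(r):
    w, U = np.linalg.eigh(r)
    return U @ np.diag(np.log(w)) @ U.conj().T

# commuting case: common eigenvectors, different single-particle spectra
th = 0.4
v = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
E, Et = np.array([0.5, 1.5]), np.array([0.9, 0.7])
sigma = gaussian(v @ np.diag(E) @ v.T)
rho = gaussian(v @ np.diag(Et) @ v.T)

# direct many-body relative entropy and variance
D = logm(rho) - logm(sigma)
S = np.trace(rho @ D).real
V = np.trace(rho @ D @ D).real - S**2

# mode sums in terms of the occupations n_k, nt_k
n, nt = 1/(1 + np.exp(E)), 1/(1 + np.exp(Et))
llr = np.log(nt*(1 - n)/(n*(1 - nt)))
S_modes = np.sum(nt*np.log(nt/n) + (1 - nt)*np.log((1 - nt)/(1 - n)))
V_modes = np.sum(nt*(1 - nt)*llr**2)
```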

Optimal measurement
In this section, we describe the implementation of the optimal measurement for spinless fermion chains. This involves computing overlaps between eigenstates of two modular Hamiltonians, which can be done using the generalized Wick's theorem [38,39]. For free fermions, this gives a prescription for how the overlaps v_i · ṽ_j between eigenvectors translate into overlaps ⟨E_I|Ẽ_J⟩ between eigenstates. For completeness, we consider general modular Hamiltonians of the form (6.13) with nontrivial A and B, and restrict to modular Hamiltonians of free fermions with B = 0 at the end.

Eigenstates of modular Hamiltonians and their overlaps
To unify the computations, we introduce some convenient notation. Let be ℓ-dimensional vectors. We define similarly the ℓ-dimensional vectors c, c† and c̃, c̃†, and combine them further into 2ℓ-dimensional vectors as Following the analysis for the Hamiltonian of the chain, modular Hamiltonians K, K̃ of the form (6.13) are diagonalized by transformations where The transformation matrices are obtained by solving equation (6.6) for A and B (and similarly for ṽ, ũ): The matrices v, u, ṽ, ũ are real and orthogonal, so that W, W̃ are real and orthogonal as well. 24 They are thus Bogoliubov transformations, because the real Bogoliubov group is the orthogonal group (see Appendix D.1). As a result, the modular Hamiltonians become The exact values of E_vac, Ẽ_vac are not important for the upcoming analysis. From these expressions it follows that the eigenstates are generated by acting on the two quasi-particle vacua |E_vac⟩, |Ẽ_vac⟩ with creation operators. The vacua are defined via (6.47) and the eigenstates are where we use ℓ-bit binary strings to keep track of the occupation numbers of the modes k. The corresponding eigenvalues are We want to compute the overlaps between these eigenstates Standard Wick's theorem does not directly apply to correlators of this type, because c̃†_i is not the Hermitian conjugate of c_i. The trick is to realize that the operators α and α̃ are related via a Bogoliubov transformation T (an orthogonal matrix): which is explicitly We introduce the operator T̂ that implements the Bogoliubov transformation T in the Hilbert space [38,39]: and T̂ is unitary since T is real. The expression for T̂ in terms of α is not needed in what follows. However, if T can be written as an exponential T = e^{−ΩS}, where Ω is the matrix (D.3) and S is antisymmetric, then T̂ is an exponential of one-body operators [38,39].
It follows that |Ẽ_vac⟩ = T|E_vac⟩, so that all the eigenstates of the two modular Hamiltonians are related through T. The overlaps (6.50) can therefore be written as in (6.55), and unitarity of T ensures that these overlaps determine a unitary basis rotation in the Hilbert space. All the operators in (6.55) are expressed in terms of the annihilation and creation operators c, c†, which allows the use of Wick's theorem. In Appendix D, we show that the overlaps involving two operators are given by the contractions (6.56), where T_11 = T_22 and T_12 = T_21 are the two ℓ × ℓ blocks of (6.52); the overlap between the vacua is also computed there. The overlaps (6.55) involving more operators can be computed using the generalized Wick's theorem [39], and they are non-zero only when n + m = 2t is even. In that case, the quantities appearing on the right-hand side of (6.58) are the three two-point overlaps (6.56), and we refer to Appendix D for more details. In other words, all the overlaps (6.55) can be expressed in terms of the two-point overlaps (6.56) using the generalized Wick's theorem. The computation of the contractions (6.56) requires the knowledge of v, u and ṽ, ũ, which determine the block matrices T_ij according to (6.52). These can be computed from (6.45) knowing A, B and Ã, B̃, which are in turn obtained from two-point functions in the global state according to (6.14). Although these equations are in general difficult to solve, they become simpler for free fermions, because B vanishes and A is directly given in terms of C according to (6.19). We will demonstrate this below for the XY model.
The power of this approach is that it gives a way to compute the overlaps without needing the explicit form of the ground states |E_vac⟩, |Ẽ_vac⟩. It can therefore be applied to modular Hamiltonians of the general form (6.13). However, there is one situation where the above computation of the overlaps fails: when det T_11 = 0, so that T_11 is not invertible. This happens when the two quasi-particle vacua are orthogonal.

Overlaps of eigenstates for free fermions
The above algorithm for computing overlaps simplifies for free fermions, since B = B̃ = 0 implies that we can use the Bogoliubov transformations (6.43) with u = v and ũ = ṽ. Hence all the overlaps are determined by the eigenvectors v, ṽ of the two-point functions C, C̃.
With B = B̃ = 0, the modular Hamiltonians take a particularly simple form. As shown before, they take the diagonal form (6.39) after the transformation (6.38). From these expressions we obtain the Bogoliubov transformation T, which is block diagonal. The overlap between the quasi-particle vacua is then unity, where we used the fact that the relevant transformation matrix belongs to SO(2ℓ) and hence has unit determinant. In this case, the quasi-particle vacua coincide with the true vacuum, |E_vac⟩ = |Ẽ_vac⟩ = |0⟩ (annihilated by ψ_i).
Noting that (T_11)^{-1} = vᵀṽ, the only non-zero contractions are those that conserve particle number. Because of this, the higher-order overlaps (6.55) are non-zero if and only if n = m, and the generalized Wick's theorem (6.58) for t = n gives (6.65). The result (6.65) could also have been obtained directly from the correlator (6.50) without reference to the generalized Wick's theorem, for example by inverting (6.38), with a similar strategy for the higher-order correlators. It is for modular Hamiltonians with B ≠ 0 that the generalized Wick's theorem becomes truly useful.
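For B = 0, all the non-trivial information is contained in the matrix vᵀṽ, and the eigenstate overlaps within a fixed particle-number sector are determinants of its submatrices. The following Python sketch illustrates this with random symmetric matrices standing in for the single-particle modular Hamiltonians (a choice of ours, not a specific model from the text), and checks that the resulting overlaps define a unitary rotation sector by sector:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-particle modular Hamiltonians (real symmetric,
# as for free fermions with B = 0) on ell sites.
ell = 4
A  = rng.standard_normal((ell, ell)); A  = (A + A.T) / 2
At = rng.standard_normal((ell, ell)); At = (At + At.T) / 2
_, v  = np.linalg.eigh(A)    # orthogonal eigenvector matrices v, vt
_, vt = np.linalg.eigh(At)

# All eigenstate overlaps reduce to minors of M = v^T vt.
M = v.T @ vt

def overlap(I, J):
    """Overlap of the eigenstates with occupied modes I (for K) and J (for K~)."""
    return np.linalg.det(M[np.ix_(I, J)])

# Unitarity check (Cauchy-Binet): for fixed occupations I, the overlaps with
# all equal-particle-number occupations J form a unit vector.
I = (0, 2)
total = sum(overlap(I, J) ** 2 for J in itertools.combinations(range(ell), 2))
print(abs(total - 1.0) < 1e-10)  # True
```

The Cauchy-Binet identity guarantees this unit-vector property for any orthogonal M, which is the statement that the many-body basis rotation built from the minors is unitary.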

Examples
We now give explicit examples for the general procedure described above.

A single fermion subsystem
The simplest possible subsystem contains only a single fermion. For a generic quadratic modular Hamiltonian (6.13) with ℓ = 1, the matrix B does not contribute, as it is antisymmetric. Hence modular Hamiltonians of a single fermion at site k = 1 are diagonal in the occupation basis. The two-dimensional Hilbert space of the fermion is spanned by the vacuum state |0⟩ and the state |1⟩ ≡ ψ†_1|0⟩, (6.68) with a fermion occupying site k = 1. In the above formalism, they are eigenstates of both modular Hamiltonians, since we have T = 1_{2×2}. The fermion Hilbert space spanned by |0⟩, |1⟩ is equivalent to the single-qubit Hilbert space studied in section 5. We see that the two RDMs of the fermion always commute. As a result, the optimal measurement is given by the likelihood ratio test described in section 4.3. The acceptance subspace for these RDMs was determined in section 5. Relative entropy and its variance are given by (6.37), and the acceptance condition becomes a threshold on n(E), the number of fermions in the n copies of the subsystem. The optimal measurement is then a projection onto states that contain n* or more fermions.

Two fermion subsystem
The situation is more interesting for subsystems containing more fermions. We consider here a subsystem of two fermions in a free fermion chain, taking the two fermions to be on sites i = 1, 2.
The matrices A, Ã have two eigenvalues E_{1,2}, Ẽ_{1,2} and eigenvectors which we parametrize as v = ( cos ϕ, −sin ϕ ; sin ϕ, cos ϕ ) and ṽ = ( cos ϕ̃, −sin ϕ̃ ; sin ϕ̃, cos ϕ̃ ). (6.72) Using the binary string notation for the eigenstates, there is a total of sixteen overlaps; from (6.65), the non-zero overlaps are those between states of equal fermion number. Thus the unitary rotation is given by (6.78), and it acts non-trivially only on the subspace spanned by |E_1⟩, |E_2⟩. The basis rotation (6.78) is effectively the same as the one studied in section 5, where the optimal measurement on a single qubit is constructed. The eigenstates |E_1⟩ and |E_2⟩, with a single fermion on either site 1 or 2, correspond to the two states of a rotated qubit. In addition, we also have an unrotated qubit. As discussed in section 5.2, the explicit description of the optimal measurement for the one-qubit case is challenging due to the difficult inversion of the Gram matrix. We will thus describe the suboptimal but simpler likelihood ratio test.
Assuming for simplicity that the two eigenvalues of K are equal, E_1 = E_2 ≡ E_vac + ∆ and E_{12} = E_vac + 2∆ with ∆ > 0, and likewise for the tilded values, the spectrum is characterized by E_0 ≡ E_vac + log Z, with a similar definition of Ẽ_0. The eigenstates of ρ⊗n, σ⊗n are |E⟩ ≡ |a_1 a_2 · · · a_{2n−1} a_{2n}⟩, (6.80) |Ẽ⟩ ≡ |ã_1 ã_2 · · · ã_{2n−1} ã_{2n}⟩, labelled by 2n-bit strings. The average modular energies are determined by n(E), n(Ẽ), which count the number of 1s in the binary strings. The acceptance condition |E| − |Ẽ| ≥ E then becomes a threshold condition on n(E), and the likelihood ratio test is the corresponding projector. Note that Ẽ_0 − E_0 cancels in (6.82) against the same term coming from the relative entropy once the threshold E = S(ρ_D‖σ) + . . . is substituted. It is also possible to obtain an explicit expression for S(ρ_D‖σ) using the overlaps (6.76).
The acceptance space is given by (the complement of) the Hamming sphere of radius n * centered at zero in the Hamming cube {0, 1} 2n . While the likelihood ratio test is in general a suboptimal measurement, it becomes optimal when the reduced density matrices commute. The next example gives a situation where this happens.
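The geometry of this acceptance region is easy to explore numerically. The sketch below (with hypothetical values of n and the threshold n*, our choice for illustration) enumerates the accepted bit strings and checks that their number is the binomial tail counting the complement of the Hamming sphere:

```python
import math
from itertools import product

# Hypothetical values: 2n = 6 binary digits, acceptance threshold n_star = 4.
n, n_star = 3, 4

# Accepted strings: at least n_star ones, i.e. the complement of the
# Hamming sphere of radius n_star centered at the all-zeros string.
accept = [s for s in product((0, 1), repeat=2 * n) if sum(s) >= n_star]

# The dimension of the acceptance space is a binomial tail.
count = sum(math.comb(2 * n, k) for k in range(n_star, 2 * n + 1))
print(len(accept), count)  # 22 22
```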

Example: XY model at finite temperature
The isotropic XY spin chain has the Hamiltonian [65] built from the Pauli matrices σ^{x,y}_i at site i, with periodic boundary conditions. In the thermodynamic limit, this Hamiltonian can be mapped to a periodic free fermion chain [65].26 Hence the Hamiltonian is of the form (6.1) with B_ij = 0, and the eigenvectors v_k and eigenvalues Λ_k can be found in [65]. Due to translation invariance, the thermal two-point function is a function of i − j only, and in the thermodynamic limit L → ∞ it takes the form C_{i−j} = ∫_{−π}^{π} (dq/2π) e^{iq(i−j)} / (e^{β cos q} + 1), which corresponds to ϕ = π/4 in equation (6.72). We see that v is independent of the temperature β and of the distance r. This is true in any translation invariant fermion chain for which the thermal two-point function is of the form (6.88). Now when considering thermal states at two different temperatures, leading to two modular Hamiltonians K and K̃, the unitary rotation (6.78) between their eigenstates is trivial: U_IJ = δ_IJ. Hence the RDMs of the two fermions commute and the optimal measurement is the likelihood ratio test. If the fermion chain is not translation invariant this is no longer true, because then the modular Hamiltonians K, K̃ do not generally commute. It is interesting that translation invariance implies commutativity of two-fermion density matrices in global thermal states.
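The temperature-independence of the eigenvectors can be checked numerically. The sketch below evaluates a thermal two-point function of the form quoted above (our discretization of the momentum integral) and diagonalizes the two-site correlation matrix for nearest-neighbour sites:

```python
import numpy as np

def C(r, beta, nq=4096):
    """Thermal two-point function of the free fermion chain in the
    thermodynamic limit (discretized momentum integral):
    C(r) = int_{-pi}^{pi} dq/(2 pi) e^{i q r} / (e^{beta cos q} + 1)."""
    q = np.linspace(-np.pi, np.pi, nq, endpoint=False)
    return np.mean(np.cos(q * r) / (np.exp(beta * np.cos(q)) + 1.0))

def corr_matrix(r, beta):
    """Two-site correlation matrix for sites a distance r apart."""
    c0, cr = C(0, beta), C(r, beta)
    return np.array([[c0, cr], [cr, c0]])

# The eigenvectors are (1, +-1)/sqrt(2), i.e. phi = pi/4 in (6.72),
# independent of the temperature, for nearest-neighbour sites.
for beta in (0.5, 2.0):
    _, v = np.linalg.eigh(corr_matrix(1, beta))
    print(np.allclose(np.abs(v), 1 / np.sqrt(2)))  # True
```

The eigenvector statement follows simply because any symmetric 2×2 matrix with equal diagonal entries is diagonalized by (1, ±1)/√2; the numerics confirm that translation invariance forces exactly this structure.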

Measurements in conformal field theory
We now turn to the implementation of quantum hypothesis testing in quantum field theory. We will discuss in detail how the measurements described in section 4 are realized as operators acting on states. For simplicity, we restrict to two-dimensional conformal field theory, because the infinite-dimensional group of conformal transformations in two dimensions allows for a certain flexibility. For an introduction to the subject, we refer to [68].
The physical system we consider will live on a line or on a circle. We will be particularly interested in distinguishing two different states from an interval subregion. Our main technical result is the construction of the optimal measurements for special types of states, studied by Cardy and Tonni [40]. As an illustration, we study the free chiral fermion CFT, which can be viewed as a continuum limit of the discrete fermion chain studied in section 6.
While we obtain some basic technical results in implementing measurements in conformal field theories, we are merely scratching the surface of a vast number of possibilities in the choices of theories and states. As our free fermion case will show, there are interesting analytical challenges when trying to simplify the implementation of efficient measurements.

Subregion measurements
We now describe the situation where we want to distinguish between two states in a CFT 2 while only having access to a subregion. After tracing out over the rest of the system, the two states are given by two density matrices σ and ρ supported in that subregion.
The measurements described in section 4 are given in terms of the modular Hamiltonians. In general, the modular Hamiltonian of a reduced density matrix is a complicated non-local operator and is difficult to study. For a special class of states in a CFT_2, the modular Hamiltonian is local: it can be written as a suitable integral of the stress tensor. We will restrict to these types of states in the following two sections, drawing on the results of [40]. We will first describe the optimal measurement in the generic situation, and then explore in some more detail the task of distinguishing between two thermal states at different temperatures in the next section. Finally, we will explain how to implement the likelihood ratio test to distinguish between the vacuum and a primary excitation.

Setup
Let's now describe the setup. The CFT_2 is defined on a line or on a circle, and the subregion we consider is an interval I = [−ℓ/2, ℓ/2]. We consider the Euclidean spacetime described by a coordinate z. We cut out little disks of size ε around the endpoints of I to regulate the entanglement entropy. The boundary conditions are given by two boundary states |a⟩ and |b⟩, and they contribute a finite amount to the entanglement entropy via Affleck-Ludwig boundary entropies [40].
We consider two reduced density matrices σ and ρ defined on the interval I. The corresponding modular Hamiltonians K = − log σ and K = − log ρ are assumed to be local. As a result, each of them can be viewed as generating a flow along a vector field, as represented on the left of Figure  7. To define the optimal measurement, we are interested in the eigenstates of both K and K, and their overlaps. To obtain a useful description of these states, we will use the flexibility of two-dimensional CFTs to conformally transform the setup to a simpler geometry for each state, as represented on the right of Figure 7. In this simpler geometry, the modular Hamiltonian becomes a dilatation operator, whose eigenstates are easily described.
We first use the conformal map w = f(z), which takes the spacetime to an annulus of width W.27 More precisely, the interval is mapped to w ∈ [−W/2, W/2], and the imaginary part of w is periodic with period 2π. The modular Hamiltonian in these new variables becomes simple: it just generates translations in the imaginary w direction.
To describe the eigenstates of the modular Hamiltonian, it is useful to consider the universal cover by allowing the imaginary part of w to be unconstrained. The geometry becomes an infinite strip. We can then map it to the upper half plane with a second conformal transformation. The interval becomes a half unit circle C_+, ranging from u = 1 to u = −1. As explained in [41], the choice of boundary conditions is such that one can extend this to the other half plane and perform radial quantization on the full plane. The modular Hamiltonian K is then simply related to the generator L_0 of dilatations in this geometry, up to an additive constant involving the central charge c which ensures that Tr e^{−K} = 1 [40]. We refer to [69] for a more detailed discussion of this setup. The upshot of all these manipulations is that we can now relate the spectrum of the modular Hamiltonian to the spectrum of L_0 in the presence of the two boundary conditions |a⟩ and |b⟩. For example, we can choose |a⟩ = |b⟩ = |0⟩, where the Cardy state |0⟩ projects onto the vacuum sector of the theory [41], so that the only states in the entanglement spectrum are the vacuum and its descendants.
Figure 7: The modular Hamiltonians K = − log σ and K̃ = − log ρ are conformally mapped to dilatation operators in the upper half plane. The entanglement spectrum is then obtained using radial quantization. The inverse maps give expressions for K and K̃ in the original spacetime, giving a way to compute the overlaps of their eigenstates, as required to implement the optimal measurement. The modular flows are depicted in blue for K and in orange for K̃.29
In the u-plane, we obtain from radial quantization the Virasoro generators L_n, where the relevant contour C is the unit circle. This is then translated to an integral over the original interval I. The entanglement spectrum of a state σ can then be generated by acting with these operators on the vacuum. We can use the same procedure for another state ρ, using a different map w̃ = f̃(z) giving an annulus of width W̃. The spectrum of ρ is then generated by another Virasoro algebra L̃_n, and the modular Hamiltonian K̃ = − log ρ is then given in terms of L̃_0. Since both Virasoro algebras are written on the interval, we can compare them. Their commutators can be computed using the general commutation relation of two stress tensors in a CFT_2. We can restrict to the vacuum sector by choosing the boundary condition |a⟩ = |b⟩ = |0⟩. Then, the eigenstates of K are given by the eigenstates of L_0, and similarly the eigenstates of K̃ are the eigenstates of L̃_0. The general commutation relation (7.8) can be used to compute the commutators [L_n, L̃_m], even though this is difficult in practice. This then gives a way to compute the overlaps ⟨∆|∆̃⟩, as required to describe the optimal measurement.

Optimal measurement
The optimal measurement can then be implemented in this language, following section 4.2. Let's now consider n copies of the system. The eigenstates of σ⊗n and ρ⊗n are respectively denoted |∆⟩ = |∆_1⟩ ⊗ |∆_2⟩ ⊗ · · · ⊗ |∆_n⟩, |∆̃⟩ = |∆̃_1⟩ ⊗ |∆̃_2⟩ ⊗ · · · ⊗ |∆̃_n⟩. (7.11) Using the formula (7.3), we see that the average modular energies for K and K̃ are determined by the average conformal dimensions. The optimal measurement is then described by first decomposing |∆̃⟩ in the {|∆⟩} basis, where the coefficients factorize as ⟨∆|∆̃⟩ = ∏_{i=1}^n ⟨∆_i|∆̃_i⟩. We then restrict the sum over ∆̃ to those satisfying the acceptance condition K^(n) − K̃^(n) ≥ E, which here takes the form (7.15). This allows us to define the states |ξ(∆̃)⟩. The optimal measurement is the projector onto the subspace (7.17) spanned by these states, with a suitable choice of the acceptance threshold E.

Likelihood ratio test
We will see that the optimal measurement is difficult to describe explicitly. A simpler measurement, which is suboptimal but still performs well, is the likelihood ratio test discussed in section 4.3. The measurement projects on part of the spectrum of σ⊗n. More precisely, it is a projection on the acceptance subspace (7.19), and the best value of E is given in (4.31). We can rewrite the acceptance condition in terms of averages over the n copies. To obtain a more explicit description, we should compute ⟨∆|L̃_0|∆⟩, which can be written in terms of the overlaps. As a result, a fairly explicit description of this measurement can be given with only the knowledge of the overlaps ⟨∆|∆̃⟩.

Thermal states
As a concrete example of the procedure described above, we can consider the problem of distinguishing two thermal states at different temperatures, having access only to a subregion. We take the subregion to be an interval I = [−ℓ/2, ℓ/2] on the infinite line. Following [70], the reduced density matrix obtained from a thermal state is associated to the conformal mapping f_β(z) = log [ (e^{2πz/β} − e^{−πℓ/β}) / (e^{πℓ/β} − e^{2πz/β}) ], (7.23) which allows one to obtain the corresponding modular Hamiltonian, as described in section 7.1.1. We consider two reduced density matrices σ and ρ on the interval I, obtained from global thermal states at inverse temperatures β_1 and β_2. The corresponding modular Hamiltonians are given explicitly in (7.24), where T_00 is the energy density of the CFT and c(β_{1,2}) are normalization constants.

Entropy and variance
In a thermal state at inverse temperature β, the one-point function of the energy density is ⟨T_00⟩ = πc/(6β²). We can determine the constant c(β) in (7.24) because we know the entanglement entropy, where ε is the UV cut-off and g_a, g_b are the Affleck-Ludwig boundary entropies originating from the boundary conditions at the entangling points [40]. This allows us to compute the relative entropy. The variance can be computed directly from the formulas (7.24) and the two-point function (7.28). At leading order in the small interval limit ℓ/β_1 → 0, we note that the resulting ratio satisfies the lower bound (3.20).30 It turns out that this ratio is an interesting quantity to study for more general states, and further results on it will be presented elsewhere.

Free fermion
The description of the optimal measurement in section 7.1 is valid for a general CFT_2. We can try to be a bit more explicit by considering the example of the free fermion in two dimensions. This theory can be seen as a continuum analog of the fermion chain considered in the previous section. The free boson is very similar and is presented in Appendix E.
We consider a free fermion ψ on a circle with antiperiodic boundary conditions (Neveu-Schwarz sector), with the usual mode decomposition. As above, we can compute the Fourier modes of the field. The anticommutation relation of the field implies that the Fourier modes satisfy {ψ_n, ψ_m} = δ_{m+n,0}. For the state ρ, we proceed similarly. We would like to compute overlaps between the eigenstates of ρ and those of σ. This information is contained in the commutator of the two sets of modes, which is given by an integral; although explicit, this integral is hard to compute analytically.
The Hilbert space is a Fock space generated by acting on the vacuum with creation operators. A basis adapted to σ is given by states labelled by s = (s_k)_k with s_k ∈ Z + 1/2 and s_k > 0, which we take to be an increasing sequence. The conformal dimension (eigenvalue of L_0) of such a state is the sum of the s_k. Similarly, we can consider a basis adapted to ρ given by states labelled by s̃ = (s̃_k)_k, again an increasing sequence.
To describe the optimal measurement, we would like to compute the overlap ⟨∆_s|∆̃_s̃⟩. We see that the overlap is non-zero if and only if |s| = |s̃|, where |·| denotes the cardinality of the sequence. Moreover, the overlap is simply given by the corresponding minor of the matrix A of mode overlaps, which defines a matrix M. The eigenvalue E of K is related to that of L_0 via the relation (7.3).
We now consider n copies of the system to implement the optimal measurement. Following section 7.1.2, we have the acceptance condition (7.15). This allows us to define the states |ξ( ∆) using the overlaps computed above. The optimal measurement is then the projector onto the subspace (7.17) spanned by these states. It is difficult to obtain a more explicit description of this optimal measurement. The first obstacle is the computation of the integral (7.38) which is needed to obtain the states |ξ( ∆) more explicitly. Furthermore, even if we managed to have a simple expression for these states, describing the subspace (7.17) will be even harder, involving their orthonormalization using for example the Gram-Schmidt process. This procedure was discussed in section 5.2 in the much simpler case of a qubit, where it already leads to a challenging combinatorial problem.
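The orthonormalization step can at least be sketched numerically: given a (generally non-orthogonal) family of accepted states, the projector onto their span can be built from a QR decomposition rather than an explicit Gram-matrix inversion. The following Python fragment uses random vectors as stand-ins for the states |ξ(∆̃)⟩ (purely illustrative, not the CFT states themselves):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random complex vectors standing in for the (non-orthogonal) accepted
# states |xi(Delta~)>, written in the eigenbasis of sigma^{(x) n}.
d, k = 6, 3
Xi = rng.standard_normal((d, k)) + 1j * rng.standard_normal((d, k))

# Projector onto their span via QR, avoiding an explicit Gram-matrix inverse.
Q, _ = np.linalg.qr(Xi)
P = Q @ Q.conj().T

print(np.allclose(P @ P, P), np.allclose(P @ Xi, Xi))  # True True
```

Numerically this is routine; the difficulty described in the text is that in the CFT the states, their overlaps, and the dimension of the space all have to be handled analytically, where the same Gram-Schmidt procedure becomes a hard combinatorial problem.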
It is then of interest to find suboptimal but simpler measurements which still perform well. A good candidate is the likelihood ratio test discussed in section 4.3 in a general context. Following section 7.1.3, implementing this measurement in CFT only requires the computation of the one-point function ⟨∆_s|L̃_0|∆_s⟩. For the free fermion, this only requires the computation of A_nm and its minors, which is much more tractable than what is needed to describe the optimal measurement explicitly.

Primary excitation
We now consider a setup consisting of a primary excitation that we wish to distinguish from the vacuum, in the case where we only have access to a subregion of the system. We will take the example of an interval in the circle. Let σ and ρ be the states on this interval corresponding respectively to the vacuum and to the excitation.31 Considering n copies of this setup, we would like to distinguish between the two states σ⊗n and ρ⊗n.
The optimal measurement is more difficult to describe because in this case, we do not have an analytic expression for the modular Hamiltonian of the excitation. Nonetheless, we will be able to implement the likelihood ratio test, as discussed in section 7.1.3.
Consider a two-dimensional CFT on a circle of circumference L at zero temperature. The Euclidean space is then an infinite cylinder of circumference L with a complex coordinate w = φ + iτ, where φ ∼ φ + L is the spatial coordinate and τ ∈ R is the Euclidean time coordinate. We will study the interval I = [0, ℓ] with 0 < ℓ < L on the τ = 0 circle. We map the cylinder to the complex plane using the map w −→ z = e^{2πiw/L}, (7.45) so that the Cauchy slice τ = 0 is mapped to the |z| = 1 circle. The interval I is mapped to the circular arc between z = 1 and z = e^{2πiλ} with λ = ℓ/L. Using a primary operator Φ, we create an excited state |Φ⟩ = Φ(0)|0⟩ in radial quantization by performing the path integral over the unit disk with Φ(0) inserted at the origin. The corresponding bra state ⟨Φ| is defined analogously in terms of the conjugate operator Φ†. We further perform the conformal transformation which maps the Cauchy slice |z| = 1 to the real axis, with the interval I mapped to the negative real axis.32 We define two reduced density matrices on I by tracing over its complement. The vacuum modular Hamiltonian is defined as K ≡ − log σ. In our conventions, K/(2π) generates counter-clockwise rotations in the ζ-plane. The excited state ρ is computed by a path integral over the ζ-plane with a cut along the negative real axis and with operator insertions Φ(e^{−πiλ}) and Φ†(e^{πiλ}). We rotate the boundary conditions above and below the cut to the positive real axis using σ^{1/2}, which gives the Rindler representation (7.48) of the density matrix. Here the vacuum two-point function ⟨·⟩ = Tr(σ ·) in the denominator ensures that Tr ρ = 1.33 See [71] for an analogous representation of ρ in higher dimensions. As in [71], we expand ρ in the short interval limit λ → 0 using the OPE.34

32 See [2] for more details on this setup.
33 The expression (7.48) is Hermitian since the adjoint maps the operator insertions Φ† and Φ into each other.
34 Note that (e^{πiλ} − e^{−πiλ})^{h_O} (e^{−πiλ} − e^{πiλ})^{h̄_O} = (2πλ)^∆ for small λ.
where ∆ is the scaling dimension of the lightest primary O of the theory that couples to Φ (in the sense that the OPE coefficient C^O_{ΦΦ†} is non-zero), which we assume to be spinless and real for simplicity. Since two-point functions of real primaries are normalized to the Kronecker delta, we can lower the index in the OPE coefficient, C^O_{ΦΦ†} = C_{OΦΦ†}. Based on the OPE, we take the expansion parameter to be (πλ)^∆. We can now start constructing the acceptance subspace. Given an eigenbasis |E⟩ of σ⊗n in H_A⊗n, the optimal classical measurement is determined by an acceptance condition on the modular energies. We first consider the case n = 1 of a single copy, for which |E⟩ is a single-copy eigenstate. From the expansion of ρ we obtain E + log⟨E|ρ|E⟩ = (πλ)^∆ e^E ⟨E|ρ^(1)|E⟩ + . . . , (7.54) Next, using the above Rindler quantization, the matrix element can be rewritten so that the states |E⟩ live on the positive real axis in the complex ζ-plane. Rotating the expectation value ⟨E|O(1)|E⟩ to the negative real axis and mapping back to the w-cylinder, we obtain an expression in terms of O(E) ≡ ⟨E|O(ℓ/2)|E⟩, the one-point function in the eigenstate |E⟩ of the operator O inserted at the midpoint of the interval I. Hence, to determine the acceptance subspace, one first has to compute these one-point functions. This can be seen as a precomputation that can be done once and for all for each O that one wishes to use.
Let us now return to the case of n copies, using the same notation as in section 4.2. We denote an eigenstate of σ⊗n by |E⟩ = |E_1⟩ ⊗ · · · ⊗ |E_n⟩, and we use |E| = (1/n) Σ_{i=1}^n E_i. The acceptance condition then involves the average (7.60) of the precomputed values O(E_i). In the short interval limit, the relative entropy has a similar expansion.35

35 Although it might be subtle to properly define ρ_D in a continuum CFT, we expect that S(ρ_D‖σ) has a similar expansion, since positivity and monotonicity imply that 0 ≤ S(ρ_D‖σ) ≤ S(ρ‖σ).
Hence, in the short interval limit, the acceptance condition becomes This is a condition on the one-point functions of the lightest primary O which couples to Φ, inserted at the interval midpoint. The measurement that implements the likelihood ratio test is then the projection on the eigenstates of σ ⊗n satisfying this condition:

Discussion
In this paper we have reviewed some aspects of quantum hypothesis testing and studied a few applications in quantum many-body systems and two-dimensional conformal field theories. We have mostly focused on asymmetric testing, with a few comments about the symmetric counterpart.
We believe that we have only scratched the surface of this subject and would like to conclude by mentioning some possible avenues for future investigation. We have seen that the error estimates of different types of hypothesis testing involve different interesting quantum information theoretic quantities. One is therefore led to wonder which notions of distance on the space of states can arise in error estimates of different types of quantum hypothesis testing, and whether there is a more direct connection between properties of the distance measure and features of the type of test.
We have also observed that the (non-unique) optimal measurement which saturates the error bound in the large n limit tends to be rather difficult to implement in practice. For the case of asymmetric testing, the measurement we studied requires knowledge of the spectra of eigenstates of the modular Hamiltonians associated to subsystems, which is in general difficult if not impossible to obtain. An important question is therefore whether there are simpler testing protocols that one can develop which still do reasonably well in the large n limit. In this paper we have considered the likelihood ratio test as a possible alternative, but it would be interesting to explore this question in more detail. From a practical point of view, one ultimately would like to find the simplest possible protocol whose asymptotic error does not deviate too much from the optimal one.
An important assumption of quantum hypothesis testing is the ability to perform simultaneous (collective) measurements on n copies of the system, for arbitrarily large n. Clearly, this assumption is not realistic, and the finite n, or finite blocklength, case has been considered in [16,17]. One could imagine applying finite n measurements in cases where one has an evenly spaced collection of subsystems in a translation invariant state, where the distance between the subsystems is large enough for the subsystems to be approximately uncorrelated. But the most realistic situation is arguably to make a repeated series of single-shot measurements, i.e. one prepares the system in a particular state, makes a measurement, and then repeats this procedure n times. It is not necessarily true that the best strategy in this case is to repeat the optimal n = 1 measurement n times; it is conceivable that a series of different measurement protocols yields a better outcome. Such adaptive measurement strategies in symmetric testing are known to attain the optimal error probability of collective strategies [73], and we leave the asymmetric case to future work. There are various closely related questions which deserve further study, such as distinguishing more than two states through POVMs [74], and contrasting these results with continuous parameter measurements and ideas from quantum metrology.
One important motivation for this work came from quantum gravity and holography. For example, in [75] a relationship was found between distinguishability measures and bulk reconstruction in entanglement wedges. One could imagine that the quantum hypothesis testing protocol whose errors are bounded by these measures plays an operational role in the actual reconstruction process and it would be interesting to explore this in more detail. Many other questions in quantum gravity center around the issue of whether or not different states can be distinguished by low energy observers, and if so, whether the necessary measurements are very complex or not. Translated into the language of quantum hypothesis testing, one would like to bound the error associated to restricted measurements (e.g. the measurements can only be made by low energy observers). In particular, can one bound the errors in hypothesis testing as a function of the maximal complexity of the measurements? This question involves the need to first develop rigorous definitions of complexity of a measurement. We briefly touched upon this in section 5.2 by considering the minimum dimension of the acceptance space as one resource associated with a measurement. More sophisticated definitions would take into account additional steps involved in the construction of the POVM, and the time and space associated with the algorithms or circuits executing the measurement. We hope to return to some of these questions in future work.
In the main text, we focused on the optimal measurement for asymmetric testing. In this appendix, we will discuss the optimal measurement for symmetric testing, where we try to distinguish between ρ⊗n and σ⊗n by minimizing a combined error built from β_n = Tr(σ⊗n A) and α_n = 1 − Tr(ρ⊗n A). In section 2.1, we considered the case κ = 1/2, but the same result holds for any κ with 0 < κ < 1. Asymptotically, the optimal error is given in terms of the Chernoff distance. The optimal measurement was obtained in [43] and is the projection on the positive part of the operator L. This involves diagonalizing the operator L and projecting onto the subspace corresponding to positive eigenvalues. In general, it is difficult to describe this measurement explicitly. We consider simplified cases below.

A.1 Classical testing
We use the same notation as in section 4. We take {|E⟩} to be the eigenstates of σ; for n copies of the system, the eigenstates of σ⊗n are the corresponding product states. As in section 4.3, we can define the best classical measurement by an acceptance condition on |E| ≡ (1/n) Σ_{i=1}^n E_i. The measurement is the projector onto the subspace spanned by the states |E⟩ satisfying this condition. This is again a likelihood-ratio test, but with a different threshold value.
When ρ and σ commute, the acceptance condition (A.5) is precisely the positivity condition for the operator L, so this is actually the optimal measurement. When ρ and σ don't commute, we can define the diagonal part of ρ, ρ_D ≡ Σ_E ⟨E|ρ|E⟩ |E⟩⟨E| , (A.6) and the above measurement optimally distinguishes between ρ_D and σ but doesn't make use of the off-diagonal components of ρ. This gives an error governed by the Chernoff distance of ρ_D and σ, and the data-processing inequality for the Chernoff distance implies that this error is at least as large as the optimal one, so the measurement is suboptimal, as expected. In conclusion, as in asymmetric hypothesis testing, the likelihood-ratio test (with a different threshold value) provides a simple measurement for symmetric testing, namely the optimal classical measurement.
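For commuting (classical) states, the Chernoff distance can be evaluated directly as a one-parameter minimization. A minimal numerical sketch (the distributions p, q are our own illustrative choice):

```python
import numpy as np

def chernoff_distance(p, q, num_s=2001):
    """Chernoff distance -log min_{0<=s<=1} sum_x p(x)^s q(x)^{1-s}
    for two commuting (classical) states, minimized on a grid in s."""
    s = np.linspace(0.0, 1.0, num_s)[:, None]
    vals = np.sum(p[None, :] ** s * q[None, :] ** (1.0 - s), axis=1)
    return -np.log(vals.min())

# Illustrative distributions (our choice).
p = np.array([0.9, 0.1])
q = np.array([0.6, 0.4])

print(chernoff_distance(p, p) < 1e-12)   # True: identical states give zero distance
print(chernoff_distance(p, q) > 0)       # True
```

Note that the Chernoff distance is symmetric in its arguments (substituting s → 1 − s exchanges p and q), in contrast with the relative entropy that controls asymmetric testing.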

A.2 Perturbative testing
We now consider the perturbative setting, where we have
$$\rho = \sigma + \epsilon\,\rho^{(1)}\,, \qquad \rho^{(1)}_i \equiv \sigma \otimes \cdots \otimes \rho^{(1)} \otimes \cdots \otimes \sigma\,,$$
where $\rho^{(1)}$ is in the $i$-th position and there are $n$ tensor factors. Perturbatively, we have
$$L = (2\kappa - 1)\,\sigma^{\otimes n} + \kappa\,\epsilon \sum_{i=1}^n \rho^{(1)}_i + O(\epsilon^2)\,.$$
We see that perturbative testing is non-trivial only for $\kappa = \tfrac{1}{2}$. For $\kappa > \tfrac{1}{2}$, $L$ is positive, so that the measurement is the identity, while for $\kappa < \tfrac{1}{2}$, $L$ is negative, so the measurement is zero. Focusing on the case $\kappa = \tfrac{1}{2}$, the measurement is a projection on the positive part of
$$\sum_{i=1}^n \rho^{(1)}_i\,.$$
In the case where $\rho^{(1)}$ and $\sigma$ commute, this reduces to the classical measurement described in the previous section.
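As a quick sanity check of the leading-order statement for $\kappa = \tfrac12$ and $n = 2$ qubits (a sketch with arbitrarily chosen $\sigma$ and traceless perturbation $\rho^{(1)}$; the names are ours), the projector onto the positive part of $\rho^{\otimes 2} - \sigma^{\otimes 2}$ should approach the projector onto the positive part of $\rho^{(1)}\otimes\sigma + \sigma\otimes\rho^{(1)}$ as the perturbation becomes small:

```python
import numpy as np

def pos_projector(H):
    # Projector onto the positive-eigenvalue subspace of a Hermitian matrix.
    w, v = np.linalg.eigh(H)
    keep = v[:, w > 0]
    return keep @ keep.conj().T

# sigma: a full-rank qubit state; rho1: a traceless Hermitian perturbation.
sigma = np.diag([0.7, 0.3])
rho1 = np.array([[0.3, 0.2], [0.2, -0.3]])

eps = 1e-4
rho = sigma + eps * rho1

# Exact operator for kappa = 1/2 and n = 2 copies (overall factor of 1/2 dropped).
L_exact = np.kron(rho, rho) - np.kron(sigma, sigma)
# Leading-order operator: sum over positions of the single-copy perturbation.
L_pert = np.kron(rho1, sigma) + np.kron(sigma, rho1)

P_exact = pos_projector(L_exact)
P_pert = pos_projector(L_pert)
assert np.linalg.norm(P_exact - P_pert) < 1e-2
```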

B General properties of the relative entropy variance
The relative entropy variance is a less familiar concept than the relative entropy, and we survey here some of its properties. Introducing the modular Hamiltonians $K_\rho = -\log\rho$ and $K_\sigma = -\log\sigma$ of $\rho$ and $\sigma$, we consider the so-called relative modular Hamiltonian
$$\Delta K \equiv K_\sigma - K_\rho = \log\rho - \log\sigma\,. \qquad \text{(B.2)}$$
Then, the relative entropy and the relative entropy variance are its first and second cumulants, i.e. the expectation value and the variance, in the state $\rho$:
$$S(\rho\|\sigma) = \mathrm{Tr}\big(\rho\,\Delta K\big)\,, \qquad V(\rho\|\sigma) = \mathrm{Tr}\big(\rho\,\Delta K^2\big) - \big[\mathrm{Tr}\big(\rho\,\Delta K\big)\big]^2\,.$$
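In finite dimensions, these two cumulants are straightforward to evaluate numerically. A minimal sketch (our own names, not from the paper):

```python
import numpy as np

def logm_psd(rho):
    # Matrix logarithm of a full-rank density matrix via eigendecomposition.
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy_and_variance(rho, sigma):
    # Delta K = log rho - log sigma; S and V are its first two cumulants in rho.
    dK = logm_psd(rho) - logm_psd(sigma)
    S = np.trace(rho @ dK).real
    V = np.trace(rho @ dK @ dK).real - S**2
    return S, V

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); rho = A @ A.T; rho /= np.trace(rho)
B = rng.standard_normal((4, 4)); sigma = B @ B.T; sigma /= np.trace(sigma)

S, V = rel_entropy_and_variance(rho, sigma)
assert S >= 0 and V >= 0                      # Klein inequality; V is a variance
assert np.allclose(rel_entropy_and_variance(rho, rho), (0.0, 0.0))
```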

B.1 Relations to other quantities
We give here the relations between the relative entropy variance $V(\rho\|\sigma)$ and other information quantities.
Rényi relative entropies. In the literature, there are different generalizations of the relative entropy. Petz [44] defines the Rényi relative entropies as
$$D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log \mathrm{Tr}\big(\rho^\alpha \sigma^{1-\alpha}\big)\,,$$
with $D_1(\rho\|\sigma) = S(\rho\|\sigma)$. On the other hand, the sandwiched Rényi entropy or quantum Rényi divergence is defined in [76,77] as
$$\widetilde D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log \mathrm{Tr}\Big[\big(\sigma^{\frac{1-\alpha}{2\alpha}}\,\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\Big]\,.$$
The relative entropy variance can be obtained from both versions of the Rényi relative entropy [21,78],
$$V(\rho\|\sigma) = 2\,\partial_\alpha D_\alpha(\rho\|\sigma)\big|_{\alpha=1} = 2\,\partial_\alpha \widetilde D_\alpha(\rho\|\sigma)\big|_{\alpha=1}\,.$$
It is shown in [21] that the sandwiched Rényi entropy is the minimal quantity that satisfies the axioms expected from a relative Rényi entropy. In particular, we always have $\widetilde D_\alpha(\rho\|\sigma) \le D_\alpha(\rho\|\sigma)$.
Refined Rényi relative entropies. In [11], a refined version of the Rényi relative entropies was defined from the sandwiched Rényi entropy $\widetilde D_\alpha(\rho\|\sigma)$. In AdS/CFT, this quantity was shown to have a holographic dual when $\sigma$ is the vacuum state reduced to a spherical subregion. It is analogous to the refined Rényi entropies defined in [79]. The relative entropy variance is obtained from it as in (B.10).
Higher cumulants. It is also possible to give an interpretation to the higher $\alpha$-derivatives of the Petz relative Rényi entropy $D_\alpha(\rho\|\sigma)$ at $\alpha=1$. This is best done in the algebraic formulation given in section B.4. They correspond to cumulants of the operator $-\log \Delta_{\Psi|\Phi}$, which are not equivalent to cumulants of $\Delta K$. Their first and second cumulants are the same and give the relative entropy and its variance, but the higher cumulants differ. Following [21], the higher $\alpha$-derivatives of $D_\alpha(\rho\|\sigma)$ can also be interpreted as classical cumulants of the log-likelihood of the Nussbaum–Szkoła probability distributions associated to $\rho$ and $\sigma$. Note that the higher $\alpha$-derivatives of $\widetilde D_\alpha(\rho\|\sigma)$ differ from those of $D_\alpha(\rho\|\sigma)$, because they are different functions of $\alpha$.
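The derivative relation $V = 2\,\partial_\alpha D_\alpha|_{\alpha=1}$ can be checked by finite differences; the sketch below (an illustration with our own names) uses the Petz version on random full-rank states:

```python
import numpy as np

def mpow(rho, s):
    # Fractional matrix power via eigendecomposition (full-rank input assumed).
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 1e-12, None)
    return (v * w**s) @ v.conj().T

def logm(rho):
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def petz_renyi(rho, sigma, alpha):
    # Petz relative Renyi entropy D_alpha = log Tr(rho^alpha sigma^(1-alpha)) / (alpha - 1).
    return np.log(np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real) / (alpha - 1)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); rho = A @ A.T; rho /= np.trace(rho)
B = rng.standard_normal((3, 3)); sigma = B @ B.T; sigma /= np.trace(sigma)

# Relative entropy variance from the cumulants of Delta K = log rho - log sigma.
dK = logm(rho) - logm(sigma)
S = np.trace(rho @ dK).real
V = np.trace(rho @ dK @ dK).real - S**2

# Central finite difference of D_alpha around alpha = 1.
h = 1e-4
slope = (petz_renyi(rho, sigma, 1 + h) - petz_renyi(rho, sigma, 1 - h)) / (2 * h)
assert abs(2 * slope - V) < 1e-4
```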
Capacity of entanglement. For density matrices in a finite-dimensional Hilbert space with $\dim\mathcal{H} = N$, it is simple to derive a relationship between the Rényi entropy and its relative generalization. Let $\sigma_{\max}$ be the density matrix with uniform spectrum, i.e. proportional to the unit matrix,
$$\sigma_{\max} = \frac{1}{N}\,\mathbf{1}\,.$$
Then the Rényi relative entropy between an arbitrary state $\rho$ and $\sigma_{\max}$ reduces to
$$D_\alpha(\rho\|\sigma_{\max}) = \log N - S_\alpha(\rho)\,,$$
where $S_\alpha(\rho) = \frac{1}{1-\alpha}\log\mathrm{Tr}\,\rho^\alpha$ is the Rényi entropy. Similarly, the relative entropy reduces to the von Neumann entropy,
$$S(\rho\|\sigma_{\max}) = \log N - S(\rho)\,, \qquad \text{(B.14)}$$
and the relative entropy variance reduces to the variance of the entropy, also known as the capacity of entanglement (see [50] and references therein),
$$V(\rho\|\sigma_{\max}) = \mathrm{Tr}\big[\rho\,(\log\rho)^2\big] - \big(\mathrm{Tr}\,\rho\log\rho\big)^2 \equiv C(\rho)\,.$$
The capacity of entanglement vanishes for a pure state $\rho_\psi = |\psi\rangle\langle\psi|$ and for the maximally mixed state $\sigma_{\max}$. It follows that the relative entropy variance vanishes between a pure state and the maximally mixed state,
$$V(\rho_\psi\|\sigma_{\max}) = 0\,.$$
We next give necessary and sufficient conditions for the vanishing of the relative entropy variance.
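Since $\sigma_{\max}$ commutes with everything, $\Delta K = \log\rho + \log N$ and the constant drops out of the variance, so $V(\rho\|\sigma_{\max})$ is just the variance of $\log\rho$. A small numerical sketch (names are ours):

```python
import numpy as np

def capacity_of_entanglement(rho):
    # V(rho || sigma_max): the constant log N in Delta K = log rho + log N drops
    # out of the variance, leaving C(rho) = Tr[rho (log rho)^2] - (Tr rho log rho)^2.
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-14]              # zero eigenvalues do not contribute
    mean = np.sum(w * np.log(w))
    return np.sum(w * np.log(w)**2) - mean**2

N = 4
# The maximally mixed state has a flat entanglement spectrum: zero capacity.
assert np.isclose(capacity_of_entanglement(np.eye(N) / N), 0.0)

# A pure state also has zero capacity.
psi = np.zeros(N); psi[0] = 1.0
assert np.isclose(capacity_of_entanglement(np.outer(psi, psi)), 0.0)

# A mixed state with non-uniform spectrum has positive capacity.
rho = np.diag([0.5, 0.3, 0.15, 0.05])
assert capacity_of_entanglement(rho) > 0
```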

B.2 Vanishing of the variance
The relative entropy variance $V(\rho\|\sigma)$ is nonnegative. In this section, we consider the conditions for it to vanish, for a finite-dimensional Hilbert space. When $\rho$ is full-rank, the variance vanishes if and only if $\rho = \sigma$. More generally, the variance vanishes if and only if $\rho$ and $\sigma$ are proportional on the complement of $\ker\rho$, the subspace on which $\rho$ vanishes. This is explained in [37] and follows from the saturation case of the Cauchy–Schwarz inequality. In particular, the relative entropy variance $V(\rho\|\sigma)$ vanishes when $\rho = |\psi\rangle\langle\psi|$ is a pure state and $\sigma$ has no matrix element between $|\psi\rangle$ and any other state. For example, the relative entropy variance vanishes between the vacuum (the ground state) and any thermal state.
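The pure-state criterion is easy to test numerically: for pure $\rho = |\psi\rangle\langle\psi|$, the $\log\rho$ terms drop out of the cumulants (since $\rho\log\rho = 0$ on the support), so $V$ reduces to the variance of $\log\sigma$ in $|\psi\rangle$, which vanishes exactly when $\log\sigma$ does not connect $|\psi\rangle$ to other states. A sketch (our names and example states):

```python
import numpy as np

def rel_entropy_variance_pure(psi, sigma):
    # For pure rho = |psi><psi|, the rho log rho terms vanish, leaving
    # V = <psi|(log sigma)^2|psi> - <psi|log sigma|psi>^2.
    w, v = np.linalg.eigh(sigma)
    log_sigma = (v * np.log(w)) @ v.conj().T
    m1 = (psi.conj() @ log_sigma @ psi).real
    m2 = (psi.conj() @ log_sigma @ log_sigma @ psi).real
    return m2 - m1**2

psi = np.array([1.0, 0.0])

# sigma diagonal: no matrix element between |psi> and other states -> V = 0.
sigma_diag = np.diag([0.6, 0.4])
assert np.isclose(rel_entropy_variance_pure(psi, sigma_diag), 0.0)

# sigma with an off-diagonal element connecting |psi> to |1>: V > 0.
sigma_off = np.array([[0.6, 0.2], [0.2, 0.4]])
assert rel_entropy_variance_pure(psi, sigma_off) > 1e-6
```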

B.3 Violation of data processing inequality
The hypothesis testing relative entropy and the relative entropy are generalized divergences $\mathbf{D}(\rho\|\sigma)$, satisfying the data-processing inequality
$$\mathbf{D}\big(\mathcal{N}(\rho)\,\big\|\,\mathcal{N}(\sigma)\big) \le \mathbf{D}(\rho\|\sigma)\,,$$
where $\mathcal{N}$ is a quantum channel. The refinement of quantum Stein's lemma (2.14) gives an asymptotic expansion for the hypothesis testing relative entropy (2.17), involving the relative entropy and the relative entropy variance, so it is interesting to note that the variance alone does not satisfy the data-processing inequality. Given a quantum channel $\mathcal{N}$, there is no general inequality between $V(\rho\|\sigma)$ and $V(\mathcal{N}(\rho)\|\mathcal{N}(\sigma))$. This can be seen in a simple two-qubit system with pure density matrices $\rho$ and $\sigma$. As the quantum channel, consider the partial trace over the second qubit, producing the reduced density matrices $\rho_A$ and $\sigma_A$. For the relative entropy, we obtain values in agreement with monotonicity, which says that $S(\rho_A\|\sigma_A) \le S(\rho\|\sigma)$. For the relative entropy variance, in contrast, we find $V(\rho_A\|\sigma_A) > V(\rho\|\sigma)$, which shows that the variance is not monotonous.

B.4 Algebraic formulation
We can also define the relative entropy variance for an infinite-dimensional Hilbert space, in the context of algebraic quantum field theory (we refer to [80] for a review). This allows a rigorous definition of this quantity in the case of conformal field theory. Araki defined the relative entropy between two states $\Psi$ and $\Phi$ as
$$S_{\Psi|\Phi} = -\langle\Psi|\log\Delta_{\Psi|\Phi}|\Psi\rangle\,, \qquad \text{(B.23)}$$
in terms of the relative modular operator $\Delta_{\Psi|\Phi}$, defined with respect to a subsystem for which $\Psi$ is cyclic and separating. In the finite-dimensional case, $\rho$ and $\sigma$ are the reduced states of $\Psi$ and $\Phi$ in that subsystem. We recover the usual definition of relative entropy, as can be seen from the formula
$$\langle\Psi|\,\Delta_{\Psi|\Phi}^{1-\alpha}\,|\Psi\rangle = \mathrm{Tr}\,\rho^\alpha\sigma^{1-\alpha}\,. \qquad \text{(B.24)}$$
This also allows us to write the Petz relative Rényi entropy as
$$D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\langle\Psi|\,\Delta_{\Psi|\Phi}^{1-\alpha}\,|\Psi\rangle\,,$$
which realizes it as a well-defined, UV-finite quantity in quantum field theory. In particular, taking two derivatives gives us an algebraic definition of the relative entropy variance,
$$V(\rho\|\sigma) = \langle\Psi|\big(\log\Delta_{\Psi|\Phi}\big)^2|\Psi\rangle - \langle\Psi|\log\Delta_{\Psi|\Phi}|\Psi\rangle^2\,,$$
which shows that the relative entropy variance is well-defined in quantum field theory. This formulation also gives an interpretation for the higher $\alpha$-derivatives of the Petz relative Rényi entropy at $\alpha=1$: the Petz relative Rényi entropy is the cumulant generating function of the operator
$$K_{\Psi|\Phi} \equiv -\log\Delta_{\Psi|\Phi}\,.$$
Note that this operator is not equivalent to the operator $\Delta K$ defined in (B.2). In particular, the Petz relative Rényi entropy does not generate the cumulants of $\Delta K$. It is, however, true that the first and second cumulants of $K_{\Psi|\Phi}$ and $\Delta K$ agree; they give the relative entropy and its variance. An algebraic version of the sandwiched relative Rényi entropy has been investigated in [81].

C Optimal measurement of a qubit
We discuss here the optimal measurement in the case of a qubit and give the derivations of the formulas of section 5.2. We focus on the case $\theta = \pi/4$, which appears to be the simplest case in which $\rho$ and $\sigma$ do not commute and we want to describe the optimal measurement. It is useful to write the states in a convenient basis; the optimal threshold value for $\epsilon = \tfrac{1}{2}$ then gives the acceptance condition. We recall that $|E\rangle$ and $|\tilde{E}\rangle$ are labeled by binary strings, where we used the fact that $|\tilde{0}\rangle = |-\rangle$ and $|\tilde{1}\rangle = |+\rangle$ for $\theta = \pi/4$. It is useful to introduce the notation $n_{s\tilde{s}}(E,\tilde{E})$, with $s\in\{0,1\}$ and $\tilde{s}\in\{-,+\}$, counting the number of pairs $(a_i,\tilde{a}_i)$ which are equal to $(s,\tilde{s})$.
Let us now compute the overlap of two states $|\xi(\tilde{E}_1)\rangle$ and $|\xi(\tilde{E}_2)\rangle$. We denote by $n_{\tilde{s}_1\tilde{s}_2}$ the number of overlapping pairs $(\tilde{s}_1,\tilde{s}_2)$ in $(\tilde{E}_1,\tilde{E}_2)$, and by $n_{s\tilde{s}_1\tilde{s}_2}$ the number of overlapping triples $(s,\tilde{s}_1,\tilde{s}_2)$ in $(E,\tilde{E}_1,\tilde{E}_2)$. We have the relations
$$n_{0\tilde{s}_1\tilde{s}_2} + n_{1\tilde{s}_1\tilde{s}_2} = n_{\tilde{s}_1\tilde{s}_2}\,, \qquad \text{(C.7)}$$
and
$$n(E) = n - \left(n_{0--} + n_{0-+} + n_{0+-} + n_{0++}\right)\,. \qquad \text{(C.8)}$$
Hence, we can rewrite the sum over $E$ in the acceptance condition as a sum over the four integers $n_{0\pm\pm}$, with a combinatorial factor counting the number of basis states $|E\rangle$ for a given choice of $n_{0\pm\pm}$. It is then convenient to define the coefficients $P_{n,k}$ as a normalized sum over $n_{0--},n_{0-+},n_{0+-},n_{0++}$ at fixed $k = n_{0--}+n_{0-+}+n_{0+-}+n_{0++}$. It can be noted that the $P_{n,k}$ are coefficients of a polynomial, as follows from expanding each factor using the binomial theorem. Note also that the overlap can be rewritten in terms of $\tilde{E}_1+\tilde{E}_2$, where $\tilde{E}_1+\tilde{E}_2$ denotes the boolean sum; this follows from the fact that $n(\tilde{E}_1+\tilde{E}_2) = n_{++} + n_{--}$. This second expression gives an alternative representation of the coefficients $P_{n,k}$. Let us introduce the binary Krawtchouk polynomials $K_k(X;n)$, which can be defined via the generating relation
$$(1-z)^X (1+z)^{n-X} = \sum_{k=0}^{n} K_k(X;n)\, z^k\,.$$
These are discrete orthogonal polynomials related to the binomial distribution, with many applications [82,83].
From the definition of $P_{n,k}$ in (C.15), we see that these coefficients can be expressed through Krawtchouk polynomials. As a result, the overlap itself can be expressed in terms of Krawtchouk polynomials. This relation might be useful, since many combinatorial identities involving Krawtchouk polynomials are known [84,85].
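For concreteness, here is a small sketch of the binary Krawtchouk polynomials (assuming the standard conventions, with the generating relation quoted above and the explicit sum $K_k(X;n)=\sum_j(-1)^j\binom{X}{j}\binom{n-X}{k-j}$), checking the two definitions against each other and verifying the orthogonality relation with respect to the binomial weight:

```python
import numpy as np
from math import comb

def krawtchouk(k, X, n):
    # Explicit sum: K_k(X; n) = sum_j (-1)^j C(X, j) C(n - X, k - j).
    return sum((-1)**j * comb(X, j) * comb(n - X, k - j) for j in range(k + 1))

def krawtchouk_from_generating(k, X, n):
    # Same polynomial as the z^k coefficient of (1 - z)^X (1 + z)^(n - X).
    p = np.poly1d([1.0])
    for _ in range(X):
        p = np.polymul(p, np.poly1d([-1.0, 1.0]))  # factor (1 - z); highest power first
    for _ in range(n - X):
        p = np.polymul(p, np.poly1d([1.0, 1.0]))   # factor (1 + z)
    return p.coeffs[::-1][k]                        # ascending-order coefficient k

n = 6
for X in range(n + 1):
    for k in range(n + 1):
        assert np.isclose(krawtchouk(k, X, n), krawtchouk_from_generating(k, X, n))

# Orthogonality with respect to the binomial weight:
# sum_x C(n, x) K_i(x; n) K_j(x; n) = 2^n C(n, i) delta_ij.
for i in range(n + 1):
    for j in range(n + 1):
        s = sum(comb(n, x) * krawtchouk(i, x, n) * krawtchouk(j, x, n)
                for x in range(n + 1))
        assert s == (2**n * comb(n, i) if i == j else 0)
```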
Relation to the Terwilliger algebra. The Hamming cube $H_n = \{0,1\}^n$ is the set of binary strings of length $n$, with the Hamming distance as the metric. The Terwilliger algebra of the Hamming cube [62,64] is an algebraic structure which is useful in combinatorics and coding theory (see [63] and references therein). We proceed as in [63] and identify a binary string $a_1 a_2\cdots a_n$ with its support, the subset $X$ of labels $i$ for which the bit $a_i$ takes the value 1. There are $2^n$ possible such subsets; in other words, every $X$ is an element of the power set $P(H_n)$ of the Hamming cube. We then define $P(H_n)\times P(H_n)$ matrices $M^t_{ij}$, with entries labeled by the sizes $|X|$, $|Y|$ and $|X\cap Y|$, where we use $|X|$ to denote the number of elements in $X$ (the number of 1s, i.e. the Hamming weight of the binary string). The Terwilliger algebra is defined as the set of matrices of the form
$$\sum_{i,j,t=0}^{n} x^t_{ij}\, M^t_{ij}\,, \qquad x^t_{ij}\in\mathbb{C}\,, \qquad \text{(C.21)}$$
which is closed under matrix multiplication. To the state $|\xi(\tilde{E})\rangle$, we can associate the element $X\in P(H_n)$ by writing $\tilde{E}$ as a binary string and identifying it with its support $X$. Then we have $|X| = n(\tilde{E})$. The Gram matrix of the set of vectors $\{|\xi(\tilde{E})\rangle\}$ can be represented by a $P(H_n)\times P(H_n)$ matrix $G$, whose element $G_{X_1 X_2}$ is the overlap computed above, where $X_1$ and $X_2$ are the elements of $P(H_n)$ associated to $\tilde{E}_1$ and $\tilde{E}_2$. Denoting $i = |X_1|$, $j = |X_2|$ and $t = |X_1\cap X_2|$, we have
$$n^*(\tilde{E}_1,\tilde{E}_2) = \max(i,j)\,, \qquad n(\tilde{E}_1+\tilde{E}_2) = i + j - 2t\,, \qquad \text{(C.24)}$$
so that the Gram matrix element is
$$G_{X_1 X_2} = \frac{1}{2^n}\sum_{k=n-\max(i,j)}^{n} (-1)^k\, K_k(i+j-2t;\,n)\,.$$
From this observation, one could attempt to use the techniques of [63] to diagonalize the matrix $G$ and construct the optimal measurement.
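The closure of the Terwilliger algebra under multiplication can be illustrated numerically for small $n$. The sketch below assumes the standard entry condition $(M^t_{ij})_{XY} = 1$ iff $|X|=i$, $|Y|=j$, $|X\cap Y|=t$ (our reading of the elided formula, following the coding-theory conventions of [63]); closure then follows because products have entries depending only on the triple $(|X|,|Y|,|X\cap Y|)$:

```python
import numpy as np
from itertools import combinations

n = 3
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
idx = {X: a for a, X in enumerate(subsets)}
N = len(subsets)  # 2^n

def basis_matrix(i, j, t):
    # (M^t_{ij})_{X,Y} = 1 when |X| = i, |Y| = j, |X ∩ Y| = t.
    out = np.zeros((N, N))
    for X in subsets:
        for Y in subsets:
            if len(X) == i and len(Y) == j and len(X & Y) == t:
                out[idx[X], idx[Y]] = 1.0
    return out

basis = [basis_matrix(i, j, t)
         for i in range(n + 1) for j in range(n + 1) for t in range(n + 1)]
basis = [B for B in basis if B.any()]

# Every pair (X, Y) realizes exactly one triple (|X|, |Y|, |X ∩ Y|).
assert np.allclose(sum(basis), np.ones((N, N)))

# Closure: entries of a product depend only on the triple, so the product
# lies again in the span of the M^t_{ij}.
for Ma in basis:
    for Mb in basis:
        P = Ma @ Mb
        values = {}
        for X in subsets:
            for Y in subsets:
                key = (len(X), len(Y), len(X & Y))
                assert np.isclose(values.setdefault(key, P[idx[X], idx[Y]]),
                                  P[idx[X], idx[Y]])
```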

D Overlaps in fermion chains
The purpose of this appendix is to review the tools used in the computation of overlaps in section 6.2.1. We review Bogoliubov transformations, the generalized Wick's theorem, and the computation of correlators that contain insertions of Bogoliubov transformations. We then show how these results lead to the overlaps presented in the main text.

D.2 Generalized Wick's theorem as a limit of generalized Gaudin's theorem
Let $\sigma$ be a density operator that satisfies (D.9) for some matrix $M$. Operators of the exponential type (such as reduced density matrices of subregions of spinless fermion chains),
$$\sigma = \frac{1}{Z}\exp\Big(\frac{1}{2}\,\alpha\, S\,\alpha\Big)\,, \qquad Z = \mathrm{Tr}\,\sigma\,, \qquad \text{(D.10)}$$
belong to this family, with $M$ given by [38,39]
$$M = e^{-\Omega S_A}\,, \qquad \text{(D.11)}$$
where $S_A$ is the antisymmetric part of $S$. However, not all $\sigma$ that satisfy (D.9) can be written as exponentials (D.10). Let $\mathcal{T}$ be the operator that implements a real Bogoliubov transformation $T$ on the Hilbert space:
$$\mathcal{T}\,\alpha\,\mathcal{T}^{-1} = T\alpha\,. \qquad \text{(D.12)}$$
Since $T$ is real, this equation implies that $\mathcal{T}^{-1} = \mathcal{T}^\dagger$, i.e. that $\mathcal{T}$ is unitary. In addition, we do not assume that $\mathcal{T}$ can be written as an exponential of one-body operators. The generalized Gaudin's theorem states that [39]
$$\frac{\langle\alpha_{\mu_1}\cdots\alpha_{\mu_n}\,\mathcal{T}\,\alpha_{\nu_1}\cdots\alpha_{\nu_n}\rangle_\sigma}{\langle\mathcal{T}\rangle_\sigma} = \sum_{\text{pairings}} (-1)^P \prod_{\text{pairs}} (\text{contraction of a pair})\,. \qquad \text{(D.13)}$$
There are three different types of contractions that can appear on the right-hand side, categorized based on the location of the pairs. Equation (D.13) generalizes Gaudin's theorem [67] by including insertions of $\mathcal{T}$ in the expectation value. The generalized Wick's theorem is analogous to equation (D.13), but with the expectation values taken in the quasi-particle vacuum state $|E_{\rm vac}\rangle$, which is a pure state. It is obtained as a limit of (D.13) by sending $\sigma$ to $|E_{\rm vac}\rangle\langle E_{\rm vac}|$. For this, we take $\sigma$ to be of the exponential type (D.10), with a quadratic form parametrized by single-particle energies $s_i$ (this would correspond to a free fermion Hamiltonian). The generalized Wick's theorem is then
$$\frac{\langle E_{\rm vac}|\alpha_{\mu_1}\cdots\alpha_{\mu_n}\,\mathcal{T}\,\alpha_{\nu_1}\cdots\alpha_{\nu_n}|E_{\rm vac}\rangle}{\langle E_{\rm vac}|\mathcal{T}|E_{\rm vac}\rangle} = \sum_{\text{pairings}} (-1)^P \prod_{\text{pairs}} (\text{contraction of a pair})\,, \qquad \text{(D.20)}$$
and the three types of contractions appearing on the right-hand side are the limits $\lim_{\{s_i\}\to\infty} G^{(1,2,3)}_{\mu\nu}$. We will next compute the contractions.

D.3 Computation of contractions
We start with the simple two-point function $\langle\alpha_\mu\alpha_\nu\rangle_\sigma = \mathrm{Tr}(\sigma\,\alpha_\mu\alpha_\nu)$ in a mixed state $\sigma$ that obeys the relation (D.9). Using the canonical anticommutation relations and (D.9), we can write
$$\langle\alpha_\mu\alpha_\nu\rangle_\sigma = \Omega_{\mu\nu}\,\mathrm{Tr}\,\sigma - \langle\alpha_\nu\alpha_\mu\rangle_\sigma\,,$$
where we used the cyclicity of the trace. Solving this relation, we get (D.30). The quasi-particle vacuum expectation values are obtained by focusing on exponential $\sigma$ with $M = e^{-\Omega S}$ and taking the limit $\{s_i\}\to\infty$. We focus our attention on the two-point functions that appear in the computation of the overlaps. The other limits were not given in [39], but we can compute them using an analogous identity. The normalization factor is computed in [38,39], with the normalization given in (D.39). This leads to the formula (6.58) presented in the main text.

E Optimal measurement for the free boson
In this appendix, we consider the free boson CFT and attempt to describe the optimal subsystem measurement that distinguishes between two thermal states, using the setup of section 7.2.
Let $\phi(z)$ be a free boson and define $j(z) = \partial\phi(z)$. We have the modes
$$\alpha_n = \frac{1}{2\pi i}\oint_0 du\, u^n\, j(u) = \frac{1}{2\pi i}\int_{C_+} du\, u^n\, j(u) - \frac{1}{2\pi i}\int_{C_+} d\bar{u}\,\bar{u}^n\, j(\bar{u})\,.$$
Using the above formula, we can check that $[\alpha_n,\alpha_m] = n\,\delta_{m+n}$, as expected. We now consider the state $\rho$, with modes
$$\alpha_n = \frac{1}{2\pi i}\int_I dz\, i^n\, e^{in\pi f(z)/W}\, j(z) + \mathrm{h.c.}

(E.4)
To obtain the overlaps between the eigenstates of $\rho$ and those of $\sigma$, we need to compute the commutators $[\alpha_n,\tilde\alpha_{-m}]$. After some manipulations, we find
$$[\alpha_n,\tilde\alpha_m] = \frac{i^{n+m}\,n}{2W}\int_I dz\, f(z)\, e^{i\pi\left(n f(z)/W + m \tilde{f}(z)/\tilde{W}\right)} + \mathrm{h.c.} \equiv A_{nm}\,, \qquad \text{(E.5)}$$
which appears difficult to compute explicitly. A basis of normalized eigenstates for $K$ is labeled by $k = (k_1, k_2, \ldots)$, where the normalization is $N_k = \prod_{i\ge1} i^{k_i}\, k_i!$. Similarly, for $\tilde{K}$, we have $\tilde{k} = (\tilde{k}_1,\tilde{k}_2,\ldots)$. The overlap $\langle\Delta_k|\tilde\Delta_{\tilde{k}}\rangle$ is non-zero only if $N = \sum_i i\,k_i = \sum_i i\,\tilde{k}_i$. It is given in terms of $\mathrm{perm}\,M_{k\tilde{k}}$, where $M_{k\tilde{k}}$ is the $N\times N$ matrix constructed by starting with the matrix $A_{ij}$ and replacing each entry $(i,j)$ by a $k_i\times\tilde{k}_j$ block in which all the elements are equal to $A_{ij}$. Here, $\mathrm{perm}$ denotes the permanent, which is similar to the determinant, but with only plus signs in the sum over permutations.
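For small matrices, the permanent and the block construction are easy to implement directly; a minimal sketch follows (the naive $O(n!\cdot n)$ evaluation is only for illustration; for larger matrices one would use Ryser's formula, and `block_matrix` is a hypothetical helper of ours mirroring the construction described above):

```python
import numpy as np
from itertools import permutations

def perm(M):
    # Permanent: like the determinant, but without the alternating signs.
    m = M.shape[0]
    return sum(np.prod([M[i, p[i]] for i in range(m)])
               for p in permutations(range(m)))

def block_matrix(A, k, ktilde):
    # Hypothetical helper: replace entry A[i, j] by a k_i x ktilde_j block of
    # identical entries A[i, j] (rows/columns with k_i = 0 simply drop out).
    rows = [i for i, mult in enumerate(k) for _ in range(mult)]
    cols = [j for j, mult in enumerate(ktilde) for _ in range(mult)]
    return A[np.ix_(rows, cols)]

# Sanity checks on small matrices.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(perm(A), 1 * 4 + 2 * 3)            # = 10
assert np.isclose(perm(np.eye(3)), 1.0)
assert np.isclose(perm(np.ones((4, 4))), 24.0)       # all 4! permutations contribute 1

# Block construction: k = (2, 0), ktilde = (1, 1) duplicates the first row of A.
B = block_matrix(A, [2, 0], [1, 1])                  # [[1, 2], [1, 2]]
assert np.isclose(perm(B), 1 * 2 + 2 * 1)            # = 4
```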
We will now attempt to describe the optimal measurement for the free boson, with two global thermal states as described in section 7.2. To compute the overlaps, it is convenient to change variables to $w = f(z)$, with $F(w) = \tilde{f}(f^{-1}(w))$. Unfortunately, this quantity is hard to compute analytically. It can be probed in the small-$L$ expansion. As a result, we see that $|\Delta_k\rangle$ and $|\tilde\Delta_{\tilde{k}}\rangle$ can have a non-zero overlap at first order only if they differ in at most one place. We can write
$$k = k_0 + \delta_a\,, \qquad \tilde{k} = k_0 + \delta_b\,, \qquad a + b \text{ odd}\,, \quad a \neq b\,, \qquad \text{(E.12)}$$
where $\delta_i$ means a one in position $i$. We have $N_k N_{\tilde{k}} = N_{k_0}^2\, a\,k_a\, b\,\tilde{k}_b$, so we obtain (E.14) for $a + b$ odd. Following section 7.1.2, we can also define perturbatively the states $|\xi(\tilde\Delta_{\tilde{k}})\rangle$, which span the acceptance subspace $\mathcal{H}_Q$. Although it is possible to write explicit perturbative expressions, this is not enough: to understand this subspace and define the measurement, we would need to perform a Gram–Schmidt procedure to orthonormalize these vectors. To do this, we would have to go beyond the perturbation theory in $L$, and we do not expect to be able to obtain analytical results using this approach. In conclusion, the optimal measurement seems difficult to describe explicitly, even in simple examples. An alternative is to use the likelihood-ratio test following section 7.1.3, which is more tractable to implement here, because it requires only the knowledge of the overlaps.