Probabilistically Violating the First Law of Thermodynamics in a Quantum Heat Engine

Fluctuations of thermodynamic observables, such as heat and work, contain relevant information on the underlying physical process. These fluctuations are however not taken into account in the traditional laws of thermodynamics. While the second law is extended to fluctuating systems by the celebrated fluctuation theorems, the first law is generally believed to hold even in the presence of fluctuations. Here we show that in the presence of quantum fluctuations, also the first law of thermodynamics may break down. This happens because quantum mechanics imposes constraints on the knowledge of heat and work. To illustrate our results, we provide a detailed case-study of work and heat fluctuations in a quantum heat engine based on a circuit QED architecture. We find probabilistic violations of the first law and show that they are closely connected to quantum signatures related to negative quasi-probabilities. Our results imply that in the presence of quantum fluctuations, the first law of thermodynamics may not be applicable to individual experimental runs.


Introduction
The laws of thermodynamics are cornerstones of modern science, providing fundamental constraints on physically allowed processes [1]. The first law of thermodynamics states that a change in energy can be divided into heat and work which is a statement of the fundamental law of energy conservation (the signs being chosen such that heat consumed and work produced by a heat engine are positive). The second law of thermodynamics states that entropy never decreases [1,2] 〈∆S〉 ≥ 0.
In addition, the zeroth law of thermodynamics provides a definition for the concept of temperature, and the third law provides constraints on the behavior at zero temperature. The theory of thermodynamics as originally developed applies to macroscopic systems, where fluctuations around mean values are irrelevant. For small systems, where fluctuations can no longer be neglected, stochastic thermodynamics [3,4] provides an extension to the traditional theory. In the presence of fluctuations, entropy production becomes a stochastic variable described by a probability distribution P(∆S) [5]. Importantly, processes with negative entropy production may be observed, probabilistically violating the second law of thermodynamics. On the level of these probabilities, a generalized formulation of the second law is provided by fluctuation theorems [3,[6][7][8][9]. As a consequence, Eq. (2) still holds, but the left-hand side now denotes the average entropy change. Similar to entropy, in small systems also work and heat fluctuate. In classical systems, energy conservation enforces the first law for every process, that is, there is no probabilistic violation of the first law of thermodynamics.
In quantum systems the situation is less clear. Recently, a considerable effort went into including quantum effects, such as the superposition principle and entanglement, into the theory of thermodynamics [10]. Despite considerable progress, defining the basic thermodynamic observable work as a fluctuating quantity still presents a topic of debate [11]. At the heart of this debate stands the active role that the observer takes in quantum theory: while classical fluctuating systems can be described by well-defined trajectories through phase space, the superposition principle prevents such an observer-independent picture. Indeed, it has been shown that any definition of fluctuating work cannot simultaneously fulfill a number of desirable properties that are taken for granted in the classical regime [12]. While work fluctuations are in general affected by an observer, this is not necessarily the case for heat fluctuations. In particular, for weak coupling between system and reservoir, heat is mediated by energy quanta that are exchanged with the reservoir. The number of exchanged quanta is well-defined and independent of any observer. In this case, one may define heat as a fluctuating quantity in complete analogy to classical stochastic systems [13][14][15][16].
This illustrates that in a given system, heat and work may behave in a qualitatively different way [17]. Can the first law of thermodynamics then still be expected to hold beyond average values? In this work we show that in quantum systems, the first law of thermodynamics can be violated probabilistically. Such violations may occur when heat and work are independently accessed in a situation where quantum superposition prevents an observer-independent definition for these quantities. Importantly, probabilistic violations of the first law do not imply a violation of energy conservation. Rather, they reflect an uncertainty on our knowledge of energy changes, once they are split into heat and work.
To illustrate these effects, we provide a detailed case study of heat and work fluctuations in the heat engine proposed in Ref. [18], sketched in Fig. 1. There, work may be defined through the time-integral of power. We find probabilistic violations of the first law which are a consequence of the non-commutativity of the power operator with the Hamiltonian. The engine we consider is particularly suitable for our purposes because of multiple reasons: As a thermoelectric device, work fluctuations are directly linked to current fluctuations, a topic that is discussed extensively in the literature [19]. Furthermore, heat and work are carried by photons and electrons respectively, providing a natural separation of these quantities and their fluctuations. Finally, the same architecture was recently used to produce entangled photon beams [20], illustrating the experimental feasibility of the heat engine. We note that the variances of heat and work were investigated in a very similar architecture [21]. As we show below, the probabilistic violations of the first law are only manifested in higher cumulants.
The rest of this article is structured as follows. In Sec. 2, we consider a general setting, introduce definitions for heat and work fluctuations, and discuss the occurence of probabilistic violations of the first law. In the subsequent sections, we turn to a detailed case study to illustrate these violations. In Sec. 3, we introduce the heat engine that is the subject of our case study. Using a local master equation, the laws of thermodynamics follow from a consistent treatment as discussed in Sec. 4. Section 5 illustrates how the fluctuations of heat and work are computed. We present our quantitative results in Sec. 6. Conclusions and an outlook are provided in Sec. 7.

The Hamiltonian
To set the stage, we first consider a general scenario before providing a detailed case study below. To this end, we consider the Hamiltonian whereĤ(t) denotes the Hamiltonian of the system of interest,Ĥ B the Hamiltonian of the environment, andV the coupling between system and environment. We will be interested in the work produced by the system, the heat provided by the environment, as well as the change in internal energy during the time-interval [0, τ]. To define these quantities and their fluctuations, we consider a way of measuring them. Throughout this manuscript, we consider the weak system-bath coupling regime, where the contribution ofV to energy changes may be neglected.

Measuring internal energy changes
To measure the changes in internal energy, we consider the two-point measurement scheme [22]. In this scheme, the energy of the system is determined by a projective measurement both at t = 0 as well as t = τ. The internal energy change is then determined by where E(t) denotes the outcome of the projective energy measurement at time t. While often resulting in appealing results (e.g., validity of the Crooks theorem [23]), the two-point measurement has been criticized because the initial energy measurement destroys any initial coherence in the energy basis and is thus not compatible with processes that rely on such initial coherences. Below, we will be interested in a scenario where such initial coherences are unimportant.

Measuring heat
Measuring heat and its fluctuations is experimentally very challenging. A promising route towards measuring heat fluctuations is provided by detecting single energy quanta that are exchanged with the system, which can potentially be achieved by monitoring temperature [24][25][26]. Very recently, temperature fluctuations, which can be connected to the second cumulant of heat fluctuations, have been observed experimentally [27]. To circumvent the limitations and challenges that come with any specific experimental setup, we consider here again the two-point measurement scheme.
To determine the heat exchanged with the environment, its energy is determined by a projective measurement at the beginning and at the end of the time interval [0, τ] [with outcomes denoted by E B (t)]. The heat provided by the reservoir is then simply given by note the sign difference with respect to Eq. (4).
Below, we are interested in the weak coupling regime and the long-time limit, where any effect due to initial coherence in the system as well as system-bath correlations can be neglected. To ensure the same in the general scenario, we consider an initial statê with [ρ 0 ,Ĥ(0)] = 0 as well as [ρ B ,Ĥ B ] = 0. In this case, both Q and ∆U can be measured precisely without disturbing the dynamics of the system. As a consequence, heat and internal energy changes take on a well-defined, observer-independent value in each experimental run. We note that a projective energy measurement of the environment is usually a hopeless task as the environment often comprises a large amount of degrees of freedom. The heat measurement introduced here should thus be understood as a way to define heat, not to measure it in practice. However, as heat can in principle be measured without disturbance (in the weak coupling regime), any experimental measurement that avoids disturbance should give the same results within the experimental precision.

Measuring work
To introduce a measurement of work, we consider the quantity of interest being the power output described by the operatorP = −∂ tĤ (t), as is the case in thermoelectric devices as the one discussed below. Detailed information on why a two-point measurement scheme for work is not appropriate is given below, in Sec. 2.7. Following Refs. [28][29][30], we model the measurement by a detector, described by the conjugate variablesr andπ, which is coupled linearly to the power operator during the time interval [0, τ] by the coupling Hamiltonian where s denotes the coupling strength. At t = τ, the observabler is measured projectively, yielding information about the integrated power and thus the performed work. In practice, the detector could be provided by an LC element [31]. For simplicity, we consider a detector that has no internal dynamics (i.e., no additional term to the Hamiltonian). This measurement scheme results in the measured distribution [29,32], see App. A where the subscript m stands for measurement, W denotes the initial Wigner function of the detector, and P γ (W ) is a quasi-probability distribution (QPD) that is independent of the detector parameters. This QPD may take on negative values which have been shown to arise from quantum interference effects [33]. he quantity γ can be understood as the momentum of the detector (re-scaled by s), which acts back on the system through the coupling Hamiltonian given in Eq. (7). The expression for the measured distribution in Eq. (8) can be interpreted as the intrinsic fluctuations of the system, encoded in the QPD, being modified by the detector [34]. This modification presents a trade-off between measurement imprecision and back-action [30]. For a weak measurement (small s), γ is restricted to small values, implying small back-action. At the same time, the integral over W describes a convolution with a broad distribution, implying large imprecision. A strong measurement (large s) yields small imprecision, as W − W is restricted to small values. At the same time, a large range for γ contributes to the integral implying large back-action. It is interesting to note that while P γ (W ) may become negative, back-action and imprecision conspire in a way that always ensures the measured distribution P m (W ) to remain non-negative. It is instructive to consider an ideal but unphysical detector that has no imprecision and exerts no backaction. Such a detector would be described by a single point in phase space, i.e., W(x, p) = δ(x)δ(p) (as obtained in the classical limit of the ground state of a harmonic oscillator [35]). In that case, the measured distribution reduces to which can be interpreted as describing the work fluctuations in the absence of a measurement. This is the distribution we will use for characterizing work fluctuations. Even though P(W ) cannot describe a measurement by itself (as it may become negative), it can nevertheless be accessed experimentally by considering a weak measurement, where s → 0. In this case, the integrand in Eq. (8) is only finite for vanishingly small γ and we may replace P γ (W ) by P(W ). The measured distribution then simplifies to where we assumed an initial Gaussian distribution for the detector positionr with variance sσ. Clearly, if σ is known, the distribution P(W ) can be recovered from the measurement. This becomes particularly apparent by considering the cumulants (i.e., the mean, variance, skewness, etc.). In the weak measurement limit, the cumulants of the measured distribution are related to the cumulants of P(W ) by the simple relation The only effect the measurement has is an increase of the variance by σ, which describes the measurement noise. We note that the QPD described here has been used extensively to characterize electronic transport in phase-coherent systems [19,36,37].
In the presence of initial coherences, where our definition for internal energy fluctuations becomes problematic, one may use a similar approach as for work fluctuations [32,38,39]. Fluctuations in ∆U are then described by a QPD which reduces to the distribution obtained from a two-point measurement scheme in the absence of initial coherences.

The classical regime
To understand probabilistic violations of the first law, it is instructive to first consider a scenario where they are absent, which we here denote by the classical regime. We will be interested in the distribution that describes a joint measurement of heat, work, and internal energy changes, P m (Q, W, ∆U). Just as for the work measurement, this distribution can be written as a convolution between a QPD and the Wigner function of the detector used for measuring work (see App. A for details). In the classical regime, we make the assumptions for all times t and t . The first assumption ensures that no coherences in the energy-basis are created, such that the system can always be described by the populations of states with welldefined energy. The second assumption ensures that the energy stored in the coupling does not influence the thermodynamics. Alternatively, one could consider a weak coupling approximation, where the thermodynamics to lowest order in the coupling strength is not affected by the energy stored in the coupling. Under these approximations, we find that the work measurement does not exert any backaction, such that the QPD is independent of γ. The relation between the measured distribution and the QPD then simplifies to which holds for all measurement strengths. We furthermore find that the QPD is non-negative and obeys where P(Q, ∆U) denotes the distribution obtained from measuring heat and internal energy changes as outlined above, in the absence of any detector for work. Equation (14) implies that in the classical regime, the first law is not only valid on average but holds on the level of the QPD. For the cumulants, this equation implies The measured distribution however may not obey an equation analogous to Eq. (14). This is simply a consequence of the fact that the work measurement may be imprecise. Indeed, we find that the measured distribution is proportional to δ(W − Q + ∆U) only when σ → 0, i.e., in the strong measurement limit where the measured distribution reduces to the QPD (in the classical regime). Fortunately, in case a strong measurement is not available, one may recover the same cumulants from a weak measurement, where we find the relation If the detector parameter σ (characterizing measurement noise) is known, one may thus determine the cumulants of the QPD from a weak measurement and verify the validity of the first law using Eq. (15).

Probabilistic violations of the first law
In this work, we are interested in scenarios where the first law does not hold on the level of the distribution describing the joint work, heat, and internal energy fluctuations. Due to measurement imprecision and backaction, the measured distribution will in general not respect the first law: Finite imprecision introduces randomness in the measurement outcomes while backaction may introduce energy exchanges between the system and the detector which are not being detected. This implies that we should expect separate measurements of heat, work, and internal energy to result in statistics that do not respect the first law, except in the rare occasions when these quantities can be measured precisely without disturbing the system. To circumvent these issues, we consider the QPD as the relevant distribution. As mentioned before, this distribution describes the system in the absence of a measurement but can still be recovered by a weak, non-invasive measurement. Furthermore, it obeys the first law in the classical regime. We thus define probabilistic violations of the first law to be present whenever In terms of the cumulants, this implies that there exist values for j, k, l, such that Such violations can be expected if any of the assumptions in the last subsection are lifted. In particular, if [Ĥ(t),Ĥ(t )] = 0, the fundamental backaction vs imprecision tradeoff cannot be circumvented and the first law is no longer guaranteed on the level of the QPD. Below, we provide a detailed case study of a heat engine where such probabilistic first law violations are present. Alternatively, such violations may occur when the coupling energy between system and reservoir cannot be neglected. In this case, our definitions for heat and internal energy are no longer justified and a strong coupling analysis should be employed [40][41][42].
An obvious but crucial requirement for observing first-law violations is that heat, work, and internal energy are determined by separate measurements. Otherwise, one may only measure heat and internal energy and simply determine work by assuming the validity of the first law [43][44][45].
As there is an ongoing debate on how to define work as a fluctuating quantity [11,22,46,47], the choice for the work measurement takes on a crucial role. While we consider a particular definition for work fluctuations, we believe similar conclusions to hold for other approaches where work, heat, and internal energy are determined by separate measurements. However, when these quantities are defined as conditional averages, conditioned on the same measurements [48][49][50][51], the first law holds on the level of probabilities by construction.

Why no two-point measurement scheme for work?
As we are employing the two-point measurement scheme both for heat as well as for internal energy, one may ask why it should not be applied for measuring work as well. To address this question, we note that the two-point measurement scheme for work can be applied in two different ways. The first way is provided by projectively measuringĤ tot (t) given in Eq. (3) at the beginning and at the end of the time-interval [0, τ]. In the weak coupling regime, where the contribution of the couplingV to the energy changes is negligible, this is equivalent to simultaneously performing a two-point measurement scheme on the system and on the bath. This measurement of work is thus equivalent to measuring heat and internal energy simultaneously, implicitly relying on the first law. The first way of implementing a two-point measurement scheme for work does therefore not constitute a separate measurement of work. When considering a heat engine, a direct way of measuring its work output is arguably necessary in order to call the output of the engine useful.
The second way of implementing the two-point measurement scheme is to resort to a timeindependent description, by explicitly considering the degrees of freedom that render the Hamiltonian in Eq. (3) time-dependent. These degrees of freedom provide a work-storage device. In the scenario considered below, this is particularly illuminating as the work-storage device is provided by a Josephson junction, where work is carried by electrons that move against a voltage bias, just like when a battery is being charged. One may then envision a two-point measurement scheme applied on the work-storage device. However, to result in a time-dependent Hamiltonian, the work-storage device usually contains a considerable amount of coherence in its initial state. The first projective measurement in the two-point measurement scheme would destroy this coherence, considerably influencing the dynamics of the system and rendering the description in terms of a time-dependent Hamiltonian invalid. For a work-storage device provided by a Josephson junction, this is discussed in more detail at the end of Sec. 5.2.

The Heat Engine
To investigate the joint fluctuations of heat and work, as well as the occurence of probabilistic violations of the first law, we consider the heat engine proposed in Ref. [18], see Fig. 1. The engine consists of a voltage biased superconducting circuit that defines two single-mode microwave resonators with frequencies Ω h and Ω c . Each resonator is in radiative contact with a thermal reservoir kept at temperatures T h ≥ T c respectively. A Josephson junction in series mediates a coupling between the resonators by the exchange of photons with tunneling Cooper pairs [52,53]. Choosing the voltage bias such that (we set ħ h = 1) allows Cooper pairs to tunnel against the voltage bias by absorbing photons from the resonator with frequency Ω h (henceforth denoted as the hot resonator) and emitting photons to the resonator with frequency Ω c (the cold resonator). When operating the system as a thermoelectric heat engine, the temperature gradient drives a net flow of photons from the hot to the cold reservoir. This heat flow drives Cooper pairs, i.e., a supercurrent, against the voltage bias, hence producing electrical work. As the supercurrent is dissipationless, it carries no entropy and all the heat is carried by the photons, while the work is provided by the Cooper pairs. This separation of heat and work is very useful when defining these quantities as fluctuating variables. It is convenient to introduce a simplified pictorial representation of the heat-to-work conversion process (see the inset in Fig. 1): First, a photon enters the hot resonator from the hot reservoir. This photon is then converted into a photon in the cold resonator by a Cooper pair tunneling against the voltage bias. Finally, the photon leaves the system into the cold reservoir. In this process the electrical work performed by the Cooper pair is 2eV , the heat provided by the hot reservoir is Ω h , and the heat emitted into the cold reservoir is Ω c . It is tempting to use this representation to describe the full statistics of heat and work. However, we find it to be an oversimplified picture: While it describes the behavior of mean values well, it fails to capture the behavior of heat and work fluctuations due to the coherent nature of the heat-to-work conversion process.
For a quantitative description, we model the system by the Hamiltonian (for details on the derivation and the involved approximations, see Ref. [18]) where the time-dependence arises from the dc Josephson effect, which turns a time-independent voltage into a time-dependent phase, see [54] for details. We note that a Hamiltonian of this form has been introduced for a heat engine by Kosloff in 1984 [55]. The couplings of the hot and cold resonators to the reservoirs are characterized by κ h and κ c , respectively. Considering throughout the paper the regime the dynamics of the system can be described by a local Master equation (without any constraint on the ratio where with D[Â]ρ =ÂρÂ † − {Â †Â ,ρ}/2, and the Bose-Einstein distribution The first (second) term on the right hand side in Eq. (23) corresponds to the process where one photon is emitted to (absorbed from) the thermal reservoir.

The laws of Thermodynamics
Local master equations have been criticized for not being thermodynamically consistent, which may result in violations of the second law of thermodynamics [57][58][59]. These violations have been shown to be small (i.e., of the order of terms that are neglected in deriving a local master equation) [60,61]. Here we show how the smallness of g/Ω α , a requirement for the validity of Eq. (22), can be exploited to obtain thermodynamic consistency. To this end, we drop the coupling term in Eq. (20) in the thermodynamic bookkeeping, introducing a thermodynamic Hamiltonian This Hamiltonian will be used to define the internal energy of the system Importantly,Ĥ TD is only used for the thermodynamic bookkeeping, the dynamics is still described by Eq. (22) with the full Hamiltonian given in Eq. (20), including a finite value for g. We note that this is consistent with a microscopic derivation of the local master equation, which implies that the energy exchanged with the reservoirs cannot be resolved on the scale of g and κ α [56].
We now demonstrate that this approach results in thermodynamic consistency. We note that our treatment here is a specific implementation of a generic approach, presented in Ref. [62]. For a different approach to treating the term dropped in Eq. (25), see Ref. [63].

The zeroth law
The zeroth law of thermodynamics states that if two bodies are in equilibrium with a third, they are also in equilibrium with each other. Here, the three bodies are the two reservoirs and the system. Equilibrium is obtained by setting the temperatures equal T = T c = T h and removing any voltage bias V = 0. As our model is only justified when Eq. (19) holds, it may only describe equilibrium when Ω h = Ω c . For these equilibrium conditions, the steady-state solution of Eq. (22) is found to be the Gibbs state at temperature T with respect toĤ TD where β = 1/k B T . We thus find that the zeroth law holds, and that thermal equilibrium is characterized byĤ TD . We note that this is indeed the expected form of the equilibrium state within our approximationsρ whereχ β denotes the Gibbs state with respect to the total Hamiltonian, including the reservoirs, and Tr B denotes the partial trace over the reservoir degrees of freedom [64,65].

The first law
To formulate the first law of thermodynamics, we require definitions for heat and work. In thermoelectric devices work is usually accessed through the electrical current, which is related to the power operatorP is the particle-current operator for tunneling Cooper pairs. We note that we are only interested in the time-averaged dc-current, and do not consider any ac-current arising from the Josephson effect. The average power is the time derivative of the average work and reads where the dot denotes a time-derivative. Next, we define the average heat current from reservoir α to the system as We note that we used the thermodynamic Hamiltonian to define heat, in consistency with the discussion above. We return to the consistency of these definitions with Sec. 2 below. Using Eq. (26), it is then straightforward to show that the first law of thermodynamics holds

The second law
The second law of thermodynamics states that the entropy production cannot be negative. Using Eq. (32) for the average heat flow, we finḋ where we used Spohn's inequality [66] which relies on L αρβ α = 0 [whereρ β α =ρ β α (Ĥ TD ), cf. Eq. (27)]. Once again, the use ofĤ TD is crucial for obtaining a consistent thermodynamic bookkeeping.
We have thus shown that our approach is thermodynamically consistent in the sense that it yields average values for heat, work, and internal energy changes that satisfy the laws of thermodynamics. We stress that these laws make statements about mean values and may not simply be taken over to fluctuating quantities.
We note that the weak system-bath coupling regime has been noted to violate the third law of thermodynamics for time-dependent Hamiltonians [67]. In the limit we consider (where g Ω α ), this effect is expected to be negligible.

Fluctuations: full counting statistics
We now expand our analysis from average quantities to the full statistics of heat and work. To this end, we consider Q α as the heat exchanged with reservoir α and W as the work produced in the time interval [0, τ]. We do not consider changes in internal energy, as these become negligible in the long-time limit considered below (see App. C). Interestingly, the heat and work distributions are fully determined by the statistics of transferred photons and Cooper pairs respectively, i.e., the full counting statistics of these particles.

Heat fluctuations -counting photons
We first focus on the statistics of heat fluctuations. As mentioned above, the heat flow is exclusively carried by photons exchanged with the environment. To lowest order in g/Ω α and κ α /Ω α , every photon exchanged with reservoir α carries an equal amount of energy, Ω α . Thus, the heat exchanged with reservoir α is fully determined by the number of photons exchanged with the reservoir, denoted by q α , and we can identify Q α = q α Ω α such that where q = (q h , q c ). We stress that q α denotes the net number of photons exchanged with reservoir α during the time interval [0, τ]. The sign is chosen such that a positive q α denotes photons entering the system. The distribution P(q), known as the full counting statistics of the photons, can be obtained by introducing the photon counting fields χ = (χ h , χ c ) in the master equation [68][69][70] with the superoperators The quantityρ(χ) is directly related to the cumulant generating function which in turn defines the heat distribution Note that the 2π periodicity ofρ(χ) reflects the fact that its Fourier transform, P(q), is a discrete distribution, taking only finite values for integer values of q α . It is straightforward to show that the average values obtained from this distribution agree with Eq. 32. Furthermore, under Born-Markov approximations, the heat statistics obtained from the full counting statistics can be shown to be the same as the statistics obtained from a two-point measurement scheme applied to each reservoir [14][15][16].
From the cumulant generating function, the cumulants of the distribution P(q) may be derived, with the first, second, and third cumulant providing the mean, variance, and skewness of the distribution. The k-th cumulant is obtained by In the long-time limit, S(χ), and thus all cumulants, become linear in time, see App. C. In this limit, any entropy change in the system becomes negligible and the second law given in Eq. (34) puts a restriction on the average values It is well known that this restriction only holds for the mean values and does not carry over to fluctuating quantities, i.e., where θ (x) denotes the Heaviside theta function that is equal to one for x ≥ 0 and zero otherwise. Indeed, for fluctuating systems, the second law is generalized by the fluctuation theorem [8], which in the long-time limit reads (the validity of this equality for our system is shown below) Using Jensen's inequality, the second law, as stated in Eq. (41), may be recovered from the fluctuation theorem. Finally, we note that in the long-time limit, we find (see App. C) Because there is no photon accumulation (or generation) within the system, the statistics of heat is fully determined by the photons which traverse the system and it is sufficient to consider a single counting field for heat.

Work fluctuations -counting electrons
In our system work is exclusively performed by the supercurrent carried by Cooper pairs tunneling across the Josephson junction. We denote the net number of Cooper pairs that tunneled against the voltage bias in the time-interval [0, τ] by w. The work provided in this time-interval may then be written as W = 2eV w implying where P(W ) denotes the QPD discussed in Sec. 2.4. For phase-coherent systems, the full counting statistics is obtained by introducing a counting field λ in the master equation as [29,36] (see Ref. [32] for the connection to the measurement scheme outlined in Sec. 2.4) where the current operatorÎ is defined in Eq. (30). In analogy to Eqs. (38) and (39), we introduce a cumulant generating function which defines the work distribution It is again straightforward to show that this distribution provides the average value given in Eq. (31). The cumulants of P(w) can then be obtained in analogy to Eq. (40) In contrast to the photon distribution, P(w) is a continuous function. This is related to the fact that the Cooper pairs cannot be counted one by one without disturbing the dynamics. To appreciate this, one may consider a two-point measurement scheme for counting Cooper pairs, in analogy to the discussion on measuring heat in Sec. 2.3. In this approach, the total number of electrons on one side of the Josephson junction is measured at the initial and final times. Here we consider a Josephson junction that has a well defined phase [71,72]. As the phase is conjugate to the particle number, a measurement of the number of electrons disturbs the phase and would strongly affect the dynamics of the heat engine. We note that a different approach for counting electrons has been employed in a similar system [73]. As shown in Sec. 2.4, the approach taken here is motivated by its connection to an explicit measurement of power.

Joint heat and work fluctuations
We now turn to the distribution P(q, w), providing a joint description of heat and work fluctuations. Because heat is carried only by photons and work only by Cooper pairs, this distribution may be calculated by including counting fields for both photons and Cooper pairs where we dropped the counting field dependence ofρ for ease of notation. The cumulant generating function is then introduced in the usual way Setting λ = 0 (χ = 0), we recover the cumulant generating function for photons (Cooper pairs) respectively. The joint distribution can then be written as and the cumulants may be obtained by Throughout, we will be interested in the long-time limit, where the cumulants grow linearly in time and where q h = −q c . The fluctuations in the internal energy may then safely be neglected (cf. App. C) and P(q, w) provides a complete description of the fluctuating energy flows through the system, where q = q h denotes the number of photons that went from the hot to the cold reservoir. In this case, the first law constrains the mean values to satisfy where It is tempting to take over this statement of energy conservation to fluctuating quantities, assuming that the work fluctuations can be completely described by the heat fluctuations. However, this is not generally possible in quantum systems and we find probabilistic violations of the first law, i.e., This condition may also be expressed in terms of the cumulants We stress that the internal energy fluctuations cannot restore the first law because the corresponding cumulants scale differently with time, see App. C. Investigating a particular heat engine allows for a better understanding of the origin of these violations. In the present system, work is probed via the power (or electrical current) operator, as is usually the case in thermoelectric devices. Because the power operator does not commute with the Hamiltonian [Ĥ,P] = 0, the Heisenberg uncertainty principle implies that any information acquired on the power restricts our knowledge on energy. The probabilistic violations of the first law reflects this fundamental limitation in the joint knowledge of work and energy and should not be seen as a violation of energy conservation. As no such limitation applies for classical systems, probabilistic first law violations are a purely quantum phenomenon. In contrast to fluctuations, mean values can quite generally be obtained without disturbing the dynamics, ensuring that the first law is still respected on average.

Results
In order to explicitly calculate the heat and work distributions, we follow the work of Clerk and Utami [74] and cast the master equation including counting fields [cf. Eq. (50)] into an equation of motion for the Wigner function. A Gaussian ansatz then allows for reducing an infinite amount of differential equations (one for each entry of the density matrix) to four coupled, non-linear differential equations. This procedure is outlined in App. D. Here we focus on the long-time limit, where any transient behavior due to the initial state is negligible. The probabilistic first-law violations that we find can thus not directly be linked to the energy-time uncertainty relation.

Distributions of heat and work
From Eq. (50), we can derive a cumulant generating function for the joint fluctuations of heat and work in the long-time limit where we abbreviated g 2 λ = g 2 (4 + λ 2 ) and introduced the function The quasi-probability distribution for heat and work is obtained from the cumulant generating function through Eq. (52) and illustrated in Fig. 2. This distribution exhibits two striking features: I The first law of thermodynamics can be violated probabilistically.
II The distribution takes on negative values.
The first law violations are visualized by finite values of P(q, w) for q = w and are a direct consequence of the fact that S(χ, λ) = S(χ + λ). It is instructive to inspect the function Ψ(χ, λ). The first (second) term in Eq. (58) corresponds to photons going from hot to cold (cold to hot). The terms in the brackets show how these photons contribute to heat and work. In general, their contributions to heat and work are different. Only for small λ do these counting terms reduce to exp [∓i(χ + λ)] − 1. Furthermore, Eq. (57) shows an additional influence on work fluctuations arising from a rescaling of the interaction strength g 2 → g 2 λ = g 2 (4 + λ 2 ). We note that while q = w is not enforced on the level of (quasi-)probabilities, the first law still holds on average as we find 〈〈q〉〉 = 〈〈w〉〉.
The negative values of the quasi-probability distribution reflect the fact that it is impossible to measure heat and work simultaneously without affecting these quantities by measurement backaction. Such negativities have been found before in the charge transfer between superconductors [37].
Both features I and II become particularly apparent in the distribution describing the difference of work and heat defined as  This distribution is illustrated in Fig. 3. It exhibits oscillations and negative values. The fact that it takes on non-zero values for ∆ = w−q = 0 implies the presence of probabilistic first law violations. Note that the condition for probabilistic violations of the first law can be cast into [cf. Eq. (56)] We note that the probabilistic violations of the first law occur almost exclusively for w < q. As shown in Sec. 6.5, this implies that measuring a work value less than the expended heat is vastly more probable than measuring work that exceeds the heat input.

Cumulants
For understanding both features I and II, it is instructive to consider the cumulants that follow from Eq. (57). The averages read (see also, [18,75]) for the (co-)variances, we find and the third order cumulants read We thus find that for the cumulants up to second order, the distribution behaves as if q = w (in agreement with Ref. [21]). However, starting from the third order cumulants, this is no longer the case. In particular, from Eq. (59), we find that the first non-vanishing cumulants for ∆ = w − q are We note that for higher cumulants, also even cumulants are non-vanishing. The distribution P(∆) thus has both a vanishing mean and a vanishing variance. For any non-negative distribution, vanishing mean and variance imply that all cumulants vanish (i.e., the distribution is the Dirac delta distribution). The finite higher cumulants in Eq. (64) thus imply that P(∆) takes on negative values and exhibits probabilistic first law violations [cf. Eq. (60)]. When the first law holds up to the second order cumulants, probabilistic first law violations are thus intimately linked to negative quasi-probabilities. Since higher cumulants capture the behavior of the tails of a distribution, one can usually reproduce the characteristic features by only keeping the lowest cumulants in the generating function. We thus approximate where the Airy function of the first kind is defined as Equation (65)

Limiting cases
To get a better analytical understanding of the heat-to-work conversion process, it is instructive to consider the two limiting cases κ c , κ h g and g κ c , κ h . For κ c , κ h g, we find For λ = 0, this equation reduces to the cumulant generating function for a single resonator coupled to two baths, cf. Eq. (F14) in Ref. [76]. Due to the strong coupling between the resonators, they act similarly to one single resonator. Note however that the frequencies associated to the two reservoirs are different (Ω c = Ω h ), reflecting the fact that photons change their energy when going from one resonator to the other. The analogy with a single resonator should thus be treated with care. In the opposite limit, g κ c , κ h , we find In this limit, we obtain the marginals analytically. For the heat distribution, we find a bi-directional Poissonian with the distribution where I q denotes the modified Bessel function of the first kind. The rate for photons to go from reservoir α to reservoir β is given by The moments of this distribution are particularly simple and read Such a bi-directional Poissonian describes particles being partitioned at a single junction. In the present case, the Josephson junction provides a bottleneck for the photons, making it the only junction that is relevant for transport statistics.
For the work distribution, we find a Gaussian distribution with the same mean and average as for the heat

Out-of-equilibrium relations
Having access to the analytic expression of the cumulant generating function [cf. Eq. (57)], we may verify out-of-equilibrium relations that are expected to hold for the present scenario. In particular, we consider the fluctuation theorem [3] as well as the thermodynamic uncertainty relation (TUR) [77]. From the symmetry a fluctuation theorem follows While this implies a fluctuation theorem for heat, see Eq. (43), no simple fluctuation theorem for work can be derived from Eq. (74). Indeed, derivations for work fluctuation theorems usually rely on the first law in order to relate work to entropy. Not surprisingly, this approach breaks down in the presence of probabilistic first law violations. From the second cumulant given in Eq. (62), we find that the TUR is obeyed in our system where the average entropy production reads 〈〈Σ〉〉 = 〈〈q〉〉(β c Ω c − β h Ω h ). Equation (75) can be proven by noting that the second term in the variance [cf. Eq. (62)] is strictly positive, and by using the inequality x coth(x) ≥ 1. Since heat fluctuations are equal to work fluctuations in our system, the TUR implies a direct trade-off between power, efficiency, and power fluctuations [78] for the present quantum heat engine. The validity of the TUR is in agreement with Ref. [79], where it was shown that the TUR is valid for harmonic oscillator junctions. Figure 4: Distribution describing a measurement of heat and work. While heat is assumed to be measured perfectly (e.g., by monitoring the photons that enter and leave the system), the measurement of work is described by Eq. (8). The measurement strength is quantified by σ = 0.35. The rest of the parameters are the same as in Fig. 2 (a).

The joint probability distribution for measuring heat and work
For a joint measurement of heat and work, we consider a detector with a Gaussian Wigner function [cf. Eq. (84)] To ensure an optimal trade-off between measurement imprecision and back-action, we choose σ x σ p = 1/2, saturating the uncertainty principle, and define σ ≡ σ x /s which is the only remaining free parameter describing the influence of the detector on the joint distribution. We note that for this optimal trade-off between measurement imprecision and back-action, the detector generally has to be in a pure state. The energetic cost of preparing such a state (which diverges due to the third law of thermodynamics [80,81]) is not taken into account here. We stress however that this does not hamper our conclusions for weak measurements, which only rely on a Gaussian distribution for the detector variabler.
Coupling a detector to the system in general modifies the energy balance even on average, see App. B. This may act as a resource, fueling engines and refrigerators [49,[82][83][84]. We note however that the energy exchanged with the detector does not restore the first law for the measured distribution, since measuring this energy would require an additional detector which would face the exact same problems.
The probability distribution for the outcomes of a joint measurement of work and heat is shown in Fig. 4. While measurement imprecision and back-action alter the quasi-probability distribution rendering it non-negative, similarities are clearly visible [cf. Fig. 2]. In particular, the oscillatory behavior which results from the coherent nature of the heat-to-work conversion process remains visible. Furthermore, the measured distribution exhibits probabilistic violations of the first law, just as the underlying quasi-probability distribution. As a consequence, a work value that is considerably smaller than the heat input may be measured.
First law violations in the measurement are to be expected because neither back-action nor imprecision are generally able to restore the first law on the level of probabilities. While there are some instances where back-action may ensure the validity of the first law, this is not generally the case and it may even alter the energy balance on average. In addition, measurement imprecision introduces randomness to the measurement outcomes in a way that is unrelated to energy conservation.

Conclusions & Outlook
Fluctuations challenge the laws of thermodynamics. While they hold on average, it is well established that the second law of thermodynamics can be probabilistically violated. Here we showed that this holds true for the first law as well, in the presence of quantum fluctuations. An imprecision vs backaction tradeoff will in general prevent the first law of thermodynamics to be applicable for single experimental runs, when heat, work, and internal energy are determined by separate measurements. In order to determine the statistics of these quantities in the absence of any measurement, a weak measurement may be employed and sub-sequentially corrected for measurement imprecision. This procedure results in a quasi-probability distribution that becomes non-negative and respects the first law of thermodynamics in a classical regime. For quantum systems, this QPD does in general not respect the first law.
To illustrate these probabilistic violations of the first law, we provided a detailed investigation into the joint fluctuations of heat and work in a quantum heat engine. In this device, heat fluctuations behave classically due to the weak coupling between system and bath. At strong coupling, a thermodynamically consistent definition of internal energy and heat is a subject of debate [40][41][42], further challenging the validity of the first law on the level of probabilities. Additionally, on time-scales shorter than the bath correlation time, the time-energy uncertainty relation may render energy conversion inapplicable [85,86].
In agreement with previous works [87,88], our results imply that conservation laws, such as energy conservation, do not enjoy the same impact in quantum systems as they do in classical systems. The reason for this is that when dividing a conserved quantity into parts, a Heisenberg uncertainty relation may restrict our simultaneous knowledge of these parts. The quasi-probability distribution used to describe work fluctuations here is a member of a larger family, which have been called Keldysh quasi-probability distributions [32]. Another prominent member is the Wigner function and generalizations thereof [89]. Different Keldysh quasi-probability distributions may shed light onto the impact of different conservation laws on measurable but non-commuting quantities.
Our results open up several interesting avenues: Equation (74) implies that a usual Crooks fluctuation theorem for work cannot be derived when the first law of thermodynamics does not hold on the level of the distribution. Nevertheless, fluctuation theorems including quantum corrections have been derived [90,91]. These quantum corrections may have a connection to the probabilistic violations of the first law investigated here.
A further interesting avenue to pursue is provided by potential links between probabilistic violations of the first law and the energy-time uncertainty relation. Such links are expected to become particularly important in the short-time regime not considered in the present manuscript.
Another open question concerns the relation between first law violations and negative values in the Keldysh quasi-probability distribution. Such negative values have recently been shown to be a necessary and sufficient condition for ruling out a classical description of the measurement outcomes [92]. Here we showed that if the first law holds not only for the first but also for the second cumulants, then first law violations are accompanied by negative quasi-probabilities. While the first law is only a statement about average values, we may speculate that it also generally holds for the second cumulants. If this were true, it would strengthen the link between probabilistic violations of the first law and negative quasi-probabilities.
Probabilistic violations of the second law are by now well understood, in particular because of fluctuation theorems which are recognized as the generalization of the second law to fluctuating systems. By demonstrating probabilistic first law violations, our results may open a quest for finding a generalization of the first law that includes quantum fluctuations. At this point, we may only speculate how such a generalization may look like and what its potential impact may be.

acknowledgments
We acknowledge fruitful discussions with the audience from the Quarantine Thermo seminar series as well as the QTD2020 conference, in particular with P. Strasberg

A The general scenario
In this Appendix, we provide details for the general scenario discussed in Sec. 2. There we introduced measurements for internal work, heat, and internal energy changes. A joint measurement of these quantities provides outcomes W , Q, and ∆U with probability where the time-evolution includes the detector with the time-ordering operator T . Furthermore, |E t B , E t , r〉 is a joint eigenstate ofĤ B ,Ĥ(t), and the position operator of the detectorr. The initial density matrix of the detector is denoted byρ d .
Fourier transforming Eq. (77), we obtain the moment generating function where the states that sandwichρ d in the last row are eigenstates of the momentum operator π. Furthermore, we assumedρ tot (0) to commute both withĤ(0) as well as withĤ B and we introduced the moment generating function of the QPD with the modified time-evolution operator which only acts on the Hilbert space of system and bath (not including the detector). From Eq. (79), using the identities together with the convolution theorem, we find As discussed in the main text, the probabilities describing the measurement can be written as a convolution of the QPD and the Wigner function of the detector used for measuring work.

A.1 The classical regime
In the classical regime, we make the assumptions [Ĥ(t),Ĥ(t )] = 0 as well as In this regime, Eq. (80) reduces to which implies P γ (Q, W, ∆U) = P(Q, ∆U)δ(W − Q + ∆U), where P(Q, ∆U) describes a joint measurement of heat and internal energy change, in the absence of a detector for measuring work.
In the classical regime, backaction thus becomes irrelevant and the QPD respects the first law of thermodynamics on the level of probabilities.

A.2 Weak measurements
For a weak measurement, when s → 0, a finite momentum support of the detector Wigner function implies that we can make the replacement W( The integral over γ can then trivially be executed in Eq. (84) and from the convolution theorem, we find a simple relation between the measured cumulants and the cumulants of the QPD where c k denotes the k-th cumulant of p d (W ). For a detector with a Gaussian initial position distribution, we recover Eq. (16).

B The first law in the presence of the detector
The coupling to the detector represents a time-dependent contribution to the Hamiltonian, cf. Eq. (7). This term modifies the energy balance resulting in a modified first law where we used the subscript d to label the power provided by the detector. The contribution from the detector to the first law reads Since the detector couples to the power operator in the absence of the detector itself, it only measures W , which is no longer the total work. This is a general issue: whenever a time-dependent power operator is being measured, the measurement Hamiltonian will necessarily also be timedependent, providing an additional contribution to the total power. With a linear detector, it is impossible to include this contribution in the measurement. If we however consider weak measurements, then W d can safely be neglected. This can be seen by noting that the Hamiltonian describing the coupling to the detector is proportional to sπ. As the Hamiltonian commutes witĥ π, the momentum values of the detector much larger than σ p are negligible, cf. Eq. (76), and the power provided by the detector will be proportional to a factor smaller than sσ p = 1/(2σ), which vanishes for weak measurements. Just as in the absence of the detector, Eq. (89) holds for average values and does not take over to fluctuations. Indeed, when adding a second detector to measure W d , the energy balance gets modified again. Including the energetics of the detector does therefore not salvage the first law of thermodynamics beyond average values.

C The long-time limit
In the main text, we argued that the internal energy of the system may be neglected in the long time limit. We further stated that the fluctuations of heat are completely determined by a single variable q, describing the net number of photons that went from the hot to the cold reservoir. Here we prove these two statements. We start with the moment generating function for heat, work, and internal energy changes e S(χ,λ,ξ) = Tr e −iξĤ(τ) e L(χ,λ)t e iξĤ(0)ρ , where the Liouvillian L(χ, λ) is determined by the right-hand side of Eq. (50). The counting field ξ corresponds to the internal energy fluctuations. We may re-write Eq. (91) using the spectral decomposition of the Liouvillian e S = i e ν i t Tr e −iξĤ(τ) P i e iξĤ(0)ρ , The eigenvalues of the Liouvillian are denoted by ν i and the projectors P i depend on the eigenstates of the Liouvillian. In the long-time limit, the sum is dominated by the eigenvalue with the largest real part, denoted ν max , and we find S(χ, λ, ξ) = ν max (χ, λ)t + S 0 (χ, λ, ξ), where the time-independent term is given by S 0 = ln Tr e −iξĤ(τ) P max (χ, λ)e iξĤ(0)ρ .
In the long-time limit, we may safely drop the term S 0 in the cumulant generating function. In particular, for the cumulants we find In particular, this implies that the first law may not be recovered by taking into account the internal energy fluctuations, as these scale differently with time.
We now show that the long-time limit, the statistics of heat is fully determined by a single counting field, i.e., P(q) ∝ δ q h ,−q c . To this end, we introduce the unitary superoperator To show that this superoperator is unitary, we note that its adjoint, U † , is defined by where we introduced the inner product 〈B,Â〉 = Tr{B †Â }. We find and it follows that U † U = 1. We now consider the Liouvillian, setting λ = 0 for simplicity (the proof also holds for λ = 0) AsL only depends on the difference of the counting fields, so do its eigenvalues. Since a unitary transformation leaves the eigenvalues invariant, the same holds for the eigenvalues of the original Liouvillian L(χ). In the long-time limit, the cumulant generating function thus only depends on the difference in counting fields, cf. Eq. (93). The relation P(q) ∝ δ q h ,−q c then follows directly from Eq. (39). We note that for short times, the cumulant generating function also depends on the eigenvectors of the Liouvillian and therefore explicitly depends on both χ h and χ c .

D The Wigner function approach
In this appendix, we show how the distribution P(q, w; γ) can be calculated. All results shown in the main text follow from this distribution. As shown in Sec. 5, the cumulant generating function of this distribution can be written as S = ln[Tr{ρ}], whereρ is governed by the master equation Here the Hamiltonian is given in Eq. (20), the current operator in Eq. (30), and the superoperators responsible for dissipation in Eq. (37).
Equation (103) can be solved by the Gaussian ansatz