Generative learning for the problem of critical slowing down in the lattice Gross-Neveu model

In lattice field theory, Monte Carlo simulation algorithms are severely affected by critical slowing down in the critical region, where the autocorrelation time increases rapidly. Hence the cost of generating lattice configurations near the critical region rises sharply. In this paper, we use a Conditional Generative Adversarial Network (C-GAN) for sampling lattice configurations. We train the C-GAN on a dataset of Hybrid Monte Carlo (HMC) samples drawn from regions away from the critical region, i.e., regions where the HMC simulation cost is not so high. We then use the trained C-GAN model to generate independent samples in the critical region, reducing the overall computational cost. We test our approach on the Gross-Neveu model in 1+1 dimensions. We find that the observable distributions obtained from the proposed C-GAN model match those obtained from HMC simulations, while circumventing the problem of critical slowing down.


I. INTRODUCTION
Lattice field theory is the most reliable and well-established technique for solving quantum field theories nonperturbatively. In this approach the theory is formulated on a discrete space-time lattice and solved numerically. In Monte Carlo (MC) simulations of lattice field theory, the efficiency of the simulation depends on the algorithm used. Algorithms like Hybrid Monte Carlo (HMC) [1] work well away from the critical points of the lattice theory, but as one approaches the critical region the simulation suffers severe critical slowing down [2,3]. Near the critical point the autocorrelation time increases dramatically and can become larger than the total simulation run time. We then have no control over the statistical uncertainties of observables calculated on the simulated lattice configurations. As an example, in lattice QCD, as we approach the continuum limit a → 0 at fixed physical volume, the computational cost of HMC scales approximately as a^{−z} with z > 6 [3]. Several methods have been developed in gauge theories [4][5][6] to improve MC simulations on the lattice. Machine Learning (ML), in the meantime, has made tremendous advances and found applications in many branches of physics. ML has been applied extensively to condensed matter and statistical physics problems [7][8][9][10]. In [11], supervised learning was adopted to accelerate MC simulations for statistical physics problems, and a self-learning MC was proposed in [12] to reduce the autocorrelation time, especially near the critical region, by learning an effective Hamiltonian. In recent times, several machine learning approaches [13][14][15][16][17][18][19][20] have been used to circumvent the problem of diverging autocorrelation time in lattice field theory, and in the XY model as well [21]. ML has also been applied to circumvent critical slowing down in U(1) gauge theory [17] and to a parameter regression task in [15].
In this work we explore a system with fermions, viz. the Gross-Neveu (GN) model [22]. Following the ML approach, we use a Generative Adversarial Network (GAN) [23] conditioned on a parameter of the theory to efficiently generate lattice field configurations near the critical point. ML-based generative models produce uncorrelated samples, which is one of the reasons for using them as a replacement for MCMC simulation. To the best of our knowledge, GANs have not been applied to any fermionic system before. However, normalizing flows have been used for the Yukawa model [14], and in [21] C-GANs were found to be effective for studying phase transitions in the XY model. The critical point of a lattice theory corresponds to a particular value of the parameters (λ) of the theory. Our target is to generate uncorrelated samples from a probability distribution of the kind

P(Φ|λ_crit) = (1/Z) e^{−S(Φ, λ_crit)},

where Φ is the lattice field, S is the action, and Z is the partition function of the theory. The basic idea of our method consists of the following three steps:

1. Generate samples using HMC for λ away from the critical region of the lattice theory: Φ ∼ p(Φ|λ_noncrit).
2. Train the GAN model conditioned on the parameter λ using the data from step 1, i.e., learn the distribution p(Φ|λ).

3. Use the trained model to generate independent samples in the critical region: Φ ∼ p̂(Φ|λ_crit).

An Aoki phase emerges on a finite lattice [24]. The Aoki phase structure of the GN model with Wilson fermions, and with staggered fermions with a flavored mass term, has been investigated in the strong coupling limit in [25]. The chiral phase transition of the GN model with minimally doubled Borici-Creutz fermions has been investigated in detail in [26]. The mass spectrum of the GN model has been studied in [27,28].
To check the validity of the generative model, we evaluate it in the critical region: we compare observables calculated from samples generated by the proposed C-GAN with those from samples generated by HMC simulations. Since the proposed C-GAN model's samples are independent given λ, it alleviates the critical slowing down problem. Since the lattice constant a changes with the parameters of the theory, we must choose the lattice size accordingly, so that in the critical region we obtain a lattice of the desired physical volume.

II. GROSS-NEVEU MODEL

A. Continuum Theory
The Euclidean Lagrangian of the GN model in 1+1 dimensions is [22]:

L = ψ̄_j γ_μ ∂_μ ψ_j − (g²/2) (ψ̄_j ψ_j)²,   j = 1, …, N_f.

With the help of the so-called Hubbard-Stratonovich (HS) transformation we can reduce the four-fermion term to a term quadratic in the fermion fields plus an additional auxiliary bosonic field σ(x):

exp[ (g²/2) (ψ̄ψ)² ] ∝ ∫ dσ exp[ −σ²/(2g²) − σ ψ̄ψ ].

The transformation is basically a shifted Gaussian integral.
The partition function becomes

Z = ∫ Dσ Dψ̄ Dψ e^{−S(σ, ψ̄, ψ)}.

The action is given by

S = ∫ d²x [ ψ̄_j D_GN ψ_j + σ²/(2g²) ],

where D_GN = ∂̸ + σ(x). One can show from the equation of motion of σ that the auxiliary field is proportional to the condensate, σ(x) = −g² ψ̄ψ(x). Therefore, the average of the auxiliary field σ(x) can be referred to as the chiral condensate. This σ(x) can be used as an order parameter to study spontaneous chiral symmetry breaking in the GN model. The GN model is analytically solvable in the infinite-flavor limit N_f → ∞. Its phase structure has been studied extensively in this limit [29,30]. Inhomogeneous phases of the lattice GN model have also been studied for a finite number of flavors, with a proper continuum limit, in [31,32].

B. Lattice theory
The action of the lattice GN model in the staggered formalism [33] is generally written as

S = Σ_{x,y} χ̄(x) D(x, y) χ(y) + (N_f λ/2) Σ_x σ(x)²,

where the coupling constant is inverted to λ = 1/g² for simulation purposes, and D = D_1 + Σ with

D_1(x, y) = (1/2) Σ_{μ=1,2} η_μ(x) [ δ_{y, x+μ̂} − δ_{y, x−μ̂} ],   Σ(x, y) = σ(x) δ_{x,y},

where η_1(x) = 1, η_2(x) = (−1)^{x_1}, and 1̂ and 2̂ are unit vectors in the two directions in 2D. This theory has a discrete chiral symmetry:

χ(x) → ε(x) χ(x),   χ̄(x) → −ε(x) χ̄(x),   σ(x) → −σ(x),   with ε(x) = (−1)^{x_1 + x_2}.

A higher N_f value is necessary to match continuum (N_f → ∞) results, but N_f = 2 serves our purpose in this work. After introducing the pseudofermion method [34] (for N_f = 2), the action becomes non-local:

S(σ, φ) = (N_f λ/2) Σ_x σ(x)² + φ† M^{−1} φ,        (10)

where M = D†D and φ is a complex pseudofermion field. The partition function can be written as

Z = ∫ Dσ Dφ† Dφ e^{−S(σ, φ)}.        (11)

With the action given in Equation (10) we perform our HMC simulations. In this work we use staggered fermions for the lattice simulation (for details about staggered fermions, see [35]).
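As an illustration, the operator D = D_1 + Σ above can be assembled as an explicit matrix. The following is a minimal numpy sketch; the function name and the simplest site-diagonal placement of σ are our own choices, not the paper's production code:

```python
import numpy as np

def staggered_dirac(sigma, N):
    """Sketch: staggered operator D = D_1 + Sigma on an N x N periodic
    lattice, with phases eta_1(x) = 1 and eta_2(x) = (-1)**x_1 and the
    auxiliary field sigma placed on the diagonal."""
    V = N * N
    D = np.zeros((V, V))
    idx = lambda x, y: (x % N) * N + (y % N)   # row-major site index, periodic
    for x in range(N):
        for y in range(N):
            i = idx(x, y)
            eta1, eta2 = 1.0, (-1.0) ** x
            # central-difference hopping in both directions
            D[i, idx(x + 1, y)] += 0.5 * eta1
            D[i, idx(x - 1, y)] -= 0.5 * eta1
            D[i, idx(x, y + 1)] += 0.5 * eta2
            D[i, idx(x, y - 1)] -= 0.5 * eta2
            # diagonal sigma term
            D[i, i] += sigma[x, y]
    return D
```

For σ = 0 the hopping part is antisymmetric, which is the lattice analogue of the anti-hermiticity of ∂̸.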

III. GENERATIVE ADVERSARIAL NETWORK (GAN)
Generative Adversarial Networks (GANs) can be trained to generate samples from a high-dimensional probability distribution. A GAN [23] basically consists of two neural networks, the generator and the discriminator: the generator's prime job is to generate realistic samples from a noise vector, and the discriminator is a binary classifier whose output indicates real (1) or fake (0). Notably, a GAN learns from samples of the true distribution without using the true distribution explicitly. Likewise, it generates samples without explicitly providing the probability density.
The generative model G, parameterized by Θ, maps a random noise vector z ∼ p_z(z) to a sample x = G(z, Θ) ∼ p_g. The training dataset comes from the true distribution x ∼ p_real(x). The discriminator D(x, Φ), parameterized by Φ, predicts whether its input comes from p_g or p_real. After training we expect p_g to be as close as possible to p_real. The training process is a two-player min-max game: as training continues, the discriminator improves its ability to distinguish the generator's fake samples, and the generator improves its ability to produce realistic samples that fool the discriminator. In the training process the generator's and discriminator's weights are updated in tandem.
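The value that the two players optimize can be estimated from discriminator outputs on a batch of real and generated samples. A small illustrative helper (our own naming, not part of any GAN library):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of the GAN value function
    V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator outputs on real samples (d_real)
    and on generated samples (d_fake)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.log(d_real).mean() + np.log1p(-d_fake).mean()
```

At the optimum of the game the discriminator outputs 1/2 everywhere, giving V = log(1/4) ≈ −1.386.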
The objective function of the GAN is

min_G max_D V(D, G) = E_{x∼p_real}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))].

Conditional GAN: If the true dataset has categories or classes, the original GAN approach has no control over the type or class of the generator's output, since the output depends only on the random noise. But in many situations it becomes necessary to generate data of a particular type or class. So we want to train a GAN that learns a conditional probability distribution.
In a C-GAN [36] we append additional information λ, which could be attributes or class labels, to the random noise to produce the output G(λ, z, Θ), which is conditioned on λ. We also append λ to the input of the discriminator.
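A minimal sketch of this conditioning step, assuming the noise vector and the scalar label λ are simply concatenated (the exact label encoding in the actual model may differ; this is illustrative):

```python
import numpy as np

def cgan_inputs(batch_size, z_dim, lam, rng):
    """Build a batch of conditioned generator inputs: Gaussian noise z
    with the coupling lambda appended as an extra feature."""
    z = rng.standard_normal((batch_size, z_dim))
    cond = np.full((batch_size, 1), lam)
    return np.concatenate([z, cond], axis=1)
```

The discriminator input is assembled analogously, with λ appended to (a flattened or broadcast copy of) each configuration.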
The objective function of the C-GAN is

min_G max_D V(D, G) = E_{x∼p_real}[log D(x|λ)] + E_{z∼p_z}[log(1 − D(G(z|λ)))].

IV. HMC SIMULATION

A. HMC Algorithm
The HMC algorithm can be used to produce a Markov chain whose stationary distribution is

P(σ, φ) = (1/Z) e^{−S(σ, φ)},        (14)

where S(σ, φ) is the lattice action and Z is the partition function defined in Equations (10) and (11) respectively. However, this distribution does not represent a classical Hamiltonian system. We can transform it by introducing a canonically conjugate momentum variable π(x) into the system. It then becomes a Hamiltonian system, with Hamiltonian

H(σ, π, φ) = (1/2) Σ_x π(x)² + S(σ, φ).

In the HMC algorithm we solve Hamilton's equations in discrete time for σ(x) and π(x). We can sample the pseudofermion variable φ easily by sampling a complex vector ξ from exp(−ξ†ξ) and setting φ = D†ξ. This ensures that φ is sampled according to the distribution in Equation (14) for a given σ. For details of HMC for the pseudofermion action, refer to [35]. The common steps in an HMC simulation are:

1. Choose the initial configuration σ_0 from a cold start or hot start.

2. Sample momenta π(x) from the Gaussian distribution exp(−π²/2).
3. Choose ξ as Gaussian noise and evaluate φ = D†ξ.

4. MD steps to update σ and π keeping φ as a background field: solve the Hamiltonian equations of motion for some discrete time steps of size τ.
It will generate new configurations (σ new , π new ) as the next proposal.
5. Do Metropolis test to accept or reject the new configuration.
In this way we can generate an ensemble of (σ, φ) configurations according to the distribution in Equation (14).
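The steps above can be sketched for a generic scalar action. The following toy implementation uses a user-supplied action and gradient (tested below with a free field) in place of the full pseudofermion force, so it illustrates the algorithm rather than reproducing our production code:

```python
import numpy as np

def hmc_step(sigma, n_md, dt, action, grad, rng):
    """One HMC trajectory: momentum refresh, leapfrog MD, Metropolis test.
    `action(s)` returns S(s); `grad(s)` returns dS/ds."""
    pi = rng.standard_normal(sigma.shape)            # refresh momenta (step 2)
    h_old = 0.5 * np.sum(pi**2) + action(sigma)
    s = sigma.copy()
    pi = pi - 0.5 * dt * grad(s)                     # leapfrog: initial half kick
    for _ in range(n_md - 1):
        s = s + dt * pi                              # drift
        pi = pi - dt * grad(s)                       # full kick
    s = s + dt * pi
    pi = pi - 0.5 * dt * grad(s)                     # final half kick
    h_new = 0.5 * np.sum(pi**2) + action(s)
    # Metropolis accept/reject (step 5)
    if rng.random() < np.exp(min(0.0, h_old - h_new)):
        return s, True
    return sigma, False
```

For the free-field action S = (1/2) Σ σ², the chain should equilibrate to unit variance per site with a high acceptance rate at this step size.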

B. HMC Simulation and Observables
In this work, we simulate for N_f = 2 with lattice size 32 × 32. During HMC simulation, we adjust the MD step size to keep the acceptance rate around ∼ 80% with a reasonable autocorrelation time. We set the MD step size to 0.1 and the trajectory length to 1. We discard the first 500 lattice configurations for thermalization. At each λ value in the range [0.6, 2.5], we generate 4000 lattice configurations as the training dataset and as the baseline for the evaluation of our proposed C-GAN, which is discussed further in Section VI. The quantity σ̄ = (1/N) Σ_x σ(x), measured on a single lattice configuration, can be used to study the phase transition, as its ensemble average is directly related to the chiral condensate ⟨ψ̄ψ⟩. However, there is a problem with the quantity σ̄: its ensemble average ⟨σ̄⟩ vanishes even for λ close to λ_crit, i.e., even in the broken phase near the critical point. This can be seen from Figure 1, calculated near the critical point, where configurations fluctuate between the two minima and hence the average ⟨σ̄⟩ nearly vanishes. This is due to the ability of configurations to tunnel from one minimum to the other. So instead of ⟨σ̄⟩, we choose ⟨|σ̄|⟩ as our order parameter, which is a suitable observable to study the phase transition. One more observable of importance is the susceptibility. The two observables are defined as

⟨|σ̄|⟩ = ⟨ |(1/N) Σ_x σ(x)| ⟩,   χ = N ( ⟨σ̄²⟩ − ⟨|σ̄|⟩² ),        (16)

where N is the lattice volume.
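Both observables can be computed directly from an ensemble of configurations; a short numpy sketch (the function name is our own):

```python
import numpy as np

def order_parameter_and_chi(configs):
    """Compute <|sigma_bar|> and chi = N (<sigma_bar^2> - <|sigma_bar|>^2),
    per Equation (16), from an ensemble of shape (n_cfg, L, L)."""
    configs = np.asarray(configs, dtype=float)
    N = configs[0].size                       # lattice volume
    sbar = configs.mean(axis=(1, 2))          # sigma_bar for each configuration
    abs_mean = np.abs(sbar).mean()
    chi = N * (np.mean(sbar**2) - abs_mean**2)
    return abs_mean, chi
```

For example, an ensemble of two constant 2 × 2 configurations with values 1 and 3 gives ⟨|σ̄|⟩ = 2 and χ = 4·(5 − 4) = 4.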

V. PROPOSED METHOD
In HMC simulation we sample the σ and φ fields according to the distribution given in Equation (14), i.e., σ, φ ∼ P(σ, φ|λ). We want the C-GAN to learn the marginal distribution of the σ field, i.e., p(σ|λ). So we discard the HMC-generated pseudofermion φ samples and keep only the σ samples, which then represent the marginal distribution of σ from the joint distribution in Equation (14). Let the samples from the C-GAN represent an implicit distribution p̂(σ|λ). Our target is to train the C-GAN so that p̂(σ|λ) approximates the true distribution p(σ|λ). To address the problem of critical slowing down, we train the C-GAN model for λ values sampled in the non-critical region, where the autocorrelation time is much smaller than in the critical region. Hence the generation of the training dataset is not affected by critical slowing down. We then use the trained C-GAN model to generate samples near the critical λ. Since the C-GAN model generates independent samples, our method can produce uncorrelated lattice configurations in the critical region. A vanilla C-GAN trained on the HMC samples fails to learn the distribution reliably. The learning is made efficient as well as robust by incorporating into the C-GAN model information about the symmetries and constraints of the theory. Also, transforming the samples so as to reduce the imbalance in their values improves learning. We discuss these in detail in the following subsections.

A. Translation Symmetry
Due to the translation symmetry of GN model lattices, a C-GAN generator made of dense layers fails to learn the true distribution properly. Convolutional kernels respect translation symmetry in the lattices; hence, using convolutional layers in the generator allows the learning to take place efficiently.

B. Transformation of σ field
Since the observables of the GN model can be calculated from ⟨|σ̄|⟩, for training the C-GAN we transform the lattice configurations such that each configuration has σ̄ > 0. This reduces the degrees of freedom for the C-GAN model, which helps it explore the distribution space more efficiently. For that purpose, if a particular configuration is found to have σ̄ < 0, we apply the transformation:

σ(x) → −σ(x).        (17)

For training the C-GAN, we then apply a natural-log transformation to the HMC-generated samples:

σ_i(x) → ln( σ_i(x) + c ),        (18)

where i represents a single lattice configuration from the ensemble and c is a constant such that the argument of the logarithm becomes positive. The transformation in Equation (18) is necessary for stable training of the C-GAN, as it balances the data values and reduces the dynamical range of the σ(x) field. For efficient training, we finally apply Min-Max scaling to the transformed data to bring it into the range [−1, 1].
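The full preprocessing pipeline described above can be sketched as follows; the offset `eps` and the returned scaling constants are our own choices for illustration:

```python
import numpy as np

def preprocess(configs, eps=1e-6):
    """Sketch of the training transformations:
    1. flip each configuration so that its mean sigma_bar is positive,
    2. log-transform with an offset c making all arguments positive
       (Equation (18)),
    3. Min-Max scale to [-1, 1]."""
    x = np.asarray(configs, dtype=float).copy()
    flip = x.mean(axis=(1, 2)) < 0
    x[flip] *= -1.0                           # sign flip so sigma_bar > 0
    c = eps - x.min()                         # offset so x + c > 0
    x = np.log(x + c)                         # log transformation
    lo, hi = x.min(), x.max()
    x = 2.0 * (x - lo) / (hi - lo) - 1.0      # Min-Max scaling to [-1, 1]
    return x, (c, lo, hi)
```

The returned constants (c, lo, hi) allow the transformation to be inverted on generated samples before measuring observables.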

C. Periodic Boundary Condition
During the generation of configurations by HMC, we apply periodic boundary conditions, i.e., we replace σ(i, j) by σ(i, j) := σ((i)_N, (j)_N), where (i)_N represents i modulo N. In order for the C-GAN model to learn this periodicity, we apply periodic padding to all layers of the generator and to the initial two layers of the discriminator.
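A numpy equivalent of this wrap-around padding (in the actual model it is applied inside the network layers before each convolution):

```python
import numpy as np

def periodic_pad(batch, pad):
    """Periodic (wrap) padding of a batch of 2D lattice configurations of
    shape (batch, H, W) or (batch, H, W, channels): each edge is padded
    with values from the opposite edge, matching the lattice's periodic
    boundary conditions."""
    width = ((0, 0), (pad, pad), (pad, pad)) + ((0, 0),) * (batch.ndim - 3)
    return np.pad(batch, width, mode="wrap")
```

With this padding, a convolution kernel sliding over the padded array sees every site together with its true periodic neighbours.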

VI. EXPERIMENTS

A. Dataset
Our training dataset consists of 10 ensembles, each of 4000 lattice configurations, corresponding to 10 different λ values generated by HMC simulation. Λ_tr is the set of λ values on which we train the C-GAN model.

B. Architecture

The discriminator has three 2D convolutional layers with Tanh activation followed by a dense layer with Sigmoid activation; we use kernels of size (4,4) and strides (2,2) and (1,1). The detailed architectures of the generator and discriminator models are given in the Appendix. We add periodic padding to all layers of the generator model, and only to the two initial layers of the discriminator model, to learn the periodicity of the lattice configurations.
Once the training of the C-GAN is over, we use the generator model to generate two ensembles, each consisting of 20000 configurations, for Λ_tr and Λ_ts respectively. In both cases we evaluate our C-GAN model by comparing observables calculated on these two sets with those calculated from the HMC-generated ensembles. The observables used for this purpose are ⟨|σ̄|⟩ and χ as defined in Equation (16).

C. C-GAN training and sampling process
In the preparation of the dataset for training, we attach the λ label to each HMC-generated configuration; these are the real data in discriminator terminology. While updating the generator, one batch of random noise z (with random labels) is drawn from p_z(z). For updating the discriminator, a full batch of 256 configurations is used, with one half from x = G(z|λ), where z ∼ p_z(z), and the other half from x ∼ p_real(x|λ). We use the Adam optimizer with an initial learning rate of 0.0002 for both the generator and discriminator losses. Initially the C-GAN model was trained for up to 200 epochs. Then we used the observable error δσ = ⟨|σ̄|⟩_hmc − ⟨|σ̄|⟩_C-GAN as the stopping criterion: we stopped the training where the error on the validation set is minimum and remains approximately constant for 10 further epochs. However, this epoch range will change as the learning rate, optimizer, batch size, number of weights and biases, padding structure, etc. of the C-GAN model change. The loss curve is shown in Figure 2. For sampling, we choose a particular λ label of shape (4 × 4) and a batch of z ∼ p_z(z), which is then fed to the pretrained generator model. The batch size is the number of samples required, which is 2000 in our case. We generate a total of 20000 configurations over the different λ values for the case where both phases are included. For interpolation and extrapolation at critical λ values we use the same sampling procedure.

Testing on Λtr Set
We do this analysis on the Λ_tr set to confirm that the C-GAN model has correctly learned the training data distribution. In this ensemble we calculate σ̄ for each lattice configuration and then plot the histogram of |σ̄|, as shown in Figure 3. Different peaks in the histogram roughly correspond to different λ values. The histograms generated from the proposed C-GAN model and from HMC overlap quite well, indicating that the distribution p̂(σ|λ) represented by the C-GAN approximates the true distribution on the Λ_tr set. Also, in Figure 4 we plot the ensemble average ⟨|σ̄|⟩ for the Λ_tr set: we take the ensemble average ⟨|σ̄|⟩ for each λ separately and plot ⟨|σ̄|⟩ vs λ. It shows that the observables match well between the C-GAN and HMC ensembles for Λ_tr, i.e., the set used during training.

Testing on Λts Set
Since our main goal is to generate lattice ensembles in the critical region, we must evaluate our C-GAN model on Λ_ts. We assess the performance of the proposed C-GAN in terms of its ability to produce observables matching those obtained from the true distribution (i.e., generated by HMC).
Mean |σ̄|: In Figure 6 we observe that the histogram of |σ̄| matches the true histogram obtained from HMC samples quite well, even for Λ_ts. Different peaks at high |σ̄| values roughly represent different λ values; no distinct peaks are visible at low |σ̄| values, since there the peaks overlap. We also present the histograms of |σ̄| in Figures 7 and 8 for λ ∈ {1.5, 1.6}, which lie in the critical region where we did not train the model. In Figure 9 we present the results for the mean |σ̄| in the critical region. We can see that the phase transition behaviour is described very well by the generator model.

Susceptibility (χ): We show the susceptibility values obtained from HMC configurations as well as those obtained from the C-GAN in Figure 5 for the non-critical dataset. One can observe that the peaks coincide for HMC and C-GAN. The same plot for the critical dataset is shown in Figure 10. We find that in the critical region both the mean |σ̄| and the susceptibility agree quite well with the HMC results, even without training in that region. This is a good indication that the trained model can reproduce the second-order phase transition of the lattice GN model. In Figure 11 we show the autocorrelation time obtained from HMC simulations with unit trajectory length in the MD steps, keeping the acceptance rate ≈ 80%. We see that near the phase transition point the autocorrelation time increases sharply. During sampling from the C-GAN model, however, we start from a random Gaussian noise vector to generate each lattice configuration. Therefore the lattice configurations generated by the C-GAN model are independent of each other, which solves the critical slowing down problem. In this way we can generate uncorrelated samples near the critical region at the cost of generating samples with HMC in the non-critical region.
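The integrated autocorrelation time quoted here can be estimated from the Markov-chain time series of any observable. A simple estimator sketch (the windowing rule, stopping at the first negative autocorrelation estimate, is our own simplification of standard practice):

```python
import numpy as np

def integrated_autocorr_time(series, window=None):
    """Estimate tau_int = 1/2 + sum_t rho(t) for a chain observable,
    where rho(t) is the normalized autocorrelation at lag t."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    if window is None:
        window = n // 4
    var = np.dot(x, x) / n
    tau = 0.5
    for t in range(1, window):
        rho = np.dot(x[:-t], x[t:]) / ((n - t) * var)
        if rho < 0:          # stop once the estimate drops into the noise
            break
        tau += rho
    return tau
```

For independent samples (as produced by the C-GAN) the estimate is close to the minimal value 1/2, while a strongly correlated chain gives a much larger value.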

E. Numerical experiment with data from a single phase
We also train the C-GAN model using an HMC-generated dataset consisting of λ values from only a single phase. This experiment is necessary to check our model's utility in lattice gauge theory, where extrapolation to the critical point from one direction of parameter space is necessary. The results are shown in Figures 12 and 16. We observe that the histogram and the mean |σ̄| match the HMC results quite well for Λ_ts^1ph, where the critical points are included. Also, in Figures 13 to 15 we show the individual histograms of |σ̄| for λ = 1.5, 1.55, 1.6. We find that for the critical λ values the observables do not differ whether we train the model on a single phase or on both phases.

F. ABLATION ANALYSIS
We perform an ablation analysis to see the effect of certain key components of the proposed method on its performance.
Transformation of σ field: We find that the log transformation of Equation (18) is one of the crucial components for the training of the C-GAN model. On removing it, the training loss becomes high and the observables do not agree well with the HMC observables, as can be seen from Figures 17 to 19. There is a large deviation of the mean ⟨|σ̄|⟩ from the HMC results in both the critical and non-critical regions, as shown in Figure 17. The susceptibility is observed to be very sensitive to the log transformation, as shown in Figure 18. It is also seen that without Min-Max scaling the C-GAN model is unable to learn the different modes corresponding to different λ values.
Periodic Boundary Condition: We have noticed that the C-GAN performs well using periodic padding in both the discriminator and the generator, as seen from Figures 6 to 10. But when we remove the periodicity from both discriminator and generator, the C-GAN fails to reproduce the HMC results. Figures 20 and 21 show the disagreements between the C-GAN and HMC results for ⟨|σ̄|⟩ and the susceptibility χ respectively, and Figure 16 compares the histograms of |σ̄| without periodicity in the C-GAN for λ = 1.5. Likewise, applying periodic padding only to the discriminator and not to the generator also degrades the performance.

G. COST ANALYSIS
Sampling a fixed number of configurations via the HMC algorithm depends on various parameters, like the MD step size, the number of MD steps, the parameter values of the action, and also the hardware used for the simulations. Here we have used MD step size 0.1 and 10 MD steps. The hardware used for the HMC simulations is a six-core i7-9700 CPU machine. In the non-critical region we generate around 10000 configurations per hour, discarding 10 intermediate configurations between saved ones. In the critical region we discard 20 intermediate configurations and are able to generate roughly 4000 configurations per hour. The training of the C-GAN model was done on a single GPU machine (GeForce RTX) in 2-3 hours (we used TensorFlow 2.4 for our model implementation). Once the training is over, sampling lattice configurations becomes very efficient: it takes only about 2 minutes to generate 8000 lattice configurations. These gains look significant, and we expect them to become more significant in higher dimensions, where the autocorrelation of HMC simulations near the critical region is more severe.

VII. SUMMARY & CONCLUSION
MCMC methods are generally used to generate lattice configurations, as they give theoretical guarantees on the validity of the samples. In this work we use GANs, which do not give such theoretical guarantees, but the empirical results show that they are able to efficiently interpolate as well as extrapolate into critical regions. In lattice field theory, the cost of generating lattice configurations by MCMC methods is severely affected by critical slowing down as the lattice parameters are tuned towards the critical region. At the critical point the cost of HMC simulation diverges for theories like QCD due to the diverging autocorrelation time. Therefore, generating configurations in the critical region of a lattice field theory is a challenging task. This paper proposes to use HMC-generated configurations of the GN model away from the critical region to train a C-GAN, conditioned on the parameter λ, that generates lattice configurations near the critical point. For evaluation of the proposed C-GAN model in the critical region, we compared a few observables computed on samples generated by both the C-GAN and HMC. We found good agreement between the results of HMC and our C-GAN model, and also observed that the phase transition is reproduced very well by the generative approach. Since the C-GAN model gives the correct critical behaviour in the critical region, we can infer that our generative model is a good interpolator there. Since the C-GAN generates independent configurations, there is no correlation in the samples generated by the model, thereby avoiding the critical slowing down problem. In this work we evaluated our proposed C-GAN model by comparing observables with HMC samples. We could also use the C-GAN model distribution p̂(σ|λ) as a proposal distribution to construct a Markov chain, as done in [13]. However, to construct such a Markov chain we must know the proposal density explicitly, which is not available for GANs.
Rather, in this work we have accepted all the samples generated by the C-GAN model, and thus the autocorrelation vanishes. However, we could construct a Markov chain in the future in light of recent developments regarding density estimation for GANs in the ML community. One such method could be Flow-GAN [37], which explicitly estimates densities for a GAN by replacing the generator network with an invertible flow network. Another method could be the Roundtrip method [38], which estimates the density approximately. There are also other ML architectures, like conditional normalizing flows, which estimate densities explicitly for generated samples. These are a few possibilities in ML architectures that could be used for an MCMC accept/reject step, and we are planning to work in these directions. Although the problem of critical slowing down is not as severe for the GN model in 1+1 dimensions, building and testing the C-GAN on the GN model establishes its applicability in the lattice formulation of fermionic systems. In this work we dealt only with fermionic fields without any gauge interaction. Extending our work to lattice gauge theory and QCD will be an interesting as well as challenging task.