Detecting Nematic Order in STM/STS Data with Artificial Intelligence

Detecting subtle yet phase-defining features in Scanning Tunneling Microscopy and Spectroscopy data remains an important challenge in quantum materials. We meet the challenge of detecting nematic order from local density of states data with supervised machine learning and artificial neural networks (ANNs) in the difficult scenario without sharp features such as visible lattice Bragg peaks or Friedel oscillation signatures in the Fourier transform spectrum. We train the ANNs to classify simulated data of isotropic and anisotropic two-dimensional metals in the presence of disorder. The supervised machine learning succeeds only with at least one hidden layer in the ANN architecture, demonstrating that this problem sits at a higher level of complexity than detecting nematic order from Bragg peaks, which requires just two neurons. We apply the finalized ANN to experimental STM data on CaFe2As2, and it predicts nematic symmetry breaking with 99% confidence (probability 0.99), in agreement with previous analysis. Our results suggest ANNs could be a useful tool for the detection of nematic order in STM data and a variety of other forms of symmetry breaking.
I. INTRODUCTION
Scanning tunneling microscopy (STM) and spectroscopy (STS) data are difficult to fit to theory. These experiments visualize the surface electronic structure with atomic-scale spatial resolution. They do so by bringing a metal tip near the sample surface to allow electron quantum tunneling under a bias voltage. The resulting tunneling current is a function of tip position and applied voltage, from which the local density of states (LDOS) of the sample is measured and can be compared with simulations for model interpretation following solid-state theory. For instance, impurity scattering of electrons on the surface of a metal may produce a standing-wave pattern, commonly referred to as quasi-particle interference, that depends on the momentum transfer across the Fermi surface 1,2 . By interpreting the Fourier transform of the quasi-particle interference pattern, we can therefore map out the electronic Fermi surface and the underlying phases and symmetries of the system. Some other electronic properties, such as the presence of a spectral gap 3 , also have smoking-gun features. However, it remains hard to connect STM experimental data to idealized models and tractable theories. For example, the inhomogeneous behavior found in strongly correlated materials can lead to complex interference and render single-impurity quasi-particle interference analysis irrelevant. In addition, measurement uncertainty, noise, and finite resolution are usually hard to account for and pose substantial difficulty for a smoking-gun judgment in an unbiased theoretical match, especially in close comparisons between competing hypothetical models.
Recently, machine learning techniques have seen widespread adoption and increasing utility in condensed matter physics as a new route for data analysis and model building 4,5 . Machine learning is a branch of artificial intelligence in which systems learn from data, identify patterns, and make decisions with minimal human intervention. These capacities align with various routine goals and challenges in condensed matter physics: connecting detailed microscopic models with qualitative universal features. Indeed, after training on simulated data from diverse microscopic models categorized into a series of classes following respective hypothetical claims, artificial neural networks (ANNs) can extract information from 'big' STM experimental data and determine the characteristic symmetries of realistic electronic quantum matter 6 .
The recent trend of applying machine learning techniques to condensed matter physics, beginning with its use in density functional theory 7-10 and its extension to strongly correlated electron models 5,11 , suggests a new route to extracting information from STM data 6,12 . Specifically, one can train artificial intelligence (AI) architectures such as ANNs on simulated data from diverse microscopic models to capture a macroscopic phase-defining feature of interest. Then we can ask this AI for its judgment on a realistic data set that may or may not exhibit this phase-defining feature. By using simulated data sets with rich and detailed microscopic information, the ANN can extract the feature even when it manifests differently under different microscopic settings. Intriguingly, much like following a renormalization group flow, through machine learning the ANN automatically summarizes the relevant phase-defining features 4 .
With this in mind, consider the case of detecting nematic order in STM data. Nematic order describes the onset of discrete anisotropy that breaks the system's four-fold rotation symmetry C 4 down to two-fold rotation symmetry C 2 . Detecting nematic order in STM or STS data can sometimes become a challenge 13 . One origin of the challenge is instrumental, as an anisotropic output could originate from the STM metal tip instead of the sample. For example, the claims of nematic order in Bi 2 Sr 2 CaCu 2 O 8+x following analysis of Bragg peaks [13][14][15] were questioned 16 until evidence of nematic domains was later discovered 17 . Another difficulty arises in the absence of sharp features such as Bragg peaks, which have helped to establish the presence of nematic order in CaFe 2 As 2 18,19 . Further, a large amount of disorder, poor spatial resolution, and a limited field of view add to the complications. Even though we may vaguely feel that the LDOS pattern 'seems nematic' when such anisotropy is strong, a quantitative analysis is lacking in general.
Here we revisit the challenge of detecting nematic order in the absence of sharp features from a machine-learning perspective. We choose our training sets to be simulated STM images representing the LDOS of various tight-binding models on a two-dimensional square lattice in the presence of various types of impurities. To provide a range of dispersions and Fermi surfaces, we also vary the hopping amplitudes in the tight-binding models, which are separated into two classes according to the presence or absence of global four-fold rotation symmetry. We limit the lattice sites and LDOS pixels to one per unit cell, so there are no meaningful Bragg peaks. The Friedel oscillation signatures are also lost when the density of impurities is too large or the system size is too small, and traditional analysis fails to identify nematic order. Using supervised machine learning, we succeed in training an ANN architecture with one hidden layer of many neurons to separate the two symmetry classes with high accuracy. By contrast, we show analytically that only two hidden neurons are necessary if the data supports Bragg peaks. Finally, we input the realistic STM data of CaFe 2 As 2 19 into a successfully finalized ANN, and its output acknowledges nematic symmetry breaking with 99% confidence (probability 0.99), in support of the claims in Ref. 19. This result is especially remarkable given the longer length-scale variations of the experimental data set compared to the simulated data set. We note that by adding an extra category and training set devoted to anisotropic-metal-tip scenarios, it may be possible to train the artificial intelligence to distinguish the microscopic source and mechanism of the anisotropy, thus eliminating the potential bias from instrumental anisotropy as well. However, this is beyond the scope of the current paper and is left to future work.
Our results therefore further demonstrate the utility of ANNs for STM data analysis and their capacity to capture phase-defining universal physics from abundant microscopic information.
The rest of the paper is organized as follows: in the next section, we present our model setup for simulated LDOS data, the architecture of the ANN, and the supervised machine learning algorithm. The training results are discussed and compared with traditional approaches in Sec. III. In Sec. IV, we study the application of our ANN on STM experimental data in CaFe 2 As 2 . Sec. V is our conclusion and future outlook. There are several appendices which support the methods and results presented in section II.

A. Models and data set generation
A simple model of a nematic ordered material is the square lattice tight-binding model with anisotropic hopping,

H_0 = Σ_{r,α} t_α (c†_{r+α} c_r + h.c.) − μ Σ_r c†_r c_r,

where α ∈ {x̂, ŷ} and r + α denotes one of the square lattice sites nearest to the site r. At the appropriate filling, set via a choice of the chemical potential μ, this model roughly emulates the band structure expected of an overdoped cuprate superconductor in its normal state. Further, the hopping parameters t_α characterize the spatial symmetry: a difference between the horizontal bond t_x and the vertical bond t_y introduces the nematic order. The specific choices of model parameters are presented in Appendix A. For our purposes, however, this model is too simple to yield a non-trivial local density of states (LDOS) N_r(ω) (defined below), since it is invariant under translation: the LDOS is spatially isotropic even with hopping t_x ≠ t_y. In reality, there are further complications due to sub-unit-cell structure factors and impurities, which generate more information as well as challenges. Here, we focus on the latter and add to the Hamiltonian H = H_0 + H_imp the impurity terms

H_imp = Σ_{r,α} δt_{rα} (c†_{r+α} c_r + h.c.) + Σ_r δμ_r c†_r c_r,

where δt_{rα} and δμ_r are finite only at a few locations and characterize the strength of on-bond and on-site local quenched disorder, respectively. The settings of the random disorder, including its density and distribution, are also presented in Appendix A. For a given Hamiltonian H, we compute N_r(ω) via the imaginary part of the Green's function,

N_r(ω) = −(1/π) Im ⟨r| (ω − h + iε)^{−1} |r⟩,

where h is the matrix entering the full Hamiltonian in vector notation, H = c† h c, the states |r⟩ select a matrix element of h such as ⟨r| h |r+α⟩ = t_α + δt_{rα}, and ε = 10^{−5} is a small imaginary part characterizing the width of the energy level. Using different parameters for hopping, chemical potential, and impurities, we generate about 5,000 images of N_r(ω) for the anisotropic data set (t_x ≠ t_y) as well as for the isotropic data set (t_x = t_y). 
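As an illustration of this Green's-function procedure, the sketch below computes N_r(ω) numerically for a nearest-neighbor version of the model with on-site impurities. The system size, broadening, and function names here are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def ldos(L, tx, ty, mu, omega, impurities, eta=1e-3):
    """Sketch: LDOS N_r(omega) of a nearest-neighbor square-lattice
    tight-binding model on an L x L periodic lattice, computed from the
    diagonal of the Green's function G = (omega - h + i*eta)^(-1).
    `impurities` maps a flattened site index to an on-site shift dmu."""
    N = L * L
    h = np.zeros((N, N))
    idx = lambda x, y: (x % L) * L + (y % L)   # periodic boundaries
    for x in range(L):
        for y in range(L):
            i = idx(x, y)
            h[i, idx(x + 1, y)] = h[idx(x + 1, y), i] = tx
            h[i, idx(x, y + 1)] = h[idx(x, y + 1), i] = ty
            h[i, i] = -mu
    for site, dmu in impurities.items():       # quenched on-site disorder
        h[site, site] += dmu
    G = np.linalg.inv((omega + 1j * eta) * np.eye(N) - h)
    return (-1.0 / np.pi) * np.imag(np.diag(G)).reshape(L, L)
```

Without impurities the resulting map is spatially uniform, as the text notes; adding a single on-site impurity produces the position dependence the classifier is trained on.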
The resulting data set then contains the diverse ways of the impurity generated anisotropy in both the nematic and the isotropic systems. In the following, we attempt to coarse grain the detailed and big data in our data sets and summarize the essence of nematic order using ANNbased AI method, and then generalize its understanding beyond simulations to real experimental data.
FIG. 1. Neural networks with the LDOS Nr(ω) as the inputs x; the outputs y1 and y2, normalized with a sigmoid function, give the probabilistic ANN judgments on the input image being isotropic or nematic, respectively. Each neuron processes its inputs x and outputs y = σ(W·x + b), where W are the weights associated with each of the inputs, b is a bias, and σ is a sigmoid function. (a) A neural network with no hidden layer, and (b) a neural network with one hidden layer.

B. Artificial neural network with no hidden layer
A fully-connected, feed-forward ANN consists of sequential layers of neurons. Each neuron processes the inputs from all the neurons of the previous layer according to the associated parameters, known as the weights W and the bias b, and passes its outcome y to the trailing neurons:

y = σ(W·x + b),

where x is the input and σ is the sigmoid function σ(x) = 1/(e^{−x} + 1). Thanks to existing, efficient optimization algorithms, this architecture and its descendants are highly popular for AI applications. For instance, Fig. 1(a) illustrates a simple architecture for an ANN with no hidden layer. We can think of conventional measures of nematic order from the ANN perspective. Conventionally 13-15 , a nematic order parameter can be defined as a signature of the anisotropy,

O_N = Σ_r [cos(Q_x·r) − cos(Q_y·r)] N_r(ω),

where Q_x = (2π/a, 0) and Q_y = (0, 2π/a) are the wave vectors of the two Bragg peaks related by a 90-degree rotation. Essentially, such a treatment is a Fourier transform of the STM image and is linear in the input data N_r. So we can interpret this as an ANN with a hidden layer of just two neurons, one to detect if O_N > 0 and the other to detect if O_N < 0. This is achieved by first setting x = N_r, b = 0, and W = cos(Q_x · r) − cos(Q_y · r). Then the output of the desired hidden neurons is

y_{1,2} = σ(±O_N).

Of course, we do not take O_N as directly meaningful, but instead compare it to a noise floor where the data is isotropic. These neurons are therefore equivalent to the sigmoid function 0 ≤ σ(±(O_N − |b|)/|b|) ≤ 1, where |b| is the value of |O_N| in a noisy isotropic image, and we declare an image anisotropic if either neuron fires. The output of these two hidden neurons is then fed to two output neurons, one acting as an "or" gate, which fires if either y 1 or y 2 is positive, and the other as a "nor" gate, which does the opposite. So, the problem of detecting nematic order in the presence of Bragg peaks is captured by an ANN with just two hidden neurons.
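This two-neuron construction can be sketched directly. The grid size, noise-floor handling, and names below are illustrative; the image is assumed to be sampled finer than the lattice constant a so the Bragg peaks are resolvable.

```python
import numpy as np

def nematic_ON(image, a):
    """O_N = sum_r [cos(Qx.r) - cos(Qy.r)] * N_r with Qx = (2*pi/a, 0)
    and Qy = (0, 2*pi/a); pixel spacing is 1, lattice constant a pixels."""
    nx, ny = image.shape
    x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    W = np.cos(2 * np.pi * x / a) - np.cos(2 * np.pi * y / a)
    return float(np.sum(W * image))

def two_neuron_detector(image, a, noise_floor):
    """The two hidden neurons fire on O_N above (+) or below (-) the
    noise floor |b|; the "or" gate declares the image anisotropic if
    either hidden sigmoid exceeds 1/2."""
    ON = nematic_ON(image, a)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    y1 = sigmoid((ON - noise_floor) / noise_floor)
    y2 = sigmoid((-ON - noise_floor) / noise_floor)
    return max(y1, y2) > 0.5
```

A modulation at Q_x but not Q_y fires the first neuron; a featureless image fires neither.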
In the absence of Bragg peaks, the situation appears more complicated. In this case, disorder is necessary to observe anisotropy. But it can also be a curse. A local impurity can scatter electrons and generate Friedel oscillations, which spread out anisotropically away from the impurity in an anisotropic system. These oscillations are detectable in the Fourier transform of the LDOS N r (ω). However, complications arise when the field of view is small or the density of impurities is large. In this case, the conventional direct study of Friedel oscillations breaks down (see Appendix B for the pros and cons of using Friedel oscillations to measure anisotropy).
So, in contrast to conventional approaches that study Friedel oscillations or Bragg peaks, we begin our search for a neural network capable of identifying anisotropy in a data set without Bragg peaks and without clear Friedel oscillation signatures due to multiple impurities. We start with the warm-up problem of an ANN with no hidden layer. Then we add a single hidden layer, which is necessary even for the simpler problem of detecting nematic order in data with Bragg peaks as discussed above, and attempt to classify a hard data set where conventional approaches struggle.
We use the TensorFlow package from Google for the neural network calculations. We have 256 input neurons representing the LDOS N r at each of the sites on the 16 × 16 lattice. We also have two output neurons with normalized outputs y 1 + y 2 = 1. The outputs y 1 and y 2 represent the ANN's probabilistic predictions of the nematic and the isotropic phases for the input x = N r , respectively. The corresponding weight W is a 256 × 2 matrix, and the bias b, the threshold above which a neuron fires, is a 1 × 2 vector. Fig. 1(a) shows an example of such an ANN. We use a supervised machine learning algorithm to train the ANN, optimizing the weights and biases with the gradient descent method so that the outputs are as consistent as possible with the true nematic and isotropic classifications of the training data sets (see Appendix C for details).
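The forward pass of such a no-hidden-layer network reduces to one affine map plus a normalization enforcing y 1 + y 2 = 1. A minimal numpy sketch (the paper uses TensorFlow; names and the softmax normalization are illustrative):

```python
import numpy as np

def forward_no_hidden(x, W, b):
    """Sketch of the no-hidden-layer classifier: 256 LDOS inputs, two
    outputs normalized so y1 + y2 = 1. Shapes follow the text: W is
    256 x 2 and b has length 2."""
    z = x @ W + b                       # logits, shape (2,)
    z = z - z.max()                     # numerical stability
    y = np.exp(z) / np.exp(z).sum()     # y[0]: P(nematic), y[1]: P(isotropic)
    return y
```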
Remarkably, we obtain at most a 53% accuracy in distinguishing the data sets. The ANN with no hidden layer has therefore learned essentially nothing and is hardly better than coin flipping. The residual cross entropy loss (see Appendix C) stays above 0.72, indicating an inconsistency between the ANN predictions and the true classifications. Neither prolonged training nor an increased number of epochs (iterations through the data set) helps. We note that a neural network with no hidden layer is limited to linear expressibility and regression. Therefore, it seems a non-linear data analysis scheme is required when the data lacks Bragg peaks. Indeed, our discussion of O N above shows that even in the presence of Bragg peaks, detecting nematic order requires at least two hidden neurons. We also see that detecting order is non-linear in the order parameter (|O N | 2 > 0), while an ANN without a hidden layer is linear 20 . So it would be surprising if an ANN without a hidden layer could detect nematic order.
In the following, we use a slightly more advanced ANN architecture that is capable of non-linear expressibility, and re-examine the supervised machine learning using the same data sets.

C. Artificial neural network with one hidden layer
We can allow non-linearity by adding a hidden layer to the ANN between the 256 input neurons and 2 output neurons. The hidden layer consists of 31 sigmoid neurons. Fig. 1(b) is an illustration of an ANN with a single hidden layer. The hidden layer neurons are fully connected to the input neurons through the 256 × 31 weight matrix W, and the output neurons are fully connected to the hidden layer neurons through the 31 × 2 weight matrix V. We also introduce a 1 × 31 vector b and 1 × 2 vector a as the biases for the hidden layer neurons and output neurons, respectively. We address the ANN architecture with a single hidden layer and its training algorithm in Appendix D.
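A sketch of the corresponding forward pass, using the layer shapes quoted above (implementation details such as the output normalization are illustrative):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward_one_hidden(x, W, b, V, a):
    """Sketch of the one-hidden-layer forward pass with the shapes from
    the text: W 256 x 31, b length 31, V 31 x 2, a length 2. The hidden
    sigmoids supply the non-linearity the no-hidden-layer net lacks."""
    h = sigmoid(x @ W + b)              # 31 hidden activations
    z = h @ V + a                       # two output logits
    z = z - z.max()                     # numerical stability
    return np.exp(z) / np.exp(z).sum()  # normalized (y1, y2)
```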
We are able to achieve a satisfactory 95% accuracy with the ANN with a single hidden layer before it saturates. Additional preliminary studies suggest that further tuning of the hyper-parameters, such as the learning rate, regularization, and number of neurons, can increase the accuracy and bring the loss down even more, to nearly 100% accuracy. Hence, the addition of one hidden layer enables the network to identify nematic order. In the following, we will focus on an ANN with the more generally reachable 95% accuracy, which is already a sufficiently good demonstration of classification.

D. Sensitivity of the one hidden layer AI
To get a sense of the power of the ANN, in Fig. 2 we plot two example images taken from our database, one labeled nematic (hopping t x ≠ t y ) and the other labeled isotropic. Both images seem difficult to label by eye. But the ANN has 69% confidence that the labeled nematic image is nematic and that the labeled isotropic image is isotropic (i.e., y 1 = 0.69 or y 2 = 0.69, respectively). So it appears to be detecting subtle correlations associated with either case. But it is difficult to extract from this whether or not it is actually detecting nematic order. It could be detecting some other correlations that arise in our microscopic model when t x ≠ t y that are unrelated to the symmetry breaking.

FIG. 3. (…) an image of the weights associated with the hidden neuron which most negatively contributed to the isotropic neuron (large, negative element in the V matrix). (Bottom) An image of the weights associated with the hidden neuron that had relatively little effect on the isotropic neuron (small element in the V matrix). Identifying whether any of these images has nematic order seems as hard as the original problem of identifying nematic order in an STM image.
Let us go one level deeper to see whether we can understand what correlations are being detected by the ANN, by examining the interactions among the neurons. Each hidden neuron weighs the pixels of a simulated STM image differently; in essence, each is looking for a feature in the data. The output neurons then weigh each of the hidden neurons. They decide to fire if certain features are found by the hidden layer neurons: features the anisotropic neuron looks for to identify anisotropy, and features the isotropic neuron looks for to identify isotropy. In Fig. 3, we present three images based on this reasoning. These images depict the weights used by a hidden neuron in assessing an input simulated STM image. The hidden neurons we present are those that contribute the most, positively and negatively, to whether or not the isotropic neuron fires, and one which contributes little to the decision; namely, they have the highest-weight connections to this neuron or the lowest-weight connection. Remarkably, there is not much difference in the nematic order among these three weight images, at least as far as we can see by eye. Indeed, they appear not so different from the weakly nematic images shown in Fig. 2. So determining whether the nematic neuron is indeed weighing the data nematically is as hard a problem as determining nematic order in a weakly nematic image to begin with. Since this is our original task, we cannot directly affirm whether the hidden neurons are assessing the right correlations in the data. But the weight images of Fig. 3 are similar to the weakly nematic images of Fig. 2, which suggests the criterion is far from linear (or simple), or it would be dominated by fewer hidden neurons. The inscrutable images of Fig. 3 show the network is not just searching for particular patterns; the decision involves an interplay of hidden neurons, and therefore correlations between the patterns. 
So, while not direct evidence, this suggests the hidden neurons are not keying on spurious signals but instead on subtle nematic signals that are hard to detect.
Finally, we supply a real STM data set taken on CaFe 2 As 2 , an iron-based superconductor, to our AI with one hidden layer. This will check whether it has learned to identify anisotropy beyond that arising from simulations of Friedel oscillations off impurities in overdoped cuprates.
In Refs. 18,19 , anisotropy was observed in STM data on CaFe 2 As 2 in several ways. It is strikingly apparent in a Fourier analysis known as quasiparticle interference 18 (similar to our study in Appendix A). It is also apparent in the autocorrelation function of a given image which has a different correlation length in the two directions 19 . The conclusion of these references is that the data appears to look like leaves fallen randomly on the ground (no positional order that would produce Bragg peaks) but where all the leaves point in the same direction. But, as shown in Figure 4, the anisotropy is not so easy to detect by eye and these results are unique to this compound.
Let us now seek additional evidence for anisotropy by passing this data to our AI with one hidden layer. Surprisingly, it claims the image is indeed anisotropic with 99% confidence, even though it looks very little like the simulated images. We obtained this result as follows. We passed numerous 16 × 16 sub-images, such as that in the inset of the top panel in Fig. 4, to the AI and then assessed the statistics of the predictions. It is remarkable that the AI is so confident given that these images have only long-wavelength information and are smooth on the pixel scale, unlike the images the AI was trained to analyze. So we tested the conclusion by sampling the image on a larger scale, such as every two pixels or every three pixels. On each scale, the AI remained confident the image is anisotropic. In this way it appears to have learned the meaning of anisotropy away from the specific microscopic mechanisms used to create the training set, i.e., the AI appears to follow a renormalization group flow.

FIG. 4. (Top) Experimental STM data on CaFe 2 As 2 (from Fig. 2b of Ref. 19) converted to a monochrome color scale. (Top inset) A typical 16×16 sub-image of the data that can be fed to the AI. Many such sub-images were used to analyze the data. (Bottom) The activity of the hidden neurons after receiving the 16×16 sub-image presented in the inset. Many more orientation-seeking neurons fire than isotropy-seeking neurons, explaining why the AI believes with 99% confidence the image is nematic.
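The sub-image extraction and coarser-scale sampling described here can be sketched as follows; the patch size and stride handling are illustrative.

```python
import numpy as np

def subimages(data, size=16, stride=1):
    """Sketch of the sub-sampling test described above: keep every
    `stride`-th pixel of a larger STM map (probing coarser length
    scales), then cut the result into non-overlapping size x size
    patches that can each be fed to the classifier."""
    coarse = data[::stride, ::stride]
    nx, ny = coarse.shape
    return [coarse[i:i + size, j:j + size]
            for i in range(0, nx - size + 1, size)
            for j in range(0, ny - size + 1, size)]
```

Running the classifier over the patch lists at stride 1, 2, 3, ... and comparing the prediction statistics reproduces the multi-scale test described in the text.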

III. OUTLOOK
We have trained artificial neural networks with zero and one hidden layer to detect nematic order in the LDOS of materials. The supervised machine learning based on simulated data sets works only after a hidden layer is incorporated into the ANN to enable it to detect correlations in the data. Remarkably, we find that the ANN is sufficiently sensitive that it may even be better than the human eye at identifying the nematic order. Finally, we apply our ANN architecture to real STM data and obtain an anisotropic response, consistent with previous consensus. This result is robust against sampling the data at different length scales suggesting our ANN is able to follow a renormalization group flow 21,22 from microscopic considerations to the global notion of nematic order.

IV. ACKNOWLEDGEMENTS
We thank Seamus Davis and Eun-Ah Kim for illuminating discussions. MJL acknowledges support in part by the National Science Foundation under Grant No. NSF PHY-1125915. MJL acknowledges the kind hospitality of KITP during the preliminary stages of this work. YZ acknowledges support from the Bethe fellowship at Cornell University.

Appendix A: Creating a database of nematic and isotropic STM images
We generate a database of STM images from spinless fermions hopping on the square lattice as discussed in Section II A of the main text. The database is characterized by the following parameters: t α The uniform hopping strength in the α ∈ {x, y, x + y, −x + y, 2x, 2y} directions. t x and t y vary from 1.1 to 1.5 eV. t x+y and t −x+y vary from 0.6 to 0.8. t 2x and t 2y are fixed at 0.3.
µ Chemical potential which sets the filling of the Fermi sea. This is taken to be 1.0 eV (sizably overdoped).
δt rα This is a distortion of the uniform hopping between sites r and r + α where α refers to the same set of first, second and third neighbor bonds as for uniform hopping. They are taken to be δt rx = δt ry = 0.15, δt rx+y = δt r−x+y = 0.1 and δt r2x = δt r2y = 0.05 with r a randomly chosen site.
δµ r A chemical potential variation on the site r. This is taken to be 0.2 with r a randomly chosen site.
ω The energy of the states the STM is trying to tunnel electrons into. This effectively changes the chemical potential µ. It ranges from -0.05 to 0.05.
All of these parameters were varied over the ranges discussed above to generate 5940 images, 5400 of which were used for training and the rest for validating. A typical image had between 8 and 52 impurities (3% to 20% impurity concentrations). Note: a fresh data set was generated after all hyperparameters of the neural network were fixed to ensure the resulting accuracy was not biased by the choice of these parameters.
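A sketch of drawing one model from the parameter ranges listed above; the paper does not specify the sampling distribution, so the uniform draws and function names below are assumptions, with only the quoted ranges taken from the text.

```python
import random

def sample_model(nematic, rng=None):
    """Hedged sketch: draw one set of model parameters from the ranges
    listed in this appendix. Uniform sampling is an assumption."""
    rng = rng or random.Random(0)
    tx = rng.uniform(1.1, 1.5)
    ty = rng.uniform(1.1, 1.5) if nematic else tx
    return {
        "tx": tx, "ty": ty,                  # nearest-neighbor hopping, eV
        "t_diag": rng.uniform(0.6, 0.8),     # second-neighbor hopping
        "t2": 0.3,                           # third-neighbor hopping, fixed
        "mu": 1.0,                           # chemical potential, eV
        "omega": rng.uniform(-0.05, 0.05),   # STM bias energy
        "n_imp": rng.randint(8, 52),         # impurity count on 16 x 16
    }
```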
Finally, we present the Fermi surface and band structure for a typical model in the absence of impurities in Fig. 5. The range of such band structures of the set of models defined above did not change by much more than the width of the line used in the plots.
FIG. 5. Typical band structure and Fermi surface of the clean system behind the simulated data set described in this appendix. The Fermi surface is a circle slightly warped by the lattice.

Appendix B: Comparison with quasiparticle interference techniques
As a comparison with the machine learning approach, we study in this appendix the possibility of identifying nematic order through conventional methods applied to the local density of states (LDOS) of metallic systems. The problem is complicated by local impurities, which inevitably break the symmetries of the original pristine system. On the other hand, the presence of impurities allows us to focus on the quasi-particle interference behavior in the Fourier transform of the LDOS, which is detectable with STM. A comparison between the peak structures along the high-symmetry directions x̂ and ŷ allows us to determine whether the symmetry connecting them is present or broken by the nematic order. In the following, we discuss the scenarios where this method fails to yield conclusive results and where the introduction of the machine learning approach becomes genuinely constructive.
For concreteness, we consider a tight-binding model on a two-dimensional square lattice,

H = −Σ_{r,α} t_α (c†_{r+α} c_r + h.c.) − μ Σ_r n_r + Σ_{r∈R_w} w_r n_r,

where R_w is a set of positions with quenched disorder and w_r ∈ [−W, W] is the on-site disorder strength. We also study a system of size L × L with periodic boundary conditions in both the x̂ and ŷ directions. The corresponding LDOS is obtainable through the Green's function,

ρ(r, ω) = −(1/π) Im ⟨r| (ω − h + iδ)^{−1} |r⟩,

where δ = 0.01 is a small imaginary part that introduces a finite width for each energy level. We set t x = 1.0, t y = 0.5, μ = −2.5 for a nematic metal and t x = t y = 1.0, μ = −3.0 for an isotropic metal, with W = 2.0. The LDOS is then Fourier transformed into momentum space to obtain the amplitude ρ(k) at each wave vector.
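The Fourier-transform step can be sketched as follows; subtracting the mean (the k = 0 component) before transforming is an illustrative choice to keep the weak scattering peaks from being swamped by the uniform background.

```python
import numpy as np

def qpi_spectrum(ldos_map):
    """Sketch of the quasi-particle interference analysis: the magnitude
    of the 2D FFT of the real-space LDOS, with the mean removed so the
    ~2k_F scattering peaks stand out, and k = 0 shifted to the center."""
    rho = np.abs(np.fft.fft2(ldos_map - ldos_map.mean()))
    return np.fft.fftshift(rho)
```

Comparing slices of the returned array along the k_x and k_y axes is the peak comparison described in the text.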
In the presence of a single impurity (within the L × L lattice sites) and a large system size L = 100, the quasi-particle interference pattern has a clear connection to the Fermi surface and the symmetry of the model. This holds true even with a moderate amount of impurities, see Fig. 6. A straightforward comparison of the peak locations along the high-symmetry x̂ and ŷ directions reveals whether these two directions are physically equivalent, see Fig. 6(c) and (d). On a 100 × 100 system, the sharp ρ(k) features at ∼ 2k F persist even in the presence of 250 impurities, an occupancy of 2.5% of the total sites.
Unfortunately, there exist scenarios where this approach fails to meet our goal: (1) when the field of view is limited, which results in poor resolution after the Fourier transform, making any difference or discrepancy in k hard to identify; and (2) when the impurity density becomes overly large, so that the Fourier transform becomes too noisy to convey useful information. As examples, we show in Fig. 7 results of ρ(k) for both the nematic and the isotropic models with a larger density of impurities and a smaller field of view L × L, L = 20. Smoking-gun signatures like those in Fig. 6 are no longer available for a clear-cut judgement. Even in cases where vague signatures can be traced back to the model wave vectors, as in Fig. 7(c) and (d), it is difficult to isolate them from the other noisy peaks, especially when we do not have the answer in advance. Likewise, when we apply the Fourier transform to the sample data sets we used for machine learning, a meaningful, interpretable sharp-peak signature in ρ(k) is generally absent. One way to increase the signal-to-noise ratio is to average over disorder configurations, see Fig. 8; however, a large LDOS data set of the same sample or model is necessary to make this approach viable.
Therefore, we conclude that the Fourier transform of the LDOS data is only helpful in determining nematic order when we have a sufficient field of view with relatively sparse impurities, while the machine learning approach, which focuses on the original LDOS, is not limited to these scenarios. Another scenario where this Fermi-surface-sensitive scheme fails is when the broken symmetry resides in the Fermi velocities instead of the Fermi wave vectors at the specific Fermi energy. For instance, consider the following model Hamiltonian:

H = −Σ_r [t_x c†_{r+x̂} c_r + t_y c†_{r+ŷ} c_r + t̃_y c†_{r+2ŷ} c_r + h.c.] − μ Σ_r n_r,

where we set t x = 1.0, t y = 0.5, t̃ y = 0.167. The model is clearly nematic. On the other hand, the Fermi surface given by the dispersion ε_k = −2 cos(k x ) − cos(k y ) − 0.334 cos(2k y ) is close to isotropic at μ = −2.333, see Fig. 9(a). This leads to the absence of nematic behavior in the Fourier transform ρ(k) of the LDOS, see Fig. 9(c) and (d). In comparison, machine learning approaches based upon the original real-space LDOS data may thus look beyond the mere Fermi wave vectors.
Appendix C: Training no-hidden layer ANN
We trained the neural network using a gradient descent optimizer, aiming to maximize the accuracy. To do this we minimize a loss based on the cross entropy between the expected output y_ and the network output y. The cross entropy is defined as cross_entropy = reduce_mean(max(W.x + b, 0) − (W.x + b) * y_ + log(1 + exp(−abs(W.x + b)))), where W.x is the matrix-vector product of W and x, reduce_mean takes the mean over all dimensions and returns a tensor with a single element, and max(a,b) takes the greater of a and b. We calculate the cross entropy as a measure of the similarity between the predicted nematic order and the actual nematic order of the image. We then calculate the loss, which represents the missing information that would make the prediction correct. The loss accounts for the confidence the network has in its decisions, since the network outputs a probability of being nematic: it declares the image nematic if y is greater than one half and non-nematic otherwise for the nematic output neuron. The sigmoid non-linearity ensures that 0 ≤ y ≤ 1.
loss = reduce_mean(cross_entropy + 0.00001 * norm(W)) where norm(W) is the matrix norm of W, which we define as the sum over the magnitudes squared of all its matrix elements. Adding a small term proportional to this matrix norm is called "L2 regularization", which reduces over-fitting. Over-fitting occurs when the network accurately predicts the behavior of the training data but has poor accuracy on testing data due to over-specialization; it typically leads to large matrix elements in W.
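A minimal numpy sketch of this loss, using the numerically stable form of the sigmoid cross entropy given above (the regularization strength 0.00001 is the value quoted in the text; the function names are ours):

```python
import numpy as np

def stable_xent(logits, labels):
    """Numerically stable sigmoid cross entropy:
    max(z, 0) - z * y_ + log(1 + exp(-|z|)) for logits z and labels y_."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

def loss_fn(W, b, x, y_, l2=1e-5):
    """Mean cross entropy over the batch plus an L2 penalty on W."""
    logits = x @ W + b                       # W.x + b for each image in x
    return stable_xent(logits, y_).mean() + l2 * np.sum(W ** 2)
```

The stable form is algebraically identical to the textbook expression −[y_ log σ(z) + (1 − y_) log(1 − σ(z))] but avoids overflow for large |z|.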
We now define the learning rate, an important hyper-parameter used by the gradient descent optimizer together with the loss. It is a value that determines how quickly the W and b values are allowed to change. We step the learning rate down every 100 epochs, where an epoch is 200 batches of 50 images, since we want smaller changes to these values the closer we get to the loss minimum. We then feed the learning_rate into the gradient descent optimizer and have it minimize our loss. With no hidden layers it is straightforward to describe what this optimizer does: we first calculate the gradient of the loss with respect to W and b, then adjust W and b accordingly, which completes one iteration of the loop. We stop when we reach a desired accuracy, defined as the number of correct predictions the neural network makes on a validating set divided by the total number of predictions. We measure this accuracy by stochastically feeding 600 batches of images from a separate validating set into this basic accuracy formula, and we track accuracy and loss after each batch of training. We may tune the hyper-parameters and repeat the calculation until we are satisfied with the resulting accuracy. Finally, we measure the accuracy on a third set of data, the test data set, in the same way but without any further hyper-parameter tuning. The result is an optimized no-hidden-layer neural network ready to be analyzed.
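A sketch of such a stepped schedule together with a single gradient-descent update follows; the base rate and decay factor here are illustrative values we chose for the example, not the values used in the text:

```python
import numpy as np

def learning_rate(epoch, base=0.5, decay=0.5, step=100):
    """Stepped schedule: multiply the rate by `decay` every `step` epochs.
    (`base` and `decay` are illustrative, assumed values.)"""
    return base * decay ** (epoch // step)

def sgd_step(W, b, grad_W, grad_b, lr):
    """One gradient-descent update: move W and b against their gradients."""
    return W - lr * grad_W, b - lr * grad_b
```

Each training iteration computes the loss gradients for the current batch, looks up the current learning rate, and calls sgd_step, exactly as described above.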

Appendix D: Training single-hidden layer ANN
We also define z_ instead of y_ as the placeholder variable that holds the correct output class (nematic or isotropic) of the image. In the end, this neural network calculates y = sigmoid(W.x + b) and z = sigmoid(V.y + a). Hence, we have more parameters to optimize in a single-hidden-layer neural network, which should allow for better fitting of non-linearities. We also gain additional non-linearity, since the hidden layer applies the sigmoid to W.x + b before passing the result y on to the next layer. This enables the network to capture measures of anisotropy not possible in a linear network (such as anisotropy revealed by an autocorrelation function[Ref Milan Allan's Davis group paper on Ca122 superconductor]).
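The two-layer forward pass can be sketched in a few lines of numpy, with W.x written as the matrix-vector product W @ x:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, W, b, V, a):
    """Forward pass of the single-hidden-layer network:
    y = sigmoid(W.x + b), z = sigmoid(V.y + a)."""
    y = sigmoid(W @ x + b)   # hidden layer: non-linear in the input
    z = sigmoid(V @ y + a)   # output layer: class probabilities
    return y, z
```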
We once again use the gradient descent optimizer to maximize accuracy, but we now have four parameter tensors that need to be optimized. We still calculate a cross entropy, now between z and z_: cross_entropy = reduce_mean(max(V.y + a, 0) − (V.y + a) * z_ + log(1 + exp(−abs(V.y + a)))). We next calculate the loss as before, but we add the L2 loss for V as well as W: loss = reduce_mean(cross_entropy + 0.00001 * (norm(W) + norm(V))). Adding this L2 regularization helps ensure there is no over-fitting in calculating both y and z. The learning rate is defined in the same way as in the no-hidden-layer case, and we do not change the batch or epoch sizes. We again use the gradient descent optimizer to optimize W, b, V and a, the parameters of this single-hidden-layer network. We keep the same definition of accuracy and once again stochastically feed 600 batches into our neural network. Then we run our validating and test sets.
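For illustration, a single backpropagation update of all four parameter tensors could be sketched as below. This is a hand-derived sketch for one input vector, using the standard identity that the gradient of the sigmoid cross entropy with respect to the logits V.y + a is z − z_; it is not the authors' implementation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_step(x, z_, W, b, V, a, lr, l2=1e-5):
    """One gradient-descent update of W, b, V, a for a single input x
    with label z_, including the L2 penalties on W and V."""
    y = sigmoid(W @ x + b)
    z = sigmoid(V @ y + a)
    dlogit = z - z_                       # d(cross_entropy)/d(V.y + a)
    grad_V = np.outer(dlogit, y) + 2 * l2 * V
    grad_a = dlogit
    dy = (V.T @ dlogit) * y * (1 - y)     # backprop through the hidden sigmoid
    grad_W = np.outer(dy, x) + 2 * l2 * W
    grad_b = dy
    return W - lr * grad_W, b - lr * grad_b, V - lr * grad_V, a - lr * grad_a
```

Iterating train_step over batches, with the stepped learning rate described above, drives the output z toward the correct class label.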
We are now ready to test the accuracy and loss of this more advanced network.