A Somewhat Random Walk Through Nuclear and Particle Physics

These notes are an outgrowth of an advanced undergraduate course taught at the University of Maryland, College Park. They are intended as an introduction to various aspects of particle and nuclear physics with an emphasis on the role of symmetry. The basic philosophy is to introduce many of the fundamental ideas in nuclear and particle physics using relatively sophisticated mathematical tools -- but to do so in as simplified a context as possible, so as to bring out the underlying ideas. Thus, for example, the Higgs mechanism is discussed in terms of an Abelian Higgs model. The emphasis is largely, but not entirely, theoretical in orientation. The goal is for readers to develop an understanding of many of the underlying issues in a relatively sophisticated way.


Preface
These notes are an outgrowth of an advanced undergraduate course on nuclear and particle physics, which I first taught in the Spring Semester of 2018 at the University of Maryland. This is an elective course and the curriculum often fluctuates depending on the instructor.
There are many ways such a course could be taught. These range from a purely qualitative description of experimental phenomena organized in a historical manner to formal treatments in terms of quantum field theory. Alternatively, one could emphasize experimental methods and results. A course at this level could be a survey covering a wide swath of material rather superficially, or one that focuses deeply on a few topics. A key challenge in designing such a class for undergraduates is that the natural language of the subject is quantum field theory, a subject typically encountered in the second year of graduate school.
The course that I designed and taught was intended as an introduction to various aspects of particle and nuclear physics. The class emphasized the role of symmetry. The basic philosophy was to try to get students to grasp how nuclear and particle physicists think about the underlying physics. One tactic was to emphasize topics that were in some sense simple so that students could understand what is happening. In some cases this meant choosing topics that could be understood via extremely simple physical reasoning, such as the semi-empirical mass formula in nuclear physics and its connection to a liquid drop picture. In other cases it meant introducing many of the fundamental physics ideas using relatively sophisticated mathematical tools. Thus, the course introduces many of the ideas of quantum field theory. However, to make this accessible to undergraduates, this is often done in as simplified a context as possible to bring out the underlying ideas. Thus, for example, the Higgs mechanism is discussed in terms of an Abelian model, rather than the full electroweak theory. The emphasis of the course is largely, but not entirely, theoretical in orientation. The goal is for students to develop an understanding of many of the underlying issues in a relatively sophisticated way.
The subjects of nuclear and particle physics are vast, and within the community there is no agreed-upon standard list of topics that an undergraduate class must cover. I tried to find an appropriate mix of topics with immediate experimental relevance, such as the use of electron scattering to measure form factors and map out charge distributions, and more theoretical issues such as Goldstone's theorem. The collection of topics that ended up in the course might best be described as "A Somewhat Random Walk Through Nuclear and Particle Physics".
While there are a number of undergraduate textbooks aimed at nuclear and/or particle physics, for a variety of reasons none of them was suitable for the kind of course that I thought best for students at this level. Since there was no book which was really suitable for the course, I produced a set of handwritten lecture notes which I distributed to the students. They were a poor substitute for a book. Apart from the handwritten quality of those notes, to call them "very terse" would be a gross understatement: they had equations, very few words and extremely limited explanation.
The notes produced here are quite different. They are a more-or-less self-contained document. They represent a collaboration between an undergraduate student, Nick Poniatowski and me. Nick's role in producing these notes is quite remarkable and cannot be overstated.
Nick served as the teaching assistant for the course the second time it was taught (Spring 2019). This was remarkable given that, at the time, Nick was an undergraduate junior who had not taken the course, had never studied most of the topics in the course and whose research interest was (and remains) in experimental condensed matter physics. A priori it seems kind of crazy that a student with his background was allowed to TA such a class under these circumstances. However, when Nick asked me if he could serve as an undergrad TA, I agreed. I knew Nick's talents well, having supervised Nick in an independent study on quantum field theory the previous semester, and was certain that he could do it.
As a TA, Nick offered to typeset my handwritten notes to aid the students. This was already above and beyond the call of duty, but I was pleased for the help. At the time, I assumed he was merely going to transcribe the notes. However, Nick did much more than this. Instead of the word or two that I had in my original notes, Nick had full, thoughtful and clear explanations.
It is said of the famous series of books by Landau and Lifshitz, "Not a word of Landau and not a thought of Lifshitz." Given my description of the way these notes were put together, one might think that these notes would be "Not a word of Cohen and not a thought of Poniatowski." Indeed, you might think, "how could it be otherwise?" given that Nick wrote these as an undergraduate (mostly as a junior) working in experimental condensed matter physics. However, that would be grossly unfair to Nick. In point of fact, during the semester when I was teaching, whenever Nick felt that my lectures lacked sufficient background, he added supplementary material to the notes. Thus, for example, in the section on gauge theories Nick added an entire section, "A Quick and Dirty Group Theory Primer", to which I contributed nothing. Similarly, the discussion of symmetry breaking in the context of the Ising and Heisenberg models of magnetism was entirely his. Overall, Nick was the driving force behind the project to create a set of useful notes that can serve as an undergraduate-level introduction to this subject.
We have not included references in the notes. The material is on the whole sufficiently well-established that there is no need to credit the original authors. We have included a list of books for further reading so that students can pursue these topics in greater depth.
At the end of these notes are a number of problems. Students who wish to get a better sense of the subject are urged to work through these problems.
Finally, it is likely that there are errors in these notes. The authors would greatly appreciate it if you would call any of these to our attention so that we can fix them. Please direct any such feedback to cohen@umd.edu or nponiatowski@g.harvard.edu.

Historical Introduction
If one were to ask a contemporary physicist for a simple cartoon overview of our current understanding of fundamental physics, they would probably give something like Fig. 1. In that figure the overall description of nature is divided into three areas: matter (the stuff of which things are composed), interactions (the forces that the matter feels), and "the rules of the game" (the overarching intellectual structures used to describe the matter and interactions). Things that are well-established are included without question marks in the figure, while more speculative things are labeled with a question mark. Thus, for example, "dark matter" has a question mark: while we know that there is dark matter and that it constitutes a large fraction of the universe, we do not know what it is. Now, before one panics looking over the complexity of the physical world as described by Fig. 1 and contemplates working through these lecture notes, it is important to realize that these notes are not intended to fully cover the current state of our understanding of fundamental physics. In fact, they address a rather small fraction of the physics in the figure.
Rather, these notes emphasize physics developed over a forty-year span: from some aspects of nuclear physics developed in the mid-1930s to the standard model, which was constructed by the mid-1970s. Even then, the notes pretty much oversimplify most of the topics in order to focus on the underlying physical ideas. Before beginning this scientific journey, it is perhaps instructive to consider briefly how the world's understanding of fundamental physics developed over the forty years prior to the mid-1930s. In fact, that forty-year period probably represents the single biggest surge in mankind's understanding of nature at a fundamental level of any comparable period. Consider Fig. 2, which revisits Fig. 1 but crosses out that which was unknown at the time.
By 1935 many of the crossed-out aspects were already well established, including quantum mechanics, special and general relativity, the strong and weak nuclear forces, electrons, protons, neutrons and nuclei, anti-matter and the pion (at least as a conjecture).
If one wishes to skip this historical introduction and jump into the meat of these notes, please feel free to do so.
In 1895 there was neither nuclear nor particle physics, since neither nuclei nor subatomic particles had yet been discovered.
Indeed, at the time the existence of atoms and molecules remained controversial. While they were useful in describing many aspects of chemistry, and the assumption of atoms and molecules allowed the derivation of thermodynamic results from the statistical mechanics of Boltzmann, Maxwell and Gibbs, prominent scientists at the time, including Mach (of Mach's principle fame) and Ostwald (a future chemistry Nobel laureate), doubted the physical existence of atoms at this stage. Indeed, one of the reasons that Einstein's 1905 paper on Brownian motion was so important was that it was one of the final nails in the coffin of resistance to the acceptance of atoms. Interestingly, the discovery of both subatomic particles and of the nucleus stemmed from the same technical development of the late 19th century: cathode-ray tubes. Many physicists studied what happened when one ran an electric current in evacuated glass vessels which had a large electrostatic potential across them. J.J. Thomson's 1897 discovery of the first subatomic particle, the electron, was directly tied to cathode-ray tubes. By studying the curvature of these rays in magnetic fields, Thomson showed that whatever composed cathode rays had a fixed ratio of charge to mass, which Thomson measured. The simplest explanation for this is that they were composed of a single type of particle with fixed charge and mass. When Millikan subsequently measured the charge of the electron, its mass was determined as well.
The connection of cathode rays to the discovery of the nucleus was far more indirect. At the end of 1895, Röntgen discovered a new type of penetrating radiation, X-rays (or, as the Germans still call them, Röntgen rays), while experimenting with a cathode-ray tube. These were produced when high-voltage cathode rays impinged on a surface. Now we know that X-rays are ordinary electromagnetic radiation at an extremely high frequency. Moreover, their origin is atomic in nature (inner-shell electrons) and has nothing to do with nuclear dynamics.
However, the discovery of X-rays was a true international sensation both in the popular press and among scientists. Virtually any scientist with the resources to study X-rays did so. One of these was Henri Becquerel. Becquerel was the physics professor at the Muséum National d'Histoire Naturelle, a job previously held by his father and grandfather. Within a few months of the announcement of Röntgen's discovery, Becquerel conducted a fateful experiment. Röntgen had reported that X-rays cause phosphorescent materials to emit light. Becquerel wondered whether there was an inverse process in which light impinging on a phosphorescent material would cause it to emit X-rays, and set out to test the idea experimentally.
As it happens, we now know this process does not exist. Becquerel did not. He decided to probe this idea in the following way: expose a phosphorescent screen to bright sunlight and then place it in a sealed envelope with an unexposed photographic plate. Since he knew that X-rays expose film, he hoped to show that the phosphorescent screen would expose the film, indicating X-rays. However, the day he set out to do the experiment was cloudy. He decided to proceed in any case, presumably to provide a comparison. When the film was developed, Becquerel was surprised to discover that it had been exposed. Puzzled by this, he repeated the experiment, this time keeping the phosphorescent material entirely in the dark; again the film was exposed. Becquerel deduced correctly that whatever was exposing the film was spontaneously coming from the phosphorescent material itself.
As it happens, the phosphorescent material used by Becquerel was a uranium salt and Becquerel was able to deduce that some new type of radiation was emanating from the uranium. He had discovered radioactivity.
Thus began an intense period of study of radioactivity. At the time, it was not known that radioactive decays came from nuclei; indeed, the existence of the nucleus would not be deduced for about another 15 years. But early researchers, led by the husband-and-wife team of Pierre Curie and Marie Skłodowska Curie, were able to determine quite a lot about radioactivity.
It was realized that different types of chemical elements had characteristic radioactive decays. Each type of decay was associated with a characteristic half-life. Radioactive properties were used to deduce the existence of new elements; the first two, radium and polonium, were discovered by the Curies. It was soon realized that some chemical elements had distinct decays with different half-lives, implying the existence of isotopes: forms of an element which were virtually identical chemically but nevertheless distinct.
It was also recognized that there were distinct types of radioactive decays: α decays, whose products were deflected by electromagnetic fields, were easily stopped by matter and were monoenergetic; β decays, whose products were also deflected by electromagnetic fields but were far more penetrating and had a continuous energy spectrum; and γ radiation, which was associated with some α and β decays, was not deflected by electromagnetic fields, and was highly penetrating and monoenergetic. We now understand that α decays involve the emission of ⁴He nuclei and are associated with the strong nuclear force, while β decays involve the emission of an electron and are associated with an entirely distinct force: the weak nuclear force. Thus, two of nature's forces which had gone unnoticed throughout human history were uncovered within a few years of each other. It turns out that γ radiation involves the emission of ordinary photons, but they are far more energetic than the ones coming from atomic processes.
While these studies of radioactivity were taking place there were two revolutionary theoretical developments. The earliest ideas of quantum physics were formulated by Planck in 1900 to help resolve paradoxes in the statistical mechanics of black-body radiation and five years later by Einstein to explain the photoelectric effect. That same year Einstein proposed special relativity.
On the experimental side, Rutherford then began a series of truly revolutionary studies. He demonstrated that when radioactive decays occur, chemical elements change from one type to another. When he won the Nobel prize in chemistry for this work he noted that he "had dealt with many different transformations with various time-periods, but the quickest he had met was his own transformation from a physicist to a chemist." This work helped clarify what is going on when a radioactive decay occurs. The system (which Rutherford subsequently showed to be a nucleus) is characterized by two positive integers: Z, the electric charge in units of e, and A, which is to a pretty good approximation proportional to the mass. When an α decay occurs, Z decreases by two while A decreases by four; in contrast, in a β decay, Z increases by one while A remains the same.
Rutherford had a brilliant insight: instead of passively studying matter by looking at how some types of matter emit radioactive particles, one could use radioactivity as a probe of matter. He designed an experiment, carried out at the University of Manchester by a postdoctoral scientist, Geiger (of Geiger counter fame), and an undergraduate, Marsden, in which α radiation impinged on a thin gold foil and the angle of deflection of the α particles was measured. At the time, the prevailing atomic model was Thomson's "plum pudding" model, in which the electrons (the "plums") were contained in a diffuse positively charged "pudding". (Ironically, this is almost the exact opposite of what actually happens, in which a diffuse quantum mechanical cloud of electrons surrounds a compact nucleus.) Since the electrons were much lighter than the α, Rutherford expected very small deflections. He was shocked when Geiger and Marsden found deflections at all angles, including at back angles. Rutherford was dumbfounded: "It was quite the most incredible event that has ever happened to me in my life. It was almost as incredible as if you fired a 15-inch shell at a piece of tissue paper and it came back and hit you." During the next two years Rutherford analyzed the data. He eventually realized that the differential cross-section observed by Geiger and Marsden was that of scattering off of a Coulomb potential. This analysis is quite impressive: while the calculation of the differential cross-section for classical scattering from a 1/r potential is now a standard undergraduate exercise, the concept of differential cross-section did not exist when Rutherford began his analysis; he needed to invent it to proceed.
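The bookkeeping rule for Z and A stated above can be written out as a minimal sketch. The names `Nuclide`, `alpha_decay` and `beta_decay` are illustrative choices, not anything defined in these notes, and the uranium-238 chain is used only as a familiar example.

```python
from typing import NamedTuple


class Nuclide(NamedTuple):
    """A nucleus labeled by the two integers discussed in the text."""
    Z: int  # electric charge in units of e
    A: int  # mass number (approximately proportional to the mass)


def alpha_decay(parent: Nuclide) -> Nuclide:
    # Alpha decay: Z decreases by two and A decreases by four.
    return Nuclide(parent.Z - 2, parent.A - 4)


def beta_decay(parent: Nuclide) -> Nuclide:
    # Beta decay: Z increases by one and A is unchanged.
    return Nuclide(parent.Z + 1, parent.A)


# Illustrative chain: uranium-238 (Z = 92) alpha decays, then beta decays.
uranium_238 = Nuclide(Z=92, A=238)
after_alpha = alpha_decay(uranium_238)  # thorium-234
after_beta = beta_decay(after_alpha)    # protactinium-234
print(after_alpha, after_beta)
```

Since Z fixes the chemical element, each step in such a chain changes the element, which is the transmutation Rutherford demonstrated.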
Rutherford's analysis indicated that there was a small, heavy charged core at the center of the atom-a nucleus. Rutherford quickly realized that the mass of the atom was almost entirely contained in the nucleus. The atomic number Z which determined the chemical properties was given by the charge of the nucleus. Rutherford rapidly postulated that the nucleus of the hydrogen atom was a charge unity particle with A = 1: the proton.
First Bohr and subsequently Schrödinger, Heisenberg, Dirac and others took the quantum ideas of Planck and Einstein and constructed a viable and quantitatively accurate description of the atom.
Rutherford's experiment gave very little information about the nucleus other than its existence and an upper bound on its size. The differential cross-section was that of a Coulomb potential, and a spherically symmetric charge distribution looks like a point charge at the center when one is outside the distribution. Thus, the results of the gold-foil experiment meant that the few-MeV α particles did not have the energy to get close enough to the nucleus to penetrate it.
The question of how one could probe the dynamics of the nucleus itself was critical. Clearly, nuclei had their own internal dynamics: γ radiation, the emission of photons from excited nuclear states, was analogous to the emission of photons from excited atomic states that Bohr had described. Rutherford again had an important insight: since the Coulomb repulsion of a gold nucleus was too strong for an α particle to penetrate it, if one shot α particles at much lighter nuclei, they might well be able to penetrate. Rutherford conducted the following experiment: direct α particles onto a gas of nitrogen in a container. While the analysis took some time, it was ultimately shown that collisions emitted a proton and left behind an isotope of oxygen. Rutherford had discovered nuclear reactions.
Unfortunately, this technique of inducing nuclear reactions was restricted to light nuclei. In order to get charged particles inside of heavier nuclei, one needed more energetic beams than those produced by natural radioactive decays. This motivated early attempts to develop machines that could accelerate particles to high energies. The most significant of these was the invention of the cyclotron by Ernest Lawrence. The key to this device is resonance: rather than giving particles a single large kick, give them many coherent small ones. This was possible because non-relativistic particles in a magnetic field have orbits with a natural frequency that depends on the field strength, the charge and the mass of the particle but not its energy. Thus, if the particle were to cycle in a magnetic field and be driven by a radio-frequency (RF) electric field tuned to the cyclotron frequency, all of the kicks from the RF field would be coherent and the energy would grow. Ultimately, Lawrence's "atom-smashers" grew to be quite large and expensive; it was the beginning of "big science".
Another way to learn about nuclei was developed: instead of using large energies to study them, use high precision. The charge-to-mass ratio of ions could be determined by measuring how they bent in magnetic fields. This technique, mass spectroscopy, was the same technique by which Thomson discovered the electron. Francis Aston developed the mass spectrometer into a precision instrument. Since the charges of the ions were known (they were some multiple of the charge of the electron), he was able to measure their masses and to do so quite accurately. He found that the masses were not exactly proportional to the mass number A. These small discrepancies were related to the underlying masses of the constituents of the nucleus and, through Einstein's mass-energy relation, to the binding energy of the nuclear force. The data on nuclear masses was sufficiently accurate that the binding energies themselves could be determined quite well. This in turn gave significant information about the nature of the strong nuclear force.
As the 1930s dawned, much of our modern understanding was in place. Relativity and quantum mechanics were well-established and early attempts to develop quantum field theories were ongoing, although they were afflicted by theoretical problems which would not be tamed until after the Second World War. Much had been learned about strong interactions and the nuclear world, although there were large gaps.
Weak interactions responsible for β decays remained very mysterious. A critical problem was that the outgoing electrons had a continuous spectrum. One possibility seriously considered at the time was that energy was simply not conserved. Wolfgang Pauli made a critical suggestion that ultimately turned out to be correct, namely that the missing energy was carried by a very light neutral particle (which Pauli dubbed a "neutron", as neutrons, the partners of protons, had yet to be discovered; Fermi renamed them "neutrinos," Italian for "little neutral ones"). Pauli's suggestion was one of the more remarkable scientific communications of the 20th century. Rather than publish this seminal idea in a peer-reviewed journal, he communicated it in a very flippant letter to a scientific meeting on radioactivity in Tübingen that he did not attend.
The letter begins "Dear radioactive ladies and gentlemen." He notes the problem, suggests neutrinos as a way out and then writes "But so far I do not dare to publish anything about this idea, and trustfully turn first to you, dear radioactive people." He goes on to say "I admit that my remedy may seem almost improbable because one probably would have seen those neutrons, if they exist, for a long time. But nothing ventured, nothing gained, and the seriousness of the situation, due to the continuous structure of the beta spectrum, is illuminated by a remark of my honored predecessor, Mr Debye, who told me recently in Bruxelles: 'Oh, It's better not to think about this at all, like new taxes.' Therefore one should seriously discuss every way of rescue. Thus, dear radioactive people, scrutinize and judge." Toward the end of the letter he explains that "Unfortunately, I cannot personally appear in Tübingen since I am indispensable here in Zürich because of a ball on the night from December 6 to 7."
A critical discovery about the nature of fundamental physics was made by Carl Anderson. When studying cosmic rays in the early 1930s, he found tracks that bent in a magnetic field as though they were made by a particle with the same charge-to-mass ratio as the electron but with the opposite charge. He had discovered the positron, the anti-particle of the electron. Interestingly, Dirac's relativistic quantum treatment (which these notes discuss later on), which was formulated a few years prior to Anderson's experimental observation, predicts the existence of positrons. However, the prediction was so radical at the time that Dirac basically did not believe his own prediction and, until the discovery of the positron, tried in vain to come up with a sensible way to interpret the positron as a proton.
One interesting sociological fact about the particle physics community is that while it is now commonplace for particle theorists to postulate new particles (a decent particle theorist should be able to propose six new particles before breakfast), through the 1920s it was hard for physicists to even consider the possibility of new particles. At the time only three particles believed to be fundamental were known: the proton (which we now know to be composite), the electron and the photon. Proposing a new particle at the time was a truly radical step. Thus we see Dirac's unwillingness to accept the implications of his own equation and Pauli's very apologetic and tentative proposal for neutrinos.
As noted above, at the time Pauli introduced what we now call the neutrino, the neutron had not been discovered. At the time, it was generally believed that a nucleus was composed of A protons and A − Z electrons to yield the correct charge and mass. The existence of electrons in the nucleus seemed to make sense in that β decays emitted electrons from the nucleus. There were known to be problems with such a picture. In the first place, it was unclear how it would fit with a continuous spectrum for β emission and how Pauli's neutrinos would fit into such a picture. Also, the energetics were problematic, since the uncertainty principle would indicate that electrons confined to a region as small as a nucleus should have very large kinetic energies.
The neutron was discovered soon thereafter. Frédéric and Irène Joliot-Curie in Paris (Irène was the daughter of Pierre and Marie Curie) followed Rutherford's approach of bombarding light nuclei with α particles. They studied the reaction of α particles impinging on ⁹Be. A nuclear reaction occurred in which a neutral particle was emitted. The Joliot-Curies assumed that it was a γ ray, that is, a photon. Photons were the only known neutral particles at the time and, as noted, postulating new particles was virtually unthinkable to most physicists. However, the "γs" seen by the Joliot-Curies were highly problematic. When directed on paraffin (a good source of hydrogen) they knocked out protons, as one might expect from γs, but the protons were of very high energies; given the need to conserve energy and momentum when the neutral particle knocks out a proton, one needed implausibly energetic γs to emerge from the initial reaction. The Joliot-Curies noted this oddity and considered it a puzzle, but never made the intellectual leap of considering the possibility of a new particle.
Chadwick, a physicist at Cambridge's Cavendish Lab and one of the many nuclear scientists trained by Rutherford, was not averse to the possibility that what was observed by the Joliot-Curies was a new particle. In part this was because, given the difficulties of nuclear modeling at the time, Rutherford had previously speculated about the possibility of a neutron. In a series of experiments Chadwick demonstrated that the electrically neutral emissions observed when α particles impinged on ⁹Be were massive and had a mass nearly identical to that of the proton. Eventually the neutron was shown to be just slightly more massive than the proton, approximately 0.1% heavier. Ultimately the near degeneracy of the proton and neutron masses gave rise to the understanding that the nuclear force had an underlying approximate symmetry. Going forward in these notes we will use the word "nucleon" to refer to either a proton or neutron when there is no need to specify which one.
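The energy independence of the orbital frequency described above can be checked numerically. This is a minimal sketch, assuming the standard non-relativistic result f = qB/(2πm); the function names, the 1 T field and the two proton speeds are illustrative choices rather than values from these notes.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
PROTON_MASS = 1.67262192e-27  # proton mass in kilograms


def cyclotron_frequency(charge: float, mass: float, b_field: float) -> float:
    """Orbital frequency in Hz; note that no speed or energy enters."""
    return charge * b_field / (2 * math.pi * mass)


def orbit_radius(charge: float, mass: float, b_field: float, speed: float) -> float:
    """Radius of the circular orbit in meters for a given speed."""
    return mass * speed / (charge * b_field)


def orbit_period(charge: float, mass: float, b_field: float, speed: float) -> float:
    """Time for one orbit, computed as circumference divided by speed."""
    radius = orbit_radius(charge, mass, b_field, speed)
    return 2 * math.pi * radius / speed


B_FIELD = 1.0  # tesla (assumed value for illustration)
slow = orbit_period(E_CHARGE, PROTON_MASS, B_FIELD, speed=1.0e5)
fast = orbit_period(E_CHARGE, PROTON_MASS, B_FIELD, speed=1.0e7)
freq = cyclotron_frequency(E_CHARGE, PROTON_MASS, B_FIELD)

print(f"period at 1e5 m/s: {slow:.4e} s")
print(f"period at 1e7 m/s: {fast:.4e} s")
print(f"cyclotron frequency: {freq / 1e6:.2f} MHz")
```

The faster proton moves on a larger circle but takes the same time per orbit, which is why a fixed-frequency RF field can keep kicking it in step. The sketch stays in the non-relativistic regime that the text assumes.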
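As a hedged illustration of how a precisely measured mass translates into a binding energy, the sketch below applies Einstein's mass-energy relation to helium-4. It uses the modern picture of the nucleus as Z protons and A − Z neutrons (established later in this introduction) and standard tabulated atomic masses, which are not taken from these notes; the function name `binding_energy_mev` is an illustrative choice.

```python
U_TO_MEV = 931.494        # energy equivalent of one atomic mass unit, in MeV
M_HYDROGEN = 1.00782503   # mass of the hydrogen-1 atom, in u
M_NEUTRON = 1.00866492    # mass of the neutron, in u


def binding_energy_mev(Z: int, A: int, atomic_mass_u: float) -> float:
    """Binding energy from the mass defect.

    Atomic (not bare nuclear) masses are used throughout, so the
    electron masses cancel between the hydrogen atoms and the atom
    being studied; electron binding energies are neglected.
    """
    constituents = Z * M_HYDROGEN + (A - Z) * M_NEUTRON
    mass_defect = constituents - atomic_mass_u
    return mass_defect * U_TO_MEV


M_HELIUM_4 = 4.00260325  # tabulated atomic mass of helium-4, in u
b_he4 = binding_energy_mev(Z=2, A=4, atomic_mass_u=M_HELIUM_4)
print(f"He-4 binding energy: {b_he4:.2f} MeV ({b_he4 / 4:.2f} MeV per nucleon)")
```

The mass defect here is under one percent of the total mass, which is why high-precision spectrometry was needed to see it at all.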
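The uncertainty-principle objection can be made semi-quantitative with an order-of-magnitude sketch. It assumes the rough estimate p ~ ħ/Δx (factors of order one are ignored) and an illustrative confinement length of 5 fm; both the length and the function name are assumptions made for this example, not values from these notes.

```python
import math

HBAR_C = 197.327          # hbar * c in MeV fm
ELECTRON_MASS_MEV = 0.511  # electron rest energy in MeV


def confinement_kinetic_energy(rest_energy_mev: float, size_fm: float) -> float:
    """Rough kinetic energy of a particle confined to a region of given size.

    Uses p*c ~ hbar*c / size and the relativistic energy-momentum relation.
    """
    pc = HBAR_C / size_fm
    total_energy = math.sqrt(pc**2 + rest_energy_mev**2)
    return total_energy - rest_energy_mev


NUCLEAR_SIZE_FM = 5.0  # assumed confinement length, for illustration only
ke_electron = confinement_kinetic_energy(ELECTRON_MASS_MEV, NUCLEAR_SIZE_FM)
print(f"electron confined to {NUCLEAR_SIZE_FM} fm: ~{ke_electron:.0f} MeV")
```

The result is tens of MeV, far above the few-MeV scale quoted earlier for α particles from radioactive decay, which gives a sense of what "very large" means here.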
With the discovery of the neutron the basic constituents of the nucleus were known. The nucleus was composed of Z protons and A − Z neutrons, bound together by the strong nuclear force. Thus A is the total number of nucleons.
With the discovery of the neutron, Fermi was able to construct a quantum field theoretic description of β decay based on the existence of the neutron and Pauli's idea of a neutrino. In Fermi's theory, while there are no electrons or neutrinos in the nucleus, there is a process in which the neutron becomes a proton and the process itself creates an electron and an antineutrino. While Fermi's theory ultimately turned out to be incomplete and not fully consistent mathematically, it played an important role in the development of the standard model and illustrates some key ideas. We will encounter it later in a later chapter of these notes.
Another key idea tied to quantum field theory was developed by the mid-1930s. Mass spectroscopy had determined the binding energies of a great many nuclei and this allowed for the understanding of some key features of the strong nuclear force. As we discuss in the following section, the systematic of nuclear binding energies implied that the force between nucleons must be short-ranged-unlike the long-ranged Coulomb force that is that binds electrons to nuclei in atoms. Yukawa realized that a short-ranged interaction naturally arises in a field-theory if the interaction arises from the virtual exchange of a massive particle with the range of the potential being inversely proportional to the mass of the exchanged particle. The long-ranged nature of the Coulomb force is associated with the fact that it arises due to the virtual exchange of massless photons. Thus, the short-ranged nature of the strong force between nucleons is associated with the exchange of massive particles that we now call mesons. We will discuss the Yukawa theory in some detail later in these notes.
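The inverse relation between range and exchanged mass can be illustrated with a short sketch. It assumes the standard estimate that the range is of order the reduced Compton wavelength, ħ/(mc) = ħc/(mc²), and uses the charged pion mass as an example of a meson; the function name and the choice of example are illustrative and not taken from these notes.

```python
import math

HBAR_C = 197.327  # hbar * c in MeV fm


def yukawa_range_fm(exchanged_mass_mev: float) -> float:
    """Approximate range of a force mediated by a particle of given mass.

    The range is taken to be hbar*c / (m c^2); a massless mediator,
    such as the photon, gives an infinite range.
    """
    if exchanged_mass_mev == 0:
        return math.inf
    return HBAR_C / exchanged_mass_mev


PION_MASS_MEV = 139.57  # charged pion rest energy (tabulated value)
print(f"pion exchange:   ~{yukawa_range_fm(PION_MASS_MEV):.2f} fm")
print(f"photon exchange: {yukawa_range_fm(0.0)} fm")
```

The pion estimate comes out at roughly a femtometer and a half, comparable to the size of a nucleon, while the massless photon reproduces the long-ranged Coulomb case.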
The discovery of the neutron lead to an enormous advance in the study of nuclear physics. As noted above, prior to this discover it was very difficult to study nuclear dynamics experimentally. Electrically charged probes such as α particles from radioactive decays could only penetrate small nuclei due to the strong Coulomb repulsion. While this problem was ultimately overcome by cyclotrons, which were able to produce high-energy beams, these were very expensive to build and operate. Neutrons, which were easy to produce by shooting α particles on light nuclei such as 9 Be, did not suffer from Coulomb repulsion and hence could be used to probe nuclei large and small.
Fermi, more than any other physicist, seized this opportunity. He was appointed to the newly created chair of theoretical physics in Rome, while still in his mid twenties. From this perch, ostensibly in theoretical physics, he led a group of young scientists that began a remarkable and systematic experimental program that studied neutron induced reactions on nuclei throughout the periodic table. This group made numerous discoveries. One of the most remarkable ones was that the cross-sections for neutron-induced reactions increased dramatically when neutrons were slowed down by elastic scattering against the protons in a medium such as paraffin.
At first this behavior seemed highly counter-intuitive. Naively, if one simply views the nucleus as a collection of protons and neutrons, it seems natural that the higher the energy of an incident neutron, the more likely it is to have the oomph to knock out a nuclear constituent and induce a reaction. Bohr advocated a key idea that might explain this apparently counter-intuitive behavior. In the next section we will discuss some aspects of this idea, which involves thinking about nuclei as being analogous to drops of liquid. While this picture is clearly not the entire story, it provides a remarkably simple way to understand some basic features of nuclear physics over a wide array of nuclei.

The Liquid Drop Model
Historically, the nucleus was viewed as nothing more than the sum of its constituents, the protons and neutrons. In light of this view, the results of the slow neutron experiments were baffling: why should slow neutrons be more effective at starting reactions than fast ones? Intuitively, one would expect precisely the opposite, that a fast neutron would be more effective at knocking a proton or neutron out of a nucleus and starting off a reaction.
Some light was shed on the issue in the mid 1930s by Niels Bohr, who shifted focus to the interactions which held the nucleus together. Although he didn't actually know any of the details of the forces at play in nuclear dynamics, he was able to realize that whatever force was at work must be capable of efficiently sharing energy between many nucleons, and enabling them to act collectively.
As the title of the section suggests, the key analogy is to a drop of liquid. In a water droplet, there are some kinds of forces holding the water molecules together, as well as surface tension, which makes it energetically favorable to minimize the surface area of the drop, pulling it into a spherical shape.
Applying this picture to a nucleus, we can imagine that when a neutron is absorbed it spreads its energy into the "liquid" made up of the nucleons, causing the liquid to be heated up. Over time, the nucleus can dissipate this energy by emitting particles, and eventually return to its ground state. One can then imagine that a low energy particle incident on the nucleus would be more efficient at starting a reaction since it won't just knock off a single nucleon right away, but rather allow its energy to spread throughout the nucleus. As simple as this liquid drop model may sound, it was ultimately the paradigm by which nuclear fission was understood.

Basic Nuclear Energetics
In the context of Bohr's liquid drop model, it is reasonable to suggest that there is a natural density for nuclear matter. While this isn't precisely true for any nucleus (quantum effects play a role), it's a good approximation that roughly fits the trend.
If the density of the nuclear "liquid" is constant, the volume will be proportional to the mass number, $A$ (recall this is the total number of neutrons and protons, $A = Z + N$). Since the volume goes like the radius cubed, we then expect the radius to be proportional to $A^{1/3}$. If we take this model seriously, we can imagine several contributions to the binding energy of the nucleus. Since each nucleon will contribute some binding energy, the total binding energy should be proportional to $A$.
However, just like a liquid drop has surface tension, we also expect it to be energetically costly to maintain a large surface area. Since we've already established the radius goes like $A^{1/3}$, we expect the binding energy from surface tension to be proportional to the surface area, and hence scale like $A^{2/3}$. The Coulomb force will act to push the nucleus apart, so keeping it together will cost some energy. Although this is a small effect for small nuclei (about 1% of the binding energy), it becomes important for larger ones. We know the electric potential energy goes like the charge squared over the radius, so we expect a contribution $\sim Z^2/A^{1/3}$. The nucleus "wants" to have an equal number of protons and neutrons, so in the simplest case a proton-neutron asymmetry will come with an energy cost $\sim (Z - N)^2$, or equivalently $(2Z - A)^2$. The fact that nuclei with fixed $A$ will tend to energetically favor configurations with an equal number of protons and neutrons (all else being the same) can be understood as due in part to the Pauli principle; the detailed dynamics of the strong nuclear force also plays a role. In any case, it is an empirical fact.
Putting all of these pieces together, we arrive at the Bethe-von Weizsäcker semi-empirical mass formula, where the nuclear mass is given by
$$M(Z, A) = Z M_p + N M_N - BE,$$
where $M_p$ and $M_N$ are the proton and neutron masses respectively, and $BE$ is the binding energy,
$$BE = a_{vol} A - a_{surf} A^{2/3} - a_{elec} \frac{Z^2}{A^{1/3}} - a_{sym} \frac{(Z - N)^2}{A}.$$
This equation simply sums up the considerations we discussed above: the first term is due to the fixed energy density of nuclear matter, the second accounts for the surface energy, the third for the electric energy, and the fourth for the proton-neutron asymmetry. The coefficients of each term are determined by fitting to experimentally measured masses; typical fits give roughly
$$a_{vol} \approx 16 \text{ MeV}, \quad a_{surf} \approx 18 \text{ MeV}, \quad a_{elec} \approx 0.7 \text{ MeV}, \quad a_{sym} \approx 23 \text{ MeV}.$$
By rewriting this as the binding energy per nucleon, $BE/A$, we can quantify the stability of nuclei,
$$\frac{BE}{A} = a_{vol} - a_{surf} A^{-1/3} - a_{elec} f_p^2 A^{2/3} - a_{sym} (1 - 2 f_p)^2, \qquad (4)$$
where we have defined
$$f_p = \frac{Z}{A}$$
as the fraction of protons to total nucleons. This form of the equation makes it easy to see that if we can ignore the electric energy, the only remaining term dependent on $f_p$ is the final one, whose energy cost for a nucleus with fixed $A$ is minimized when $f_p = 1/2$, i.e. when we have an equal number of protons and neutrons.
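The liquid-drop binding energy is simple enough to evaluate directly. The following is a minimal sketch, not a definitive fit: the four coefficient values are assumed typical textbook numbers (in MeV) for the volume, surface, electric, and asymmetry terms, and the function names are chosen here purely for illustration.

```python
# Liquid-drop (semi-empirical) binding energy in MeV.
# ASSUMPTION: these coefficients are typical textbook fit values,
# not numbers taken from these notes.
A_VOL, A_SURF, A_ELEC, A_SYM = 15.8, 18.3, 0.71, 23.2


def binding_energy(Z, A):
    """Binding energy (MeV) of a nucleus with Z protons and A nucleons."""
    N = A - Z
    volume = A_VOL * A                        # ~ A
    surface = A_SURF * A ** (2 / 3)           # ~ A^(2/3)
    coulomb = A_ELEC * Z ** 2 / A ** (1 / 3)  # ~ Z^2 / A^(1/3)
    asymmetry = A_SYM * (Z - N) ** 2 / A      # ~ (Z - N)^2 / A
    return volume - surface - coulomb - asymmetry


def binding_energy_per_nucleon(Z, A):
    """Binding energy per nucleon, BE/A, in MeV."""
    return binding_energy(Z, A) / A


for name, Z, A in [("16O", 8, 16), ("56Fe", 26, 56), ("208Pb", 82, 208), ("238U", 92, 238)]:
    print(f"{name:>6}: BE/A = {binding_energy_per_nucleon(Z, A):.2f} MeV")
```

With these assumed coefficients the binding energy per nucleon peaks for medium-mass nuclei and falls off for both light and very heavy ones, which is the qualitative trend the formula is meant to capture.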
We have already mentioned that the electric term is unimportant for small nuclei, so we expect that for small nuclei the most stable configuration (largest binding energy) has an equal number of protons and neutrons. However, as A increases and the nucleus gets larger, the proton fraction for the most stable configuration will decrease as the Coulomb force becomes more important, making the nucleus neutron rich. This is shown in Figure 3.
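The drift toward neutron-rich nuclei can be made quantitative. Only the electric and asymmetry terms of the binding energy per nucleon depend on the proton fraction $f_p$, namely $-a_{elec} f_p^2 A^{2/3} - a_{sym}(1 - 2f_p)^2$; setting the derivative with respect to $f_p$ to zero gives $f_p = 1/\left(2 + a_{elec} A^{2/3} / (2 a_{sym})\right)$. The sketch below evaluates this, again assuming typical textbook coefficient values rather than values taken from these notes.

```python
# Most stable proton fraction at fixed A in the liquid-drop picture.
# ASSUMPTION: typical textbook coefficients (MeV), used only for illustration.
A_ELEC, A_SYM = 0.71, 23.2


def optimal_proton_fraction(A):
    """Proton fraction Z/A that maximizes BE/A at fixed mass number A."""
    return 1.0 / (2.0 + A_ELEC * A ** (2 / 3) / (2.0 * A_SYM))


for A in (16, 56, 120, 208):
    print(f"A = {A:3d}: f_p = {optimal_proton_fraction(A):.3f}")
```

For light nuclei the result stays close to 1/2, while for $A \approx 208$ it drops to about 0.39, close to the proton fraction of lead.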
The most stable nucleus is that of $^{56}$Fe, which has 8.8 MeV of binding energy per nucleon. This is the maximum binding energy per nucleon, and allows us to neatly divide nuclei into those larger or smaller than $^{56}$Fe. Smaller nuclei will gain binding energy from getting pushed together and fusing, emitting the extra energy via mass and kinetic energy. To achieve fusion, one must start with sufficient energy to overcome the Coulomb repulsion encountered when the nuclei are pushed together, which typically only occurs in very hot, "thermonuclear" environments, such as the interior of a star or during the explosion of a hydrogen bomb. The central challenge of creating a safe and controlled fusion reactor is maintaining a sufficient temperature and density for fusion reactions to take place.
On the other hand, nuclei much larger than $^{56}$Fe gain binding energy by breaking apart into smaller nuclei in a process known as fission. Typically the nucleus is locally stable, so it needs some "encouragement" to undergo fission. One commonly used means of encouragement is hitting it with an external particle. Fission can also occur spontaneously due to quantum mechanical tunneling, but it happens at a sufficiently low rate that we wouldn't recommend trying to start an energy company based on it.

Nucleosynthesis
It is generally believed that protons and neutrons were created after the big bang, as the universe cooled. Since the universe was still hot and dense, nuclear fusion could easily occur, and leftover neutrons could β decay into protons. Models of this process suggest that after the big bang, the universe contained deuterium ($^{2}$H), tritium ($^{3}$H), $^{3}$He, $^{4}$He, and a small amount of Li. These predictions are in line with astronomical measurements, but big bang nucleosynthesis offers no explanation of where heavier elements come from. These elements are synthesized in stars, which are powered by nuclear fusion. This is essentially an equilibrium process, of which we have a solid understanding; we expect that any step in the fusion process will result in an increase of the binding energy. But this means that fusion in stars can only account for the creation of elements up to $^{56}$Fe! Elements heavier than $^{56}$Fe must come about via some complicated non-equilibrium process in the presence of many excess neutrons. For a long time, the popular view was that creation of these elements occurred in supernova explosions, but the recent LIGO observation of colliding neutron stars showed that heavy nuclei were synthesized in the process. This means that some (or perhaps most, or all) heavy nuclei are forged in neutron star collisions.
The semi-empirical mass formula also implies that there is a maximum size for nuclei. All of the terms scale as $A$ to some power (using appropriate variables), and the fastest growing term in the binding energy per nucleon is the electric term,
$$-a_{elec} f_p^2 A^{2/3}.$$
The minus sign tells us the interaction is repulsive, since $f_p^2$ and $A$ are positive. Further, $f_p^2$ is generically nonzero because of the symmetry term in (4). The numerical coefficient $a_{elec}$ is small, causing the term to be unimportant except when $A$ is large. In fact, since this term has the fastest $A$ dependence, it dominates at sufficiently large $A$. Since we've established this term is repulsive, this could feasibly lead to a bound on $A$, and hence the size of the nucleus.
In addition to the electric force, which is fairly weak, there is also the strong nuclear force, which, as its name suggests, is strong, but is extremely short-ranged. Meanwhile, the electric force is effectively infinite-ranged. Since this course is largely theoretical in emphasis, we can make things simpler by turning off the Coulomb force and asking what happens. (It turns out that experimentalist friends are somehow unwilling to do this for us in the lab.) In such a world, the volume term dominates, so even with the surface energy term, there is still nothing precluding infinite nuclear matter.
This infinite nuclear matter is a theoretical substance that gets to the heart of the semi-empirical mass formula, in that it would then suggest a natural density for nuclear matter. In fact, experiment suggests that such a natural density exists. Extrapolating from the density of real nuclei, the density of infinite nuclear matter is estimated to be
$$\rho_N \approx 0.16 \text{ fm}^{-3},$$
which is to say that there are 0.16 nucleons per cubic femtometer (1 fm = $10^{-15}$ m).
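A constant density of 0.16 nucleons per cubic femtometer fixes the length scale in the $A^{1/3}$ radius law. As a rough sketch, assuming a uniform sphere with a sharp edge (an idealization; real nuclei have a diffuse surface), $\frac{4}{3}\pi R^3 \rho_N = A$ gives $R = r_0 A^{1/3}$ with $r_0 = (3/4\pi\rho_N)^{1/3}$. The function names below are illustrative only.

```python
import math

RHO_N = 0.16  # nucleons per fm^3, the nuclear matter density quoted in the text


def radius_parameter(rho=RHO_N):
    """r0 in fm such that R = r0 * A^(1/3) for a uniform sphere of density rho."""
    return (3.0 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)


def nuclear_radius(A, rho=RHO_N):
    """Radius in fm of a uniform-density nucleus with A nucleons."""
    return radius_parameter(rho) * A ** (1.0 / 3.0)


print(f"r0 = {radius_parameter():.2f} fm")
print(f"R(A=208) = {nuclear_radius(208):.2f} fm")
```

The resulting $r_0$ of a little over 1 fm is why "a few fm" is the natural size to keep in mind for nuclei.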

Improving the Semi-empirical Mass Formula
There are several corrections we could add to the semi-empirical mass formula to improve its accuracy. First of all, we could add an "even-odd" correction, reflecting the fact that nucleons "want" to bind in pairs, and thus there is an energy cost for odd numbers of nucleons. We could also add a "shell correction," by considering each nucleon to sit in an effective potential due to all the other nucleons, and additionally subjecting them to the Pauli principle. Filled shells are most stable, and occur when the number of protons $Z$ or neutrons $N = A - Z$ is equal to 2, 8, 20, 28, 50, 82, or 126. The first three levels can be found by considering a simple spherical potential, and the subsequent ones can be found by taking into account spin-orbit coupling. The last (126) is only possible for neutrons, since we have yet to find an element with $Z \geq 126$.
Another class of effect one could include is the contribution of higher order terms in $(Z - N)^2$. Of course, the correction must be an even function of $Z - N$, but we need more data on neutron-rich nuclei to fit this term effectively.

Measuring Nuclear Density
Having established that the density of nuclei is something interesting we'd like to learn more about, the question becomes how we measure it. Ideally, we would get a very accurate scale and a very tiny meterstick, but unfortunately things aren't quite so easy. Instead, the trick is to shoot something at the nucleus that has simple and well understood interactions. Then we can hopefully infer the nuclear density from how the particle scatters. The ideal candidate is electron scattering, since we know its interactions are electromagnetic and thus it will only couple to the electrically charged constituents of the nucleus, the protons. We also understand the form of the interaction quite well, whether it be the Coulomb interaction of non-relativistic quantum mechanics or photon exchange in Quantum Electrodynamics (QED), to be discussed later in the course. Further, as we will see shortly, the scattering is weak, enabling tractable calculations using either the Born approximation in non-relativistic quantum mechanics, or single photon exchange in QED.
In what follows, and for the rest of the course, we will typically work with natural units where $\hbar = c = 1$.

Natural Units
A brief word about natural units is in order. Conventionally, one measures distances and times with different units. However, we do not have to. Since we know the speed of light, we can specify a spatial distance by specifying a time and consider the distance to be how far light would travel in that time. You are undoubtedly familiar with the notion of a light-year. This is of course not the same as an ordinary year with 1/3 fewer calories; rather it is the distance that light travels in a year. Thus, light travels 1 light-year per year. Now the innovation of natural units is simply to recast light-years as years. In that case the speed of light is unity, with no dimensions. One can similarly recast $\hbar$ to be unity so that energies and inverse times have the same units.
One neat thing about doing this is that it greatly simplifies dimensional analysis. While we generally will use natural units, in the nuclear domain it is not uncommon to measure distances in fm (which corresponds to $10^{-15}$ m and stands for either fermi or femtometer) while measuring energies in MeV. To convert between them one uses
$$\hbar c \approx 197 \text{ MeV fm}.$$
To resolve distances of order the nuclear size, $R \sim 1$ fm, the uncertainty principle $\Delta x \, \Delta p \gtrsim 1$ (in units with $\hbar = 1$) tells us that we need a momentum transfer $q = |\mathbf{p}_f - \mathbf{p}_i|$ of at least $2\pi/R$. To get a momentum transfer of this size, we need an initial momentum $\mathbf{p}_i$ of the same order. Since the maximum momentum transfer for elastic scattering is $q_{max} = 2|\mathbf{p}_i|$, we need an initial momentum of roughly
$$|\mathbf{p}_i| \sim \frac{\pi}{R} \approx 600 \text{ MeV},$$
which is vastly larger than the electron mass of about 0.5 MeV. Thus, we need ultra-relativistic energy scales to probe the nuclear charge distribution. However, for the sake of simplicity we will ignore this and pretend that non-relativistic quantum mechanics is adequate to address this situation. We'll come back to this problem later in the course, after we have introduced the Dirac equation, and treat the issue relativistically. But, for the time being, let's continue on non-relativistically. Suppose we have a charge distribution $\rho(\mathbf{r})$, such that
$$\int d^3r \, \rho(\mathbf{r}) = Z,$$
where we have chosen to measure the electric charge in units of $e$. The potential seen by an electron is just the Coulomb potential,
$$V(\mathbf{r}) = -e^2 \int d^3r' \, \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}, \qquad (10)$$
where we have once again made our equations simpler by choosing units with $1/4\pi\epsilon_0 = 1$. Hopefully, you've already met the fine structure constant, which in our choice of units is simply $\alpha = e^2$. Since $V(\mathbf{r}) \sim e^2 \sim \alpha$ and $\alpha$ is a small number, the potential, and hence the scattering off of it, is weak. This allows us to use the Born approximation, which is valid for weak scattering, in that it basically assumes that the particle only interacts with the potential once.
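The estimate of the beam momentum needed to resolve nuclear distances is a one-line unit conversion. The sketch below assumes the standard values $\hbar c \approx 197.327$ MeV fm and an electron mass of 0.511 MeV; the function name is hypothetical, introduced only for this illustration.

```python
import math

HBAR_C = 197.327      # MeV fm (standard value, quoted as ~197 in the text)
ELECTRON_MASS = 0.511  # MeV


def min_beam_momentum(R_fm):
    """Smallest initial momentum (MeV) giving q_max = 2 p_i >= 2 pi / R."""
    return math.pi * HBAR_C / R_fm


p_min = min_beam_momentum(1.0)
print(f"p_i >~ {p_min:.0f} MeV to resolve R = 1 fm")
print(f"ratio to electron mass: {p_min / ELECTRON_MASS:.0f}")
```

The ratio of order a thousand between the required momentum and the electron mass is what makes the electrons ultra-relativistic.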
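When we treat scattering in quantum mechanics, our goal is to calculate the scattering amplitude, $f(\theta, \phi)$, from which we can calculate the differential cross section,
$$\frac{d\sigma}{d\Omega} = |f(\theta, \phi)|^2,$$
which, roughly speaking, is the ratio of particles scattered in a given direction to the number of incoming particles. In the Born approximation the scattering amplitude is proportional to the Fourier transform of the potential,
$$f(\theta, \phi) = -\frac{m}{2\pi} \int d^3r \, e^{-i\mathbf{q}\cdot\mathbf{r}} \, V(\mathbf{r}).$$
For later convenience, we will define the scattering angle and momentum transfer as
$$\cos\theta = \hat{\mathbf{p}}_i \cdot \hat{\mathbf{p}}_f, \qquad \mathbf{q} = \mathbf{p}_f - \mathbf{p}_i.$$
Looking back at our potential (10), we notice that it is a convolution of $1/r$ and $\rho(\mathbf{r})$.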

Reminder: Convolution Theorem
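A convolution $h(\mathbf{r})$ of two functions $f$ and $g$ is of the form
$$h(\mathbf{r}) = \int d^3r' \, f(\mathbf{r} - \mathbf{r}') \, g(\mathbf{r}'),$$
and we say $h$ is $f$ convolved with $g$. There is a useful theorem, called the Convolution Theorem, which says that the Fourier transform of a convolution is the product of the Fourier transforms. That is, for $h(\mathbf{r})$ as defined above, its Fourier transform $\tilde{h}(\mathbf{q})$ is
$$\tilde{h}(\mathbf{q}) = \tilde{f}(\mathbf{q}) \, \tilde{g}(\mathbf{q}),$$
where $\tilde{f}$ and $\tilde{g}$ are the Fourier transforms of $f(\mathbf{r})$ and $g(\mathbf{r})$.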
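The convolution theorem is easy to check numerically in its discrete, one-dimensional form, where the integral becomes a circular sum and the Fourier transform becomes a discrete Fourier transform. The sketch below is a finite-dimensional analogue of the continuum statement, not the three-dimensional integral itself; the sample sequences are arbitrary made-up numbers.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a sequence, straight from the definition."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def circular_convolution(f, g):
    """h[i] = sum_j f[i - j] g[j], with indices taken modulo the length."""
    n = len(f)
    return [sum(f[(i - j) % n] * g[j] for j in range(n)) for i in range(n)]


# Arbitrary sample data, chosen only for illustration.
f = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0]
g = [0.2, -0.7, 1.5, 0.0, 0.4, 1.0]

lhs = dft(circular_convolution(f, g))             # transform of the convolution
rhs = [a * b for a, b in zip(dft(f), dft(g))]     # product of the transforms
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
print(f"largest mismatch: {max_diff:.2e}")
```

The mismatch is at the level of floating-point round-off, as the theorem requires.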
Using the convolution theorem we can write the scattering amplitude as the product of the Fourier transforms of the charge density and the $1/r$ potential,
$$f(\theta) = \left[ \int d^3r \, e^{-i\mathbf{q}\cdot\mathbf{r}} \rho(\mathbf{r}) \right] \left[ -\frac{m}{2\pi} \int d^3r \, e^{-i\mathbf{q}\cdot\mathbf{r}} \left( -\frac{e^2}{r} \right) \right].$$
The first term is simply the Fourier transform of the nuclear charge distribution, which is called the electric form factor, $g_E(q^2)$. We write it as a function of $q^2$ for a spherically symmetric distribution. The second term is the Born approximation for scattering off of a point charge of charge $+e$. This can be evaluated exactly, and gives the well-known Rutherford formula. Now, suppose our experimental friends measure the scattering cross section, $(d\sigma/d\Omega)_{measured}$, as a function of the momentum transfer, which for elastic scattering with beam momentum $p$ is $q = 2p\sin(\theta/2)$. By dividing out the theoretically calculable cross section for a point charge, we can determine the electric form factor,
$$|g_E(q^2)|^2 = \frac{(d\sigma/d\Omega)_{measured}}{(d\sigma/d\Omega)_{point}},$$
from which we may perform an inverse Fourier transform to find the nuclear charge distribution,
$$\rho(\mathbf{r}) = \int \frac{d^3q}{(2\pi)^3} \, e^{i\mathbf{q}\cdot\mathbf{r}} \, g_E(q^2).$$
So, the punchline is that electron scattering allows us to map the nuclear charge distribution! However, before we get too proud of ourselves, we have to remember that we did this calculation within the Born approximation, and we really should figure out how to do this relativistically. Also, we only considered elastic scattering, and inelastic scattering opens up an entirely different set of information. Still, the fact remains that within these limits the density can be extracted, and if we extrapolate to infinite nuclear matter we will find $\rho_N \sim 0.16 \text{ fm}^{-3}$, as advertised above. But you've been swindled (get used to it, it'll happen a lot throughout these notes)! In addition to doing this problem non-relativistically, we also treated it as an electron scattering off of a static potential, when in fact real scattering is a two-body problem, where the nucleus moves as well. We can do this more carefully by using the center of mass variable
$$\mathbf{R} = \frac{m_e \mathbf{r}_e + m_n \mathbf{r}_n}{m_e + m_n},$$
with $m_e$ and $\mathbf{r}_e$ being the mass and coordinate of the electron, and $m_n$ and $\mathbf{r}_n$ being those of the nucleus.
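We also introduce the relative coordinate $\mathbf{r} = \mathbf{r}_e - \mathbf{r}_n$ and reduced mass $\mu = m_e m_n/(m_e + m_n)$, so we can write down the Hamiltonian
$$H = \frac{\mathbf{P}^2}{2(m_e + m_n)} + \frac{\mathbf{p}^2}{2\mu} + V(\mathbf{r}),$$
where $\mathbf{P}$ is the momentum conjugate to $\mathbf{R}$ and $\mathbf{p}$ is the momentum conjugate to $\mathbf{r}$. We then repeat our analysis in the center of mass frame where $\mathbf{P} = 0$, and will still find that
$$\left( \frac{d\sigma}{d\Omega} \right)_{measured} = |g_E(q^2)|^2 \left( \frac{d\sigma}{d\Omega} \right)_{point},$$
where $g_E(q^2)$ is calculated from the cross sections in the center of mass frame. Our limitation to elastic scattering also poses problems, since real scattering can be inelastic, and the nucleus can break up when hit in a process such as (here, $D$ represents a deuteron)
$$e^- + D \to e^- + p + n.$$
Clearly, this can't be described by a two-body potential! So, only the elastic part of the scattering gives us the form factor. Adding on a comedic number of superscripts, this means
$$|g_E(q^2)|^2 = \frac{(d\sigma/d\Omega)^{CM,\,elastic}_{measured}}{(d\sigma/d\Omega)^{CM}_{point}}.$$
Finally, even though our derivation was non-relativistic, the end result holds if we calculate the point charge cross section taking into account relativistic effects (assuming there is no spin involved, which slightly complicates things). The expression for the form factor then gets another superscript,
$$|g_E(q^2)|^2 = \frac{(d\sigma/d\Omega)^{CM,\,elastic}_{measured}}{(d\sigma/d\Omega)^{CM,\,rel}_{point}}.$$
It is also worth introducing some notation that you will inevitably encounter in the literature. Namely, the form factor is usually expressed as
$$g_E(q^2) = \langle \mathbf{p} + \mathbf{q} | \hat{\rho}(0) | \mathbf{p} \rangle. \qquad (28)$$
Here, $|\mathbf{p}\rangle$ and $|\mathbf{p} + \mathbf{q}\rangle$ are momentum eigenstates, which form the natural basis to use for a scattering problem. After all, scattering experiments basically amount to inserting some particle in a well-defined momentum state, allowing something complicated to happen, and then eventually measuring the outgoing particles, which have settled into a new momentum state. The $\hat{\rho}(0)$ operator in the middle is the charge density operator evaluated at $\mathbf{r} = 0$, which we pick solely for convenience. To see why we can always do so, recall that the momentum operator is the generator of spatial translations, i.e. we may translate an operator by $\mathbf{r}$ if we act with
$$\hat{\rho}(0) = e^{-i\hat{\mathbf{p}}\cdot\mathbf{r}} \, \hat{\rho}(\mathbf{r}) \, e^{i\hat{\mathbf{p}}\cdot\mathbf{r}}.$$
If we put this relation into (28), we have
$$\langle \mathbf{p} + \mathbf{q} | \hat{\rho}(0) | \mathbf{p} \rangle = \langle \mathbf{p} + \mathbf{q} | e^{-i\hat{\mathbf{p}}\cdot\mathbf{r}} \, \hat{\rho}(\mathbf{r}) \, e^{i\hat{\mathbf{p}}\cdot\mathbf{r}} | \mathbf{p} \rangle = e^{-i\mathbf{q}\cdot\mathbf{r}} \langle \mathbf{p} + \mathbf{q} | \hat{\rho}(\mathbf{r}) | \mathbf{p} \rangle,$$
so the matrix element of $\hat{\rho}(\mathbf{r})$ differs from that of $\hat{\rho}(0)$ only by the phase $e^{i\mathbf{q}\cdot\mathbf{r}}$. We can then evaluate at $\mathbf{r} = 0$ to get the form factor, so without loss of generality we can always simply use $g_E(q^2) = \langle \mathbf{p} + \mathbf{q} | \hat{\rho}(0) | \mathbf{p} \rangle$.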
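To make the idea of a form factor concrete, consider the simplest liquid-drop-like charge distribution: a uniformly charged sphere of radius $R$ and total charge $Z$ (in units of $e$). This is an assumed toy model, not a fit to data. For a spherically symmetric density the Fourier transform reduces to the radial integral $g_E(q^2) = 4\pi \int_0^R \rho \, r^2 \, \frac{\sin(qr)}{qr} \, dr$, which for the uniform sphere has the closed form $3Z[\sin(qR) - qR\cos(qR)]/(qR)^3$. The sketch below compares a simple midpoint-rule evaluation with the closed form; the radius used in the example is an illustrative number.

```python
import math


def form_factor_numeric(q, Z, R, steps=2000):
    """Radial Fourier transform of a uniform sphere, by the midpoint rule."""
    rho0 = 3.0 * Z / (4.0 * math.pi * R ** 3)  # constant charge density
    dr = R / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += rho0 * 4.0 * math.pi * r * r * math.sin(q * r) / (q * r) * dr
    return total


def form_factor_analytic(q, Z, R):
    """Closed form 3 Z [sin(x) - x cos(x)] / x^3 with x = q R."""
    x = q * R
    return 3.0 * Z * (math.sin(x) - x * math.cos(x)) / x ** 3


Z, R = 82, 6.8  # illustrative values: a lead-like nucleus, R in fm
for q in (0.1, 0.5, 1.0):  # momentum transfer in fm^-1
    print(q, form_factor_numeric(q, Z, R), form_factor_analytic(q, Z, R))
```

At small $q$ the form factor approaches the total charge $Z$, and at larger $q$ it oscillates and changes sign; in the Born-approximation cross section, which involves $|g_E|^2$, those sign changes show up as diffraction-like minima that encode the size $R$.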

Modeling the Nucleus
Now, we'd like to consider how to model the nucleus. This is a very rich subject and one that has seen significant advances in recent years. One could easily construct an entire semester-long course on this subject, and still not do it justice. Here we are only going to consider a handful of very simple models that capture some key aspects. The simplest approach is to treat a nucleus as a finite region of stuff, and describe that stuff as the idealized infinite nuclear matter we discussed in the section on the liquid drop model. Then, the game simply becomes understanding the nature of infinite nuclear matter.
Although the nucleons in a given piece of nuclear matter will create an overall attractive potential for other nucleons, to good approximation the potential will be constant within the interior of the nucleus. This justifies treating the nucleons within the nuclear matter as a non-interacting Fermi gas. Since the nucleons are non-interacting, the single-particle Schrödinger equation for a given nucleon is simply
$$-\frac{\nabla^2}{2m} \psi(\mathbf{r}) = \varepsilon \, \psi(\mathbf{r}),$$
where $\varepsilon$ is the single particle energy eigenvalue (measured relative to the constant potential). We're already familiar with the solution to this equation,
$$\psi_{\mathbf{k}}(\mathbf{r}) \propto e^{i\mathbf{k}\cdot\mathbf{r}}, \qquad \varepsilon = \frac{k^2}{2m}.$$
If the nucleons were bosons, they would all occupy the lowest energy level $k = 0$, but they are fermions and are restricted by the Pauli principle. Instead, the nucleons fill the available states starting from the lowest energy levels, until all of the nucleons are accounted for. The energy of the most energetic nucleons is the Fermi energy, $\varepsilon_F = k_F^2/2m$, where the corresponding momentum $k_F$ is called the Fermi momentum.
If we imagine putting the system in a giant box of size $L$, the momentum will be quantized as
$$\mathbf{k} = \frac{2\pi}{L} (n_x, n_y, n_z)$$
for $n_x, n_y, n_z \in \mathbb{Z}$, so the allowed momenta form a lattice in momentum space with spacing $2\pi/L$, and hence the volume of a state in momentum space is
$$\left( \frac{2\pi}{L} \right)^3.$$
To count the total number of nucleons in the system, we can just add up the occupancy of each state. However, for a system of many nucleons we can replace the sum over levels with an integral over momenta, divided by the volume per state in $k$-space. That is,
$$N = 4 L^3 \int \frac{d^3p}{(2\pi)^3} \, \theta(k_F^2 - p^2),$$
where the factor of four accounts for the fact that we have two different kinds of nucleons (protons and neutrons), each with two spin states for a given energy level. (Recall that the Pauli principle only applies to identical fermions: two fermions of different species may occupy the same state.) The step function tells us that we should only integrate up to the Fermi momentum. We can now divide both sides by the volume of the box $L^3$ to find an expression for the nuclear density,
$$\rho = \frac{N}{L^3} = 4 \int \frac{d^3p}{(2\pi)^3} \, \theta(k_F^2 - p^2). \qquad (37)$$
(Here we can safely take the limit $L \to \infty$, where $N$ will go to infinity with it, but the density will remain fixed.)

Reminder: Step functions and Delta functions
Recall that the step function (sometimes called the Heaviside function) is defined as
$$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0. \end{cases}$$
In the context of (37), it tells us that the integrand is 1 if $p^2 < k_F^2$, i.e. the state is filled, and the integrand is 0 for $p^2 > k_F^2$, i.e. the state is empty. While we're at it, let's take this opportunity to also remind ourselves of the delta function $\delta(x)$, which is infinite when its argument is zero, and zero everywhere else. By definition, the delta function integrates to one so long as its "spike" is within the range of integration,
$$\int_a^b \delta(x - x_0) \, dx = \begin{cases} 1, & a < x_0 < b \\ 0, & \text{otherwise.} \end{cases}$$
This means that if we integrate it against a function $f(x)$, the delta function picks out its value at one point,
$$\int_{-\infty}^{\infty} f(x) \, \delta(x - x_0) \, dx = f(x_0).$$
We can also represent a delta function as
$$\delta(x) = \int_{-\infty}^{\infty} \frac{dk}{2\pi} \, e^{ikx}.$$
To round out our collection of delta function facts, $\delta(ax) = \frac{1}{|a|} \delta(x)$, or more generally if we put any function $f(x)$ inside the argument of a delta function, we have
$$\delta(f(x)) = \sum_i \frac{\delta(x - x_i)}{|f'(x_i)|},$$
where $x_i$ are the zeroes of $f(x)$ and $f'(x) = \partial_x f$. We can also define the derivative of a delta function inside an integral by integrating by parts,
$$\int_{-\infty}^{\infty} f(x) \, \delta'(x - x_0) \, dx = -f'(x_0).$$
Finally, the delta function is the derivative of the step function,
$$\frac{d\theta(x)}{dx} = \delta(x).$$
Since the single-particle energy depends only on $|\mathbf{k}|$, the momentum space distribution is spherically symmetric, allowing us to evaluate the integral in spherical coordinates,
$$\rho = \frac{4}{(2\pi)^3} \, 4\pi \int_0^{k_F} p^2 \, dp = \frac{2 k_F^3}{3\pi^2}.$$
This expression can then be used to estimate several bulk properties of nuclei. A less crude approach is to use a shell model, where each nucleon experiences an effective potential due to all of the others. This effective interaction can be included in the Hamiltonian, from which one can find the allowed energy levels, which form shells that are filled up by the nucleons. The details of the model can then be appropriately tweaked to reproduce the experimental data.
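As an example of the bulk estimates the Fermi gas picture allows, carrying out the momentum integral with a degeneracy factor of four gives $\rho = 2k_F^3/(3\pi^2)$, which can be inverted for the Fermi momentum. The sketch below does this for $\rho = 0.16 \text{ fm}^{-3}$; the values of $\hbar c$ and the nucleon mass (about 939 MeV) are standard numbers assumed here, and the function names are illustrative.

```python
import math

HBAR_C = 197.327      # MeV fm
NUCLEON_MASS = 939.0  # MeV, approximate average of proton and neutron masses
RHO_N = 0.16          # fm^-3, nuclear matter density quoted in the text


def fermi_momentum(rho=RHO_N):
    """k_F in fm^-1 from rho = 2 k_F^3 / (3 pi^2)."""
    return (3.0 * math.pi ** 2 * rho / 2.0) ** (1.0 / 3.0)


def fermi_energy(rho=RHO_N, mass=NUCLEON_MASS):
    """Non-relativistic Fermi kinetic energy k_F^2 / (2 m), in MeV."""
    k_f_mev = fermi_momentum(rho) * HBAR_C  # convert fm^-1 to MeV
    return k_f_mev ** 2 / (2.0 * mass)


print(f"k_F = {fermi_momentum():.2f} fm^-1 = {fermi_momentum() * HBAR_C:.0f} MeV")
print(f"Fermi energy = {fermi_energy():.1f} MeV")
```

A Fermi momentum of a few hundred MeV is well below the nucleon mass, which is why a non-relativistic treatment of the nucleons is a reasonable first approximation.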
In the traditional potential approach, one deduces the potential between two nucleons from phase shift analysis of scattering data, which looks something like the drawing above. The potential is repulsive at short ranges, attractive at intermediate distances, and has a tail well described by single pion exchange.
However, this potential model is only valid for low energy scattering, since at higher energies mesons can be produced and scattering can otherwise be inelastic. There are several further challenges in using this approach to model many-body systems such as nuclei. In many cases, a simple two-body potential is insufficient. A more accurate calculation requires three- or four-body potentials. Unfortunately, there is no superposition principle for many-body potentials, and a three-body potential generally cannot be decomposed as
$$V(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3) = V(\mathbf{r}_1 - \mathbf{r}_2) + V(\mathbf{r}_2 - \mathbf{r}_3) + V(\mathbf{r}_3 - \mathbf{r}_1).$$
That is to say, it depends on the positions of all three bodies, not just the relative coordinates between pairs. In this respect, it is fundamentally different from the Coulomb potential, which is a two-body potential and satisfies superposition. So, although one can fix $V(\mathbf{r}_1 - \mathbf{r}_2)$ from experimental scattering data, it is much more challenging to constrain the three-particle interaction, even armed with scattering data for systems containing three nucleons.
The other major problem is our inability to solve the Schrödinger equation for large systems. Few-body systems can in many cases be solved nearly exactly using numerical methods (at least for the ground state), but larger systems are far more challenging. In some cases, their solutions may be approximated by variational methods or some kind of Green's function Monte Carlo, but even these approximations are only reliable for systems with relatively few particles.

Review of Special Relativity
The language in which nuclear and particle physics is written is that of quantum field theory (QFT). In turn, QFT is the marriage of quantum mechanics and special relativity. Hopefully, after two semesters of courses you have a decent memory of quantum mechanics, but just to ensure everyone is on the same page, it's worth briefly recapping special relativity, which may not be as fresh in everyone's memory. This review also has the added benefit of clearly establishing the notation that we will use throughout the rest of the course.
We label the coordinates of an event by a four-vector in spacetime,
$$x^\mu = (t, x, y, z), \qquad (47)$$
where the index $\mu = 0, 1, 2, 3$ tells us which component of the four-vector we're talking about, e.g. $x^0 = t$, $x^1 = x$, $x^2 = y$, and $x^3 = z$. Notice that the index is raised ("upstairs"), and we call such objects contravariant.
We also have the metric tensor,
$$g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1),$$
which lets us turn contravariant vectors into covariant vectors, which have lowered ("downstairs") indices,
$$x_\mu = \sum_{\nu=0}^{3} g_{\mu\nu} x^\nu = g_{\mu\nu} x^\nu.$$
In the second equality we have introduced the Einstein summation convention, where we agree that every time we see an index appear twice in a term (once raised and once lowered) we will sum over it. This saves us the time and energy of having to write lots of summation signs, so much so that this is often (semi-)jokingly referred to as Einstein's greatest contribution to physics. Indices that are repeated, and thus summed over (or "contracted," if you're fancy), are called dummy indices, reflecting the fact that their names don't matter. That is,
$$g_{\mu\nu} x^\nu = g_{\mu\alpha} x^\alpha.$$
This is the discrete version of the probably familiar fact that
$$\int f(x) \, dx = \int f(y) \, dy.$$
However, the names of uncontracted or free indices do matter: they must match on both sides of an equation. It's not hard to see that if the contravariant vector is given by (47), then the associated covariant vector is
$$x_\mu = (t, -x, -y, -z).$$
We can also use the inverse metric $g^{\mu\nu}$, which happens to have the same components as $g_{\mu\nu}$, to raise indices and turn a covariant vector into a contravariant vector,
$$x^\mu = g^{\mu\nu} x_\nu.$$

Reminder: Indices
For the purposes of this box, let's just stay in familiar $\mathbb{R}^3$. We typically represent vectors as $\mathbf{v} = v_x \hat{x} + v_y \hat{y} + v_z \hat{z}$. If we take the completely superficial step of renaming $\hat{x} \to \hat{e}_1$, $\hat{y} \to \hat{e}_2$, $\hat{z} \to \hat{e}_3$, and similarly for the components, we can write this more succinctly as
$$\mathbf{v} = \sum_{a=1}^{3} v_a \hat{e}_a.$$
If we choose orthonormal basis vectors, so $\hat{e}_a \cdot \hat{e}_b = \delta_{ab}$, i.e. the product is one if $a = b$ and zero if $a \neq b$, then the dot product of two vectors can be written
$$\mathbf{v} \cdot \mathbf{w} = \sum_{a=1}^{3} v_a w_a. \qquad (55)$$
A smart guy named Einstein realized that we don't actually have to write all of the summation signs, since whenever we sum over an index it appears twice. So, if we just agree that any time an index is repeated we know to sum over it, we can stop writing the summation signs, so the dot product is just $\mathbf{v} \cdot \mathbf{w} = v_a w_a$, with the sum implicit. Now, suppose we have a matrix,
$$M = \begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{pmatrix}.$$
If we use the normal "row-column" method of matrix multiplication, we can calculate how it acts on $\mathbf{v}$,
$$M\mathbf{v} = \begin{pmatrix} M_{11} v_1 + M_{12} v_2 + M_{13} v_3 \\ M_{21} v_1 + M_{22} v_2 + M_{23} v_3 \\ M_{31} v_1 + M_{32} v_2 + M_{33} v_3 \end{pmatrix}.$$
You can convince yourself that this is equivalent to the index expression
$$(M\mathbf{v})_a = M_{ab} v_b,$$
where $(M\mathbf{v})_a$ represents the $a$th component of the vector $M\mathbf{v}$. Notice the index $a$ is free (it tells us which component of $M\mathbf{v}$ we're talking about), whereas the index $b$ is contracted (summed over). You can explicitly calculate the terms in this sum and confirm it agrees with "row-column" matrix multiplication.
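Index expressions translate almost literally into code: a contracted index becomes a loop variable that is summed over, and a free index labels the component being computed. The following sketch writes the dot product $v_a w_a$ and the matrix action $(M\mathbf{v})_a = M_{ab} v_b$ this way; the sample matrix and vectors are arbitrary.

```python
def dot(v, w):
    """v . w = v_a w_a, with the repeated index a summed over."""
    return sum(v[a] * w[a] for a in range(len(v)))


def matvec(M, v):
    """(Mv)_a = M_ab v_b: a is free (one output per a), b is summed over."""
    return [sum(M[a][b] * v[b] for b in range(len(v))) for a in range(len(M))]


# Arbitrary sample data for illustration.
M = [[1, 2, 0],
     [0, 1, -1],
     [3, 0, 2]]
v = [1, -2, 4]
w = [0, 5, 1]

print(dot(v, w))     # contraction leaves no free index: a single number
print(matvec(M, v))  # one free index: a vector with three components
```

Renaming the loop variable changes nothing, which is exactly the statement that the name of a dummy index does not matter.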
Given a covariant vector and a contravariant vector, we can form the scalar product,
$$s = x_\mu x^\mu = g_{\mu\nu} x^\mu x^\nu,$$
which is the moral equivalent of the dot product in normal three-dimensional space. We now define the Lorentz transformations as the set of all linear transformations which preserve the scalar product, $s$. We can represent a Lorentz transformation as a matrix, $\Lambda^\mu{}_\nu$. The funny index structure (one up, one down) is chosen to reflect that when we contract it with a contravariant vector, we get back a contravariant vector,
$$\tilde{x}^\mu = \Lambda^\mu{}_\nu x^\nu.$$
There is a useful analogy here. Recall that in normal $\mathbb{R}^3$ we define a rotation matrix $R$ as a linear transformation that preserves the norm of a three-vector $\mathbf{v}$. That is, if $\tilde{\mathbf{v}} = R\mathbf{v}$, then $R$ is a rotation matrix if $\tilde{\mathbf{v}} \cdot \tilde{\mathbf{v}} = \mathbf{v} \cdot \mathbf{v}$. This specifies a condition on $R$ that lets us determine if it is a rotation matrix,
$$\tilde{\mathbf{v}} \cdot \tilde{\mathbf{v}} = (R\mathbf{v}) \cdot (R\mathbf{v}) = \mathbf{v} \cdot R^T R \, \mathbf{v} = \mathbf{v} \cdot \mathbf{v} \quad \Longrightarrow \quad R^T R = 1.$$
The same is true in special relativity. A Lorentz transformation is defined to be a linear transformation which preserves the scalar product, and we can determine a condition on $\Lambda^\mu{}_\nu$ to check if it is a Lorentz transformation, just as in the three-dimensional case. We'll do this in both normal matrix notation and index notation at the same time to help you get the hang of index gymnastics,
$$\tilde{x} \cdot g \tilde{x} = x \cdot g x \qquad\qquad g_{\mu\nu} \tilde{x}^\mu \tilde{x}^\nu = g_{\alpha\beta} x^\alpha x^\beta$$
$$(\Lambda x) \cdot g \Lambda x = x \cdot g x \qquad\qquad g_{\mu\nu} \Lambda^\mu{}_\alpha x^\alpha \Lambda^\nu{}_\beta x^\beta = g_{\alpha\beta} x^\alpha x^\beta$$
$$\Lambda^T g \Lambda = g \qquad\qquad g_{\mu\nu} \Lambda^\mu{}_\alpha \Lambda^\nu{}_\beta = g_{\alpha\beta}$$
Making contact with physics, we can state the Principle of Relativity as the requirement that the laws of physics be invariant under Lorentz transformations. Since we know from earlier classes that Lorentz transformations take us from one inertial frame to another, this is just a different way of stating a familiar law. Now, we need to expand our vocabulary: $x^\mu$ is not the only four-vector in town! In fact, we define a four-vector to be anything that transforms under a Lorentz transformation as
$$\tilde{A}^\mu = \Lambda^\mu{}_\nu A^\nu.$$
In general, a column of four random numbers is not a four-vector; it has to follow this very particular transformation law.
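The defining condition $\Lambda^T g \Lambda = g$ can be checked explicitly for a familiar example. The sketch below uses the standard boost along the $x$ axis with velocity $v$ (in units with $c = 1$) and the metric with signature $(+, -, -, -)$; the helper names are made up for this illustration. It verifies both the matrix condition and the invariance of the scalar product for a sample event.

```python
import math

# Metric g_{mu nu} with signature (+, -, -, -).
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def boost_x(v):
    """Standard Lorentz boost along x with velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]


def is_lorentz(L, tol=1e-12):
    """True if Lambda^T g Lambda equals g to within tol."""
    lhs = matmul(transpose(L), matmul(G, L))
    return all(abs(lhs[i][j] - G[i][j]) < tol for i in range(4) for j in range(4))


def apply(L, x):
    """x'^mu = Lambda^mu_nu x^nu."""
    return [sum(L[m][n] * x[n] for n in range(4)) for m in range(4)]


def scalar_product(x, y):
    """g_{mu nu} x^mu y^nu."""
    return sum(G[m][n] * x[m] * y[n] for m in range(4) for n in range(4))


L = boost_x(0.6)
x = [2.0, 1.0, 0.5, -3.0]  # an arbitrary event (t, x, y, z)
x_boosted = apply(L, x)
print(is_lorentz(L), scalar_product(x, x), scalar_product(x_boosted, x_boosted))
```

By contrast, a generic matrix such as a uniform rescaling of all four coordinates fails the test, which is the sense in which four random numbers do not automatically form a four-vector.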
On the other hand, if we find two four-vectors $A^\mu$ and $B^\mu$, we can take their scalar product $A_\mu B^\mu$ and get a Lorentz scalar, which is a quantity that is the same in all frames. One four-vector that warrants a brief discussion is the momentum four-vector,
$$p^\mu = (E, \mathbf{p}),$$
whose scalar product is the mass squared,
$$p_\mu p^\mu = E^2 - \mathbf{p}^2 = m^2.$$
To derive these properties, let's consider a moving particle of mass $m$. We define the proper time $\tau$ to be the time measured by a clock moving alongside the particle (i.e. the time as measured in the particle's rest frame). The difference in proper time between two events is
$$d\tau^2 = dt^2 - d\mathbf{x}^2 = dx_\mu dx^\mu,$$
which is manifestly Lorentz invariant. Now, let us parameterize the trajectory of the particle using the proper time,
$$x^\mu = x^\mu(\tau),$$
where we have an event at each value of $\tau$. If we differentiate this with respect to the proper time, we will end up with another four-vector,
$$u^\mu = \frac{dx^\mu}{d\tau}.$$
If $u^\mu$ is a four-vector, then $u_\mu u^\mu$ is a scalar and must be the same in all frames. This means that we can evaluate the scalar product in whatever frame is most convenient, and the result will hold in every other frame. Let's choose the rest frame of the particle, where $t = \tau$ and
$$u^\mu = (1, 0, 0, 0).$$
Thus, $u_\mu u^\mu = 1$ in the rest frame, and every other frame. You can check that the most general four-vector for which $u_\mu u^\mu = 1$ is
$$u^\mu = \gamma (1, \mathbf{v}), \qquad \gamma = \frac{1}{\sqrt{1 - \mathbf{v}^2}}.$$
Since this looks like a velocity, it makes sense to multiply it by the mass to get the momentum,
$$p^\mu = m u^\mu = (\gamma m, \gamma m \mathbf{v}) = (E, \mathbf{p}).$$
Then, $E^2 - \mathbf{p}^2 = p_\mu p^\mu = m^2 u_\mu u^\mu = m^2$, as advertised. Another important object is the derivative operator:
$$\partial_\mu \equiv \frac{\partial}{\partial x^\mu} = \left( \frac{\partial}{\partial t}, \nabla \right).$$
Notice that we've defined this derivative operator with a lowered index, implying that it behaves like a covariant object. We can see intuitively why this should be the case by considering a Lorentz-scalar field, $s(x)$, whose value at a given spacetime point should be the same in any reference frame. Consider the value of this field at two nearby points, $s = s(x)$ and $s' = s(x + \Delta x)$. Their difference is $\Delta s = s' - s$, which for small separations we can expand as
$$\Delta s = \frac{\partial s}{\partial x^\mu} \Delta x^\mu = (\partial_\mu s) \, \Delta x^\mu,$$
where $\Delta x^\mu$ is the separation between the two points.
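The invariants $u_\mu u^\mu = 1$ and $p_\mu p^\mu = m^2$ can be verified numerically for any velocity below the speed of light. The sketch below builds $u^\mu = \gamma(1, \mathbf{v})$ and $p^\mu = m u^\mu$ in units with $c = 1$; the sample velocities and the mass value (roughly a proton mass in GeV) are illustrative choices, and the function names are hypothetical.

```python
import math


def four_velocity(v):
    """u^mu = gamma (1, v) for a three-velocity v = (vx, vy, vz), with c = 1."""
    vx, vy, vz = v
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    return [gamma, gamma * vx, gamma * vy, gamma * vz]


def four_momentum(m, v):
    """p^mu = m u^mu = (E, p)."""
    return [m * u for u in four_velocity(v)]


def minkowski_square(a):
    """a_mu a^mu with signature (+, -, -, -)."""
    return a[0] ** 2 - a[1] ** 2 - a[2] ** 2 - a[3] ** 2


u = four_velocity((0.3, 0.4, 0.1))
p = four_momentum(0.938, (0.0, 0.0, 0.8))
print(minkowski_square(u))                # u_mu u^mu
print(minkowski_square(p), 0.938 ** 2)    # E^2 - p^2 compared with m^2
```

Whatever velocity is chosen, the first number printed is 1 and the second equals the mass squared, up to round-off.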
$\Delta x^\mu$ is clearly a contravariant vector, and we've already stated that $\Delta s$ must be a scalar. The only way for this to be satisfied is if the derivative $\partial_\mu s$ is a covariant vector, implying $\partial_\mu$ itself is covariant. In particular, take note that the contraction of the derivative operator and a Lorentz vector field has all plus signs:
$$\partial_\mu A^\mu = \partial_t A^0 + \nabla\cdot\mathbf{A}.$$
This is sometimes called the divergence, for obvious reasons. We can also raise the index on the derivative operator using the metric,
$$\partial^\mu = g^{\mu\nu}\partial_\nu = (\partial_t, -\nabla).$$
Finally, we'll introduce a Lorentz tensor. Just like we defined a vector by its transformation law (and a scalar by the fact it does not transform), we define a tensor as something that transforms with two copies of the Lorentz transformation,
$$T'^{\mu\nu} = \Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma T^{\rho\sigma}.$$
Notice that each index transforms like a vector. One of the most important examples of a tensor is the electromagnetic field strength,
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu,$$
which is constructed from the four-potential $A_\mu = (\Phi, -\mathbf{A})$, where $\Phi$ and $\mathbf{A}$ are the scalar and vector potentials. If you evaluate each component of the above and compare to the definitions of the electric and magnetic fields, you can easily show that
$$F_{0i} = E_i, \qquad F_{ij} = -\epsilon_{ijk}B_k.$$
In light of this, we can organize the components of the field strength into an array of numbers that is not a matrix,
$$F_{\mu\nu} = \begin{pmatrix} 0 & E_x & E_y & E_z \\ -E_x & 0 & -B_z & B_y \\ -E_y & B_z & 0 & -B_x \\ -E_z & -B_y & B_x & 0 \end{pmatrix}.$$
We can contract both indices of $F_{\mu\nu}$ with itself to get a Lorentz scalar,
$$F_{\mu\nu}F^{\mu\nu} = -2(\mathbf{E}^2 - \mathbf{B}^2).$$
We'll use this fact in section 9 to formulate electromagnetism in the Lagrangian formalism.
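The defining condition for a Lorentz transformation, $\Lambda^T g \Lambda = g$, and the frame independence of $p_\mu p^\mu$ can be checked numerically. The following is a minimal sketch, assuming the metric signature (+, -, -, -) and a boost along the x-axis; the function names and the illustrative mass value are not from the notes.

```python
import math

# Metric with signature (+, -, -, -), in units where c = 1.
G = [[1, 0, 0, 0],
     [0, -1, 0, 0],
     [0, 0, -1, 0],
     [0, 0, 0, -1]]

def boost_x(v):
    """Lorentz boost along x with velocity v (|v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0, 0],
            [-gamma * v, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def apply(lam, x):
    """Transform a contravariant four-vector: x'^mu = Lambda^mu_nu x^nu."""
    return [sum(lam[m][n] * x[n] for n in range(4)) for m in range(4)]

def scalar_product(a, b):
    """Compute g_{mu nu} a^mu b^nu."""
    return sum(G[m][n] * a[m] * b[n] for m in range(4) for n in range(4))

def is_lorentz(lam, tol=1e-12):
    """Check the condition Lambda^T g Lambda = g."""
    lhs = matmul(transpose(lam), matmul(G, lam))
    return all(abs(lhs[i][j] - G[i][j]) < tol
               for i in range(4) for j in range(4))

lam = boost_x(0.6)
m = 0.938                      # illustrative mass (GeV), an assumption
p_rest = [m, 0.0, 0.0, 0.0]    # four-momentum in the rest frame
p_boosted = apply(lam, p_rest)

print(is_lorentz(lam))
print(scalar_product(p_rest, p_rest), scalar_product(p_boosted, p_boosted))
```

Both scalar products should agree with $m^2$ up to rounding, while a matrix that merely rescales time fails the `is_lorentz` check.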

The Yukawa Potential
The semi-empirical mass formula that we discussed in section 2 is only sensible if the nuclear force is short-ranged. However, simply saying a dimensionful quantity like a distance is "short" isn't good enough; we need some other scale with which to compare it. In our case, the important length scale is the size of the nucleus, which is typically a few fm. So the range of the nuclear force should be shorter than the typical size of the nucleus, but it also can't be too short: if it's shorter than the typical size of a nucleon (proton or neutron) then things stop making sense. We now have an upper and a lower bound for the range of the force, but the question still remains as to what the range of the force actually is, and more importantly why it has the range that it does. In 1935, Yukawa arrived at an insightful answer to this question: he posited that the nuclear force is mediated by the virtual exchange of massive bosons, and the mass of the boson sets the range of the force. This idea was nothing short of brilliant; at the time people did not simply invent particles out of thin air, so to do so was an act of genius. The idea is still relevant today. In fact, if one were to observe a mysterious short-ranged force not accounted for by the Standard Model, the first thing any theorist would try is a new particle with a mass commensurate with the force's range. Of course, like many discoveries in particle physics, Yukawa's original model, while getting at a fundamental truth, was not quite correct in detail. Now, we've said a few times that the mass of the particle sets the range of the force, but it may not be clear how this is so. Since we're interested in processes that are both quantum mechanical and relativistic, the two constants $\hbar$ and $c$ are in the game (they've been hiding so far, since we set them equal to one).
If we now have a mass $m$, we can use these constants to write down a length scale,
$$R \sim \frac{\hbar}{mc},$$
where $R$ is taken to be the range of the interaction (after all, it's the only length scale we have). Returning to civilized units (those with $\hbar = 1$ and $c = 1$), this is simply written $R \sim m^{-1}$. We previously figured that the range of the interaction should be about a femtometer, so the mass of the mediating boson should be
$$m \sim \frac{1}{R} \sim \frac{1}{1\ \text{fm}} \approx 200\ \text{MeV}.$$
It turns out that the particle Yukawa was looking for is the pion, which actually comes in three kinds: the charged $\pi^+$ and $\pi^-$, which are antiparticles of one another, and the neutral $\pi^0$, which is its own antiparticle. The masses of these particles are
$$m_{\pi^\pm} \approx 139.6\ \text{MeV}, \qquad m_{\pi^0} \approx 135.0\ \text{MeV},$$
which are both reasonably close to the rough estimate of 200 MeV, so our simple reasoning about scales was fairly predictive. However, it is not only the pions that mediate the nuclear force; there are many other particles, generally called mesons, which act in this capacity. One should also note that these particles are not fundamental: they're made out of quarks and gluons. We also said that the exchange of these particles is "virtual," which is worth briefly explaining. The notion of a virtual particle only really makes sense in the context of perturbation theory, which we'll learn much more about in section 9.5. As we'll see, the basic idea is that over the course of some physical process, the system can be in an "intermediate state" and include particles that don't appear in the initial or final states. A virtual particle is a particle that exists only in such an intermediate state, and isn't actually observable: it just comes and goes as part of an interaction between other entities.
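The scale estimate $R \sim \hbar/(mc)$ is easy to reproduce once $\hbar c \approx 197.327$ MeV fm is restored. The sketch below is only an order-of-magnitude illustration; the helper names are made up for this example, and the pion masses are rounded values.

```python
# hbar * c in MeV * fm, the conversion factor between natural and lab units.
HBAR_C_MEV_FM = 197.327

def mass_from_range(range_fm):
    """Mass scale (MeV) of a mediator whose force has the given range (fm)."""
    return HBAR_C_MEV_FM / range_fm

def range_from_mass(mass_mev):
    """Range (fm) of a force mediated by a boson of the given mass (MeV)."""
    return HBAR_C_MEV_FM / mass_mev

# A range of about 1 fm corresponds to a mass of about 200 MeV.
print(mass_from_range(1.0))

# Conversely, the measured pion masses correspond to a range of about 1.4 fm.
for name, mass in [("charged pion", 139.57), ("neutral pion", 134.98)]:
    print(name, range_from_mass(mass))
```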
Although these mesons appear only as virtual particles in the nuclear force, they also exist as real, observable particles. However, the heavier mesons are not stable; they all decay into pions via the strong interaction. The pions themselves also decay, the $\pi^\pm$ via the weak interaction and the $\pi^0$ via the electromagnetic interaction, with lifetimes of $2.6 \times 10^{-8}$ s and $8.4 \times 10^{-17}$ s respectively. These might seem very short, but they are in fact much longer than the typical timescale in strong interactions involving hadrons, which is about $10^{-24}$ s; we will discuss hadrons a bit later in these notes. The timescales relevant to nuclear physics are often a bit longer than these typical hadronic scales, but are still very much shorter than the pion lifetime. The net result is that the pions "look" stable in hadronic and nuclear interactions.
The final aspect of Yukawa's idea that we need to explain is the notion of a particle mediating a force. To do so, let's go back to E&M: we have an electromagnetic field, which in the quantum picture can be thought of as a swarm of virtual photons (which are bosons) in the same state. Charged particles can interact with one another via the electromagnetic field by exchanging these virtual photons. This photon exchange gives rise to the familiar Coulomb potential between two particles of charge $e$,
$$V(r) = \frac{e^2}{4\pi}\frac{1}{r}.$$
The same picture holds for the nuclear force, with mesons playing the role of the photons.
The key difference is that the mesons are massive whereas photons are massless. This means that the potential we get is not the Coulomb potential, but instead the Yukawa potential,
$$V(r) = -g^2\frac{e^{-mr}}{r},$$
where $g^2$ is the square of the coupling constant, analogous to the $e^2/4\pi$ in the Coulomb potential, and $m$ is the mass of the particle. This potential decays exponentially with $mr$, so the characteristic range of the interaction is indeed $m^{-1}$. It is often said that the nucleons are a "source" for the meson field. We can understand what this means by again appealing to E&M, where the sources of the electromagnetic potential $A^\mu$ are charges and currents, $J^\mu$. These sources (charges and currents) can then interact with one another via the electromagnetic potential.
In E&M it is particularly useful to consider physics in the absence of sources, i.e. solutions to the Maxwell equations in vacuum. We know quite well that these solutions are electromagnetic waves which satisfy the wave equation,
$$(\partial_t^2 - \nabla^2)\phi = 0.$$
We've written this equation for some generic massless bosonic field $\phi$ rather than $\mathbf{E}$, $\mathbf{B}$, or $A^\mu$ to avoid the messy complications that arise in E&M due to gauge invariance, which will not be an issue for the present discussion. Notice that we can use the fancy relativistic language from the previous section to write $\partial_t^2 - \nabla^2 = \partial_\mu\partial^\mu$, and the wave equation as simply
$$\partial_\mu\partial^\mu\phi = 0.$$
We'd now like to figure out what the equation of motion is for the meson field in the absence of sources. The trick to do this quickly is to use quantum mechanical ideas: we know that as operators we can replace
$$E \to i\partial_t, \qquad \mathbf{p} \to -i\nabla. \tag{89}$$
In case you're not familiar with the first relationship, it's nothing more than the Schrödinger equation, which says that $i\partial_t\psi = H\psi$.
The Hamiltonian is the energy operator, so we can relate the operator on the left-hand side, $i\partial_t$, to the energy $E$. Next, we recall that we live in a relativistic world, so the energy and momentum of a massive particle are related by (remember $c = 1$)
$$E^2 = \mathbf{p}^2 + m^2, \tag{91}$$
which we can write as $\mathbf{p}^2 - E^2 + m^2 = 0$. Swapping out $E$ and $\mathbf{p}$ with (89) this becomes
$$\partial_t^2 - \nabla^2 + m^2 = \partial_\mu\partial^\mu + m^2 = 0.$$
[Footnote 10] The operator $\partial_\mu\partial^\mu$ is called the D'Alembertian operator and plays the role of the Laplacian in four space-time dimensions. Because mathematicians often write the Laplacian as a triangle (since it is a three-dimensional derivative), it is common to see the D'Alembertian written as a box in the literature, in which case the wave equation is $\Box\phi = 0$. A less common (but aesthetically superior) notation is $\partial^2\phi = 0$. To avoid confusion, in this course we'll always just write out $\partial_\mu\partial^\mu$.
Of course, for this to make any sense we should act with this operator on a function. Doing so results in the Klein-Gordon equation,
$$(\partial_\mu\partial^\mu + m^2)\phi = 0.$$
You can check that the solutions to this equation are propagating plane waves,
$$\phi \propto e^{-i(\omega t - \mathbf{k}\cdot\mathbf{x})},$$
which satisfy the dispersion relation
$$\omega^2 = \mathbf{k}^2 + m^2.$$
In quantum mechanics $E = \hbar\omega$ and $\mathbf{p} = \hbar\mathbf{k}$, so this dispersion is equivalent to the relativistic energy-momentum relation (91). In light of this, you can think of the Klein-Gordon equation as the moral equivalent of the wave equation for massive particles. This is all well and good, but we went down this rabbit hole to understand the Yukawa potential, and a potential is the energy for a static object. This means that the solutions we care about are time-independent, i.e. $\partial_t^2\phi = 0$. In this case the Klein-Gordon equation simplifies to
$$(\nabla^2 - m^2)\phi = 0. \tag{96}$$
This equation has lots of solutions, but for the time being we'll only be interested in spherically symmetric solutions (that is, solutions that depend only on $r = |\mathbf{r}|$). It turns out that there are two solutions of this form,
$$\phi(r) \propto \frac{e^{-mr}}{r} \qquad\text{and}\qquad \phi(r) \propto \frac{e^{+mr}}{r}.$$
The second option diverges as $r \to \infty$, which is unphysical, so we should throw it away. Don't worry about how we found these solutions, but feel free to plug them into (96) and check that they work.
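If we now stick a source (particle 1) at position $\mathbf{r}_1$, the resultant meson field will be
$$\phi(\mathbf{r}) = g\,\frac{e^{-m|\mathbf{r} - \mathbf{r}_1|}}{|\mathbf{r} - \mathbf{r}_1|},$$
where $g$ is the strength of the coupling to the source. If we add a second particle at position $\mathbf{r}_2$, the potential particle 2 feels due to particle 1 is
$$V(\mathbf{r}_1, \mathbf{r}_2) = -g^2\,\frac{e^{-m|\mathbf{r}_1 - \mathbf{r}_2|}}{|\mathbf{r}_1 - \mathbf{r}_2|},$$
which is precisely the Yukawa potential. The overall minus sign means the potential is attractive, and comes about for subtle reasons that we won't worry about in this course.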
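For readers who would rather check the static solution numerically than by hand, here is a rough sketch. It assumes the radial form of the Laplacian for a spherically symmetric function, $\nabla^2\phi = \frac{1}{r}\frac{d^2}{dr^2}(r\phi)$, and approximates the second derivative with a central finite difference; the step size and sample points are arbitrary choices, not values from the notes.

```python
import math

def yukawa(r, m):
    """Decaying spherically symmetric solution exp(-m r) / r."""
    return math.exp(-m * r) / r

def radial_laplacian(phi, r, h=1e-3):
    """Laplacian of a spherically symmetric phi(r): (1/r) d^2(r phi)/dr^2,
    with the second derivative taken by a central finite difference."""
    u = lambda s: s * phi(s)
    return (u(r + h) - 2.0 * u(r) + u(r - h)) / (h * h * r)

def residual(r, m):
    """Left-hand side of the static equation (laplacian - m^2) phi at radius r."""
    phi = lambda s: yukawa(s, m)
    return radial_laplacian(phi, r) - m * m * yukawa(r, m)

# The residual should vanish up to discretization error for any r > 0.
for r, m in [(1.0, 0.7), (2.5, 1.3), (2.0, 0.0)]:
    print(r, m, residual(r, m))
```

Setting `m = 0` recovers the Coulomb-like $1/r$ solution of Laplace's equation, which mirrors the massless limit discussed in the text.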

Discovery of the Pion
Although Yukawa originally proposed mesons to explain the short range of the nuclear force in the 1930's, its actual application is chiefly to the pions, which were discovered in 1947 in a cosmic ray collision. This was a remarkable discovery but also a source of confusion, considering another heavy(ish) particle was discovered a few years earlier in 1936 by Anderson and Neddermeyer in a separate cosmic ray experiment: the muon. The muon has a mass of 105 MeV, which was within the range expected for the pion; however, it's not a meson at all. Rather, it is a lepton (a kind of fermion which plays no role in nuclear interactions) that is more-or-less a heavier version of the electron. The discovery of the muon was completely unexpected, in that it wasn't needed to explain any previously observed phenomena, and is the subject of I.I. Rabi's famous quip of "who ordered that?" It took a while to disentangle muons from pions, but once things were straightened out it was clear that the pions interacted very strongly with nuclei. The force caused by exchange of physical pions (the one pion exchange potential, or OPEP) was found to fall off like a Yukawa force, with a mass $m_\pi$. However, the coupling to nucleons is slightly convoluted, owing to the fact that the pions have negative parity and couple to the nucleon's spin in a rather intricate way. We'll discuss this issue later in the course in section 10.5.

The Dirac Equation
At this point we know how to do quantum mechanics, and we know how to do special relativity. The question now becomes how we can put the two together. It turns out that this is not so easy. Our story starts in 1928, when Dirac sought to combine relativity and quantum mechanics in such a way as to maintain the probabilistic interpretation of the single-particle wavefunction. It turns out that his approach was misguided, and the Dirac equation as it was originally envisioned doesn't really make sense. However, when interpreted in the context of quantum field theory (QFT), the Dirac equation is the basis for the correct theory of a fundamental spin-1/2 particle. The Dirac equation also predicted the existence of anti-matter, and automagically gives us the correct g-factor for the electron and spin-orbit coupling in the presence of an electromagnetic field. Despite its rocky start, the Dirac equation was an extraordinarily profound accomplishment. It's also aesthetically quite pretty. Here it is:
$$(i\gamma^\mu\partial_\mu - m)\psi = 0.$$
We'll spend the rest of this section learning where this equation comes from, and what it means.

All is Not Well with Relativistic Wave Equations
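Let's remember our old friend the Schrödinger equation,
$$i\partial_t\psi = -\frac{1}{2m}\nabla^2\psi + V\psi.$$
We define the probability density as the squared modulus of the wavefunction,
$$\rho = |\psi|^2 = \psi^*\psi,$$
and the probability current as
$$\mathbf{J} = \frac{1}{2mi}\left(\psi^*\nabla\psi - \psi\nabla\psi^*\right).$$
It's then straightforward to show that any wavefunction that satisfies the Schrödinger equation also satisfies
$$\partial_t\rho + \nabla\cdot\mathbf{J} = 0,$$
which is a continuity equation, and tells us that probability is locally conserved. If we consider the time rate of change of the probability to find the particle within a region $V$, we have
$$\frac{d}{dt}\int_V \rho\, d^3x = -\int_V \nabla\cdot\mathbf{J}\, d^3x = -\oint_{\partial V}\mathbf{J}\cdot d\mathbf{A},$$
where to get to the last equality we used the divergence theorem to turn the volume integral over $V$ into a surface integral over its boundary $\partial V$. This has a simple physical interpretation: the only change in the probability contained in a region is the outward probability flux through the surface enclosing it. Further, we can take the region $V$ to be all of space, with the boundary at spatial infinity. Since any normalizable wavefunction must vanish as $r \to \infty$, the current $\mathbf{J}$ must also go to zero at infinity. Thus,
$$\frac{d}{dt}\int \rho\, d^3x = 0,$$
so the total probability to find the particle is globally conserved. Now, let's try this for the Klein-Gordon equation, which we recall from section 6 is
$$(\partial_\mu\partial^\mu + m^2)\psi = 0.$$
It turns out this equation also has a conserved current, which we might be able to interpret as a probability current. Since the Klein-Gordon equation is in some sense relativistic, the conserved current is a four-vector,
$$J^\mu = \frac{i}{2m}\left(\psi^*\partial^\mu\psi - \psi\,\partial^\mu\psi^*\right).$$
A continuity equation is then written as
$$\partial_\mu J^\mu = 0.$$
In this notation, it's easy to see that the current is conserved: we just differentiate,
$$\partial_\mu J^\mu = \frac{i}{2m}\left(\psi^*\partial_\mu\partial^\mu\psi - \psi\,\partial_\mu\partial^\mu\psi^*\right)$$
$$= \frac{i}{2m}\left(-m^2\psi^*\psi + m^2\psi\psi^*\right) = 0.$$
In the first line the cross terms $\partial_\mu\psi^*\partial^\mu\psi$ cancel, and in the second line we used the fact that $\psi$ obeys the Klein-Gordon equation, $\partial_\mu\partial^\mu\psi = -m^2\psi$, and its complex conjugate, $\partial_\mu\partial^\mu\psi^* = -m^2\psi^*$. So, we've found a conserved current! Given the similarity to the probability current in the Schrödinger case, there is hope this could represent a probability current.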
However, these hopes are dashed by considering the candidate probability density,
$$\rho = J^0 = \frac{i}{2m}\left(\psi^*\partial_t\psi - \psi\,\partial_t\psi^*\right).$$
Problematically, $\rho$ is not positive definite, so we could have negative probabilities, which clearly violates the notion of probability. Moreover, there are real solutions to the Klein-Gordon equation, in which case $\rho = 0$ everywhere, always. These two issues kill any chance of $\rho$ functioning as a single-particle probability density, so $\psi$ can't be interpreted as a single-particle wavefunction à la Schrödinger. In 1934 Pauli and Weisskopf pointed out that this isn't actually a problem if one thinks a little differently, but back in 1928 this perspective was unknown and Dirac was determined to write down a relativistic wave equation with a single-particle probabilistic interpretation. Now, we'll consider how he did it.

Motivating the Dirac Equation
In the most dramatic of fashions, Dirac supposedly had the epiphany of how to fix the Klein-Gordon equation while staring into a fire. He realized the problem was that the Klein-Gordon equation was second order in time, and to admit a probabilistic interpretation he needed an equation first order in time. Then to have any hope of Lorentz invariance the equation must be first order in space as well, so as not to treat space and time on different footing. The idea is to essentially take the square root of the Klein-Gordon equation: all we need to do is factorize $\partial_\mu\partial^\mu$ into first order pieces, and things should work out. Lorentz invariance requires that the first order equation must have the differential operator appear contracted with a four-vector, i.e. in a term like $\gamma^\mu\partial_\mu$. Now, let's suppose that we can find a four-vector $\gamma^\mu$ such that
$$\gamma^\mu\partial_\mu\,\gamma^\nu\partial_\nu = \partial_\mu\partial^\mu.$$
If we can, then we can simply factor the Klein-Gordon equation as
$$(\partial_\mu\partial^\mu + m^2)\psi = -(i\gamma^\mu\partial_\mu + m)(i\gamma^\nu\partial_\nu - m)\psi = 0.$$
Then any solution to
$$(i\gamma^\mu\partial_\mu - m)\psi = 0,$$
which we've already seen is the Dirac equation, will automatically also be a solution to the Klein-Gordon equation, and ensure we have the desirable relativistic dispersion $E^2 = \mathbf{p}^2 + m^2$. Now, the problem has been reduced to just finding the right $\gamma^\mu$.
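Notice that, because derivatives commute and $\partial_\mu\partial_\nu$ is therefore symmetric in its indices, inside the contraction $\gamma^\mu\gamma^\nu\partial_\mu\partial_\nu$ we can trivially rewrite $\gamma^\mu\gamma^\nu = \frac{1}{2}(\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu)$. Also remember that we're looking for $\gamma^\mu$ such that $\gamma^\mu\partial_\mu\gamma^\nu\partial_\nu = \partial_\mu\partial^\mu$, where we can write $\partial_\mu\partial^\mu = g^{\mu\nu}\partial_\mu\partial_\nu$. Putting these pieces together, we want
$$\frac{1}{2}(\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu)\partial_\mu\partial_\nu = g^{\mu\nu}\partial_\mu\partial_\nu,$$
or,
$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2g^{\mu\nu}. \tag{117}$$
Unfortunately, as you can convince yourself, there is no ordinary four-vector that can satisfy this equation. However, all hope is not lost: what if we let $\gamma^\mu$ be a four-vector of matrices? This idea isn't as outlandish as you might think. In fact, you're already very familiar with a vector of matrices, namely the Pauli matrices, which are often packaged into a vector,
$$\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z),$$
and happen to satisfy $\sigma_i\sigma_j + \sigma_j\sigma_i = 2\delta_{ij}$, which is reminiscent of (117) above. It turns out that there do exist sets of matrices that satisfy our desired condition; in fact, there are an infinite number of them! Any set of matrices $\gamma^\mu$ which satisfies (117) is said to form a Clifford algebra, and we can choose any convenient set of them we wish. You can convince yourself that the smallest matrices that can form such an algebra are 4×4. In this class, we'll use the convention
$$\gamma^0 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&-1&0\\ 0&0&0&-1 \end{pmatrix},\quad \gamma^1 = \begin{pmatrix} 0&0&0&1\\ 0&0&1&0\\ 0&-1&0&0\\ -1&0&0&0 \end{pmatrix},\quad \gamma^2 = \begin{pmatrix} 0&0&0&-i\\ 0&0&i&0\\ 0&i&0&0\\ -i&0&0&0 \end{pmatrix},\quad \gamma^3 = \begin{pmatrix} 0&0&1&0\\ 0&0&0&-1\\ -1&0&0&0\\ 0&1&0&0 \end{pmatrix}.$$
In this notation, these are pretty hard to remember. However, recalling the definition of the Pauli matrices,
$$\sigma_x = \begin{pmatrix} 0&1\\ 1&0 \end{pmatrix},\quad \sigma_y = \begin{pmatrix} 0&-i\\ i&0 \end{pmatrix},\quad \sigma_z = \begin{pmatrix} 1&0\\ 0&-1 \end{pmatrix},$$
(labeling $x \to 1$, $y \to 2$, $z \to 3$) we can write the $\gamma$ matrices as 2×2 matrices of 2×2 matrices. That is,
$$\gamma^0 = \begin{pmatrix} 1&0\\ 0&-1 \end{pmatrix},\quad \gamma^1 = \begin{pmatrix} 0&\sigma_1\\ -\sigma_1&0 \end{pmatrix},\quad \gamma^2 = \begin{pmatrix} 0&\sigma_2\\ -\sigma_2&0 \end{pmatrix},\quad \gamma^3 = \begin{pmatrix} 0&\sigma_3\\ -\sigma_3&0 \end{pmatrix}.$$
Or, in even slicker notation,
$$\gamma^0 = \begin{pmatrix} 1&0\\ 0&-1 \end{pmatrix},\qquad \gamma^i = \begin{pmatrix} 0&\sigma_i\\ -\sigma_i&0 \end{pmatrix},$$
where Roman indices always run over just the spatial components, $i = 1, 2, 3$. In this form it's straightforward to check that this choice of matrices satisfies (117):
$$(\gamma^0)^2 = 1,\qquad \gamma^0\gamma^i + \gamma^i\gamma^0 = 0,\qquad \gamma^i\gamma^j + \gamma^j\gamma^i = -\begin{pmatrix} \sigma_i\sigma_j + \sigma_j\sigma_i & 0\\ 0 & \sigma_i\sigma_j + \sigma_j\sigma_i \end{pmatrix} = -2\delta_{ij}.$$
So, we have found the equation we were looking for! But what does it mean?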
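The check of (117) can also be done by brute force. The sketch below builds the four matrices from their 2×2 block form using a Kronecker product and tests every anticommutator against $2g^{\mu\nu}$; it assumes the block representation quoted above and the signature (+, -, -, -), and all function names are illustrative.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I2 = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],       # sigma_1
    [[0, -1j], [1j, 0]],    # sigma_2
    [[1, 0], [0, -1]],      # sigma_3
]

# gamma^0 = diag(1, -1) in blocks; gamma^i has sigma_i and -sigma_i off-diagonal.
GAMMA = [kron([[1, 0], [0, -1]], I2)] + [kron([[0, 1], [-1, 0]], s) for s in SIGMA]

METRIC_DIAG = [1, -1, -1, -1]  # diagonal entries of g^{mu nu}

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]

def satisfies_clifford(gammas, tol=1e-12):
    """True if {gamma^mu, gamma^nu} = 2 g^{mu nu} times the 4x4 identity."""
    for mu in range(4):
        for nu in range(4):
            target = 2 * METRIC_DIAG[mu] if mu == nu else 0
            ac = anticommutator(gammas[mu], gammas[nu])
            for i in range(4):
                for j in range(4):
                    expected = target if i == j else 0
                    if abs(ac[i][j] - expected) > tol:
                        return False
    return True

print(satisfies_clifford(GAMMA))
```

Any other representation (for instance one obtained by a unitary change of basis) can be dropped into `GAMMA` and tested the same way.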

Solutions to the Dirac Equation
Before going any further, we're going to introduce the Feynman slash notation, where we denote contraction with the $\gamma$ matrices by a slash through the contracted four-vector, i.e. $\not\partial \equiv \gamma^\mu\partial_\mu$. The $\gamma$ matrices show up almost everywhere, so this shorthand will save us a lot of writing. The Dirac equation is then written
$$(i\not\partial - m)\psi = 0.$$
Using our representation of the $\gamma$ matrices from the previous section, we see this is a compact way of writing the matrix equation
$$\begin{pmatrix} i\partial_t - m & i\boldsymbol{\sigma}\cdot\nabla \\ -i\boldsymbol{\sigma}\cdot\nabla & -i\partial_t - m \end{pmatrix}\psi = 0. \tag{125}$$
For this equation to make sense, we see $\psi$ must be a four-component object, which we call a Dirac spinor. Since we've expressed the $\gamma$ matrices in terms of 2×2 matrices, it is natural to also split this four-component spinor into its upper and lower components $U$ and $L$, each of which is itself a two-component spinor,
$$\psi = \begin{pmatrix} U \\ L \end{pmatrix}.$$
Carrying out the matrix multiplication, we get two equations in terms of the upper and lower components,
$$(i\partial_t - m)U + i\boldsymbol{\sigma}\cdot\nabla L = 0, \qquad (i\partial_t + m)L + i\boldsymbol{\sigma}\cdot\nabla U = 0.$$
The two-component spinors $U$ and $L$ can each be written in terms of the spinors
$$\chi_\uparrow = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \chi_\downarrow = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
which form a basis for two-component spinors. We can use these, and a plane wave with dispersion (remember $\hbar = 1$)
$$E = +\sqrt{\mathbf{p}^2 + m^2},$$
to construct solutions to the Dirac equation:
$$\psi^+_{\uparrow,\downarrow} = N e^{-i(Et - \mathbf{p}\cdot\mathbf{x})}\begin{pmatrix} \chi_{\uparrow,\downarrow} \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{p}}{E + m}\chi_{\uparrow,\downarrow} \end{pmatrix},$$
where again each component of $\psi$ is itself a two-component spinor, and the prefactor $N$ out front is just a convenient normalization. You're invited to check that these are in fact solutions to the Dirac equation by plugging them in. Notice that for a given momentum $\mathbf{p}$ we have two linearly independent solutions, suggestively named $\psi^+_\uparrow$ and $\psi^+_\downarrow$. As the subscripts suggest, these are in fact the two states of a spin-1/2 particle. Also note that in the limit $p \ll m$, the lower components go to zero and the upper components give us back the familiar two-component spinors of non-relativistic quantum mechanics (with the components corresponding to spin up and spin down).
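The invitation to plug the plane waves back in can be taken up numerically. For a plane wave $e^{-i(Et - \mathbf{p}\cdot\mathbf{x})}$ the derivatives become $i\partial_t \to E$ and $-i\nabla \to \mathbf{p}$, so the Dirac equation reduces to an algebraic condition on the four-component spinor. The sketch below assumes the block representation of the $\gamma$ matrices used in these notes and the standard positive-energy spinor with upper component $\chi$ and lower component $\boldsymbol{\sigma}\cdot\mathbf{p}\,\chi/(E+m)$, with the overall normalization dropped; the momentum and mass values are arbitrary.

```python
import math

S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]

def sigma_dot_p(p):
    """2x2 matrix sigma . p for a three-momentum p = (px, py, pz)."""
    return [[p[0] * S1[i][j] + p[1] * S2[i][j] + p[2] * S3[i][j]
             for j in range(2)] for i in range(2)]

def mat_vec(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def positive_energy_spinor(p, m, chi):
    """Return (E, u) with u = (chi, sigma.p chi / (E + m)), unnormalized."""
    energy = math.sqrt(sum(c * c for c in p) + m * m)
    lower = [c / (energy + m) for c in mat_vec(sigma_dot_p(p), chi)]
    return energy, list(chi) + lower

def dirac_residual(p, m, chi):
    """Components of (gamma^mu p_mu - m) u, which should all vanish."""
    energy, u = positive_energy_spinor(p, m, chi)
    upper, lower = u[:2], u[2:]
    sp_upper = mat_vec(sigma_dot_p(p), upper)
    sp_lower = mat_vec(sigma_dot_p(p), lower)
    top = [(energy - m) * upper[i] - sp_lower[i] for i in range(2)]
    bottom = [sp_upper[i] - (energy + m) * lower[i] for i in range(2)]
    return top + bottom

p = [0.3, -0.4, 1.2]   # arbitrary momentum (same units as m)
m = 0.511
for chi in ([1, 0], [0, 1]):   # spin up and spin down
    print(max(abs(c) for c in dirac_residual(p, m, chi)))
```

At zero momentum the lower components vanish identically, which is the non-relativistic limit described above.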
We can take linear combinations of these two solutions to construct more general spinors, or even combine solutions of different momenta into wavepackets. However, we mustn't forget that the Dirac equation is a set of four coupled differential equations, and thus we expect four solutions for a given momentum. Where are the other two?
Luckily, the two remaining solutions aren't hard to find, but they do end up posing a problem. They are
$$\psi^-_{\uparrow,\downarrow} = N e^{-i(-Et - \mathbf{p}\cdot\mathbf{x})}\begin{pmatrix} -\dfrac{\boldsymbol{\sigma}\cdot\mathbf{p}}{E + m}\chi_{\uparrow,\downarrow} \\ \chi_{\uparrow,\downarrow} \end{pmatrix},$$
where
$$E = +\sqrt{\mathbf{p}^2 + m^2},$$
so the time dependence is $e^{+iEt}$, which means these solutions have negative energy, $-E$. What do negative energies even mean in this context? The situation seems to be very bad! To convince ourselves that this isn't as bad as it looks, let's first remember that the Dirac field represents a fermion, since we saw above that it describes a spin-1/2 particle. Crucially, this means that it satisfies the Pauli principle and we may have at most one particle per state. The state with the lowest energy is called the ground state, and in the case of fundamental physics has another special name: the vacuum! So, if negative energy states exist, the lowest-energy physical state (the vacuum) will have all of them filled. This picture of the vacuum being comprised of a sea of filled negative energy states is called the Dirac sea, and although this is certainly not in line with the modern understanding of physics via quantum field theory, it nonetheless enabled Dirac to predict the existence of anti-matter by the following observation: suppose we remove a state of energy $-E$ and momentum $\mathbf{p}$ from the Dirac sea. This increases the energy of the vacuum by $E$, and changes its momentum by $-\mathbf{p}$. So, this "hole" in the Dirac sea looks just like a particle with energy $+E$ and momentum $-\mathbf{p}$! So, we interpret it as an anti-particle. Finally, note that if some interaction were to knock a filled negative energy state into a previously unoccupied positive energy level, we would get a particle where the positive energy state was filled, and a hole (anti-particle) from where the negative energy state was vacated. Thus, particle-hole pairs can be pulled from the vacuum!

Properties of the Dirac Equation
Just like it's convenient to define the Hermitian conjugate $\psi^\dagger$ when working with complex vectors, it is useful to define the Dirac conjugate,
$$\bar\psi \equiv \psi^\dagger\gamma^0.$$
For reasons we won't go into in these notes, it turns out that products like $\bar\psi\psi$ are Lorentz invariant, while those like $\psi^\dagger\psi$ are not. Basically, it all comes down to different representations of the Lorentz group and their properties, and you are mercifully being spared from having to learn about the details. It's not too hard to show that any solution of the Dirac equation will also satisfy
$$\bar\psi\left(-i\overleftarrow{\not\partial} - m\right) = 0,$$
where the arrow on top of the derivative indicates that it acts to the left. This might seem odd at first, but it saves us from having to write annoying expressions like $-i\partial_\mu\bar\psi\gamma^\mu$, which is what the first term above represents more compactly.

Proof
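We'll get this fact for free in the next section when we derive the Dirac equation from a Lagrangian, but to get some practice working with the $\gamma$ matrices we'll also prove it here. We'll start by taking the Hermitian conjugate of the Dirac equation,
$$\psi^\dagger\left(-i\gamma^{\mu\dagger}\overleftarrow{\partial_\mu} - m\right) = 0.$$
Since the right-hand side is zero, we can multiply the equation by anything we like. Let's multiply by $\gamma^0$ on the right,
$$\psi^\dagger\left(-i\gamma^{\mu\dagger}\overleftarrow{\partial_\mu} - m\right)\gamma^0 = 0. \tag{136}$$
We may now use the following properties of the $\gamma$ matrices,
$$(\gamma^0)^2 = 1, \qquad \gamma^{\mu\dagger}\gamma^0 = \gamma^0\gamma^\mu,$$
so we can bring the $\gamma^0$ on the right of (136) next to the $\psi^\dagger$ and have
$$\psi^\dagger\gamma^0\left(-i\gamma^\mu\overleftarrow{\partial_\mu} - m\right) = \bar\psi\left(-i\overleftarrow{\not\partial} - m\right) = 0,$$
as advertised.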
Coming back to Dirac's original goal of a relativistic wave equation with a sensible probabilistic interpretation, we do indeed have a conserved current,
$$J^\mu = \bar\psi\gamma^\mu\psi.$$
Provided $\psi$ satisfies the Dirac equation, we can show this current is conserved (i.e. $\partial_\mu J^\mu = 0$) by differentiating,
$$\partial_\mu J^\mu = (\partial_\mu\bar\psi)\gamma^\mu\psi + \bar\psi\gamma^\mu(\partial_\mu\psi) = (im\bar\psi)\psi + \bar\psi(-im\psi) = 0.$$
The probability density is the zeroth component of the current,
$$\rho = J^0 = \bar\psi\gamma^0\psi = \psi^\dagger\psi \geq 0,$$
which is exactly what Dirac wanted! We'll see later that although we don't interpret this as the single-particle probability function that Dirac intended it to be, the conserved charge associated with this density,
$$Q = \int d^3x\, \psi^\dagger\psi,$$
is a statement of the conservation of fermion number (electrons minus positrons). Finally, we can introduce a cousin of the $\gamma$ matrices,
$$\gamma^5 \equiv i\gamma^0\gamma^1\gamma^2\gamma^3,$$
which is a Lorentz scalar (although it has negative parity, which will be discussed later in the course). You can show that $\gamma^\mu\gamma^5 = -\gamma^5\gamma^\mu$ for all of the $\gamma$ matrices. Using this new object, we can construct the axial current
$$J^\mu_5 = \bar\psi\gamma^\mu\gamma^5\psi.$$
We can differentiate it to see if it is conserved, and find
$$\partial_\mu J^\mu_5 = (\partial_\mu\bar\psi)\gamma^\mu\gamma^5\psi - \bar\psi\gamma^5\gamma^\mu(\partial_\mu\psi) = 2im\,\bar\psi\gamma^5\psi.$$
So $\partial_\mu J^\mu_5 \neq 0$, and thus in general the axial current is not conserved. However, suppose we have a massless particle. If $m = 0$ then the RHS above will vanish and the axial current will be conserved! One could be skeptical as to why this matters, since there aren't actually any massless spin-1/2 particles in nature to which this relation would apply. However, there are approximately massless spin-1/2 particles, i.e. particles for which the mass is much smaller than the typical momentum of the system.
For example, in Quantum Chromodynamics (QCD), the up and down quarks have masses $m_u \approx 2.3$ MeV and $m_d \approx 4.8$ MeV, which are considerably smaller than typical hadronic mass scales of $\approx 1$ GeV. So, in QCD the axial current is almost conserved. In practice, one can treat it as being exactly conserved, and then apply perturbative corrections to reflect that it actually isn't.
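The algebraic facts used in this subsection, namely $\gamma^{\mu\dagger}\gamma^0 = \gamma^0\gamma^\mu$ and the anticommutation of $\gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3$ with every $\gamma^\mu$, can be confirmed by direct matrix multiplication. This is a small sketch assuming the block representation of the $\gamma$ matrices used in these notes; the helper names are invented for the example.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

I2 = [[1, 0], [0, 1]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
GAMMA = [kron([[1, 0], [0, -1]], I2)] + [kron([[0, 1], [-1, 0]], s) for s in SIGMA]

# gamma^5 = i gamma^0 gamma^1 gamma^2 gamma^3
product = matmul(matmul(GAMMA[0], GAMMA[1]), matmul(GAMMA[2], GAMMA[3]))
GAMMA5 = [[1j * entry for entry in row] for row in product]

def conjugation_property_holds():
    """Check gamma^mu-dagger gamma^0 == gamma^0 gamma^mu for all mu."""
    return all(close(matmul(dagger(g), GAMMA[0]), matmul(GAMMA[0], g))
               for g in GAMMA)

def gamma5_anticommutes():
    """Check gamma^mu gamma^5 == -gamma^5 gamma^mu for all mu."""
    for g in GAMMA:
        left = matmul(g, GAMMA5)
        right = [[-x for x in row] for row in matmul(GAMMA5, g)]
        if not close(left, right):
            return False
    return True

print(conjugation_property_holds(), gamma5_anticommutes())
```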

The Electron Magnetic Moment
One of the greatest successes of the Dirac equation is that if we couple it to an electromagnetic field in the standard way by replacing $i\partial_\mu \to i\partial_\mu - qA_\mu$ (see box), we automatically find the electron g-factor to be 2. As a reminder, the g-factor appears in the definition of the magnetic moment,
$$\boldsymbol{\mu} = g\,\frac{q}{2m}\,\mathbf{S},$$
where $q$ is the charge of the particle, $m$ is its mass, and $\mathbf{S}$ is the spin. The algebra required to show this gets pretty messy, but given the historical importance of this calculation, we would be remiss to not include it. In case you'd rather skip the details, the important implication is that according to the Dirac equation, a truly structureless "point particle" with spin 1/2 should have $g = 2$. It turns out a real electron has
$$g = 2.002319304\ldots$$
which is almost exactly in agreement with theory. But why isn't it exact? The reason is that the electron isn't truly a point particle; in fact, within QED we can show that the electron has structure due to processes where the electron spontaneously emits and absorbs a virtual photon, as shown in Figure 6. This coupling is of order $\alpha$, so its impact on the g-factor is small. On the other hand, a proton, which is also spin 1/2, has a g-factor of $g = 5.585$, which is very much not equal to 2. This was one of the earliest pieces of evidence that the proton has internal structure.
Figure 6: This is our first example of a Feynman diagram, which we will have much more to say about in section 9. The solid line represents the propagation of an electron, and the squiggly line on the bottom represents the externally applied magnetic field we use to probe the system. The arcing squiggly line is a spontaneously emitted and reabsorbed virtual photon.
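To get a feel for how small an order-$\alpha$ effect is, one can compare the measured value with the well-known leading QED correction, $g \approx 2\,(1 + \alpha/2\pi)$, usually attributed to Schwinger. That formula is not derived in these notes and is brought in here only as an outside illustration; the value of $\alpha$ below is approximate.

```python
import math

ALPHA = 1 / 137.035999      # fine-structure constant (approximate value)
G_DIRAC = 2.0               # Dirac prediction for a point-like spin-1/2 particle
G_MEASURED = 2.002319304    # electron value quoted in the text (truncated)

def g_one_loop(alpha=ALPHA):
    """Leading-order QED estimate g = 2 (1 + alpha / (2 pi)).
    This correction is an assumption imported from outside the notes."""
    return G_DIRAC * (1 + alpha / (2 * math.pi))

print(g_one_loop())
print(G_MEASURED - G_DIRAC)        # size of the deviation from the Dirac value
print(G_MEASURED - g_one_loop())   # what is left after the order-alpha term
```

The leftover difference is a few parts in a million, which is the size one would expect from terms of order $\alpha^2$.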

Minimal Coupling
The fact that the replacement $i\partial_\mu \to i\partial_\mu - qA_\mu$ will couple a particle to an electromagnetic field is often taken to be common knowledge, but is rarely ever actually explained. We'll start at the beginning. As you may know, the classical Lagrangian for a charged particle in an electromagnetic field is
$$L = \frac{1}{2}m\mathbf{v}^2 - q\Phi + q\,\mathbf{v}\cdot\mathbf{A},$$
where $\Phi$ and $\mathbf{A}$ are the scalar and vector potentials. To see that this is the correct Lagrangian we can calculate the Euler-Lagrange equations and confirm that the equation of motion this will give us is the Lorentz force law. If you need a refresher on Lagrangian mechanics, see the review in section 9.1.
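Things are easier if we consider components $x_i$ instead of vectors, and the Einstein summation convention will remain in effect. The Lagrangian is then written
$$L = \frac{1}{2}m\dot{x}_i\dot{x}_i - q\Phi + q\dot{x}_i A_i.$$
We'll need the derivatives
$$\frac{\partial L}{\partial x_i} = -q\,\partial_i\Phi + q\dot{x}_j\,\partial_i A_j, \qquad \frac{d}{dt}\frac{\partial L}{\partial\dot{x}_i} = \frac{d}{dt}\left(m\dot{x}_i + qA_i\right) = m\ddot{x}_i + q\,\partial_t A_i + q\dot{x}_j\,\partial_j A_i,$$
to write down the Euler-Lagrange equations:
$$m\ddot{x}_i = -q\left(\partial_i\Phi + \partial_t A_i\right) + q\dot{x}_j\left(\partial_i A_j - \partial_j A_i\right).$$
Using the definition of the electric field, $E_i = -\partial_i\Phi - \partial_t A_i$,
$$m\ddot{x}_i = qE_i + q\dot{x}_j\left(\partial_i A_j - \partial_j A_i\right).$$
The second term will turn out to be the $i$th component of $q\mathbf{v}\times\mathbf{B}$. You can show this by using $\mathbf{B} = \nabla\times\mathbf{A}$, which in terms of components is written $B_i = \epsilon_{ijk}\partial_j A_k$, and the identity $\epsilon_{ijk}\epsilon_{ilm} = \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}$. In any case, if you work through the algebra you'll find the Euler-Lagrange equations give us
$$m\ddot{\mathbf{x}} = q\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right),$$
so the Lagrangian we started with does indeed describe a charged particle in an electromagnetic field. Going back to vector notation, the canonical momentum is
$$\mathbf{p} = \frac{\partial L}{\partial\mathbf{v}} = m\mathbf{v} + q\mathbf{A}.$$
We can then construct the Hamiltonian via the Legendre transformation,
$$H = \mathbf{p}\cdot\mathbf{v} - L = \frac{(\mathbf{p} - q\mathbf{A})^2}{2m} + q\Phi.$$
So we see the effect of the field is to shift the canonical momentum $\mathbf{p} \to \mathbf{p} - q\mathbf{A}$. Now, to go from classical to quantum mechanics we must replace the c-number momentum with the momentum operator, which has the position space representation $\hat{\mathbf{p}} = -i\nabla$.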
We then see that in the presence of an electromagnetic field, this operator gets shifted, $\hat{\mathbf{p}} = -i\nabla \to -i\nabla - q\mathbf{A}$, or generalizing to the relativistic case,
$$i\partial_\mu \to i\partial_\mu - qA_\mu.$$
This procedure is called minimal coupling, and is often implemented by introducing the gauge covariant derivative, $D_\mu \equiv \partial_\mu + iqA_\mu$. We'll see much more of this when we discuss gauge theories in section 11.
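To find the magnetic moment of the electron, we will need to take the non-relativistic limit of the Dirac equation. As a warmup, we'll illustrate the process by first considering a relativistic boson obeying the Klein-Gordon equation,
$$(\partial_t^2 - \nabla^2 + m^2)\phi = 0.$$
In the spirit of this section, we will interpret this equation (incorrectly) as a relativistic single-particle wave equation. We know that the solutions to this equation time evolve like $\phi \sim e^{-i\omega t}$, where $\omega = E = \sqrt{\mathbf{p}^2 + m^2}$ is the energy of the state. In the non-relativistic limit, $\mathbf{p}^2 \ll m^2$, so $E \approx m$ plus small corrections. This motivates us to rewrite $\phi$ as
$$\phi(\mathbf{x}, t) = e^{-imt}\varphi(\mathbf{x}, t),$$
where $\varphi$ is a field whose time dependence is much slower than $e^{-imt}$, and will turn out to be the non-relativistic wavefunction. The time derivative term in the Klein-Gordon equation is then
$$\partial_t^2\phi = \left(-m^2\varphi - 2im\dot\varphi + \ddot\varphi\right)e^{-imt}.$$
Since the time dependence of $\varphi$ is much slower than that of $e^{-imt}$, $|\ddot\varphi| \ll m|\dot\varphi|$, and thus we can drop the second time derivative above, leaving us with $\partial_t^2\phi \approx (-m^2\varphi - 2im\dot\varphi)e^{-imt}$. Plugging this into the Klein-Gordon equation, we have
$$\left(-2im\dot\varphi - \nabla^2\varphi\right)e^{-imt} = 0 \quad\Longrightarrow\quad i\partial_t\varphi = -\frac{1}{2m}\nabla^2\varphi,$$
which is the Schrödinger equation for a free particle. Now, taking the non-relativistic limit of the Dirac equation is a little more involved, in that we have to deal with the spinorial nature of the equation. Recall from the previous subsections that we can write the four-component Dirac spinor $\psi$ in terms of its upper and lower components, each of which is itself a two-component spinor,
$$\psi = \begin{pmatrix} \psi_+ \\ \psi_- \end{pmatrix}.$$
In the non-relativistic limit the upper component $\psi_+$ dominates the positive energy solutions and the lower component $\psi_-$ dominates the negative energy ones. Recall also that $-i\nabla = \mathbf{p}$, and thus we can write $i\gamma^j\partial_j = -\gamma^j p_j$, and the Dirac equation becomes $(i\gamma^0\partial_t - \gamma^j p_j - m)\psi = 0$, which in matrix form says
$$\begin{pmatrix} i\partial_t - m & -\boldsymbol{\sigma}\cdot\mathbf{p} \\ \boldsymbol{\sigma}\cdot\mathbf{p} & -i\partial_t - m \end{pmatrix}\begin{pmatrix} \psi_+ \\ \psi_- \end{pmatrix} = 0.$$
This gives us two equations in terms of the two-component spinors,
$$i\partial_t\psi_+ - m\psi_+ - \boldsymbol{\sigma}\cdot\mathbf{p}\,\psi_- = 0, \qquad -i\partial_t\psi_- - m\psi_- + \boldsymbol{\sigma}\cdot\mathbf{p}\,\psi_+ = 0. \tag{165}$$
In the non-relativistic limit, we are of course interested in the positive energy solutions rather than the negative energy ones. To get rid of the dependence on $\psi_-$, recall that in a positive energy solution both components share the time dependence $e^{-iEt}$. Again, in the non-relativistic limit $E \approx m$, and we can replace the time derivative in the second equation by $-i\partial_t\psi_- \approx -m\psi_-$.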
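The second equation is then just an algebraic equation that can be used to write $\psi_-$ in terms of $\psi_+$,
$$\psi_- = \frac{\boldsymbol{\sigma}\cdot\mathbf{p}}{2m}\,\psi_+. \tag{166}$$
Putting (166) into the first line of (165), we get a closed equation for $\psi_+$,
$$i\partial_t\psi_+ = m\psi_+ + \frac{(\boldsymbol{\sigma}\cdot\mathbf{p})^2}{2m}\,\psi_+.$$
Now, we play the same game as with the bosonic field, writing $\psi_+ = e^{-imt}\Psi$, where $\Psi$ is the slowly-time-dependent non-relativistic wavefunction. In terms of this wavefunction, the equation above reads
$$i\partial_t\Psi = \frac{(\boldsymbol{\sigma}\cdot\mathbf{p})^2}{2m}\,\Psi.$$
Again, we arrive at a Schrödinger equation, with the non-relativistic Dirac Hamiltonian,
$$H = \frac{(\boldsymbol{\sigma}\cdot\mathbf{p})^2}{2m}.$$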

The Dirac Equation in Condensed Matter Physics
Having seen that the Schrödinger equation emerges as the low-energy limit of the Dirac equation, we would be remiss to not mention that the Dirac equation can sometimes re-emerge at even lower energy scales. This occurs in a number of condensed matter systems where the periodic potential of the crystal structure can lead to the lowest energy excitations of the system being governed by the Dirac equation, rather than the Schrödinger equation, and give rise to qualitatively new physics. Most notably, these low energy Dirac fermions appear in graphene (a single atomic layer of carbon atoms), d-wave superconductors (including high-temperature superconductors), and the surfaces of exotic materials called topological insulators.
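To couple the electron to an electromagnetic field, we implement the minimal coupling procedure outlined above, replacing $\mathbf{p} \to \mathbf{P} = \mathbf{p} - q\mathbf{A}$. Specializing to the electron with $q = -e$, the Hamiltonian becomes

$$H = \frac{(\boldsymbol{\sigma}\cdot\mathbf{P})^2}{2m}, \qquad \mathbf{P} = \mathbf{p} + e\mathbf{A}.$$

At this point, we're done with physics, and all that follows is algebra. Life will be much easier if we use index notation instead of vectors, and make use of the algebraic properties of the Pauli matrices,

$$[\sigma_i, \sigma_j] = 2i\,\epsilon_{ijk}\,\sigma_k, \qquad \{\sigma_i, \sigma_j\} = 2\,\delta_{ij}.$$

We can then write the product of two Pauli matrices as

$$\sigma_i\sigma_j = \tfrac{1}{2}\{\sigma_i,\sigma_j\} + \tfrac{1}{2}[\sigma_i,\sigma_j] = \delta_{ij} + i\,\epsilon_{ijk}\,\sigma_k.$$

Using this identity, the Hamiltonian is

$$H = \frac{1}{2m}\,\sigma_i\sigma_j P_i P_j = \frac{1}{2m}\left(P_i P_i + i\,\epsilon_{ijk}\,\sigma_k P_i P_j\right).$$

In the second term, $P_i P_j$ is contracted with the completely antisymmetric $\epsilon_{ijk}$, so only the antisymmetric piece of $P_i P_j$ contributes. This means we can write $\epsilon_{ijk} P_i P_j = \tfrac{1}{2}\epsilon_{ijk}[P_i, P_j]$. 14 So, our task is now to evaluate

$$H = \frac{1}{2m}\left(\mathbf{P}^2 + \frac{i}{2}\,\epsilon_{ijk}\,\sigma_k\,[P_i, P_j]\right). \tag{174}$$

14 You can think about this by using the usual trick: write $P_i P_j = \tfrac{1}{2}\{P_i, P_j\} + \tfrac{1}{2}[P_i, P_j]$. When we contract this with $\epsilon_{ijk}$ (which is completely antisymmetric), the term $\epsilon_{ijk}\{P_i, P_j\}$ is identically zero. To see this, either expand out all the terms, or notice that if we switch $i \leftrightarrow j$, $\epsilon_{jik} = -\epsilon_{ijk}$ while $\{P_j, P_i\} = \{P_i, P_j\}$. Thus, under this change of indices $\epsilon_{ijk}\{P_i, P_j\} = -\epsilon_{ijk}\{P_i, P_j\}$ and therefore must be zero. This is just like integrating an even function against an odd one: their opposite behaviors cause the result to vanish identically.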
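The first term will give us the coupling between the orbital angular momentum and the field, so we won't worry about it here. Our interest instead lies in the second term: to evaluate the commutator we work out

$$P_i P_j = p_i p_j + e\,p_i A_j + e\,A_i p_j + e^2 A_i A_j, \qquad P_j P_i = p_j p_i + e\,p_j A_i + e\,A_j p_i + e^2 A_j A_i.$$

Subtracting the two, we find

$$[P_i, P_j] = e\left([p_i, A_j] + [A_i, p_j]\right). \tag{176}$$

We now change representations to $p_i = -i\partial_i$. Just like in introductory quantum mechanics, it is easiest to compute this commutator by acting with it on a test function,

$$[p_i, A_j]\,f = -i\,\partial_i(A_j f) + i\,A_j\,\partial_i f = -i\,(\partial_i A_j)\,f.$$

Peeling off the test function, we have the operator equation for the commutator,

$$[p_i, A_j] = -i\,\partial_i A_j.$$

The second commutator we need in (176) is

$$[A_i, p_j] = -[p_j, A_i] = i\,\partial_j A_i.$$

Putting these back into (176) gives

$$[P_i, P_j] = -ie\left(\partial_i A_j - \partial_j A_i\right) = ie\,F_{ij},$$

where in the second equality we've used the definition of the electromagnetic field strength tensor, $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ (the sign arises because the covariant spatial components of the potential are $-\mathbf{A}$, while $A_i$ above denotes the components of the three-vector $\mathbf{A}$). Using the above relation in the Hamiltonian (174), we have

$$H = \frac{1}{2m}\left(\mathbf{P}^2 - \frac{e}{2}\,\epsilon_{ijk}\,\sigma_k\,F_{ij}\right).$$

As you can convince yourself by comparing to the explicit matrix form given in section 4, the magnetic field can be written in terms of the spatial components of the field strength as

$$B_k = -\tfrac{1}{2}\,\epsilon_{ijk}\,F_{ij},$$

and thus the Hamiltonian is simply

$$H = \frac{\mathbf{P}^2}{2m} + \frac{e}{2m}\,\boldsymbol{\sigma}\cdot\mathbf{B}.$$

In terms of the spin operator $\mathbf{S} = \tfrac{1}{2}\boldsymbol{\sigma}$, this is

$$H = \frac{\mathbf{P}^2}{2m} + 2\,\mu_B\,\mathbf{S}\cdot\mathbf{B}, \qquad \mu_B = \frac{e}{2m}.$$

Comparing the second term to the canonical interaction term $H_{\rm int} = g\,\mu_B\,\mathbf{S}\cdot\mathbf{B}$, we immediately identify $g = 2$.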
And thus we have completed a historically important calculation! Additionally, if we couple the Dirac equation to a Coulomb potential, we will automatically get the correct spin-orbit coupling term and the short-ranged relativistic correction called the Darwin term, both in good agreement with experimental data. However, this is a computation for another class.
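The Pauli-matrix identity $\sigma_i\sigma_j = \delta_{ij} + i\epsilon_{ijk}\sigma_k$ that drives the $g = 2$ calculation can be checked by brute force. The sketch below does this with plain nested lists; the helper names are illustrative, and indices run over 0, 1, 2 rather than x, y, z.

```python
# Pauli matrices as nested lists of (complex) numbers.
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def product_identity_rhs(i, j):
    """Right-hand side delta_ij * 1 + i * epsilon_ijk * sigma_k (summed over k)."""
    rhs = [[1 if (i == j and r == c) else 0 for c in range(2)] for r in range(2)]
    for k in range(3):
        eps = levi_civita(i, j, k)
        for r in range(2):
            for c in range(2):
                rhs[r][c] += 1j * eps * SIGMA[k][r][c]
    return rhs

def max_abs_difference(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[r][c] - b[r][c]) for r in range(2) for c in range(2))

# Compare both sides of the identity for all nine index pairs.
worst_mismatch = max(
    max_abs_difference(matmul(SIGMA[i], SIGMA[j]), product_identity_rhs(i, j))
    for i in range(3) for j in range(3)
)
print("largest deviation from the identity:", worst_mismatch)
```

Because the identity holds exactly for every index pair, the antisymmetric part of $P_iP_j$ is the only piece that can couple to the spin, which is what produces the $\boldsymbol{\sigma}\cdot\mathbf{B}$ term.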

Electron Scattering Revisited
In section 3, we considered how one could probe the charge distribution of the nucleus via elastic electron scattering. However, we neglected the intrinsically relativistic nature of the problem, since one needs momentum transfers comparable to the mass of the nucleon to achieve adequate resolution. In this section, we return to the problem of electron scattering and sketch its proper relativistic treatment for determining the charge and current distributions 15 of the nucleus and of nucleons.
The quantity which one seeks to measure in a scattering experiment is the matrix element of the electromagnetic current, $J^\mu$. If we take the simplest model of a nucleon as a point relativistic particle, this matrix element is simply given by the current appearing in the Dirac equation,

$$\langle N(s', p')|\,J^\mu\,|N(s, p)\rangle = \bar{N}(s', p')\,\gamma^\mu\,N(s, p).$$

Here, $N(s, p)$ is the Dirac spinor for the nucleon and $s$ and $p$ label its spin and momentum. Of course, the nucleon is not a point particle and has an internal structure, reflected in the dependence of the current matrix element on the momentum transfer $q = p' - p$. The charge and current densities are then given by the appropriate Fourier transforms of the $q$-dependent current matrix elements. Our goal in this section is to determine some of the possible structures such functions can take. Note that by virtue of energy and momentum conservation, an elastic scattering process viewed from the center of mass frame will leave the final energies of the electron and nucleon unchanged from their initial values. So, in this frame we have $q^0 = 0$ and $\mathbf{q} \neq 0$, so that $q^2 = (q^0)^2 - \mathbf{q}^2 < 0$. But, since $q^2$ is a Lorentz scalar, and hence frame-independent, we find that $q^2 < 0$ in any reference frame. At this point, it is conventional to introduce the notation $Q^2 \equiv -q^2$, which is a positive quantity.
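The frame independence of $q^2$ can be illustrated with a few lines of arithmetic. The sketch below builds the momentum transfer for elastic scattering in the center of mass frame, boosts it along the z axis, and compares the two values of $q^2$. The beam momentum, scattering angle, and boost velocity are arbitrary illustrative numbers, and the metric signature $(+,-,-,-)$ is assumed.

```python
import math

def minkowski_square(v):
    """v^2 = (v^0)^2 - |v|^2 for a four-vector given as (E, x, y, z)."""
    energy, x, y, z = v
    return energy * energy - x * x - y * y - z * z

def boost_along_z(v, beta):
    """Lorentz boost of a four-vector along z with velocity beta (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    energy, x, y, z = v
    return (gamma * (energy - beta * z), x, y, gamma * (z - beta * energy))

def cm_momentum_transfer(k, theta):
    """Momentum absorbed by the nucleon in the center of mass frame.

    The electron enters along +z with momentum k and leaves at angle theta
    in the x-z plane. Elastic scattering leaves its energy unchanged, so the
    time component of q vanishes in this frame.
    """
    return (0.0, -k * math.sin(theta), 0.0, k * (1.0 - math.cos(theta)))

k_cm = 0.5                      # GeV, illustrative
theta = math.radians(60.0)      # illustrative scattering angle
q_cm = cm_momentum_transfer(k_cm, theta)
q_boosted = boost_along_z(q_cm, beta=0.6)

print("q^2 in the CM frame:      ", minkowski_square(q_cm))
print("q^2 in the boosted frame: ", minkowski_square(q_boosted))
print("energy transfer after boost:", q_boosted[0])
```

In the boosted frame the energy transfer is no longer zero, but $q^2 = -4k^2\sin^2(\theta/2)$ is unchanged and negative, so $Q^2 = -q^2$ is positive in every frame.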
By appealing to the symmetries of the nucleon, we can greatly constrain the form of the current matrix element. Since nucleons are governed by Quantum Chromodynamics, they must respect parity and time-reversal invariance (as discussed in later sections), and the only functional forms consistent with these symmetries lead to the matrix element

$$\langle N(s', p')|\,J^\mu\,|N(s, p)\rangle = \bar{N}(s', p')\left[\gamma^\mu F_1(Q^2) + \frac{i\,\sigma^{\mu\nu} q_\nu}{2M}\,F_2(Q^2)\right] N(s, p),$$

where $M$ is the nucleon mass and $F_1(Q^2)$ and $F_2(Q^2)$ are called the Dirac and Pauli form factors. It turns out that both of these functions can be extracted from an electron scattering experiment. However, these form factors do not separate out the electric and magnetic effects, as we know that $J^0$ and $\mathbf{J}$ mix under Lorentz transformations and there is no Lorentz-invariant way to distinguish the charge and current densities. However, one can choose a "natural" frame in which the electric and magnetic contributions can be defined. The most convenient choice is the so-called "Breit frame" in which the nucleon's three-momentum is reversed in the scattering process, i.e. $\mathbf{p}' = -\mathbf{p}$ and $\mathbf{q} = -2\mathbf{p}$. This defines a line over which the scattering event occurs (rather than a plane) and allows one to consider the projection of a vector along that line. However, the Breit frame is generally different between different scattering events, i.e. it is a mathematical convenience rather than a physically useful object.

15 Why is there a current? Answer: it's spinning (unless it is a J=0 state).
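It can be shown that in the Breit frame the electric and magnetic form factors, which are the Fourier transforms of the charge and magnetization densities,

$$G_E(Q^2) = \int d^3r\; e^{i\mathbf{q}\cdot\mathbf{r}}\,\rho(\mathbf{r}), \qquad G_M(Q^2) = \int d^3r\; e^{i\mathbf{q}\cdot\mathbf{r}}\,\hat{z}\cdot\mathbf{M}(\mathbf{r})$$

(here, $\mathbf{M}(\mathbf{r})$ is the magnetization density and $\hat{z}$ is the direction of momentum in the Breit frame) are related to the Dirac and Pauli form factors in the following simple manner:

$$G_E(Q^2) = F_1(Q^2) - \tau\,F_2(Q^2), \qquad G_M(Q^2) = F_1(Q^2) + F_2(Q^2), \qquad \tau \equiv \frac{Q^2}{4M^2}.$$

We will omit the demonstration that this is the case for reasons of simplicity. We note in passing that these form factors evaluated at $Q^2 = 0$ correspond to familiar quantities: $G_E(0)$ is the charge of the nucleon in units of $e$ (1 for the proton and zero for the neutron), and $G_M(0)$ is the magnetic moment in units of $e/2m_p$, which is 2.793 for the proton and $-1.913$ for the neutron, corresponding to g-factors of 5.585 and $-3.826$, respectively.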
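Having stated the functional form of the electric and magnetic form factors, we can now briefly discuss how they are extracted from electron scattering experiments. The differential scattering cross section for unpolarized electrons and nucleons is

$$\frac{d\sigma}{d\Omega} = \left.\frac{d\sigma}{d\Omega}\right|_{\rm Mott}\,\frac{1}{1+\tau}\left[G_E^2(Q^2) + \frac{\tau}{\epsilon}\,G_M^2(Q^2)\right],$$

where $d\sigma/d\Omega|_{\rm Mott}$ is the Mott cross section for relativistic scattering of point particles, whose form is well-known, $\theta$ is the electron scattering angle, and $\epsilon = \left[1 + 2(1+\tau)\tan^2(\theta/2)\right]^{-1}$.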
Again we simply state this result without proof for reasons of simplicity. By measuring with different angles of incidence and energies, one can measure the cross section at fixed $Q^2$ but different values of $\tau/\epsilon$, from which one can separately determine the values of $G_E(Q^2)$ and $G_M(Q^2)$.
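The separation of $G_E$ and $G_M$ described above amounts to a straight-line fit. The sketch below assumes the cross section takes the form $(1+\tau)\,(d\sigma/d\Omega)/(d\sigma/d\Omega)_{\rm Mott} = G_E^2 + (\tau/\epsilon)\,G_M^2$, generates noiseless synthetic values at one fixed $Q^2$, and recovers the two form factors from the intercept and slope. The numerical inputs (`TAU`, `TRUE_G_E`, `TRUE_G_M`, and the list of angles) are made up for illustration and are not measured data.

```python
import math

def virtual_photon_polarization(tau, theta):
    """epsilon = [1 + 2 (1 + tau) tan^2(theta / 2)]^(-1)."""
    return 1.0 / (1.0 + 2.0 * (1.0 + tau) * math.tan(theta / 2.0) ** 2)

def reduced_cross_section(g_e, g_m, tau, eps):
    """(1 + tau) * (dsigma/dOmega) / (Mott cross section), assumed form."""
    return g_e ** 2 + (tau / eps) * g_m ** 2

def fit_straight_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical inputs: one fixed Q^2 (hence fixed tau) and made-up form factors.
TAU = 0.2
TRUE_G_E, TRUE_G_M = 0.5, 1.4

# Different angles at the same Q^2 require different beam energies in practice.
angles = [math.radians(a) for a in (30.0, 60.0, 90.0, 120.0)]
x_values = [TAU / virtual_photon_polarization(TAU, a) for a in angles]
y_values = [reduced_cross_section(TRUE_G_E, TRUE_G_M, TAU,
                                  virtual_photon_polarization(TAU, a))
            for a in angles]

slope, intercept = fit_straight_line(x_values, y_values)
fitted_g_m = math.sqrt(slope)       # slope is G_M^2
fitted_g_e = math.sqrt(intercept)   # intercept is G_E^2
print("fitted G_E:", fitted_g_e, " fitted G_M:", fitted_g_m)
```

With real data the points scatter about the line, and the uncertainty on the intercept determines how well $G_E$ can be extracted when the magnetic term dominates.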
In a homework problem at the end of these notes, one shows that the mean-square radius of a charge distribution is given by the derivative of the electric form factor,

$$\langle r^2\rangle = -6\left.\frac{dG_E}{dQ^2}\right|_{Q^2=0}.$$

Extracting the nucleon radius from an electron scattering experiment in this way gives, for the proton and neutron respectively, approximately

$$\sqrt{\langle r^2\rangle_p} \approx 0.88\ {\rm fm}, \qquad \langle r^2\rangle_n \approx -0.12\ {\rm fm}^2.$$

The negative mean-square radius of the neutron suggests that the neutron's charge density is positive at short distances and becomes negative further away. But there is a problem with the proton charge radius: the same quantity can be measured using atomic physics, in which one finds

$$\sqrt{\langle r^2\rangle_p} \approx 0.84\ {\rm fm},$$

which does not agree with the result from electron scattering! This so-called "proton radius puzzle" is a major experimental anomaly. Barring an experimental error, it implies either that some subtle QED effect invalidates the atomic calculation or, most excitingly, that there is physics beyond the standard model. Recent measurements suggest, however, that the resolution to this puzzle may lie in experimental difficulties in the older measurements.
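To see how a radius follows from the slope of a form factor, the sketch below applies $\langle r^2\rangle = -6\,dG_E/dQ^2|_{Q^2=0}$ to a dipole form factor $G_D(Q^2) = (1 + Q^2/\Lambda^2)^{-2}$. The dipole shape with $\Lambda^2 = 0.71\ {\rm GeV}^2$ is a commonly used parametrization adopted here purely as an assumption; it is not the fit used to obtain the experimental radii quoted in the text.

```python
import math

HBAR_C_GEV_FM = 0.1973269804  # conversion factor, GeV * fm

def dipole_form_factor(q2, lambda2=0.71):
    """Dipole model G_D(Q^2) = (1 + Q^2 / Lambda^2)^(-2), with Q^2 in GeV^2."""
    return (1.0 + q2 / lambda2) ** -2

def mean_square_radius(form_factor, step=1e-6):
    """<r^2> = -6 dG/dQ^2 at Q^2 = 0, via a forward difference (GeV^-2)."""
    slope = (form_factor(step) - form_factor(0.0)) / step
    return -6.0 * slope

r2_in_inverse_gev2 = mean_square_radius(dipole_form_factor)
rms_radius_fm = math.sqrt(r2_in_inverse_gev2) * HBAR_C_GEV_FM
print(f"rms radius of the dipole model: {rms_radius_fm:.3f} fm")
```

For the dipole the slope can also be found by hand, giving $\langle r^2\rangle = 12/\Lambda^2$, which is a useful cross-check on the numerical derivative.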
Atomic Measurement of Proton Radius
The basic principle behind the atomic measurement of the proton charge radius is that, for a point-like proton, the Coulomb potential goes like $-e^2/r$ all the way down to $r = 0$. So, if the proton has a finite size, this potential is less negative inside the volume of the proton, and consequently the atomic level is less tightly bound. The magnitude of this effect depends on the size of the proton and the density of the wavefunction near the origin. For this reason, the experiments are carried out using muonic hydrogen, as the muon is about 200 times heavier than an electron, and hence the wavefunction has roughly $(200)^3$ times as much weight near the origin as one would find in ordinary hydrogen. Of course, there are many other effects which shift atomic energy levels, but they can all be calculated (up to very small corrections), and one can perform what is believed to be a reliable calculation for the proton charge radius.
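The enhancement of the wavefunction near the origin can be estimated in a couple of lines. For a hydrogen-like atom the probability density at the origin scales as the cube of the reduced mass, so the sketch below compares muonic and ordinary hydrogen using reduced masses rather than the bare lepton masses. The mass values are standard rounded numbers in MeV, and the use of the reduced mass is a refinement assumed here rather than taken from the text.

```python
# Particle masses in MeV (rounded standard values).
ELECTRON_MASS = 0.51099895
MUON_MASS = 105.6583755
PROTON_MASS = 938.27208816

def reduced_mass(m1, m2):
    """Reduced mass of a two-body system."""
    return m1 * m2 / (m1 + m2)

# The Bohr radius scales as 1 / (reduced mass), so |psi(0)|^2 scales as its cube.
mass_ratio = (reduced_mass(MUON_MASS, PROTON_MASS)
              / reduced_mass(ELECTRON_MASS, PROTON_MASS))
density_enhancement = mass_ratio ** 3

print(f"reduced-mass ratio:           {mass_ratio:.1f}")
print(f"enhancement of |psi(0)|^2:    {density_enhancement:.2e}")
```

The result is of the same order as the $(200)^3$ estimate, somewhat smaller because the muon is not negligibly light compared with the proton.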

Quantum Field Theory for Pedestrians
In this section we will introduce the language of modern particle physics: quantum field theory. First, however, we will briefly discuss classical field theory, which we will then quantize to arrive at QFT. Usually one uses the Hamiltonian formulation of classical mechanics to move from classical to quantum physics, but one can also use the Lagrangian formulation to quantize a theory using Feynman's path integral approach. Although we won't discuss path integrals in this class, the Lagrangian formalism will still be used to treat classical field theories, so to refresh everyone's memory and establish notation we will briefly review Lagrangian mechanics in the following section.

[Figure: diagram relating the Hamiltonian and Lagrangian formulations of classical mechanics to quantum mechanics.]

Review of Classical Mechanics
Suppose we have a system with $N$ degrees of freedom, e.g. the positions of $N/3$ particles in 3 spatial dimensions. We combine all of these degrees of freedom or "generalized coordinates" into an $N$-dimensional vector $q$ in the $N$-dimensional configuration space of the system. If we run the system forward in time, the vector $q$ will trace out a path in configuration space, which specifies the time evolution of all of the particles in real space. The question of how the entire system changes in time is now encapsulated in what path the system takes through configuration space. So, our goal is to determine the path. We do so by associating a number, called the action, with each possible path through configuration space. For a path which starts at $q_i$ at time $t_i$ and ends at $q_f$ at time $t_f$, the action is defined to be

$$S[q] = \int_{t_i}^{t_f} dt\; L(q, \dot{q}),$$

where $L$ is the Lagrangian, which depends on the generalized coordinates $q$ and their time derivatives $\dot{q}$. The central axiom of Lagrangian mechanics is the Principle of Least Action, which states that the actual path taken by the system is the one for which the action is minimized. So, to determine the path the system takes through configuration space, or equivalently its time evolution, we need to simply minimize $S$ with respect to the path. We can't just set the derivative of $S$ equal to zero, since we are minimizing with respect to a path, not a single variable. To carry out the minimization (technically, the "variation") we consider some path $q(t)$ through configuration space, and suppose we deform it slightly by adding a new time dependent function $\delta q(t)$, so that we have a new path $q(t) + \delta q(t)$.
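To compare apples to apples, we need to make sure that the new path begins and ends at the same point as the original one, which means the deformation $\delta q(t)$ must vanish at the endpoints,

$$\delta q(t_i) = \delta q(t_f) = 0.$$

The time derivative of the deformed path is $\dot{q}(t) + \delta\dot{q}(t)$. The change in the action as a result of this deformation is then

$$\delta S = S[q + \delta q] - S[q],$$

and the actual path taken by the system is the $q(t)$ for which $\delta S = 0$. The change in the action is

$$\delta S = \int_{t_i}^{t_f} dt\,\left[L(q + \delta q, \dot{q} + \delta\dot{q}) - L(q, \dot{q})\right] = \int_{t_i}^{t_f} dt\,\sum_i\left[\frac{\partial L}{\partial q_i}\,\delta q_i + \frac{\partial L}{\partial \dot{q}_i}\,\delta\dot{q}_i\right]. \tag{202}$$

In the second line, we simultaneously used the first order approximation

$$f(x + \delta x) \approx f(x) + \frac{df}{dx}\,\delta x \tag{203}$$

on each argument of the Lagrangian, which is justified since the deformation of the path is assumed to be small. The next step is to integrate the second term(s) in (202) by parts,

$$\int_{t_i}^{t_f} dt\,\frac{\partial L}{\partial \dot{q}_i}\,\delta\dot{q}_i = \left[\frac{\partial L}{\partial \dot{q}_i}\,\delta q_i\right]_{t_i}^{t_f} - \int_{t_i}^{t_f} dt\,\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{q}_i}\right)\delta q_i.$$

The first term is a boundary term, and vanishes since $\delta q(t_i) = \delta q(t_f) = 0$. Putting the remaining term back into (202), we have

$$\delta S = \int_{t_i}^{t_f} dt\,\sum_i\left[\frac{\partial L}{\partial q_i} - \frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{q}_i}\right)\right]\delta q_i.$$

To have $\delta S = 0$, the above must be zero for any $\delta q$, which requires that the term in brackets vanish, resulting in the equations of motion

$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{q}_i}\right) = 0,$$

called the Euler-Lagrange equations. Notice that $i$ is now a free index, so we have one equation for each generalized coordinate. To make contact with Newtonian physics, it's easy to show that if we take

$$L = \tfrac{1}{2}\,m\,\dot{q}^2 - V(q),$$

the resulting equations of motion are simply Newton's Law $F = ma$.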
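The statement that the classical path makes the action stationary can be tested numerically. The sketch below discretizes the action for a particle in a uniform gravitational field, $L = \tfrac{1}{2}m\dot{q}^2 - mgq$, and compares the classical path between fixed endpoints with deformed paths $q(t) \pm \delta q(t)$. The choice of system, the sinusoidal deformation, and the step count are illustrative assumptions.

```python
import math

def discretized_action(path, dt, mass=1.0, g=9.8):
    """Midpoint-rule action for L = (1/2) m v^2 - m g q on a sampled path."""
    action = 0.0
    for q_now, q_next in zip(path[:-1], path[1:]):
        velocity = (q_next - q_now) / dt
        q_mid = 0.5 * (q_now + q_next)
        action += (0.5 * mass * velocity ** 2 - mass * g * q_mid) * dt
    return action

def classical_path(times, total_time, g=9.8):
    """Solution of q'' = -g with q(0) = q(T) = 0."""
    return [0.5 * g * t * (total_time - t) for t in times]

def deformed_path(times, total_time, amplitude, g=9.8):
    """Classical path plus a deformation that vanishes at both endpoints."""
    return [q + amplitude * math.sin(math.pi * t / total_time)
            for q, t in zip(classical_path(times, total_time, g), times)]

N_STEPS = 1000
TOTAL_TIME = 1.0
dt = TOTAL_TIME / N_STEPS
times = [i * dt for i in range(N_STEPS + 1)]

s_classical = discretized_action(classical_path(times, TOTAL_TIME), dt)
s_plus = discretized_action(deformed_path(times, TOTAL_TIME, +0.1), dt)
s_minus = discretized_action(deformed_path(times, TOTAL_TIME, -0.1), dt)

# A first-order change would flip sign with the deformation; it does not.
print("S(classical)            =", s_classical)
print("S(+deformation) - S     =", s_plus - s_classical)
print("S(-deformation) - S     =", s_minus - s_classical)
```

Both deformations raise the action by the same amount, which shows that the term linear in $\delta q$ vanishes on the classical path and only the second-order term survives.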
In the Lagrangian formalism, the canonical momentum is defined as

$$p_i \equiv \frac{\partial L}{\partial \dot{q}_i}.$$

For example, if $L = T - V$ where $T = \sum_i \tfrac{1}{2}\,m\,\dot{q}_i^2$, then $p_i = m\dot{q}_i$. However, a more complicated system can have canonical momenta that do not agree with the naive Newtonian definition.
Having found the canonical momenta, we may construct the Hamiltonian $H(q, p)$, which is a function of the generalized coordinates and the canonical momenta conjugate to them. To perform the change of variables, we use the Legendre transformation,

$$H(q, p) = \sum_i p_i\,\dot{q}_i - L(q, \dot{q}),$$

where since the Hamiltonian may not depend on $\dot{q}$, we must invert the canonical momentum to solve for $\dot{q}$ in terms of $p$, and use that expression everywhere $\dot{q}$ appears. For example, if $L = T - V$ and $p_i = m\dot{q}_i$, then $\dot{q}_i = p_i/m$, and the Legendre transformation proceeds as

$$H = \sum_i \frac{p_i^2}{m} - \left(\sum_i \frac{p_i^2}{2m} - V\right) = \sum_i \frac{p_i^2}{2m} + V = T + V.$$

Given a classical Hamiltonian, one typically quantizes it by promoting $q$ and $p$ to operators on the Hilbert space and imposing the canonical commutation relations,

$$[q_i, p_j] = i\,\delta_{ij}.$$

In passing, we also note that quantization can be implemented in the Lagrangian formalism via the Feynman path integral, wherein the system need not take the path for which the action is minimized, but rather it may take any path, with a probability amplitude weighted by $e^{iS}$.
Having reminded ourselves of the details of classical particle mechanics, we will now turn to the classical theory of fields.

Classical Field Theory
Simply put, a field theory describes a system whose dynamics is specified by an uncountably infinite number of degrees of freedom. For example, throughout this section we will consider a scalar field, $\phi(\mathbf{x}, t)$, which assigns a scalar (number) to each point in spacetime. To fully specify the configuration of this field, you would need to tell me the value it takes at each point $\mathbf{x}$ in space for every time $t$. One can think of $\phi(\mathbf{x}, t)$ as the continuous analogue of the finite dimensional vector $q$ from the previous section. Now, given a scalar field $\phi(\mathbf{x}, t)$ we'd like to construct a theory describing it. In point particle quantum mechanics we could equally well use the Hamiltonian or Lagrangian formalism, but for a relativistic field theory the Lagrangian approach is preferable. The key word here is relativistic: in any sensible theory the action, which lies at the heart of the Lagrangian formalism, will be Lorentz invariant, and we may thus use it to describe physics in any frame. Since a field theory has degrees of freedom at every point in space, it is traditional to write the Lagrangian $L$ in terms of the Lagrangian density $\mathcal{L}$,

$$L = \int d^3r\; \mathcal{L}(\phi, \partial_\mu\phi),$$

where $\mathcal{L}$ depends on the values of the field $\phi$ and its spacetime derivatives $\partial_\mu\phi$ at a particular point in spacetime. The Lagrangian density is used so often that, perhaps confusingly, everyone simply calls it the Lagrangian. In terms of the Lagrangian (density), the action is

$$S = \int dt\; L = \int d^4x\; \mathcal{L}(\phi, \partial_\mu\phi),$$

where the four-dimensional integral measure is shorthand for $d^4x = dt\, d^3r$. So, to formulate a relativistic theory, all we have to do is make sure that the action is Lorentz invariant, and we're good to go.
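On the other hand, the Hamiltonian is not so easy to work with. Just like the Lagrangian, we can write it in terms of a Hamiltonian density,

$$H = \int d^3r\; \mathcal{H}.$$

However, the Hamiltonian density is not Lorentz invariant, since it is the 00 component of the stress energy tensor $T^{\mu\nu}$, and hence transforms as

$$\mathcal{H} = T^{00} \;\to\; \Lambda^0{}_\alpha\,\Lambda^0{}_\beta\,T^{\alpha\beta}.$$

Thus, it is substantially easier to work with the Lagrangian to formulate relativistic dynamics. Now, let's suppose we've figured out the Lagrangian for some field theory. How do we get the equations of motion out of it? Another convenience of the Lagrangian formalism is that the story is essentially the same as in the point particle case: we simply need to minimize the action with respect to variations of the field $\phi(\mathbf{x}, t)$. In what follows we'll work with a scalar field for simplicity, but the generalization to other kinds of fields is straightforward (you just switch the letters!). To carry out the variation, we again consider deforming the field $\phi(\mathbf{x}, t) \to \phi(\mathbf{x}, t) + \delta\phi(\mathbf{x}, t)$. The change in the action $\delta S$ is then

$$\delta S = \int d^4x\,\left[\mathcal{L}(\phi + \delta\phi, \partial_\mu\phi + \partial_\mu\delta\phi) - \mathcal{L}(\phi, \partial_\mu\phi)\right] = \int d^4x\,\left[\frac{\partial\mathcal{L}}{\partial\phi}\,\delta\phi + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,\partial_\mu\delta\phi\right].$$

Just as before, we used a linear approximation (203) on all of the arguments of $\mathcal{L}$ in the second line (don't forget the index $\mu$ is repeated, and thus summed over!). We can integrate the second term by parts, and provided $\delta\phi \to 0$ at the boundary (usually taken to be spatial and temporal infinity) the surface term vanishes, so

$$\int d^4x\,\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,\partial_\mu\delta\phi = -\int d^4x\;\partial_\mu\!\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right)\delta\phi,$$

and thus the variation in the action is

$$\delta S = \int d^4x\,\left[\frac{\partial\mathcal{L}}{\partial\phi} - \partial_\mu\!\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right)\right]\delta\phi.$$

Requiring this to vanish for all $\delta\phi$, we arrive at the four dimensional generalization of the Euler-Lagrange equations,

$$\frac{\partial\mathcal{L}}{\partial\phi} - \partial_\mu\!\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right) = 0. \tag{219}$$

As our first example, we'll consider a real scalar field $\phi(\mathbf{x}, t)$, by which we mean $\phi(\mathbf{x}, t)$ assigns a real number to each point in spacetime, and that real number is a scalar in the technical sense. That is, it is a Lorentz scalar. The Lagrangian we will consider is

$$\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}\,m^2\phi^2,$$

which is sometimes called the "Klein-Gordon Lagrangian," for reasons we are about to see. To find the equations of motion we need to calculate the derivatives with respect to $\phi$ and $\partial_\mu\phi$.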
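The first is trivial,

$$\frac{\partial\mathcal{L}}{\partial\phi} = -m^2\phi.$$

The second can be computed by writing $\partial_\mu\phi\,\partial^\mu\phi = g^{\alpha\beta}\,\partial_\alpha\phi\,\partial_\beta\phi$ and noting that derivatives in different directions are independent variables, which we can formalize by writing

$$\frac{\partial(\partial_\alpha\phi)}{\partial(\partial_\mu\phi)} = \delta^\mu_\alpha.$$

The derivative of the Lagrangian with respect to $\partial_\mu\phi$ is then

$$\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} = \frac{\partial}{\partial(\partial_\mu\phi)}\left[\tfrac{1}{2}\,g^{\alpha\beta}\,\partial_\alpha\phi\,\partial_\beta\phi\right] = \tfrac{1}{2}\,g^{\alpha\beta}\left(\delta^\mu_\alpha\,\partial_\beta\phi + \partial_\alpha\phi\,\delta^\mu_\beta\right) = \tfrac{1}{2}\left(g^{\mu\beta}\,\partial_\beta\phi + g^{\alpha\mu}\,\partial_\alpha\phi\right) = g^{\mu\alpha}\,\partial_\alpha\phi = \partial^\mu\phi,$$

where to get from the third to the fourth equality we noted that $\alpha$ and $\beta$ were both dummy indices and thus we could combine the two identical terms. In practice, $\partial_\mu\phi\,\partial^\mu\phi \sim (\partial_\mu\phi)^2$, so the derivative with respect to $\partial_\mu\phi$ should be something like $2\,\partial^\mu\phi$, which happens to be correct.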
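Putting these into the Euler-Lagrange equations (219), we find the equations of motion

$$\left(\partial_\mu\partial^\mu + m^2\right)\phi = 0,$$

which is the Klein-Gordon equation! So, we have learned that the Klein-Gordon equation is the classical equation of motion for a real scalar field. We play a similar game with the Dirac equation, starting with the Lagrangian

$$\mathcal{L} = \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi,$$

where $\psi$ and $\bar\psi$ are to be treated as two independent fields. Since $\partial\mathcal{L}/\partial(\partial_\mu\bar\psi) = 0$, the equation of motion for $\bar\psi$ is simply the Dirac equation,

$$\left(i\gamma^\mu\partial_\mu - m\right)\psi = 0.$$

We can also find the equations of motion for $\psi$,

$$i\,\partial_\mu\bar\psi\,\gamma^\mu + m\,\bar\psi = 0,$$

which we also met in our discussion of the Dirac equation. Although these equations look like single-particle quantum mechanics, we must emphasize that they really are classical equations! To consider something less "quantum-looking," let's consider good old-fashioned electromagnetism, the most famous classical field theory, from the perspective of the Maxwell Lagrangian

$$\mathcal{L} = -\tfrac{1}{4}\,F_{\mu\nu}F^{\mu\nu} - J^\mu A_\mu,$$

where the field strength tensor $F_{\mu\nu}$ is defined in terms of the potentials $A_\mu = (\Phi, -\mathbf{A})$ as

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu,$$

and the second term in the Lagrangian couples the electromagnetic field to a current source, $J^\mu = (\rho, \mathbf{J})$. The fundamental field here is $A_\mu$, and we will get an Euler-Lagrange equation for each component,

$$\partial_\mu\!\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu A_\nu)}\right) - \frac{\partial\mathcal{L}}{\partial A_\nu} = 0. \tag{230}$$

Obviously, $\partial\mathcal{L}/\partial A_\nu = -J^\nu$, but the variation with respect to $\partial_\mu A_\nu$ requires a little care. We'll again use the fact that the components and different derivatives of $A_\mu$ are all independent variables. Here it goes,

$$\frac{\partial\mathcal{L}}{\partial(\partial_\mu A_\nu)} = -\tfrac{1}{4}\,\frac{\partial\left(F_{\alpha\beta}F^{\alpha\beta}\right)}{\partial(\partial_\mu A_\nu)} = -\tfrac{1}{2}\,F^{\alpha\beta}\,\frac{\partial F_{\alpha\beta}}{\partial(\partial_\mu A_\nu)} = -\tfrac{1}{2}\,F^{\alpha\beta}\left(\delta^\mu_\alpha\delta^\nu_\beta - \delta^\mu_\beta\delta^\nu_\alpha\right) = -\tfrac{1}{2}\left(F^{\mu\nu} - F^{\nu\mu}\right) = -F^{\mu\nu}.$$

To get to the last line we used the fact that the field strength is antisymmetric, i.e. $F^{\nu\mu} = -F^{\mu\nu}$. So,

$$\frac{\partial\mathcal{L}}{\partial(\partial_\mu A_\nu)} = -F^{\mu\nu}.$$

Putting this into (230), we get the equations of motion

$$\partial_\mu F^{\mu\nu} = J^\nu.$$

It turns out, this is nothing more than Maxwell's equations, albeit in an elegant notation. First of all, notice that $\nu$ is a free index, so the above is actually four equations. Let's first consider the $\nu = 0$ equation, which reads

$$\partial_i F^{i0} = \nabla\cdot\mathbf{E} = \rho,$$

which is Gauss's law. For your convenience, recall the components of the field strength are

$$F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}.$$

Next, let's consider the $\nu = 1$ equation:

$$\partial_0 F^{01} + \partial_2 F^{21} + \partial_3 F^{31} = -\partial_t E_x + \partial_y B_z - \partial_z B_y = J_x,$$

which is the x component of Ampere's Law.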
As you're invited to confirm, the $\nu = 2$ and $\nu = 3$ equations give us the y and z components of Ampere's law, respectively. You may ask where the other two Maxwell equations went, and the answer is that they've been hiding in front of us all along. The equations $\nabla\cdot\mathbf{B} = 0$ and $\nabla\times\mathbf{E} + \partial_t\mathbf{B} = 0$ are implicit in the definitions of the electric and magnetic fields in terms of the potentials, so these equations are actually already embedded in the structure of $F_{\mu\nu}$. 16

As a final example before moving onto the big bad world of QFT, we'll consider the unimaginatively named "$\phi^4$ theory," defined (in the conventional normalization) by the Lagrangian

$$\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}\,m^2\phi^2 - \frac{\lambda}{4!}\,\phi^4,$$

where $\phi$ is once again a real scalar field and $m$ and $\lambda$ are constants. Although this may look like a minor variation of the (free) Klein-Gordon Lagrangian we previously discussed, the eponymous $\phi^4$ is actually something we haven't met yet; it is an interaction term. The equations of motion are easy to find, since we're already experts on the free Klein-Gordon theory,

$$\partial_\mu\partial^\mu\phi + m^2\phi + \frac{\lambda}{3!}\,\phi^3 = 0.$$

The first two terms represent the free propagation of a bosonic field, while the last nonlinear term is interpreted as an interaction between the particles. Any theory lacking such interaction terms is said to be free. Because the term in the Lagrangian is $\phi^4$, this interaction term encodes $\phi\phi \to \phi\phi$ scattering, as shown in the figure. We'll learn more about how this works when we discuss Feynman diagrams later on in this section.
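Returning briefly to the electromagnetic example, the bookkeeping of the field strength tensor is easy to get wrong by a sign, so a small numerical check can be reassuring. The sketch below builds $F^{\mu\nu}$ from given $\mathbf{E}$ and $\mathbf{B}$, assuming the sign convention $F^{0i} = -E_i$, $F^{ij} = -\epsilon_{ijk}B_k$ with metric signature $(+,-,-,-)$, and verifies antisymmetry together with the invariant $F_{\mu\nu}F^{\mu\nu} = 2(\mathbf{B}^2 - \mathbf{E}^2)$. The field values are arbitrary illustrative numbers.

```python
# Diagonal of the Minkowski metric, signature (+, -, -, -).
METRIC = [1.0, -1.0, -1.0, -1.0]

def field_strength_upper(e_field, b_field):
    """F^{mu nu} with F^{0i} = -E_i and F^{ij} = -epsilon_ijk B_k (assumed convention)."""
    ex, ey, ez = e_field
    bx, by, bz = b_field
    return [
        [0.0, -ex, -ey, -ez],
        [ex, 0.0, -bz, by],
        [ey, bz, 0.0, -bx],
        [ez, -by, bx, 0.0],
    ]

def lower_indices(f_upper):
    """F_{mu nu} = g_{mu alpha} g_{nu beta} F^{alpha beta} for a diagonal metric."""
    return [[METRIC[m] * METRIC[n] * f_upper[m][n] for n in range(4)]
            for m in range(4)]

def lorentz_invariant(e_field, b_field):
    """The contraction F_{mu nu} F^{mu nu}."""
    f_up = field_strength_upper(e_field, b_field)
    f_low = lower_indices(f_up)
    return sum(f_low[m][n] * f_up[m][n] for m in range(4) for n in range(4))

E_FIELD = (1.0, 2.0, 3.0)     # illustrative values
B_FIELD = (0.5, -1.0, 2.0)    # illustrative values

F_UPPER = field_strength_upper(E_FIELD, B_FIELD)
is_antisymmetric = all(F_UPPER[m][n] == -F_UPPER[n][m]
                       for m in range(4) for n in range(4))
invariant = lorentz_invariant(E_FIELD, B_FIELD)
expected = 2.0 * (sum(b * b for b in B_FIELD) - sum(e * e for e in E_FIELD))

print("antisymmetric:", is_antisymmetric)
print("F_{mu nu} F^{mu nu} =", invariant, " expected:", expected)
```

The same contraction appears in the Maxwell Lagrangian, so this check also confirms that $-\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} = \tfrac{1}{2}(\mathbf{E}^2 - \mathbf{B}^2)$ in this convention.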

Quantizing Canonically: Scalar Fields
Having acquainted ourselves with classical field theory, it's time to jump into the deep end and quantize it. We'll follow the normal procedure from particle quantum mechanics: identify the canonical momentum $p_i = \partial L/\partial\dot{q}_i$ and impose the canonical commutation relations $[q_i, p_j] = i\delta_{ij}$. The only difference in field theory is that our set of degrees of freedom is continuous rather than discrete. Unsurprisingly, the canonical momentum in a field theory is itself a field, defined analogously to the point particle case as

$$\pi(\mathbf{x}, t) = \frac{\partial\mathcal{L}}{\partial\dot\phi(\mathbf{x}, t)}.$$

Note that the fields are time dependent. That means when we promote them to operators in the quantum theory you should think of them like operators in the Heisenberg picture of normal quantum mechanics.

Heisenberg and Schrödinger Pictures of QM
Recall that in quantum mechanics, the things we actually measure are quantum averages, like $\langle\psi|O|\psi\rangle$, where $|\psi\rangle$ is a state and $O$ is an operator. Ostensibly, such an average could change in time, and in light of the philosophy that we should only take seriously observable quantities, it shouldn't matter how we decide to implement this time dependence. There are two main "pictures" of how to deal with time evolution. The first is the familiar Schrödinger picture where operators are time-independent and the states evolve via

$$|\psi(t)\rangle = e^{-iHt}\,|\psi(0)\rangle.$$

There is also the perhaps unfamiliar Heisenberg representation, where we take the opposite approach: states are time-independent and operators carry the time dependence, evolving as

$$O(t) = e^{iHt}\,O(0)\,e^{-iHt}.$$

This picture is typically more natural in QFT, given that we are primarily concerned with objects such as creation and annihilation operators and field operators (to be introduced below) rather than wave functions. We should also note that there is a third common picture, called the interaction representation, where both states and operators time-evolve according to different parts of the Hamiltonian. This is the picture one typically uses to derive the Feynman rules (within the canonical formalism), but we will not need to use it explicitly in these notes.
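In point particle quantum mechanics, if we impose canonical commutation relations at $t = 0$ it's easy to see from the above that they will still hold at a later time $t$,

$$[q_i(t), p_j(t)] = e^{iHt}\,[q_i(0), p_j(0)]\,e^{-iHt} = i\,\delta_{ij}.$$

Notice that both operators are evaluated at the same time. Because of this, the above are called equal time commutation relations. To quantize our bosonic field theory, we can just replace the discrete operators in the above equation with their continuous field theory counterparts,

$$[\phi(\mathbf{x}, t), \pi(\mathbf{y}, t)] = i\,\delta^3(\mathbf{x} - \mathbf{y}).$$

Again note that both operators are evaluated at the same time. We've also upgraded our delta from Kronecker to Dirac in light of the continuous nature of our fields. Now, to actually compute the energies and momenta of physical states we need to construct the Hamiltonian (density), and in doing so choose a reference frame (and lose covariance). The Hamiltonian (density) is still defined as the Legendre transform of the Lagrangian (density),

$$\mathcal{H} = \pi\,\dot\phi - \mathcal{L}.$$

For example, let's construct the Hamiltonian for our free Klein-Gordon field, where

$$\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}\,m^2\phi^2 = \tfrac{1}{2}\,\dot\phi^2 - \tfrac{1}{2}\,(\nabla\phi)^2 - \tfrac{1}{2}\,m^2\phi^2.$$

The canonical momentum is $\pi = \partial\mathcal{L}/\partial\dot\phi = \dot\phi$, from which it follows that the Hamiltonian is

$$H = \int d^3x\,\left[\tfrac{1}{2}\,\pi^2 + \tfrac{1}{2}\,(\nabla\phi)^2 + \tfrac{1}{2}\,m^2\phi^2\right]. \tag{247}$$

Now, the next thing we might want to do is look at what kinds of physical states we can have. Let's first go back to the classical Klein-Gordon equation, which we saw has plane wave solutions,

$$\phi(\mathbf{x}, t) \propto e^{\pm i(\mathbf{k}\cdot\mathbf{x} - \omega_k t)}, \qquad \omega_k = \sqrt{\mathbf{k}^2 + m^2}.$$

We can think of these solutions as the normal modes, which we can superimpose to write a general solution. This means we can decompose the field $\phi(\mathbf{x}, t)$ as

$$\phi(\mathbf{x}, t) = \int \frac{d^3k}{(2\pi)^3}\,\frac{1}{\sqrt{2\omega_k}}\left[a(\mathbf{k}, t)\,e^{i\mathbf{k}\cdot\mathbf{x}} + a^\dagger(\mathbf{k}, t)\,e^{-i\mathbf{k}\cdot\mathbf{x}}\right], \tag{250}$$

where classically, the $a(\mathbf{k}, t)$ and $a^\dagger(\mathbf{k}, t)$ are the amplitudes to be in a given mode. These amplitudes are time dependent, $a(\mathbf{k}, t) = a(\mathbf{k})\,e^{-i\omega_k t}$, but we're usually going to suppress the explicit time dependence when writing them in order to keep the notation compact. The factor of $\omega(\mathbf{k})^{-1/2}$ is just a conventional normalization to clean up later results.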

Lorentz Invariant Integral Measures
To motivate the funny factor of $\omega_k$ in the denominator of our expressions above, let's ask what a Lorentz-invariant integral measure should look like. It should be no surprise that $d^4k$ is manifestly Lorentz invariant, but integrating over all possible frequencies and momenta isn't usually what we want. We want to only integrate over states with positive energy ($k^0 > 0$) and which satisfy the dispersion relation $k^0 = \sqrt{\mathbf{k}^2 + m^2} \equiv \omega_k$. Note that in this expression $k^0$ is an integration variable, while $\omega_k$ is a fixed function of $\mathbf{k}$. This means the Lorentz invariant measure we really want is

$$\frac{d^4k}{(2\pi)^4}\;2\pi\,\delta(k^2 - m^2)\,\theta(k^0),$$

where $k^2 - m^2 = 0$ is the Lorentz-invariant "on-mass-shell" condition. We can use one of the standard delta function identities to write the above as

$$\frac{d^4k}{(2\pi)^3}\,\frac{1}{2\omega_k}\left[\delta(k^0 - \omega_k) + \delta(k^0 + \omega_k)\right]\theta(k^0).$$

The theta function kills the second term, so performing the $k^0$ integral using the delta function, this is simply

$$\frac{d^3k}{(2\pi)^3}\,\frac{1}{2\omega_k}.$$

This is a Lorentz-invariant integral measure. We then choose the square-root in the denominator for our field operators simply as a matter of convention, as it leads to simpler expressions down the road.
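The invariance of $d^3k/\omega_k$ can also be checked numerically for a boost along the z axis. Since the transverse momentum components are unchanged by such a boost, invariance requires the Jacobian $\partial k_z'/\partial k_z$ (at fixed transverse momentum) to equal $\omega'/\omega$. The sketch below compares the two using a finite difference; the sample momentum, mass, and boost velocity are arbitrary illustrative values.

```python
import math

def energy(kz, mass, k_perp=0.0):
    """On-shell energy omega = sqrt(kz^2 + k_perp^2 + m^2)."""
    return math.sqrt(kz * kz + k_perp * k_perp + mass * mass)

def boosted_kz(kz, mass, beta, k_perp=0.0):
    """z-momentum after a boost with velocity beta along z."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (kz - beta * energy(kz, mass, k_perp))

def boosted_energy(kz, mass, beta, k_perp=0.0):
    """Energy after a boost with velocity beta along z."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (energy(kz, mass, k_perp) - beta * kz)

def jacobian(kz, mass, beta, k_perp=0.0, step=1e-6):
    """Central-difference estimate of d(kz') / d(kz) at fixed k_perp."""
    upper = boosted_kz(kz + step, mass, beta, k_perp)
    lower = boosted_kz(kz - step, mass, beta, k_perp)
    return (upper - lower) / (2.0 * step)

KZ, MASS, BETA, K_PERP = 0.7, 1.0, 0.6, 0.3   # illustrative values

measure_jacobian = jacobian(KZ, MASS, BETA, K_PERP)
energy_ratio = boosted_energy(KZ, MASS, BETA, K_PERP) / energy(KZ, MASS, K_PERP)
print("d kz'/d kz   =", measure_jacobian)
print("omega'/omega =", energy_ratio)
```

The two numbers agree, so $d^3k'/\omega' = d^3k/\omega$, which is the statement that the measure with $\omega_k$ in the denominator is boost invariant.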
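We've shown the canonical momentum is just $\dot\phi$, and if we take the time derivative of (250) we'll find

$$\pi(\mathbf{x}, t) = \int \frac{d^3k}{(2\pi)^3}\,(-i)\sqrt{\frac{\omega_k}{2}}\left[a(\mathbf{k}, t)\,e^{i\mathbf{k}\cdot\mathbf{x}} - a^\dagger(\mathbf{k}, t)\,e^{-i\mathbf{k}\cdot\mathbf{x}}\right]. \tag{254}$$

Now, let's promote $a(\mathbf{k})$ and $a^\dagger(\mathbf{k})$ to operators. If we insist that

$$[a(\mathbf{k}), a^\dagger(\mathbf{k}')] = (2\pi)^3\,\delta^3(\mathbf{k} - \mathbf{k}'), \qquad [a(\mathbf{k}), a(\mathbf{k}')] = [a^\dagger(\mathbf{k}), a^\dagger(\mathbf{k}')] = 0, \tag{255}$$

we can work through the algebra and find that this implies

$$[\phi(\mathbf{x}, t), \pi(\mathbf{y}, t)] = i\,\delta^3(\mathbf{x} - \mathbf{y}),$$

which is precisely the canonical commutation relation we wanted! Both the names and commutation relations (255) of the $a$ and $a^\dagger$ operators remind us of the raising and lowering operators from the normal quantum harmonic oscillator. In fact, they actually are the raising and lowering operators for a harmonic oscillator, except now we have an infinite number of harmonic oscillators: one for each mode, indexed by the wave-vector $\mathbf{k}$. However, they now come with a new interpretation, and a fancier name. We'll now call $a^\dagger$ and $a$ the creation and annihilation operators for the field, which are interpreted as creating and annihilating a particle with momentum $\mathbf{k}$. To further illustrate the similarity with the harmonic oscillator, we can put (250) and (254) into the Hamiltonian (247), and find

$$H = \int \frac{d^3k}{(2\pi)^3}\,\frac{\omega_k}{2}\left[a^\dagger(\mathbf{k})\,a(\mathbf{k}) + a(\mathbf{k})\,a^\dagger(\mathbf{k})\right]. \tag{257}$$

In fact, there's an easier way to see this. Instead of considering a system of infinite volume (which we've been doing implicitly), let's consider putting the system in a box of size $L$ with periodic boundary conditions, just like we did previously when discussing the Fermi gas. We'll start with a finite size box, and then take the $L \to \infty$ limit at the end. The allowed momenta are quantized as

$$\mathbf{k} = \frac{2\pi}{L}\,(n_x, n_y, n_z)$$

for $n_x, n_y, n_z \in \mathbb{Z}$. The integral over momenta then becomes a sum over the allowed modes,

$$\int \frac{d^3k}{(2\pi)^3} \;\to\; \frac{1}{L^3}\sum_{n_x, n_y, n_z},$$

and we get one set of creation and annihilation operators per mode, each of which obeys the commutation relations

$$\left[a_{n_x, n_y, n_z},\, a^\dagger_{n_x', n_y', n_z'}\right] = \delta_{n_x, n_x'}\,\delta_{n_y, n_y'}\,\delta_{n_z, n_z'}, \tag{261}$$

which are exactly those of a harmonic oscillator.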
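The Hamiltonian (257) then becomes

$$H = \sum_{n_x, n_y, n_z} \frac{\omega_{n_x, n_y, n_z}}{2}\left(a^\dagger_{n_x, n_y, n_z}\,a_{n_x, n_y, n_z} + a_{n_x, n_y, n_z}\,a^\dagger_{n_x, n_y, n_z}\right).$$

To avoid losing our eyesight (and sanity) from all of the subscripts on subscripts, we can rewrite this in terms of the allowed momenta $\mathbf{k}$, as long as we don't forget $\mathbf{k}$ now takes discrete values,

$$H = \sum_{\mathbf{k}} \frac{\omega_k}{2}\left(a^\dagger_{\mathbf{k}}\,a_{\mathbf{k}} + a_{\mathbf{k}}\,a^\dagger_{\mathbf{k}}\right).$$

We can then use the commutator (261) to rewrite the second term as

$$a_{\mathbf{k}}\,a^\dagger_{\mathbf{k}} = a^\dagger_{\mathbf{k}}\,a_{\mathbf{k}} + 1.$$

Putting this back into the Hamiltonian, we arrive at a familiar result,

$$H = \sum_{\mathbf{k}} \omega_k\left(a^\dagger_{\mathbf{k}}\,a_{\mathbf{k}} + \tfrac{1}{2}\right),$$

which is nothing more than the Hamiltonian for a bunch of harmonic oscillators! Just like in the normal harmonic oscillator, we can define the number operator,

$$n_{\mathbf{k}} = a^\dagger_{\mathbf{k}}\,a_{\mathbf{k}},$$

which now has the interpretation of counting the number of particles in a given mode $\mathbf{k}$.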

The Hamiltonian is then

$$H = \sum_{\mathbf{k}} \omega_k\left(n_{\mathbf{k}} + \tfrac{1}{2}\right).$$
This has a nice interpretation. The energy of one particle with momentum $\mathbf{k}$ is $\omega(\mathbf{k})$, and $n_{\mathbf{k}}$ counts how many particles we have with momentum $\mathbf{k}$. So, the Hamiltonian is just the sum of the single particle energies $\omega(\mathbf{k})$ for each particle in the system. This agrees with the free theory representing non-interacting particles, since there are no contributions to the energy from the particles talking to one another. However, notice that when we have no particles in the system, i.e. when $n_{\mathbf{k}} = 0$ for all $\mathbf{k}$, there is still a contribution to the energy; this is just the energy of zero-point motion familiar from the quantum mechanical harmonic oscillator. However, in the present context there are an infinite number of harmonic oscillators, since there is one for each mode. Thus the zero-point motion contribution to the energy is

$$E_0 = \sum_{\mathbf{k}} \tfrac{1}{2}\,\omega_k,$$

which is not only non-zero, but is in fact divergent! However, just as before we can ignore this by chanting "only energy differences matter in physics" until the bad thoughts go away. Before going back to the infinite volume case with a continuum of modes, let's note that we can write down a set of basis states as $|n_{\mathbf{k}_1}, n_{\mathbf{k}_2}, \ldots, n_{\mathbf{k}_\alpha}, \ldots\rangle$, where $n_{\mathbf{k}_\alpha}$ is the number of particles in the $\alpha$-th mode. This is called the occupation number representation. The energy difference between a state with the mode $\mathbf{k}_\alpha$ singly occupied and the vacuum (no modes occupied) is

$$E - E_0 = \omega(\mathbf{k}_\alpha).$$

Usually, we'll just drop the zero-point energy as a matter of convention. This is equivalent to setting the $E = 0$ bar (which we are free to place wherever we like) to the zero-point energy. Now, returning to the infinite volume case with a continuum of modes, we can build states in a similar way. First of all, we define the vacuum as the state with no particles. Formally, we can define this just like we define the ground state of the harmonic oscillator, i.e. the vacuum is the state for which $a(\mathbf{k})\,|{\rm vac}\rangle = 0$ for all $\mathbf{k}$.
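The divergence of the zero-point energy can be made concrete by summing $\tfrac{1}{2}\omega_k$ over the box modes up to a cutoff. In the sketch below the cutoff `n_max` on the mode numbers is a hypothetical regulator introduced only for illustration, and the box length and mass are arbitrary; the point is simply that the sum grows without bound as more modes are included.

```python
import math

def zero_point_energy(n_max, box_length=1.0, mass=0.0):
    """Sum of omega_k / 2 over box modes with |n_x|, |n_y|, |n_z| <= n_max."""
    unit = 2.0 * math.pi / box_length
    total = 0.0
    for nx in range(-n_max, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            for nz in range(-n_max, n_max + 1):
                k_squared = unit * unit * (nx * nx + ny * ny + nz * nz)
                total += 0.5 * math.sqrt(k_squared + mass * mass)
    return total

# Doubling the cutoff each time shows rapid growth of the regulated sum.
energies = {n_max: zero_point_energy(n_max) for n_max in (2, 4, 8)}
for n_max, value in energies.items():
    print(f"n_max = {n_max}:  E_0 = {value:.1f}")
```

For massless modes the regulated sum grows roughly like the fourth power of the cutoff, which is why only energy differences relative to the vacuum are kept.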
To create a particle with momentum-space wavefunction $\phi_a(\mathbf{k})$ or $\phi_b(\mathbf{k})$ we can act on the vacuum with the field operators,

$$\hat\phi_a = \int \frac{d^3k}{(2\pi)^3}\,\phi_a(\mathbf{k})\,a^\dagger(\mathbf{k}), \qquad \hat\phi_b = \int \frac{d^3k}{(2\pi)^3}\,\phi_b(\mathbf{k})\,a^\dagger(\mathbf{k}). \tag{272}$$

Let's interpret what this means. The momentum-space wave function $\phi_a(\mathbf{k})$ tells us "how much" of the state is in a given momentum, that is, its amplitude; $a^\dagger(\mathbf{k})$ creates an excitation in that momentum mode. So, the net effect is that the operator creates a particle which is "smoothly distributed" in momentum space by the momentum space wavefunction $\phi_a(\mathbf{k})$. That is, it gives us a wave-packet. Then, to get the quantum mechanical states, we just act with the field operators on the vacuum,

$$|\phi_a\rangle = \hat\phi_a\,|{\rm vac}\rangle, \qquad |\phi_b\rangle = \hat\phi_b\,|{\rm vac}\rangle.$$

We can also use a field operator to add a particle to a state which already has a particle in it, i.e. acting with $\hat\phi_a$ on $|\phi_b\rangle$ adds a particle with momentum-space wavefunction $\phi_a$ to a state that already has another particle with momentum-space wavefunction $\phi_b$. This is equivalent to acting on the vacuum with both field operators,

$$|\phi_a\phi_b\rangle = \hat\phi_a\,\hat\phi_b\,|{\rm vac}\rangle.$$

Now, the key to this game is the commutation relations. We know $[a^\dagger(\mathbf{k}), a^\dagger(\mathbf{k}')] = 0$, i.e. the creation operators always commute with one another. Looking back at (272), this means that the field operators also commute with one another,

$$[\hat\phi_a, \hat\phi_b] = 0.$$

Coming back to our two-particle state, this means that we can switch the order of the field operators, so

$$|\phi_a\phi_b\rangle = \hat\phi_a\,\hat\phi_b\,|{\rm vac}\rangle = \hat\phi_b\,\hat\phi_a\,|{\rm vac}\rangle = |\phi_b\phi_a\rangle.$$

So, we have shown that

$$|\phi_a\phi_b\rangle = |\phi_b\phi_a\rangle,$$

which means that if we switch the particles, the state is the same. So, we have shown that the particles described by this scalar field theory are bosons! Given this tremendous success, it is now time to turn to fermions.

Quantizing Canonically: Fermion Fields
Luckily, the story for fermions is very similar to the one we've just been through, with one major difference: we'll replace all the commutators with anti-commutators. Just as the commutator is defined as $[A, B] = AB - BA$, we can also define the anti-commutator, $\{A, B\} = AB + BA$. Before getting too far into the details, let's see the effect of this new structure. Let us write the fermion field operators as $\hat\psi_a$ and $\hat\psi_b$, and suppose that they anti-commute, $\{\hat\psi_a, \hat\psi_b\} = 0$, which via the definition (279) can also be written $\hat\psi_b\hat\psi_a = -\hat\psi_a\hat\psi_b$. We'll define the states obtained by acting with the field operators on the vacuum as $|\psi_a\rangle = \hat\psi_a|\text{vac}\rangle$ and $|\psi_b\rangle = \hat\psi_b|\text{vac}\rangle$. We can consider a two particle state, $|\psi_a\psi_b\rangle = \hat\psi_a\hat\psi_b|\text{vac}\rangle$, or, acting with the operators in the opposite order and using the anti-commutator (281), we have the state $|\psi_b\psi_a\rangle = \hat\psi_b\hat\psi_a|\text{vac}\rangle = -\hat\psi_a\hat\psi_b|\text{vac}\rangle = -|\psi_a\psi_b\rangle$.
That is, $|\psi_b\psi_a\rangle = -|\psi_a\psi_b\rangle$, which is to say that switching the particles changes the state by a minus sign, or that the state is antisymmetric with respect to interchange. Regardless of how we phrase it, this unambiguously tells us our particles are fermions! Specifically, if we take $\hat\psi_a = \hat\psi_b$, we have $|\psi_a\psi_a\rangle = -|\psi_a\psi_a\rangle$, which is only satisfied if $|\psi_a\psi_a\rangle = 0$, which means two fermions cannot occupy the same state. This is the Pauli Principle. Now, onto the details: we'll start from the Dirac Lagrangian, which describes a free fermion field, $\mathcal{L} = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi$. Since we've already been through this process, we'll move a little more quickly this time.
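The sign flip under exchange can be seen with explicit matrices. The sketch below represents two anti-commuting creation operators on a four-dimensional space using a Jordan-Wigner style construction; that construction, and the names `psi_a`, `psi_b` and `sign_string`, are illustrative assumptions and not something the notes themselves use.

```python
import numpy as np

# Single-mode creation operator in the basis |0> = (1, 0), |1> = (0, 1).
create = np.array([[0.0, 0.0],
                   [1.0, 0.0]])
sign_string = np.diag([1.0, -1.0])   # supplies the relative minus sign
identity = np.eye(2)

# Two-mode operators built with Kronecker products (hypothetical names).
psi_a = np.kron(create, identity)     # adds a particle in state a
psi_b = np.kron(sign_string, create)  # adds a particle in state b

vac = np.zeros(4)
vac[0] = 1.0                          # both modes empty

state_ab = psi_a @ psi_b @ vac        # |psi_a psi_b>
state_ba = psi_b @ psi_a @ vac        # |psi_b psi_a>
state_aa = psi_a @ psi_a @ vac        # two particles in the same state

if __name__ == "__main__":
    print(psi_a @ psi_b + psi_b @ psi_a)   # anti-commutator: zero matrix
    print(state_ab, state_ba)              # equal up to an overall minus sign
    print(state_aa)                        # zero vector: Pauli principle
```

Swapping the order of the two operators flips the sign of the two-particle state, and acting twice with the same operator gives the zero vector, in line with the argument above.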
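The momentum conjugate to $\psi$ is $\pi = \partial\mathcal{L}/\partial\dot\psi = i\psi^\dagger$. The Hamiltonian is
$$H = \int d^3x\,\bar\psi\left(-i\boldsymbol{\gamma}\cdot\nabla + m\right)\psi = \int d^3x\,\psi^\dagger\left(-i\boldsymbol{\alpha}\cdot\nabla + \beta m\right)\psi,$$
where we've defined $\boldsymbol{\alpha} = \gamma^0\boldsymbol{\gamma}$ and $\beta = \gamma^0$, which you will often see used in the literature. The important point is that any $\psi$ that satisfies the Dirac equation also satisfies the Klein-Gordon equation, as we saw in section 7. This lets us decompose the field $\psi(\mathbf{x}, t)$ into normal modes just as we did before, with a few added complications. We'll first have to introduce the basis spinors $u(\mathbf{p}, s)$ and $v(\mathbf{p}, s)$: when combined with a plane wave, $u(\mathbf{p}, s)\,e^{-ip_\mu x^\mu}$ and $v(\mathbf{p}, s)\,e^{ip_\mu x^\mu}$ are solutions to the Dirac equation with positive and negative energy respectively. The index $s$ accounts for the spin of the particle, which can be up or down. A general spinor field may then be decomposed as
$$\psi(\mathbf{x}, t) = \sum_s\int\frac{d^3p}{(2\pi)^3}\left[b(\mathbf{p}, s)\,u(\mathbf{p}, s)\,e^{-ip_\mu x^\mu} + d^\dagger(\mathbf{p}, s)\,v(\mathbf{p}, s)\,e^{ip_\mu x^\mu}\right],$$
where $b(\mathbf{p}, s)$ and $d^\dagger(\mathbf{p}, s)$ are currently the amplitudes to occupy a given mode, and will shortly be promoted to operators when we quantize the theory. The dagger on $d$ is just a convention we've introduced to make later results work out simply, as is the overall normalization. It's straightforward to find the canonical momentum,
$$\pi(\mathbf{x}, t) = i\sum_s\int\frac{d^3p}{(2\pi)^3}\left[b^\dagger(\mathbf{p}, s)\,u^\dagger(\mathbf{p}, s)\,e^{ip_\mu x^\mu} + d(\mathbf{p}, s)\,v^\dagger(\mathbf{p}, s)\,e^{-ip_\mu x^\mu}\right].$$
Now, to quantize the theory we promote $b$ and $d$ to operators, and impose the anti-commutation relations,
$$\{b(\mathbf{p}, s), b^\dagger(\mathbf{p}', s')\} = \{d(\mathbf{p}, s), d^\dagger(\mathbf{p}', s')\} = (2\pi)^3\,\delta^3(\mathbf{p} - \mathbf{p}')\,\delta_{ss'},$$
with all other anti-commutators vanishing. You're invited to confirm that if these hold, then the canonical anti-commutation relation is satisfied. The Hamiltonian can then be written
$$H = \sum_s\int\frac{d^3p}{(2\pi)^3}\,E_{\mathbf{p}}\left[b^\dagger(\mathbf{p}, s)\,b(\mathbf{p}, s) - d(\mathbf{p}, s)\,d^\dagger(\mathbf{p}, s)\right].$$
To gain some intuition, we'll once again put our system in a finite box so the momentum is quantized into discrete modes, and the Hamiltonian becomes
$$H = \sum_{\mathbf{p}, s} E_{\mathbf{p}}\left(b^\dagger_{\mathbf{p},s}\,b_{\mathbf{p},s} - d_{\mathbf{p},s}\,d^\dagger_{\mathbf{p},s}\right),$$
where we've written the momenta as subscripts to remind us they now take discrete values. For a moment, let's drop the spin label and consider a single mode of wave-vector $\mathbf{k}$. If we let $|0\rangle$ be the state with no particles, we can see that $b^\dagger_{\mathbf{k}}|0\rangle = |1\rangle$ creates a particle in the mode, and $b_{\mathbf{k}}|1\rangle = |0\rangle$ annihilates a particle in the mode.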
To see that the operators represent fermions, we'll use the anti-commutator $\{b^\dagger_{\mathbf{k}}, b^\dagger_{\mathbf{k}}\} = 2\,b^\dagger_{\mathbf{k}}b^\dagger_{\mathbf{k}} = 0$. With this in mind, let's try to add a particle into a mode that's already occupied, $b^\dagger_{\mathbf{k}}|1\rangle = b^\dagger_{\mathbf{k}}b^\dagger_{\mathbf{k}}|0\rangle = 0$. So, we can't put two particles into the same mode, which tells us the particles we are dealing with are fermions. We can also use these relations to show that just as in the bosonic case, the operator $n_{\mathbf{k}} = b^\dagger_{\mathbf{k}}b_{\mathbf{k}}$ counts the number of particles in a mode. However, $n_{\mathbf{k}}$ now only has eigenvalues 0 and 1: $n_{\mathbf{k}}|0\rangle = 0$ and $n_{\mathbf{k}}|1\rangle = |1\rangle$. Now, coming back to our mode decomposition (292), we treated the positive and negative energy states differently. That is, the positive energy states were written with an annihilation operator $b(\mathbf{p}, s)$, whereas the negative energy states were written with a creation operator, $d^\dagger(\mathbf{p}, s)$. There isn't any deep reason for this, it's simply a matter of convention: the anti-commutation relations don't care what we call our operators. However, writing it this way makes things clearer, since having quantized the theory we will now interpret $d^\dagger$ as the creation operator for an anti-particle.
Recalling our discussion of the Dirac sea, we expect all of the negative energy states to be occupied in the vacuum. In the occupation number representation, we can write a general state as $|\ldots, n_{-\alpha}, \ldots, n_{-2}, n_{-1}, n_1, n_2, \ldots, n_\beta, \ldots\rangle$, where the entries with negative indices refer to the negative energy states and those with positive indices to the positive energy states. The vacuum is the state in which every negative energy state is occupied and every positive energy state is empty, which means that $b_{\mathbf{k}}|\text{vac}\rangle = 0$ for all modes. We can explicitly see the existence of the Dirac sea by using the anticommutator $\{d_{\mathbf{p},s}, d^\dagger_{\mathbf{p},s}\} = 1$ (the discrete version of (296)) in the Hamiltonian (299),
$$H = \sum_{\mathbf{p}, s} E_{\mathbf{p}}\left(b^\dagger_{\mathbf{p},s}\,b_{\mathbf{p},s} + d^\dagger_{\mathbf{p},s}\,d_{\mathbf{p},s} - 1\right).$$
When we have no particles in the system, i.e. the vacuum, the first terms vanish and the energy is given by the last term, which we have identified as the negative energy due to the filled Dirac sea. Again, this is an infinite quantity, but since it is constant we can appropriately calibrate our energy scales such that we can drop this term. In the event we do have particles in the system, the $b^\dagger_{\mathbf{p},s}b_{\mathbf{p},s}$ term counts the number of particles in each mode, and the $d^\dagger_{\mathbf{p},s}d_{\mathbf{p},s}$ term counts the number of anti-particles in each mode, both of which are multiplied by the single particle energy. This reflects the non-interacting nature of our system of fermions.
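The single-mode statements in this section can be checked with 2×2 matrices. This is only a sketch: the basis ordering and the variable names are illustrative choices, and the same matrix is used to stand in for either a $b$-type or a $d$-type operator of one mode.

```python
import numpy as np

# Basis ordering (an illustrative choice): |0> = (1, 0), |1> = (0, 1).
b = np.array([[0.0, 1.0],
              [0.0, 0.0]])        # annihilation: b|1> = |0>, b|0> = 0
b_dag = b.T                       # creation: b_dag|0> = |1>
empty = np.array([1.0, 0.0])
occupied = np.array([0.0, 1.0])

number_op = b_dag @ b             # counts the particles in the mode

if __name__ == "__main__":
    print(b @ b_dag + b_dag @ b)              # anti-commutator: identity
    print(b_dag @ occupied)                   # zero: the mode cannot be filled twice
    print(np.linalg.eigvalsh(number_op))      # occupation numbers 0 and 1
    # Reordering used for the d-type term: -d d_dag = d_dag d - 1
    print(np.allclose(-b @ b_dag, b_dag @ b - np.eye(2)))
```

The last line is the operator identity that turns the $-d\,d^\dagger$ term of the Hamiltonian into a number operator plus a constant, one constant per mode.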

Adding Interactions: Perturbation Theory and Feynman Diagrams
So far, we've seen how to solve non-interacting ("free") theories of bosons and fermions, but we haven't said anything about interactions. Unfortunately, once we turn on interactions the theory usually can't be solved. However, all hope is not lost because there is often a reliable way to approximate the answer.
In fact, regardless of whether or not we have a good approximation scheme at our disposal, the structure of the results for physical processes (scattering, decays, etc.) is well understood. In general, we have something like
$$\text{physical result} = (\text{kinematic factor}) \cdot |\mathcal{M}|^2,$$
where the "kinematic factor" is a frame-dependent factor that is usually obtained from Fermi's Golden rule, and is sometimes called the "phase space factor." We also have the all important $\mathcal{M}$, which is a Lorentz-invariant matrix element between the ingoing and outgoing states. For example, let's suppose we're interested in a decay from particle 1 with (four-) momentum $p_1$ into particles $1', 2', \ldots, N'$, with (four-) momenta $p'_1, p'_2, \ldots, p'_N$. The decay rate for this process for a given set of momenta is
$$d\Gamma = \frac{|\mathcal{M}|^2}{2E}\left[\prod_{n=1}^{N}\frac{d^3p'_n}{(2\pi)^3\,2E_n}\right](2\pi)^4\,\delta^4\!\Big(p_1 - \sum_{n=1}^{N}p'_n\Big),$$
where $E$ is the energy of the decaying particle and $E_n = \sqrt{\mathbf{p}_n'^2 + m_n^2}$. We can then find the decay rate by integrating over all of the outgoing momenta,
$$\Gamma = \int d\Gamma. \qquad (308)$$
So, assuming that we can look up the appropriate formula for the kinematic factor, the real question becomes how to calculate the matrix element $\mathcal{M}$, and this is where the approximations come in. One of the most widely used approximation schemes is that of perturbation theory, which we can use if the interaction term is small. For example, in electrodynamics most interactions are proportional to $e^2$, where $e^2/4\pi = \alpha \approx 1/137$ is a small number, making perturbation theory valid in many cases.
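As a small illustration of the kinematics that enters the phase space factor, the sketch below evaluates the on-shell energy $E_n = \sqrt{\mathbf{p}_n^2 + m_n^2}$ and checks energy conservation for a two-body decay in the rest frame of the parent. The closed-form expression for the daughter momentum is standard relativistic kinematics rather than something derived in these notes, and the masses used are arbitrary illustrative numbers in natural units.

```python
import math

def on_shell_energy(momentum, mass):
    """E = sqrt(|p|^2 + m^2) for a particle on its mass shell."""
    return math.sqrt(momentum * momentum + mass * mass)

def two_body_momentum(parent_mass, m1, m2):
    """Magnitude of each daughter's momentum in the parent rest frame."""
    if parent_mass < m1 + m2:
        raise ValueError("decay is kinematically forbidden")
    M2 = parent_mass * parent_mass
    return math.sqrt((M2 - (m1 + m2) ** 2) * (M2 - (m1 - m2) ** 2)) / (2.0 * parent_mass)

if __name__ == "__main__":
    M, m1, m2 = 1.0, 0.3, 0.2                 # illustrative masses
    p = two_body_momentum(M, m1, m2)
    total = on_shell_energy(p, m1) + on_shell_energy(p, m2)
    print(p, total)                            # total energy equals the parent mass
```

In the rest frame the two daughters carry equal and opposite momenta, so momentum conservation is automatic and the delta function only has to fix the magnitude of that momentum.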
Let's first briefly remember some basic facts about perturbation theory in normal quantum mechanics. The words "perturbation theory" probably make you think of something like this second-order shift in the energy of a state $|\psi\rangle$,
$$E^{(2)}_\psi = \sum_{n \neq \psi}\frac{|\langle n|H_I|\psi\rangle|^2}{E_\psi - E_n},$$
where just to clarify notation, $\{|n\rangle\}$ are energy eigenstates of the non-interacting Hamiltonian, $H_0|n\rangle = E_n|n\rangle$, $H_I$ is the interaction Hamiltonian, and the full Hamiltonian is $H = H_0 + H_I$. This term has a nice interpretation. The squared matrix element upstairs can be written $\langle\psi|H_I|n\rangle\langle n|H_I|\psi\rangle$, which can be thought of as a transition from the original state $|\psi\rangle$ into some other state $|n\rangle$, and then back into $|\psi\rangle$, all mediated by the interaction $H_I$. The system is said to be in a virtual state when it is in the intermediate state $|n\rangle$. Note that the energy of the virtual state is different from that of $|\psi\rangle$, so the likelihood of the transition is suppressed by the energy difference, which the virtual state must "borrow" from the vacuum, and then return when it transitions back into $|\psi\rangle$. However, the state can't borrow momentum from the vacuum, so (presuming the theory preserves momentum and that momentum is a good quantum number for the states) the momentum of the initial and virtual states must be the same. This means that when we sum over the possible intermediate states, we should also integrate over momentum. One can use this sort of calculational scheme (sometimes called "time-ordered" perturbation theory) in field theory, but it turns out to be rather tedious and painful.
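The second-order formula from ordinary quantum mechanics can be tested on a toy problem. The sketch below uses a made-up three-level system, with $H_I = g V$ for a small coupling $g$, and compares the perturbative ground-state energy with exact diagonalization; the matrices and the names `E0`, `V` and `coupling` are illustrative assumptions.

```python
import numpy as np

E0 = np.array([0.0, 1.0, 2.5])          # unperturbed energies (illustrative)
H0 = np.diag(E0)
V = np.array([[0.0, 0.3, 0.1],
              [0.3, 0.0, 0.2],
              [0.1, 0.2, 0.0]])         # zero diagonal: no first-order shift

def second_order_shift(state, coupling):
    """Sum over intermediate states n of |<n|H_I|state>|^2 / (E_state - E_n)."""
    shift = 0.0
    for n in range(len(E0)):
        if n != state:
            shift += (coupling * V[n, state]) ** 2 / (E0[state] - E0[n])
    return shift

def exact_ground_energy(coupling):
    """Lowest eigenvalue of the full Hamiltonian H0 + coupling * V."""
    return np.linalg.eigvalsh(H0 + coupling * V)[0]

if __name__ == "__main__":
    for g in (0.05, 0.1, 0.2):
        approx = E0[0] + second_order_shift(0, g)
        print(g, approx, exact_ground_energy(g))
```

The two numbers agree up to terms of third order in the coupling, and the shift of the ground state is negative because every intermediate state lies above it.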
A better approach is to note that we are interested in relativistic theories, so a manifestly covariant formalism is ideal. Such a formalism was developed by Feynman, Schwinger, and Tomonaga, and was worth a Nobel prize for them. The idea is to work in 4-momentum space, and integrate over both momentum and energy. We then require that interactions conserve both energy and momentum, i.e. that the virtual states have the same energy and momentum as the initial state. However, this requires that the virtual states don't always satisfy the usual relativistic dispersion, E 2 = p 2 + m 2 , or p µ p µ = m 2 . Because states which satisfy p µ p µ = m 2 trace out a sphere in 4-momentum space, states for which this does not hold are said to be "off the mass shell," or just "off shell." Doing a calculation in covariant perturbation theory essentially amounts to doing one big Taylor expansion in powers of various small quantities. To organize the terms in this expansion, Feynman realized that the same cast of characters appears in every term, and usually in particular combinations. He then assigned a symbol to each of the usual suspects, which when put together allow us to represent each term in the expansion as a diagram. To work out the details of how and why this works takes a bit of effort, so instead of getting into the nitty-gritty details, we'll just state how Feynman diagrams work.
Roughly speaking, there are two main ingredients in a perturbation theory expansion: propagators and interactions. A propagator is exactly what it sounds like: it represents the free (non-interacting) propagation of a particle. Mathematically, it corresponds to the Green's function of whatever operator appears in the non-interacting part of the Lagrangian. For example, the non-interacting part of the Lagrangian for a scalar field is $\tfrac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}m^2\phi^2 = -\tfrac{1}{2}\phi\,(\partial_\mu\partial^\mu + m^2)\,\phi$, where in the second equality we integrated by parts.$^{17}$ Then, the propagator is the inverse of the operator $\partial_\mu\partial^\mu + m^2$, which our mathematically inclined friends call a Green's function.$^{18}$ Although one can develop perturbation theory in real space, it is far simpler to work in momentum space, so the propagators we'll talk about are actually the inverses of the free part of the Lagrangian in momentum space. For a real scalar field, which represents a spin zero boson, we represent the propagator by a dashed line, with the value
$$\frac{i}{p^2 - m^2 + i\epsilon}.$$
The value of the propagator isn't too hard to calculate, if you use the fact that in momentum space $\partial_\mu \to ip_\mu$. In the denominator, $p^2$ without the vector boldface is shorthand for $p_\mu p^\mu$ and $\epsilon$ is an infinitesimal quantity which among other things encodes causality.
Since the free part of the Lagrangian is different for different particles, the propagators will also be different. As a second example, the propagator for a spin 1/2 fermion is represented by a solid line and has the value (remember that $\not{p} = \gamma^\mu p_\mu$)
$$\frac{i\,(\not{p} + m)}{p^2 - m^2 + i\epsilon}.$$
Next, we'll introduce interactions, which are determined by the rest of the Lagrangian (the non-non-interacting part). In general, an interaction is a vertex in the diagram, where multiple lines (propagators) meet. How many propagators, and for which particles, are allowed in the interaction is determined by the form of the interaction term. This is most easily illustrated by example: suppose we have the interaction$^{19}$ $\mathcal{L}_{\text{int}} \sim g\,\phi\,\bar\psi\psi$. Since we have one scalar field and two fermion fields (one barred, and one not), the interaction is between one scalar and two fermion fields. Diagrammatically, we draw a vertex where two fermion propagators (one incoming, one outgoing) and one scalar propagator meet, and assign the value $g$ to the vertex, i.e. the vertex is $\sim g$.
17 ... some theories where topology plays a central role these boundary terms can matter. Such topological effects cannot be captured by Feynman diagrams.
18 The name comes from the mathematician who invented these functions: there is no deep physical or mathematical significance to the color green.
19 This is usually called a Yukawa interaction.
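As a second example, consider the $\phi^4$ interaction we've already met, $\mathcal{L}_{\text{int}} \sim \lambda\phi^4$. This describes an interaction between four scalar fields, and is represented by a diagram in which four scalar lines meet at a single vertex, with value $\sim \lambda$.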
To calculate a matrix element in perturbation theory, we first decide which parameters we want to expand in, and then to what order we wish to calculate.$^{20}$ For a process that begins with a set A of particles and ends with a set B of particles, we then draw all distinct diagrams that have the set A of initial particles, and the set B of final particles, as allowed by the interactions in the theory and up to the desired order. At each interaction, we impose momentum and energy conservation. Any undetermined momenta on the interior of the diagram (i.e. in virtual states) are to be integrated over in 4-momentum space. In contrast, the propagators corresponding to the initial and final states ("the external legs") are not counted. The matrix element is then the sum of all such diagrams. This is all made much clearer by a few examples. Let's take the $\phi^4$ theory, and compute the scattering amplitude for the process $\phi\phi \to \phi\phi$, where the ingoing particles have four-momenta $p_1$ and $p_2$, and the outgoing particles have four-momenta $p'_1$ and $p'_2$. Since these particles are on shell, we must have $p_1^2 = p_2^2 = (p'_1)^2 = (p'_2)^2 = m^2$. Let's assume $\lambda$ is a small parameter such that perturbation theory is valid. Then, to first order in $\lambda$, there is only one diagram we can draw: the single four-point vertex. Since there are no internal lines and the external legs do not contribute, the matrix element is just the factor of $\lambda$ we get from the interaction. Thus, to first order, $\mathcal{M}$ is momentum independent, and $|\mathcal{M}|^2 = \lambda^2$.
20 It's worth noting that there exist techniques to perform infinite-order expansions by resumming classes of diagrams. This may sound fancy, but it basically comes down to the geometric series you learned about in calculus class. These infinite-order resummations are essential for studying many-body problems, which arise in condensed matter and nuclear physics.
To second order in $\lambda$, we have two diagrams, each containing a loop. The initial and final momenta are on shell, i.e. $p_1^2 = p_2^2 = (p'_1)^2 = (p'_2)^2 = m^2$, but the momentum $k$ inside the diagram is unconstrained, and thus is integrated over. Now, at second order the matrix element is momentum dependent, as we would expect. Finally, note that even at second order in $\lambda$ the theory includes scattering processes that do not conserve particle number, such as the $\phi\phi \to \phi\phi\phi\phi$ process, in which two four-point vertices are joined by a single internal line. This may be surprising, especially considering the classical Lagrangian only included a two-to-two interaction. In principle, these Feynman rules are sufficient to calculate the matrix element for any process within perturbation theory. However, it turns out there is a big problem!

R is for Renormalization
Let's now consider the "two point" function, which includes perturbative corrections to the mass term. To zeroth order in $\lambda$, it's just the bare $m^2$ from the Lagrangian. To first order, we have a diagram in which a single propagator leaves the four-point vertex and loops back into it. This diagram acts just like a mass term. Note that since the momentum $k$ inside the loop is unconstrained we must integrate it over all possible values. Let's try computing this integral. Up to the factor of $\lambda$, we can write it as
$$\int\frac{d^4k}{(2\pi)^4}\,\frac{i}{k^2 - m^2 + i\epsilon} = \int\frac{d^3k}{(2\pi)^3}\int_{-\infty}^{\infty}\frac{dk_0}{2\pi}\,\frac{i}{k_0^2 - \omega_{\mathbf{k}}^2 + i\epsilon}, \qquad \omega_{\mathbf{k}} = \sqrt{\mathbf{k}^2 + m^2}.$$
It's now straightforward to do the $k_0$ integral. Since the contribution from a large arc goes to zero like $1/k_0$ as $k_0 \to \infty$, we can replace our integral over the real line with the contour integral through the lower half complex plane shown in the figure. The $i\epsilon$ factor moves the poles off the contour, and we can easily compute the integral using the Residue theorem,
$$\int_{-\infty}^{\infty}\frac{dk_0}{2\pi}\,\frac{i}{k_0^2 - \omega_{\mathbf{k}}^2 + i\epsilon} = \frac{1}{2\omega_{\mathbf{k}}}.$$
Now we just have to do the integral over the spatial components,
$$\int\frac{d^3k}{(2\pi)^3}\,\frac{1}{2\omega_{\mathbf{k}}} = \frac{1}{4\pi^2}\int_0^\infty\frac{k^2\,dk}{\sqrt{k^2 + m^2}} = \infty.$$
Obviously, infinities are not good! Essentially, they arise due to fluctuations of the large momentum (or equivalently, short wavelength) modes, and are called UV divergences. It turns out the problem is not the formalism of covariant perturbation theory, but rather these infinities are a sickness of the particular quantum field theory we're studying. But does this mean we should throw it away? Luckily, not always. We can salvage our theories using the process of renormalization, in which we note that the values of the parameters we put into the Lagrangian may not be the same as their physical values. For example, the mass in the Lagrangian may not actually be the physical mass of the particle that we measure in the lab. In fact, they can be changed by any (even infinite$^{21}$) amount in each order of perturbation theory due to virtual processes. For example, if we call the mass in the Lagrangian $m_0$, the physical mass can be written schematically as
$$m^2 = m_0^2 + (\text{loop contribution}) + \delta m^2.$$
The second term is from the divergent diagram we just computed, and the last term is a counter-term, which we can choose to cancel the loop diagram when the system is on shell.
However, this doesn't mean we've completely thrown away the impact of the loop diagram: the cancellation only occurs when $k^2 = m^2$, so it can still have an effect when the system is off-shell. To first order in $\lambda$ the propagator gets corrected to
$$\frac{i}{k^2 - m^2 - \Sigma(k^2) + i\epsilon},$$
where the self energy $\Sigma(k^2)$ vanishes when $k^2 = m^2$, such that it only contributes when the particle is off-shell.
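To see how badly the loop diverges, one can cut the spatial momentum integral off at some $\Lambda$ and watch it grow. The sketch below assumes the standard form of the integrand left over after the $k_0$ integral, $\frac{1}{4\pi^2}\int_0^\Lambda k^2\,dk/\sqrt{k^2 + m^2}$, with the coupling and any symmetry factors stripped off; the hard cutoff and the function names are illustrative choices and not a regularization scheme used in the notes.

```python
import math

def loop_integral_numeric(cutoff, mass=1.0, steps=100000):
    """Midpoint-rule estimate of (1/(4 pi^2)) * int_0^cutoff k^2/sqrt(k^2+m^2) dk."""
    dk = cutoff / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * dk
        total += k * k / math.sqrt(k * k + mass * mass)
    return total * dk / (4.0 * math.pi ** 2)

def loop_integral_closed_form(cutoff, mass=1.0):
    """Same integral from the antiderivative (k*sqrt(k^2+m^2) - m^2*asinh(k/m))/2."""
    root = math.sqrt(cutoff ** 2 + mass ** 2)
    value = 0.5 * (cutoff * root - mass ** 2 * math.asinh(cutoff / mass))
    return value / (4.0 * math.pi ** 2)

if __name__ == "__main__":
    for cutoff in (10.0, 100.0, 1000.0):
        print(cutoff, loop_integral_closed_form(cutoff))
```

Each tenfold increase of the cutoff raises the result by roughly a factor of one hundred, i.e. the integral grows like $\Lambda^2$, which is why the counter-term has to absorb an arbitrarily large contribution.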
However, this procedure will not necessarily fix every theory. The theory is renormalizable only if all of its infinities can be absorbed into a finite number of parameters that are already in the theory (or should have been). On the other hand, if we need an infinite number of different parameters (and hence an infinite number of terms needed to be renormalized) the theory is non-renormalizable, since it is not predictive.
So, we would now very much like to know whether a given theory is renormalizable. A litmus test does exist, but the proof that it works is fairly subtle, so we'll just state the result here, but before doing so we need to develop some new vocabulary. Let's recall the fundamental quantity in our quantum field theory is the action, $S = \int d^4x\,\mathcal{L}$. The action has dimensions (the fancy word for units) of energy × time, which are the dimensions of $\hbar$. But, in our choice of units $\hbar = 1$, so the action is dimensionless. One of the nice features of natural units is that the dimension of any quantity can be written as mass to some power. So, instead of having to think about messy things like units, we can characterize the dimension of any quantity by a single number: its "mass dimension," i.e. we say a quantity has dimension $d$ if its units are (mass)$^d$. Lengths and times have dimension $-1$, so the integration measure has $[d^4x] = -4$. This means that for the above expression to make sense, the Lagrangian must have dimension $[\mathcal{L}] = 4$. The Lagrangian is a sum of terms, and we can break each term into two ingredients. First of all, we have an operator which is some combination of fields and their derivatives. For example, things like $\phi^2$, $\partial_\mu\phi\,\partial^\mu\phi$, $\phi\bar\psi\psi$, and $\phi^4(\partial_\mu\phi)^8$ are all operators. The second set of ingredients is the coupling constants, which are numerical coefficients that multiply the operators. For example, the term $m^2\phi^2$ is comprised of the operator $\phi^2$ and coupling constant $m^2$. Or, for $\lambda\phi^4$, the operator is $\phi^4$ and the coupling constant is $\lambda$. We can now state the criterion which helps determine whether a theory is renormalizable, which turns out to be very simple: a theory is (perturbatively) renormalizable if the dimension of all operators in the Lagrangian is less than or equal to the dimension of spacetime.
22 Note that since every term in the Lagrangian must have dimension four (in our world, where the dimension of space-time is four) this condition is equivalent to every coupling constant having a positive (or zero) dimension.
To assess the dimension of a given term, we first need to determine the dimension of the fields. For a free boson the kinetic term is $\tfrac{1}{2}\partial_\mu\phi\,\partial^\mu\phi$. We know that $[\partial_\mu] = 1$ and that every term in the Lagrangian must have $[\mathcal{L}] = 4$, thus $[\phi] = (4 - 2)/2 = 1$. For a fermion, the kinetic term is $i\bar\psi\gamma^\mu\partial_\mu\psi$, from which we see that $[\psi] = (4 - 1)/2 = 3/2$. With this information, we may now find the dimension of any term in the Lagrangian of a theory for boson and fermion fields. It's then easy to see that mass terms such as $m^2\phi^2$ or $m\bar\psi\psi$ are renormalizable, since the operators have dimension 2 and 3 respectively, or equivalently the coupling constant for both terms has positive dimension. Next, let's consider the $\phi^4$ interaction, $\mathcal{L} \sim \lambda\phi^4$. We see the operator $\phi^4$ has dimension 4, and the coupling constant is dimensionless (dimension zero), and thus the term is renormalizable. On the other hand, if we had a four fermion interaction such as $\mathcal{L} \sim G\,\bar\psi\psi\bar\psi\psi$, the dimension of the operator is 6 and the dimension of the coupling constant is $-2$, so we see this interaction is not renormalizable. This result is important: Fermi's theory of weak interactions is of the four-fermion type. Since it has a neutron decaying into a proton, an electron and an anti-neutrino, it contains a term proportional to $\bar\psi_P\psi_N\,\bar\psi_e\psi_\nu$. This implies that the Fermi theory is not renormalizable and hence is not a fundamental theory of nature.
As our last two examples we can consider the Yukawa interaction $\mathcal{L} \sim g\,\phi\,\bar\psi\psi$, which has an operator of dimension 4 and dimensionless coupling constant, and is therefore renormalizable, while a term such as $\mathcal{L} \sim g'\,\phi^2\,\bar\psi\psi$ has an operator with dimension 5 and coupling constant of dimension $-1$, and is not renormalizable.
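The power counting in the last few paragraphs is mechanical enough to automate. The following sketch encodes the field dimensions found above for four spacetime dimensions ($[\phi] = 1$, $[\psi] = 3/2$, $[\partial_\mu] = 1$) and applies the stated criterion; the dictionary keys and function names are made up for illustration.

```python
SPACETIME_DIMENSION = 4
# Mass dimensions in four spacetime dimensions, as derived in the text.
MASS_DIMENSION = {"phi": 1.0, "psi": 1.5, "derivative": 1.0}

def operator_dimension(content):
    """content maps an ingredient name to how many times it appears.

    psi and psi-bar both count as 'psi' since they have the same dimension.
    """
    return sum(MASS_DIMENSION[name] * count for name, count in content.items())

def coupling_dimension(content):
    """Every term must have dimension 4, so the coupling makes up the difference."""
    return SPACETIME_DIMENSION - operator_dimension(content)

def is_renormalizable(content):
    """Power-counting criterion: the coupling has zero or positive dimension."""
    return coupling_dimension(content) >= 0

if __name__ == "__main__":
    examples = {
        "scalar mass term": {"phi": 2},
        "fermion mass term": {"psi": 2},
        "phi^4": {"phi": 4},
        "Yukawa": {"phi": 1, "psi": 2},
        "four-fermion": {"psi": 4},
    }
    for name, content in examples.items():
        print(name, operator_dimension(content),
              coupling_dimension(content), is_renormalizable(content))
```

The four-fermion term is the only one of these examples that fails the test, with a coupling of dimension $-2$.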
Renormalizability is a crucial property of any fundamental theory, and in fact was a guiding principle in constructing the standard model. However, even a non-renormalizable theory can be useful: these are "effective field theories" which are valid at low momenta and low order in perturbation theory. For example, we've seen that a four fermion interaction is non-renormalizable, and that the Yukawa coupling is renormalizable. The diagrams for these processes are a contact vertex where four fermion lines meet, and a vertex where two fermion lines meet one scalar line. However, a theory with a Yukawa coupling has an effective four-fermion interaction mediated by the boson,
$$\mathcal{M} \sim \frac{g^2}{q^2 - m_\phi^2},$$
where $m_\phi$ is the mass of the boson and $q$ is the momentum transfer, i.e. $p'_1 = p_1 + q$ and $p'_2 = p_2 - q$. If we are only interested in physics at low momenta compared to the boson mass such that $q^2 \ll m_\phi^2$, then to very good approximation this looks like a four-fermion interaction, with the effective coupling constant given by
$$G_{\text{eff}} \sim -\frac{g^2}{m_\phi^2},$$
which is the low momentum limit of the initial interaction. One can only see the difference between the two interactions if the system is probed at momentum scales comparable to $m_\phi$, so it serves as a viable effective model at low energies. Effective field theories such as this one have been an invaluable tool in understanding low energy physics, and are useful not only in particle physics; they play a major role in condensed matter as well.
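How good the contact approximation is depends only on the ratio $q^2/m_\phi^2$. The sketch below compares a boson-exchange factor of the form $g^2/(q^2 - m_\phi^2)$ with its $q^2 \to 0$ limit; overall factors of $i$, spinor structure and sign conventions are ignored, and the parameter values are illustrative.

```python
def exchange_amplitude(q2, g=1.0, m_phi=10.0):
    """Boson-exchange factor g^2 / (q^2 - m_phi^2), conventions stripped."""
    return g * g / (q2 - m_phi * m_phi)

def contact_amplitude(g=1.0, m_phi=10.0):
    """Low-momentum limit: a momentum-independent four-fermion coupling."""
    return -g * g / (m_phi * m_phi)

def relative_error(q2, g=1.0, m_phi=10.0):
    """Fractional difference between the exchange and contact descriptions."""
    exact = exchange_amplitude(q2, g, m_phi)
    approx = contact_amplitude(g, m_phi)
    return abs(exact - approx) / abs(approx)

if __name__ == "__main__":
    for q2 in (0.01, 1.0, 25.0, 50.0):
        print(q2, relative_error(q2))
```

For $q^2$ at the percent level of $m_\phi^2$ the two descriptions agree to about a percent, while for $q^2$ approaching $m_\phi^2$ the contact interaction fails completely, which is the sense in which the effective theory is a low-energy description.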
In the context of particle physics this explains why Fermi's theory of weak interactions can capture much of what is happening even though it is not renormalizable and not a fundamental theory of nature. In fact something very much like the process above happens in the standard model: the weak decay is mediated by the virtual exchange of a heavy boson, the W. There are a number of other refinements as well; among them is the fact that the fundamental process happens at the level of quarks, not nucleons.

Symmetries in Nuclear and Particle Physics
The importance of symmetry in physics is hard to overstate. In this section, we will try to show why. First of all, it is worth clarifying what exactly a symmetry is. Simply put, a symmetry is an operation which leaves a system invariant. For example, we can rotate a square by 90° and it will look the same. Since only rotations by a multiple of 90° will leave the square invariant, the symmetry is said to be discrete. On the other hand, we can rotate a circle by any amount and it will always look the same, and its symmetry is said to be continuous. These are both examples of symmetries of objects in real space, but much of our later discussion will consist of more abstract symmetries that operate on "internal" spaces.
We'll start by considering continuous symmetries in classical point-particle physics before applying them to field theory. We'll then introduce the notion of internal symmetries, and finally discuss two discrete symmetries central to particle physics: parity and time reversal. Symmetry will remain a central theme of the following sections as well. In section 11 we'll discuss the idea of a gauge symmetry, in section 12 the consequence of "breaking" a symmetry, and finally in section 13 we will introduce the Higgs mechanism, which is inseparable from the physics of gauge symmetry.

Continuous Spacetime Symmetries
We'll start by considering some of the most intuitive symmetries: those that act on the spacetime in which we live. Specifically, we'll consider continuous symmetries in the familiar context of classical particle mechanics. As you may be aware, continuous symmetries are associated with conservation laws.
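For example, time translation symmetry is associated with energy conservation. To see this, consider a Lagrangian of generalized coordinates $q$ and their velocities $\dot q$, that may be explicitly time dependent, $L(q, \dot q, t)$. From this we may construct the Hamiltonian,
$$H = \sum_i p_i\dot q_i - L, \qquad p_i = \frac{\partial L}{\partial\dot q_i}.$$
If we differentiate the Hamiltonian with respect to time, we have
$$\frac{dH}{dt} = \sum_i\left(\dot p_i\dot q_i + p_i\ddot q_i - \frac{\partial L}{\partial q_i}\dot q_i\right) - \frac{\partial L}{\partial t} - \sum_i\frac{\partial L}{\partial\dot q_i}\ddot q_i.$$
The second and last terms cancel, and when the equations of motion are satisfied, the third term can be written $-\dot p_i\dot q_i$, which cancels the first term, leaving us with
$$\frac{dH}{dt} = -\frac{\partial L}{\partial t}.$$
If the Lagrangian is not explicitly time dependent then $\partial L/\partial t = 0$, and the system is said to be time translation invariant (since the Lagrangian does not change from one moment of time to the next), and we have $dH/dt = 0$, which is to say that the energy of the system is conserved. Thus, time translation invariance leads to energy conservation. We can also see that spatial translation invariance leads to conservation of momentum by considering a system of $N$ particles, with coordinates $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N$, with the Lagrangian $L(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_1, \dot{\mathbf{r}}_2, \ldots, \dot{\mathbf{r}}_N)$. Translational invariance means that we can shift the coordinate of every particle by any constant vector $\mathbf{c}$, and the Lagrangian will be the same, i.e.
$$L(\mathbf{r}_1 + \mathbf{c}, \ldots, \mathbf{r}_N + \mathbf{c}, \dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N) = L(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N).$$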
Physically, this means that nothing changes if we move the whole system by the same amount, which means the potential energy is only a function of the relative displacements of the particles, not their absolute position. This implies that
$$\delta L = \sum_{a}\sum_{i}\frac{\partial L}{\partial r^i_a}\,c^i = 0,$$
where the index $i$ labels the vector components, and the index $a$ tells us which particle we are talking about. When the equations of motion are satisfied, we can write (343) as
$$\sum_i c^i\,\frac{d}{dt}\sum_a\frac{\partial L}{\partial\dot r^i_a} = \sum_i c^i\,\frac{dp^i_{\text{tot}}}{dt} = \mathbf{c}\cdot\frac{d\mathbf{p}_{\text{tot}}}{dt} = 0,$$
where in the second equality we identified the total momentum in the $i$ direction, and in the last equality used the definition of the dot product. Since this must be true for any constant vector $\mathbf{c}$, we must have $d\mathbf{p}_{\text{tot}}/dt = 0$, which is to say that the total momentum is conserved. One can go through a similar process to show that rotational symmetry implies the conservation of angular momentum. All of these spacetime symmetries and their associated conservation laws carry over to classical field theory, as well as to point-particle quantum mechanics and QFT.
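Momentum conservation from translation invariance is easy to see in a simulation. The sketch below evolves two particles on a line that interact through a potential depending only on their separation, $V = \tfrac{1}{2}\kappa(x_1 - x_2)^2$, using the velocity Verlet integrator; the potential, the integrator and all numerical values are illustrative choices and not taken from the notes.

```python
MASSES = (1.0, 3.0)
SPRING = 2.0      # kappa in V = 0.5 * kappa * (x1 - x2)^2

def potential(x):
    """Depends only on the relative displacement, so it is translation invariant."""
    return 0.5 * SPRING * (x[0] - x[1]) ** 2

def forces(x):
    """Equal and opposite forces, F_a = -dV/dx_a."""
    f = -SPRING * (x[0] - x[1])
    return [f, -f]

def total_momentum(v):
    return sum(m * vi for m, vi in zip(MASSES, v))

def simulate(x, v, steps=1000, dt=0.01):
    """Velocity Verlet time stepping; returns final positions and velocities."""
    x, v = list(x), list(v)
    f = forces(x)
    for _ in range(steps):
        v = [v[i] + 0.5 * dt * f[i] / MASSES[i] for i in range(2)]
        x = [x[i] + dt * v[i] for i in range(2)]
        f = forces(x)
        v = [v[i] + 0.5 * dt * f[i] / MASSES[i] for i in range(2)]
    return x, v

if __name__ == "__main__":
    x0, v0 = [0.0, 1.5], [0.4, -0.1]
    x1, v1 = simulate(x0, v0)
    print(total_momentum(v0), total_momentum(v1))   # equal up to rounding
```

Because the forces on the two particles are equal and opposite at every step, the total momentum is unchanged up to floating-point rounding even though each particle's own momentum oscillates.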

Internal Symmetries and Noether's Theorem
Next, we will discuss the slightly more abstract internal symmetries a theory may possess. Roughly speaking, these are symmetries of the fields themselves, with no reference to the underlying spacetime. As we will now show, continuous internal symmetries also correspond to conservation laws. The relationship between continuous symmetries (spacetime or internal) and conservation laws is formalized by Noether's theorem, which states that for every continuous symmetry of a classical field theory there is an associated conserved current. Recall that a current $J^\mu$ is conserved if $\partial_\mu J^\mu = 0$, which is the relativistic notation for the more familiar continuity equation $\dot\rho + \nabla\cdot\mathbf{J} = 0$. Noether's theorem turns out to be a powerful tool, and is arguably one of the most important results in modern physics, so we will take the time to derive it, at least for the case of internal symmetries. For the sake of generality, suppose we have a set of $N$ independent fields $\phi_1, \phi_2, \ldots, \phi_N$, governed by the Lagrangian $\mathcal{L}(\phi_n, \partial_\mu\phi_n)$. Further, let us suppose that the Lagrangian is invariant under some infinitesimal transformation of the fields,
$$\phi_n \to \phi_n + \epsilon\sum_m c_{nm}\phi_m,$$
where $\epsilon$ is an infinitesimal (constant) parameter. By summing over all the fields $\phi_m$ with weights $c_{nm}$ this transformation may mix up our definition of which field is which, i.e. it may take $\phi_1 \to \phi_1 + 7\epsilon\,\phi_2 - 42\epsilon\,\phi_9$. Alternatively, if $c_{nm} \propto \delta_{nm}$ it could simply change each field individually, i.e. $\phi_n \to \phi_n + 32\,\epsilon\,e^{3\pi i/2}\phi_n$. In any case, the change in the field under this transformation is clearly
$$\delta\phi_n = \epsilon\sum_m c_{nm}\phi_m.$$
If the Lagrangian is invariant under this transformation, then the change in the Lagrangian as a result of the transformation is zero. If we call the original Lagrangian $\mathcal{L}_0$ and the transformed Lagrangian $\mathcal{L}_T$ this means that $\delta\mathcal{L} \equiv \mathcal{L}_T - \mathcal{L}_0 = 0$. Since the transformation is infinitesimal, we can use the same tricks we used in deriving the Euler-Lagrange equations in the previous section.
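That is, we can write the change in the Lagrangian as
$$\delta\mathcal{L} = \sum_n\left[\frac{\partial\mathcal{L}}{\partial\phi_n}\,\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,\delta(\partial_\mu\phi_n)\right].$$
By linearity, $\delta(\partial_\mu\phi_n) = \partial_\mu(\delta\phi_n)$, and when the equations of motion are satisfied, we may rewrite the first term using
$$\frac{\partial\mathcal{L}}{\partial\phi_n} = \partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)},$$
so the change in the Lagrangian becomes
$$\delta\mathcal{L} = \sum_n\left[\left(\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\right)\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,\partial_\mu(\delta\phi_n)\right].$$
However, as you can check using the product rule, this can also be written as a total derivative,
$$\delta\mathcal{L} = \partial_\mu\left[\sum_n\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,\delta\phi_n\right].$$
We can now insert our expression for the change in the fields (349), so we have
$$\delta\mathcal{L} = \epsilon\,\partial_\mu\left[\sum_{n,m}\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,c_{nm}\phi_m\right].$$
If the transformation is a symmetry then $\delta\mathcal{L} = 0$ and we have
$$\partial_\mu\left[\sum_{n,m}\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,c_{nm}\phi_m\right] = 0,$$
which is a conservation law! If we define the current to be
$$J^\mu = \sum_{n,m}\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,c_{nm}\phi_m,$$
then (355) says this current is conserved, $\partial_\mu J^\mu = 0$. This is Noether's theorem. Notice that not only does Noether's theorem say that a conserved current exists, but it also tells us precisely what the conserved current is! You may be skeptical about this proof since we only considered an infinitesimal transformation. In fact, this is where the continuous nature of the symmetry comes in. The wonderful thing about a continuous transformation is that you can compose a finite transformation by performing an infinitesimal transformation many times. For example, to rotate something by a finite angle $\theta$ we can rotate it by an infinitesimal angle $\epsilon$ many times in succession. Thus, if the theory is invariant under an infinitesimal transformation, it is also invariant under any finite transformation obtained by composing many infinitesimal ones. To see how this works in practice, let's go through a few examples. First, let's consider a complex scalar $\phi$ obeying the Klein-Gordon Lagrangian, $\mathcal{L} = \partial_\mu\phi^*\partial^\mu\phi - m^2\phi^*\phi$. It is invariant under a simultaneous change of the phase of both fields, $\phi \to e^{-i\alpha/2}\phi$ and $\phi^* \to e^{i\alpha/2}\phi^*$. This kind of phase transformation is called a $U(1)$ transformation. This is because mathematically, symmetries are described by groups, and the group of $1\times 1$ unitary matrices is called $U(1)$. But, a $1\times 1$ unitary matrix is simply a complex number of unit modulus, i.e. a phase factor of the form $e^{i\theta}$. So, we have found a symmetry of the theory.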
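The remark that a finite transformation can be built from many infinitesimal ones can be checked directly for a $U(1)$ phase. In the sketch below the infinitesimal step is taken to be $1 + i\theta/N$, applied $N$ times; the function name and the step counts are illustrative.

```python
import cmath

def compose_infinitesimal(theta, n_steps):
    """Apply the first-order phase rotation (1 + i*theta/n_steps) n_steps times."""
    step = 1.0 + 1j * theta / n_steps
    result = 1.0 + 0.0j
    for _ in range(n_steps):
        result *= step
    return result

if __name__ == "__main__":
    theta = 1.0
    target = cmath.exp(1j * theta)          # the finite transformation e^{i theta}
    for n_steps in (10, 1000, 100000):
        print(n_steps, abs(compose_infinitesimal(theta, n_steps) - target))
```

The mismatch with $e^{i\theta}$ shrinks roughly in proportion to $1/N$, so in the limit of infinitely many infinitesimal steps the composition reproduces the finite phase rotation.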
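To find its associated conserved current, we first write the infinitesimal form of this transformation. If $\alpha$ is small, then to first order $e^{-i\alpha/2} \approx 1 - i\alpha/2 + \ldots$ and the infinitesimal transformation is
$$\phi \to \phi - \tfrac{i\alpha}{2}\,\phi, \qquad \phi^* \to \phi^* + \tfrac{i\alpha}{2}\,\phi^*,$$
from which we can identify $\epsilon = \alpha/2$, $\delta\phi = -i\epsilon\,\phi$ and $\delta\phi^* = i\epsilon\,\phi^*$. From (353) we know the conserved current is
$$J^\mu = \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,(-i\phi) + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi^*)}\,(i\phi^*).$$
Note that we treat $\phi$ and $\phi^*$ as independent fields. This is simply because a complex field has two degrees of freedom, and we may parameterize them however we so choose. We can do this by taking the real and imaginary parts as our independent fields, but we can just as well use $\phi$ and $\phi^*$, which is more convenient in this context. The necessary derivatives are
$$\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} = \partial^\mu\phi^*, \qquad \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi^*)} = \partial^\mu\phi.$$
Putting (362) and (360) into (361) we find the conserved current to be
$$J^\mu = i\left(\phi^*\partial^\mu\phi - \phi\,\partial^\mu\phi^*\right),$$
which you may remember is the same conserved current we discussed in section 7, although now we have derived it from symmetry principles. Next, consider the Dirac Lagrangian, $\mathcal{L} = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi$. It also is invariant under a $U(1)$ transformation, $\psi \to e^{-i\alpha/2}\psi$ and $\bar\psi \to e^{i\alpha/2}\bar\psi$. Just as before, the infinitesimal version of this transformation is $\psi \to \psi - \tfrac{i\alpha}{2}\psi$ and $\bar\psi \to \bar\psi + \tfrac{i\alpha}{2}\bar\psi$. We need the derivatives
$$\frac{\partial\mathcal{L}}{\partial(\partial_\mu\psi)} = i\bar\psi\gamma^\mu, \qquad \frac{\partial\mathcal{L}}{\partial(\partial_\mu\bar\psi)} = 0.$$
Putting these together, the conserved current is
$$J^\mu = \bar\psi\gamma^\mu\psi,$$
which is the same current we encountered in our previous discussion of the Dirac equation.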
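Current conservation can be checked numerically for an explicit solution. The sketch below works in one space dimension, builds a complex field from two plane waves, and evaluates $\partial_t J^0 + \partial_x J^1$ by finite differences. It assumes the current has the standard form $J^\mu = i(\phi^*\partial^\mu\phi - \phi\,\partial^\mu\phi^*)$ with metric signature $(+,-)$; the amplitudes, wave numbers and the `on_shell` switch are illustrative choices.

```python
import cmath
import math

MASS = 1.0
MODES = [(0.7, 1.0 + 0.5j), (-1.3, 0.4 - 0.2j)]   # (wave number k, amplitude)

def field_and_gradients(t, x, on_shell=True):
    """phi and its t and x derivatives for a sum of plane waves."""
    phi = d_t = d_x = 0j
    for k, amplitude in MODES:
        # On shell: omega^2 = k^2 + m^2. Off shell: a deliberately wrong omega.
        omega = math.sqrt(k * k + MASS * MASS) if on_shell else 1.0
        wave = amplitude * cmath.exp(-1j * (omega * t - k * x))
        phi += wave
        d_t += -1j * omega * wave
        d_x += 1j * k * wave
    return phi, d_t, d_x

def current(t, x, on_shell=True):
    """Return (J^0, J^1); the spatial component picks up a sign from the metric."""
    phi, d_t, d_x = field_and_gradients(t, x, on_shell)
    j0 = 1j * (phi.conjugate() * d_t - phi * d_t.conjugate())
    j1 = -1j * (phi.conjugate() * d_x - phi * d_x.conjugate())
    return j0.real, j1.real

def divergence(t, x, on_shell=True, h=1e-4):
    """Central-difference estimate of d_t J^0 + d_x J^1."""
    dj0 = (current(t + h, x, on_shell)[0] - current(t - h, x, on_shell)[0]) / (2 * h)
    dj1 = (current(t, x + h, on_shell)[1] - current(t, x - h, on_shell)[1]) / (2 * h)
    return dj0 + dj1

if __name__ == "__main__":
    print(divergence(0.3, -0.8, on_shell=True))    # vanishes up to discretization error
    print(divergence(0.3, -0.8, on_shell=False))   # does not vanish
```

The contrast between the two cases reflects the step in the derivation where the equations of motion were used: the current is only conserved for fields that actually solve them.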
Moving on to some more complicated theories, you can repeat the calculation and convince yourself that in the φ 4 theory, we will have the same conserved current as in the free Klein-Gordon theory, due to the U (1) symmetry (360). We can also have theories with more than one conserved current. For example, consider a theory including a fermion field ψ, a complex scalar field σ, and a real scalar field φ with the Lagrangian Both the fermion and complex scalar fields are invariant under separate U (1) transformations, Nothing changes the preceding arguments, and we will find two independent conserved currents, for which ∂ µ J µ ψ = ∂ µ J µ σ = 0. This is called a U (1) × U (1) symmetry, since we essentially have two copies of U (1). It turns out that there are more interesting and complex continuous symmetries, but we will hold off discussing them until the next section. For now we will turn to discrete symmetries.
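As a quick numerical sanity check of the statement $\partial_\mu J^\mu = 0$, here is a minimal sketch in 1+1 dimensions. It assumes a field built from two plane waves that each satisfy the Klein-Gordon equation; the mass, amplitudes, and wavenumbers are hypothetical illustration values, and the overall normalization of the current is dropped since it does not affect conservation.

```python
import cmath
import math

MASS = 1.0  # hypothetical mass, in units where hbar = c = 1
# Hypothetical (amplitude, wavenumber) pairs; each wave is on shell.
WAVES = [(0.8, 0.5), (0.3 + 0.4j, -1.2)]


def energy(k):
    """Relativistic dispersion relation E = sqrt(k^2 + m^2)."""
    return math.sqrt(k * k + MASS * MASS)


def phi(t, x):
    """Superposition of plane waves exp(-i(E t - k x))."""
    return sum(a * cmath.exp(-1j * (energy(k) * t - k * x)) for a, k in WAVES)


def dphi_dt(t, x):
    return sum(-1j * energy(k) * a * cmath.exp(-1j * (energy(k) * t - k * x))
               for a, k in WAVES)


def dphi_dx(t, x):
    return sum(1j * k * a * cmath.exp(-1j * (energy(k) * t - k * x))
               for a, k in WAVES)


def current(t, x):
    """Return (J^0, J^1) for J^mu = i(phi* d^mu phi - phi d^mu phi*).

    With metric signature (+, -), d^1 = -d/dx, hence the sign of J^1.
    """
    f, ft, fx = phi(t, x), dphi_dt(t, x), dphi_dx(t, x)
    j0 = (1j * (f.conjugate() * ft - f * ft.conjugate())).real
    j1 = (-1j * (f.conjugate() * fx - f * fx.conjugate())).real
    return j0, j1


def divergence(t, x, h=1e-4):
    """Central-difference estimate of d_t J^0 + d_x J^1."""
    dj0_dt = (current(t + h, x)[0] - current(t - h, x)[0]) / (2 * h)
    dj1_dx = (current(t, x + h)[1] - current(t, x - h)[1]) / (2 * h)
    return dj0_dt + dj1_dx


if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.3, -0.7), (2.1, 1.4)]:
        print(point, divergence(*point))
```

The printed divergences should vanish up to finite-difference error, even though the charge density $J^0$ by itself varies in time because of the interference between the two waves.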

Discrete Symmetries: Parity
Among the most important discrete symmetries are parity, time reversal, and charge conjugation. Although we will not discuss the last of these in these notes, there is an important theorem, the CPT theorem, which states that any sensible quantum field theory must be invariant under simultaneous parity, time reversal, and charge conjugation transformations.
Unlike continuous symmetries, there are no conserved currents associated with discrete symmetries. However, there are still conservation laws. That is, if the interactions in a theory preserve parity (as do the electromagnetic and strong interactions) then the initial and final states of the interaction must have the same parity. Now, to discuss parity specifically we will begin by defining it. A parity transformation is a spatial inversion, which transforms
$$P : (t, \mathbf{x}) \to (t, -\mathbf{x}).$$
You can convince yourself that both the Klein-Gordon and Dirac Lagrangians are invariant under this transformation. Further, if the Lagrangian is invariant under parity, the Hamiltonian will be invariant as well. Formally, the statement that the Hamiltonian is invariant under parity is that $[H, P] = 0$, where $P$ is the parity operator. Then, we know from linear algebra that if two operators commute they can be simultaneously diagonalized, and hence share a common set of eigenstates. Note that if we invert space with a parity transformation and then invert again, we end up where we started, i.e. acting with parity twice gives the identity operator, $P^2 = 1$.
Now, suppose we have a state $|\psi\rangle$ which is an eigenstate of the parity operator,
$$P|\psi\rangle = \lambda|\psi\rangle.$$
Acting with $P$ again we have
$$PP|\psi\rangle = \lambda^2|\psi\rangle = |\psi\rangle,$$
which means that $\lambda^2 = 1$, so $P$ has eigenvalues $\lambda = \pm 1$. A parity eigenstate with eigenvalue $+1$ is said to be even under parity, while a state with a $-1$ eigenvalue is said to be odd. We know that the Hamiltonian generates the state's time evolution, and thus if $[H, P] = 0$ (i.e. the dynamics preserve parity) a parity eigenstate will always time evolve into a parity eigenstate with the same eigenvalue.

Perhaps surprisingly, a particle, even if it is just sitting at rest, may have an intrinsic parity. It may seem strange that a particle can be odd under parity, but just think back to the hydrogen atom. You may remember that the parity of a hydrogen atom in a state with orbital angular momentum $\ell$ is
$$P = (-1)^\ell.$$
Now, suppose we have a hydrogen atom in a p-wave [23] state (i.e. $\ell = 1$), and further that the spins of the electron and proton add to 1. If the spin and orbital angular momenta add to a singlet configuration (where the total angular momentum $J = 0$), we have a particle with no net angular momentum, but the parity is still $P = -1$. If we now look at the hydrogen atom from far away (or after a few glasses of your alcoholic beverage of choice) and forget that it is a composite system made of smaller individual particles, it looks like you have one big particle with an intrinsic negative parity (since it appears stationary).

As we will now discuss in some detail, parity conservation imposes strong constraints on allowed processes in theories with parity-conserving interactions. Two notable interactions which preserve parity are the electromagnetic and strong forces. In what follows, we'll consider several examples of how parity restricts the set of allowed decays mediated by the strong interaction. To help us, we've included a table of some particles and their properties.

Particle               Spin   Parity   Type
proton (p)             1/2    +        baryon
neutron (n)            1/2    +        baryon
Δ (Δ⁺⁺, Δ⁺, Δ⁰, Δ⁻)     3/2    +        baryon
N(1440) (+, 0)         1/2    +        baryon
N(1520) (+, 0)         3/2    −        baryon
N(1535) (+, 0)         1/2    −        baryon

The superscript on a particle indicates its electric charge, and the distinction between baryons and mesons is based on their quark content (a baryon is made of three quarks and a meson is made of a quark and an anti-quark). The important point for now is that a meson is a boson while a baryon is a fermion. Now, let's consider some possible decay processes:

$\rho^+ \to \pi^+ \pi^0$: We have three conserved quantities to worry about: charge, angular momentum, and parity. The first is easy: the initial state has $Q = 1$, and the final state has $Q = 1$, so charge is conserved. For angular momentum, the initial state has $j = 1$ from the spin of the $\rho^+$. Since the pions are spinless, in order to get $j = 1$ in the final state, the two must be in a p-wave orbital state with $\ell = 1$ and hence $j = 1$. Finally, the initial state has parity $-1$. The final state has several contributions to its parity. Since the pions are intrinsically odd under parity, we get one factor of $-1$ for each. But they are also in a p-wave state with parity $(-1)^\ell = -1$, and the net parity is given by the product:
$$P_{\text{final}} = (-1)(-1)(-1)^{1} = -1.$$
Thus, the final state is odd under parity, and consequently parity is conserved. Altogether, this means that such a decay is allowed, provided it is not forbidden by some other consideration. Also note that the decay is only possible if the resultant pions are in a p-wave orbital state.
$a_1^- \to \pi^- \pi^0$: We can immediately see charge is conserved by looking at the superscripts. The initial state has $j = 1$, so just as in the previous example, the pions must be in a p-wave orbital with $\ell = 1$ for angular momentum to be conserved. The initial state has parity $+1$, but the parity of the final state is $(-1)^2$ from the intrinsic parity of the two pions, times the $-1$ parity of the p-wave orbital, giving a net parity of $-1$. The parities of the initial and final states are not the same, and thus this decay is forbidden by parity conservation! It cannot happen!

$\sigma \to \pi^+ \pi^-$: Once again, charge is manifestly conserved. The initial state is spinless so $j = 0$, which requires the pions in the final state to be in an s-wave orbital such that $j = 0$ there as well. The parity of the initial state is $+1$, and the parity of the final state is $(-1)^2$ from the intrinsic parity of the pions times $(-1)^0 = 1$ from the s-wave orbital. Thus, the final state has parity $+1$, and parity is conserved. So, the decay is allowed.
$\Delta^{++} \to p\,\pi^+$: The initial and final states both have $Q = 2$, so electric charge is conserved. The initial state has $j = 3/2$, so for angular momentum to be conserved we need $j = 3/2$ in the final state as well. Since the pion is spinless and the proton has spin 1/2, there are two ways we can get a total $j = 3/2$: we can either have the proton and pion in a p-wave ($\ell = 1$) or d-wave ($\ell = 2$) orbital. [24] Finally, let's turn to parity. The parity of the initial state is $+1$, and if the final state is in a p-wave configuration we'll have a final parity of $+1$ from the intrinsic parity of the proton, times $-1$ from the intrinsic parity of the pion, times $(-1)^1 = -1$ from the orbital angular momentum, for a net parity of $+1$. Thus, for a p-wave final state, parity is conserved and the decay is allowed.

On the other hand, if the final state is d-wave the final parity is $+1$ from the proton times $-1$ from the pion times $(-1)^2 = 1$ from the orbital angular momentum, for an end result of $-1$ parity. So, if the final state is d-wave, parity is not conserved and the decay is forbidden. In conclusion, this decay is only possible for a p-wave configuration.
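The parity bookkeeping in the examples above is mechanical enough to automate. The following is a minimal sketch, assuming two-body final states whose parity is the product of the intrinsic parities and $(-1)^\ell$; the function names and the decay list are illustrative, with the parities taken from the text.

```python
def final_state_parity(intrinsic_parities, ell):
    """Product of the daughters' intrinsic parities times (-1)**ell."""
    parity = (-1) ** ell
    for p in intrinsic_parities:
        parity *= p
    return parity


def parity_allowed(initial_parity, intrinsic_parities, ell):
    """True if the initial and final states have the same parity."""
    return initial_parity == final_state_parity(intrinsic_parities, ell)


# (label, parity of parent, intrinsic parities of daughters, orbital ell)
DECAYS = [
    ("rho+ -> pi+ pi0 (p-wave)", -1, (-1, -1), 1),
    ("a1- -> pi- pi0 (p-wave)", +1, (-1, -1), 1),
    ("sigma -> pi+ pi- (s-wave)", +1, (-1, -1), 0),
    ("Delta++ -> p pi+ (p-wave)", +1, (+1, -1), 1),
    ("Delta++ -> p pi+ (d-wave)", +1, (+1, -1), 2),
]


def check_all():
    """Map each decay label to whether parity conservation permits it."""
    return {label: parity_allowed(p_in, p_out, ell)
            for label, p_in, p_out, ell in DECAYS}


if __name__ == "__main__":
    for label, ok in check_all().items():
        verdict = "allowed" if ok else "forbidden"
        print(f"{label:28s} {verdict} by parity")
```

Note that this only tests parity. Charge and angular momentum conservation, which fix the value of $\ell$ passed in, still have to be checked separately as in the text.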

Discrete Symmetries: Time Reversal
The other important discrete symmetry that we will discuss in these notes is time reversal. Just as parity flips the coordinates of space, time reversal flips the direction of motion. It is the transformation
$$T : \quad t \to -t, \qquad \mathbf{x} \to \mathbf{x}, \qquad \mathbf{p} \to -\mathbf{p}, \qquad \mathbf{J} \to -\mathbf{J}. \tag{380}$$
If we want to consider the behavior of more general operators under time reversal, things get somewhat complicated. This is because time reversal is unlike most of the quantum mechanical operators you've met so far in that it is anti-unitary. Things are further complicated by the fact that there are multiple different things that people call time reversal operators, and different conventions for everything as well. So, we'll eschew all this complicated business by just focusing on a simple, but very physically relevant, case. Namely, let's suppose we have a state for which the angular momentum is a good quantum number. In addition to the total angular momentum $j$ we also usually work with its projection onto the z-axis, which we call $m$. From the rules above, time reversal maps $T : m \to -m$.
Once again, if we have a time reversal invariant Lagrangian, the Hamiltonian for the system will also respect time reversal. If we consider an energy eigenstate $|\psi; j, m\rangle$, where $\psi(x)$ is the position space wave function, it should then transform under time reversal as
$$T|\psi; j, m\rangle = e^{i\alpha}|\psi; j, -m\rangle, \tag{381}$$
where we've allowed for an extra phase factor, which will depend on the conventions one uses. Just like parity, time reversal invariance constrains the allowed processes within a theory. Here, we will focus on the electric dipole moment of particles, and will show that any particle which is an energy eigenstate of a time reversal invariant Hamiltonian must have an electric dipole moment of zero.
First off, recall that the electric dipole moment $\mathbf{d}$ is something like a charge density integrated against a displacement vector, or, for a discrete bunch of charges,
$$\mathbf{d} = \sum_i q_i\,\mathbf{x}_i. \tag{382}$$
Obviously, the charge shouldn't transform under time reversal, and by our definition (380) the displacement does not transform either. So, under time reversal $\mathbf{d} \to \mathbf{d}$. Let's now consider the expectation value of the electric dipole moment in an energy eigenstate, $\langle\psi; j, m|\mathbf{d}|\psi; j, m\rangle$.
There is a powerful result in quantum mechanics called the Wigner-Eckart theorem, which tells us that the expectation value of the electric dipole moment must be proportional to the expectation value of the angular momentum, i.e.
$$\langle\psi; j, m|\mathbf{d}|\psi; j, m\rangle = h\,\langle\psi; j, m|\mathbf{J}|\psi; j, m\rangle, \tag{383}$$
where $h$ is an angular momentum independent constant of proportionality. A good heuristic justification for this is to ask what else could it be? The expectation value of $\mathbf{d}$ is a vector, and the only vector that describes the state is $\mathbf{J}$, so we must have $\langle\mathbf{d}\rangle \propto \langle\mathbf{J}\rangle$, simply because there is no other vector operator in the game. Taking the z component of (383) we have
$$\langle\psi; j, m|d_z|\psi; j, m\rangle = h\,\langle\psi; j, m|J_z|\psi; j, m\rangle = hm, \tag{384}$$
where in the last equality we used $J_z|\psi; j, m\rangle = m|\psi; j, m\rangle$. Now, let's ask what happens under time reversal. The left-hand side should be invariant by our previous considerations of the dipole operator, and by (381) the right-hand side transforms as
$$h\,\langle\psi; j, m|J_z|\psi; j, m\rangle \to h\,\langle\psi; j, -m|e^{-i\alpha} J_z\, e^{i\alpha}|\psi; j, -m\rangle = -hm. \tag{385}$$
Equating (384) and (385), which should hold for a time reversal invariant system, we have hm = −hm which implies that h = 0, and thus d = 0, which is to say that the electric dipole matrix element vanishes for any energy eigenstate of a time reversal invariant theory.

Time Reversal Violation in the Standard Model
We know that there is violation of time reversal symmetry (TRS) in the standard model, and we also know that there must be more of it than is presently understood. For example, without extra TRS breaking, cosmology can't explain why there is more matter than anti-matter in the universe. One likely way to detect TRS breaking is to search for small electric dipole moments of neutral particles, since by our discussion above such a moment violates time reversal. Why neutral particles? Simply because it makes the experiments easier: to measure an electric dipole moment one puts the particle in an electric field and sees how the resonance properties are affected, and this is hard to do with charged particles, which tend to accelerate in electric fields!

Isospin
Now, let's turn back to continuous symmetries. Before our discussion of discrete symmetries, we mentioned that there are more complicated continuous symmetries than just U (1), or multiple copies of it. These are called non-Abelian symmetries for reasons that will be explained in the next section. For now, we'll discuss the simplest non-Abelian symmetry group, SU (2). This is the same group associated with the spin of a spin 1/2 particle, so much of the structure will be familiar but the physical context will be different.
The particular physical symmetry we will discuss is isotopic spin, or isospin, which is an approximate symmetry of the strong interaction. Roughly speaking, we can think of an exact symmetry as implying that some parameter in the theory is zero (for example, the divergence of a conserved current), but there are many interesting cases where the parameter is not zero, but is still very small compared to other scales in the theory. If this is the case, we say we have an approximate symmetry. Operationally, one typically first treats the symmetry as exact, and then adds in the symmetry breaking perturbatively afterward.
The original idea of isospin is due to Heisenberg, who noted that the masses of the proton and neutron are nearly equal: $m_p \approx 938.3$ MeV and $m_n \approx 939.6$ MeV. The binding energy of $^3$He (two protons and one neutron) is 7.72 MeV, which is very close to that of $^3$H (one proton and two neutrons), which is 8.48 MeV. In fact, this is generally true for any small nucleus, where the Coulomb force is unimportant, as we discussed in section 2. Recall that the proton-neutron asymmetry term goes like $\sim (Z - (A - Z))^2$ for a nucleus with $Z$ protons and $A - Z$ neutrons. The point is that switching the protons and neutrons doesn't change the binding energy very much.
Altogether, this could lead one to think that protons and neutrons are kind of the same thing. Specifically, we can think of them as two different states of a single underlying entity, just like up and down spin electrons are two different states of the same fundamental particle. So, roughly speaking we can think of a proton being the "isospin up" state and the neutron as the "isospin down" state. We can combine them into a two-component nucleon, which is identical to how we usually describe the spin state of a spin 1/2 particle,
$$N = \begin{pmatrix} p \\ n \end{pmatrix}.$$
The considerations above suggest that physics should be approximately symmetric under switching protons and neutrons. The idea of isospin is to further suppose that the strong interaction is invariant under rotating the proton and neutron into one another. By this we mean that acting on the nucleon vector with a $2\times2$ special ($\det U = 1$) unitary ($U^\dagger U = 1$) matrix is a symmetry of the theory. In other words, we can transform
$$N \to N' = UN$$
and the physics is the same. Essentially, this operation mixes up our definition of what is a proton versus what is a neutron. Mathematically, it is completely analogous to rotating the quantization axis of a spin system. In both cases, the symmetry is a manifestation of the arbitrary convention we pick to distinguish the two different states. As a matter of terminology, the set of $2\times2$ special unitary matrices is called $SU(2)$.

We can take the analogy with spin further, since the math is identical. Just like we say a spin 1/2 particle transforms in the $s = 1/2$ or doublet representation, we can also classify nuclear states into isospin multiplets. [25] Since the nucleon transforms just like a spin, it is clearly an isospin doublet with $I = 1/2$, where $I$ is the analogue of the spin magnitude $s$ that tells us which representation of the group the particles transform under.
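To make the "special unitary" conditions concrete, here is a small sketch that builds an $SU(2)$ matrix and checks $\det U = 1$ and $U^\dagger U = 1$ numerically. It assumes the standard closed form $U = \cos(\theta/2) + i\sin(\theta/2)\,\hat{n}\cdot\boldsymbol{\tau}$ in terms of the Pauli matrices, with a sign convention chosen here for illustration; the angle and axis values are arbitrary.

```python
import math


def su2(theta, axis):
    """Return U = cos(theta/2) 1 + i sin(theta/2) n.tau as a 2x2 nested list.

    `axis` is any non-zero 3-vector; it is normalized internally.
    The sign convention of the exponent is an assumption of this sketch.
    """
    nx, ny, nz = axis
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c + 1j * s * nz, 1j * s * (nx - 1j * ny)],
            [1j * s * (nx + 1j * ny), c - 1j * s * nz]]


def det2(u):
    """Determinant of a 2x2 matrix."""
    return u[0][0] * u[1][1] - u[0][1] * u[1][0]


def unitarity_error(u):
    """Largest deviation of U^dagger U from the identity matrix."""
    err = 0.0
    for i in range(2):
        for j in range(2):
            entry = sum(u[k][i].conjugate() * u[k][j] for k in range(2))
            err = max(err, abs(entry - (1 if i == j else 0)))
    return err


def rotate(u, nucleon):
    """Apply U to a two-component isospinor (p, n)."""
    return [u[0][0] * nucleon[0] + u[0][1] * nucleon[1],
            u[1][0] * nucleon[0] + u[1][1] * nucleon[1]]


if __name__ == "__main__":
    u = su2(1.1, (0.3, -0.5, 0.8))
    print("det U =", det2(u), " unitarity error =", unitarity_error(u))
    # A rotation by pi about the 1-axis turns a pure proton into a pure neutron.
    print(rotate(su2(math.pi, (1, 0, 0)), [1, 0]))
```

The last line illustrates the sense in which an isospin rotation "mixes up" what we call a proton and a neutron: the rotated state has no proton component, only a neutron component with a phase.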
Other hadrons also form isospin multiplets. For example, the pions have nearly identical masses, $m_{\pi^0} \approx 135$ MeV and $m_{\pi^\pm} \approx 140$ MeV, and form a triplet with $I = 1$. The rho mesons with $m_{\rho^\pm} \approx m_{\rho^0} \approx 775$ MeV also form a triplet. The $\omega$ with $m_\omega \approx 783$ MeV has no partners and is a singlet with $I = 0$, while the $\Delta$'s have masses $m_{\Delta^{++}} \approx m_{\Delta^{+}} \approx m_{\Delta^{0}} \approx m_{\Delta^{-}} \approx 1232$ MeV and form a quartet [26] with $I = 3/2$.
In addition to categorizing the representations under which particles transform, we can borrow another quantum number from the study of spin: the z-component of the spin. In the context of isospin, we call this $I_3$, or "the third component of isospin." The only reason we call it $I_3$ instead of $I_z$ is because it's conventional (and we're fancy). Since the nucleons form a doublet, with the proton playing the role of the spin up state and the neutron playing that of the spin down state, it is evident that the proton has $I_3 = +1/2$ and the neutron has $I_3 = -1/2$.
There is an interesting formula that relates the electric charge and $I_3$ which applies for nuclei and non-strange hadrons [27],
$$Q = I_3 + \frac{B}{2}, \tag{389}$$
where $B$ is the baryon number. For example, we know protons and neutrons are baryons (they're each made from three quarks), so $B = 1$ for both. Then, for the proton this says that $Q = 1/2 + 1/2 = 1$, and for the neutron $Q = 1/2 - 1/2 = 0$, which we know to be true. We can also use this formula in reverse to easily deduce the third component of the isospin for the pions, which are mesons so $B = 0$. For the $\pi^+$, $Q = 1$, so by the above $1 = 0 + I_3$, and thus the $\pi^+$ has $I_3 = 1$. By the same process we can see that the $\pi^0$ has $I_3 = 0$ and the $\pi^-$ has $I_3 = -1$. This relationship was known before the discovery of quarks, and even helped motivate the quark picture.

Speaking of quarks, the fundamental symmetry underlying isospin is the approximate symmetry of rotating up and down quarks (whose masses are both very small) into one another with an $SU(2)$ transformation,
$$\begin{pmatrix} u \\ d \end{pmatrix} \to U \begin{pmatrix} u \\ d \end{pmatrix}.$$
Since any baryon is made of three quarks, a single quark has baryon number $B = 1/3$. It is also known that the up quark has electric charge $Q = 2/3$ and the down quark has electric charge $Q = -1/3$. These facts, combined with defining the up quark to have $I_3 = +1/2$ and the down quark to have $I_3 = -1/2$, imply the quarks satisfy (389). Further, since $B$ and $I_3$ are simple additive quantities [28], this ensures that any particle made out of up and down quarks will also satisfy (389). So, we now understand (roughly) where this equation comes from.

[25] A multiplet is just a different word for the representation. For example, something that transforms in the $s = 1/2$ representation of $SU(2)$ is said to be a doublet, something that transforms in the $s = 1$ representation is a triplet, and so on. Something that transforms in the trivial $s = 0$ representation is said to be a singlet.
[26] By which we mean the four-dimensional representation.
[27] However, there is a straightforward generalization that does apply to strange particles.
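The charge formula $Q = I_3 + B/2$ is easy to verify for every particle mentioned so far. Here is a minimal sketch using exact fractions; the table of quantum numbers is assembled from the values quoted in the text (plus the $\Delta^{++}$, whose $I_3 = 3/2$ follows from it being the top member of the $I = 3/2$ quartet), and the names are illustrative.

```python
from fractions import Fraction as F


def charge(i3, baryon_number):
    """Electric charge from Q = I_3 + B/2 (non-strange hadrons and u, d quarks)."""
    return i3 + baryon_number / 2


# name: (I_3, B, known electric charge Q)
PARTICLES = {
    "proton": (F(1, 2), F(1), F(1)),
    "neutron": (F(-1, 2), F(1), F(0)),
    "pi+": (F(1), F(0), F(1)),
    "pi0": (F(0), F(0), F(0)),
    "pi-": (F(-1), F(0), F(-1)),
    "Delta++": (F(3, 2), F(1), F(2)),
    "up quark": (F(1, 2), F(1, 3), F(2, 3)),
    "down quark": (F(-1, 2), F(1, 3), F(-1, 3)),
}


def mismatches():
    """Return the names of any particles that violate the formula."""
    return [name for name, (i3, b, q) in PARTICLES.items()
            if charge(i3, b) != q]


if __name__ == "__main__":
    for name, (i3, b, q) in PARTICLES.items():
        print(f"{name:11s} I3={i3!s:5s} B={b!s:4s} Q={charge(i3, b)}")
    print("violations:", mismatches())
```

Because $I_3$ and $B$ are additive, the proton entry can also be recovered by summing the quark entries for $uud$, which is the additivity argument made in the text.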
Returning to the mathematical structure of isospin, we have seen that the proton and neutron can be combined into an isospinor $N$ which transforms under the two-dimensional representation of $SU(2)$. As we know from the physics of spin, the important matrices [29] when dealing with $SU(2)$ are the Pauli matrices, which when used in the context of isospin are conventionally denoted as $\tau_i$ rather than $\sigma_i$,
$$\tau_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \tau_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \tau_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
A generic isospin rotation parameterized by the "angles" $\theta_1, \theta_2, \theta_3$ is given by
$$U = \exp\left( \frac{i}{2}\sum_{j=1}^{3}\theta_j\tau_j \right).$$
This is analogous to the fact that the spin angular momentum operator is the generator of rotations in quantum mechanics. We will also see how this fact follows directly from group theory in the next section. Just like when using the Pauli matrices to describe real spins, it's useful to combine the Pauli matrices into a vector,
$$\boldsymbol{\tau} = (\tau_1, \tau_2, \tau_3).$$
However, this is not a vector in real space, but a vector in isospin space, and as such it is called an isovector. We can also combine the pion fields into an isovector by defining the Hermitian fields $\pi_1, \pi_2, \pi_3$ which are related to the physical $\pi^+, \pi^-, \pi^0$ by
$$\pi^\pm = \frac{1}{\sqrt{2}}\left( \pi_1 \mp i\pi_2 \right), \qquad \pi^0 = \pi_3. \tag{395}$$
These fields then transform as an isovector, $\boldsymbol{\pi} \to R(\boldsymbol{\theta})\,\boldsymbol{\pi}$, where $R$ is the $3\times3$ rotation matrix associated with $U$. Just like the dot product of spatial vectors is rotationally invariant, the dot product of isovectors is isospin rotation-invariant, such as $\boldsymbol{\pi}\cdot\boldsymbol{\pi}$ or $\boldsymbol{\pi}\cdot\boldsymbol{\tau}$. Given the isospinor $N$ for the nucleons and isovector $\boldsymbol{\pi}$ for the pions, we can write down an effective [30] Lagrangian that describes the long range interaction of nucleons and pions, encoded in the interaction term
$$\mathcal{L}_{\text{int}} = i g_\pi\,\bar{N}\gamma^\mu\gamma_5\,\boldsymbol{\tau}\cdot(\partial_\mu\boldsymbol{\pi})\,N.$$
Despite its deceptive simplicity, this term contains a great deal of information.

[28] That is to say that if we have two particles with $B = 1$ and $Q = 2/3$, the total baryon number of the combined state is $B = 2$ and the total charge is $Q = 4/3$, and so on.
[29] This is made more precise in the next section when we learn the Pauli matrices are the generators of $SU(2)$ in the fundamental representation.
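Let's start by just checking that it possesses all the symmetries that a sensible theory of nucleons should have: Lorentz invariance, isospin invariance, and parity conservation. It's easy to see the Lagrangian is Lorentz invariant since all the $\mu$'s are contracted. The isospin invariance is also manifest since we've already discussed how $\boldsymbol{\tau}\cdot\boldsymbol{\pi}$ is an isoscalar. [31] The fact it is invariant under a parity transformation takes a little more work to see. Let's start by considering just the $\mu = 0$ term,
$$i g_\pi\,\bar{N}\gamma^0\gamma_5\,\boldsymbol{\tau}\cdot(\partial_0\boldsymbol{\pi})\,N.$$
The factor $\partial_0\boldsymbol{\pi}$ has odd parity since $\partial_0$ is even and the pion is intrinsically odd. The nucleons all have even parity, as does $\gamma^0$. On the other hand, $\gamma_5$ is a pseudoscalar and thus has negative parity. Together, the negative parities of the pion and the $\gamma_5$ matrix cancel to give us an overall even parity, as a sensible Lagrangian should have. Next, we can consider the $\mu = i$ terms,
$$i g_\pi\,\bar{N}\gamma^i\gamma_5\,\boldsymbol{\tau}\cdot(\partial_i\boldsymbol{\pi})\,N,$$
where $i = 1, 2, 3$ runs over the spatial components. Now, $\partial_i$ has negative parity so $\partial_i\boldsymbol{\pi}$ has a net positive parity since the pion is also odd. $\gamma_5$ still has negative parity, but the spatial components of $\gamma^\mu$ are a vector, which means that they transform as $P : \gamma^i \to -\gamma^i$, and thus have negative parity. So, once again the net parity of these terms is positive, and all is well.

Now, let's take a few moments to unpack this Lagrangian. First of all, let's evaluate the matrix in isospin space,
$$\boldsymbol{\tau}\cdot\boldsymbol{\pi} = \begin{pmatrix} \pi_3 & \pi_1 - i\pi_2 \\ \pi_1 + i\pi_2 & -\pi_3 \end{pmatrix}.$$
We can now use (395) to write this in terms of the $\pi^+, \pi^-, \pi^0$:
$$\boldsymbol{\tau}\cdot\boldsymbol{\pi} = \begin{pmatrix} \pi^0 & \sqrt{2}\,\pi^+ \\ \sqrt{2}\,\pi^- & -\pi^0 \end{pmatrix}.$$
The effective Lagrangian can then be written
$$\mathcal{L}_{\text{int}} = i g_\pi \begin{pmatrix} \bar{p} & \bar{n} \end{pmatrix} \gamma^\mu\gamma_5 \begin{pmatrix} \partial_\mu\pi^0 & \sqrt{2}\,\partial_\mu\pi^+ \\ \sqrt{2}\,\partial_\mu\pi^- & -\partial_\mu\pi^0 \end{pmatrix} \begin{pmatrix} p \\ n \end{pmatrix}.$$
Carrying out the matrix multiplication we get four terms,
$$\mathcal{L}_{\text{int}} = i g_\pi \left[ \bar{p}\gamma^\mu\gamma_5(\partial_\mu\pi^0)p + \sqrt{2}\,\bar{p}\gamma^\mu\gamma_5(\partial_\mu\pi^+)n + \sqrt{2}\,\bar{n}\gamma^\mu\gamma_5(\partial_\mu\pi^-)p - \bar{n}\gamma^\mu\gamma_5(\partial_\mu\pi^0)n \right]. \tag{402}$$
Recall that the proton and neutron are massive fermions, so $p$ and $n$ are each four-component Dirac spinors, and the gamma matrices act on these spinor components. If we strip away all of the details (which are needed to ensure all the required symmetries are satisfied), these four terms correspond to four processes,
$$\mathcal{L}_{\text{int}} \sim \bar{p}\,\pi^0 p + \bar{p}\,\pi^+ n + \bar{n}\,\pi^- p + \bar{n}\,\pi^0 n.$$

[30] Recall from the previous section that an effective theory is not renormalizable, but is useful for describing low energy physics.
[31] By which we mean it does not transform under isospin rotations.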
Given these interactions, the Lagrangian encodes the longest range scattering channels between nucleons and pions. You can see this diagrammatically: it's a nice exercise to draw all of the tree level diagrams and show that they correspond to the allowed pion-nucleon scattering processes.
In particular, if we take the non-relativistic limit of the tree diagram for a nucleon-nucleon interaction to be the long-distance potential experienced by the nucleons, we can Fourier transform it back to real space to arrive at the famous one-pion-exchange potential (OPEP) between nucleons 1 and 2. Although we won't go into any depth about this result, let's take note of a few things.
• We've assumed all of the pions have the same mass $m_\pi$ and the protons and neutrons have the same mass $m_n$.
• The contribution we've labelled $S_{12}$ is a $J = 2$ tensor force, which mixes partial waves (this makes sense, since for example the deuteron is a mixture of s and d waves).
• Overall, this potential has the standard Yukawa form due to the overall $e^{-mr}/r$ factor.
• The last delta-function term is a short-ranged interaction, which is generally not reliable.
• Most importantly, this works experimentally!
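For orientation, the one-pion-exchange potential is commonly quoted in the following form. This is a sketch of the standard textbook structure rather than necessarily the normalization used in these notes: the overall strength $V_0$ is left as an unspecified constant (an assumption here), $\boldsymbol{\sigma}_{1,2}$ and $\boldsymbol{\tau}_{1,2}$ denote the spin and isospin Pauli matrices of the two nucleons, and $\mathbf{r}$ is their separation.

```latex
\begin{align}
V_{\text{OPE}}(\mathbf{r}) &= V_0\,(\boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2)
\left\{ \left[ \boldsymbol{\sigma}_1\cdot\boldsymbol{\sigma}_2
 + S_{12}\left( 1 + \frac{3}{m_\pi r} + \frac{3}{(m_\pi r)^2} \right) \right]
 \frac{e^{-m_\pi r}}{m_\pi r}
 - \frac{4\pi}{m_\pi^3}\,\boldsymbol{\sigma}_1\cdot\boldsymbol{\sigma}_2\,\delta^3(\mathbf{r}) \right\}, \\
S_{12} &= 3\,(\boldsymbol{\sigma}_1\cdot\hat{\mathbf{r}})(\boldsymbol{\sigma}_2\cdot\hat{\mathbf{r}})
 - \boldsymbol{\sigma}_1\cdot\boldsymbol{\sigma}_2 .
\end{align}
```

Each feature in the list above can be read off from this form: a single pion mass sets the range, $S_{12}$ is the tensor operator, the $e^{-m_\pi r}/r$ factor is the Yukawa falloff, and the final term is the contact interaction.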

Implications of Isospin
Isospin invariance can be used to further our understanding of physical processes. For example, if we consider the scattering of hadrons or nuclei off of one another, we can decompose the scattering amplitude into separate isospin channels which do not mix with one another, with the relative strength of scattering in each channel given by the Clebsch-Gordan coefficients of the initial and final states. Let's take $\pi^+ n$ scattering as an example: we have two possible processes, $\pi^+ + n \to \pi^+ + n$ or $\pi^+ + n \to \pi^0 + p$. Both final states have $I_3 = 1/2$, and each is a superposition of $I = 3/2$ and $I = 1/2$, corresponding to two isospin channels. In the initial state the pion has $I = 1$, $I_3 = 1$ and the neutron has $I = 1/2$, $I_3 = -1/2$. Using the notation $|I^\pi, I^n; I^\pi_3, I^n_3\rangle$ for the combined state, we can decompose the state into its two isospin channels,
$$\left|1, \tfrac{1}{2}; 1, -\tfrac{1}{2}\right\rangle = \sqrt{\tfrac{1}{3}}\,\left|I = \tfrac{3}{2}, I_3 = \tfrac{1}{2}\right\rangle + \sqrt{\tfrac{2}{3}}\,\left|I = \tfrac{1}{2}, I_3 = \tfrac{1}{2}\right\rangle.$$
So, even though there are ten pion-nucleon scattering channels, there are only two independent amplitudes. This means that if we measure two processes at various energies and angles we can determine the scattering amplitude in the $I = 3/2$ and $I = 1/2$ channels, from which we can predict the scattering rates for all pion-nucleon processes. As a simple case, suppose we measure the differential scattering cross section for $\pi^+ p$ scattering, which has $I_3 = 3/2$ and thus can only occur in the $I = 3/2$ channel. The same is true for the $\pi^- n$ process, which has $I_3 = -3/2$ and is also entirely within the $I = 3/2$ channel. Since both processes have the same scattering channel, we can conclude
$$\frac{d\sigma}{d\Omega}\left( \pi^+ p \to \pi^+ p \right) = \frac{d\sigma}{d\Omega}\left( \pi^- n \to \pi^- n \right).$$
We can apply a similar treatment to the decay of hadrons. It turns out that the decay rate of a hadron is independent of its isospin projection ($I_3$), as a consequence of isospin invariance. However, the branching ratios (that is, the fraction of hadrons which decay via a given process) are fixed by the Clebsch-Gordan coefficients between the initial and final states.
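That is, if we have two processes where hadron $a$ decays into hadrons $b$ and $c$ or $b'$ and $c'$,
$$h_a \to h_b + h_c \qquad \text{or} \qquad h_a \to h_{b'} + h_{c'},$$
where $h_a$ has isospin $I^a$ and $I^a_3$, and similarly for $b$, $c$, $b'$ and $c'$, the relative probability for each process is
$$\frac{p(a \to bc)}{p(a \to b'c')} = \frac{\left| \langle I^b, I^b_3; I^c, I^c_3 | I^a, I^a_3 \rangle \right|^2}{\left| \langle I^{b'}, I^{b'}_3; I^{c'}, I^{c'}_3 | I^a, I^a_3 \rangle \right|^2},$$
where $p(a \to bc)$ is the probability that $h_a$ decays into $h_b$ and $h_c$. Let's work out an example to make this concrete. We could ask what the relative probabilities are for the two possible decays $\Delta^+ \to p\,\pi^0$ and $\Delta^+ \to n\,\pi^+$. The $\Delta^+$ has $I = 3/2$ and $I_3 = 1/2$, and we have worked out the isospins of the other particles in previous sections. Putting all of these pieces into the above formula, we find [32]
$$\frac{p(\Delta^+ \to p\,\pi^0)}{p(\Delta^+ \to n\,\pi^+)} = \frac{\left| \langle \tfrac{1}{2}, \tfrac{1}{2}; 1, 0 | \tfrac{3}{2}, \tfrac{1}{2} \rangle \right|^2}{\left| \langle \tfrac{1}{2}, -\tfrac{1}{2}; 1, 1 | \tfrac{3}{2}, \tfrac{1}{2} \rangle \right|^2} = \frac{2/3}{1/3} = 2,$$
from which we conclude that the $\Delta^+$ decays into a $p\,\pi^0$ 2/3 of the time, and into an $n\,\pi^+$ 1/3 of the time.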
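The Clebsch-Gordan coefficients used above can be computed rather than looked up. The sketch below evaluates them with the standard closed-form (Racah) sum in exact rational arithmetic, assuming the usual Condon-Shortley phase convention; the function names are illustrative, and the pion is placed first in the coupling, as in the notation $|I^\pi, I^n; I^\pi_3, I^n_3\rangle$.

```python
from fractions import Fraction
from math import factorial, sqrt


def cg_signed_square(j1, m1, j2, m2, J, M):
    """Return (sign, square) of <j1 m1; j2 m2 | J M> as (int, Fraction).

    Assumes physically sensible inputs (integer or half-integer values with
    j1 + j2 + J an integer); forbidden couplings return (0, 0).
    """
    j1, m1, j2, m2, J, M = (Fraction(v) for v in (j1, m1, j2, m2, J, M))
    if M != m1 + m2 or not abs(j1 - j2) <= J <= j1 + j2:
        return 0, Fraction(0)
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return 0, Fraction(0)

    def f(x):
        # Every argument below is a non-negative integer for valid inputs.
        return factorial(int(x))

    prefactor = Fraction(
        (2 * J + 1) * f(J + j1 - j2) * f(J - j1 + j2) * f(j1 + j2 - J),
        f(j1 + j2 + J + 1),
    )
    prefactor *= (f(J + M) * f(J - M) * f(j1 - m1) * f(j1 + m1)
                  * f(j2 - m2) * f(j2 + m2))

    k_min = max(0, int(j2 - J - m1), int(j1 + m2 - J))
    k_max = min(int(j1 + j2 - J), int(j1 - m1), int(j2 + m2))
    total = Fraction(0)
    for k in range(k_min, k_max + 1):
        denom = (f(k) * f(j1 + j2 - J - k) * f(j1 - m1 - k) * f(j2 + m2 - k)
                 * f(J - j2 + m1 + k) * f(J - j1 - m2 + k))
        total += Fraction((-1) ** k, denom)

    sign = 0 if total == 0 else (1 if total > 0 else -1)
    return sign, prefactor * total ** 2


def cg(j1, m1, j2, m2, J, M):
    """Clebsch-Gordan coefficient as a float."""
    sign, square = cg_signed_square(j1, m1, j2, m2, J, M)
    return sign * sqrt(square)


HALF = Fraction(1, 2)

if __name__ == "__main__":
    # Delta+ (I = 3/2, I3 = 1/2) branching fractions, pion listed first.
    to_p_pi0 = cg_signed_square(1, 0, HALF, HALF, 3 * HALF, HALF)[1]
    to_n_pip = cg_signed_square(1, 1, HALF, -HALF, 3 * HALF, HALF)[1]
    print("Delta+ -> p pi0:", to_p_pi0, "  Delta+ -> n pi+:", to_n_pip)
    # Decomposition of the pi+ n state into I = 3/2 and I = 1/2 channels.
    print(cg(1, 1, HALF, -HALF, 3 * HALF, HALF), cg(1, 1, HALF, -HALF, HALF, HALF))
```

Working with the squares as exact fractions avoids floating-point noise in the branching ratios; the sign is tracked separately because it matters when amplitudes from different channels are added.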

Gauge Theories and the Standard Model
The standard model of particle physics is comprised of a few basic ingredients. As for particles, we have quarks (which are the constituents of hadrons) and leptons, which include electrons, muons, taus, and neutrinos. These particles all interact with one another via gauge interactions, the simplest of which (and the most familiar) is the electromagnetic interaction. In this section we will explain the basic idea behind gauge theories and what they have to do with particle physics. We'll start by gaining a new perspective on our old friend E&M by considering it as the simplest example of a gauge theory.

Electromagnetic Gauge Invariance
When we first came across E&M as freshmen, things were typically discussed in terms of the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$. Later on, after we have grown up as physicists, we learn that life is made a good deal easier if we swap $\mathbf{E}$ and $\mathbf{B}$ for the scalar and vector potentials $\Phi$ and $\mathbf{A}$ via the definitions
$$\mathbf{E} = -\nabla\Phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \tag{409}$$
If we add relativity into the mix, we can combine the scalar and vector potentials into the four-potential (henceforth simply called the potential), $A_\mu = (\Phi, -\mathbf{A})$. However, it's important to note that while specifying $A_\mu$ uniquely determines $\mathbf{E}$ and $\mathbf{B}$, the converse is not true: for any configuration of $\mathbf{E}$ and $\mathbf{B}$ there are an infinite number of potentials which describe it. In fact, suppose we have some set of potentials $\Phi$ and $\mathbf{A}$, which via (409) represent a configuration of $\mathbf{E}$ and $\mathbf{B}$. Then, for any differentiable function of spacetime, $\Lambda(\mathbf{x}, t)$, we may perform the gauge transformation
$$\Phi \to \Phi' = \Phi - \frac{\partial\Lambda}{\partial t}, \qquad \mathbf{A} \to \mathbf{A}' = \mathbf{A} + \nabla\Lambda, \tag{410}$$
under which the electric and magnetic fields are invariant,
$$\mathbf{E}' = -\nabla\Phi' - \frac{\partial\mathbf{A}'}{\partial t} = \mathbf{E}, \qquad \mathbf{B}' = \nabla\times\mathbf{A}' = \mathbf{B}. \tag{411}$$
The point is that given a set of potentials $\Phi$ and $\mathbf{A}$, we can shift them using any scalar function $\Lambda$ without changing the electric and magnetic fields. Things get dicey when we remember that it's $\mathbf{E}$ and $\mathbf{B}$ that are the physical fields we measure in the lab, so this gauge redundancy leads us to interpret the potentials as just a cute trick that lets us compute more efficiently.
In fact, since any choice of gauge must give us the same $\mathbf{E}$ and $\mathbf{B}$ fields, we're free to use this freedom to pick the gauge condition that makes our life easiest, and can rest assured we'll always get the right answer. Another advantage of using the potentials is that it allows us to write down the Lagrangian or Hamiltonian for a charged particle interacting with the electromagnetic field,
$$L = \frac{1}{2}m\dot{\mathbf{x}}^2 - q\Phi + q\,\dot{\mathbf{x}}\cdot\mathbf{A}. \tag{412}$$
As shown in the box in section 7.5, the Euler-Lagrange equation of motion we get from this Lagrangian is simply the Lorentz force law. Performing a Legendre transformation, the corresponding Hamiltonian is
$$H = \frac{(\mathbf{p} - q\mathbf{A})^2}{2m} + q\Phi. \tag{413}$$
It turns out that there is no way to write such a Lagrangian or Hamiltonian using $\mathbf{E}$ and $\mathbf{B}$, so if we want to use these formalisms we have to use potentials. However, recall that quantum mechanics is based on Lagrangians and Hamiltonians! This means that quantum theories must invariably be formulated using $\Phi$ and $\mathbf{A}$. For example, using the Hamiltonian above, the Schrödinger equation for a charged particle in an electromagnetic field is
$$i\frac{\partial\psi}{\partial t} = \left[ \frac{1}{2m}\left( -i\nabla - q\mathbf{A} \right)^2 + q\Phi \right]\psi. \tag{414}$$
The cost of framing things in terms of potentials is the need to keep track of the extra unphysical information contained in them. Specifically, anything we calculate should be the same regardless of whether we use one potential or a gauge-transformed version of it. This constraint is called gauge invariance, or (somewhat misleadingly) gauge symmetry, and reflects the redundancy of our description of the $\mathbf{E}$ and $\mathbf{B}$ fields in terms of the potentials. If an observable quantity were not gauge invariant, i.e. depended on our arbitrary choice of $\Lambda$, things would be very bad: nothing would stop you and me from choosing two different $\Lambda$'s and getting different answers! Thus, for any consistent theory including the electromagnetic field, all physical observables must be gauge invariant.
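In quantum mechanics, there is another kind of ambiguity. That is, if observable quantities generally go like $\psi^*\psi$, the absolute phase of the wavefunction isn't physical. Remarkably, it turns out that this is intimately connected to the ambiguity of the potentials. Consider rotating the phase of the wavefunction as
$$\psi \to \psi' = e^{iq\Lambda(\mathbf{x}, t)}\psi. \tag{415}$$
Note that this is not the same kind of phase rotation we've considered before. Previously we've rotated the phase of a field or wavefunction by a constant amount $\alpha$, which is called a global $U(1)$ transformation. Here, we have a space-time dependent field $\Lambda$ in the exponential, so we are rotating the phase of the wavefunction by a different amount at each point in spacetime. This is called a local $U(1)$ transformation because we are locally twisting the phase by a different amount at each space-time point. Since $\Lambda$ is dependent on $\mathbf{x}$ and $t$, the derivatives of $\psi'$ pick up an extra term relative to the derivatives of $\psi$. That is,
$$\nabla\psi' = e^{iq\Lambda}\left( \nabla + iq\nabla\Lambda \right)\psi, \qquad \frac{\partial\psi'}{\partial t} = e^{iq\Lambda}\left( \frac{\partial}{\partial t} + iq\frac{\partial\Lambda}{\partial t} \right)\psi. \tag{416}$$
Keeping this in the back of our minds, let's go back to the Schrödinger equation (414) and consider how it changes when we gauge transform the potentials. We then have
$$i\frac{\partial\psi}{\partial t} = \left[ \frac{1}{2m}\left( -i\nabla - q\mathbf{A} \,\textcolor{red}{-\, q\nabla\Lambda} \right)^2 + q\Phi \,\textcolor{red}{-\, q\frac{\partial\Lambda}{\partial t}} \right]\psi, \tag{417}$$
which does not appear to be gauge invariant. To help guide your eye, I've written the extra terms we picked up from the gauge transformation of the potentials in red. Now, the magic happens: let's simultaneously perform a local $U(1)$ transformation on the wave function, taking $\psi \to e^{iq\Lambda}\psi$. Keeping in mind the extra terms picked up from the derivatives, the gauge transformed Schrödinger equation (417) becomes
$$e^{iq\Lambda}\left[ i\frac{\partial\psi}{\partial t} \,\textcolor{blue}{-\, q\frac{\partial\Lambda}{\partial t}\psi} \right] = e^{iq\Lambda}\left[ \frac{1}{2m}\left( -i\nabla - q\mathbf{A} \,\textcolor{red}{-\, q\nabla\Lambda} \,\textcolor{blue}{+\, q\nabla\Lambda} \right)^2 + q\Phi \,\textcolor{red}{-\, q\frac{\partial\Lambda}{\partial t}} \right]\psi, \tag{418}$$
where we've written the terms generated by the phase rotation in blue. It's now easy to see that the red and blue terms all cancel one another out, and that after the simultaneous transformation of the potentials and the wavefunction, we get back the Schrödinger equation (414) that we started with. This means that in the quantum theory, a gauge transformation is the combined transformation of the potentials and the wavefunction under which physics is invariant.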
Put another way, this means that if we have a solution to the Schrödinger equation ψ, we can locally twist its phase to ψ , which is guaranteed to be another solution to the Schrödinger equation with a gauge-transformed version of the potentials.
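The cancellation described above can be checked numerically. The following is a minimal sketch in one spatial dimension, assuming units with $\hbar = 1$ and the conventions $\psi \to e^{iq\Lambda}\psi$, $A \to A + \partial_x\Lambda$. The particular functions `psi`, `A`, and `Lam` are arbitrary illustrative choices, not taken from the notes. It verifies that the kinetic momentum $(-i\partial_x - qA)\psi$ only picks up the local phase when both transformations are done together, and that transforming the potential alone changes the answer.

```python
import cmath
import math

q = 1.0    # charge in natural units (illustrative value)
h = 1e-5   # step size for central finite differences

def psi(x):
    """Sample wavefunction (hypothetical choice)."""
    return cmath.exp(-x * x + 2j * x)

def A(x):
    """Sample vector potential (hypothetical choice)."""
    return math.sin(x)

def Lam(x):
    """Sample gauge function Lambda(x) (hypothetical choice)."""
    return 0.3 * x * x + math.cos(x)

def ddx(f, x):
    """Central finite-difference derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def psi_prime(x):
    """Locally phase-rotated wavefunction."""
    return cmath.exp(1j * q * Lam(x)) * psi(x)

def A_prime(x):
    """Gauge-transformed potential, A + dLambda/dx."""
    return A(x) + ddx(Lam, x)

def kinetic_momentum(wave, pot, x):
    """(-i d/dx - q A) acting on the wavefunction, evaluated at x."""
    return -1j * ddx(wave, x) - q * pot(x) * wave(x)

x0 = 0.7
lhs = kinetic_momentum(psi_prime, A_prime, x0)
rhs = cmath.exp(1j * q * Lam(x0)) * kinetic_momentum(psi, A, x0)
mismatch = abs(lhs - rhs)

# Transforming only the potential (and not psi) does change the result.
naive = abs(kinetic_momentum(psi, A_prime, x0) - kinetic_momentum(psi, A, x0))

print("simultaneous transformation mismatch:", mismatch)
print("potential-only transformation shift:", naive)
```

The first number should be at the level of the finite-difference error, while the second is of order $|q\,\partial_x\Lambda\,\psi|$ and is not small.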

Quantum Electrodynamics
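So far, everything we've done has been non-relativistic quantum mechanics. Luckily, things carry over directly to relativistic field theory. Recall that we package the potentials together into a four-vector, $A_\mu = (\Phi, -\mathbf{A})$, and the E and B fields are the components of the field strength tensor $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. In this language, a gauge transformation (410) is written as
$$A_\mu \to A'_\mu = A_\mu - \partial_\mu\Lambda. \tag{424}$$
Under the transformation, it's not hard to see that the field strength is invariant, as it should be since it is a physical observable,
$$F'_{\mu\nu} = \partial_\mu(A_\nu - \partial_\nu\Lambda) - \partial_\nu(A_\mu - \partial_\mu\Lambda) = F_{\mu\nu}.$$
Also recall that the Lagrangian for the electromagnetic field is
$$\mathcal{L} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} - A_\mu J^\mu,$$
and the classical equations of motion $\partial_\mu F^{\mu\nu} = J^\nu$ give us the Maxwell equations, as we showed earlier. We can couple the electromagnetic field to matter via the $A_\mu J^\mu$ term: we just need to find the right form for the current. Let's consider a fermion field $\psi$, the free Lagrangian for which is
$$\mathcal{L}_{\text{Dirac}} = \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi.$$
If the electromagnetic field is in the game, we want things to be gauge invariant. It turns out that under a gauge transformation, the Dirac field should transform just like the nonrelativistic wavefunction, i.e. our theory should be invariant under the transformation $\psi \to \psi' = e^{iq\Lambda}\psi$, where $q$ is the charge of the fermion. As you can check, the extra terms generated from differentiating the gauge transformed Dirac field give us problems,
$$\mathcal{L}_{\text{Dirac}} \to \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi - q\,\bar\psi\gamma^\mu(\partial_\mu\Lambda)\psi.$$
(Note that if $\psi \to e^{iq\Lambda}\psi$, then $\bar\psi \to e^{-iq\Lambda}\bar\psi$.) Perhaps this isn't too surprising, since whenever we deal with electromagnetic fields we always need to shift the canonical momentum. In field theory language, there's a simple way to implement this: we just replace every derivative with the gauge covariant derivative,
$$D_\mu = \partial_\mu + iqA_\mu.$$
So, replacing every $\partial$ we see with a $D$, the Dirac Lagrangian becomes
$$\mathcal{L}_{\text{Dirac}} = \bar\psi\left(i\gamma^\mu D_\mu - m\right)\psi.$$
Unpacking this, we have
$$\mathcal{L}_{\text{Dirac}} = \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi - q\,\bar\psi\gamma^\mu A_\mu\psi.$$
Now, performing a gauge transformation (424), the transformation $\gamma^\mu A_\mu \to \gamma^\mu A_\mu - \gamma^\mu\partial_\mu\Lambda$ gives us an extra term (in red),
$$\mathcal{L}_{\text{Dirac}} \to \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi - q\,\bar\psi\gamma^\mu(\partial_\mu\Lambda)\psi - q\,\bar\psi\gamma^\mu A_\mu\psi\,\textcolor{red}{+\,q\,\bar\psi\gamma^\mu(\partial_\mu\Lambda)\psi} = \mathcal{L}_{\text{Dirac}},$$
so this version of the Dirac Lagrangian is now gauge invariant.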
Thus, the Lagrangian for a fermion with charge $q$ interacting with the electromagnetic field is simply $\mathcal{L}_{\text{Dirac}} + \mathcal{L}_{\text{Maxwell}}$, or
$$\mathcal{L}_{\text{QED}} = \bar\psi\left(i\gamma^\mu D_\mu - m\right)\psi - \frac{1}{4}F_{\mu\nu}F^{\mu\nu}.$$
This is the Lagrangian for Quantum Electrodynamics (QED), which is one of the most accurate theories of physics to date, and a part of the standard model. Now, remember that at the outset of this section we noted that the coupling between matter and the EM field is given by the term
$$\mathcal{L}_{\text{int}} = -A_\mu J^\mu,$$
which is part of our gauge invariant Lagrangian. Reading off the term linear in $A_\mu$ from our Lagrangian, we identify the current $J^\mu$ to be
$$J^\mu = q\,\bar\psi\gamma^\mu\psi,$$
and treat the charge $q$ as a coupling constant. However, we could conceivably have other kinds of currents in more complicated theories. In any case, it turns out that gauge invariance requires that any such current is conserved. To see this, note that under a gauge transformation the interaction term transforms as
$$\mathcal{L}_{\text{int}} \to \mathcal{L}'_{\text{int}} = -(A_\mu - \partial_\mu\Lambda)J^\mu = \mathcal{L}_{\text{int}} + J^\mu\partial_\mu\Lambda.$$
At first sight, this non-gauge invariance may be troubling. But, remember that the action $S_{\text{int}} = \int d^4x\,\mathcal{L}_{\text{int}}$ is what really matters, and it changes by
$$S'_{\text{int}} = S_{\text{int}} + \int d^4x\,J^\mu\partial_\mu\Lambda.$$
Integrating the last term by parts, we have [33]
$$S'_{\text{int}} = S_{\text{int}} - \int d^4x\,\Lambda\,\partial_\mu J^\mu.$$
[33] As usual, the surface term vanishes. This is because the fields must vanish at spatial infinity for any physically realizable configuration. Otherwise, they would have an infinite energy. Alternatively, we can always pick a gauge function $\Lambda$ which vanishes at infinity.
For this to be gauge invariant, we must have $S'_{\text{int}} = S_{\text{int}}$, which requires that the second term above must vanish for any arbitrary function $\Lambda$. This is true only if $\partial_\mu J^\mu = 0$, i.e. the current is conserved. Next, let's turn our entire discussion thus far on its head and reconsider gauge invariance from a new perspective. Instead of thinking of gauge invariance as a peculiar property of the theory, let's instead consider gauge invariance as the basis of our theory. Suppose we start with the Dirac Lagrangian and require that it is invariant under local $U(1)$ transformations. For this to be possible, we need to replace the partial derivative with a covariant derivative to cancel unwanted terms. But doing so requires introducing a gauge field $A_\mu$ whose transformation cancels the extra terms generated by the transformation of the Dirac field. So, imposing local $U(1)$ invariance implies the existence of the electromagnetic field.
If the only place that $A_\mu$ appears in our theory is inside the covariant derivative, it is essentially just a background field that doesn't have any dynamics of its own. If we want the gauge field to be dynamical, we need to add a kinetic term for it, and such a term must be gauge invariant. For the theory to also be renormalizable, there is a unique kinetic term that we can write down: the Maxwell term, $\mathcal{L}_{\text{Maxwell}} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu}$. The factor of $-1/4$ is just a convention, but the contraction $F_{\mu\nu}F^{\mu\nu}$ is the only renormalizable and gauge invariant term that exists. Putting this together with the gauge invariant version of the Dirac Lagrangian (with $\partial \to D$), we have
$$\mathcal{L} = \bar\psi\left(i\gamma^\mu D_\mu - m\right)\psi - \frac{1}{4}F_{\mu\nu}F^{\mu\nu},$$
which is nothing other than the QED Lagrangian! In summary, imposing local $U(1)$ invariance on a Dirac field automatically gives us QED. This is remarkable! Requiring a single local symmetry is sufficient to give us the correct form for one of humanity's most successful theories, which in the classical limit recovers all of classical electrodynamics. [34] In fact, the core of the standard model was derived by generalizations of this line of thinking to more complicated local symmetries. The first generalization was made by Yang and Mills. Like much of modern physics, the initial motivation was misguided, but the underlying idea and mathematics were correct. Then, in 1968 Weinberg and Salam (informed by ideas from Glashow) used this idea to unify electromagnetism with the weak interaction. It was further generalized by Gell-Mann, Leutwyler, Fritzsch, and others in 1973 to describe the strong interactions using the theory now known as quantum chromodynamics (QCD). The history of the standard model is an interesting subject in itself: consider reading this essay recently written by Steve Weinberg.
Understanding these more complicated gauge theories requires a respectable knowledge of group theory. As such, we've included a very cursory introduction to the subject in the next section, after which we will dive into QCD.

A Quick and Dirty Group Theory Primer
Strictly speaking, a group $G$ is simply a set endowed with a multiplication operation, $\cdot$, which obeys a few properties:
The group is closed under multiplication, so $g_1 \cdot g_2 \in G$ for all $g_1, g_2 \in G$.
Group multiplication is associative, so $g_1 \cdot (g_2 \cdot g_3) = (g_1 \cdot g_2) \cdot g_3$ for all $g_1, g_2, g_3 \in G$.
There exists an identity element $e$ such that $e \cdot g = g \cdot e = g$ for all $g \in G$.
For each group element $g$ there exists an inverse element $g^{-1}$ such that $g^{-1} \cdot g = g \cdot g^{-1} = e$.
And that's it! Notice that we did not require that multiplication be commutative, so we need not have g 1 · g 2 = g 2 · g 1 . However, for some groups the multiplication is commutative, in which case we say the group is Abelian. If the group multiplication is not commutative, the group is said to be non-Abelian.
The conditions above aren't too restrictive, so there are lots of groups that we can dream up that come in all different shapes and sizes. Let's consider a few examples of groups to get the basic idea.
$Z_2$: The group consists of two elements, $Z_2 = \{1, -1\}$, with the group multiplication simply being normal scalar multiplication. This automatically tells us that the group multiplication is associative and commutative, and thus $Z_2$ is an Abelian group. It's also easy to check the other properties hold: the group is closed under multiplication, the identity element is 1, and the inverse elements $(1)^{-1} = 1$ and $(-1)^{-1} = -1$ exist. Since it has a finite number of elements, it is said to be a finite or discrete group. At this point, you can forget that discrete groups exist, because we won't talk about them again for the rest of the course. [35]
$U(1)$: Our old friend $U(1)$ is defined as the set of complex numbers with modulus one, which we can write as the set $\{e^{i\theta} \,|\, \theta \in \mathbb{R}\}$. Again, the group multiplication is just normal scalar multiplication, so the group is Abelian and the associativity axiom is satisfied. The identity element is $e^{i0} = 1$, and the inverse of an element $e^{i\theta}$ is $e^{-i\theta}$. In contrast to our previous example, there is a continuously infinite number of elements in this group. A continuous group is called a Lie group (pronounced like "Lee"), after the mathematician Sophus Lie, and it is these groups that will be our primary focus.
$O(N)$: This is the group of $N \times N$ orthogonal matrices, [36] that is, the set of all matrices $O$ such that $O^T O = 1$. Since these are matrices, the group multiplication is now matrix multiplication, which means it is associative but not, in general, commutative. So, in general matrix groups are non-Abelian! It's also quick to check that the identity element is just the unit matrix $1 \in G$, and for any element $O \in G$ its transpose $O^T$ is also in $G$, but by definition $O^T = O^{-1}$, so every element has an inverse in $G$.
$SU(N)$: This is the guy we really care about. $SU(N)$ is the group of $N \times N$ special unitary matrices. Special means that for all $U \in G$, $\det U = 1$, and unitary means that $U^\dagger U = 1$. Group multiplication is again just matrix multiplication, so this is a non-Abelian group. The identity element is the unit matrix, and for every $U \in G$ its Hermitian conjugate $U^\dagger$ is also in $G$, and since $U^\dagger = U^{-1}$ every element has an inverse.
[35] However, they are important if you want to talk about crystals!
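The group properties listed above are easy to test on explicit matrices. Below is a minimal sketch using only the Python standard library. The parametrization in `su2` (a matrix of the form $[[a, b], [-b^*, a^*]]$ with $|a|^2 + |b|^2 = 1$) and the numerical angles are illustrative assumptions, not something taken from the notes. It checks closure and the existence of inverses for two $SU(2)$ elements, and shows that they do not commute while $U(1)$ phases do.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def close(a, b, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

IDENTITY = [[1, 0], [0, 1]]

def su2(alpha, beta, gamma):
    """An SU(2) matrix [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * alpha) * math.cos(gamma)
    b = cmath.exp(1j * beta) * math.sin(gamma)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

g1 = su2(0.3, 1.1, 0.5)
g2 = su2(-0.7, 0.2, 1.2)
g12 = matmul(g1, g2)

# Closure: the product is again unitary with unit determinant.
closed = close(matmul(dagger(g12), g12), IDENTITY) and abs(det2(g12) - 1) < 1e-12
# Inverse: the Hermitian conjugate undoes the transformation.
has_inverse = close(matmul(dagger(g1), g1), IDENTITY)
# Non-Abelian: the order of multiplication matters for SU(2) ...
su2_commutes = close(matmul(g1, g2), matmul(g2, g1))
# ... but not for U(1), whose elements are plain phases.
u1, u2 = cmath.exp(0.4j), cmath.exp(1.9j)
u1_commutes = abs(u1 * u2 - u2 * u1) < 1e-12

print(closed, has_inverse, su2_commutes, u1_commutes)
```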
Having learned about a few different groups, from now on we're just going to talk about $SU(N)$. To start, we'd like to understand infinitesimal transformations. As we'll see later, it turns out that the magic of Lie groups is that understanding these transformations is enough to understand the whole group. An infinitesimal transformation is a transformation that is infinitesimally close to the identity operator, which means that we can expand it as
$$U = 1 + i\epsilon M + \mathcal{O}(\epsilon^2). \tag{437}$$
Here $\epsilon$ is an infinitesimally small (real) parameter, which you can think of as a small rotation angle in some higher dimensional space. The factor of $i$ is just a convention, and $M$ is some $N \times N$ matrix. There could be other terms in this expansion, but they are all of order $\epsilon^2$, which is taken to be doubly small and negligible compared to the linear term. For $U$ to be an $SU(N)$ matrix, we must have $U^\dagger U = 1$. Putting (437) into this formula and keeping only first order terms, we have
$$1 = U^\dagger U = (1 - i\epsilon M^\dagger)(1 + i\epsilon M) = 1 + i\epsilon(M - M^\dagger) + \mathcal{O}(\epsilon^2).$$
For this to hold to linear order, the second term in the last expression must vanish for any $\epsilon$, which implies
$$M = M^\dagger.$$
That is, $M$ must be Hermitian. We also know that for $U$ to be special, we need $\det U = 1$. Since $\epsilon$ is very small, you can play with the formula for the determinant to show that
$$\det U = \det(1 + i\epsilon M) = 1 + i\epsilon\,\mathrm{tr}\,M + \mathcal{O}(\epsilon^2).$$
For $\det U = 1$ to hold, the second term must vanish, which means $M$ is traceless,
$$\mathrm{tr}\,M = 0.$$
We've now learned that an infinitesimal $SU(N)$ transformation can generically be written as the identity plus ($i\epsilon$ times) a traceless Hermitian matrix. The set of all such matrices is called the Lie Algebra of the group, and is usually written in gothic font, like $\mathfrak{su}(N)$. We can write down a basis for the Lie Algebra, called the generators of the group. We denote them $T^a$, where $a$ is an index which tells us which generator we're talking about. It turns out that it takes $N^2 - 1$ matrices to form a basis for the space of $N \times N$ traceless Hermitian matrices, so the index $a$ runs from 1 to $N^2 - 1$.
To avoid confusion, I'll emphasize that $a$ is not the matrix index which tells us which row and column we're talking about. If we wanted to include these indices we'd write the $i$th row and $j$th column of the generator $T^a$ as $(T^a)_{ij}$, but because things get messy we'll usually leave these indices implicit. For reasons that we'll see in the next section, we'll call the $a$ indices (which label the generators) the color indices.
If the generators $\{T^a\}$ form a basis for the Lie Algebra, then we can write any $M \in \mathfrak{su}(N)$ as a linear combination of the generators,
$$M = \epsilon^a T^a,$$
where the parameters $\epsilon^a$ tell us how much of $M$ is in the direction of the $a$th generator. We'll also use the Einstein summation convention with the color indices, but we'll always write color indices upstairs. [37] The reason the Lie algebra is called an algebra is...because it's an algebra! Technically speaking, an algebra is a vector space with some kind of multiplication defined. Formally, in the theory of Lie groups this multiplication isn't matrix multiplication (because the product of two generators may not be an element of the Lie algebra) but rather the commutator, $[T^a, T^b]$. If you walk over to the math department, they'll usually call this a Lie bracket instead. The reason the commutator is so important is that the algebra is closed with respect to it. That is, the commutator of two elements of the Lie algebra is guaranteed to be another element of the Lie algebra. This means that for any two elements $M, M' \in \mathfrak{su}(N)$, we know that $[M, M']$ is also in $\mathfrak{su}(N)$ (up to the conventional factor of $i$) and thus can be written as a linear combination of the generators (which are a basis),
$$[M, M'] = i\,c^a T^a$$
for some real coefficients $c^a$. But we can also write $M$ and $M'$ in terms of the generators and some expansion coefficients,
$$[M, M'] = [\epsilon^a T^a, \epsilon'^b T^b] = \epsilon^a\epsilon'^b\,[T^a, T^b].$$
If we strip away all the unimportant details, what really matters is the commutator of the generators themselves,
$$[T^a, T^b] = i f^{abc}\,T^c. \tag{444}$$
The coefficients $f^{abc}$ are called the structure constants of the group, and provide a convenient way to characterize the entire Lie Algebra.
In terms of the generators, a general infinitesimal $SU(N)$ transformation (437) is written
$$U(\epsilon) = 1 + i\epsilon^a T^a. \tag{446}$$
Now, suppose we have two different infinitesimal transformations, and would like to act with one after the other (i.e. rotate by $\epsilon_1$ and then $\epsilon_2$). Because this is a group, the combined transformation is simply the product of the individual transformations, $U(\epsilon_1)U(\epsilon_2)$. In terms of (446), to first order in $\epsilon$ we have
$$U(\epsilon_1)U(\epsilon_2) = (1 + i\epsilon_1^a T^a)(1 + i\epsilon_2^b T^b) = 1 + i(\epsilon_1^a + \epsilon_2^a)T^a + \mathcal{O}(\epsilon^2) = U(\epsilon_1 + \epsilon_2).$$
So it looks like multiplication in the group is addition in the algebra (what does this remind you of?). Inspired by this observation, let's now get to the big idea of Lie groups. Suppose we want to make a finite transformation $U(\theta)$, characterized by some finite parameters $\theta$. Instead of rotating by $\theta$ all at once, I can first rotate by $\theta/2$, and then stop and rotate by $\theta/2$ a second time. Or, I could rotate by $\theta/5$ five times, or $\theta/42$ forty-two times, etc. In general, I can chop the angle up into $N$ different pieces, and then act $N$ times with the smaller transformation. [38] In equations,
$$U(\theta) = \left[U(\theta/N)\right]^N. \tag{449}$$
And there's nothing stopping us from making $N$ really really big, so we can take $N \to \infty$. In this limit, $U(\theta/N)$ is now an infinitesimal transformation, and we can write it as
$$U(\theta/N) = 1 + \frac{i\theta^a T^a}{N}. \tag{450}$$
Putting (450) into (449) we have
$$U(\theta) = \lim_{N\to\infty}\left(1 + \frac{i\theta^a T^a}{N}\right)^N.$$
Perhaps you remember the identity
$$e^x = \lim_{N\to\infty}\left(1 + \frac{x}{N}\right)^N.$$
Even though we're working with matrices rather than numbers, this identity still applies, and we find that we can write any finite transformation as the exponentiation of an element of the Lie Algebra,
$$U(\theta) = e^{i\theta^a T^a}. \tag{453}$$
This is an extremely useful fact! Recall that the exponential of a matrix is defined by its Taylor Series,
$$e^A = \sum_{n=0}^{\infty}\frac{A^n}{n!} = 1 + A + \frac{1}{2}A^2 + \cdots$$
Before moving on, it's nice to notice that this is self-consistent with our previous claims derived from infinitesimal transformations.
[38] Please don't confuse the $N$ here with the $N$ in $SU(N)$. Unfortunately there is a finite number of letters!
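First, if we have an infinitesimally small transformation, we can just keep the leading order term in the above Taylor series,
$$U(\epsilon) = e^{i\epsilon^a T^a} \approx 1 + i\epsilon^a T^a.$$
It's also clear that for $U$ to be unitary, the generators must be Hermitian, [39]
$$U^\dagger U = e^{-i\theta^a T^{a\dagger}}\,e^{i\theta^b T^b} = 1 \quad\text{if}\quad T^{a\dagger} = T^a.$$
We can also show that $\det U = 1$ implies $\mathrm{tr}\,T^a = 0$ by using the identity [40] $\mathrm{tr}\log A = \log\det A$. Applying this to $U = \exp(i\theta^a T^a)$, we have
$$\log\det e^{i\theta^a T^a} = \mathrm{tr}\log e^{i\theta^a T^a} = \mathrm{tr}(i\theta^a T^a) = i\theta^a\,\mathrm{tr}\,T^a.$$
Exponentiating both sides, we have
$$e^{\log\det e^{i\theta^a T^a}} = e^{i\theta^a\,\mathrm{tr}\,T^a} \quad\Longrightarrow\quad \det e^{i\theta^a T^a} = e^{i\theta^a\,\mathrm{tr}\,T^a}.$$
The left hand side is $\det U$, and setting this equal to one for arbitrary $\theta^a$ requires that
$$e^{i\theta^a\,\mathrm{tr}\,T^a} = 1 \quad\Longrightarrow\quad i\theta^a\,\mathrm{tr}\,T^a = 0 \quad\Longrightarrow\quad \mathrm{tr}\,T^a = 0.$$
[39] We're sweeping a lot under the rug here. The product of the exponentials of two matrices is actually given by the Baker-Campbell-Hausdorff formula, $e^A e^B = e^{A + B + \frac{1}{2}[A,B] + \cdots}$, where the dots are a bunch of terms which depend on the commutator $[A, B]$. In this case, the commutator is $[-i\theta^a T^a, i\theta^b T^b] = \theta^a\theta^b\,[T^a, T^b]$, and the contraction of the symmetric tensor $\theta^a\theta^b$ with the antisymmetric tensor $[T^a, T^b]$ vanishes. Only in this particular case is $e^A e^B = e^{A+B}$.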
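Both the exponentiation limit and the determinant identity can be checked numerically on a small example. The sketch below uses only the standard library. The matrix `M` is an arbitrary traceless Hermitian $2 \times 2$ matrix standing in for $\theta^a T^a$ (its entries are illustrative, not from the notes), and the helper names are hypothetical. It shows that $(1 + iM/N)^N$ approaches the Taylor-series exponential as $N$ grows, and that $\det e^{iM} = e^{i\,\mathrm{tr}\,M}$, which equals one only when the trace vanishes.

```python
import cmath

def identity(n):
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def matpow(a, n):
    """a**n for a positive integer n, by repeated squaring."""
    result = identity(len(a))
    while n > 0:
        if n & 1:
            result = matmul(result, a)
        a = matmul(a, a)
        n >>= 1
    return result

def expm(a, terms=40):
    """Matrix exponential from its Taylor series, truncated at `terms` terms."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, a))
        result = add(result, term)
    return result

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Traceless Hermitian stand-in for theta^a T^a (illustrative numbers).
M = [[0.4 + 0j, 0.3 - 0.2j], [0.3 + 0.2j, -0.4 + 0j]]
U = expm(scale(1j, M))

# (1 + i M / N)^N approaches exp(i M) as N grows.
errors = {}
for N in (10, 1000, 100000):
    small_step = add(identity(2), scale(1j / N, M))
    errors[N] = max_diff(matpow(small_step, N), U)

# det exp(i M) = exp(i tr M): equal to 1 here, but not once a trace is added.
det_traceless = det2(U)
M_with_trace = add(M, scale(0.5, identity(2)))  # trace is now 1.0
det_with_trace = det2(expm(scale(1j, M_with_trace)))

print(errors)
print(det_traceless, det_with_trace)
```

The error should fall roughly like $1/N$, which is the expected size of the neglected second-order terms.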
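To make all of these rather abstract ideas clear, let's consider what is by now a very familiar example: the spin rotation group, $SU(2)$. Since $SU(2)$ is $SU(N)$ with $N = 2$, all of our previous discussion holds. The important task is to identify the generators of the group. We need $2^2 - 1 = 3$ traceless Hermitian $2 \times 2$ matrices which form a basis for the Lie Algebra. Of course, we already know what these matrices are: the Pauli matrices!
$$\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Traditionally, we normalize the generators to instead be
$$S^a = \frac{\sigma^a}{2},$$
which gives us the canonical normalization $\mathrm{tr}(S^a S^b) = \frac{1}{2}\delta^{ab}$ and the commutation relations
$$[S^a, S^b] = i\epsilon^{abc}S^c.$$
Comparing this familiar angular momentum algebra to (444), we see that the structure constants of $SU(2)$ are simply $f^{abc} = \epsilon^{abc}$. Reading off (453), we see a rotation of the spin quantization axis is generated by the spin angular momentum operators,
$$U(\theta) = e^{i\theta^a S^a}.$$
Since this is a rotation in three-dimensional space, it is helpful to write the parameters $\theta^a$ as $\theta^a = \theta\hat{n}^a$, where $\hat{n}^a$ are the components of a unit vector. We can then interpret the transformation
$$U(\theta) = e^{i\theta\hat{n}^a S^a} = e^{i\theta\,\hat{n}\cdot\mathbf{S}} \tag{466}$$
as a rotation about the $\hat{n}$ axis by the angle $\theta$. Perhaps this is something you've already seen in your quantum mechanics class. We can write this in a simpler form by using some properties of the Pauli matrices. The two key properties are the commutator and anti-commutator,
$$[\sigma^a, \sigma^b] = 2i\epsilon^{abc}\sigma^c, \qquad \{\sigma^a, \sigma^b\} = 2\delta^{ab}.$$
We can then write the product of two Pauli matrices as
$$\sigma^a\sigma^b = \frac{1}{2}\left([\sigma^a, \sigma^b] + \{\sigma^a, \sigma^b\}\right) = \delta^{ab} + i\epsilon^{abc}\sigma^c.$$
Let's use this to evaluate $(\hat{n}\cdot\sigma)^2$,
$$\begin{aligned}
(\hat{n}\cdot\sigma)^2 &= \hat{n}^a\hat{n}^b\,\sigma^a\sigma^b \\
&= \hat{n}^a\hat{n}^b\left(\delta^{ab} + i\epsilon^{abc}\sigma^c\right) \\
&= \hat{n}^a\hat{n}^a + i\,\hat{n}^a\hat{n}^b\epsilon^{abc}\sigma^c \\
&= |\hat{n}|^2 \\
&= 1.
\end{aligned}$$
To get from the third to the fourth line, we noticed that the contraction of the symmetric $\hat{n}^a\hat{n}^b$ with the antisymmetric $\epsilon^{abc}$ vanishes, and to get to the last line we used the fact that $\hat{n}$ is a unit vector so $|\hat{n}| = 1$. With this in the back of our mind, let's turn to the rotation (466),
$$U(\theta) = e^{i\theta\,\hat{n}\cdot\sigma/2} = \sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{i\theta}{2}\right)^n(\hat{n}\cdot\sigma)^n.$$
We can divide the infinite sum into two pieces: the terms with $n$ even, and the terms with $n$ odd,
$$U(\theta) = \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\left(\frac{\theta}{2}\right)^{2k} + i\,(\hat{n}\cdot\sigma)\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}\left(\frac{\theta}{2}\right)^{2k+1} = \cos\frac{\theta}{2} + i\,(\hat{n}\cdot\sigma)\sin\frac{\theta}{2}.$$
This is a useful representation which will be helpful on one of the homework problems.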
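As a sanity check on the $SU(2)$ example, the short sketch below verifies the angular momentum algebra $[S^a, S^b] = i\epsilon^{abc}S^c$ for $S^a = \sigma^a/2$, and compares the Taylor-series exponential $e^{i\theta\,\hat{n}\cdot\sigma/2}$ with the closed form $\cos(\theta/2) + i(\hat{n}\cdot\sigma)\sin(\theta/2)$. It uses only the standard library. The rotation axis and angle are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math

SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
S = [[[x / 2 for x in row] for row in s] for s in SIGMA]  # S^a = sigma^a / 2
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def eps(a, b, c):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) / 2

def expm(a, terms=40):
    """Matrix exponential from its truncated Taylor series."""
    result, term = IDENTITY, IDENTITY
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, a))
        result = add(result, term)
    return result

# Check [S^a, S^b] = i eps^{abc} S^c for every pair (a, b).
algebra_error = 0.0
for a in range(3):
    for b in range(3):
        rhs = [[0j, 0j], [0j, 0j]]
        for c in range(3):
            rhs = add(rhs, scale(1j * eps(a, b, c), S[c]))
        algebra_error = max(algebra_error, max_diff(commutator(S[a], S[b]), rhs))

# Compare the series for exp(i theta n.sigma / 2) with the closed form.
n_hat = (1 / 3, 2 / 3, 2 / 3)   # unit vector (illustrative choice)
theta = 1.3                     # rotation angle (illustrative choice)
n_dot_sigma = [[0j, 0j], [0j, 0j]]
for a in range(3):
    n_dot_sigma = add(n_dot_sigma, scale(n_hat[a], SIGMA[a]))

series = expm(scale(1j * theta / 2, n_dot_sigma))
closed_form = add(scale(math.cos(theta / 2), IDENTITY),
                  scale(1j * math.sin(theta / 2), n_dot_sigma))
rotation_error = max_diff(series, closed_form)

print(algebra_error, rotation_error)
```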
Having acquainted ourselves with the basics of Lie groups, let's get back to the physics!

Quantum Chromodynamics
We'll now consider the strong interactions through the lens of Quantum Chromodynamics (QCD). The fundamental particles in this theory are the quarks, which were introduced in the early 1960s to explain observed patterns of strange and non-strange hadrons. At the time, these quarks had three known flavors: up, down, and strange. [41] They are all spin 1/2 fermions, and the up quark has charge +2/3 while the down and strange quarks have charge −1/3. One of the key physical ideas of the time was SU(3) flavor symmetry, a generalization of isospin, which made qualitative and approximate quantitative predictions about the hadrons. The strange quark is also noticeably heavier than the up and down quarks, which means the SU(3) flavor symmetry of rotating the quarks into each other has significantly more explicit breaking than isospin. [42] However, aside from this mass difference the quarks interact with one another in the same way. Other particles which interact via the strong interaction are made out of quarks, and come in two varieties. Baryons are made out of three quarks (and thus are fermions), and mesons are made out of a quark-antiquark pair (and are bosons). Given these new particles, one generalizes the SU(2) isospin symmetry to an SU(3) symmetry, with the extra degree of freedom being the strangeness of the particle. One can then use the properties of the SU(3) group (along with perturbation theory) to make predictions for the masses of other particles. At the time the quark model was developed, group theory was not yet a standard tool of particle physics, and not everyone was well-versed in its use. A major advantage of the quark model was that it effectively did the group theory for you, provided the quarks had the properties outlined above. In this capacity, the quark model was successful in accounting for many of the observations of the day.
However, at the time of the model's invention, it wasn't actually clear whether quarks were real particles, or just a useful mathematical device for working out the details of SU(3) flavor. After all, no one had ever been able to observe an isolated quark! Today quarks are considered real particles, but they are confined: they exist as individual entities inside of hadrons, but cannot be pulled apart. One simple intuitive way to understand how this can come about is for the theory to have the property that before you have enough energy to pull the quarks apart, you would have enough energy to pull a whole new hadron out of the vacuum.
It was also noticed that the quark model had a seemingly fatal flaw: it appeared to violate the spin-statistics theorem. [43] We can consider, for example, the ∆ baryon. It is a low-energy excitation (in a simple model, it can be thought of as a simultaneous flip of the spin and isospin of a nucleon) so we expect it to be in a spatial s-wave state (such that the energy from orbital angular momentum is minimized). At this point, recall that an s-wave configuration is spatially symmetric under particle interchange.
As we may have mentioned in the previous section, the ∆ has isospin I = 3/2. Since each quark has isospin 1/2, this means that the ∆'s three constituent quarks are in an isospin symmetric combination. [44] The ∆ also has spin s = 3/2, and since the quarks are all spin 1/2, they are in a spin-symmetric configuration as well. Having exhausted all of the ∆'s quantum numbers, it seems to be a fully symmetric state of three quarks. However, the quarks are fermions, and basic quantum mechanics tells us that they have to exist in an anti-symmetric state. This is obviously a major problem! To fix this, we can postulate that the quark has some other property, let's call it color, and the ∆ is antisymmetric with respect to it. In fact, Maryland's Wally Greenberg was the first to introduce something equivalent to color, but it didn't catch on with the community in any significant way until Nambu re-formulated it in a more transparent fashion. The idea is that a quark (in addition to its flavor) comes in three colors: red, blue, and green. The strong interaction is taken to be completely symmetric with respect to color, i.e. if we arrange the colors as a column, QCD is invariant under transformations
$$q = \begin{pmatrix} q_r \\ q_b \\ q_g \end{pmatrix} \to Uq, \tag{474}$$
where $U$ is an $SU(3)$ matrix. We can understand confinement as the statement that all physical states are color neutral, or white. For example, baryons are configurations of three quarks that are antisymmetric with respect to color,
$$B \sim \epsilon_{abc}\,q_a q_b q_c.$$
Similarly, a meson is the color-neutral quark-antiquark combination
$$M \sim \bar{q}_a q_a.$$
To translate these ideas into mathematics, we start by introducing the quark field, $q$, which has three different sets of indices. First of all, the quark is a fermion, so we represent it as a Dirac spinor, and thus it has an index $\alpha = 1, 2, 3, 4$ which specifies the four spinor components. The quark comes in one of three flavors, so it has a flavor index $f = 1, 2, 3$ which tells us whether it is up, down, or strange. [45] Finally, it has a color index $a = 1, 2, 3$ which tells us what color the quark is. So, if we wanted we could write the quark field as $q_{\alpha,f,a}$. As we're about to see, it is the color index which is central to QCD, so we'll typically suppress the flavor and spinor indices and just write $q_a$. Color indices will always be taken from the beginning of the roman alphabet $a, b, c, \ldots$ and written downstairs.
[43] This is the statement that integer spin particles are bosons and half-integer spin particles are fermions. Basically, it comes from the fact that if you try to quantize a spinor (half-integer spin) field with (bosonic) commutation relations, or a scalar/vector (integer spin) field with (fermionic) anti-commutation relations, then very bad things happen.
[44] 1/2 + 1/2 + 1/2 = 3/2
Having established notation, we can now state the core ideas of QCD. We mentioned that the strong interaction is invariant under a rotation of the color basis as in (474). The idea of a non-Abelian gauge theory such as QCD is to promote this global ($U$ is independent of spacetime) symmetry to a much stronger local symmetry, where $U(x, t)$ is a spacetime-dependent $SU(3)$ matrix acting on the color indices. That is, we require that
$$q(x) \to q'(x) = U(x)\,q(x) \tag{477}$$
is a symmetry of the theory. This is completely analogous to requiring local phase invariance in E&M, just now we are dealing with the non-Abelian group $SU(3)$ rather than the Abelian group $U(1)$. Since the quarks are fermions they should be described by the Dirac Lagrangian,
$$\mathcal{L} = \bar{q}\left(i\gamma^\mu\partial_\mu - m\right)q.$$
[45] Again, we are neglecting the charm, top, and bottom quarks.
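Let's see how this behaves under the transformation (477),
$$\mathcal{L} \to \bar{q}\,U^\dagger\left(i\gamma^\mu\partial_\mu - m\right)Uq = \bar{q}\left(i\gamma^\mu\partial_\mu - m\right)q\;\textcolor{blue}{+\;i\,\bar{q}\gamma^\mu U^\dagger(\partial_\mu U)\,q},$$
where the extra term generated by the derivative acting on the color matrix is written in blue. Just as in the Abelian case, this blue term means that the Dirac Lagrangian is not gauge-invariant. To get rid of this we can use the same trick we used in QED, which was to introduce a covariant derivative $D_\mu$ that cancels the extra term. To do this, we need to introduce a gauge field $A_\mu$, which is analogous to the electromagnetic potential. The gauge covariant derivative is then
$$D_\mu = \partial_\mu + igA_\mu,$$
where $g$ is a coupling constant. Replacing $\partial \to D$ in the Dirac Lagrangian, we have
$$\mathcal{L} = \bar{q}\left(i\gamma^\mu D_\mu - m\right)q = \bar{q}\left(i\gamma^\mu\partial_\mu - g\gamma^\mu A_\mu - m\right)q. \tag{481}$$
By requiring this Lagrangian be gauge invariant, we can deduce the transformation of $A_\mu$ under a color rotation. It's not hard to show the correct transformation is
$$A_\mu \to A'_\mu = UA_\mu U^\dagger + \frac{i}{g}(\partial_\mu U)U^\dagger. \tag{482}$$
We can check this is correct by considering the transformation of (481),
$$\mathcal{L} \to \bar{q}\,U^\dagger\left(i\gamma^\mu\partial_\mu - g\gamma^\mu A'_\mu - m\right)Uq = \bar{q}\left(i\gamma^\mu\partial_\mu - m\right)q + i\,\bar{q}\gamma^\mu U^\dagger(\partial_\mu U)q - g\,\bar{q}\gamma^\mu A_\mu q - i\,\bar{q}\gamma^\mu U^\dagger(\partial_\mu U)q = \mathcal{L}.$$
So, everything does indeed work out and this version of the Dirac Lagrangian is gauge invariant. Let's now take a step back and pay a little more attention to this object $A_\mu$. First of all, notice that it is a $3 \times 3$ matrix in color space! The simplest way to see this is from the second term in the transformation (482), which is a product of $U$'s and is thus clearly a matrix.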
The gauge field must also be Hermitian, since a term like $\bar{q}(g\gamma^\mu A_\mu)q$ appears in the Lagrangian, and hence will also appear in the Hamiltonian, which is an observable quantity that must be represented by a Hermitian operator. It's easy enough to require $A_\mu$ be Hermitian, but it must stay Hermitian under gauge transformations, and the $i(\partial_\mu U)U^\dagger$ term in (482) looks like it could be a problem, in that it is not manifestly Hermitian. The key point is that $U$ is unitary, so $UU^\dagger = 1$. If we differentiate this, we obviously have $\partial_\mu 1 = 0$, so $\partial_\mu(UU^\dagger) = 0$ as well. Using the product rule, this means
$$(\partial_\mu U)U^\dagger + U(\partial_\mu U^\dagger) = 0.$$
Multiplying by $i$, we have
$$i(\partial_\mu U)U^\dagger = -iU(\partial_\mu U^\dagger). \tag{485}$$
The left-hand side is the term in the transformation we're worried about. Note that its Hermitian conjugate is
$$\left[i(\partial_\mu U)U^\dagger\right]^\dagger = -iU(\partial_\mu U^\dagger),$$
which is precisely the right-hand side of (485). So, $i(\partial_\mu U)U^\dagger = [i(\partial_\mu U)U^\dagger]^\dagger$, which is to say the term is Hermitian, and we have no problems. Further, $A_\mu$ must be traceless. The reason is that we are concerned with rotations in color space, and the trace just gives us an overall phase rotation that doesn't mix up the colors. So, in all we find that the gauge field $A_\mu$ is a $3 \times 3$ traceless Hermitian matrix.
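The Hermiticity argument above holds for any unitary $U(x)$, so it can be illustrated numerically without writing down a full $SU(3)$ matrix. The sketch below is a stand-in that uses a position-dependent $SU(2)$ matrix in one dimension, simply because it has an easy closed-form parametrization. The function `U` and its angle profiles are hypothetical choices, and the derivative is taken by finite differences. It checks that $i(\partial_x U)U^\dagger$ is Hermitian and traceless up to discretization error.

```python
import cmath
import math

h = 1e-5  # step size for central finite differences

def U(x):
    """A sample position-dependent SU(2) matrix (hypothetical choice)."""
    alpha, beta, gamma = math.sin(x), 0.5 * x * x, 0.3 + x
    a = cmath.exp(1j * alpha) * math.cos(gamma)
    b = cmath.exp(1j * beta) * math.sin(gamma)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def derivative(f, x):
    """Entry-wise central finite-difference derivative of a matrix function."""
    plus, minus = f(x + h), f(x - h)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

x0 = 0.4
dU = derivative(U, x0)
# X = i (dU/dx) U^dagger, the inhomogeneous piece of the gauge transformation.
X = [[1j * entry for entry in row] for row in matmul(dU, dagger(U(x0)))]
X_dagger = dagger(X)

hermiticity_error = max(abs(X[i][j] - X_dagger[i][j])
                        for i in range(2) for j in range(2))
trace_size = abs(X[0][0] + X[1][1])

print(hermiticity_error, trace_size)
```

Both numbers should be small, limited only by the $\mathcal{O}(h^2)$ error of the finite difference.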
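But, recall from our group theory review that the set of all $3 \times 3$ traceless Hermitian matrices comprises the Lie algebra of the group $SU(3)$! This means that $A_\mu$ lives in the Lie Algebra, and thus can be written as a linear combination of the generators of $SU(3)$, which are called the Gell-Mann matrices, and are denoted by $\lambda^a$ with $a = 1, \ldots, 8$. [46] You can think of these as the $SU(3)$ version of the Pauli matrices. So, we can write
$$A_\mu = A^a_\mu\,\lambda^a.$$
In case you're curious, the Gell-Mann matrices are
$$\lambda^1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda^2 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda^3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda^4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
$$\lambda^5 = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}, \quad \lambda^6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \lambda^7 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad \lambda^8 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}.$$
These are chosen such that they have the standard normalization $\mathrm{tr}(\lambda^a\lambda^b) = 2\delta^{ab}$, just like the Pauli matrices. Now, the $A_\mu$ field is a matrix in color space, while the $A^a_\mu$ are just scalars in color space, and are thus easier to work with. The spacetime index $\mu$ just tells us that all of these objects transform like vectors under Lorentz transformations, which is not our prime concern right now.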
Given the generators, we also know that we can write any $SU(3)$ color rotation as
$$U(\theta) = e^{i\theta^a\lambda^a},$$
and that they close under commutation, $[\lambda^a, \lambda^b] = if^{abc}\lambda^c$. The structure constants for $SU(3)$ are much more complicated than those of $SU(2)$. We won't write them all down, but for example $[\lambda^1, \lambda^2] = 2i\lambda^3$, $[\lambda^1, \lambda^4] = i\lambda^7$, and you can look up all of the rest. So far, we've discussed just a single quark, but it's trivial to generalize our theory to all of the flavors. We simply reinstate the flavor index and sum over it,
$$\mathcal{L} = \sum_f \bar{q}_f\left(i\gamma^\mu D_\mu - m_f\right)q_f,$$
where we've allowed the flavors to have different masses $m_f$. There's one final ingredient missing: the $A^a_\mu$ fields are not dynamical. That is, if we find the equations of motion for $A^a_\mu$, we just get
$$\sum_f \bar{q}_f\gamma^\mu\lambda^a q_f = 0.$$
This doesn't tell us anything about $A^a_\mu$, which is to say that it has no role in the dynamics of the theory. To give the gauge field dynamics we need to add a gauge invariant, renormalizable kinetic term (that is, it must include derivatives of $A^a_\mu$) to the Lagrangian. If we also want to maintain discrete symmetries, particularly time reversal, there is only one such term that exists:
$$\mathcal{L}_{\text{gauge}} = -\frac{1}{8}\,\mathrm{tr}\left(F_{\mu\nu}F^{\mu\nu}\right), \qquad F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + ig\,[A_\mu, A_\nu],$$
where the trace is over the color indices. We can simplify this by expanding
$$\begin{aligned}
F_{\mu\nu} &= \left(\partial_\mu A^a_\nu - \partial_\nu A^a_\mu\right)\lambda^a + ig\,A^b_\mu A^c_\nu\,[\lambda^b, \lambda^c] \\
&= \left(\partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc}A^b_\mu A^c_\nu\right)\lambda^a \\
&= F^a_{\mu\nu}\,\lambda^a.
\end{aligned}$$
In the last line, we've defined the components of the non-Abelian field strength,
$$F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc}A^b_\mu A^c_\nu,$$
in terms of which (using $\mathrm{tr}(\lambda^a\lambda^b) = 2\delta^{ab}$) we can rewrite the gauge field Lagrangian as simply
$$\mathcal{L}_{\text{gauge}} = -\frac{1}{4}F^a_{\mu\nu}F^{a\,\mu\nu},$$
which looks just like the Maxwell Lagrangian! Just like the electromagnetic potential field $A_\mu$ represents the photon in QED, the eight fields $A^a_\mu$ represent gluons in QCD, which are the massless gauge bosons that mediate the strong force, just as photons mediate the electromagnetic force. However, the extra terms in the gluon Lagrangian due to the non-Abelian nature of the $SU(3)$ gauge symmetry give the gluons a much richer set of interactions. First off, note that since the gluon fields $A^a_\mu$ carry a color index, they possess color charge, in contrast to the photon in QED which does not carry electric charge.
As a result, the gluons can have self-interactions.
To see this more concretely, let's consider some of the terms in the gluon Lagrangian and their corresponding diagrams. Denoting the gluon propagator as a curly line, we have a three gluon interaction (from the terms cubic in the gluon field), as well as a four gluon interaction (from the terms quartic in the gluon field). This means that the gluons are self-interacting in a non-Abelian gauge theory, owing to the fact that they carry color charge and can therefore interact via gluon exchange. Recall that this is not the case in QED: the photon does not carry charge, and thus there are no direct photon-photon interactions. The existence of the gluon-gluon interactions is responsible for many of the rich features of QCD, and non-Abelian gauge theories in general.
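The properties of the Gell-Mann matrices quoted in this section can be verified directly. The sketch below enters the eight standard Gell-Mann matrices by hand and, using only the standard library, checks that they are Hermitian, traceless, and normalized so that $\mathrm{tr}(\lambda^a\lambda^b) = 2\delta^{ab}$. It also extracts the coefficients in $[\lambda^a, \lambda^b] = if^{abc}\lambda^c$ from the trace formula $f^{abc} = \mathrm{tr}([\lambda^a, \lambda^b]\lambda^c)/2i$. Note the assumption about conventions: the function `f` follows the commutator as written in the text, with the $\lambda^a$ themselves as generators. Many textbooks instead use $T^a = \lambda^a/2$, for which the structure constants are half as large.

```python
import math

s3 = 1 / math.sqrt(3)
LAMBDA = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[s3, 0, 0], [0, s3, 0], [0, 0, -2 * s3]],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(3)] for i in range(3)]

def trace(a):
    return sum(a[i][i] for i in range(3))

def scale(c, a):
    return [[c * x for x in row] for row in a]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

hermitian = all(max_diff(m, dagger(m)) < 1e-12 for m in LAMBDA)
traceless = all(abs(trace(m)) < 1e-12 for m in LAMBDA)
normalized = all(
    abs(trace(matmul(LAMBDA[a], LAMBDA[b])) - (2 if a == b else 0)) < 1e-12
    for a in range(8) for b in range(8)
)

def f(a, b, c):
    """Coefficient in [lambda^a, lambda^b] = i f^{abc} lambda^c (labels 1..8)."""
    comm = commutator(LAMBDA[a - 1], LAMBDA[b - 1])
    return (trace(matmul(comm, LAMBDA[c - 1])) / 2j).real

# Direct checks of the two commutators quoted in the text.
err_12 = max_diff(commutator(LAMBDA[0], LAMBDA[1]), scale(2j, LAMBDA[2]))
err_14 = max_diff(commutator(LAMBDA[0], LAMBDA[3]), scale(1j, LAMBDA[6]))

print(hermitian, traceless, normalized)
print(f(1, 2, 3), f(1, 4, 7), err_12, err_14)
```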

Broken Symmetries
So far, these notes have mainly focused on the strong interaction, but the weak interaction is also of great interest if we wish to get a complete picture of fundamental physics. We arrived at QCD by promoting the global $SU(3)$ color symmetry to a local gauge symmetry, and one could reasonably hope that a similar procedure could apply to the weak interactions. After all, one of the failures of the Fermi theory (discussed briefly in section 9.6) was that it was non-renormalizable, and introducing gauge fields could rectify this. The second shortcoming of the Fermi theory was that it conserved parity, whereas since the 1950s the weak interaction had been known to violate parity: nature is left-handed. This means that any gauge field we introduce should couple only to left-handed currents, which for quarks would be
$$J^\mu_L = \bar{q}\,\gamma^\mu(1 - \gamma^5)\,F\,q, \tag{497}$$
where the $1 - \gamma^5$ projects out only the left-handed components, and $F$ is some flavor dependent matrix. However, in a gauge theory this seemingly harmless requirement raises major issues: recall from section 11 that for the Lagrangian (including the source term $A_\mu J^\mu$) to be gauge invariant, the current must be conserved, $\partial_\mu J^\mu = 0$. So, a gauge theory of the weak interactions must have $\partial_\mu J^\mu_L = 0$. We can write the left-handed current (497) as the difference between a vector and axial current,
$$J^\mu_L = V^\mu - A^\mu,$$
where
$$V^\mu = \bar{q}\,\gamma^\mu F\,q, \qquad A^\mu = \bar{q}\,\gamma^\mu\gamma^5 F\,q.$$
We now face two issues: first, a vector current is only conserved if the masses of all of the particles involved are the same. As we've discussed, the masses of the up and down quark are not quite the same, which is problematic if we want quarks to participate in weak interactions (we can't play the usual game of saying the symmetry is only approximate, as a gauge symmetry is a redundancy, not a physical symmetry, and thus must be exact in a consistent theory). Secondly, if we want the axial current to be conserved the particles must be massless (we saw this in section 7).
Neglecting any complications due to multiple flavors, a fermion field transforms under an axial rotation as
$$\psi \to e^{i\theta\gamma_5}\psi, \qquad \bar\psi \to \bar\psi\,e^{i\theta\gamma_5},$$
which, as is shown in one of the homework problems, implies that a mass term is not invariant under this transformation:
$$\bar\psi\psi \to \cos(2\theta)\,\bar\psi\psi + i\sin(2\theta)\,\bar\psi\gamma_5\psi \neq \bar\psi\psi.$$
Despite these apparent obstacles, it turns out that by introducing some new ideas we can still successfully formulate the weak interactions as a gauge theory. Namely, we can imagine that there is a symmetry of the fundamental theory which ensures that particles are massless, but which is spontaneously broken in the universe (and at the energy scales) that we observe. Understanding this notion of spontaneous symmetry breaking in field theories will occupy us for the rest of the section. We will begin by briefly considering spontaneous symmetry breaking in the more intuitive contexts of single particle physics and then condensed matter physics (where the ideas were initially developed, and remain a cornerstone of the field) before moving on to consider its role in fundamental physics.
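The non-invariance of the mass term under an axial rotation can be checked numerically. The following is a minimal sketch, assuming the standard Dirac representation of the gamma matrices and a random test spinor; the helper names `axial_rotate` and `bilinear` are illustrative and do not come from the notes.

```python
import numpy as np

# Dirac-representation matrices (an assumed, standard choice of basis).
I2 = np.eye(2)
Z2 = np.zeros((2, 2))
gamma0 = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
gamma5 = np.block([[Z2, I2], [I2, Z2]]).astype(complex)


def axial_rotate(psi, theta):
    """Apply psi -> exp(i*theta*gamma5) psi, using gamma5^2 = 1."""
    rotation = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * gamma5
    return rotation @ psi


def bilinear(psi, matrix=None):
    """Return psi-bar M psi, with psi-bar = psi^dagger gamma0."""
    if matrix is None:
        matrix = np.eye(4)
    return psi.conj() @ gamma0 @ matrix @ psi


rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)  # arbitrary test spinor
theta = 0.37                                        # arbitrary rotation angle

# Left side: mass bilinear of the rotated spinor.
lhs = bilinear(axial_rotate(psi, theta))
# Right side: cos(2 theta) psi-bar psi + i sin(2 theta) psi-bar gamma5 psi.
rhs = (np.cos(2 * theta) * bilinear(psi)
       + 1j * np.sin(2 * theta) * bilinear(psi, gamma5))

print("difference between the two sides:", abs(lhs - rhs))
print("rotated vs. original mass term:", lhs, bilinear(psi))
```

The two sides agree to machine precision for any angle, while the rotated bilinear generally differs from the original one, which is the statement that a mass term breaks the axial symmetry.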

Symmetry Breaking in Single Particle Physics
Let's start by considering a simple classical mechanics problem: a particle confined to the xy plane and subject to a rotationally symmetric potential $V(r)$, with $r^2 = x^2 + y^2$, whose minimum lies on the circle $r = R$. Here $R$ is a parameter with dimensions of length, and the overall scale of the potential, $V_0$, has dimensions of energy. This is called the Mexican hat or wine bottle potential (for those unused to wine, it turns out that red wine typically comes in bottles whose bottoms are raised in the center so that sediment can settle in the rim), and will appear in several different contexts within this section. In this case, it is simply the potential experienced by a single particle in real space. It is easiest to work in polar coordinates, where $x = r\cos\theta$, $y = r\sin\theta$, and the Lagrangian can be written
$$L = \tfrac{1}{2}m\left(\dot r^2 + r^2\dot\theta^2\right) - V(r).$$
Notice that the system is rotationally invariant (i.e. $\partial L/\partial\theta = 0$) and, in accordance with our discussion of Noether's theorem, the angular momentum $J = mr^2\dot\theta$ is conserved ($\dot J = 0$). Now, let us consider the ground state of the classical theory. We don't need to perform any calculations to realize that the system's energy will be minimized if the particle sits at rest at some point along the rim (i.e. the minimum) of the potential at $r = R$. However, notice that this state is not rotationally invariant: the particle must "pick" one point along the rim, and that point will move under rotations. This is in fact the definition of spontaneous symmetry breaking: the ground state of the system does not respect one of the symmetries of the Lagrangian. Note that this phenomenon is not generic to any conceivable system: if we instead had the potential $V(r,\theta) = r^2$, the particle would reside at $r = 0$ in the ground state, maintaining the rotational invariance of the Lagrangian.

Vibrational Modes
There is an interesting general result that holds as a consequence of symmetry breaking in classical single particle physics. Namely, symmetry breaking is associated with a zero-frequency vibrational mode (this is the baby version of Goldstone's theorem, which we discuss below). In classical mechanics, we can typically consider small deviations from the energy-minimizing configuration, and usually end up with oscillatory modes with a frequency
$$\omega = \sqrt{\frac{V''(r_0)}{m}},$$
where $r_0$ is the position of the minimum and $m$ is the mass of the particle (this is just a linearized restoring force). In the presence of a symmetry, these modes become degenerate: for example, the rotational symmetry of a bowl-shaped potential means that a particle climbing up the wall in the x direction, the y direction, or any linear combination of the two meets the same restoring force, and hence the oscillations have the same frequency. On the other hand, if the particle is in a Mexican hat potential, it can roll around the rim of the hat with no restoring force, corresponding to a zero-frequency mode. This is a useful picture to have in one's head when dealing with the more abstract realizations of symmetry breaking discussed in the remainder of this section.
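The zero-frequency mode can be seen numerically by diagonalizing the matrix of second derivatives of the potential at its minimum. The sketch below assumes one representative Mexican hat shape, $V = V_0\,(r^2/R^2 - 1)^2$, which may differ from the exact form used in these notes, and compares it with a bowl $V = V_0\,r^2/R^2$; the function names are hypothetical.

```python
import numpy as np

V0, R = 1.0, 1.0  # illustrative units


def mexican_hat(p):
    """Assumed hat-shaped potential with its minimum on the circle r = R."""
    x, y = p
    return V0 * ((x**2 + y**2) / R**2 - 1.0) ** 2


def bowl_potential(p):
    """Rotationally symmetric bowl with its minimum at the origin."""
    x, y = p
    return V0 * (x**2 + y**2) / R**2


def hessian(potential, point, step=1e-4):
    """Central finite-difference matrix of second derivatives at a point."""
    point = np.asarray(point, dtype=float)
    n = point.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = step
            ej[j] = step
            hess[i, j] = (potential(point + ei + ej) - potential(point + ei - ej)
                          - potential(point - ei + ej) + potential(point - ei - ej)
                          ) / (4 * step**2)
    return hess


def mode_frequencies_squared(potential, minimum, mass=1.0):
    """Squared small-oscillation frequencies: Hessian eigenvalues over the mass."""
    return np.linalg.eigvalsh(hessian(potential, minimum)) / mass


hat = mode_frequencies_squared(mexican_hat, (R, 0.0))
bowl = mode_frequencies_squared(bowl_potential, (0.0, 0.0))
print("Mexican hat omega^2:", hat)   # one value near zero, one near 8*V0/(m*R^2)
print("bowl omega^2:       ", bowl)  # two equal values
```

For the hat, one eigenvalue vanishes (motion along the rim) and the other is the radial oscillation; for the bowl, the two frequencies are degenerate, as the rotational symmetry requires.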
If we consider this problem quantum mechanically, it turns out that the ground state has zero angular momentum: the wave function spreads itself evenly about the rim of the potential in a rotationally invariant manner, and thus there is no spontaneous symmetry breaking. However, if we consider a system with an infinite number of degrees of freedom (i.e. a field theory), the story can become more interesting, owing to the fact that the ground state can become infinitely degenerate.

Symmetry Breaking and Phase Transitions
At some point in your statistical mechanics class, you've probably met the Ising model, which describes a bunch of "spins" living on a lattice (let's say in two dimensions) governed by the Hamiltonian
$$H = -J\sum_{\langle i,j\rangle}\sigma^z_i\,\sigma^z_j.$$
Here the spins $\sigma$ sit on the sites of the lattice, which are labeled by $i$. Each spin can take only one of two values: $\sigma^z_i = \pm 1$, i.e. it can point up or point down. To make things interesting, the spins can talk to their nearest neighbors through the interaction in the Hamiltonian, where the symbol $\langle i,j\rangle$ means that we should sum over all pairs of nearest neighbors (indicated as green links in the figure). If $J > 0$ the energy of a pair will be minimized if $\sigma_i\sigma_j = 1$, i.e. neighboring spins want to line up and point in the same direction, while if $J < 0$ the spins will want to anti-align. Notice that this Hamiltonian has a $Z_2$ symmetry, under which we can flip every spin on the lattice,
$$\sigma^z_i \to -\sigma^z_i \quad \text{for all } i.$$
Now, let's consider the ground state of the system, let's say for $J > 0$. The energy of the system will be minimized if all of the spins point in the same direction, either all up or all down. That is, we have two degenerate ground states. However, if all of the spins point in the same direction, the ground state is no longer invariant under the $Z_2$ transformation: flipping all of the spins turns one ground state into the other! So, the symmetry of the Hamiltonian is not a symmetry of the ground state, and thus we say the $Z_2$ symmetry has been spontaneously broken. If we consider this system at high temperatures, it is unlikely that we will find it in its ground state. This is simply because entropy now comes into play and there are many more configurations with random spin distributions than there are configurations with all of the spins aligned.
Such "random" states will respect the $Z_2$ symmetry of the Hamiltonian, since on average there will be equal numbers of up and down spins or, put another way, the average spin of the system, or magnetization, $\langle\sigma_i\rangle \equiv m$, will be zero. This disordered high temperature state is qualitatively very different from the ordered ground state. In fact, they represent two distinct phases of matter which we can distinguish by their symmetry properties. At high temperatures the system respects the $Z_2$ symmetry and the magnetization vanishes, so we call this the paramagnetic state, while at low temperatures the spins line up, breaking the $Z_2$ symmetry and giving rise to a nonzero magnetization, in what is called the ferromagnetic state. This structure is generic, and is the foundation for the modern theory of phase transitions, as initially developed by Lev Landau in the 1930s.
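The two degenerate ground states and the $Z_2$ symmetry relating them can be made explicit by brute force on a tiny lattice. This is a minimal sketch, assuming a 3×3 square lattice with open boundaries and $J = 1$ (choices made here only to keep the enumeration small); the function names are illustrative.

```python
import itertools
import numpy as np


def ising_energy(spins, coupling=1.0):
    """H = -J * sum over nearest-neighbour pairs on a square lattice, open boundaries."""
    horizontal = np.sum(spins[:, :-1] * spins[:, 1:])
    vertical = np.sum(spins[:-1, :] * spins[1:, :])
    return -coupling * float(horizontal + vertical)


def ground_states(rows, cols, coupling=1.0):
    """Enumerate all 2^(rows*cols) configurations and keep the minimum-energy ones."""
    best_energy, best = None, []
    for values in itertools.product((-1, 1), repeat=rows * cols):
        spins = np.array(values).reshape(rows, cols)
        energy = ising_energy(spins, coupling)
        if best_energy is None or energy < best_energy - 1e-12:
            best_energy, best = energy, [spins]
        elif abs(energy - best_energy) < 1e-12:
            best.append(spins)
    return best_energy, best


energy, states = ground_states(3, 3)
magnetizations = sorted(float(s.mean()) for s in states)
print("ground-state energy:", energy)
print("number of ground states:", len(states))
print("their magnetizations:", magnetizations)

# The Hamiltonian itself is Z2 symmetric: flipping every spin leaves the energy unchanged.
rng = np.random.default_rng(1)
sample = rng.choice((-1, 1), size=(3, 3))
print("H(s) == H(-s):", ising_energy(sample) == ising_energy(-sample))
```

The enumeration finds exactly two ground states with opposite magnetization, each of which is mapped into the other by the global spin flip that leaves the Hamiltonian unchanged.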
In fact, it's very easy to extend this to more sophisticated models. If we allow the spins to now point in any direction in three-dimensional space, which amounts to replacing the binary variables $\sigma^z_i$ on each site with unit vectors $\mathbf{S}_i$, we have the Heisenberg model,
$$H = -J\sum_{\langle i,j\rangle}\mathbf{S}_i\cdot\mathbf{S}_j.$$
This model has a continuous O(3) rotational symmetry (under which $\mathbf{S}_i \to R\,\mathbf{S}_i$ with $R$ an orthogonal matrix), reflecting the spatial isotropy of the system. However, in the ground state all of the spins align, "choosing" a particular axis and breaking the rotational symmetry of the Hamiltonian. Now, there are an infinite number of different ground states, each corresponding to a different magnetization axis. This can be visualized using the Mexican hat potential, where we now interpret the radial direction as the magnitude of the magnetization (which is fixed in the ground state and determined by the parameters of the system) and the angular direction as the possible spatial orientations of the magnetization vector. The rim of the hat corresponds to the degenerate ground state manifold, each point along it representing a different possible ground state configuration. Note that now the Mexican hat lives in the space of possible magnetization vectors, not real space. The Mexican hat potential will inhabit an analogous space of possible field configurations in particle physics models, to which we will now turn.

Symmetry Breaking in Field Theories
In the previous section, we saw that simple models of magnets can have multiple degenerate ground states, and spontaneously break the symmetries of their Hamiltonian. The same can occur in models of fundamental physics, in which case the different ground states correspond to different vacua, all of which are equally valid but quantum mechanically disconnected vacuum states of our universe. The notion of spontaneous symmetry breaking was first introduced to field theory in an effort to explain why the pion was anomalously light (the pion has a mass of ∼ 135 MeV, which is roughly five times lighter than other non-strange mesons). To explain this, it was conjectured that the strong interactions enjoyed a second approximate symmetry (in addition to isospin) that was then spontaneously broken by our vacuum. To illustrate this idea, we will consider a toy model initially introduced by Gell-Mann and Lévy, defined by the Lagrangian
$$\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\sigma\,\partial^\mu\sigma + \tfrac{1}{2}\,\partial_\mu\boldsymbol{\pi}\cdot\partial^\mu\boldsymbol{\pi} - \frac{\lambda}{4}\left(\sigma^2 + \boldsymbol{\pi}\cdot\boldsymbol{\pi} - f^2\right)^2, \tag{508}$$
where $\sigma$ is a (Lorentz) scalar field which also transforms as a scalar under isospin rotations and $\boldsymbol{\pi} = (\pi_x, \pi_y, \pi_z)$ are (Lorentz) pseudo-scalar fields which transform as a vector under isospin rotations. We also have the parameters $\lambda$ and $f$ and, to be explicit, $\boldsymbol{\pi}\cdot\boldsymbol{\pi} = \pi_x^2 + \pi_y^2 + \pi_z^2$. This Lagrangian is invariant under O(4) rotations of the $\sigma$ and $\boldsymbol{\pi}$ fields into one another. That is, we can mix up the definitions of the fields using orthogonal matrices, for example
$$\sigma \to \cos\alpha\,\sigma + \sin\alpha\,\pi_x, \qquad \pi_x \to -\sin\alpha\,\sigma + \cos\alpha\,\pi_x.$$
In the vacuum state, the energy of the system will be minimized, including the potential term
$$V = \frac{\lambda}{4}\left(\sigma^2 + \boldsymbol{\pi}\cdot\boldsymbol{\pi} - f^2\right)^2.$$
This is of course minimized when $\sigma^2 + \boldsymbol{\pi}\cdot\boldsymbol{\pi} = f^2$. This condition specifies a three-dimensional surface in the four-dimensional space of possible field configurations, and can be thought of as the higher-dimensional analogue of the rim of the Mexican hat shown above. Any point along the surface represents a different vacuum state, and in "choosing" one to occupy, the O(4) symmetry of the Lagrangian is spontaneously broken.
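Although any such point is equally valid, for the sake of concreteness it is useful to pick a particular vacuum, let's say along the $\sigma$ direction. That is, we can write
$$\sigma = f + \delta\sigma,$$
where $f$ is the vacuum value of the field, and the fluctuations around it, $\delta\sigma$, are what we see as the physical field. Substituting this decomposition back into the Lagrangian (508),
$$\mathcal{L} = \tfrac{1}{2}\,\partial_\mu\delta\sigma\,\partial^\mu\delta\sigma + \tfrac{1}{2}\,\partial_\mu\boldsymbol{\pi}\cdot\partial^\mu\boldsymbol{\pi} - \frac{\lambda}{4}\left(2f\,\delta\sigma + \delta\sigma^2 + \boldsymbol{\pi}\cdot\boldsymbol{\pi}\right)^2.$$
If we expand the potential term, we can see the full set of interactions contained in the theory,
$$V = \lambda\left[f^2\,\delta\sigma^2 + f\,\delta\sigma^3 + \tfrac{1}{4}\,\delta\sigma^4 + \tfrac{1}{4}\,(\boldsymbol{\pi}\cdot\boldsymbol{\pi})^2 + f\,\delta\sigma\,\boldsymbol{\pi}\cdot\boldsymbol{\pi} + \tfrac{1}{2}\,\delta\sigma^2\,\boldsymbol{\pi}\cdot\boldsymbol{\pi}\right].$$
The first term tells us that the $\delta\sigma$ field has a mass given by $m_\sigma^2 = 2\lambda f^2$, and the remaining terms encode the allowed processes. Notably, there is no $\boldsymbol{\pi}\cdot\boldsymbol{\pi}$ term, which would indicate that the pion is massive. That is, the spontaneous symmetry breaking has rendered the pion massless! This is actually a general result, known as Goldstone's theorem: whenever a continuous symmetry is spontaneously broken, there are massless particles in the spectrum (in the absence of long-ranged forces). The massless particles, in this case the pions, are called Goldstone bosons.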
You may have noticed that we left out the four-pion interaction in (515). This is because there is no pion-pion scattering at zero momentum transfer in the theory with spontaneously broken symmetry. Recalling that the momentum-space propagator for a scalar field is $1/(q^2 - m^2)$ and that the mass of the $\sigma$ is given by $m_\sigma^2 = 2\lambda f^2$, we can evaluate the tree-level diagrams: the direct four-pion contact interaction and the diagrams in which a $\sigma$ is exchanged. Summing these contributions, we see that the matrix element for this process is zero. This is also a generic result: Goldstone bosons do not couple to themselves or other particles at zero momentum transfer.
Of course, in reality things are not so simple: the pion isn't massless! To reflect this, we should suppose that there is a small explicit breaking of the O(4) symmetry: that is, that the symmetry is only approximate. Then, the approximate symmetry is spontaneously broken, as discussed above. This can be visualized as tipping the Mexican hat to favor a particular point along the rim. It can be implemented by adding to the Lagrangian a small symmetry-breaking term whose size is set by a (small) energy scale $\Lambda$. In the presence of this term, the O(4) symmetry is only approximate, and the pions are light, but not massless, and are called pseudo-Goldstone bosons. This also means that pion-pion scattering is not exactly zero at zero momentum, but is still very small.
Although this is only a toy model, not an accurate depiction of reality, it does possess some of the gross features of QCD, in particular its pattern of symmetry breaking. Without the small explicit symmetry breaking, the theory has an O(4) symmetry under rotations of $\sigma$ and $\boldsymbol{\pi}$. The group O(4) has six generators (recall our brief discussion of group theory in the previous section), which correspond to six conserved currents. In this particular model, the currents are
$$J^a_\mu = \epsilon^{abc}\,\pi^b\,\partial_\mu\pi^c, \qquad J^a_{5,\mu} = \sigma\,\partial_\mu\pi^a - (\partial_\mu\sigma)\,\pi^a.$$
The first three are the standard conserved currents associated with isospin invariance (the indices a, b, c = 1, 2, 3 run over the components of the pion in isospin-space), while the second three are axial currents which arise from our ability to mix the $\sigma$ with the three pions. It is important to note that the spatial components of the axial currents have positive parity, which is the opposite of the negative parity one usually expects of a vector (recall that under parity $\mathbf{x} \to -\mathbf{x}$). Such objects are often called "axial vectors" or "pseudovectors". The magnetic field is an example of an axial vector. Returning to our model, we can write down the conserved charges associated with these currents,
$$Q^a = \int d^3x\, J^a_0, \qquad Q^a_5 = \int d^3x\, J^a_{5,0}, \tag{521}$$
which form an algebra in that they close under commutation,
$$[Q^a, Q^b] = i\epsilon^{abc}Q^c, \qquad [Q^a, Q^b_5] = i\epsilon^{abc}Q^c_5, \qquad [Q^a_5, Q^b_5] = i\epsilon^{abc}Q^c.$$
When the symmetry is spontaneously broken, the vacuum expectation value of $\sigma$ is nonzero, $\langle\sigma\rangle = f$, while $\langle\boldsymbol{\pi}\rangle = 0$. Now, the vacuum is no longer invariant under an axial rotation which mixes $\sigma$ into $\boldsymbol{\pi}$, since it will change the observables $\langle\sigma\rangle$ and $\langle\boldsymbol{\pi}\rangle$. That is, in the symmetry-broken vacuum the axial rotation symmetry is spontaneously broken: the axial charges no longer leave the vacuum invariant, even though the currents derived from the symmetric Lagrangian remain conserved. This corresponds to breaking three of the generators of the O(4) group, which by Goldstone's theorem implies that we will have three Goldstone bosons: the pions. Adding back the small explicit breaking will give them a small mass, as explained earlier.
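The statement made earlier, that tree-level pion-pion scattering vanishes at zero momentum, can be checked with a short calculation. The following is a sketch under simplifying assumptions that are not taken from the notes: a single pion component $\pi$, all external momenta set to zero (so that $s = t = u = 0$), the potential $V = \frac{\lambda}{4}(\sigma^2 + \pi^2 - f^2)^2$ expanded about $\sigma = f + \delta\sigma$, and the convention that each vertex equals $i$ times the Lagrangian coefficient times the number of permutations of identical legs.

```latex
% Assumptions (not from the source): one pion component, zero external momenta,
% vertex = i * (Lagrangian coefficient) * (permutation factor).
\begin{align*}
\mathcal{L}_{\text{int}} &\supset -\lambda f\,\delta\sigma\,\pi^2 - \frac{\lambda}{4}\,\pi^4 \\
\text{contact vertex:}\quad i\mathcal{M}_{4} &= -i\,\frac{\lambda}{4}\cdot 4! = -6i\lambda \\
\text{cubic vertex:}\quad V_3 &= -i\,\lambda f\cdot 2! = -2i\lambda f \\
\text{$\sigma$ propagator at } q^2 = 0:\quad \frac{i}{q^2 - m_\sigma^2} &= \frac{-i}{2\lambda f^2}
  \qquad (m_\sigma^2 = 2\lambda f^2) \\
\text{one exchange channel:}\quad i\mathcal{M}_{\sigma} &= (-2i\lambda f)^2\,\frac{-i}{2\lambda f^2} = +2i\lambda \\
\text{sum over } s, t, u:\quad i\mathcal{M} &= -6i\lambda + 3\,(2i\lambda) = 0
\end{align*}
```

The contact repulsion is cancelled exactly by the three $\sigma$-exchange diagrams, and the cancellation relies on the relation $m_\sigma^2 = 2\lambda f^2$ that the symmetry imposes between the mass and the couplings.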
As any group theorist can tell you, O(4) is locally isomorphic to SU(2) × SU(2) (that is, the two groups share a structurally identical algebra of generators). In fact, one can construct linear combinations of the conserved charges (521) such that they break into two groups which close among themselves. Defining the left- and right-handed (or chiral) charges to be
$$Q^a_L = \tfrac{1}{2}\left(Q^a - Q^a_5\right), \qquad Q^a_R = \tfrac{1}{2}\left(Q^a + Q^a_5\right),$$
one can show that their commutation relations are simply
$$[Q^a_L, Q^b_L] = i\epsilon^{abc}Q^c_L, \qquad [Q^a_R, Q^b_R] = i\epsilon^{abc}Q^c_R, \qquad [Q^a_L, Q^b_R] = 0.$$
The case is extremely similar in the actual theory of the strong interactions, QCD. In fact, this model has the same approximate symmetry as QCD in the limit that the up and down quarks are massless. Again, the up and down quarks are not actually massless, but they are very light, light enough that this is a very useful idealization. To be concrete, let's consider QCD with just the up and down quarks (the others are irrelevant for this line of argument), with the Lagrangian
$$\mathcal{L} = \bar q\, i\gamma^\mu D_\mu\, q - \tfrac{1}{2}(m_u + m_d)\,\bar q q - \tfrac{1}{2}(m_u - m_d)\,\bar q\,\tau_3\, q, \tag{525}$$
where $q = (u, d)$, so the second two terms are just a more symmetric way of writing $-m_u\bar u u - m_d\bar d d$. Under a (vector) isospin rotation through a small angle $|\theta| \ll 1$, the quark fields transform as
$$q \to e^{i\theta\cdot\tau}q \approx q + i\,\theta\cdot\tau\, q, \qquad \bar q \to \bar q\, e^{-i\theta\cdot\tau} \approx \bar q - i\,\bar q\,\theta\cdot\tau.$$
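Under this rotation the kinetic term and the $\bar q q$ mass term are unchanged, while the bilinear appearing in the mass-difference term, which enters the Lagrangian multiplied by $-\tfrac{1}{2}(m_u - m_d)$, becomes
$$\begin{aligned} \bar q\,\tau_3\, q &\to \bar q\, e^{-i\theta\cdot\tau}\,\tau_3\, e^{i\theta\cdot\tau}\, q \\ &\approx \bar q\,\tau_3\, q + i\,\bar q\,[\tau_3, \theta\cdot\tau]\, q \\ &= \bar q\,\tau_3\, q - 2\,\bar q\,(\theta_1\tau_2 - \theta_2\tau_1)\, q. \end{aligned}$$
We dropped terms of order $\theta^2$ in the second line, and used some SU(2) identities to get to the third. If $m_u \approx m_d$ this extra term is small, and the vector rotation is an approximate symmetry of the theory. Of course, this is nothing but isospin invariance! We can now consider axial rotations, under which the quarks transform (again for small $\theta$) as
$$q \to e^{i\theta\cdot\tau\gamma_5}q \approx q + i\,\theta\cdot\tau\gamma_5\, q, \qquad \bar q \to \bar q\, e^{i\theta\cdot\tau\gamma_5} \approx \bar q + i\,\bar q\,\theta\cdot\tau\gamma_5.$$
Note that, unlike in the vector case, the exponent for $\bar q$ has the same sign as that for $q$, because $\gamma_5$ anticommutes with the $\gamma^0$ in $\bar q = q^\dagger\gamma^0$.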
In this case, neither of the last two terms of (525) is invariant, and the axial rotation generates the extra terms (to first order in $\theta$)
$$\delta\mathcal{L} = -i(m_u + m_d)\,\bar q\,\theta\cdot\tau\gamma_5\, q - i(m_u - m_d)\,\theta_3\,\bar q\gamma_5 q.$$
Given the first term, the axial rotation is only a symmetry if both the up and down quarks are massless. Invariance under axial rotations is known as chiral symmetry. In reality, the quarks are light compared to other hadronic scales, so chiral symmetry can be thought of as an approximate symmetry of QCD. This approximate symmetry is also spontaneously broken in our vacuum. The three broken generators correspond to three pseudo-Goldstone bosons (the technical name for particles that would be massless in the absence of a small explicit symmetry breaking); these are, in fact, the pions seen in nature.

The Higgs Mechanism
We started the last section discussing the challenges of framing the weak interactions as a gauge theory. In particular, such a formulation would require the conservation of axial currents, which in turn requires that the particles involved are massless (as we just saw). Of course, this is a problem since not all particles which participate in weak interactions are massless. It is sensible to try to use the mechanism of spontaneous symmetry breaking to rectify this; however, the situation is slightly more subtle in the case of gauge symmetries. For one, the symmetry must be exact (since gauge symmetries always are), and we run into the slightly confusing issue of what it actually means to "break" a gauge symmetry: after all, we introduced the gauge symmetry as a redundancy in our description, not a physical symmetry. It turns out that the math behind spontaneous gauge symmetry breaking (otherwise known as the Higgs mechanism) in the electroweak sector of the standard model gets rather messy. To avoid complications, we will consider here a simpler model called the Abelian Higgs model, which demonstrates many of the important concepts without excessive amounts of algebra. In particular, it shows how a gauge theory without massive gauge bosons at the level of the underlying Lagrangian can have massive gauge bosons in the physical spectrum. The model has a scalar field coupled to a U(1) gauge field, as opposed to the actual theory of the electroweak interactions, which has the gauge group SU(2) × U(1). We'll start by first considering breaking a global U(1) symmetry before moving on to the Higgs mechanism in the gauged theory. Afterwards, we'll briefly note what is different in the more complicated case of electroweak symmetry breaking.

Global U (1) Symmetry Breaking
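Consider a complex scalar field, $H$, with a Mexican hat potential,
$$\mathcal{L} = \partial_\mu H^*\,\partial^\mu H - \frac{\lambda^2}{2}\left(H^*H - v^2\right)^2.$$
Instead of using the fields $H$ and $H^*$, it will be convenient to parameterize the two degrees of freedom of the complex scalar field by two real fields $h$ and $\theta$, corresponding to the amplitude and phase:
$$H = h\,e^{i\theta}.$$
The field $h$ represents fluctuations in the radial direction of the Mexican hat, while $\theta$ specifies the angular position along the rim (see figure). In terms of these fields, the derivatives of $H$ and $H^*$ are
$$\partial_\mu H = \left(\partial_\mu h + i h\,\partial_\mu\theta\right)e^{i\theta}, \qquad \partial_\mu H^* = \left(\partial_\mu h - i h\,\partial_\mu\theta\right)e^{-i\theta},$$
and thus the kinetic term becomes
$$\partial_\mu H^*\,\partial^\mu H = \partial_\mu h\,\partial^\mu h + h^2\,\partial_\mu\theta\,\partial^\mu\theta,$$
and the Lagrangian is
$$\mathcal{L} = \partial_\mu h\,\partial^\mu h + h^2\,\partial_\mu\theta\,\partial^\mu\theta - \frac{\lambda^2}{2}\left(h^2 - v^2\right)^2. \tag{536}$$
In the vacuum state, the energy of the system will be minimized. To minimize the potential energy (the last term above), we must have $h = v$. That is, the field $h$ acquires a nonzero vacuum expectation value, breaking the global U(1) symmetry since rotating $H \to e^{i\alpha}H$ is not a symmetry of the vacuum. We can then write the $h$ field as its vacuum value plus fluctuations around it, which become the physical field,
$$h = v + \delta h.$$
Plugging this expansion into the Lagrangian (536), we have
$$\mathcal{L} = \partial_\mu(\delta h)\,\partial^\mu(\delta h) + v^2\,\partial_\mu\theta\,\partial^\mu\theta + 2v\,\delta h\,\partial_\mu\theta\,\partial^\mu\theta + \delta h^2\,\partial_\mu\theta\,\partial^\mu\theta - \frac{\lambda^2}{2}\left(4v^2\,\delta h^2 + 4v\,\delta h^3 + \delta h^4\right).$$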
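This is fairly complicated, but if we want to consider the behavior of the theory near the vacuum, we can assume $\delta h$ is small and only keep terms to second order in the fluctuations, leaving us with
$$\mathcal{L} = \partial_\mu(\delta h)\,\partial^\mu(\delta h) + v^2\,\partial_\mu\theta\,\partial^\mu\theta - 2\lambda^2 v^2\,\delta h^2 + \ldots$$
We see that there is no term quadratic in $\theta$ (without derivatives), implying that the field is massless. That is, $\theta$ is the Goldstone boson associated with the broken U(1) symmetry. On the other hand, there is a term quadratic in $\delta h$, indicating that it is massive. Recall that a canonically normalized real scalar field $\phi$ has the Lagrangian $\tfrac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \tfrac{1}{2}m^2\phi^2$. Our kinetic term for $\delta h$ is twice the canonical one, so that $\phi = \sqrt{2}\,\delta h$ and $2\lambda^2v^2\,\delta h^2 = \tfrac{1}{2}(2\lambda^2v^2)\,\phi^2$. We can therefore read off the mass of $\delta h$ to be $m = \sqrt{2}\,\lambda v$.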
So, we see $\delta h$ is not a Goldstone boson, but rather a massive scalar field (sometimes called the amplitude Higgs mode). This is to be expected, since the broken U(1) symmetry has only one generator and thus Goldstone's theorem gives us only one massless particle. With this structure in mind, let us now consider the gauged version of the theory.

The Abelian Higgs Model
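Let's now suppose the field $H$ is charged. We should then replace the derivatives with covariant derivatives, $D_\mu \equiv \partial_\mu + ieA_\mu$, and add a dynamical term for the gauge field. With no symmetry breaking this model is usually called scalar QED, and has the Lagrangian
$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + (D_\mu H)^*(D^\mu H) - m^2 H^*H.$$
We can read off the interactions in this theory by expanding the covariant derivatives: there are cubic terms such as $ie\,\partial_\mu H^*\,A^\mu H$, and a quartic term $e^2 A_\mu A^\mu H^*H$ (542), where in the corresponding diagrams the dashed line is the $H$ propagator and the wavy line is the gauge field propagator. We can also see that it has a conserved Noether current (544). We can now ask what happens when we replace the simple mass term for the $H$ field with a Mexican hat potential, changing the Lagrangian to
$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + (D_\mu H)^*(D^\mu H) - \frac{\lambda^2}{2}\left(H^*H - v^2\right)^2.$$
Let's notice two things immediately. First of all, the Mexican hat potential is (perturbatively) renormalizable, so adding it is a sensible thing to do. Also, you can check that the Noether current (544) is still conserved, which might not be what one would suspect for a theory with a spontaneously broken symmetry. As in our previous analysis of a theory with a broken global symmetry, let's write the scalar field as an amplitude and phase, $H = h\,e^{i\theta}$. Using the derivatives of $H$ calculated above, we can rewrite the Lagrangian as
$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + \partial_\mu h\,\partial^\mu h + h^2\left(\partial_\mu\theta + eA_\mu\right)\left(\partial^\mu\theta + eA^\mu\right) - \frac{\lambda^2}{2}\left(h^2 - v^2\right)^2.$$
Following the same argument as before, the vacuum wants to minimize its energy, and to minimize the potential the amplitude field $h$ will be pinned to $v$, prompting us to expand $h = v + \delta h$. Plugging this expansion into the Lagrangian will give us something very messy and complicated, so we'll only keep terms to second order in $\delta h$, $A_\mu$ and $\partial_\mu\theta$, all of which should be small for states in proximity to the vacuum. We then have
$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + \partial_\mu(\delta h)\,\partial^\mu(\delta h) + v^2\left(\partial_\mu\theta + eA_\mu\right)\left(\partial^\mu\theta + eA^\mu\right) - 2\lambda^2v^2\,\delta h^2 + \ldots,$$
where the three dots indicate terms beyond quadratic order in $\delta h$, $A_\mu$ or $\theta$. Notice that $\theta$ only appears in combination with $A_\mu$, leading us to change variables and define a new vector field
$$B_\mu = A_\mu + \frac{1}{e}\,\partial_\mu\theta.$$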
Notice that I called this a vector field, not a gauge field. This is because under a gauge transformation, $A_\mu \to A_\mu - \partial_\mu\Lambda$ and $H \to e^{ie\Lambda}H$, which implies that the phase $\theta$ transforms as $\theta \to \theta + e\Lambda$. Together, this implies the transformation property of the new field $B_\mu$ is
$$B_\mu \to A_\mu - \partial_\mu\Lambda + \frac{1}{e}\,\partial_\mu(\theta + e\Lambda) = B_\mu.$$
That is, it is a gauge-invariant vector field! Notice also that $F_{\mu\nu}$ is invariant under this field redefinition, since
$$\partial_\mu B_\nu - \partial_\nu B_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu + \frac{1}{e}\left(\partial_\mu\partial_\nu - \partial_\nu\partial_\mu\right)\theta = F_{\mu\nu}.$$
In terms of this new vector field, the Lagrangian is
$$\mathcal{L} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + \partial_\mu(\delta h)\,\partial^\mu(\delta h) + e^2v^2\,B_\mu B^\mu - 2\lambda^2v^2\,\delta h^2 + \ldots$$
We can now stop and take stock of the field content of our theory. We have a massive Higgs field $\delta h$ as in the previous example, but we also notice that two remarkable things have happened: first of all, the Goldstone mode $\theta$ has completely disappeared! Secondly, the "photon" field $A_\mu$, now called $B_\mu$, has acquired a mass, given the presence of the term $e^2v^2\,B_\mu B^\mu$, which is quadratic in $B_\mu$. Given that our previous discussions of gauge theories led us to conclude that a photon mass is prohibited by gauge invariance, this is quite a surprise. It is also important to note that the Lagrangian is still gauge invariant, and has been throughout our discussion. These two results (the vanishing of the Goldstone mode and the generation of mass for the gauge field) are intimately related, and collectively referred to as the Higgs mechanism (see footnote 47).
To see their relation, let's count degrees of freedom. A propagating massless photon has two polarizations (the two directions perpendicular to the direction of propagation), which correspond to two degrees of freedom, and the Goldstone mode $\theta$ is a real scalar field, which has one degree of freedom. Meanwhile, the massive vector field that we end up with has two transverse polarizations as well as a longitudinal polarization, corresponding to three degrees of freedom. The interpretation is now clear: the would-be Goldstone boson is absorbed into the photon field, giving it an extra polarization and a mass. Or, as Sidney Coleman famously said, the gauge field eats the Goldstone boson and becomes fat.
One implication of the gauge field acquiring a mass is that the interaction it mediates becomes finite-ranged. We can see this intuitively by appealing to our knowledge of the Yukawa force: the potential generated by the exchange of a particle of mass $m$ decays like $e^{-mr}$, so the interaction has a range of order $1/m$.
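To get a feel for the numbers, the range $1/m$ can be converted to a length using $\hbar c \approx 197.327$ MeV fm. The sketch below is illustrative only: the function names are hypothetical, the pion mass is the approximate value of 135 MeV quoted earlier in these notes, and the W boson mass of roughly 80.4 GeV is an approximate value supplied here for comparison rather than taken from the notes.

```python
import math

HBAR_C_MEV_FM = 197.327  # hbar * c in MeV * fm (conversion constant)


def yukawa_range_fm(mass_mev):
    """Range 1/m of a force mediated by a particle of the given mass, in femtometres."""
    if mass_mev <= 0:
        raise ValueError("a massless mediator gives an infinite-range force")
    return HBAR_C_MEV_FM / mass_mev


def yukawa_suppression(distance_fm, mass_mev):
    """Exponential factor exp(-m r) that multiplies the 1/r potential."""
    return math.exp(-distance_fm / yukawa_range_fm(mass_mev))


PION_MASS_MEV = 135.0      # approximate value quoted in the text
W_MASS_MEV = 80.4e3        # approximate value, assumed here for illustration

print("pion-exchange range [fm]:", yukawa_range_fm(PION_MASS_MEV))
print("W-exchange range [fm]:   ", yukawa_range_fm(W_MASS_MEV))
print("W-exchange suppression at 1 fm:", yukawa_suppression(1.0, W_MASS_MEV))
```

Pion exchange reaches out to roughly the size of a nucleon, whereas the range set by the W mass is hundreds of times shorter, which is why the weak interaction looks like a contact interaction at nuclear distances.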

Aside: The Meissner Effect
One might ask if the Abelian Higgs model actually describes anything in the real world. As far as particle physics is concerned, the answer is currently no, but it turns out that the non-relativistic limit of the Abelian Higgs model is a realistic model of a superconductor, called the Ginzburg-Landau model (which predates the Higgs mechanism by over ten years). More broadly, superconductivity is really best understood as the Higgs phase of electromagnetism (that is, we interpret the gauge field as the actual photon). One of the most striking properties of superconductors is the expulsion of magnetic flux, or Meissner effect. Essentially, this is the statement that magnetic fields cannot exist inside a superconductor (and the reason those superconducting levitation demonstrations work). This can easily be understood in terms of the Higgs mechanism: if the photon has a mass and decays like $e^{-mr}$ in the superconductor, magnetic fields will be exponentially suppressed in the bulk of the material (Why not the electric field too? Because superconductors are non-relativistic so the two are not treated equally, and I am intentionally sweeping a lot of complications under the rug for the sake of simplicity). In the field of superconductivity, the length scale over which the magnetic field can penetrate is called the London penetration depth, $\lambda_L = 1/m$.
Footnote 47: This idea is not just due to Higgs. In the context of particle physics it was independently developed by a number of people including Higgs, Kibble, Englert, and Polyakov. However, the idea was actually first developed by Philip Anderson several years earlier in the study of superconductivity. In light of this, the condensed matter community (and the historically accurate) tend to refer to this as the Anderson-Higgs mechanism.

Electroweak Symmetry Breaking
The Abelian Higgs model we've discussed so far is really just a toy model, and doesn't play any role in the standard model or particle physics in general. However, the Higgs mechanism is central to the electroweak sector of the standard model, as originally formulated by Weinberg and Salam in the late 1960s. In this model, the gauge symmetry acts only on the left-handed quarks, electrons, and neutrinos, but not on the right-handed components. As we've seen, this chiral decomposition implies that the quarks are massless. To implement the SU(2) × U(1) gauge symmetry, we introduce four gauge fields (three for the SU(2) and one for the U(1)), which are massless as required by gauge invariance. We then add a scalar Higgs field which couples to the gauge fields as well as to the fermions through a Yukawa-like interaction. The Higgs field lives in a Mexican hat potential, so it acquires a non-zero vacuum expectation value, spontaneously "breaking" the symmetry. Schematically, this gives us terms like $h\,\bar\psi\psi$ for the fermions, which for $\langle h\rangle \neq 0$ corresponds to a mass term. In this way, the quarks and other particles acquire a mass via the Higgs mechanism.
The story with the gauge fields is a little more complicated, because the SU(2) × U(1) symmetry is not completely broken: there is a U(1) symmetry that survives. Perhaps confusingly, this is not the same U(1) symmetry that appears in the direct product of the original symmetry group (by which we mean that the local symmetry is mediated by a different gauge field). This residual symmetry is usually called $U(1)_{EM}$ because it is precisely the familiar U(1) symmetry of electromagnetism. In light of this, we end up with one massless U(1) gauge field, the photon, and the other three gauge fields acquire masses via the Higgs mechanism. These are the $W^\pm$ and $Z$ bosons, which mediate the weak interaction.
Aside: What does it mean to "break" a gauge symmetry?
It may be troubling that we are talking about "breaking" a gauge symmetry, considering we introduced gauge symmetries as unphysical redundancies which ease our description of massless vector fields. "Breaking" a gauge symmetry implies that two gauge equivalent states are no longer physically equivalent, i.e. our arbitrary choice of gauge now matters! Surely, this cannot occur in any consistent theory.
The phrase "spontaneously broken gauge symmetry" is actually an abuse of terminology, as the gauge symmetry is not actually broken. However, it is a well-motivated phrase because the situation at hand looks extremely similar to spontaneous symmetry breaking. To see this, let's consider the vacuum state of the Abelian Higgs model. We argued that the amplitude is fixed at $v$, but the phase is arbitrary, leading us to conclude that the vacuum configuration of the Higgs field is, for any phase angle $\theta$,
$$H = v\,e^{i\theta}.$$
At first glance, it looks like we have a continuum of degenerate ground states, each parameterized by the angle $\theta$, signalling that a symmetry has been spontaneously broken. However, we must recall that the Higgs field is not the only field in the game: we also have the gauge field, the vacuum configuration for which is a pure gauge, $A_\mu = \partial_\mu\omega$ for some scalar field $\omega$, such that the energy density $F_{\mu\nu}F^{\mu\nu} = 0$. The vacuum state of the system is then really
$$H = v\,e^{i\theta}, \qquad A_\mu = \partial_\mu\omega.$$
Now we can consider making a gauge transformation, $H \to e^{ie\Lambda}H$ and $A_\mu \to A_\mu - \partial_\mu\Lambda$, under which the vacuum state of each field transforms as
$$H \to v\,e^{i(\theta + e\Lambda)}, \qquad A_\mu \to \partial_\mu(\omega - \Lambda).$$
This shows that rotating the phase of the Higgs field (which we thought above signalled the breaking of a symmetry) is in fact nothing more than a gauge transformation between two gauge-equivalent (and thus physically identical) vacuum states. Thus, the symmetry breaking is merely an illusion brought about by the redundancy in our description (see footnote a).
Footnote a: Although the local symmetry is not broken (since such a thing would not make sense), there is a global subgroup that is broken. But this is a technical detail of secondary importance to our discussion.
cover only the material for a one semester course, so they're missing coverage of some of the fancier topics we previewed, such as renormalization and non-Abelian gauge theories. Tong also has excellent notes on other topics; in particular his lectures on gauge theory are a superb (albeit somewhat advanced) reference.
Quantum Field Theory by L. Ryder, ISBN-13: 978-0521478144. This book is a little old, but the explanations are simple and clear. It also de-emphasizes perturbative methods, which you may or may not like.
Quantum Field Theory and the Standard Model by M. Schwartz, ISBN-13: 978-1107034730. This is a comprehensive and well-written introduction to QFT. However, it's not as easy reading as the previous books listed.
Condensed Matter Field Theory by A. Altland and B. Simons, ISBN-13: 978-0521769754. Field theory is an important tool in condensed matter physics as well as particle theory, as was hinted at by our discussion of the Ising and Heisenberg models of symmetry breaking in ferromagnetism. We include this book not just for the sake of variety, but also because it is extremely well-written and takes a modern perspective.
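(c) Show that $\langle r^2 \rangle$, defined as $\langle r^2 \rangle \equiv Z^{-1} \int d^3r \, r^2 \rho(r)$, is given by
$$\langle r^2 \rangle = -\frac{6}{Z} \left. \frac{dg_E}{dq^2} \right|_{q^2 = 0}.$$
The square root of $\langle r^2 \rangle$ is called the charge radius.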

4. Suppose that a charge of $Ze$ is uniformly distributed over a sphere of radius $R$, and the distribution is taken to be static.
(a) Compute the form factor $g_E(q^2)$.
(b) Verify explicitly that $g_E(0) = Z$ for this distribution.
(c) Show from the definition of $\langle r^2 \rangle$ that $\langle r^2 \rangle = \frac{3}{5} R^2$.
(d) Verify explicitly that $\langle r^2 \rangle = -\frac{6}{Z} \left. \frac{dg_E}{dq^2} \right|_{q^2 = 0}$ for this distribution.
(e) Consider nonrelativistic electron scattering off of the static charge distribution given in this problem. Suppose that the initial momentum of the electron is $b/R$, where $b$ is a constant and we use units where $\hbar = 1$. Find an expression for the ratio of the differential cross section for this process to the differential cross section for electron scattering off of a point charge of magnitude $Z$ as a function of the scattering angle $\theta$. Hint: express the momentum transfer in terms of $b$, $R$, and $\theta$.
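A numerical cross-check can be useful when working parts (b) through (d). The sketch below is not part of the problem. It assumes the form factor is the Fourier transform of the charge density measured in units of $e$, so that for a spherically symmetric $\rho(r)$ it reduces to $g_E(q^2) = 4\pi \int dr \, r^2 \rho(r) \sin(qr)/(qr)$. The function names and the sample values of $Z$ and $R$ are hypothetical choices made for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def form_factor(q, rho, r):
    """g_E for a spherically symmetric density (assumed Fourier-transform definition)."""
    # np.sinc(x) is sin(pi x)/(pi x), so this kernel equals sin(q r)/(q r)
    # and is exactly 1 where q r = 0.
    kernel = np.sinc(q * r / np.pi)
    return trapezoid(4.0 * np.pi * r**2 * rho * kernel, r)

def mean_square_radius(rho, r, Z):
    """<r^2> = Z^{-1} * integral d^3r r^2 rho(r)."""
    return trapezoid(4.0 * np.pi * r**4 * rho, r) / Z

# Illustrative values only (assumptions, not taken from the notes).
Z, R = 20.0, 3.0
r = np.linspace(0.0, R, 20001)
rho = np.full_like(r, Z / (4.0 / 3.0 * np.pi * R**3))  # uniform sphere

g0 = form_factor(0.0, rho, r)             # compare with part (b)
r2_direct = mean_square_radius(rho, r, Z)  # compare with part (c)

# Finite-difference estimate of dg_E/dq^2 at q^2 = 0, for part (d).
dq = 1e-2 / R
slope = (form_factor(dq, rho, r) - g0) / dq**2
r2_from_slope = -6.0 / Z * slope

print(g0, r2_direct, r2_from_slope)
```

Swapping in a different `rho` array (for example a Gaussian on a longer grid) gives the same three checks for other distributions.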

5. We briefly discussed the Fermi gas model for nuclear matter.
(a) We showed that the density for nuclear matter in this model is given by $\rho_{NM} =$

15.2 Problems for Sections 4, 5, and 6

6. The metric tensor $g_{\mu\nu}$ is a four-tensor, which means that if one transforms into another inertial frame (denoted with a prime), $g'_{\mu\nu} = \Lambda_\mu{}^\alpha \Lambda_\nu{}^\beta g_{\alpha\beta}$, where $\Lambda$ is the matrix for the Lorentz transformation. Using the fact that the product of two Lorentz vectors $A^\mu B_\mu = g_{\mu\nu} A^\mu B^\nu$ is the same in all frames, show that for an arbitrary Lorentz transformation $g'_{\mu\nu} = g_{\mu\nu}$. That is, show that component-by-component the values of the matrix elements of the metric tensor are the same in both frames.
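The statement in problem 6 can be checked numerically for a particular transformation before proving it in general. The sketch below is a sanity check under stated assumptions, not a proof. It assumes the signature $(+,-,-,-)$, builds a sample transformation from a boost along $x$ and rotations about $z$, and uses hypothetical helper names (`boost_x`, `rotation_z`, `dot`).

```python
import numpy as np

# Metric components g_{mu nu}; the (+,-,-,-) signature is an assumption.
g = np.diag([1.0, -1.0, -1.0, -1.0])

def boost_x(rapidity):
    """Boost along x acting on contravariant components A^mu."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = ch
    L[0, 1] = L[1, 0] = -sh
    return L

def rotation_z(angle):
    """Rotation about z acting on contravariant components A^mu."""
    c, s = np.cos(angle), np.sin(angle)
    L = np.eye(4)
    L[1, 1] = L[2, 2] = c
    L[1, 2] = -s
    L[2, 1] = s
    return L

# A fairly generic transformation: rotate, boost, rotate again.
Lam_up = rotation_z(0.7) @ boost_x(1.3) @ rotation_z(-0.4)

# Lower-index objects transform with the inverse transpose.
Lam_low = np.linalg.inv(Lam_up).T

# g'_{mu nu} = Lam_mu^alpha Lam_nu^beta g_{alpha beta}
g_prime = np.einsum("ma,nb,ab->mn", Lam_low, Lam_low, g)

def dot(A, B):
    """Lorentz scalar product g_{mu nu} A^mu B^nu."""
    return float(A @ g @ B)

A = np.array([2.0, 0.3, -1.1, 0.5])
B = np.array([1.5, -0.7, 0.2, 0.9])
invariant_before = dot(A, B)
invariant_after = dot(Lam_up @ A, Lam_up @ B)

print(np.allclose(g_prime, g), invariant_before, invariant_after)
```

Changing the rapidity or the angles should leave both checks intact, which is what the problem asks you to show analytically.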
7. In our discussion of the Klein-Gordon equation we implicitly assumed that we coupled the Klein-Gordon field to a static source localized at the nucleon. In this problem we will consider a more general situation: the field coupled to an arbitrary Lorentz scalar source $J_s$ that may depend on spacetime. The Klein-Gordon equation is then
$$(\partial_\mu \partial^\mu + m^2)\,\phi(\vec{x}, t) = J_s(\vec{x}, t).$$
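(a) Suppose that I can find a function of spacetime, $G(\vec{x}, t)$, that satisfies
$$(\partial_\mu \partial^\mu + m^2)\, G(\vec{x}, t) = \delta^3(\vec{x})\, \delta(t).$$
Show that the Klein-Gordon equation with a source is automatically solved if
$$\phi(\vec{x}, t) = \int d^3x' \int dt' \, G(\vec{x} - \vec{x}', t - t')\, J_s(\vec{x}', t'),$$
where the integrals are taken over all spacetime.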
$G(\vec{x}, t)$ is called the Green's function of the Klein-Gordon equation. It reduces solving the sourced equation to a simple integral. In the context of quantum field theory it is referred to as the propagator, as it acts to propagate the field from a source at a spacetime point $(\vec{x}', t')$ to the point $(\vec{x}, t)$.
It turns out that, due to boundary conditions at infinity (which are ultimately tied to causality), there is an ambiguity in just taking $S(q) = (q_\mu q^\mu - m^2)^{-1}$. Feynman showed that the correct way to do this within the context of field theory is to instead take $S(q) = (q_\mu q^\mu - m^2 + i\epsilon)^{-1}$, where $\epsilon$ is an infinitesimally small positive constant.
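To get a feel for what the $+i\epsilon$ does, it can help to evaluate the expression numerically for a small but finite $\epsilon$. The sketch below only illustrates how the pole at $q_\mu q^\mu = m^2$ is regulated; it does not demonstrate the connection to causality. The variable `s` stands in for $q_\mu q^\mu$, and the numerical values are arbitrary assumptions.

```python
import numpy as np

def propagator(s, m2, eps):
    """S = 1 / (s - m^2 + i*eps), where s stands for q_mu q^mu."""
    return 1.0 / (s - m2 + 1j * eps)

m2 = 1.0    # illustrative mass squared (assumption)
eps = 1e-2  # small but finite regulator

# Without i*eps the expression diverges at s = m^2; with it the value is finite.
on_shell = propagator(m2, m2, eps)  # equals 1/(i*eps) = -i/eps

# The imaginary part is a narrow negative Lorentzian centered on s = m^2.
s = np.linspace(m2 - 50.0, m2 + 50.0, 1_000_001)
im_S = propagator(s, m2, eps).imag
area = float(np.sum(0.5 * (im_S[1:] + im_S[:-1]) * np.diff(s)))

print(on_shell, area)
```

The area under the imaginary part stays close to $-\pi$ as $\epsilon$ shrinks while the peak narrows, consistent with the standard result that $\mathrm{Im}\,(x + i\epsilon)^{-1} \to -\pi\,\delta(x)$ as $\epsilon \to 0^+$.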