Learning crystal field parameters using convolutional neural networks

We present a deep machine learning algorithm to extract crystal field (CF) Stevens parameters from thermodynamic data of rare-earth magnetic materials. The algorithm employs a two-dimensional convolutional neural network (CNN) that is trained on magnetization, magnetic susceptibility and specific heat data that is calculated theoretically within the single-ion approximation and further processed using a standard wavelet transformation. We apply the method to crystal fields of cubic, hexagonal and tetragonal symmetry and for both integer and half-integer total angular momentum values $J$ of the ground state multiplet. We evaluate its performance on both theoretically generated synthetic and previously published experimental data on CeAgSb$_2$, PrAgSb$_2$ and PrMg$_2$Cu$_9$, and find that it can reliably and accurately extract the CF parameters for all site symmetries and values of $J$ considered. This demonstrates that CNNs provide an unbiased approach to extracting CF parameters that avoids tedious multi-parameter fitting procedures.

CFs arise from time-reversal-even interactions between electrons (in f orbitals for rare-earth elements) and charges in their crystalline environment and are conveniently described by an effective electrostatic potential. The form of the CF potential is dictated by the point symmetry at the rare-earth site and contains a variable number of independent parameters [6,9,22]. For example, while the CF potential for f electrons is fully described by only two independent parameters for the cubic point groups G = m3m, 432,43m, there are 26 independent parameters for the lowest symmetry groups 1 and1 [23,24]. These CF parameters are notoriously difficult to determine in first-principle calculations [25], and are therefore best regarded as phenomenological parameters that are found from comparison to experimental results. While most accurate values of CF parameters are obtained from analyzing inelastic neutron scattering results [26,27], much insight can already be gained by much more straightforward measurements of thermodynamic observables such as the (magnetic part of the) * porth@iastate.edu specific heat c M (T ) as a function of temperature T , the magnetic susceptibility χ a (T ) along direction a, and the magnetization µ a (B, T ) in a finite magnetic field B. This approach allows investigating whole series of rare-earth compounds, which often provides a more complete understanding of a material class, as was demonstrated, for example, in Refs. [28][29][30].
Here, we focus on the method of extracting CF parameters from thermodynamic measurements that are performed in a regime above possible Kondo and magnetic ordering temperatures, where the rare-earth ion can be treated within the single-ion approximation [8,9]. We will also assume that the Russell-Saunders approximation is valid and spin-orbit coupling is stronger than CF, Zeeman and magnetic exchange energy scales: E Coulomb E SOC E CF , E Zeeman , E ex . Note that we will further focus on the case where the CF and Zeeman energies are larger than the exchange energy: Here, E Coulomb and E SOC refer to the isotropic Coulomb and spin-orbit interaction between N electrons within the 4f N electronic configuration of a single rare-earth ion, and E Zeeman = −µ B (L + 2S) · B with total orbital and spin angular momentum operators L and S. Under these assumptions, one can restrict the attention to the ground state J multiplet of the 4f N configuration that is derived from the three Hund's rules [10]. Its 2J + 1 sub-levels are only degenerate for spherical symmetry and split in a crystalline environment into a sequence of lower order multiplets. While their multiplicity is fully determined by site symmetry, the energies of the different levels as well as their wave functions depend in general on the values of the CF parameters.
To obtain the CF parameters from measurements of thermodynamic observables, one traditionally proceeds as follows. Starting from an initial guess of the CF parameters, one determines the energy levels and wave functions by diagonalizing the CF Hamiltonian H CF = arXiv:2011.12911v2 [cond-mat.str-el] 18 May 2021 q,k B q k C (k) q (J ). Here, the summation runs over a symmetry-allowed set of quantum numbers k and q with 0 ≤ k ≤ 2 , −k ≤ q ≤ k for a single-ion with orbital quantum number ( = 3 for f -electrons). The coefficients B q k are CF Stevens parameters and the CF operator "equivalents" C (k) q are expressed in terms of angular momentum operators J acting on the ground state J multiplet of the ion [5,8,9,31,32]. Various forms for the operators, which differ in their normalization convention, have been used in the literature and will be discussed below. Once the energies and wave functions are known, it is straightforward to calculate thermodynamic observables such as c M , χ a and µ a from the partition function in finite magnetic field (details are shown below). The theoretical result is then compared to experiment and the complete procedure is iterated with updated CF parameters until sufficient agreement is reached.
While this iterative process is straightforward in principle, it can be tedious and time consuming in practice, in particular for lower than cubic symmetries, where several fit parameters need to be optimized simultaneously. This is complicated by the fact that the impact on the thermodynamic response that is caused by modifying the CF parameters {B q k } is in most cases unknown and not straightforward to derive. This is a typical example of an "inverse problem" [33] that often occurs in science, where one wants to estimate parameters p characterizing the system (here the CF parameters) based on observations O (in our case thermodynamic observables). Given a model P (for us, the crystal-field Hamiltonian), it is straightforward to derive observables O = F P (p), but the inverse mapping p = F −1 P (O) is difficult to perform, in particular when the relation is non-linear as in our case; often, the inverse mapping is ill-conditioned or unstable and, thus, requires regularization.
Motivated by the multitude of recent explorations of machine-learning (ML) techniques in physics [34][35][36], in general, and the success of artificial neural networks and other ML approaches to attack complex inverse problems of physics [37][38][39][40][41], in particular, we here study how ML can be used to extract Stevens CF parameters from thermodynamic measurements. This data-driven approach to inverse problems is based on first computing a large set of training data {(p j , F P (p j ))|j = 1, 2, . . . }, which requires solving the (simple) forward problem for many values of p = p j . With this data set, a non-linear function is trained to reconstruct p j from O j = F P (p j ); the key challenge is to find a model that generalizes well for feasible training data sizes, i.e., that works on physically relevant samples that are not part of the original training set.
More specifically, we here employ a convolutional neural network (CNN) to parametrize the non-linear function performing the inverse operation: it relates thermodynamic observables, O = {c M (T ), χ a (T ), µ a (B, T )}, to a set of CF parameters p = {B q k }. We train the CNN on thermodynamic data for different site symmetries (cubic m3m, hexagonal6m2, tetragonal 4mm) and different val-ues of angular momentum J = 4 and J = 15/2. This corresponds to the rare-earth ions Pr 3+ (J = 4) and Er 3+ (J = 15/2) in different crystalline environments. The training data is obtained within the single-ion approximation, and further processed using a standard wavelet transformation before being fed into the CNN. We test the performance of the CNN on both calculated and previously published experimental data on CeAgSb 2 [28,42], PrAgSb 2 [28] and PrMg 2 Cu 9 [30]. We find that our CNN architecture generalizes well for moderately large training data sets and for all site symmetries and values of J considered. It also provides good estimates of the Stevens parameters from experimental data.
The remainder of the paper is organized as follows. In Sec. II, we review the single-ion approximation, define our notation of the Stevens CF parameters, and explain how the relevant thermodynamic observables are computed. Readers already familiar with this, can proceed directly to Sec. III, where we detail our proposed ML framework to estimate Stevens parameters from thermodynamic quantities. In Sec. IV and Sec. V, we demonstrate and test our ML approach on synthetic and experimental data, respectively, and Sec. VI provides a summary.

II. CRYSTAL FIELD THERMODYNAMICS IN RARE-EARTHS
In this section, we provide the necessary background to perform a quantitative analysis of CF effects on thermodynamic observables in rare-earth materials. We begin by describing the single-ion approximation, which assumes that interactions between different rare-earth ions are negligible. This approximation is often justified by the hierarchy of interactions that exist in rare-earth intermetallics [9]. Focusing on the ground state multiplet of a single-ion with a definite total angular momentum J, we show how to expand the CF Hamiltonian for a given J and point symmetry group G in terms of operator equivalents, as first introduced by Stevens [5].
Straightforward diagonalization of the Hamiltonian matrix together with elementary statistical mechanics calculations, then yield the thermodynamic observables, (i) specific heat c M , (ii) magnetic susceptibilty χ a (along direction a), and (iii) magnetization µ a in finite applied magnetic field B a . This calculation explicitly shows the (forward) mapping from a set of CF parameters to thermodynamic observables. These thermodynamic observables are then fed into the input nodes of a CNN that "learns" the inverse mapping from the observables to the CF parameters as output.

A. Single-ion approximation
In the single-ion approximation one neglects the interaction between different rare-earth ions, which is often justified because the 4f electrons are strongly localized. This leads to a relative weakness of 4f -4f exchange interactions compared to 3d-3d and 3d-4f interactions [9], and an often weak hybridization between the localized 4f electrons and delocalized conduction electrons. The single-ion description breaks down, for example, when Kondo or Rudermann-Kittel-Kasuya-Yosida (RKKY) interactions play an important role in the magnetism of the system. Our analysis in the following is therefore restricted to parameter regimes, where both Kondo and RKKY interactions are weak effects, which is typically the case at not too low temperatures T T K , T RKKY , where T K (T RKKY ) refer to Kondo and RKKY temperatures scales.
In the single-ion approximation, one describes the 4f electronic part of the system by a non-interacting collection of Hamiltonians for single rare-earth ions in a 4f N configuration which each take the form [8,9] Here, H Coulomb and H SOC describe the isotropic Coulomb and spin-orbit interactions among the N 4f electrons, which are the dominant energy scales. They enforce the three Hund's rules in the 4f N configuration of the rareearth ion, S = 1 2 (2 +1−|2 +1−N |), L = S(2 +1−2S), and J = L±S. The resulting ground state is then a 2J +1 degenerate multiplet. Here, = 3 is the orbital angular momentum of a single f electron, S (L) are the total spin (orbital) angular momentum quantum numbers and J is the total angular momentum quantum number. The third Hund's rule enforces J = L + S for more than halffilled 4f shells, N ≥ 2 + 1 [8,10].
The third term in Eq. (2.1) describes the Zeeman coupling to an external magnetic field B, where µ B is the Bohr magneton and L = N i=1 l i and S = N i=1 s i denote total orbital and spin angular momenta of the N electrons in the 4f N configuration. In the following, we will assume that spin-orbit coupling dominates over Zeeman energy and use the Russell-Saunders LS-coupling scheme to express the Zeeman Hamiltonian using the total angular momentum J = L + S as Here, we have introduced the g-factor with angular momentum quantum numbers J, L, S corresponding to the magnitude of the operators J , L, S, respectively. Finally, the last term in Eq. (2.1) denotes the CF potential, which can be expanded in a series of (single-particle) irreducible tensor operators as [8] Here, the functions B q k (r) depend on the radial coordinate only, and C (k) are related to the spherical harmonics. Both sets of operators, B q k (r) and C (k) q (θ, φ), act on the coordinates r i of individual electrons in the f -shell. Note that the summation of k is restricted to k = 2, 4, 6, as we anticipate to evaluate matrix elements of V CF only within a single 4f N configuration. This excludes odd values of k by parity considerations. Higher values of k > 6 are excluded from the triangular condition k ≤ 2 of the Clebsch-Gordon coefficients (or Wigner 3j symbols), which arise when performing an integration over products of three spherical harmonics [43]. Finally, we have also excluded the k = 0 term as it amounts to an unimportant constant energy shift.

B. Operator equivalents in crystal field Hamiltonians
The evaluation of matrix elements of the CF Hamiltonian in the limited subspace of a 4f N electronic configuration of a single rare-earth ion is made easier by the method of operator equivalents introduced by Stevens [5]. First, within a fixed 4f N manifold, the radial operators can be replaced by their expectation values in the 4f states, which defines the (single-particle) Stevens coefficients B q k ≡ B q k (r i ) 4f . Since the precise form of the wavefunction is difficult to determine, a theoretical calculation of the Stevens coefficients from first-principles is notoriously challenging [25]. The B q k are therefore best regarded as phenomenological coefficients that are obtained from a comparison of calculated physical observables to experimental data.
The method of operator equivalents [5,7,[44][45][46] relates matrix elements of (the sum over) irreducible tensor operators within a 4f N configuration to matrix elements of expressions that depend on angular momentum operators l i : Here, a k is an k (and l i ) dependent coefficient. The operator expressions on both sides transform under rotation according to the same irreducible representation of the continuous rotation group. This condition in fact defines the "operator equivalent" of the irreducible tensor operator on the left-hand side. The operator equivalents C (k) q (l i ) can be obtained by converting the functions C (k) q (θ, φ) into Cartesian coordinates, (x, y, z) = (sin θ sin φ, sin θ cos φ, cos θ), symmetrizing monomials (e.g., xy → (xy + yx)/2), and replacing (x i /r i , y i /r i , z i /r i ) → (l x , l y , l z ) i . The proportionality of the matrix elements in Eq. (2.6) relies on the fact that the rotation group is continuous and, loosely speaking, matrix elements for any point on the sphere can thus be obtained from those at a single, fixed point via rotation. The proportionality factors a k essentially account for the difference of the matrix elements at the single reference point. The a k are independent of q due to the Wigner-Eckart theorem [43]. In the literature, the proportionality coefficients are typically denoted as a 2 = α l , a 4 = β l , and a 6 = γ l . For l i = 3 corresponding to 4f rare-earth ions, one finds the values a 2 = −2/45, a 4 = 2/495, and a 6 = −4/3861 [6,9,32].
Here, we restrict our analysis to the (2J + 1)dimensional ground state multiplet of the 4f N configuration that obeys the three Hund's rules. Combining individual angular momenta to the total orbital angular momentum L = N i=1 l i and considering spin-orbit coupling within a fixed LS term, leading to total angular momentum J = L + S, one can derive a similar "operator equivalent" relation as Eq. (2.6) for matrix elements taken within a particular J multiplet Here, the coefficients b k depend on k as well as on the quantum numbers l i , L, S, J. Like the a k , they are independent of m due to the Wigner-Eckart theorem. The values of the b k for the ground state multiplets of the R 3+ rare-earth ions can be found in the literature, where they are commonly denoted as b 2 = α J = θ 2 , b 4 = β J = θ 4 and b 6 = γ J = θ 6 [8,9].
Using the operator equivalence in Eq. (2.7), one can thus express the CF Hamiltonian acting within the (2J + 1)-dimensional ground state multiplet as Here, we have introduced the Stevens coefficients B q k = b k B q k that depend on the radial expectation values through B q k ≡ B q k (r i ) 4f (see Eq. (2.4)). We regard both B q k and B q k as phenomenological parameters that are determined by comparing theoretical calculations of physical observables to experimental results.
In the following, we will use the CF Hamiltonian of the form in Eq. (2.8). We note that in the literature it is common to use the so-called Stevens operator equivalents O q k (J ) [9,23,32,47], which are based on the tesseral harmonics (real and imaginary parts of the spherical harmonics). We denote the corresponding Stevens coefficients multiplying O q k as B q k,Stevens in the following. The Stevens operators employ a different normalization convention than the irreducible tensor operator equivalents C (k) q (J ). This requires using k and q dependent factors K q k relating the C [47]. The factors K q k can be found, for example, in Ref. [47], but can also be easily derived by direct comparison of the operator matrices [31,32]. The main disadvantage of the Stevens operators O q k (J ) is that they do not obey the Wigner-Eckart theorem. Their matrix elements are explicitly tabulated [32,45,48].

C. Stevens crystal field parameters
In this section, we describe the convention of Stevens parameters that we use in the following and their relation to other common definitions. Following Lea, Leask, Wolf [23] and Walter [24], it is convenient to perform a transformation from the Stevens parameters {B q k } in Eq. (2.8) to a set of Stevens coefficients {x 0 , . . . , x NSt−1 }.
Here, x 0 describes the overall energy scale of the CF splitting (note that x 0 can be negative). The dimensionless parameters {x 1 , . . . , x NSt−1 } fulfill |x i | ≤ 1 and describe the relative weight of the different Stevens parameters B q k .

Cubic symmetry
Let us explicitly describe the transformation from {B q k } → {x i } for cubic symmetry. The derivation easily generalizes to arbitrary point groups G. For the cubic point groups G = {m3m, 432,43m}, the CF Hamiltonian contains two independent Stevens parameters, N St = 2, and reads Let us first normalize each operator that multiplies a particular Stevens coefficient

C Stevens crystal field parameters
Normalization of O (4) and O (6) can be achieved by dividing by the sum of squared eigenvalues are the eigenvalues of the operator O (k) . Specifically, we define the scaled operators and express the CF Hamiltonian as (2.13) As anticipated above, the scale parameter x 0 , which can be positive or negative, sets the overall energy scale of the CF splitting. The dimensionless weight parameter x 1 describes the ratio of Stevens coefficients and lies in the interval −1 ≤ x 1 ≤ 1. The ratio B 4 4 /B 4 6 = 0 corresponds to x 1 = 0, whereas B 4 4 /B 4 6 → ±∞ corresponds to x 1 = ±1. The exact relation between the two sets of Stevens parameters {B q k } and {x i } depends on the value of J and can b easily rederived from Eq. (2.13).

Hexagonal symmetry
For hexagonal site symmetry with point symmetry groups G = {6m2, 6/mmm, 6mm, 622}, the CF Hamiltonian contains four independent Stevens parameters, N St = 4, and reads Here, q . Finally, we express the CF Hamiltonian in terms of the normalized operators as As before, x 0 describes the overall energy scale, whereas the weight parameters −1 ≤ x 1 , . . . x 3 ≤ 1 describe the relative weight of the four Stevens parameters B q k . Note that we have split off the sign of x 0 . This turns out to be advantageous in the ML calculation described below as it makes the overall scale prefactor |x 0 | strictly positive. This comes at the cost of introducing an additional parameter, x 4 , defined as sign(x 4 ) := sign(x 0 ). Only the sign of x 4 enters the Hamiltonian.

Tetragonal symmetry
For tetragonal site symmetry with point symmetry groups G = {4mm, 4/mmm}, the CF Hamiltonian contains five independent Stevens parameters, N St = 5, and reads in the second line, which we then normalize as in Eq. (2.16).
Finally, the CF Hamiltonian is expressed in terms of the normalized operators as In addition to the scale parameter x 0 , the Hamiltonian contains four bounded Stevens parameters −1 ≤ x 1 , . . . , x 4 ≤ 1. Like in the hexagonal case, we have split off the sign of x 0 explicitly and introduced an additional parameter, x 5 , as sign(x 5 ) := sign(x 0 ). The Hamiltonian only depends on the sign of x 5 .

D. Thermodynamic observables
In this section, we describe how to obtain the thermodynamic observables of interest: magnetization (per rare-earth ion), µ(T, B), in finite magnetic field, magnetic susceptibility χ a (T ) along direction a, and specific heat c M (T ). We calculate these quantities starting from the Hamiltonian (2.1) of a single rare-earth ion in a magnetic field B and exposed to a CF with point symmetry G. From our discussion above, we know that the Hamiltonian projected onto the ground state multiplet with total angular momentum J reads Here, J = (J x , J y , J z ) denotes the total angular momentum operator, the g-factor g JLS is explicitly given in Eq. (2.3), and the form of the CF Hamiltonian is constrained by the point group G. The method we describe in the following can be used for any point group G, but we will focus on the experimentally common cases of cubic, hexagonal and tetragonal crystal symmetry with point groups G that were discussed in Sec. II C.
In the basis of J z eigenstates, J z |m J = m J |m J , the Hamiltonian H J is a (2J + 1) × (2J + 1) dimensional matrix that can be easily diagonalized, with field-dependent energies E n and eigenstates In Fig. 1(a), we show the resulting energy spectrum for J = 4 and cubic CF (G = m3m) as a function of the dimensionless Stevens parameter x 1 , see Eq. (2.13). The level spectrum is presented both at zero magnetic field and at a finite field of B z = 1 T. In zero field, we observe that for x 0 > 0, the ground state is a singlet (triplet) for . For x 0 < 0, the ground state is a triplet for x 1 < −0.57, a doublet for −0.57 < x 1 < 0.74 and a singlet for x 1 > 0.74. The level degeneracy is split in an applied magnetic field and one observes the emergence of several (avoided) level crossings. A similar behavior is observed for other integer values of the angular momentum quantum number J with lower degeneracies in the case of lower-symmetry CFs. For half-integer J, the levels are at least doubly degenerate in the absence of an external magnetic field due to Kramers theorem.

Magnetization and magnetic susceptibility
The magnetization per single rare-earth ion along direction a is given by with partition function Z = Tre −βH J and magnetic field along direction a. In Fig. 1(b), we show the magnetization µ z as a function of B = (0, 0, B) T for different temperatures T in a cubic CF (G = m3m). The Stevens parameters are chosen to be x 0 = 20 K and x 1 = 0.5 such that the ground state is a triplet with J z = ± 5 2 , 0. The magnetization µ z thus increases linearly at low fields with a slope that increases Curie-like as 1/T . The magnetization saturates at a saturation magnetic field value B sat that increases with temperature. At the lowest temperature, T = 1 K, the saturation occurs at B sat (1 K) 3 T. The saturation value of the magnetization is given by where |0 is the ground state in magnetic field. Here, we have chosen L = 5 and S = 1 such that g JLS = 4/5, which corresponds to the rareearth ion Pr 3+ . Note that µ sat z deviates slightly from the expected value of 5/2 × 4/5 = 2, where 5/2 is the expectation value of J z in the triplet ground state in small fields, due to field induced mixing of higher levels.
The magnetic susceptibility is obtained at small magnetic fields from the slope (2.24) Its behavior at low temperatures is determined by the ground state degeneracy [10]. If the ground state is a singlet, it is of van-Vleck type, Ei−E0 , and becomes temperature independent at temperatures much smaller than the energy gap to the first excited state, , if the ground state degeneracy is larger than one. In Fig. 1(c), we show the susceptibility χ z as a function of temperature for the case of a triplet ground state, where it follows a characteristic Curie-like behavior

Specific heat
The specific heat in zero magnetic field is calculated from where the average is performed with respect to the CF eigenstates e −∆/k B T , at temperatures below the gap to the first excited state, k B T ∆. As shown in Fig. 1(c) for J = 4 and G = m3m, it exhibits a Schottky anomaly peak at higher temperatures, whose position and weight yields direct information about the size of the gap to the excited states and the relative degeneracies of the ground and excited state levels. Note that excited state levels higher than the first often occur nearby in energy and thus contribute to the specific heat as well.

III. CNN APPROACH FOR FINDING CRYSTAL FIELD PARAMETERS
In this section, we describe the method of using a twodimensional CNN to determine the Stevens parameters {x i } for a given angular momentum J and CF symmetry group G from thermodynamic observables. Our goal is The fully saturated moment is reduced from the value in the ground multiplet at x1 = 0.5 is µz/µB = 5/2 due to field-induced mixing into other states. (c) Magnetic susceptibility along z-axis (per rare-earth ion) χz/µB = µz/(µBB) (blue, left y-axis) and specific heat cM /kB (green, right y-axis) as a function of temperature T . The magnetic field is fixed to B = 10 −4 T when computing χz. The Stevens parameters are identical to panel (b). The inset shows the inverse susceptibility χ −1 z = µBB/µz, highlighting the Curie-like behavior that occurs over the full temperature range due to the triplet ground state. The specific heat cM /kB shows a Schottky-like peak at a position proportional to the level splitting between the ground multiplet and the excited states ∆ = 40 K. Note that here the first excited state is a singlet and the contribution of the next higher triplet level (at about 80 K) is significant.
to build a ML model that can be fed with experimental data and accurately predict the underlying Stevens coefficients that characterize the material, thereby circumventing a time-consuming data fitting procedure. One therefore places thermodynamic data on the input nodes of the network and obtains the set of Stevens coefficients as output. We choose the input data of the network to be from observables that are experimentally readily available: magnetization µ a (T, B a ), magnetic susceptibility χ a (T ), and magnetic specific heat c M (T ). To train the network, we require a sufficiently large dataset that we generate by calculating the thermodynamic observables for random choices of Stevens parameters within the single-ion approximation as described in Sec. II. Comparing different network architectures, we found it to be advantageous to perform a wavelet transformation on the data before feeding it into the network. In the following, we describe the details of the training data generation and the network architecture and parameters.
A. Training data generation

Thermodyamic data generation
A training data set contains the following three types of observables, which are calculated for a fixed choice of angular momentum J and point group G, and randomly sampled Stevens coefficients {x i }, using the approach detailed in Sec. II.
(i) Magnetization per single rare-earth ion, µ a , along direction a as a function of external magnetic field B a for fixed temperature T . We choose a magnetic field range between B min = 0 T and B max = 10 T, and three temperatures T j = 1, 17, 300 K to represent the behavior in typical field and temperature ranges which are easily accessible experimentally. We use N B steps = 64 equally spaced magnetic field points Depending on the CF point symmetry group G, we choose different highsymmetry directions-only one high-symmetry direction a = [001] ≡ z (two high-symmetry directions, a = {[100], [001]}) for cubic (tetragonal and hexagonal) symmetry. Combined with the three temperature values T j , this corresponds to three (six) sets of magnetization data: µ a (B a , T j ). Other choices of directions are of course possible, but we wanted to keep the size of the input data set as small as possible to keep the experimental work necessary to obtain it at a minimum.
(ii) Magnetic susceptibility χ a (T ) along direction a as a function of temperature T . We choose a temperature range between T min = 1 K and T max = 300 K using  Fig. 1(b,c) for which J = 4, G = m3m, x0 = 20 K, and x1 = 0.5. The plots show the CWT coefficients, using the real Morlet mother wavelet (3.4), as a function of scale s and physical index b (or t). We choose linearly spaced scales from smin = 1 to smax = 64, and also use 64 equally spaced points along the "physical" dimension, corresponding to the magnetic field, b (see Eq. (3.1)), and the temperature, t see Eq. (3.2)). In addition to these four scaleograms we also provideμz,T =300 K to the CNN, resulting in a total of five scaleograms to be layered into one training sample. For the lower symmetry point groups, we provide magnetization and susceptibility along both [100] and [001] directions, resulting in a total of nine scaleograms in one training sample. N T steps = 64 equally spaced temperature points We use the same directions a for the susceptibility and the magnetization, corresponding to one (or two) sets of susceptibility data.
(iii) Magnetic specific heat c M (T ) as a function of temperature T . We use the same temperature range and step size as for the susceptibility, see Eq. (3.2).
One training sample therefore consists of five (nine) different sets of thermodynamic data. A complete training sample for J = 4, G = m3m and x 0 = 20 K, x 1 = 0.5 is shown in Fig. 1(b, c). To obtain the training data set, we draw the Stevens parameters randomly from a uniform distribution and, for each of these sampled values, compute the aforementioned observables. To generate sufficient training data for the network, the process takes 2-3 hours. The wavelet transform described in the next subsection is included in this time frame. Note that within our convention there exist N St CF parameters for the cubic and N St + 1 CF parameters for the hexagonal and tetragonal cases. While x 0 can take either sign in the cubic case, it is strictly positive for hexagonal and tetragonal systems. In the latter cases, only the sign of the last Stevens parameter sign(x NSt ) enters the Hamiltonian.

Continuous wavelet transform
After comparing different network architectures (see more details below in Sec. III B), we have found it to be advantageous to first perform a continuous wavelet transformation (CWT) of the "raw" thermodynamic data before feeding it into a two-dimensional (2D) CNN. The reason is that CNNs are well suited to model data with an image-like structure like the wavelet scaleograms that are produced by the CWT. Similar to a Fourier transform, a CWT is used to perform a harmonic analysis and decompose a signal into its fundamental frequencies.
The advantage of a CWT is that it produces a sparse representation of the data by providing localization in both frequency and "time" domain, with the main features of the data appearing in only a (small) subset of all CWT coefficients. This property is key for applications in data compression and denoising [49]. We find that it also enables superior performance of a 2D CNN compared to placing a "raw" data vector of linear size 5 × 64 = 320 (or 9 × 64 = 576) on the network input nodes. Here, 5(9) corresponds to the number of thermodynamic observables and 64 to number N B steps = N T steps of values of B a and T , respectively.
A CWT of a discrete and equally spaced 1D data set of size N f , corresponds to performing the following transformatioñ Here, ψ(t) is the so-called mother wavelet function, which is translated by parameter t and scaled by the scale parameter s. The scale s can be regarded as a period or inverse frequency. We choose a mother wavelet function of the real Morlet form [50] ψ(t) = e − t 2 2 cos(5t).
We use a linearly spaced set of scales, 1 ≤ s ≤ 64, which emphasizes the low frequency behavior of the data in comparison to using a geometric spacing. We have explicitly checked for a few cases that the CNN performs equally well if we use geometrically spaced scales. By calculating the convolution of this family of translated and scaled wavelets with our original data, we perform a frequency analysis that provides additional insight into the changes over the "time" domain, which corresponds to temperature or magnetic field in our data. We apply a CWT to each of five (or nine) 1D observable data sets, {µ a (T, b i ), χ a (t i ), c M (t i )}, to produce a total of five (or nine) 2D wavelet scaleograms of size 64 × 64. In Fig. 2, we show four of the five scaleograms corresponding to the "raw" data in Fig. 1(b,c), which depict the associated (real) CWT coefficientsf defined in Eq. (3.3). It clearly distinguishes regions with small and large wavelet coefficients, which is a characteristic of the underlying "raw" data set. The peak position and some broad characteristics of the original data, for example, whether a function approaches zero or a finite value at the boundary (minimal and maximal T and B values) can also be recognized in the scaleograms. Specifically, the peak position corresponds to a region with large wavelet coefficients at small scales s (i.e. large frequencies), because the underlying function varies most rapidly close to the peak (seec M in Fig. 2, for example). Nonzero data values at boundaries (minimal and maximal T or B values) lead to pronounced peaks in the scaleogram at small s. The origin of this phenomenon is that zeros are padded to the dataset at both edges, which results in discontinuities at the boundary that show up as scaleogram peaks at small s. In Fig. 2, we can thus clearly distinguish observables that peak at low temperatures χ z ∝ 1/T (left boundary) from those that peak at the right boundary µ z,T=17 K . All five (nine) scaleograms are stacked into a multi-channel image and then placed on the input nodes of a 2D CNN, whose architecture is described in the next subsection.

B. Convolutional neural network
We employ a CNN architecture that is based off the LeNet-5 architecture [51], which we scale up to be appropriate for the form of our training dataset. Initial experimentation with alternative architectures, such as simple feed-forward networks and 1D CNNs, yielded significantly worse results. To make a fair comparison, we created architectures that had approximately the same number of parameters as the 2D CNN and were trained on the same data. Applying to the cubic case with two Stevens parameters, x 0 and x 1 , we find the 1D CNN a factor of 2 worse for x 0 and a factor of 7 worse for x 1 than the 2D CNN. The feed-forward deep neural network performed a factor of 1.25 worse for x 0 and a factor of 9 worse for x 1 than the 2D CNN. It is expected that the performance difference is enhanced in the lower symmetry cases with more Stevens parameters to predict, which is why we chose to use the 2D CNN. As illustrated in Fig. 3, the input of the network is a five (nine) channel image containing the five (nine) scaleograms created using the CWT. We center the input data by subtracting the mean of each channel so that the distribution of the input "pixel" values has zero mean. We similarly normalize the target data {x i }, as the coefficients x 0 and x i with i = 0 have significant size differences and different dimensions. Typically, |x 0 | ∈ [0.5, 50] (in units of K), while the dimensionless x i ∈ [−1, 1] for i > 0.
The centered input CWT scaleograms are fed into three sets of convolution and max-pooling layers. Each convolution layer has two identical 2D convolution layers with a kernel size of 3 × 3 and a stride of 1 × 1. This increases the number of channels and allows extracting data features. The max-pooling layers have a pool-size and stride of 2 × 2, which essentially corresponds to a down-sampling of the image by a factor of two. Each convolution layer uses the ReLU activation function [52] and has batch normalization.
The final max-pooling layer is flattened and fed into two fully-connected layers, each with the ReLU activation function and 30% dropout. These layers feed into a fully-connected output layer whose width corresponds to the number of independent Stevens parameters N P + 1. Here, N P = N St − 1 (N P = N St ) for the cubic (hexagonal, tetragonal) case due to the procedure of splitting off the sign of x 0 into an additional parameter. The output layer uses a linear activation function. The N P + 1 out-put values are the prediction of the CNN for the Stevens coefficients {x 0 , . . . , x NP }, The total number of trainable parameters in the network is 13, 702, 818 1.3 × 10 7 . This is substantially larger than the number of trainable parameters in the original LeNet-5 architecture [51]. This is due to the high dimensionality of the fully-connected layers in our version of the network. A larger number of parameters means that the network will be able to form more complex relationships between the features and the targets. However, we run the risk of over-parameterizing the network, resulting in a model that overfits-that is, it performs very well on the training data but poorly on unseen data. By applying normalization and dropout throughout the network we mitigate this issue. We build the network using Keras [53] and train it on Nvidia Volta V100S graphic processing unit (GPU). We use the Adam optimizer [54] with the recommended parameters to minimize the mean squared error (MSE) loss function, (3.5) wherex i is the network's prediction for x i . We choose the stochastic gradient descent optimization algorithm Adam to avoid as much as possible the trapping in local minima of the cost function. As shown below, the behavior of the quality of the CNN predictions (as described by MSE) across different input parameters can be largely understood on physical grounds such as arising from energy level crossings, from the smallness of certain CF parameters or from the ratio of the bandwidth to the maximal temperature scale. This indicates that the CNN is not trapped in local minima. In general, the inverse problem that the CNN addresses may be ill-defined and allow for multiple solutions. This issue can be (partially) addressed in practice by providing more data to the CNN such as enlarging the field and temperature range and/or by including magnetization and susceptibility data along different field directions.
Let us finally describe the resource cost of training the network. Using

IV. CNN RESULTS
In this section, we present results and measure the performance of CNNs predicting Stevens coefficients for three different point groups. We choose groups in cubic, hexagonal and tetragonal crystal classes that are of experimental relevance: m3m,6m2 and 4mm. These groups allow for N St = 2, 4, 5 independent Stevens parameters, respectively. The complexity of the task to find Stevens parameters from thermodynamic data increases when lowering the symmetry. We consider both integer and half-integer values of the total angular momentum quantum number J, and find that our method works equally well in both cases. For concreteness, we investigate J = 4 and J = 15/2, which correspond to the ground state values of the rare-earth ions Pr 3+ (J = 4) and Er 3+ (J = 15/2).
For a given point group G and value of J, we train a CNN using the training data described in Sec. III A. The input thus corresponds to five (nine) thermodynamic observables for cubic (hexagonal and tetragonal) point groups, and the output of the network are the N P + 1 Stevens parameters, {x 0 , . . . , x N P }. We use two performance measures: (i) the mean absolute error (MAE) of the network's prediction of the Stevens coefficients Here,x i is the prediction of the network, x i is the true Stevens coefficient that is used to generate the data placed on the input nodes, and N test is the size of the testing dataset. The MAE is related to the loss function (3.5) used to train the network. (ii) The mean squared error (MSE) of two thermodynamic data sets {µ a , χ a , c M } generated byx i and x i , respectively: Here, M = 5 × 64 = 320 (9 × 64 = 576) for cubic (hexagonal, tetragonal) point groups is the length of the thermodynamic dataset andŌ ν runs over the five (nine) experimental observables {µ a,Tα (b i ), χ a (t i ), c M (t i )} as a function of temperature t i and magnetic field b i (see Sec. III A) that are obtained for a given choice of Stevens parameters. To account for the differences in size and units between observables, we first normalize each dataset by their mean and perform Eq. 4.2 on the resulting dimensionless quantities. The MSE measures the performance of the CNN in reproducing the desired (input) thermodynamic data set that was generated using {x i }. We include this metric as the sensitivity of the error in the observables (MSE) with respect to the error in the Stevens parameters (MAE) depends on the values of the {x i }, and the MSE thus contains additional information about the networks performance. Unless otherwise noted, both MAE and MSE are evaluated on a testing data of size N test = 4 × 10 3 that was not shown to the network during training.
In the following, we separately discuss the performance of the CNNs for the cubic, hexagonal and tetragonal point groups.

A. Cubic point group symmetry
We consider the case of cubic point group G = m3m and J = 4, which is applicable to cubic Pr rare-earth compounds. The energy level diagram for this case is shown in Fig. 1(a) and exhibits singlet, doublet and triplet ground states, depending on the sign of x 0 and the value of x 1 . As shown in Fig. 4, the CNN accurately predicts the two independent Stevens coefficients {x 0 , x 1 } with error values of MAE(0) = 0.321 K and MAE(1) = 0.012. Note that we choose the energy range of 0.5 K ≤ |x 0 | ≤ 50 K. The color code and the inset show the MSE, which lies at MSE = 0.053 on average. The results in Fig. 4 show the predictions of two networks: one is trained with strictly positive x 0 ∈ [0. 5,50], and a second one is trained with strictly negative x 0 ∈ [−50, −0.5]. When applied to a given testing example, which has a definite but unknown sign of x 0 , the performance of the network that was trained on data with the same sign of x 0 as the testing example is typically much better and can be easily identified. Here, we show results for testing examples that have a known sign for simplicity, i.e., the positive (negative) network is tested on samples with positive (negative) x 0 .
In Fig. 5, we visualize the distribution of MSE as a function of the two Stevens parameters x 0 and x 1 . We clearly observe that the MSE is larger in regions where x 0 is small. This occurs as the energy level spectrum collapses in this limit, with all levels being smaller than (or comparable to) the minimal thermal energy k B T min at T min = 1 K. In this regime, thermodynamic data cannot resolve the order of the levels. We also find an increased MSE along the lines x 1 0.35 for positive x 0 > 0 and x 1 = −0.6 and x 1 = 0.75 for negative x 0 < 0. This follows from the fact that the ground state energy exhibits level crossings in these parameter regions, as shown in Fig. 1(a). This makes the thermodynamic observables quite sensitive to small errors in the Stevens parameters as the nature of the ground state changes between singlet, doublet and triplet states. As a result, the MSE is enhanced even though the MAE is still small and the Stevens coefficients are predicted with high accuracy.

B. Hexagonal point group symmetry
We also consider the case of hexagonal point group G =6m2 and J = 15/2, which is applicable to hexagonal Er rare-earth compounds. Being a half-integer value of J, the energy level exhibits Kramers degeneracy in the absence of a magnetic field. The number of independent Stevens coefficients for this point group is N St = 4 (see Eq. (2.15)). We split off the sign of x 0 into a separate parameter x 4 such that sign(x 4 ) = sign(x 0 ). This allows us to consider the parameter x 0 ≡ |x 0 | to be strictly positive. The training data sets thus contains the five coefficients, {x 0 , . . . , x 4 }, with strictly positive x 0 > 0 and only the sign of x 4 entering the Hamiltonian. We The heat map shows a total number of Ntesting = 4 × 10 3 data points. Regions with larger MSE occur when x0 becomes comparable to the temperature probed (x0 Tmin = 1 K) and when there is an energy level crossing involving the ground state [see Fig. 1(a)]. While it becomes impossible to predict the coefficients if x0 Tmin as the spectrum collapses, the increased MSE at the position of level crossings rather indicate an enhanced sensitivity of the observables with respect to small errors in the {xi}, which are still accurately predicted by the network (see Fig. 4).
consider the scale parameter to be in the region 0.5 K ≤ x 0 ≤ 50 K and −1 ≤ x i ≤ 1 for i ≥ 1.
As described in Sec. III A, the training data contains in addition to the specific heat c M , the magnetization µ a and susceptibility χ a along both a = [100] and a = [001] directions. This provides information about the anisotropy between the ab plane and the c axis in the system, and is necessary for the CNN to be able to predict the parameter x 3 . A training data set thus con- Here, we set T α = 1, 17, 300 K and consider the magnetic field range 0 ≤ B a ≤ 10 T and temperature range 1 K ≤ T ≤ 300 K as described in Sec. III A.
As shown in Fig. 6, the CNN can accurately predict the Stevens parameters with {MAE(0), . . . , MAE(3)} = {1.325 K, 0.024, 0.031, 0.073}. The sign of the fifth parameter x 4 was correctly found in 96% of the cases (see inset in the left panel in Fig. 6). The color corresponds to the MSE of each testing data set. A histogram of the MSE values is included in the right most panel. The average MSE over all testing data sets is MSE = 0.280. In general, the MAE increases slightly with larger values of x 0 , which can be understood from the fact that the thermal energy is not sufficient to probe higher lying levels. This could likely be improved by enlarging the temperature range by increasing T max . The CNN performs worst for the x 3 coefficient, in particular when this parameter is small. This parameter contains information about the anisotropy between ab and c axis directions as well as between directions within the ab-plane. The CNN predictions of x 3 could thus likely be improved by providing additional magnetization data along a second, inequivalent direction in the ab plane. Finally, we note that quantitatively similar results were obtained for the integer case of J = 4, showing that the method works equally well for integer and half-integer values of J.

C. Tetragonal point group symmetry
We study the performance of the CNN for the tetragonal point group G = 4mm and half-integer J = 15/2, which corresponds to tetragonal Er rare-earth systems. In this case, the CF allows for N St = 5 independent Stevens parameters. Since we split off the sign of x 0 , the CNN actually predicts six parameters {x 0 , . . . , x 5 }, where 0.5 ≤ x 0 ≤ 50 (in units of K), −1 ≤ x i ≤ 1 for i ≥ 1 and the training data depend only on the sign of x 5 . A training data set contains nine scale-ograms obtained from specific heat c M , magnetization µ a and susceptibility χ a along a = [100], [001] directions: {µ a,Tα (B a ), χ a (T ), c M (T )}. The temperature and field ranges are 1 K ≤ T ≤ 300 K and 0 ≤ B a ≤ 10 T. Providing information about the anisotropy between the ab plane and the c axis is necessary for the CNN to be able to learn the dependence on the parameters x 2 and x 4 .
As shown in Fig. 7, the CNN can accurately predict the Stevens parameters for the majority of the data points that it was tested on. The MAEs of the Stevens coefficients read {MAE(0), . . . , MAE(4)} = {1.380 K, 0.022, 0.038, 0.031, 0.059}. The sign of x 5 was correctly predicted by the network in 93% of the cases. The average MSE is given by MSE = 0.248. The overall performance is comparable to the hexagonal case of 6m2, even though the tetragonal case exhibits one more Stevens parameter. Similar to the hexagonal case, the error is larger for larger values of x 0 , which likely stems from the fact that the bandwidth of the spectrum becomes larger than the maximal thermal energy k B T max . This suggests increasing the temperature range in the training data. The MAE of different coefficients is comparable. The largest MAE occurs for the parameter x 4 , which measures the anisotropy of the system, both between ab and c directions as well as within the ab plane. It could likely be better predicted by adding magnetization data along another inequivalent direction in the ab plane to the training sets. Finally, we note that we have applied the algorithm to the case of J = 4 and obtained quantitatively similar results.

V. APPLICATION TO EXPERIMENTAL DATA
The ultimate application of the presented CNN algorithm is to extract CF parameters from real experimental data. We provide all required programs as opensource software [55]. In this section, we demonstrate this by applying the algorithm to three published experimental data sets: one Cerium and two Praseodymium- based rare-earth intermetallics: (i) CeAgSb 2 [28,42] and (ii) PrAgSb 2 [28], where the rare-earth ions Ce 3+ (J = 5/2) and Pr 3+ (J = 4) experience 4mm site symmetry, and (iii) PrMg 2 Cu 9 [30], where Pr 3+ exhibits6m2 site symmetry. Importantly, published data of magnetization, magnetic susceptibility and magnetic specific heat on the same single crystal are available for these systems [28,30,42]. Since experimental circumstances and parameters are different for each material, it is required to train a custom CNN for each case.
When applying the CNN algorithm to experimental data, one must first select the set of thermodynamic observables that are given to the network. In general, it is best to include as much data as possible, for example, magnetization along different directions, which informs the network about the anisotropy in the material. In addition, one must choose the temperature and magnetic field ranges. These will in general be different for each observable to ensure that the assumptions of the modeling, in particular the single-ion approximation, are valid. Once the set of experimental input data is determined (together with J and G), one generates a customized training data set using the same observables and parameter ranges. Finally, to use the experimental data as input data for the CNN, one first needs to transform from the experimental units to the units used in the training data. This is described in detail in Appendix A.
When selecting a suitable temperature window, one must ensure to avoid the occurrence of many-body phenomena such as the development of Kondo screening (by choosing T > T K ), magnetic order (T > T M ), or significant magnetic exchange interaction effects (T > T RKKY ), which are currently neglected in the modeling that generates the training data. We thus choose to apply our algorithm to the Praseodymium members of the RAgSb 2 and RMg 2 Cu 9 series, because they do not exhibit magnetic order down to 2 K (even though magnetic exchange effects may become noticeable at T 5 − 10 K already). On the other hand, they may exhibit some degree of J mixing [9,57], which is currently neglected in the training data generation. As this is avoided in the Ce member of the series, because Ce 3+ only contains a single f electron, we also investigate CeAgSb 2 within our approach. When including specific heat data, it is also important to realize that the magnetic part of c M is typically experimentally approximated by subtracting off the specific heat of a corresponding nonmagnetic compound. A nonmagnetic analogue material can often be obtained by replacing the magnetic rare-earth ion by a nonmagnetic one such as La, Y or Lu. The subtraction procedure is only valid when all other (i.e., the phonon and electronic) contributions to the specific heat in the two materials are identical. In practice, this restricts the temperature regime that can be used for (magnetic) c M in the algorithm. Similarly, one may need to subtract an enhanced Pauli susceptibility contribution, which arise from conduction electrons, from the magnetic susceptibility data.
We emphasize that these caveats are related to the physical modeling of the forward problem of computing (or experimentally extracting) observables. Our CNNbased approach of solving the inverse problem, however, is more generally valid and can, in principle, also be used in conjunction with more advanced and realistic physical models (that might, e.g., be able to capture magnetism or phonons). Magnetic exchange interactions could be Stevens parameters for PrAgSb2 obtained from CNN using {µz,T =2 K, χz(T ), cM (T )} from Ref. [28] as input data. The coefficients xi, B q k , and B q k,Stevens are defined in Sec. II. Note that the CNN is unable to learn the coefficients {x2, x4} since the experimental data set (and thus also the training data sets) only include susceptibility and magnetization along the z direction and thus does not include sufficient information about the magnetic anisotropy. This is a limitation of the experimental data set, and not of our deep learning approach, as shown in Sec. IV, where data along a second axis is included and {x2, x4} are predicted well. Finally, we note that Myers et al. report B 0 2,Stevens = 1.8 ± 0.3 K [28], close to what we find.
Based on these general considerations, we select Ce and Pr members of the RAgSb 2 series and PrMg 2 Cu 9 as suitable experimental systems to apply and test our deep learning algorithm in practical situations. The material CeAgSb 2 develops magnetic order at T = 9.7 K [42] and we thus restrict the temperature regime for which we generate training data to be between 10 K ≤ T ≤ 300 K. In contrast, both Pr compounds that we investigate remain paramagnetic down to T = 2 K and can be well described within the single-ion approximation over the complete temperature range from T min = 2 K to T max = 300 K.
For all three compounds, published data exists for magnetic specific heat c M as well as magnetization and susceptibility along the [001] axis (and also along the [100] axis in case of CeAgSb 2 ). We note that while there exists data for the Pr compounds in Refs. [28,30] with magnetic fields applied in the ab plane, these will not be included, because the exact in-plane direction was not ex-perimentally determined. This implies that the Stevens coefficients that describe the anisotropy in the ab plane cannot be determined for the Pr compounds.

A. CeAgSb2
The thermodynamic properties of tetragonal material CeAgSb 2 were studied in detail in Ref. [28,42,56]. This Ce-based Kondo lattice system orders ferromagnetically below T C = 9.6 K with moments aligned parallel to the c axis. Crystal fields were previously shown to play an important role in the material, leading to a peculiar magnetization behavior with moments ordering along the magnetically hard (c) axis, and saturation moments that are larger for fields lying in the easy (ab) plane (versus the c axis) [28,42,59,60].
We use experimental results reported in Ref. [42] as input data for the CNN. In Fig. 8, we show the complete experimental data set provided to the CNN (after unit conversion and wavelet transformation as described in Secs. A and III). It contains the magnetic specific heat c M between 13 K and 80 K, and magnetic susceptibility (versus T ) and magnetization (versus B at T = 20 K) with fields applied along the x ≡ [100] and z ≡ [001] axes. The figure compares the experimental data to the theoretical results obtained within the single-ion approximation that uses the values of Stevens coefficients {x i } predicted by the CNN (red) and reported by Takeuchi et al. in Ref. [42] (black dashed).
Overall, we find that the CNN predictions match the experimental data very well with a MSE = 0.17. The results obtained with values from Ref. [42] are characterized by the same MSE. While the values from Ref. [42] provide a slightly better fit to (the noisy data of) c M , the CNN predictions lead to a slightly improved fit of µ x (B) (see Fig. 8(d)). Ultimately, a better fit may require including effects of spin exchange, as done in Ref. [56] via a molecular field approach. Such an approach could be straightforwardly included in our algorithm, which is left for future work. Table I contains the values of the Stevens coefficients from the CNN, and compares them to two sets reported in the literature [42,56]. The val- FIG. 9. CNN application to PrAgSb2 (a-c) and PrMg2Cu9 (d-f). Panels show experimental data (blue) and CNN predictions (red) for input data {µz,T=2 K, χz(T ), cM (T )}. Experimental data is from Refs. [28,30]. Theory results are calculations (within the single-ion approximation ) using the CF parameter values {xi} predicted by the CNN (see Tab. II  and III). We find good agreement between experimental data and CNN predictions with a MSE = 0.074 (PrAgSb2) and MSE = 0.085 (PrMg2Cu9). Note that due to the lack of experimental data containing information about the magnetic anisotropy, the parameters x2, x4 (for PrAgSb2) and x3 (for PrMg2Cu9) cannot be properly learned by the CNN. Also note the small y axis scale in panel (d) due to a strong ab easy-plane anisotropy.
ues obtained from our CNN approach are closer to those reported in Ref. [56]. In all three sets, the coefficient . We note that B 0 2,Stevens is proportional to the difference in Curie-Weiss temperatures along different axes [61], which provides another useful validation check of the CNN results.

B. PrAgSb2 and PrMg2Cu9
We apply our CNN algorithm to two Pr based materials: tetragonal PrAgSb 2 and hexagonal PrMg 2 Cu 9 . Both materials remain paramagnetic down to T = 2 K, and are modelled within the single-ion approximation over the full temperature range from T = 2 to T = 300 K.
In Fig. 9, we show results of our CNN algorithm together with the experimental data for PrAgSb 2 (a-c) and PrMg 2 Cu 9 (d-f). The experimental data is taken from Refs. [28] (PrAgSb 2 ) and [30] (PrMg 2 Cu 9 ), respectively. The figures contain the complete experimental data that is provided as input to the CNNs (after unit conversion and wavelet transformation). The input thus consists of a three-channel scaleogram image obtained from {µ z,T=2 K , χ z (T ), c M (T )} in the magnetic field and temperature ranges shown in the figure, where z ≡ [001]) direction. The experimental data is compared to theoretical results within the single-ion approximation using the values {x i } predicted by the CNN. The numerical values of the Stevens coefficients obtained from the CNN are given in Table II and III. Ref. [28] reports a value of B 0 2,St. = 1.8 ± 0.3 K, which is close to the value 1.3 K predicted by the CNN. Values for the other coefficients were not reported in Refs. [28,30].
To validate the CNN predictions, we use the coefficients {x i } to calculate the thermodynamic observables (within the single-ion approximation). As shown in Fig. 9, we observe an overall very good agreement between the experimental data and the theoretical results for both compounds. The MSEs are 0.074 for PrAgSb 2 and 0.085 for PrMg 2 Cu 9 , respectively. We note that due to the lack of data containing information about the magnetic anisotropy, the parameters x 2 , x 4 (in tetragonal case) and x 3 (in hexagonal case) cannot be properly learned by the CNN. Its predictions are thus close to zero. As shown in Secs. IV and V A this is not a limitation of our CNN approach and could be remedied by providing experimental magnetization and/or susceptibility data for a second direction, e.g., in the ab plane. We note that Refs. [28,30] contain results for in-plane directions, but do not determine the precise in-plane direction.
This justifies our approach to employ the single-ion approximation to describe the material properties, and demonstrates that the CNN has converged to a physically viable solution. In both cases, the network captures the initial slope of the magnetization [note that small y axis scale in panel (d)]. It also correctly predicts both the Curie-Weiss slope of the inverse susceptibility at higher temperatures and the location of the Schottky peak in the specific heat. The latter is a direct indication that the energy level splitting between the ground and excited state has been successfully extracted from the data.

VI. SUMMARY AND OUTLOOK
To summarize, we present a deep ML algorithm for extracting CF Stevens parameters from thermodynamic observables of local-moment materials that can be treated within the single-ion approximation. We focus on rareearth intermetallics and train a CNN on input data of magnetization, susceptibility, and specific heat. The training data is obtained from straightforward statistical mechanics calculations for different, randomly sampled, values of the Stevens parameters. To exploit the ability of CNNs in image recognition, we process the raw thermodynamic data using a wavelet transform and feed the resulting multi-channel scaleogram image to the network.
The presented algorithm provides a convenient and powerful tool for extracting CF parameters from experimental data that avoids a tedious multiple parameter fitting procedure. We provide all programs necessary to run the algorithm and apply it to experimental situations as open source software [55].
The CNN provides an unbiased solution to the inverse problem of finding the Stevens parameters for a given set of thermodynamic observables. Depending on the type and amount of input data, this inverse problem can be ill-defined and allow for multiple solutions. Our study is an explicit test on the performance of CNNs on this well-known inverse physics problem of wide interest. We systematically investigate the performance of the algorithm for different site symmetries in the cubic, hexagonal and tetragonal crystal classes. The point groups we consider are experimentally relevant and allow for 2, 4 and 5 independent Stevens parameters, thus testing the CNN in cases of increasing complexity. We find that the CNN can accurately predict all Stevens coefficients if one provides magnetization data both along the easy-axis as well as within the easy-plane. The network performs equally well for integer and half-integer values of the total angular momentum J. Finally, we demonstrate that the algorithm also works well when applied to real experimental data of one Cerium system, CeAgSb 2 [42], and two Praseodymium compounds, PrAgSb 2 [28] and PrMg 2 Cu 9 [30]., which we obtain from the literature.
One promising future direction is to include correlation effects such as magnetic exchange interactions between different local moments within the modeling approach used to generate the training data. Magnetic exchange could, for example, be rather straightforwardly included via a molecular mean-field approach [56,58,62] (at the small cost of introducing an additional fit parameter describing the molecular field). Other interesting future directions are to apply more advanced ML techniques (generative adversarial networks [63], autoencoders) to lower symmetry CFs, to systematically investigate the stabil-ity of the algorithm with respect to input data noise, and to explore other choices of input observables, including low-symmetry magnetic field directions and direction averaged quantities. The latter would extend the applicability of the algorithm to polycrystalline materials.

Susceptibility
The magnetic susceptibility (or volume susceptibility) is experimentally obtained as χ = M/H, where M is the magnetization and H is the auxiliary field. Its SI and cgs units are [χ] = 4π(1) = 1emu/cm 3 . Note that it is dimensionless in SI units. It is distinguished from the molar susceptibility χ mol with units [χ mol ] = 4π10 −6 m 3 /mol = 1emu/mol. The two are related by χ = 4π ×10 −6 ρ M χ mol , where ρ is the mass density, M is the molar mass.
The training data reads χ train = µ α /(µ B B), which is the ratio of the induced magnetic moment per rare-earth ion µ a divided by the Bohr magneton µ B and a small magnetic field B = 10 −4 T. Note that the result is in-dependent of the value of B as we ensure that we are in the linear regime of µ a (B). This is related to the experimentally measured (volume) susceptibility via χ train = Vuc N R µ B µ0 χ, where χ is the magnetic susceptibility in SI units and V uc /N R is an effective volume per rare-earth ion such that the magnetization M a = µ a /(V uc /N R ). Here, V uc is the unit cell volume and N R is the number of rareearth ions per unit cell.
Combining the two transformations discussed above leads to where N f.u. is the number of formula units per unit cell.
One thus needs to multiply the experimental data for M/H (in emu/mol) by a factor of 1.79053 N f.u N R , before feeding it into the CNN.