# Learning crystal field parameters using convolutional neural networks

### Submission summary

• As Contributors: Peter P. Orth · Mathias Scheurer
• Arxiv Link: https://arxiv.org/abs/2011.12911v2
• Date accepted: 2021-06-25
• Date submitted: 2021-05-19 03:53
• Submitted by: Orth, Peter P.
• Submitted to: SciPost Physics
• Academic field: Physics
• Specialties: Condensed Matter Physics - Theory · Condensed Matter Physics - Computational
• Approach: Computational

### Abstract

We present a deep machine learning algorithm to extract crystal field (CF) Stevens parameters from thermodynamic data of rare-earth magnetic materials. The algorithm employs a two-dimensional convolutional neural network (CNN) that is trained on magnetization, magnetic susceptibility and specific heat data that is calculated theoretically within the single-ion approximation and further processed using a standard wavelet transformation. We apply the method to crystal fields of cubic, hexagonal and tetragonal symmetry and for both integer and half-integer total angular momentum values $J$ of the ground state multiplet. We evaluate its performance on both theoretically generated synthetic and previously published experimental data on CeAgSb$_2$, PrAgSb$_2$ and PrMg$_2$Cu$_9$, and find that it can reliably and accurately extract the CF parameters for all site symmetries and values of $J$ considered. This demonstrates that CNNs provide an unbiased approach to extracting CF parameters that avoids tedious multi-parameter fitting procedures.

Published as SciPost Phys. 11, 011 (2021)

# Reply to SciPost referee reports for "Learning crystal field ..."

While not revolutionary, this is a simple yet useful example of how the machine learning approach can reduce a somewhat tedious task often performed manually in experimental labs. The paper is very clearly written, and is accompanied by an open source repository where the corresponding code can be directly used to compute Stevens parameters from thermodynamic measurements. This clearly will be useful for future experiments.

This paper can be accepted as it is, and I have only basic questions / suggestions.

We thank the referee for recommending our paper for publication. We address the referee's suggestions below and in the modified manuscript.

To improve quality of the results for the hexagonal and tetragonal point group, it is suggested to add other input data such as a second magnetisation curve (obtained from a field aligned in a different direction). I guess this is quite easy to add in the training data and approach, and I would suggest to try it to see if it indeed improves the results. Same question for a larger temperature range.

We thank the referee for pointing this out. We have actually included magnetization and susceptibility data along two inequivalent crystalline directions ($[001]$ and $[100]$) in the training data sets for tetragonal and hexagonal systems (see Secs.4B and 4C), and in the newly added experimental example, CeAgSb$_2$ (see Sec.5A). Providing magnetization along different directions indeed results in more accurate CNN predictions of the CF parameters. In fact, successfully predicting the CF coefficients that describe the asymmetry between different crystal directions (e.g. $ab$ versus $c$ directions in a tetragonal system) requires such information. We have emphasized that point in the modified manuscript. Also, we chose to remove the entries in Tables 2 and 3 that are not properly learned by the CNN (these entries were previously close to zero, which is the network's best prediction of a parameter that cannot be properly learned).

We have not systematically tested incorporating larger temperature and magnetic field ranges, mainly because we wanted to keep the study as experimentally realistic as possible (i.e. we wanted to use data that can be easily acquired in the lab without going to special facilities, such as high-field laboratories). This can certainly be explored in future work.

We note that the magnetic field range for the newly added example CeAgSb$_2$ goes up to $50$ T, and we find good agreement of the CNN prediction with the experimentally observed moment saturation values. Going to larger magnetic fields, where the higher energy levels mix into the $B=0$ ground state manifold, is especially useful because it provides additional information about the CF parameters.

When comparing to experimental data, it would be interesting to see how good/bad this approach is when one of the experimental inputs is missing (as could happen in a lab, if e.g. specific heat measurements are not available).

It is generally true that more data improves the performance of a machine learning algorithm. Specific heat data is useful to constrain the splitting between the lowest two crystal field states. We have tested (explicitly for the CeAgSb$_2$ data set) that the CNN can also converge to solutions with small MSE if we do not provide $c_M$ data. However, constraining the results by more data is generally better as it reduces the number of solutions to the inverse problem. Most importantly, the data needs to contain sufficient information (e.g. about the anisotropy of the system or contributions from higher excited levels) to discriminate different CF parameters. Most notably, the CNN is unable to learn particular CF parameters ($x_3$ for hexagonal and $\{x_2, x_4\}$ for tetragonal) if the data set does not include information about magnetic anisotropy between the $ab$ and $c$ directions (by providing both $\chi_c, M_c$ and $\chi_a, M_a$). We emphasize that this is not a shortcoming of the machine-learning algorithm but just a property of the inverse problem. In the updated manuscript, we have emphasized this point in the captions of Tables 2 and 3.

Using convolutional neural networks to solve this important inverse problem in condensed-matter physics is a worthwhile idea, which is presented in a very pedagogical way that will benefit early-career researchers in the field. Extracting crystal field parameters is known to be time consuming, and the fact that the algorithm is provided as an open source will be useful to the community. Overall, this manuscript meets SciPost’s criteria for publication.

We thank the referee for the positive evaluation of our work. We also greatly appreciate the referee's detailed and useful comments and suggestions for improvement of our work. We address these below and in the updated manuscript.

A few issues need to be addressed, however, before the manuscript is published.

1. One general limitation of performing fits of thermodynamic data to a crystal field Hamiltonian is the possibility of reaching a local minimum that reproduces the data reasonably well but that does not necessarily contain the correct Stevens parameters. In particular, this is more likely to happen when a large number of parameters is taken into account. The authors address the issue of over-parametrization, but could the authors comment on whether their algorithm has encountered local minima (of e.g. MSE) during the development of this work? In particular, one would naively think that local minima may be why “the CNN can accurately predict the Stevens parameters for the majority of the data points that it was tested on”, but not all data points.

The problem of becoming trapped in local minima of the cost function is generally present in any machine learning approach with a complex cost function landscape such as ours. We have employed the Adam optimization algorithm, which is a stochastic gradient descent method and thus has built-in randomization to avoid such local trapping as much as possible. Reaching the global minimum, however, is not guaranteed for gradient descent optimizers. In addition, there may be multiple CF solutions with similar MSEs (since the inverse problem may be ill-defined). Both cases can be addressed in practice by validating the CNN predictions via a direct comparison of the corresponding theory predictions (i.e. the calculated observables for the predicted CF parameters) with the experimental data. This allows one to explicitly check whether they agree sufficiently well (this is what the MSE measures). If not, we can choose to retrain the CNN using a different training data set by changing the temperature and/or field range or including additional observables (if experimentally available) to constrain the problem more.

We emphasize that the features in the MSE heat map in Fig.5 are physically understandable and arise from an increased sensitivity of observables to small uncertainties in CF parameters due to level crossings (see heat map in Fig.5), smallness of parameters (e.g. $x_3$ in Fig.6 and $x_4$ in Fig.7) or the collapse of the spectrum (small $x_0$ in Fig.5). It is also expected that prediction of $x_0$ becomes more difficult as the bandwidth becomes much larger than the maximal temperature one considers (here, $T=300$ K), as the Boltzmann weight of the higher energy levels then becomes very small. The fact that the behavior of MSE and MAE can be understood physically is an indication that the algorithm is not trapped in local minima.
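The statement about Boltzmann weights can be made concrete with a short numerical sketch. The three-level spectrum below is hypothetical and chosen only for illustration; it is not taken from the manuscript. The function name `boltzmann_weights` is likewise an assumption of this example.

```python
import math

def boltzmann_weights(levels_K, temperature_K):
    """Thermal occupation probabilities for energy levels given in kelvin."""
    e0 = min(levels_K)  # shift by the ground state for numerical stability
    factors = [math.exp(-(e - e0) / temperature_K) for e in levels_K]
    z = sum(factors)  # partition function
    return [f / z for f in factors]

# Hypothetical spectrum: two low-lying levels and one far above T_max = 300 K.
levels = [0.0, 100.0, 1500.0]
weights = boltzmann_weights(levels, temperature_K=300.0)
for energy, w in zip(levels, weights):
    print(f"E = {energy:7.1f} K  weight = {w:.4f}")
```

In this toy case the highest level carries well under one percent of the thermal weight at 300 K, so observables measured below that temperature depend only weakly on its position.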

We have added the following paragraph at the end of Sec.III: "We choose the stochastic gradient descent Adam optimization algorithm to avoid as much as possible the trapping in local minima of the cost function. As shown below, the behavior of the quality of the CNN predictions (as described by MSE) across different input parameters can be largely understood by physical means such as arising from energy level crossings, the smallness of certain CF parameters or the ratio of the bandwidth to the maximal temperature scale. This indicates that the CNN is not trapped in local minima. In general, the inverse problem that the CNN addresses may be ill-defined and allow for multiple solutions. This issue can be (partially) addressed in practice by providing more data to the CNN such as enlarging the field and temperature range and/or by including magnetization and susceptibility data along different field directions."
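For readers unfamiliar with Adam, the following is a minimal sketch of the standard update rule applied to a one-dimensional toy cost function. It uses a deterministic gradient and commonly used default hyperparameters, both of which are assumptions of this example; the actual training uses mini-batches and a deep learning library, and the function `adam_minimize` is hypothetical.

```python
import math

def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # running mean of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy cost f(x) = (x - 3)^2 with gradient 2 (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(f"x_min = {x_min:.4f}")
```

In stochastic training, the gradient `g` would be evaluated on a different random mini-batch at every step, which is the source of the randomization mentioned above.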

2. To address the issue above, it might be very enlightening to apply this algorithm to data sets of materials in which neutron scattering measurements have been performed. Two examples that come to mind are CeAgSb2 (from the same family of PrAgSb2) and CeRhIn5 (at high temperatures T>T_K, the Kondo temperature). One interesting aspect here is that CEF fits and neutron data agree well for CeAgSb2, whereas the agreement is worse for CeRhIn5. This could obviously be due to the Kondo effect, but nonetheless it would be a very informative comparison.

We thank the referee for these suggestions. We include a detailed analysis of CeAgSb$_2$ in the revised manuscript using data published by Takeuchi et al. in Ref.[35]. The results are described in Sec.5A (see Fig.8). In summary, the CNN predicts CF coefficients that are similar to results reported in the literature [35, 48]. The resulting MSE between the predicted and measured thermodynamic observables is the same for the CNN predicted values and the ones from Ref.[35], and given by MSE = 0.17.

3. I further encourage the authors to consider Cerium instead of Praseodymium because the former has only one f-electron, whereas the latter has two, which could give rise to J mixing. This has been shown, for instance, in the case of Pr2Sn2O7 via neutron spectroscopy [PRB 88, 104421 (2013)]. Even if the authors are absolutely convinced there is no J mixing in the Pr compounds investigated in their work, it is worth mentioning this general possibility in the manuscript to inform the reader of this potential issue.

We thank the referee for pointing this out. We agree with the referee that Ce members are in principle preferred as they only contain a single f-electron. In the updated manuscript we include an analysis of CeAgSb$_2$ in Sec.5A (see Fig.8).

We also added a statement about the possibility of J mixing in the experimental section on p.13: "In fact, we chose to apply our algorithm to the Praseodymium members of the RAgSb$_2$ and RMg$_2$Cu$_9$ series, because they do not exhibit magnetic order down to $2$~K. On the other hand, they may exhibit some degree of $J$ mixing, which is neglected in the training data generation. This could be avoided by investigating the Ce member of the series, since Ce$^{3+}$ only contains a single $f$ electron. "

4. On the Kondo effect, the authors correctly point out in the Introduction that the crystal field scheme can also have important ramifications for the nature of the Kondo effect in the system. In a simple local picture, this can be understood by considering that the shape/anisotropy of the ground state wavefunction is related to the overlap between f electrons and conduction electrons, which in turn determines the Kondo hybridization. However, the citations provided [11-14] are not particularly general, and they mostly focus on U-based materials, for which the LS coupling may not be valid as mentioned in point #3. To address SciPost’s criterion to “Provide citations to relevant literature in a way that is as representative and complete as possible” – and the fact that the community is considering crystal field effects more strongly in the recent past – the authors are encouraged to provide more citations to the relevant literature.

We have added the following references about the interplay of CF and Kondo effects to the manuscript:

• F. B. Anders and T. Pruschke, Can competition between the crystal field and the Kondo effect cause non-Fermi liquid-like behavior?, Phys. Rev. Lett. 96, 086404 (2006).
• P. M. Levy and S. Zhang, Crystal-field splitting in Kondo systems, Phys. Rev. Lett. 62, 78 (1989).
• L. Peyker, C. Gold, E.-W. Scheidt, W. Scherer, J. G. Donath, P. Gegenwart, F. Mayr, T. Unruh, V. Eyert, E. Bauer, and H. Michor, Evolution of quantum criticality in CeNi$_{9-x}$Cu$_x$Ge$_4$, J. Phys.: Condens. Matter 21, 235604 (2009).
• M. Dzero, K. Sun, V. Galitski, and P. Coleman, Topological Kondo insulators, Phys. Rev. Lett. 104, 106408 (2010).
• M. A. Romero, A. A. Aligia, J. G. Sereni, and G. Nieva, Interpretation of experimental results on Kondo systems with crystal field, J. Phys.: Condens. Matter 26, 025602 (2013).
• H.-U. Desgranges, Crystal fields and Kondo effect: Specific heat for cerium compounds, Physica B: Condensed Matter 454, 135 (2014).

We are happy to include further suggestions of the referee if they want.

5. It could be useful to mention that the leading term, B20, is proportional to the difference in Weiss temperatures along different axes. Even though the method introduced by the authors is supposed to be unbiased, there are tricks that could be useful for training data.

We added such a statement in Sec.5A: "We note that $B^0_{2,\text{Stevens}}$ is proportional to the difference in Curie-Weiss temperatures along different axes~\cite{Wang-Phys_Lett_A-1971}, which provides another useful validation check of the CNN results."
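As a rough illustration of this validation check, the sketch below uses the proportionality that follows from the leading-order high-temperature expansion of a single-ion Hamiltonian containing $B_2^0 O_2^0$, namely $\theta_{ab} - \theta_c = \frac{3}{10} B_2^0 (2J-1)(2J+3)$. The sign convention $\chi = C/(T-\theta)$, the function name and the numerical inputs are assumptions of this example, and the prefactor should be checked against the cited reference before use.

```python
def b20_from_curie_weiss(theta_ab_K, theta_c_K, J):
    """Estimate the Stevens coefficient B_2^0 (in kelvin) from Curie-Weiss temperatures.

    Assumes chi = C / (T - theta) along each axis and keeps only the
    leading-order term of the high-temperature expansion.
    """
    return 10.0 * (theta_ab_K - theta_c_K) / (3.0 * (2 * J - 1) * (2 * J + 3))

# Hypothetical Curie-Weiss temperatures for a J = 5/2 ion (illustrative values only).
b20 = b20_from_curie_weiss(theta_ab_K=4.8, theta_c_K=-4.8, J=2.5)
print(f"B_2^0 = {b20:.3f} K")
```

Comparing such an estimate with the value implied by the CNN output gives a quick consistency check that does not require any fitting.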

6. Regarding the outlook of this work, including magnetic interactions will be very valuable. The easiest way of doing this is by adding a molecular field term as employed in JPSJ 70, 877 (2001), for example. The next step would be to include a mean-field Hamiltonian with one exchange constant and so on.

We thank the referee for this suggestion and the useful reference. We have added a couple of sentences in the outlook (and on p.13, right column, where we discuss the current limitations of our modeling) and included the reference there. The added statement in the outlook reads:

"One promising future direction is to include correlation effects such as magnetic exchange interactions between different local moments within the modeling approach used to generate the training data. Magnetic exchange could, for example, be rather straightforwardly included via a molecular mean-field approach~\cite{Takeuchi-CeRhIn5-2001, Jobiliong-PRB-2005, Johnston-Unified_molecular_field-PRB-2015} (at the small cost of introducing an additional fit parameter describing the molecular field)."

7. This is a minor point, but it could stimulate readers to actually use the open software in Ref. [47]: one of the main motivations of this work is to circumvent a time-consuming fitting procedure, but if one is using a conventional minimization method, a significant portion of the overall time is also spent i) preparing/converting the data and ii) waiting for iterations. Could the authors comment on the overall time spent during their approach (machine time and researcher time)?

We have added more details about the duration of the different steps of our algorithm in practice. These have been added in Sec.III at the end of the respective subsections. To summarize:

1. Our algorithm assumes an experimental data set of thermodynamic observables has been obtained in usual experimental units. The conversion to the units we use in the training data is described in detail in the appendix and takes a negligible amount of time.
2. Then, the training data set needs to be theoretically calculated (including the continuous wavelet transformation). This takes about 3 hours of physical time on a regular multi-core CPU.
3. Then, the CNN needs to be trained. We perform this step on a standard Nvidia Volta V100S graphics processing unit (GPU), where it takes about 1-2 hours.
4. Finally, the experimental data set is placed on the input nodes of the network and the CNN yields the CF parameter predictions on the output node, which takes a negligible amount of time.

8. A couple of final points regarding input data: i) similar to the subtraction of the phonon contribution to obtain the magnetic part of the specific heat, one also has to subtract an enhanced Pauli susceptibility from magnetic susceptibility data (if applicable); ii) trying to fit experimental data at low temperatures (e.g. M(H) at 1.7K) may be very hard, even if there is no magnetic order, because of the effects of magnetic exchange. I wonder whether this is causing the disagreement in Fig 8d. This could also be an indication that, although the CF splitting is predicted correctly, the ground state wavefunctions are not. Therefore, it might be more useful to only fit magnetic susceptibility and magnetization data at higher temperatures, e.g., T>25K or so.

(i) Thank you; a statement about subtracting the Pauli $\chi$ has been added on p.13: "Similarly, one may need to subtract an enhanced Pauli susceptibility contribution, which arises from conduction electrons, from the magnetic susceptibility data."

(ii) We have added a statement on p.13 about complications from different physical effects, such as magnetic exchange, that can arise at low $T$: "When selecting a suitable temperature window, one must ensure to avoid the occurrence of many-body phenomena such as the development of Kondo screening ($T > T_K$), magnetic order ($T > T_M$), or significant magnetic exchange interaction effects ($T > T_{\text{RKKY}}$), which are currently neglected in the modeling that generates the training data. We thus choose to apply our algorithm to the Praseodymium members of the RAgSb$_2$ and RMg$_2$Cu$_9$ series, because they do not exhibit magnetic order down to $2$ K (even though magnetic exchange effects may become noticeable at $T \lesssim 5-10$ K already)."
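The two preprocessing steps discussed here can be sketched as follows. This is a minimal illustration under stated assumptions: the Pauli contribution is treated as a temperature-independent constant `chi_pauli`, the cutoff `t_min_K` is chosen by the user, and the function name and all numbers are hypothetical.

```python
def prepare_susceptibility(temperatures_K, chi_measured, chi_pauli, t_min_K):
    """Subtract a constant Pauli term and keep only points above a temperature cutoff."""
    return [
        (t, chi - chi_pauli)
        for t, chi in zip(temperatures_K, chi_measured)
        if t > t_min_K
    ]

# Illustrative numbers only (arbitrary susceptibility units).
temperatures = [2.0, 5.0, 10.0, 50.0, 300.0]
chi_raw = [0.90, 0.60, 0.35, 0.08, 0.015]
cleaned = prepare_susceptibility(temperatures, chi_raw, chi_pauli=0.005, t_min_K=5.0)
print(cleaned)
```

The cutoff should lie above the scales $T_K$, $T_M$ and $T_{\text{RKKY}}$ quoted above, so that the single-ion model used to generate the training data remains applicable.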

### Weaknesses

1. rather simple use of ML methods as in older image recognition methods, no ResNet, no transfer learning, etc.

We thank the referee for these suggestions. We included the possibilities of using other, more advanced deep learning models in the future in the outlook of the paper.

2. language used to quantify ML recognition accuracy uses standard ML terms (MAE, MSE), but not physically useful parameters (% of deviations, errors, etc)

We modified our definition of MSE to give it a meaningful physical interpretation. More details are given in our reply below and in the updated manuscript around Eq.(4.2).

3. usefulness of method in "real" situations is left somewhat unclear as ML aspect of data creation is not detailed

We expanded and clarified the description of training data generation and explicitly included the time that is needed in practice to train a CNN for a given set of thermodynamic data sets available to the experimentalist. Additional sentences were added to Secs.3A and B.

"One training sample therefore consists of five (nine) different sets of thermodynamic data. A complete training sample for $J=4$, $\mathcal{G} = \text{m}\bar{3}\text{m}$ and $x_0 = 20$~K, $x_1 = 0.5$ is shown in Fig.1(b, c). To obtain the training data set, we draw the Stevens parameters randomly from a uniform distribution and, for each of these sampled values, compute the aforementioned observables. To generate sufficient training data for the network, the process takes 2-3 hours."
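The sampling step described in this passage can be sketched as below. The bounds ($x_0 \in [0.5, 50]$ K and $x_i \in [-1, 1]$ for $i \neq 0$) are the ranges quoted elsewhere in this reply and are reused here as an assumption about the sampling limits; the function name is hypothetical, and the computation of observables for each sample is omitted.

```python
import random

def sample_stevens_parameters(n_samples, n_dimensionless, seed=0):
    """Draw Stevens parameter sets [x0, x1, ...] from uniform distributions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    samples = []
    for _ in range(n_samples):
        x0 = rng.uniform(0.5, 50.0)  # overall energy scale in kelvin (assumed bounds)
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_dimensionless)]
        samples.append([x0] + xs)
    return samples

# Cubic case: x0 plus one dimensionless parameter x1.
training_labels = sample_stevens_parameters(n_samples=1000, n_dimensionless=1)
print(training_labels[0])
```

Each sampled parameter set would then be passed to the single-ion model to compute the thermodynamic observables that form one training sample.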

"Let us finally describe the resource cost of training the network. Using $10^5$ training examples and $1.5 \times 10^4$ validation and $1.5 \times 10^4$ testing examples with a batch size of $N_{\text{batch}} = 64$, the network converged after around $100$ epochs. With the available GPU (Nvidia Volta V100S, 32 GB), training the network took around 70 seconds per epoch. Fully training the network thus takes around 1-2 hours."

Once the CNN model is trained, it is extremely fast to test new experimental examples.
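The quoted training cost follows from simple arithmetic on the numbers given above, as this short check shows (variable names are chosen for this example only):

```python
import math

n_train = 10**5          # training examples
batch_size = 64
epochs = 100             # approximate number of epochs until convergence
seconds_per_epoch = 70   # on the quoted GPU

batches_per_epoch = math.ceil(n_train / batch_size)
total_hours = epochs * seconds_per_epoch / 3600

print(f"gradient updates per epoch: {batches_per_epoch}")
print(f"total training time: {total_hours:.2f} h")
```

The result of just under two hours is consistent with the stated 1-2 hours.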

### Report

The manuscript applies machine learning (ML) approaches, specifically deep learning (DL) methods such as convolutional neural networks (CNN) used predominantly in image recognition, for material science/physics. As such, it "open(s) a new pathway in an existing or a new research direction, with clear potential for multipronged follow-up work". The manuscript is also written in a clear and intelligible way, contains a detailed abstract and introduction explaining the context of the problem, and provides sufficient detail.

We thank the referee for this positive evaluation of our work, as well as the constructive criticism that we address below and in the modified manuscript.

The wavelet scaleograms are an ingenious way to harness the power of modern DL image recognition programs for a material science study of material parameters. It was interesting to note on page 8, RH column, that this 2D approach seems to work even better than a simple fully connected 1D neural network structure. A bit more detail might be useful here.

At the beginning of our research, we first tested simpler network architectures, specifically (i) a 1D CNN that was fed with a 1D stacked vector of the raw data and (ii) a simple feedforward deep neural network (DNN). The advantage of these models is that they are less complicated than the 2D CNN described in the paper and thus take less time to train. However, we never got fully satisfactory results, even in the simplest case of cubic symmetry, where only two Stevens parameters exist.

We describe details of the performance of these two networks here. We have expanded the paragraph in the manuscript that details the network comparison. It reads:

"To make a fair comparison, we created architectures that had approximately the same number of parameters as the 2D CNN and were trained on the same data. Applying to the cubic case with two Stevens parameters, $x_0$ and $x_1$, we find the 1D CNN a factor of 2 worse for $x_0$ and a factor of 7 worse for $x_1$ than the 2D CNN. The feed-forward deep neural network performed a factor of 1.25 worse for $x_0$ and a factor of $9$ worse for $x_1$ than the 2D CNN. It is expected that the performance difference is enhanced in the lower symmetry cases with more Stevens parameters to predict, which is why we chose to use the 2D CNN. "

Here in the reply, we give further details on the architecture of the other networks we tested. Since the manuscript is already quite long and we do not include any results of the other networks, we have opted to not include all these details in the manuscript (and we hope the referee agrees with our decision):

Specifically, to make a fair comparison, we tried to create architectures that had approximately the same number of parameters as the 2D CNN, $\sim 1.3 \times 10^7$. Each of the models was trained with the same data, although hyperparameters were changed in an attempt to further optimize them. The networks converged after around 30 epochs with a batch size of 64.

1D CNN: The raw numerical thermodynamic data is stacked to form a two dimensional input with shape (64, 5) (64 temperature/magnetic field values, 5 thermodynamic observables). It is then fed through a number of 1D convolution and max-pooling layers, and eventually a sequence of fully-connected layers. With the fully-connected layers, we apply a dropout of 0.4 to help prevent overfitting. We similarly apply batch normalization and the ReLU activation function to hidden layers, ending with a fully-connected output layer of width 2 with a linear activation function. Applying to the cubic case with two Stevens parameters, this model is able to predict the $x_0$ coefficient with a mean absolute error (MAE) of $\text{MAE}({x_0}) = 0.585$ K. The $x_1$ coefficient is predicted with $\text{MAE}({x_1}) = 0.087$ units. This is about a factor of two (seven) worse than the MAE of $x_0$ ($x_1$) we found for the 2D CNN that is fed with the 2D wavelet scaleograms of the thermodynamic data, where we found $\text{MAE}_{\text{2D CNN}}(x_0) = 0.321$ K and $\text{MAE}_{\text{2D CNN}}(x_1) = 0.012$.

Feedforward deep neural network: This architecture is the most naive approach to building a model. The input data is reduced to a single one dimensional array containing all 384 data points in sequence. In this network we only apply a sequence of fully-connected layers, each with batch normalization and dropout. We again end with an output layer of width 2 with a linear activation function. This model is able to predict the $x_0$ coefficient with a mean absolute error (MAE) of $\text{MAE}({x_0}) = 0.388$ K. The $x_1$ coefficient is predicted with $\text{MAE}({x_1}) = 0.103$ units. While the network predicts $x_0$ nearly as well as the 2D CNN, the MAE for $x_1$ is about a factor of nine worse than that of the 2D CNN.

It is expected that the performance difference is enhanced in the lower symmetry cases with more Stevens parameters to predict, which is why we chose to use the 2D CNN.
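The factors quoted in this comparison can be reproduced directly from the MAE values given above. The dictionary layout below is only a convenience of this example:

```python
# MAE values as quoted in this reply (x0 in kelvin, x1 dimensionless).
mae = {
    "2D CNN": {"x0": 0.321, "x1": 0.012},
    "1D CNN": {"x0": 0.585, "x1": 0.087},
    "feedforward DNN": {"x0": 0.388, "x1": 0.103},
}

reference = mae["2D CNN"]
ratios = {
    name: {key: values[key] / reference[key] for key in values}
    for name, values in mae.items()
    if name != "2D CNN"
}

for name, r in ratios.items():
    print(f"{name}: x0 worse by {r['x0']:.2f}x, x1 worse by {r['x1']:.2f}x")
```

The ratios come out at roughly 1.8 and 7.3 for the 1D CNN and roughly 1.2 and 8.6 for the feedforward network, in line with the rounded factors stated in the text.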

Also, is there an intuitive way to understand why a CNN might work better than a simple deep neural net? Why is the convolution/neighbor structure so much better?

Yes, a CNN can pick up specific features that appear in the frequency domain if a Fourier decomposition is applied to the data (here the time domain corresponds to temperature or magnetic field). This ability stems from the fact that the CNN has pooling layers and convolutional layers. They are designed to recognize specific features in images, i.e., specific features in the frequency domain in the 2D wavelet scaleograms. Convolutional layers can use localization of ($\omega, T$) features to classify images (= scaleograms). This information cannot be accurately resolved using simple deep neural nets or 1D neural nets, where the image data is flattened and this spatial information is lost.

I was surprised not to see a ResNet structure used. Or is the LeNet implementation you use now a ResNet?

The LeNet is not a ResNet. We chose a LeNet to see how well a relatively simple and well known architecture would perform on this problem. It would indeed be interesting to use other CNN architectures and compare their performance on this problem. We appreciate the referee's suggestions and leave this for future work.

I also did not see that you are using batch normalization? Is that not needed since already included in the CWT construction such that an overall norm is being used? Please comment.

The input data scaleogram is centered, but not normalized. Batch normalization is used in each of the layers, which involves both centering and normalizing the input of that layer.
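The distinction between centering and full batch normalization can be illustrated with a small sketch. It shows the textbook transformation over one batch of scalar activations; the learnable scale `gamma` and shift `beta` are set to their usual initial values, and the function names are hypothetical.

```python
import math

def center_only(values):
    """Subtract the mean; the spread of the data is left unchanged."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def batch_normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Center and rescale to unit variance, then apply a scale and shift."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [1.0, 2.0, 3.0, 4.0]
centered = center_only(batch)
normalized = batch_normalize(batch)
print(centered)
print(normalized)
```

Centering leaves the variance of the batch at its original value, whereas batch normalization brings it close to one before the next layer sees the data.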

The DL approach here is a multidimensional regression. More modern methods could use GANs or even variational auto encoders. Is there a reason why this is not done here?

It would indeed be interesting to use other DL approaches (in particular GANs) for this problem. Here, we chose a well established multidimensional regression approach for simplicity. Since we were able to obtain satisfactory results with this method, we did not need to increase the complexity of the DL method. However, future work, in particular for lower symmetry materials, may very well benefit from more sophisticated DL methods, and we very much appreciate the referee's suggestion.

Overall, the method seems to work well since in the end, only very few Stevens parameters are "predicted". It would be good to gain an understanding for how many Stevens parameters one might see that the method still works. What is the theoretically possible such number, given the available crystal fields/materials?

There are 32 different site point symmetry groups in three dimensions. The number of Stevens parameters for $f$ electron systems ranges from 2 (highest cubic symmetry) to 26 (lowest symmetry = no symmetry or just inversion symmetry). The different numbers of Stevens parameters are 2 (cubic), 4 (hexagonal $D_{6h}$), 5 (tetragonal $D_{4h}$), 6 (tetragonal $C_{4h}$), 8 (hexagonal $C_3$), 9 ($D_{2h}$), 14 ($C_{2h}$), 26 ($C_i$). See, for example, Ref.[17] (Walter, 1984) of our paper.

It is an interesting follow-up work to investigate the lower symmetry groups and test how well the deep learning approach works there (possibly using more advanced models as suggested by the referee). In this manuscript, we wanted to focus on the experimentally more relevant case of higher symmetry, as studies of higher symmetry compounds are more common than, e.g., low symmetry triclinic systems.
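For convenience, the parameter counts listed in this reply can be collected in a small lookup table. The labels are copied from the text above; the helper function and the threshold used in the example are hypothetical.

```python
# Number of independent Stevens parameters per site symmetry, as listed in this reply.
stevens_parameter_count = {
    "cubic": 2,
    "hexagonal D6h": 4,
    "tetragonal D4h": 5,
    "tetragonal C4h": 6,
    "hexagonal C3": 8,
    "D2h": 9,
    "C2h": 14,
    "Ci": 26,
}

def symmetries_with_at_most(n_parameters):
    """Return the listed symmetries whose parameter count does not exceed n_parameters."""
    return [name for name, n in stevens_parameter_count.items() if n <= n_parameters]

# The cases treated in the manuscript (cubic, hexagonal, tetragonal) have at most 5.
print(symmetries_with_at_most(5))
```

The size of the network's output layer would have to match the corresponding count for each symmetry.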

### Requested changes

1 clearly indicate in the paper (not just in the caption to Fig. 1) that your wavelet construction leads to an image of size 64x64.

We added a comment mentioning this explicitly in the training data generation section.
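To show how a 64-point curve turns into a 64x64 image, here is a minimal pure-Python sketch of a continuous wavelet transform. The Mexican-hat (Ricker) mother wavelet, the linear choice of 64 scales and the Gaussian test signal are all assumptions of this example; the wavelet and scales used in the manuscript and its repository may differ.

```python
import math

def ricker(u):
    """Mexican-hat mother wavelet at dimensionless position u."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)

def scaleogram(signal, n_scales=64):
    """Return an n_scales x len(signal) array of wavelet coefficients."""
    n = len(signal)
    rows = []
    for k in range(1, n_scales + 1):
        scale = float(k)  # assumed scales: 1, 2, ..., n_scales grid points
        row = []
        for b in range(n):  # b is the position of the wavelet centre
            acc = sum(signal[t] * ricker((t - b) / scale) for t in range(n))
            row.append(acc / math.sqrt(scale))
        rows.append(row)
    return rows

# Hypothetical 64-point curve with a single bump centred at grid point 20.
signal = [math.exp(-((i - 20) / 5.0) ** 2) for i in range(64)]
image = scaleogram(signal)
print(len(image), len(image[0]))
```

Each row of `image` corresponds to one scale and each column to one position along the temperature or field axis, so a bump in the original curve appears as a localized feature in the image.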

2 The discussion of what a CWT is, appears somewhat ad hoc on page 8 after CWTs have already been discussed. I think it might be useful to move this paragraph surrounding Eqs. (3.3) and (3.4) somewhere else or earlier. Or, alternatively, to introduce subheading in section III.A, or, perhaps a new section on just CWTs as section III.B (or some such label).

We introduced subheadings as suggested by the referee.

3 I am overall worried that you only use ML-based accuracy measures (section IV, e.g. caption to Fig. 4 and others). Clearly, MAE and MSE are useful image recognition/classification measures, but for a physics context, I would have expected to see errors and accuracy measures expressed in physical units. For example, an MSE of 10^-3 is somewhat meaningless while a % RMSE, i.e. (4.1) or (4.2) divided by, say O(x_true), would allow the construction of a % error/deviation, averaged over the used Stevens coefficients. Indeed, one can also compute such a measure for each Stevens parameter.

Regarding the MSE, we agree with the referee and have updated the manuscript to now define the MSE for normalized and dimensionless operator differences (see updated Eq.(4.2)). The MSE is now physically meaningful. We have also updated the figures accordingly and added the following sentence below Eq.(4.2): "To account for the differences in size and units between observables, we first normalize each dataset by its mean and perform Eq.(4.2) on the resulting dimensionless quantities."
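A minimal sketch of such a normalized MSE is given below. It assumes that each observable is divided by the mean of the measured curve before the squared differences are averaged; the precise definition is the one in Eq.(4.2) of the manuscript, and the function name and numbers here are hypothetical.

```python
def normalized_mse(predicted, measured):
    """Mean squared error of two curves after dividing both by the measured mean."""
    scale = sum(measured) / len(measured)  # assumed normalization constant
    squared = [((p - m) / scale) ** 2 for p, m in zip(predicted, measured)]
    return sum(squared) / len(squared)

# Illustrative curve in arbitrary units and a prediction that is 10% too large.
measured = [1.0, 2.0, 3.0]
predicted = [1.1, 2.2, 3.3]
mse = normalized_mse(predicted, measured)

# Changing units (here by a factor of 1000) leaves the result unchanged.
mse_rescaled = normalized_mse([1000 * p for p in predicted], [1000 * m for m in measured])
print(mse, mse_rescaled)
```

Because the result is dimensionless, values for magnetization, susceptibility and specific heat can be averaged on an equal footing.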

We think that the MAE is an appropriate measure for the quality of the Stevens coefficient predictions. The coefficients $x_i$ with $i \neq 0$ are all dimensionless and lie in the range $[-1,1]$, and an MAE thus describes how well they are predicted. The coefficient $x_0$ has units of Kelvin (it is the overall scale of the Stevens parameters) and it lies in the range $[0.5, 50]$ K in practice (this covers the full range of typical $x_0$ values found experimentally). The MAE of $x_0$ thus is a measure of how well we can predict the overall scale.
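The per-parameter MAE can be computed as in the following sketch. The two test samples are invented for illustration and the function name is hypothetical.

```python
def mae_per_parameter(predicted, true):
    """Mean absolute error of each Stevens parameter over a set of test samples."""
    n_samples = len(true)
    n_params = len(true[0])
    return [
        sum(abs(p[j] - t[j]) for p, t in zip(predicted, true)) / n_samples
        for j in range(n_params)
    ]

# Each row is [x0 in kelvin, x1 dimensionless]; values are illustrative only.
true_values = [[20.0, 0.50], [10.0, -0.20]]
predictions = [[20.4, 0.52], [9.8, -0.19]]
mae_x0, mae_x1 = mae_per_parameter(predictions, true_values)
print(f"MAE(x0) = {mae_x0:.3f} K, MAE(x1) = {mae_x1:.3f}")
```

Since $x_0$ carries units of kelvin and the remaining parameters are dimensionless, the two numbers should be read against their respective ranges rather than compared with each other.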

4 page 9, LH column, "clearly recognized". Sorry, but I cannot see this, never having looked at a CWT. Please either indicate in the figure or reword more clearly.

We have expanded our discussion in the manuscript of how some broad features of the original data can be recognized in the scaleograms. We have removed the word "clearly".

minor points: 5 page 12, RH column, "In this section, we demonstrate this ..." I am not sure what is meant by the 2nd "this", i.e. circumstances, parameters, custom CNN, each case? Please rewrite.

We have reformulated the paragraph to make this point more clear. In the sentence, "this" refers to "The ultimate application of the presented CNN algorithm is to extract CF parameters from real experimental data".

6 conclusions: what is meant by "net work performs ... well ... algorithm ... works well"? Could you please be quantitative and specific? How much faster and how much more accurate?

We provide more details in the modified manuscript on how long the generation of training data and the training of the network take. We also refer more clearly to the MAE and MSE as quantitative measures of how well the network performs. We prefer not to repeat all these details in the conclusion of the paper, as they can be found in the corresponding sections, but we are now more precise about what we take as quantitative performance measures.

### List of changes

A PDF file with differences highlighted will be provided. The changes are also detailed in the reply to the referees.

### Submission & Refereeing History

Resubmission 2011.12911v2 on 19 May 2021
Submission scipost_202012_00008v1 on 11 December 2020

## Reports on this Submission

### Report

I have read the detailed responses to all three referees' remarks.
In my opinion, the authors have clearly replied to all and improved the paper by providing another experimental example (Cerium compound) and clarifying several points.
The manuscript which was already well written has even improved.
I thus recommend publication of the manuscript in its current version.

• validity: high
• significance: good
• originality: good
• clarity: top
• formatting: excellent
• grammar: perfect

### Report

The authors have answered all concerns and questions raised by the reviewers, and the manuscript is now suitable for prompt publication in SciPost. My only minor request regarding references on the interplay of CF and Kondo effect is to add reference Haule et al, Phys Rev B 81, 195107 (2010).

• validity: high
• significance: high
• originality: good
• clarity: top
• formatting: excellent
• grammar: excellent