## SciPost Submission Page

# Constraining the SMEFT with Bayesian reweighting

### by Samuel van Beek, Emanuele R. Nocera, Juan Rojo, Emma Slade

### Submission summary

| Field | Value |
|---|---|
| As Contributors | Emanuele Roberto Nocera |
| arXiv link | https://arxiv.org/abs/1906.05296v1 |
| Date submitted | 2019-06-17 |
| Submitted by | Nocera, Emanuele Roberto |
| Submitted to | SciPost Physics |
| Domain(s) | Exp., Theor. & Comp. |
| Subject area | High-Energy Physics - Phenomenology |

### Abstract

We illustrate how Bayesian reweighting can be used to incorporate the constraints provided by new measurements into a global Monte Carlo analysis of the Standard Model Effective Field Theory (SMEFT). This method, which has been extensively applied to study the impact of new data on the parton distribution functions of the proton, is validated here by means of our recent SMEFiT analysis of the top-quark sector. We show how, under well-defined conditions and for the SMEFT operators directly sensitive to the new data, the reweighting procedure is equivalent to a corresponding new fit. We quantify the amount of information added to the SMEFT parameter space by means of the Shannon entropy and of the Kolmogorov-Smirnov statistic. We also investigate how our results depend on whether the NNPDF or the Giele-Keller expression for the weights is used.
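The Shannon entropy of a set of weights is commonly converted into an effective number of replicas. The sketch below uses the definition from the NNPDF reweighting literature, $N_{\rm eff}=\exp\left[\frac{1}{N}\sum_k w_k\ln(N/w_k)\right]$ with weights normalised to $\sum_k w_k=N$. We assume, without confirmation from this page, that the paper follows the same convention; the function names are illustrative.

```python
import numpy as np


def normalise(weights):
    """Rescale weights so that they sum to the number of replicas N."""
    w = np.asarray(weights, dtype=float)
    return w * len(w) / w.sum()


def n_eff(weights):
    """Effective number of replicas from the Shannon entropy of the weights.

    N_eff = exp[(1/N) * sum_k w_k ln(N / w_k)], with sum_k w_k = N.
    Replicas with zero weight contribute nothing (w ln w -> 0 as w -> 0).
    """
    w = normalise(weights)
    n = len(w)
    nz = w[w > 0]
    return float(np.exp(np.sum(nz * np.log(n / nz)) / n))
```

Uniform weights give $N_{\rm eff}=N$, while putting all the weight on a single replica gives $N_{\rm eff}=1$.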


## Reports on this Submission

### Anonymous Report 1 on 2019-07-08 (Invited Report)

### Report

This article is a valuable contribution to the ongoing study of how best to use the SMEFT as a tool for constraining generic new physics with precision measurements at the LHC and other experiments. It explores the possibility of using reweighting techniques to produce a fast approximation to the results of a complete fit when a new data source is added.

I note a notational error in equation (2.9) and the equations that follow: $F(_{\rm rw}(\langle c_i\rangle)$ presumably should read $F_{\rm rw}(\langle c_i\rangle)$, without the additional opening parenthesis. Similarly, a few typos jumped out at me: "ne" for "new" on page 10, "fo" for "of" in the caption of Figure 4.3, and "NNDPF" for "NNPDF" on page 14.

Turning to the physics and statistics of the contribution, I have a couple of questions that are not fully explained in the article (at least at my level of comprehension). The most troublesome point is the inclusion of quadratic EFT effects, which are of $\mathcal O(\Lambda^{-4})$, in the signal function. The terms kept by the authors do not give the full effect at this order in the EFT expansion, since dimension-eight operators also contribute at $\mathcal O(\Lambda^{-4})$. Following the usual rules of perturbative calculations, these terms ought to be dropped, and a theoretical error introduced to parameterize our ignorance of the size of the effect at this order. With the current treatment, which is sadly common in the field, "constraints" on the EFT parameter space are regularly produced that do not hold in more complete models and that admit model-building workarounds; this would not be the case with a robust theoretical treatment of the EFT expansion. At the very least, every article on this topic should make clear how strongly the "quadratic" terms affect the fit itself, as the authors briefly explored in their previous work [23].

More generally, it is very important to acknowledge that the EFT is a series expansion in something like $s/\Lambda^2$. Since LHC measurements probe energies of order a TeV, it does not make sense to consider scales $\Lambda\sim1$ TeV: the expansion parameter is then of order one, and the series clearly will not converge. I would therefore recommend that the authors re-benchmark their scale of new physics to $\Lambda\sim5$ TeV instead.
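To make the truncation question concrete, here is a minimal sketch of a dimension-six prediction for one observable in which the $\mathcal O(\Lambda^{-4})$ "quadratic" terms can be switched on or off. The inputs `sigma_lin` and `sigma_quad` are hypothetical per-observable coefficients with made-up values; nothing here is taken from SMEFiT. Dimension-eight contributions at the same order are deliberately absent, which is the gap the report points to. The sketch also shows that, at this order, only $c_i/\Lambda^2$ enters, so moving the benchmark from $\Lambda=1$ TeV to 5 TeV rescales the coefficients by 25, while the expansion parameter $s/\Lambda^2$ at $\sqrt{s}=1$ TeV drops from 1 to 0.04.

```python
import numpy as np


def smeft_prediction(sigma_sm, sigma_lin, sigma_quad, c, lam_tev, quadratic=True):
    """Dimension-6 SMEFT prediction for a single observable (illustrative only).

    sigma_sm   : SM prediction
    sigma_lin  : shape (n_op,), interference term per unit c_i / Lambda^2 [TeV^-2]
    sigma_quad : shape (n_op, n_op), dim-6 squared term per unit c_i c_j / Lambda^4
    c          : Wilson coefficients quoted at the benchmark scale lam_tev [TeV]
    quadratic  : keep the O(Lambda^-4) terms (True) or truncate at O(Lambda^-2)
    """
    x = np.asarray(c, dtype=float) / lam_tev**2  # only c_i / Lambda^2 enters
    pred = sigma_sm + sigma_lin @ x               # O(Lambda^-2) interference
    if quadratic:
        pred += x @ sigma_quad @ x                # dim-6 squared, O(Lambda^-4); no dim-8
    return float(pred)


def expansion_parameter(sqrt_s_tev, lam_tev):
    """Size of the EFT expansion parameter s / Lambda^2."""
    return (sqrt_s_tev / lam_tev) ** 2


# Toy single-operator example with made-up coefficients
sm, lin, quad = 1.0, np.array([0.2]), np.array([[0.05]])
lin_and_quad = smeft_prediction(sm, lin, quad, c=[1.0], lam_tev=1.0)
lin_only = smeft_prediction(sm, lin, quad, c=[1.0], lam_tev=1.0, quadratic=False)
same_point_at_5tev = smeft_prediction(sm, lin, quad, c=[25.0], lam_tev=5.0)
```

Comparing `lin_and_quad` with `lin_only` point by point is the simplest way to gauge how much the quadratic terms drive a given constraint.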

I am also confused by the comments on page 6 about double counting and whether it is problematic in this context. If the goal is to reproduce explicitly the results of the fit of [23] by fitting a subset of the data and then reweighting, why should I include additional data that was excluded there? Does the reweighting procedure meaningfully depend on that data to reproduce the fit accurately? More generally, it is not clear to me why double counting should be less of a concern when reweighting is used to produce fast fits than it is in a full-blown fit.
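One way to frame the double-counting question: with Giele-Keller weights, reweighting multiplies likelihoods. Two independent datasets applied in sequence are therefore equivalent to a single joint reweighting, while applying the same dataset twice is equivalent to counting it once with its uncertainties shrunk by a factor $\sqrt{2}$ (since $\chi^2\propto1/\sigma^2$). The toy below assumes uncorrelated datasets and made-up $\chi^2$ values; it illustrates the general mechanism, not the specific procedure described on page 6 of the paper, and the helper names are hypothetical.

```python
import numpy as np


def gk_weights(chi2):
    """Giele-Keller weights w_k ∝ exp(-chi2_k / 2), normalised so that sum_k w_k = N."""
    chi2 = np.asarray(chi2, dtype=float)
    w = np.exp(-0.5 * (chi2 - chi2.min()))  # subtracting the minimum avoids underflow
    return w * len(w) / w.sum()


def combine(*weight_sets):
    """Apply several reweightings in sequence: multiply weights replica by replica."""
    w = np.prod(np.vstack(weight_sets), axis=0)
    return w * len(w) / w.sum()


# Made-up chi2 values of three prior replicas against two independent datasets A and B
chi2_a = np.array([1.0, 3.0, 6.0])
chi2_b = np.array([2.0, 0.5, 4.0])

sequential = combine(gk_weights(chi2_a), gk_weights(chi2_b))  # A, then B
joint = gk_weights(chi2_a + chi2_b)                            # A and B at once
twice = combine(gk_weights(chi2_a), gk_weights(chi2_a))        # A counted twice
halved_errors = gk_weights(2 * chi2_a)                         # A with errors / sqrt(2)
```

`sequential` matches `joint`, and `twice` matches `halved_errors`: if a dataset already shaped the prior, reweighting with it again overstates its pull in exactly this way.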

The test of the reliability of reweighted results based on the KS statistic also seems to have been applied in a thoroughly ad hoc way. Is there any mathematical or statistical reason to expect thresholds of 0.3 or 0.2 to be decisive for the reliability of the reweighting procedure? Given that different thresholds are adopted for different sub-analyses, is there a meaningful interpolating formula, perhaps one that takes into account the number of data points being added, which could suggest a reasonable reliability threshold for future reweighting exercises? More worryingly, given that, e.g., the result for O13qq is deemed reliable while the result for Ofq3, which is very strongly correlated with O13qq in the full fit of [23], is not, how should we think about correlations and flat directions in the reweighting framework?
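For concreteness, here is one way to compute a Kolmogorov-Smirnov distance between the prior and the reweighted distribution of a single coefficient, both evaluated on the same set of replicas. We assume this is the comparison behind the thresholds discussed above, which the report does not spell out; the function name is hypothetical. A threshold such as 0.2 or 0.3 would then be applied to its return value.

```python
import numpy as np


def ks_prior_vs_reweighted(c, weights):
    """Kolmogorov-Smirnov distance between prior and reweighted distributions of one coefficient.

    c       : value of the Wilson coefficient in each prior replica
    weights : non-negative reweighting weight of each replica (any normalisation)
    The prior gives every replica weight 1/N. Both empirical CDFs are step functions
    jumping at the replica values, so the supremum is attained at those values.
    """
    c = np.asarray(c, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(c, kind="stable")
    c_sorted = c[order]
    cdf_prior = np.arange(1, len(c) + 1) / len(c)
    cdf_rw = np.cumsum(w[order]) / w.sum()
    # with tied values, only the CDF after the last copy of each value is meaningful
    last_of_value = np.r_[c_sorted[1:] != c_sorted[:-1], True]
    return float(np.max(np.abs(cdf_prior - cdf_rw)[last_of_value]))
```

Uniform weights return 0, and moving all the weight onto the largest of four replicas returns 0.75. Being one-dimensional, this statistic cannot see correlations between coefficients, which is one way to read the concern about O13qq and Ofq3.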

I am also struck by the (admittedly not phenomenologically relevant) rise in Figure 4.2 when going from 6 to 7 included datasets. I would naively have assumed that introducing additional data could only further narrow the range of replicas consistent with the data, but that does not seem to be the case here. Do the authors understand this behavior? I believe a short comment explaining it would be valuable to the reader.
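The report does not say which quantity Figure 4.2 plots. If it is an effective number of replicas, the toy below (made-up $\chi^2$ values, Giele-Keller weights, the Shannon-entropy $N_{\rm eff}$ of the NNPDF reweighting literature) shows that adding a dataset can increase it: a new dataset that favours replicas penalised by the earlier data makes the weights more uniform.

```python
import numpy as np


def gk_weights(chi2):
    """Giele-Keller weights w_k ∝ exp(-chi2_k / 2), normalised so that sum_k w_k = N."""
    chi2 = np.asarray(chi2, dtype=float)
    w = np.exp(-0.5 * (chi2 - chi2.min()))
    return w * len(w) / w.sum()


def n_eff(w):
    """Shannon-entropy effective number of replicas, for weights with sum_k w_k = N."""
    n = len(w)
    nz = w[w > 0]
    return float(np.exp(np.sum(nz * np.log(n / nz)) / n))


# Three prior replicas: dataset A favours replica 0, dataset B favours replicas 1 and 2
chi2_a = np.array([0.0, 4.0, 4.0])
chi2_b = np.array([4.0, 0.0, 0.0])

n_after_a = n_eff(gk_weights(chi2_a))
n_after_a_and_b = n_eff(gk_weights(chi2_a + chi2_b))  # all chi2 equal: uniform weights
```

Here `n_after_a` is below 3, while adding dataset B restores uniform weights and $N_{\rm eff}=3$.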

Finally, I am thoroughly confused by the discussion of NNPDF versus GK weights. It is clear that a replica that happens to fit very well can be damaging with GK weights, but it is not obvious why such a replica should be fully discarded, as in the NNPDF formalism. Fitting well is not generally punished in statistics, although it should not be overly rewarded either. Is there a plausible middle-ground treatment, for instance one that treats any fit better than the one maximizing the NNPDF weight as equally worthy as that maximizing fit?
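The two weight definitions can be compared directly. For a new dataset with $n$ points, GK weights are $w_k\propto e^{-\chi^2_k/2}$, while NNPDF weights are $w_k\propto(\chi^2_k)^{(n-1)/2}e^{-\chi^2_k/2}$, which peak at $\chi^2_k=n-1$ and vanish as $\chi^2_k\to0$. The `"capped"` option below is a hypothetical realisation of the middle ground suggested above, not a scheme from the paper: any $\chi^2_k$ below $n-1$ is scored as if it sat at the NNPDF maximum.

```python
import numpy as np


def log_weights(chi2, n_dat, scheme):
    """Unnormalised log-weights for replicas with the given chi2 (n_dat > 1 data points)."""
    chi2 = np.asarray(chi2, dtype=float)
    if scheme == "gk":      # Giele-Keller
        return -0.5 * chi2
    if scheme == "nnpdf":   # NNPDF; requires chi2 > 0
        return 0.5 * (n_dat - 1) * np.log(chi2) - 0.5 * chi2
    if scheme == "capped":  # hypothetical middle ground: flat below the NNPDF peak at n_dat - 1
        return log_weights(np.maximum(chi2, n_dat - 1), n_dat, "nnpdf")
    raise ValueError(f"unknown scheme: {scheme!r}")


def weights(chi2, n_dat, scheme):
    """Weights normalised so that they sum to the number of replicas."""
    lw = log_weights(chi2, n_dat, scheme)
    w = np.exp(lw - lw.max())  # subtract the maximum for numerical stability
    return w * len(w) / w.sum()


# Made-up chi2 of three replicas against a new dataset with 10 points (NNPDF peak at chi2 = 9)
chi2 = [2.0, 9.0, 20.0]
w_gk = weights(chi2, 10, "gk")
w_nnpdf = weights(chi2, 10, "nnpdf")
w_capped = weights(chi2, 10, "capped")
```

With these inputs, GK weights favour the $\chi^2=2$ replica most, NNPDF weights favour the $\chi^2=9$ replica and rank the $\chi^2=2$ replica last, and the capped variant treats the first two replicas equally.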

In all, this is indeed a valuable exploration of techniques for rapidly estimating the impact of new data on SMEFT fits, and it deserves to be published once these questions and comments have been addressed.

### Requested changes

1. Correct typographical errors in notation and text as identified.

2. Re-benchmark the theory to a cutoff scale $\Lambda\sim5$ TeV, where the SMEFT expansion is theoretically consistent.

3. Explore the importance of keeping versus dropping "quadratic" contributions in the reweighting procedure, as well as the impact of introducing new theoretical errors for missing higher orders in $\Lambda^{-2}$.

4. Explore and discuss the impact of flat directions on reliability metrics for reweighting results.

5. Add an explanation of the rising feature to the caption of Figure 4.2.

6. Explore and discuss potential intermediate weighting schemes for the reweighting procedure.