Spey: smooth inference for reinterpretation studies

Statistical models serve as the cornerstone for hypothesis testing in empirical studies. This paper introduces a new cross-platform Python-based package designed to utilise different likelihood prescriptions via a flexible plug-in system. This framework empowers users to propose, examine, and publish new likelihood prescriptions without developing software infrastructure, ultimately unifying and generalising different ways of constructing likelihoods and employing them for hypothesis testing within a unified platform. We propose a new simplified likelihood prescription, surpassing previous approximation accuracies by incorporating asymmetric uncertainties. Moreover, our package facilitates the integration of various likelihood combination routines, thereby broadening the scope of independent studies through a meta-analysis. By remaining agnostic to the source of the likelihood prescription and the signal hypothesis generator, our platform allows for the seamless implementation of packages with different likelihood prescriptions, fostering compatibility and interoperability.


Introduction
Unraveling new physics beyond the Standard Model has been a pursuit driving scientists to expand the boundaries of experimental analyses.The forefront of this exploration is the Large Hadron Collider (LHC), where valuable analyses are conducted, specifically targeting phase spaces to capture the signature of simplified models.These models, such as variations of the minimal supersymmetric extension of the Standard Model (MSSM), often include fixed masses and decay branching ratios, albeit unable to fully encapsulate the effects of a complete theory.While these searches yield crucial insights, the ever-expanding theoretical landscape presents scenarios beyond the reach of simplified models.
Analyzing the vast amounts of data from the LHC involves statistical modeling to distill meaningful information from experimental results, enabling inferences about new physics.While recent strides have been made in publishing complete statistical models, most analyses offer limited information, prompting the emergence of approximated approaches to capture the likelihood distribution of the original statistical models [12][13][14][15][16]. Particularly in cases with limited correlation information, simplistic Gaussian toy model-based approaches have been utilized (see ref. [17] for an example).Hence, inconsistencies arise in the publication of statistical models or data to construct them, leading to inaccurate inference of the new theories.The recent efforts by the HS3 collaboration 1 show promising advancements in standardization of likelihoods.However, numerous applications continue to rely on statistical models that fall outside the standardized prescriptions of the HS3 proposal, posing a significant challenge in their utilization.
This study introduces a new cross-platform Python-based package, SPEY. 2 This package harnesses a versatile plug-in system supporting standardized likelihood prescriptions for statistical hypothesis testing; computing exclusion limits, assessing discovery significance, and conducting parameter fitting.It accommodates various likelihood prescriptions, referred to as "backend" implementations, seamlessly integrating existing inference packages or facilitating the development of new ones.This design ensures agnosticism towards specific implementations, providing a flexible framework for likelihood inference, even for yet-to-be-proposed prescriptions.
Leveraging this flexibility, SPEY offers default approximated likelihood approaches specifically tailored for counting experiments such as those conducted at the LHC, i.e. simplified likelihood framework [13,14].These approaches encompass both correlated and uncorrelated Poisson-based likelihood prescriptions.However, these approaches tend to over-exclude certain parameter spaces.To mitigate this, we propose an alternative likelihood approximation incorporating asymmetric background uncertainties, yielding a more precise approximation of the original likelihood distribution.
While precise approximations improve exclusion limits, utilizing full statistical models whenever possible is preferable.To address this, we introduce a backend plug-in for the widely acclaimed pyhf package [18,19], integrating its functionality within SPEY for employing full statistical models while retaining backend agnosticism.
One crucial aspect in statistical modeling is likelihood combination, facilitated by SPEY's backend agnostic infrastructure, enabling various likelihood combination techniques across any backend implemented within SPEY.We showcase this capability by extending methodolo-gies from ref. [20] to enhance exclusion limits through the combination of full and approximated statistical models from different experiments, demonstrating the framework's strength in maximizing the potential of the experimental results.
This study is structured as follows: Section 2 summarizes inference through likelihoods and illustrates SPEY's application using a simple example.Section 2.1 delves into the implementation of default likelihood prescriptions for correlated histograms.Section 2.2 demonstrates the utilization of full statistical models through SPEY.Likelihood combination techniques are explored in Section 3. Finally, an example beyond LHC experiments is presented in Section 4.

Statistical models
This section briefly introduces statistical modelling and its applications within SPEY.A statistical model is based on the occurrence probability of a random variable n ∈ {n i }; P(n). 3his random variable n can be, for instance, observations at the LHC, which includes the randomness of the detector effects such as resolution, reconstruction efficiency etc.In continuous cases, the probability assigned to a specific value can be expressed as integration over the probability density function (PDF) given by Here, S represents the integration hyper-surface over variables x i ∈ X and f (x i , • • • ) represents the PDF.The likelihood function is defined as the overall PDF, evaluated from the observations x i with respect to some parameters A statistical model proposes a certain structure for the PDF, capturing the distribution of the variable X and its uncertainties.A generic composite likelihood can be described as [22]; In this equation, λ i is a function of signal (n s ), and background (n b ) yields, 4 µ represents the parameter of interest (POI) or signal strength, θ refers to the nuisance parameters, and C represents the constraint terms associated with uncertainties. 5 The first term, describing the main model, accounts for the bins of the histogram (represented by i), while the constraint term multiplies the constraint for each nuisance parameter.In the case of a counting experiment like the LHC, the Poisson distribution is commonly used to compare observations with the modelled expectations.In a simplified scenario, uncertainties can be captured using a Gaussian constraint term.Therefore, a multi-bin histogram likelihood for a background yield n i b ± σ i b can be modelled as: Here, we have used a unit Gaussian as the constraint term since the nuisance parameters are standardized.This simplifies the optimization process when computing the profiled likelihood.The choice of Gaussian originates from the fact that many calibration observables, whose PDF central limits to a Gaussian, are parametrised at zero nominal rates with one sigma deviation.The specific form of the L (x i ) is generally unknown and depends on the available information from the experimental analyses.Therefore, it may be necessary to construct an approximate PDF description in order to compare a theory hypothesis with experimental results.Experimental analyses, particularly at the LHC, typically encompass the exploration of simplified signal hypotheses.In order to broaden the scope of such analyses, several reinterpretation techniques have been developed, ranging from intricate detector simulations to straightforward efficiency maps.These techniques enable the examination of new theoretical hypotheses in light of existing experimental results.Reinterpretation platforms are specifically designed to generate signal yields based on the chosen histograms defined by the experimental analysis, which are then subsequently incorporated into a likelihood distribution for setting exclusion limits on the given signal hypothesis.
SPEY serves as a convenient cross-platform Python-based interface that consolidates various PDF implementations in a unified framework for reinterpretation studies.It comes equipped with various differentiable PDF prescriptions while maintaining a simplified likelihood methodology as the default approach.This means that the default PDFs included in SPEY are designed to approximate the original likelihood based on the available information regarding yields and uncertainties.It is important to note that there are multiple ways to approximate a likelihood distribution, some better than others.To address this, the API is constructed with an expandable plug-in structure at its core, allowing users to implement and publish their own PDF prescriptions independently.Once SPEY is installed, it automatically detects new plug-ins through the backend detection system and incorporates them into the inference process without requiring modifications to the main code structure.For detailed instructions on building a custom likelihood model and utilizing it within SPEY, we refer the reader to Appendix A. Additionally, guidance on appropriately citing these independent plugins is provided in Appendix B.
SPEY has been integrated within the Python package index and can be downloaded and imported via 1 !pip install spey 2 import spey commands. 6Afterwards, all available PDFs can be printed using spey.AvailableBackends() function.This function will list all the default backends as well as third-party plug-ins, if applicable.A list of backend constructors currently available through SPEY has been listed in Table 1.
To represent a simple statistical model described by equation ( 3) for a counting experiment with two bins, one can utilize the "default_pdf.uncorrelated_background"backend in SPEY.The PDF backend can be accessed using the get_backend function, as demonstrated below: The variable pdf_wrapper is a wrapper function that ensures proper integration of the PDF prescription into SPEY by verifying various versioning and inheritance constraints. 7 To exemplify the process, let's construct a two-bin uncorrelated statistical model, as defined by equation (3), using the following observed yields: 36 and 33.The expected background yields are given as 50 ± 12 and 48 ± 16, while the signal model yields 12 and 15.The code snippet below showcases this implementation: 1 statistical_model = pdf_wrapper ( 6 An online documentation can be found in this link. 7Refer to Table 1 for a list of accessors.The documentation of the pdf_wrapper function also specifies the backend it wraps around.Expands upon the uncorrelated background prescription by representing the constraint term using a multivariate Gaussian with the correlation matrix between bins.See eq. ( 7). "default_pdf.third_moment_expansion" , Arguments: signal yields, background yields, data, covariance matrix, diagonal elements of third moments.
Expands the likelihood prescription with skewness information, as presented in ref. [14].Correlated constraint terms are captured using a multivariate unit-Gaussian distribution.See eq. ( 8)."default_pdf.effective_sigma",Arguments: signal yields, background yields, data, correlation matrix, upper and lower uncertainty envelops for the background.
Implements asymmetric background uncertainty envelopes using the effective sigma approach, as described in ref. [12].Correlated constraint terms are captured using a multivariate unit Gaussian.See eq.(10).
See section 2.2 for installation details [18]."pyhf", Arguments: background only description of the likelihood and corresponding signal patch.
Constructs full statistical model within the pyhf interface [18].See section 2.2 for details on installation.The analysis keyword in line 6 is an optional unique name for the model, and xsection in line 7 is the cross-section value of the signal used in the model, with the units determined by the user. 8nce initialized, the statistical model inherits all the properties of the StatisticalModel class, providing a backend-agnostic interface.The functions provided by the StatisticalModel class are summarized in Table 2.
SPEY employs the asymptotic formulae for testing new physics, as outlined in ref. [23].In this framework, the test statistic is divided into three main classes: Discovery (q 0 ), q µ test statistic, and alternative test statistic qµ , for upper limits.Each test statistic covers a specific domain defined by the interplay between the profiled likelihood, L µ, θ µ , background only profiled likelihood, L (0, θ 0 ), and the maximum likelihood values, L μ, θ .Here μ and θ are the set of parameters that maximises the likelihood, and θ µ are the nuisance parameters that maximises the likelihood for a specific POI.
For computing p-values with respect to the desired test statistic, SPEY offers two methods: using asymptotic formulae or generating samples from the PDF distribution.In the aforementioned example, the exclusion confidence level, 1 − CL s , can be computed using the statistical_model.exclusion_confidence_levelfunction, with qµ test statistic, yields a value of 97% CL.
SPEY offers three distinct approaches for computing limits, which are designated by the expected keyword.This keyword can be set using spey.ExpectationType.<XXX>,where <XXX> can take on the values observed, aposteriori, or apriori.
The observed and aposteriori expectation types involve conducting a likelihood fit using the provided observations during the initialisation of the statistical model.The distinction between these two types becomes evident when computing the p-values.
For the observed expectation type, the functions statistical_model.exclusion_confidence_level and statistical_model.poi_upper_limitprovide observed results.In this scenario, the cumulative distribution function (CDF) is computed using two asymptotic distributions: the background-only distribution (CL b ) and the signal-plus-background distribution (CL s+b ).The computation involves comparing the test statistic calculated using the observations to the data generated by the background-only statistical model, also known as Asimov data.The observed p-value is obtained by evaluating the ratio CL s = CL s+b /CL b (for more details, see ref. [23]).
For the aposteriori expectation type, one and two sigma fluctuations around the background-only distribution are calculated.
On the other hand, the apriori approach involves fitting the expected background yields, assuming the Standard Model as the truth, and subsequently computing expected p-values, similar to aposteriori expectation type.This option is commonly utilised by theorists, such as in the expected results presented in MADANALYSIS 5 [24] and SModelS [25,26], where fitting is performed based on expected background yields.
To illustrate the distinction between the observed and apriori expectation types, one can plot the likelihood distributions using the statistical_model.likelihood()function.This will immediately reveal a shifted likelihood distribution: for the apriori case, the profiled likelihood peaks at zero ( μ = 0), whereas for the observed case, it peaks at μ = −1.08,computed   5) or eq.( 6).

excluded_cross_section()
Computes the upper limit on the cross-section, with the unit depending on the input unit of xsection.

likelihood()
Computes the profiled likelihood for a given POI.

maximize_likelihood()
Computes the maximum likelihood by fitting the likelihood to observed data or background yields.

asimov_likelihood()
Computes the profiled likelihood for a given POI by fitting the likelihood to the Asimov data.maximize_asimov_likelihood() Computes the maximum likelihood by fitting the likelihood to the Asimov data.

sigma_mu_from_hessian()
Computes the variance on the POI via the Hessian of the negative log-likelihood.

significance()
Computes the significance of the signal yields using eq.( 4).

fixed_poi_sampler()
Samples from the statistical model by profiling the likelihood with a fixed POI.

combine()
If a backend-specific combination routine has been implemented, it combines two statistical models.

available_calculators
Retrieves information on which calculator is available for upper limit computations, i.e., asymptotic or toy-based.
via the statistical_model.maximize_likelihood()function.This difference diminishes as the observed data approaches the background yields.
Utilizing the same statistical model, the significance function is employed to assess the discovery significance of this model.By utilizing eq. ( 4), the function computes significance (Z), represented as q 0,A = 5 × 10 −4 , alongside determining q 0 and expected and observed p-values relative to eq. ( 4).Consistent with the earlier presented exclusion limit, the significance of this signal yield is notably low.

Correlated histograms with simplified likelihoods
Under the assumption that the uncertainties are modelled as Gaussian distributions, one can employ several methods to incorporate this information into PDF distribution.The simplest approach would be extending the constraint term with the correlation matrix (or covariance matrix) provided by the experiment, where ρ represents the correlation matrix between each nuisance parameter.In such a simplified scenario, various uncertainty sources are contracted into a single uncertainty for each histogram bin.This PDF can be accessed via "default_pdf.correlated_background".Despite its efficiency, such models do not capture asymmetric uncertainties, which skew the likelihood distribution.In order to address this issue, ref. [14] proposed an expansion procedure to eq. ( 7) via the third moments of the background uncertainty where likelihood distribution has been expanded as Here nb are the central values of the expected background yields, A are the effective sigma within the symmetric covariance matrix approximation, and C represents the skewness of the distribution, which are given as; where m (3) are the diagonal elements of the third moments and Σ is the covariance matrix.Notice that A and C also modifies the correlation matrix, ρ, which enters the eq.( 8).
Such expansion provides a more accurate representation of the original likelihood distribution by integrating the skewness into the PDF.This PDF can be accessed via "default_pdf.third_moment_expansion" accessor. 9Notice that when skewness is zero, C = 0, the expansion reduces to eq. ( 7).
The skewness of the PDF distribution can also be captured by building an effective variance from the upper (σ + ) and lower (σ − ) uncertainty envelops as a function of nuisance parameters, . This method has been proposed in ref. [12] for Gaussian models which can be generalised for eq.( 7) by reparametrising the likelihood distribution, effectively implementing the skewness into the Poisson distribution. 10This PDF can be accessed via "default_pdf.effective_sigma"accessor. 11Notice that if σ + = σ − , eq. ( 10) reduces to eq. ( 7).
Going beyond background-only uncertainties, eq. ( 3) can be expanded to accommodate signal uncertainties and employ the aforementioned methodologies.To achieve this, the constraint term has been extended with a signal-specific constraint term where θ s are the nuisance parameters dedicated for signal uncertainties and ρ s is the correlation matrix between nuisance parameters.In case ρ s is not known, constraint terms reduces to N µ (θ s |0, 1), as before.Notice that the constraint term has been scaled with POI, which is necessary to capture the scaling on signal yields.Similarly, depending on the information provided, λ i (µ, θ ) can be modified with third-moment expansion or effective sigma.Signal uncertainties have been integrated via signal_uncertainty_configuration keyword for each default_pdf backend.Note that this implementation assumes signal uncertainties are not correlated with background uncertainties.

Comparing simplified approaches
To test the accuracy of these frameworks presented above against a realistic experimental analysis, we used CMS-SUS-20-004 [28], which searches for four jet and large missing energy phase-space at a centre-of-mass energy of 13 TeV and 137 fb −1 integrated luminosity.It focuses on the production of two Higgs decaying into b b alongside with two lightest neutralino, χ0 1 through two degenerate heavy neutralino mediators, χ0 2,3 , 1 .The analysis provides yields, covariance matrix and asymmetric uncertainties for 22 signal regions which can be found from the dedicated HEPData entry [29].
Utilising the signal yields provided by the collaboration, we constructed three different statistical models within SPEY, referring to correlated backgrounds, third-moment expansion (we refer the reader to Appendix C for the computation of third moments) and a model with effective sigma method (eq.( 10)).Fig. 1 shows the expected limits within heavy ( χ0 2,3 ) and light ( χ0 1 ) neutralino mass plane, provided by these three likelihood prescriptions where the black curve is the original CMS expected limit within ±1σ window, presented with dotted lines.Red, blue and green curves represent the result computed with correlated background, third-moment expansion and effective sigma model.As before, dotted lines represent each curve's ±1σ window.
As can be seen, for a large number of signal yields, m χ0 > 450 GeV, the correlated background approach overestimates the exclusion limit by approximately 80 GeV.The model with thirdmoment expansion reduces this difference to about 40 GeV.On the other hand, the model with effective sigma slightly underestimates the expected exclusion limit by only a few GeV.Similarly, this approach reproduces the uncertainty bands much more accurately than the other two for a low number of signal yields.
To have a closer look, we have chosen a point in Fig. 1 where correlated background and third-moment expansion is the closest to the CMS limit where the effective sigma model is further away.Using the χ 2 distribution for the full statistical model, provided by the CMS collaboration (see the corresponding HEPData entry), we chose m χ0 GeV point to compare all three PDF prescription against the full statistical model where the χ 2 distribution is given as, Black, red, blue and green curves represent CMS expected limit, and expected limits computed using correlated background model (eq.( 7)), third-moment expansion (eq.( 8)) and effective sigma (eq.( 10)) methods, respectively.The dotted lines for each curve represent ±1σ fluctuation from the background.The dashed black line is plotted as a reference where m χ0 2,3 becomes lighter than the mass combination of Higgs and the lightest neutralino.
Here denominator shows the maximum likelihood value where μ, θ are the values that maximize the likelihood, and the nominator shows the profiled likelihood for a given µ where θ µ are the nuisance parameters that maximize the likelihood at a given µ.This value can be computed via chi2 function as shown in Table 2. Fig. 2 shows the χ 2 distribution for all three curves compared against CMS full statistical model presented as black.The red, blue and green lines represent correlated background, third-moment expansion and effective sigma models.We additionally provided observed POI upper limits for each model, which are colour coded where the effective sigma model underestimates µ obs 95%C L by 18% and third-moment expansion overestimates it by only 3%.This value can be computed via poi_upper_limit function as shown in Table 2.The upper limit on the expected cross section at this mass grid has been given as 36.06 fb where correlated background, third-moment expansion and effective sigma models set this limit to 38.32 Asymmetric background uncertainties have also been provided by CMS-SUS-19-006 analysis [30] through its dedicated HEPData record [31] which is conducted at 13 TeV centre-ofmass energy with 137 fb −1 luminosity.This analysis is designed to investigate new physics signatures through multi-jet and missing energy final states through gluino production, which consequently decays into a t t pair and lightest neutralino, pp → g(→ t t χ0 1 ) g(→ t t χ0 1 ) .
The analysis includes 174 non-overlapping signal regions 12 classified with respect to jet and bjet multiplicity, hadronic transverse momentum and missing energy.The events are required to have minimum 300 GeV hadronic activity (H T ) and H miss T .At least two jets have been  required in each bin with p T > 30 GeV.Additionally, events including isolated high p T leptons or photons are removed.
For this analysis, we used MADANALYSIS 5 implementation [32] where the hard scattering process, pp → g g, has been simulated via MADGRAPH5_AMC@NLO version 3.2.0 with MSSM_SLHA2 UFO model [33] where BR(g → t t χ0 1 ) has been set to 100%.We used the leading order set of NNPDF 2.3 parton distribution function [34,35] and showered hard scattering events via PYTHIA version 8.240 [36].Finally, the leading-order cross section has been scaled to next-to-next-to-leading-logarithmic accuracy [37].To perform detector simulation, the recast employs Delphes, for which we used version 3.5 [38].
Fig. 3 shows the comparison between the official exclusion limit (black), "default_pdf.correlated_background"(red), "default_pdf.third_moment_expansion"(blue) and "default_pdf.effective_sigma"(green) models.Provided uncertainties are separated into systematic and statistical uncertainties, which we combined in quadrature.Note that we scaled the third moments, computed by eq.(C.1), by 0.9 to achieve numeric stability and satisfy the constraints in eq. ( 9).The left panel presents a comparison between observed exclusion limits in the (m g , m χ0 1 ) plane.However, we only observe a small improvement over "default_pdf.correlated_background"for both expansions.Conversely, the right panel compares the aposteriori (spey.ExpectationType.aposteriori)expectation limit produced by each model.We observe over 50 GeV improvement for the expected exclusion limit when "default_pdf.effective_sigma"model is used, and a minor but noticeable refinement by "default_pdf.third_moment_expansion".Although we still observed slight over-exclusion, both models produce significantly more accurate results than the correlated background approach. 13lthough the effective sigma method managed to reproduce the exclusion curve in Figures 1 and 3 better than the other approaches for low signal yield regions, it is essential to emphasise that this is only an approximation; hence its case-dependent.As shown in Fig. 2, it can √ s = 13 TeV, 137 fb −1 , 95% CL, @ NNLOapprox.+ NNLL pp → gg , g → t t χ0 1 CMS expected limit (±1σexp) Spey−aposteriori expected limit with correlated background model (±1σexp) Spey−aposteriori expected limit with third moment expansion (±1σexp) Spey−aposteriori expected limit with effective sigma (±1σexp) Figure 3: 95% CL exclusion contours for the process pp → g g, g → t t χ0 1 in the (m g , m χ0 1 ) plane using the MADANALYSIS 5 recast of the CMS-SUS-19-006 analysis.The left panel shows observed limits, and the right panel shows the expected limits with one standard deviation from the background hypothesis.The official limits from the CMS collaboration have been shown with black curves for each plot.The exclusion limits that are obtained by "default_pdf.correlated_background"," default_pdf.third_moment_expansion"and "default_pdf.effective_sigma"models have been shown in red, blue and green, respectively.vastly underestimate the exclusion limit even when signal yield is adequate.PDF distributions with a larger third moment or an assumption beyond purely Gaussian uncertainties might be more suitable to exclude low signal yield regions.Thus, one should always be aware of the limitations of each approach and test the results accordingly.

Full statistical models
As mentioned in Sec. 2, SPEY has been built to be expandable to exploit other packages that are designed to provide specific PDF prescriptions.pyhf [18] is a Python package designed based on HistFactory [39], which allows publication and usage of full statistical models through a JSON sterilised format.
The spey-pyhf plug-in enables SPEY to exploit the PDF constructed by pyhf interface and describe it within StatisticalModel class.This provides a completely backend-agnostic interface where no matter the PDF function's origin, it can be executed through the same functions, presented in Table 2. spey-pyhf plug-in can be installed via 1 pip install spey -pyhf command, 14 and once installed, SPEY can automatically detect it.The bottom section of Table 1 shows available accessors for pyhf plug-in where "pyhf" accessor will allow the user to input full statistical model prescriptions as defined in pyhf's online manual.Uncorrelated background samples can also be studied using "pyhf.uncorrelated_background"accessor.As shown before, the spey.get_backendfunction will enable the usage of these accessors. 15he usage of a full statistical model can be demonstrated using ATLAS-SUSY-2018-31 [40] analysis where we used the recast of the study implemented within MADANALYSIS 5 [41,42] package through SFS interface [9,43]. 16This analysis is designed to search for new physics for multi-jet final state scenarios where it's looking for sbottom production with decay pattern where b1 and χ0 2 decay branching ratios are set to 100%.The analysis includes eight signal regions grouped into three super regions called A, B and C, designed to capture different levels of mass spectra compression.The HEPData entries for this analysis can be found in ref. [44].
The hard scattering process, pp → b1 b1 , has been simulated with MADGRAPH5_AMC@NLO version 3.2.0[45] using MSSM_SLHA2 model [33].The leading order set of NNPDF 2.3 has been used as a parton distribution function [34,35], and events are showered as well as decayed via PYTHIA version 8.240 [36].The leading-order signal cross section has been scaled to approximate next-to-next-to-leading-order matched with soft-gluon resummation at the nextto-next-to-leading-logarithmic accuracy [37].
The left panel of Fig. 4 shows the observed exclusion limits in (m b1 , m χ0 2 ) plane where official limits published by ATLAS have been shown in blue.This analysis has been shipped with three different background-only statistical model descriptions allowing three distinct subsets of the signal regions to be used to create different statistical models.For the red curve, we choose the most sensitive statistical model for each mass grid by selecting the statistical model that produces the lowest POI upper limit at 95% CL.For m χ0 2 < 1 TeV, we observed that the best statistical model is dominated by so-called region A, which requires the highest jet and b-tagged jet multiplicity along with boosted leading b-jet.Since it is possible to combine statistical models by matching their modifiers, we combined all three statistical models for the orange curve.This has been achieved by combine() function implemented through StatisticalModel class.This function employs the workspace combiner routine implemented in pyhf interface and modifies the signal input to be accommodated into the new backgroundonly model.We observe that the difference between red and orange curves is only visible when the difference between POI upper limits for each subregion (regions A, B and C) is significantly closer to each other.
The right panel of Fig. 4 shows the same for expected exclusion limits along with one standard deviation from the background hypothesis, displayed with dotted lines.We observed the exact difference between the red and orange curves once the phase space is not statistically dominated by one sub-region.This deviation from the red curve has been observed to fit the official limits better in the region where the mass spectra are highly compressed.
Whilst such a combination enables one to merge multiple full statistical models by properly taking care of the common nuisance parameters, it only applies to a full likelihood scenario where all the nuisance parameters are properly identified.Needless to say, this assumes that there is a fixed naming convention in place within the experimental collaboration.In the next section, we will discuss the possibility of combining statistical models which do not include complete information but have been formed using approximation techniques discussed in Sec.2.1.

Combination of statistical models
The versatile modular interface of SPEY facilitates the comprehensive study of various PDF descriptions within a single package.This advantageous feature empowers researchers to employ ) plane using the MADANALYSIS 5 recast of ATLAS-SUSY-2018-31 analysis.The left panel shows observed limits, and the right shows the expected limits with one standard deviation from the background model.The official limits from the ATLAS collaboration have been shown with blue curves for each plot.The exclusion limits were obtained via "pyhf" plug-in.The red curve shows the most sensitive statistical model among three different background-only models shipped with this analysis.The orange curve shows the results produced by combining these three models.
various PDF combination methodologies, thereby harnessing the full potential of experimental analyses.Specifically, the combination methodology outlined in Sec.2.2 is applicable when complete knowledge of uncertainty sources enables the matching of nuisance parameters associated with the same uncertainties.Consequently, this statistical model expansion introduces a correlation term that characterizes any existing interdependencies among nuisance parameters.A generic combination of different statistical models can be formulated as Here, i represents the statistical models, and θ i corresponds to the independent nuisance parameters associated with each model.The combined likelihood in eq. ( 13) encompasses two key components.The first term involves multiplying the constituent PDFs obtained from different statistical models, while the second term introduces constraints that account for correlations among the associated nuisance parameters.Common nuisance parameters have been identified and appropriately addressed to avoid double counting.However, in the case of independent experiments, it is reasonable to assume negligible correlations between the nuisance parameters.This assumption leads to a simplified form of the likelihood, expressed as: Notably, the combined likelihood solely relies on the parameter of interest (POI) since each statistical model can be independently profiled with respect to the given POI value.It is crucial to emphasize that constructing a constraint term between two experiments is highly challenging due to limited information about the sources of uncertainties and overlapping regions.Furthermore, even if the sources of uncertainties are known, such as the jet energy scale or muon tracking efficiencies, the techniques employed to quantify these uncertainties may vary over time or between different collaborations.This variation leads to uncorrelated uncertainties, further validating the validity of eq. ( 14). 17  An alternative approach worth exploring is the combination of different non-overlapping regions, as demonstrated in [20].This technique eliminates the correlations between statistical models, further bolstering the validity of equation ( 14).However, it is important to note that this approach has some limitations.The construction of the overlap matrix is based on the phase space populated by signal events, and thus it may not fully capture the effects of background contributions.Additionally, the success of this method heavily relies on the recasting as control and validation regions are typically excluded from the recast.Consequently, the algorithm remains blind to those regions, potentially resulting in overlaps between signal regions in one analysis and control or validation regions in another analysis.Despite these limitations, this approach offers a semi-conservative strategy for combining regions and analyses when likelihood information is significantly constrained.
To evaluate the impact of combining likelihoods, we adopt a realistic MSSM scenario as outlined in ref. [24].In this scenario, the production of the lightest sbottoms and stops, pp → b1 b1 and t1 t1 , typically leads to decays involving charginos, top and bottom quarks, respectively.These decays give rise to a final state with multiple jets and significant missing energy due to the production of a bino-like lightest neutralino through chargino interactions.ATLAS-SUSY-2018-31 and CMS-SUS-19-006 analyses are ideal for this particular final state configuration, which is discussed above.
To ensure the validity of the theory on the electroweakino side, we additionally employed a comprehensive electroweakino production channel, pp → χ± using the same configurations mentioned above.We used the MADANALYSIS 5 recasts of ATLAS-SUSY-2018-32 [47,48], 18 ATLAS-SUSY-2019-08 [49,50], and CMS-SUS-16-039 [51,52] to study the exclusion limits on the MSSM scenario, which revealed a lower limit of M 2 > 650 GeV for our entire analysis.This lower limit will be included in the below results as a shaded area.The hard scattering process for both squark and electriweakino production has been simulated with MADGRAPH5_AMC@NLO version 3.2.0[45] using MSSM_SLHA2 model [33].All model parameters, masses, and decay widths are computed using the SoftSusy package [53,54].The leading order set of NNPDF 2.3 has been used as a parton distribution function [34,35].Finally, events are showered and decayed via PYTHIA version 8.240 [36].
We follow the same grid scan employed in the reference, where the parameters b1 , t1 , χ± 1 , and χ0 1,2 are allowed to vary within the sub-TeV range.At the same time, the other supersymmetric partners have heavier masses.To achieve this, we fix the bino mass (M 1 ) at 60 GeV and set the ratio of the Higgs vacuum expectation values (tan β = v 2 /v 1 ) to 10.The trilinear couplings (A t,b ), µ, and other soft masses are assigned values of −3.5 TeV, 1.6 TeV, and 5 TeV, respectively.The remaining parameters are scanned in the order 2M 1 < M 2 < M Q3 .This arrangement ensures that the sbottom and stop masses are approximately equal to M Q3 , the lightest chargino and the second lightest neutralino have a mass equal to M 2 , and the bino-like lightest neutralino has a mass fixed at 60 GeV. 17For a comprehensive discussion on likelihood combinations, we recommend referring to ref. [46]. 18The metadata has been updated during this study to include full statistical model information.For the recasts of ATLAS-SUSY-2018-31 and CMS-SUS-19-006, we utilized the same configurations as described previously.Specifically, we generated a dataset comprising 200,000 events for the pp → b1 b1 and t1 t1 production channels.
Figure 5 illustrates the anticipated limits and identifies the most sensitive signal regions established using the "default_pdf.uncorrelated_background"model.Determining the most sensitive region was based on the statistical model that yielded the lowest expected upper limit on the parameter of interest.Notably, our analysis reveals the prominence of four distinct regions.
In the ATLAS-SUSY-2018-31 analysis, the SRA (Signal Region A) is characterized by specific selection criteria.Firstly, it necessitates a jet multiplicity exceeding 6 and a minimum of 4 btagged jets.Moreover, the transverse momentum of a b-tagged jet is expected to surpass 200 GeV.To accommodate the Higgs mass, a restriction on the ∆R separation between two b-jets has been imposed, alongside a loose mass requirement of 80 GeV.
Additionally, the H variant of the SRA requires the highest effective mass in the analysis, surpassing 2 TeV.Notably, within the blue mass grid region, there exists a greater mass separation between squarks and charginos, which consequently leads to an increased multiplicity of boosted jets in this particular region.
In the ATLAS-SUSY-2018-31 analysis, the SRC (Signal Region C) imposes a lower jet multiplicity requirement compared to SRA.Specifically, it necessitates a minimum of 4 jets and 3 B-tagged jets.Additionally, SRC includes a minimum missing energy threshold of 250 GeV.Notably, SRC 28, which is a specific region within SRC, further demands a high missing energy significance of E miss T / H T > 28 GeV.These two regions, SRC and SRC 28, have been observed to dominate in scenarios with compressed spectra.As the squark mass increases, the significance of high missing energy also becomes more pronounced.
Furthermore, within the context of the CMS-SUS-19-006 analysis, only a single grid point is observed.This region specifically targets a low jet multiplicity requirement (N j ≥ 2 and N b ≥ 2) while also ensuring significant hadronic activity (H T ) and a minimum level of missing hadronic activity (H miss T > 600 GeV).In Fig. 5, the orange curve represents the expected exclusion limit, highlighting the most sensitive region.The dotted lines indicate the ±1σ deviation from the background hypothesis.To perform the combination of likelihoods, we employed the pathfinder algorithm proposed in ref. [20], visualized with the cyan color. 19o construct the overlap matrix, we utilized information regarding the regions populated by specific events generated by MADANALYSIS 5.Each region was weighted by These weights for each region can be computed using the likelihood() function within the StatisticalModel class.Once the statistical models to be combined are determined, the UnCorrStatisticsCombiner class can be utilized to combine them.It accepts StatisticalModel as input, leaving the validation of the combination to the user's discretion.It is worth noting that the StatisticalModel.combine()function is designed for backends with specific combination algorithms, whereas the UnCorrStatisticsCombiner does not interact with the statistical models directly but rather multiplies the PDFs as shown in equation (14).It provides the same functionality as the StatisticalModel, and once initialized, it can be used as an independent statistical model for computing exclusion and upper limits.Despite the ongoing dominance of the ATLAS-SUSY-2018-31 analysis in the most sensitive regions, we observed a slight improvement in the cyan curve due to the combination of regions from both analyses.This improvement primarily stems from the inclusion of ATLAS-SUSY-2018-31 SRA-H and two floating regions from the CMS-SUS-19-006 analysis, which fluctuates for different grid points due to insufficient yields.The reason for this modest enhancement over the orange curve lies in the fact that the combined signal regions do not exhibit close µ 95%C L values.In particular, the µ 95%C L values for different regions are maximally close in the compressed spectra region.Thus, the combination of these regions contributes to the observed improvement over the baseline orange curve.This means that the contributions from different signal regions only affect if their sensitivity is comparable.
To leverage the statistical model information provided by the analyses, we generated Figure 6, presenting the observed exclusion limits on the left panel and the expected exclusion limits on the right panel.In both panels, the ATLAS-SUSY-2018-31 analysis employed the "pyhf " model, indicated by the green curve which includes combination of all three super-regions A, B and C. For the CMS-SUS-19-006 analysis, we utilized the "default_pdf.effective_sigma"model as discussed in Section 2.1, represented by the blue curve.
Assuming complete independence between the two experiments, we combined their statistical models using equation (14).This combination was achieved by inputting both models into the UnCorrStatisticsCombiner, which combines them regardless of their backend.The combined analysis is shown as the red curve in Figure 6.However, it is important to note that complete independence may not hold in practice.Although the experiments are distinct, they might share uncertainties such as jet energy scaling or luminosity.Consequently, this combination should be regarded as the maximum information that can be extracted from these analyses based on the available statistical model information.Any correlations between the analyses would reduce the exclusion limit.
Furthermore, the range covered by these analyses differs significantly due to requirements in terms of multiplicity and transverse energy.We observe an improvement of up to 100 GeV with this combination, particularly when constituent regions yield similar exclusion limits.Thus, the red curve represents a highly conservative exclusion limit by utilizing both analyses.The red curve represents the new limit that can be achieved by combining these PDF distributions.The right panel shows the same for expected exclusion limits along with one standard deviation, which is captured via dotted lines.The grey area has been excluded by electroweakino searches.
A closer examination of the profiled likelihood distributions in three different scenarios can be achieved by analyzing the χ 2 (µ) distribution defined in equation (12).Figure 7 provides a visualization of the χ 2 (µ) distribution for µ ∈ [0, 2.2], utilizing the same color scheme as in Figure 6.The dashed red line represents the χ 2 (µ) value corresponding to the upper limit of the parameter of interest (POI) in the combined scenario.The exclusion cross-section, indicated by the same colour code, demonstrates that the combined scenario reduces the expected upper limit on the cross-section by 26%.Additionally, the shaded yellow area represents the ±1σ deviation from the background hypothesis.

An example beyond LHC: Neutrino mass ordering problem
The capabilities of SPEY have predominantly been demonstrated within the confines of the LHC experiments.However, it is crucial to highlight that any empirical study necessitates hypothesis testing.In order to showcase the versatility of SPEY, we aim to partially reproduce the findings presented in ref. [55], which addresses the issue of neutrino mass ordering.
The neutrino mass ordering problem pertains to the arrangement of neutrino mass eigenstates, specifically whether they follow a normal ordering (m ν 1 < m ν 2 < m ν 3 ) or an inverted ordering (m ν 3 < m ν 1 < m ν 2 ).In ref. [55], the authors analyze the latest data from the T2K and NOvA experiments, which measure the oscillation probabilities of muon and electron neutrinos and antineutrinos.Their analysis reveals a significant reduction in the preference for normal ordering compared to previous results, indicating that the mass ordering problem remains unresolved.The study also explores the implications of these findings for other neutrino physics investigations, including the JUNO, DUNE, and T2HK experiments, which are expected to provide a high-confidence determination of the mass ordering problem.
In this study, we reproduce their results based on the T2K experiment, which requires extensive computation based on energy flux, where details can be found in refs.[55,56].The expected number of events has been computed by considering different parameters related to   the degree of the CP violation (δ C P ), mixing angle (φ 23 ), 20 and the mass difference between neutrino flavours (∆m 2  31 ).The range for δ C P was set to [−π, π], while sin 2 φ 23 varied between 0.35 and 0.64, and ∆m 2  31 fell within the interval [2.4,2.7] × 10 −3 eV 2 for normal ordering and [−2.7, −2.4] × 10 −3 eV 2 for inverse ordering.The observed data and expected background yields for the T2K experiment were obtained from ref. [57].
Based on the information provided, we formed a likelihood backend for SPEY interface following the guidelines presented in Appendix A with the following functional form: where signal yields is a function of n s ≡ n s (δ C P , sin 2 φ 23 , ∆m 2  31 ).The likelihood function used in this study is defined as the product of bins within a set of channels.The provided dataset consists of 5 distinct channels, with each channel containing 40 bins.While the original implementation employs N θ k |0, σ b , we introduced a reparameterization of the likelihood to enhance the optimization process and achieve a constraint term that follows a unit-Gaussian distribution.
The results of the ∆χ 2 fit are illustrated in Fig. 8. ∆χ 2 is defined as ; hence POI has been set to one for both terms.In the left panel, the ∆χ 2 distribution is presented with respect to δ C P , obtained by marginalizing the likelihood with respect to sin 2 φ 23 and ∆m 2  31 for a fixed δ C P .In order to slightly favour the normal order, the maximum likelihood of the inverse ordering distribution was adjusted to equation with the maximum likelihood of the normal ordering scenario.The middle (right) panel displays the ∆χ 2 contours at significance levels of 2.3 and 4.61 for the normal ordering (inverse ordering) case.The marginalization was carried out solely over ∆m 2  31 by fixing δ C P and sin 2 φ 23 .The cross depicted in each plot represents the best-fit location for the respective scenario.
Our findings equation closely with the results reported in ref. [55] (refer to Fig. 3 in the reference), thus validating the performance of SPEY interface across a broad spectrum of applications.The implementation for eq.( 16) can be accessed from this GitHub repository.

Conclusion & future directions
SPEY is a modular, cross-platform Python package for building statistical models for reinterpretation studies.It allows the integration of likelihood prescriptions to be used through a global inference system without the need to alter the structure of the package.This has been achieved through an auto plug-in detection system, which searches through Python packages to find plug-in entry points.Such construction allows for a likelihood prescription agnostic interface which can be extended to cover any future likelihood constructions, which may enable a more accurate approximation of the full likelihood.
Whilst the publication of full statistical models reincarnates the experimental analysis, it can be computationally highly complex for fast inference.To improve computational limitations of full statistical models, machine-learned likelihood prescriptions have emerged [58,59].Additionally, methodologies such as simplified likelihoods using linearised systematic uncertainties [60] have significantly improved the computational cost of the full statistical models.SPEY's modular construction enables the usage of such likelihood prescriptions, allowing highly efficient full likelihood estimations to be used for reinterpretation studies through one package.Such plug-in integrations are currently in the making.
Combining different statistical models provides a valuable meta-analysis framework despite the challenges.Using overlap removal techniques [20] are highly effective despite their limitations.Application of methodologies used in Higgs physics, such as Simplified Template Cross Sections [61] (STXS), can allow for more robust likelihood combinations due to nonoverlapping phase spaces.SPEY's backend agnostic structure enables the combination of any likelihood prescription, originating from dedicated packages, seamlessly, and we will continue the development of new approaches for combinations.
SPEY is currently being implemented into SMODELS [11] and MADANALYSIS 5 [5][6][7][8][9] packages to be used for inference which will be available in the future releases.In the future, we are planning to include tools like CONTUR [62], Rivet [1] and CheckMATE [10] to achieve a reinterpretation software agnostic platform.This will allow searches and measurements to be used together across multiple platforms to enhance the limits of meta-analysis.
In this study, we have demonstrated the diverse range of use cases and applications of the SPEY package, spanning from LHC to neutrino experiments.Our contributions include the introduction of a novel simplified likelihood scenario, leveraging an effective variance that surpasses the performance of previous simplified likelihood approaches.Moreover, we have developed a user-friendly likelihood combination interface capable of seamlessly integrating full statistical models while ensuring compatibility with their respective properties.Additionally, we have devised a method to effectively combine statistical models lacking sufficient information regarding the source of nuisance parameters.

B Citing backends
SPEY ensures that the proper citation information for each backend has been included within the class metadata.In order to access metadata information for each backend, one can use spey .get_backend_metadatafunction, which for instance will return the following for "default_pdf .third_moment_expansion"

C Third moments from asymmetric uncertainties
Assuming that the uncertainties are modelled as Gaussian (which is a fair assumption since all the uncertainties presented in eq. ( 3) are Gaussian), third moments of the asymmetric uncertainties can be computed by integrating over Bifurcated Gaussian where upper_envelops and lower_envelops are NumPy arrays with same shape as background.
Note that third-moment expansion presented in simplified likelihood framework is only valid if 8diag (Σ) i ) 2 where Σ is the covariance matrix.For the terms that this inequality does not hold SPEY will warn the user and set such terms to zero.
Notice that eq.(C.1) has been derived from the expectation value of n-th moment where c is the shift from the central value and f (x) is the model.

1
pdf_wrapper = spey .get_backend ( " default_pdf .u n c o r r e l a t e d _ b a c k g r o u n d " )

2 signal_yields
e_u nc ert ai nti es =[12.0 , 16.0] , Functions and Properties Functionality exclusion_confidence_level() Computes the exclusion confidence level (1 − C L s ) for an observed or expected results.Can be computed for eq.(

3 χ0 2 HFigure 1 :
Figure1: Expected exclusion limits at 95% CL presented for CMS-SUS-20-004 analysis.Black, red, blue and green curves represent CMS expected limit, and expected limits computed using correlated background model (eq.(7)), third-moment expansion (eq.(8)) and effective sigma (eq.(10)) methods, respectively.The dotted lines for each curve represent ±1σ fluctuation from the background.The dashed black line is plotted as a reference where m χ0 model Spey -Correlated background Spey -Third moment expansion Spey -Effective sigma

Figure 2 :
Figure 2: χ 2 (µ) distribution versus parameter of interest shown for CMS full statistical model (black), correlated background (red), third-moment expansion (blue) and effective sigma (green).The dashed black line shows the χ 2 value that CMS excludes, and colour-coded values represent excluded POI value at 95% CL by each PDF.

Figure 5 :
Figure5: 95% confidence level exclusion limits on pp → b1 b1 and t1 t1 production channels in MSSM scenario presented in (M Q3 ,M 2 ) mass plane.The orange line represents the exclusion limit computed by using the most sensitive signal region, and the cyan line is computed by combining signal regions via the pathfinder algorithm.Other colours represent the most sensitive signal region per grid point.The hashed area is excluded by the electroweakino searches.

Figure 6 :
Figure 6: Left panel shows the observed exclusion limit generated by correlated background model in CMS-SUS-19-006 analysis (blue) and full statistical model in ATLAS-analysis (green).The red curve represents the new limit that can be achieved by combining these PDF distributions.The right panel shows the same for expected exclusion limits along with one standard deviation, which is captured via dotted lines.The grey area has been excluded by electroweakino searches.

Figure 7 : 1 = 60
Figure 7: χ 2 (µ) versus POI distribution for a benchmark point at M 2 = 600 GeV, M Q3 = 1.25 TeV and m χ0 1 = 60 GeV.ATLAS-SUSY-2018-31 results are plotted with the full statistical model (green), and CMS-SUS-19-006 results are computed with a correlated background model (blue).The red curve represents combined likelihood.The dashed red line represents the χ 2 value at the 95% CL upper limit of POI, and the shaded area within dotted lines shows 1σ deviation from the expected background for the combined scenario.

get_hessian_logpdf_func(
optional) This function provides the necessary computing tools to SPEY to compute the Hessian of the log-probability distribution.get_sampler (optional) Generates a function to sample from the likelihood distribution.

1 2 { 3 " 6 " 7 "
spey .get_backend_metadata ( " default_pdf .t h i r d _ m o m e n t _ e x p a n s i o n " ) " name " : " default_pdf .th ir d _ m o m e n t _ e x p a n s i o n " , doi " : [ " 10.1007/ JHEP04 (2019) 064 " ] , arXiv " : [ " 1809.05548" ]} This provides the information from top to bottom, name of the backend, author of the backend, version of the backend, the SPEY version that the backend requires, list of DOI and arXiv numbers.

x 3 N 0 x 3 N
(x|0, σ − )d x + σ + ∞ (x|0, σ + )d x , (C.1) where σ ± represents upper and lower uncertainty envelops and N (x|0, σ) is the normal distribution.This computation can be achieved by importing compute_third_moments function 1 from spey .backends .s i m p l i f i e d l i k e l i h o o d _ b a c k e n d .third_moment import com pute_t hird_ moment s 2 third_moments = compu te_th i r d _ m o m e n t s ( upper_envelops , lower_envelops )

Table 1 :
List of available PDF accessors which summoned through spey.get_backend("<accessor>") function.The usage of each accessor has been shown in the online documentation of the package.

Table 2 :
Functions provided by the StatisticalModel class.

Table 3 :
List of available functions that can be implemented for SPEY to use the new PDF prescription.Is any of the regions provided within the statistical model is non-zero.If the objective function is different than negative log-likelihood or gradients can be computed for the optimisation process, this function allows SPEY to pass that information to the optimiser.