First steps towards the reconstruction of the squark flavour structure

Assuming the observation of a squark at the Large Hadron Collider, we investigate methods to access its flavour content and thus gain information on the underlying flavour structure of the theory. Based on simple observables, we apply a likelihood inference method to determine the top-flavour content of the observed particle. In addition, we employ a multivariate analysis in order to classify different flavour hypotheses. Both methods are discussed within a simplified model and the more general Minimal Supersymmetric Standard Model including most general squark mixing. We conclude that the likelihood inference may provide an estimation of the top-flavour content if additional knowledge, especially on the gaugino sector is available, while the multivariate analysis identifies different flavour patterns and can accommodate a more minimalistic set of observables.


Introduction
One of the main goals of the Large Hadron Collider (LHC) is the quest for signals of physics beyond the Standard Model of particle physics. Currently in its Run-2, however, no direct evidence pointing towards the existence of new states has been found. Among the numerous extensions addressing the shortcomings of the Standard Model, Supersymmetry ranks among the most attractive solutions. However, if Nature is indeed supersymmetric, it is reasonable to assume that its exact realization is situated beyond the "vanilla" Minimal Supersymmetric Standard Model (MSSM) or its simplified realizations which are typically searched for in current experimental analyses [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. To give an example, experimental studies typically are based on the Minimal Flavour Violation (MFV) paradigm, which is based on the assumption that all flavour-violating interactions stem from the Yukawa couplings alone, as it is the case in the Standard Model. However, there is no apparent reason that this paradigm is respected beyond the Standard Model. In the MSSM, additional flavour-violating terms may be present in the Lagrangian, leading to a modified phenomenology. This possibility is labelled as Non-Minimal Flavour Violation (NMFV).
The assumption of NMFV in the squark sector has received considerable attention throughout the last decade. Numerous studies have addressed flavour precision observables [16][17][18][19][20][21], dark matter aspects [22][23][24], and most importantly collider signatures related to squark generation mixing [25][26][27][28][29][30][31]. In particular, it has recently been shown that nonminimal flavour mixing between the second and third generation squarks can easily be accommodated with respect to current experimental constraints from flavour and precision data [32][33][34]. Even more recently, it has become apparent that such a configuration is likely to escape detection at current searches at the LHC, and that in consequence a dedicated search for characteristical signatures of non-minimal flavour violation in the squark sector is necessary [35,36].
Assuming the discovery of a squark-like state at the LHC, it will be crucial to understand its exact nature and in particular reveal its flavour content. This information will give important hints towards the flavour structure of the underlying theory and will hint towards possible Grand Unification frameworks [37][38][39][40][41]. It is the main goal of the present Paper to investigate different methods for reconstructing the flavour content of an observed squark state. To simplify this first attempt, we concentrate on squarks containing top and charm flavour. This situation is less constrained by flavour and precision data [34] as compared to mixing with first generation flavours [17]. Moreover, squarks containing top flavour are easier to access from the experimental point of view.
Our study will rely on the pair production of a flavour-mixed squark and its subsequent decays into either top or charm quarks plus missing transverse energy, or into bottom quarks and charginos. A direct reconstruction of the squark rotation matrix would basically be possible, provided that we have access to the corresponding branching ratios, potentially with the help of top-polarization measurements [42][43][44][45], plus complete information on the neutralino and chargino sector. In practice, having precise access to these information is not an option. We therefore develop methods to infer the top and charm content of the observed squark and obtain information about the flavour structure requiring as few information as possible. More precisely, we will present two methods: the first based on a likelihood inference, the second relying on multi-variate analysis techniques.
The Paper is organized as follows: in Sec. 2, we review the model of our interest, namely the MSSM with NMFV in the squark sector. In Sec. 3 we discuss observables which are measurable at LHC and which we will base our analyses on. Sec. 4 is then devoted to the first method, a likelihood inference of the top-flavour content of the squark. The second method, the multivariate analysis, is then presented in Sec. 5 for the simplified setup before discussing it for the more realistic framework of the MSSM in Sec. 6. Finally, Sec. 7 contains our conclusions.

Model and parameters
As discussed in the Introduction, the model of our interest is the Minimal Supersymmetric Standard Model (MSSM) with R-parity conservation and the most general squark flavour structure. In the super-CKM basis, i.e. in the basis (ũ L ,c L ,t L ,ũ R ,c R ,t R ), the hermitian up-type squark mass matrix is given by (2.1) The most important terms with respect to our study are the soft mass matrices M 2 Q and M 2 U , together with the trilinear coupling matrix T u . The remaining parameters, not directly related to the squarks, are the Higgsino potential µ, the up-type quark mass matrix m u , and the ratio of the vacuum expectation values of the two Higgs doublets tan β = v u /v d . Finally, the D-terms are given by Dũ ,L = m 2 Z (T 3 u − e u s 2 W ) cos 2β and Dũ ,R = m 2 Z e u s 2 W cos 2β, with m Z being the Z boson mass, s W and c W are the sine and cosine of weak mixing angle and T 3 u and e u the weak isospin and electric charge of the up-type quarks. The underlying flavour structure enters the mass matrix through the soft mass and trilinear matrices. Under the assumption of Minimal Flavour Violation (MFV), these matrices are diagonal, such that the CKM-matrix remains the only source of quark flavour violation. Relaxing this assumption, i.e. considering the more general framework of Non-Minimal Flavour Violation (NMFV), allows for off-diagonal entries within these three matrices. Let us note that the same arguments and definitions hold in the sector of down-type squarks and sleptons, which are, however, beyond the scope of the present work.
From the up-type squark mass matrix in Eq. (2.1), the rotation to the basis of physical mass eigenstates (ũ 1 , . . . ,ũ 6 ) is done through By convention, the statesũ i are labelled to be crescent in mass. All information about the flavour structure of the up-type squarks is contained in the rotation matrix Rũ, and couplings involving up-type squarks in the physical basis relate to the entries of this matrix. As discussed in the Introduction, a direct reconstruction of the complete squark rotation matrix cannot be aimed at in a near or even mid-term future. We therefore introduce a somewhat less precise but still very meaningful quantity, which is the stop flavour content of the lightest up-type squarkũ 1 . This quantity, defined as will be at the centre of the present study. In order to sample the parameter space, we will in addition make use of the quantities θt and θc, which correspond to the top and charm helicity mixing within the lightest stateũ 1 , (2.4) The cases xt = 0 and xt = 1 correspond to MFV with respectivelyũ 1 being a pure charm-flavoured or top-flavoured state. Moreover, cos θt ,c = 0 corresponds to a "righthanded" squark, while cos θt ,c = 1 corresponds to a "left-handed" squark.

Observables related to flavour violation at LHC
If a squark should be observed at the Large Hadron Collider or any future hadron collider, it will most likely be produced from (flavour-conserving) gluon-initiated processes and manifest through its decay into quarks and gauginos. In our setup, this corresponds to the following decay modesũ which are simoultaneously open if the squark is a mixture of the two flavours, i.e. if 0 < xt < 1. Here, the neutralinos manifest as missing transverse energy, while the charginos will decay further into W -bosons and neutralinos.
Our study is based on the assumption that these decays are observed, and that we have access to the observables Note that the production cross-section of the squarks, as well as their branching ratios alone, are difficult to access. We therefore choose to work with the ratios defined above rather than with the pure associated event rates. Analytical expressions for the relevant decay rates in the NMFV framework can be found in Ref. [18]. Note that in the definition of the ratios R c/t and R b/t , we assume without loss of generality that the decay into top quarks is always open. For the further study, it is interesting to examine those expressions in order to find the xt-dependence of the observables in certain limits concerning the nature of the involved neutralinos and charginos. For example, assuming a pure higgsino-like neutralino and neglecting the neutralino mass with respect to the squark mass, we obtain As a second example, we assume a pure bino-like neutralino and obtain R c/t χ 0 where κ q = e 2 q / e q − T 3 q 2 − 1 = 15 for q = c, t, and the last expression holds for a pure "left-handed" or a pure "right-haded" squark. Finally, for a pure wino-like neutralino, the ratio becomes m 2 q denotes the usual Källén function associated to the squark decay and B q = m 2 u 1 − m 2 Here, the last expression holds for a pure "left-handed" squark.
In order to gain a better understanding of these ratios, we start by randomly scanning over the parameters governing the lightest squark, neutralino, and chargino. More precisely, we vary the physical squark mass mũ 1 , and the parameters xt, θt, and θc defining its flavour decomposition. In the gaugino sector, we vary the bino, wino, and Higgsino mass parameters M 1 , M 2 , and µ. The physical gaugino masses are obtained by diagonalizing the mass matrices at the tree-level.
As the expressions in Eqs. (3.3) -(3.5) do not exhibit a dependence on tan β, we conclude that this parameter only has a mild impact on the observables of our interest. We  Table 1: Scanned ranges of the parameters associated to the squark (left) and gaugino sector (right). All masses are given in GeV. therefore fix tan β = 10 throughout the presented analyses. All parameters are scanned over in a uniform manner according to the ranges given in Table 1. The corresponding parameter distributions are illustrated in Fig. 1 for the relevant physical masses and Fig. 2 for the corresponding mixing parameters, respectively. The shape of the mass distributions are explained by the fact that we require the decay modes mentioned above to be kinematically allowed, which favours larger squark and smaller gaugino masses. Since we impose a flat distribution of the stop content xt, the elements Rũ 1i (i = 2, 3, 5, 6) of the up-squark rotation matrix follow a parabolic distribution. As the distributions of Rũ 1i for i = 3, 5, 6 are similar to the one of Rũ 12 , they are not shown seperately in Fig. 2.
For each parameter point, the gaugino masses and the ratios R c/t and R b/t of our interest are computed using the full analytical expressions of Ref. [18]. The results are depicted in Fig. 3, where we indidate as colour code the dominant component of the involved neutralino as well as the nature of the decaying squark. As expected from Eqs. (3.3) -(3.5), distinct regions are observed in the distributions of R c/t . The same kind of feature appears for the ratio R b/t . More precisely, the two ratios depend strongly on the neutralino decomposition and the "chirality" (expressed in terms of θt and θc defined in Eqs. (2.4)) of the decaying squark. The width of each band is due to the fact that the majority of the parameter points feature mixed gauginos and squarks rather than corresponding to the limit cases discussed above. Nevertheless, the presence of the observed rather distinct regions is an important feature which will turn out to be crucial in the identification of the squark flavour decomposition from the observables given in Eq. (3.2).

Likelihood inference in a simplified model
In order to infer the stop component xt of the observed squark, we start by constructing a maximum likelihood estimator. For a given set of data supposed to be obtained at the Large Hadron Collider, we associate a likelihood value to each point of an ensemble of random parameter points. Assuming uncorrelated parameters and thus a Gaussian distribution, this likelihood takes the form with θ being the set of parameters associated to the parameter point under consideration and σ i being the error associated to the observable D i . We now divide the interval xt ∈ [0; 1] into N bins of equal size. For each bin j = 1, . . . , N , we then compute the average likelihoodL j (xt) of all random parameter points having their value of xt inside the given bin. From the obtained values ofL j (xt) over the interval xt ∈ [0; 1], we can fit a Gaussian distribution in order to find the maximum of likelihood corresponding to the inferred value of the stop component xt. The associated uncertainty σ(xt) is then based on the standard deviation value of the fit.
Several subtleties appear and have to be addressed when performing this analysis. The first one is of technical nature: in order to fill the full xt interval in an appropriate way, we need a rather large number of parameter points. Note that this number depends on the chosen bin size. The second, and more tricky part, lies in the uncertainties σ i entering the likelihood in Eq. (4.2). Usually this standard deviation correspond to the uncertainty associated with the measurement of the observables D i . However, in the present case,  Table 2. The distributions are shown on a linear scale.
since different parameter points θ i can lead to the same observation, there is an additional feature, which we shall discuss later on. For the sake of simplicity, and in order to illustrate the proposed inference method, we fix in a first step the parameters associated to the neutralino and chargino decomposition as i.e. we consider a maximally mixed neutralino. For the present example, we have performed a random scan over the five parameters of Eq. (4.1) leading to an ensemble of 5 · 10 8 parameter points. Moreover, we assign a common value of σ i = 0.25D i to the uncertainties entering the likelihood calculation. Assuming four different test parameter points P i (i = 1, . . . , 4), we perform the analysis described above and infer the stop component xt using a Gaussian likelihood fit. The results are illustrated in Fig. 4 and summarized in the upper part of Table 2. More precisely, for each test parameter point, we show the average likelihoodL j (xt) obtained for each bin and the Gaussian fit. As can be seen, our method manages to recover the actual stop component within the resulting uncertainty from the Gaussian fit.    Table 2. For σ i /D i being set to 0.35, the fit converges well and the resulting value for xt is close to the assumed one. However, the gaussian width is rather large leading to a sizeable uncertainty σ(xt) on the inferred value. Going to smaller values for the uncertainties σ i leads to a less accurate fit and drives the inferred value away from the assumed one. The associated gaussian width is smaller, and thus the inferred uncertainty is less important. The explanation of this feature is that the actual uncertainty on the inferred stop composition has in reality two components, the first coming from the width of the fitted Gaussian function, the second coming from the overall quality of the fit. Finally, pushing σ i /D i to even lower values makes the fit break down, as can be seen in the last panel of Fig. 5. This is the actual tricky point of the present analysis. The choice of σ i /D i must therefore be made with precaution because it has more impact than the Gaussian width and it also depends on the density of points in a certain area of the scan. Using this method to infer the actual stop composition will need specific care and deeper understanding of the uncertainties, which is beyond the first step presented in this work.
As a final step, we relax the assumption on the gaugino decompositions given in Eq.  Fig. 4 for a single test parameter point but varying uncertainty σ i /D i . P 5 P 6 Figure 6: Same as Fig. 4 for two test parameter points obtained by scanning in addition over the parameters related to the gaugino sector.
an ensemble of 5 · 10 8 parameter points with σ i = 0.35D i , and apply our reconstruction method to two data sets P 5 and P 6 . The results are shown in Fig. 6 and summarized in the lower part of Table 2. Even if the true stop components lie within the infered intervals, the uncertainties are much larger in this case, such that the results become rather meaningless in certain cases. In addition, from Fig. 6 we can see that the likelihood is no longer Gaussian. This is due to the fact that here different regions of the parameters present a concentration of points able to explain the data.

Multivariate analysis in a simplified model
In order to go beyond the likelihood inference presented in the previous Section, especially in a more realistic setup as discussed, e.g., in Ref. [34], we now employ a multivariate analysis (MVA) classifier. In the present Section, we will start by presenting results obtained from a multi-layer perceptron (MLP) provided by ROOT through the TMVA package [46] for the simplified setup already used in Sec. 3. The discussion of the complete MSSM with squark generation mixing of Ref. [34] will follow in Sec. 6. In this context, the goal of our analysis is slightly different with respect to the previous Section. While the likelihood inference aims at estimating the actual stop component of the observed squark, a multivariate analysis is designed to effiently classify different configurations. In order to provide a simple illustration, we define two categories based on the stop composition xt, which remains the key quantity of our interest. We will divide the parameter space into "top-flavoured" squarks and "charm-flavoured" squarks according to Let us note that these categories are for the moment rather arbitrary and aim at the illustration of the method rather than representing specific physical regions. In particular, additional categories can be defined in order to refine the analysis. Based on the two categories, the MLP can be trained on the parameter points obtained from the random scan, and subsequently tested on a subset of points, the test sample, in order to compute the efficiency and the misidentification rate of the classifier. The analysis presented here is based on a training sample of 10 6 points, which have been obtained by uniformly scanning as indicated in Table 1. The classifier basically combines the set of obervables given in Eq. (3.2), i.e. R c/t , R b/t , mũ 1 , mχ0 1 , and mχ+ 1 , into one single variable called the MLP response. The algorithm will associate an MLP value to each parameter point of the scan, depending on the set of observables that maximize the separation between the two categories. Several points need to be kept in mind when performing this type of analysis. A key point is the so-called "overtraining", meaning that training the algorithm on a too small dataset may enforce the identification of unphysical regions, i.e. statistical fluctuations, as physical ones. We have performed an overtraining check by comparing the classification performance on the training sample and on the test sample. The behavior of the algorithm being the same on the two samples, we conclude that there are no statistical fluctuations having an impact on the classification.
The rather simple situation of having only two categories will also serve to study the influence of the underlying prior distribution, in particular of the stop component xt. We start from the same setup as in Sec. 3, where the random parameter scan has been performed such that the stop component xt exhibits a flat distribution. For this case, we show the obtained MLP response for the two categories in Fig. 7, together with the prior distribution of the stop component (see also Fig. 2).
As can be seen, the classifier manages to seperate the two categories with a rather good efficiency. For a given misidentification rate, the associated efficiency, i.e. the number of points of a chosen class surviving the misidentification cut, of the classifier can be computed. In the present case, for a misidentification rate of 10%, we obtain an efficiency of 54% for the "top-like" squark region and of 64% for the "charm-like" case. In other words, we can tag respectively approximately 54% and 64% of the points at 90% confidence level.
As a second example, we employ the classifier to the case of a non-uniform prior distribution of the stop-content xt. Inspired by the results of Ref. [34], we choose a prior distribution peaking at its "MFV-like" extremities xt ≈ 0 and xt ≈ 1. Apart from the prior distribution (and thus the squark rotation matrix elements), the sample has the same characteristics as the previous one. The prior distribution and the resulting MLP response are shown in Fig. 8. While it is approximately symmetric in the case of a flat prior, the MLP response associated to the two categories is clearly non-symmetric in the present case. This can be traced to the fact that the observables used to classify are non-symmetric with respect to "top-flavoured" and "charm-flavoured" squarks.
In this example, for the misidentification rate of 10%, we obtain an efficiency of 64% (60%) for the "top-flavoured" ("charm-flavoured") category. It appears that the efficiency depends on the prior distribution. More precisely, the classifier becomes more efficient in identifying the "top-flavoured" category, but slighly less performant concerning the "charm-flavoured" category.
The increasing classification power coming from the prior distribution can intuitively be understood as the two categories are now more different. The border between the two cases, i.e. xt ∼ 0.5, where it is phenomenologically difficult to assign a given point to a single category, are less populated in the second case with non-uniform prior. It is therefore easier to maximize the separation. After this first analysis within a simplified setup, we now aim at applying the MLP method to a more complete model.

Application to the MSSM with mixed top-charm squarks
As announced in the previous Section, we finally apply the multivariate analysis (MVA) classifier to the Minimal Supersymmetric Standard Model (MSSM) with non-minimal flavour mixing between charm-and top-flavoured squarks. In order to work with a rather "realistic" setup, we choose to use the parameter points obtained in Ref. [34] as basis of our study. These parameter points defined at the TeV scale have been shown to fulfill all relevant constraints coming from flavour and precision measurements, in particular the Higgs-boson mass, the decays B → X s γ and B → X s µµ, and the meson oscillation parameter ∆M Bs , to name the most important ones. For all details on the applied constraints and the related Markov Chain Monte Carlo study of the MSSM with non-minimal flavour violation in the squark sector, the reader is referred to Ref. [34].
Following the preliminary study of the simplified setup in Sec. 5, it is interesting to examine the prior distribution of the quantity that we want to address, i.e. the stop component xt of the lightest up-type squark. As can be seen from its representation in Fig.  9, the distribution strongly peaks at the "MFV-like" ends. Moreover, flavour and precision data tend to favour a high charm content with respect to top content in the lightest squark. Note that this situation is similar to the non-uniform prior tested in Sec. 5, which turned out to yield a higher efficiency than the simpler uniform prior. We now perform the same MLP classification using a training sample containing about 6 · 10 5 points obtained from the MCMC analysis of Ref. [34] 1 .  Categories Efficiency "charm" MFV 0.00 ≤ xt < 0.05 95% "charm" NMFV 0.05 < xt < 0.50 51% "top" NMFV 0.50 < xt < 0.95 41% "top" MFV 0.95 < xt ≤ 1.00 69% Table 3: Efficiencies of the classification method for the four categories of our interest assuming a misidentification rate of 10%.
Starting from the prior distribution shown in Fig. 9, we divide the ensemble of points into four categories defined as follows: 0.00 ≤ xt < 0.05 ⇐⇒ "charm MFV" 0.05 < xt < 0.50 ⇐⇒ "charm NMFV" 0.50 < xt > 0.95 ⇐⇒ "top NMFV" 0.95 < xt ≤ 1.00 ⇐⇒ "top MFV" (6.1) Note that, although the given definition of the above categories is again somewhat arbitrary, the exact value of the cuts between MFV and NMFV does not have a major impact on the methods presented in the following. It might, however, affect the efficiency of the proposed analysis, and the exact definition of the categories may in practice depend on the problem under consideration.
The MVA classifier is used to seperate each of the four categories from its complement, i.e. the ensemble comprising the three other classes. In Fig. 10, we show the MLP responses obtained for the four cases. As expected from the overpopulated prior region, the "charm MFV" category is rather well identified. However, the identification is less efficient for the two NMFV categories, which are underpopulated in the prior distribution. For the sake of a numerical comparison between the categories, and also to the cases presented in Sec. 5, we summarize the obtained efficiencies of the classifier in Table 3.
Overall, the performance of the classifier is better than for the simplified situations presented in Sec. 5. This can be traced to the underlying prior distribution of the stop content xt (see Fig. 9). The categories which are most difficult to identify, i.e. the two NMFV categories, are less populated in this particular model. The algorithm is therefore less performant in distinguishing these categories. The small bump observed around MLP ∼ 0.7 . . . 0.8 in both NMFV categories is an artefact of the employed multi-class MLP due to the presence of phenomenologically different regions.
Let us finally mention that we have also tested the likelihood inference method discussed in Sec. 4 on the present case of the NMFV-MSSM of Ref. [34]. However, for this method it turns out that inferring in a region of rather low density is quite difficult (contrary to the case of a uniform prior applied in Sec. 4). In addition, the strongly peaked prior distribution of the stop component xt leads to a certain bias, such that the obtained results are not reliable any more. We therefore do not discuss this method further for the given model.

Conclusion
We address the question to which extend the flavour decomposition of a squark-like state produced at the Large Hadron Collider can be reconstructed. As a starting point, we have considered a rather simple but typical set of collider observables related to intergenerational mixing between top-and charm-flavoured squarks. The quantity of our interest is the top-flavour content of the observed squark state, since it may give valuable information on the flavour structure of the theory.
We first have employed a likelihood inference method, which basically allows to infer the top-flavour content of the observed squark. With the help of a simplified model incorporating non-minimal flavour violation between the top and charm squarks, we have obtained viable information on the squark flavour structure assuming that additional information, in particular concerning the gaugino sector, is provided. In absence of information on the neutralino and chargino nature, the likelihood inference is less viable. However, the more additional information is available, e.g. on the gaugino sector (even if not fully determined), the more efficient this method will be. We also tried to use the likelihood inference method to the more general situation of the Minimal Supersymmetric Standard Model (MSSM) with additional top-charm mixing in the squark sector. However, it turns out to be inapplicable due to the somewhat extreme prior distribution of the top-flavour content and the available number of parameter points in the test sample.
The second method consists of a multivariate analysis classifier, which can efficiently separate two categories among a sample making use of a given set of observables. Performing this analysis on both the simplified setup and on the more general MSSM framework has led to promising results concerning the seperation between the Minimal and Non-Minimal Flavour Violation hypotheses. It turns out that this method can better deal with the strongly peaked prior distributions as it is the case in the MSSM with top-charm flavour mixing.
We want to emphasize the fact that the two methods are complementary. While the multi-variate analysis does not return an actual value for the top-flavour content of the squark, the likelihood inference can provide a reasonable estimation. However, the likelihood inference needs additional information, especially on the gaugino sector, and cannot handle very extreme prior distributions. These inconvenients can be avoided by the use of the multivariate analysis, which already allows to gain valuable information on the flavour structure. As this is a first attempt of the reconstruction of the squark flavour structure, the presented analysis relies on rather simple observables. Designing improved analyses inspired from this work should lead to a considerable improvement of the performances. As an example, one might consider additional observables related to the same parameters, such as, e.g., such as the top polarization from the squark decay or event rates stemming from gluino production and decay. From the machine-learning point of view, many algorithms exist for parameter-fitting problems and with a specific analysis it may be possible to access the actual value of the top-flavour content in a generic gaugino sector. Furthermore, considering new types of algorithms and additional observables may give access to the actual entries of the squark rotation matrix.