Per-Object Systematics using Deep-Learned Calibration

We show how to treat systematic uncertainties using Bayesian deep networks for regression. First, we analyze how these networks separately trace statistical and systematic uncertainties on the momenta of boosted top quarks forming fat jets. Next, we propose a novel calibration procedure by training on labels and their error bars. Again, the network cleanly separates the different uncertainties. As a technical side effect, we show how Bayesian networks can be extended to describe non-Gaussian features.

One open question is driven by particle physics' obsession with error bars: how do we quantify the different uncertainties in analyses using neural networks [28][29][30][31]? This question is related to visualization [32], understanding the relevant physics features [33][34][35][36][37], and weakly supervised learning approaches [38][39][40][41][42][43][44][45], all combined under the general theme of explainable AI. In LHC physics we have the advantage of excellent Monte Carlo simulations and full control of the experimental setup. This allows us to define and control different sources of uncertainties very precisely. If we accept that a neural network is just a function relating training data to an output, there exist (at least) two main kinds of uncertainties:
1. first, labelled training data comes with statistical and systematic uncertainties, where we define the former as uncertainties which vanish with more training data. The systematic uncertainties can be Gaussian or include shifts, depending on their sources. Unstable network training also belongs to this category of training-induced uncertainties [28];
2. second, on the test data or analysis side we also encounter statistical and systematic uncertainties. When we include an inference or any kind of analysis we also encounter model or theory uncertainties [29]. For these uncertainties it is crucial that we ensure our analysis outcome is conservative.
In a previous paper [28] we have shown how Bayesian classification networks can track uncertainties and provide jet-by-jet error bars for the tagging output. Such a Bayesian network can supplement a probabilistic classification output of '60% signal' with an error estimate of the kind '(60 ± 10)% signal' for a given jet. This kind of jet-by-jet information exceeds what is available from standard LHC classification tools. In principle, this approach covers both statistical errors from the size of the training sample and systematic uncertainties, for instance from the calibration of the training sample. However, our quantitative analysis of Bayesian top taggers encountered practical limitations, for instance that the jet energy scale simultaneously affects the central value and the error bar of the probabilistic output. A similar study of uncertainties has recently appeared for a matrix element regression task [46].
In this follow-up study we look at this problem from a slightly different angle, now defining the regression task of extracting the energy of a tagged top quark inside a fat jet. Again, we translate statistical and systematic uncertainties from the training sample to the test output. The Bayesian network, introduced in Sec. 2, allows us to construct a per-jet probability distribution over possible top momenta, p(p_T | fat jet). The main advantage of using the regression task as an example is that it does not enforce a closed interval for the network output and hence removes the correlation between central value and error estimate in the network output. We use this advantage to cleanly separate effects from the finite size of the training sample and from the stochastic nature of the training sample in Sec. 4.
In Sec. 5 the stochastic uncertainty leads us to a discussion of systematics in the sense of training-related uncertainties which do not shrink with more training data. Our regression task naturally leads us to developing a framework to calibrate deep network taggers and account for uncertainties in the training sample. We find that a straightforward treatment should be based on smearing the momentum labels in the training sample. It directly accounts for the uncertainties in the underlying measurements of the calibration sample and treats them as an additional systematic effect on the top momentum measurement. As before, the Bayesian network allows us to cleanly separate all different sources of uncertainty.
Our simple application serves as an example of how we can use Bayesian networks to define statistical and systematic uncertainties coming from the training sample and affecting the network output. These error bars are defined jet by jet, or event by event, giving us more control than standard methods do. Training on smeared labels allows us to implement energy calibration in a straightforward and automated manner. While our modelling of uncertainties on the reference measurements for calibration is simplified, our approach can be extended easily. For instance, the effect of different jet algorithms or different Monte Carlo simulations can be implemented as a non-Gaussian contribution to the label smearing.
The key observation is that Bayesian networks allow us to quote uncertainties from all kinds of statistical and systematic limitations of the labelled training data.

Bayesian regression
Our Bayesian network [47][48][49][50][51][52], as always, relates training data D to an output C with the help of the network parameters ω. If we omit the argument D everywhere we can write the (posterior) probability for a distribution of network parameters given the network output as p(ω|C). For our regression network the output is the link between fat jet properties and the top momentum. The expectation value for the top momentum, as reconstructed by the network, can be extracted from a probability distribution,

⟨p_T⟩ = ∫ dp_T p_T p(p_T|C) = ∫ dp_T p_T ∫ dω p(p_T|ω, C) p(ω|C) .   (1)

Because we do not know the closed form of p(ω|C), we approximate it with a learned function q(ω) in the sense of a distribution [53],

p(ω|C) ≈ q(ω) .   (2)

The mean value of the top's transverse momentum is then

⟨p_T⟩ = ∫ dp_T dω p_T p(p_T|ω, C) q(ω) ≡ ∫ dω q(ω) ⟨p_T⟩_ω .   (3)

This means we can extract ⟨p_T⟩ from its weight-dependent counterpart ⟨p_T⟩_ω by sampling the network weight distributions. Correspondingly, the variance of the transverse momentum extracted from the network is

σ_tot² = ∫ dp_T dω (p_T − ⟨p_T⟩)² p(p_T|ω, C) q(ω) = σ_stoch² + σ_pred² .   (4)

This way we identify two contributions to the jet-wise uncertainty from the Bayesian network. First, σ_stoch occurs even before we sample the network weights, so it describes the stochastic nature of the training sample. In our conventions of Eq.(3) we can define an ω-dependent version such that

σ_stoch,ω² = ∫ dp_T (p_T − ⟨p_T⟩_ω)² p(p_T|ω, C)   with   σ_stoch² = ∫ dω q(ω) σ_stoch,ω² .   (5)

Second, σ_pred is defined in terms of the ω-integrated expectation value ⟨p_T⟩, so there does not exist an ω-dependent version,

σ_pred² = ∫ dω q(ω) (⟨p_T⟩_ω − ⟨p_T⟩)² .   (6)

Only this second contribution will vanish in the limit of an infinitely large training sample, because in that case the network weight distributions become delta distributions.
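For intuition, Eqs.(3)–(6) translate into a few lines of code once we can sample networks with weights ω ∼ q(ω). The sketch below is a minimal illustration, assuming a hypothetical `model` whose every call draws fresh weights (as with Flipout layers, introduced later) and returns a per-jet mean and an already-positive width; all names are illustrative rather than our actual implementation.

```python
import numpy as np

def bnn_uncertainties(model, jets, n_samples=50):
    """Monte Carlo estimate of <p_T>, sigma_stoch and sigma_pred, Eqs.(3)-(6).

    Assumes each call of model(jets) draws fresh weights omega ~ q(omega)
    and returns a per-jet pair [mean_omega, sigma_stoch_omega].
    """
    means, widths = [], []
    for _ in range(n_samples):
        out = np.asarray(model(jets))   # shape (n_jets, 2), new omega each call
        means.append(out[:, 0])
        widths.append(out[:, 1])
    means, widths = np.array(means), np.array(widths)

    pt_mean = means.mean(axis=0)                              # <p_T>, Eq.(3)
    sigma_stoch = np.sqrt((widths ** 2).mean(axis=0))         # Eq.(5)
    sigma_pred = means.std(axis=0)                            # Eq.(6)
    sigma_tot = np.sqrt(sigma_stoch ** 2 + sigma_pred ** 2)   # Eq.(4)
    return pt_mean, sigma_stoch, sigma_pred, sigma_tot
```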
The two contributions to the uncertainty σ_tot also appear in the loss function. The standard approach for Bayesian networks is to ensure Eq.(2) using the Kullback-Leibler divergence,

KL[q(ω), p(ω|C)] = ∫ dω q(ω) log [ q(ω)/p(ω|C) ] = KL[q(ω), p(ω)] − ∫ dω q(ω) log p(C|ω) + log p(C) .   (7)

In this derivation we use Bayes' theorem. The prior p(ω) describes the model parameters before training. The model evidence p(C) guarantees the correct normalization of p(ω|C). In Eq.(7) we omit it, just as the normalization condition for q(ω). This means that the relevant loss function consists of two terms,

L = KL[q(ω), p(ω)] − ∫ dω q(ω) log p(C|ω) ,

the regularization for q(ω) in reference to the prior p(ω) and the likelihood p(C|ω), which we can work with in a frequentist sense. For a Gaussian prior this regularization term becomes the standard L2-regularization. It is important to note that to this point we have not made any simplifying assumptions, for instance on the shape of the functions involved.
For illustration purposes or to improve the numerical performance we make a set of assumptions. For instance, we typically assume the prior p(ω) to be Gaussian. In Ref. [28] we have shown, by varying priors over several orders of magnitude, that this assumption has no visible effect on the network output. In addition, we can assume the likelihood p(C|ω) to be Gaussian, such that Eq.(7) becomes

−log p(C|ω) = (⟨p_T⟩_ω − p_{T,t}^truth)² / (2 σ_stoch,ω²) + log σ_stoch,ω + const ,   (8)

and the likelihood splits into two terms, the mean squared error (MSE) and log σ_stoch as defined in Eq.(5). The loss function

L = KL[q(ω), p(ω)] + ∫ dω q(ω) [ (⟨p_T⟩_ω − p_{T,t}^truth)² / (2 σ_stoch,ω²) + log σ_stoch,ω ]   (9)

has to be minimized with respect to the parameters of q(ω). Assuming a Gaussian form also for q(ω),

q(ω) = N(ω; μ_q, σ_q) ,

with two trainable parameters per weight, the last layer of our neural network gives

p(p_T|ω, C) = N(p_T; ⟨p_T⟩_ω, σ_stoch,ω)   (10)

per jet. For this choice of q(ω) and p(ω) the first term in Eq.(9) can be computed analytically and acts as a regularization term.
To extract the per-jet probability distribution p(p_T|C) following Eq.(2), we usually rely on Monte Carlo integration by sampling weights from the weight distributions. As in Eq.(9) we can assume that the probability p(p_T|ω, C) is a Gaussian with the above-defined mean ⟨p_T⟩_ω and width σ_stoch,ω. Moreover, for large training statistics the distribution q(ω) should become narrow. According to Eq.(6) the effect of a finite width of q(ω) can be tracked by σ_pred, so if we can for instance show that σ_pred ≪ σ_stoch we can approximate p(p_T|C) as a Gaussian with weight-independent mean ⟨p_T⟩ and width σ_stoch. This network structure is illustrated in Fig. 1.

Figure 1: Illustration of our Bayesian network setup. The Bayesian network provides us with an uncertainty estimate for a single input jet x.

Data set and network
The correct and precise reconstruction of the momentum of tagged top quarks is important for instance in top resonance searches and has influenced the design of many top taggers [54]. Our data set is therefore similar to standard top tagging references, with some modifications which simplify our regression task. We generate a sample of R = 1.2 top jets in a fixed p_{T,t}^truth range at 14 TeV collider energy, using the standard ATLAS card for Delphes [56]. We always neglect multi-parton interactions and always include final state radiation. For initial state radiation (ISR) we work with two event samples, one with ISR switched on and one with ISR switched off. We require the jets to be central, |η_j| < 2, and truth-matched in the sense that each fat jet has to have a top quark within the jet area. These settings essentially correspond to the public top tagging data set from Refs. [20] and [27]. The difference to the standard tagging reference sets is that we flatten our data set in p_{T,t}^truth, such that even accounting for bin migration effects we can safely assume that in the fat jet momentum the sample is flat for p_{T,j} = 500 ... 800 GeV.
The final result of our Bayesian network will be a probability distribution over possible p_{T,t} values for a given jet. For our labelled data we know the corresponding p_{T,t}^truth. However, the fact that we will modify this truth label as part of the calibration training makes it the less attractive option to organize our samples. The closest alternative observable is the momentum of the fat jet, so we can think of p_{T,j} as representing the complete fat jet input to the network. So unless explicitly mentioned we train our networks on a large data set defined in terms of the fat jet momentum, covering the flat region p_{T,j} = 500 ... 800 GeV and beyond. Whenever we need a homogeneous sample without boundary effects we choose a narrow test sample with

p_{T,j} = 600 ... 620 GeV .

The data format for the fat jet information is a p_T-ordered list of up to 200 constituent 4-vectors (three-momentum and E) with ISR and 100 constituents without. Our total sample size is 2.2M jets without ISR, of which we use 400k jets each for validation and testing. The training size is varied throughout our analysis.
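A minimal sketch of this input format, assuming a hypothetical per-jet array of p_T-ordered constituent 4-vectors; the zero-padding to a fixed length and the flattening follow the description above, and the rescaling by a factor 1000 is described with the network below.

```python
import numpy as np

def preprocess_jet(constituents, n_max=200):
    """Flatten one jet into a fixed-length network input.

    constituents: array of shape (n, 4) per particle, assumed already
    p_T-ordered. Jets with fewer than n_max constituents are zero-padded.
    As an illustrative simplification, all 4-vector entries are rescaled
    by a factor 1000 (the text rescales the p_T values).
    """
    x = np.zeros((n_max, 4))
    n = min(len(constituents), n_max)
    x[:n] = np.asarray(constituents)[:n] / 1000.0
    return x.reshape(-1)   # flattened 4-vector list of length 4 * n_max
```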
Our regression network is a simple 5-layer fully connected dense network. Its first two layers consist of 100 units each, the next two of 50 units each, followed by a 2-unit output layer, unless mentioned otherwise. For the prior we choose a Gaussian around zero with width 0.1. We have confirmed that our results are independent of this width over a wide range [28]. The typical sizes and widths of the weights depend on the input data. The input is a flattened set of 4-vectors, where we re-scale the p_T values by a factor 1000 to end up between zero and one. The activation function is ReLU, except for the output layer: it predicts the mean value ⟨p_T⟩_ω without any activation function, and uses the SoftPlus function for the error, a smooth function which guarantees positive values. We have checked that this setup with these hyper-parameters is not fine-tuned.
For the Bayesian network features we rely on TensorFlow Probability [57] with Flipout dense layers [58] replacing the dense layers of the deterministic network. All networks are trained with the Adam optimizer [59] and a learning rate of 10^{-4}; the number of training epochs is determined by early stopping when the loss function evaluated on the training dataset does not improve for a certain number of epochs. This patience was set to 10 for a training size of 1M jets and to larger values for smaller training sizes, because the loss function fluctuates more. For the Bayesian network with a training batch size of 100 we observe no over-fitting.
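A minimal sketch of this setup, assuming TensorFlow Probability's `DenseFlipout` layers and the hyper-parameters quoted above; setting the 0.1 prior width via `kernel_prior_fn` and the early-stopping callback are omitted for brevity, and the input dimension and names are illustrative.

```python
import tensorflow as tf
import tensorflow_probability as tfp

n_train = 1_000_000   # training sample size, rescales the KL term of Eq.(9)

# Flipout layers register their KL[q(w), p(w)] contribution as a layer loss,
# which Keras adds to the total loss; we scale it by 1/n_train so it is
# comparable to the per-jet likelihood.
kl_fn = lambda q, p, _: tfp.distributions.kl_divergence(q, p) / n_train

inputs = tf.keras.Input(shape=(800,))   # e.g. 200 constituents x 4-vector
x = inputs
for units in [100, 100, 50, 50]:
    x = tfp.layers.DenseFlipout(units, activation="relu",
                                kernel_divergence_fn=kl_fn)(x)
outputs = tfp.layers.DenseFlipout(2, kernel_divergence_fn=kl_fn)(x)
model = tf.keras.Model(inputs, outputs)   # per jet: [mean, raw width]

def neg_log_gauss(y_true, y_pred):
    """Gaussian likelihood of Eq.(8); SoftPlus keeps the width positive
    (applied here in the loss rather than in the layer, as a simplification)."""
    mean = y_pred[:, 0:1]
    sigma = tf.math.softplus(y_pred[:, 1:2])
    return tf.reduce_mean(0.5 * ((y_true - mean) / sigma) ** 2
                          + tf.math.log(sigma))

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=neg_log_gauss)
```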

Momentum determination and statistics
As a first part of our Bayesian regression analysis we need to show how well the network reconstructs the top momentum and what the limiting factors are. We then have to separate the statistical and systematic uncertainties. In analogy to Ref. [28] we first study how the size of the training sample affects the regression output, i.e. how well the Bayesian network keeps track of the statistical uncertainty.
To illustrate the output of our Bayesian network for a single jet we show an example in Fig. 2. Sampling from the weight distributions, q(ω), provides us with a Gaussian per sampled set of weights, shown in petrol. The combination of these distributions is shown in red. The width of the combined distribution is the predicted per-jet uncertainty σ_tot, defined in Eq.(4). For illustration purposes we pick a top jet where p_{T,t}^truth coincides with the peak of the predicted distribution.

Regression performance
To begin, we show in the left panel of Fig. 3 the correlation between the measurable p_{T,j} and the MC label p_{T,t}^truth. We see that over the entire range the two values are aligned well. This allows us to use p_{T,j} as a proxy to the truth information, keeping in mind that we will eventually smear the truth label to describe the jet calibration. In the right panel of Fig. 3 we show the correlation between the central extracted p_{T,t} value, which in Sec. 2 is properly denoted as the expectation value ⟨p_T⟩, and the label p_{T,t}^truth. In the left panel of Fig. 4 we show the p_{T,t}^truth distribution for jets with p_{T,j} = 600 ... 620 GeV. In the absence of initial state radiation the distribution is asymmetric. The simple reason is that the jet clustering can only miss top decay constituents, so we are more likely to observe p_{T,j} < p_{T,t}^truth. Aside from that we see a clear peak, suggesting that we can indeed represent p_{T,t}^truth with p_{T,j}. Because the peak is washed out by ISR, we switch off ISR to make it easier to understand the physics behind our network task. In practice, this could be done through a pre-processing and pruning step.
Whenever we have access to MC truth, we can measure the performance of the regression network for each top jet as (⟨p_T⟩ − p_{T,t}^truth)². The squared difference measure only uses the mean or central value reported by a Bayesian or deterministic network, not the additional uncertainty information from the Bayesian network. For a given test sample of N top jets t_i we construct the mean quadratic error as

MSE = (1/N) Σ_{i=1…N} ( ⟨p_T⟩(t_i) − p_{T,t}^truth(t_i) )² .

We evaluate it over homogeneous samples, for example our usual slice in p_{T,j}. In Tab. 1 we contrast results with and without ISR and show what happens if we limit ourselves to the most top-like jets based on a standard LoLa tagger [20], trained on events with ISR. To estimate the effect of different trainings we also give an error bar based on five independent trainings and the resulting standard deviation. As expected, the p_T-measurement benefits from more top-like events, but the effect is not as significant as in the HEPTopTagger analysis [54]. One of the reasons is that we are using relatively large R = 1.2 jets for the high transverse momentum range. Similarly, we confirm that additional ISR jets have the potential to affect the top momentum measurement whenever hard extra jets enter the fat jet area.
In the right panel of Fig. 4 we show √MSE as a function of p_{T,j} for a bin width of 40 GeV. While the absolute error increases, the relative error on the extracted p_{T,t} shrinks for more boosted jets. If we assume that an improved jet pre-selection can efficiently remove ISR contributions, our regression network can measure the top momentum to roughly 4%. This result is only a rough benchmark to confirm that the regression network performs in a meaningful manner. It would surely be possible to improve the network performance, but we deliberately keep the network simple, to understand the way it processes information and the related uncertainties. From the right panel of Fig. 4 we know that boundary effects will appear already around 200 GeV away from the actual boundaries. Indeed, around p_{T,j} = 800 GeV we see such effects, indicating the phase space boundary of p_{T,j} < 1 TeV in our training sample.
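For concreteness, a short sketch of how such a binned √MSE curve can be evaluated; the bin boundaries are illustrative defaults, not our exact choices.

```python
import numpy as np

def rmse_in_bins(pt_pred, pt_truth, pt_jet, lo=500.0, hi=900.0, width=40.0):
    """Root of the mean quadratic error, evaluated in p_T,j bins."""
    edges = np.arange(lo, hi + width, width)
    rmse = []
    for a, b in zip(edges[:-1], edges[1:]):
        mask = (pt_jet >= a) & (pt_jet < b)
        rmse.append(np.sqrt(np.mean((pt_pred[mask] - pt_truth[mask]) ** 2)))
    return edges, np.array(rmse)
```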
In the same Fig. 4 we also show the uncertainty estimate of the Bayesian network, σ_tot as defined in Eq.(4). It follows the √MSE estimate of the network error, indicating that the Bayesian output captures the same physics as the frequentist-defined spread of the central values.

Figure 4: Left: p_{T,t}^truth distribution for jets with p_{T,j} = 600 ... 620 GeV, without and with initial state radiation. Right: regression uncertainty as a function of p_{T,j} (solid), compared with the average σ_stoch network output (dashed). The most top-like events are defined with a simple LoLa tagger [20].
Training sample size and σ_pred

As discussed in Sec. 2, the contribution σ_pred to the uncertainty reported by the network can be identified as a statistical uncertainty in the sense that it should vanish in the limit of infinitely many training jets. In complete analogy to the classification task described in Ref. [28] we confirm this by training Bayesian networks on 2k, 5k, 10k, 15k, 20k, 30k, 50k, 100k, 200k, 500k, and 1M jets. We test these networks on the narrow range p_{T,j} = 600 ... 620 GeV, similar to the results shown in Tab. 1. The uncertainties quoted by the Bayesian network are shown in Fig. 5. In the lower part of the figure we first see that the statistical error σ_pred indeed asymptotically approaches zero towards 1M training jets. The error bars on the extracted uncertainty are given by the standard deviation of five independent trainings. As expected, they grow for smaller training samples, where the Bayesian networks also give fluctuating results.
In the same figure we also show the systematic σ_stoch and the combined σ_tot, defined in Eq.(4). We confirm that the extracted σ_stoch hardly depends on the size of the training sample. Once we have a reasonable number of training events it reaches a plateau of around 50 GeV or 8%, while for less than 10,000 training events the network simply fails to capture the full information. We can compare the plateau value for σ_stoch to the √MSE value and find again that the two values agree. This allows us to conclude that σ_stoch describes a systematic uncertainty and that it is related to the truth-based √MSE estimate. We will discuss it in more detail in Sec. 5.
After observing the average effect of the training sample size on σ_pred, the obvious question is if we can understand this behavior. In the left panel of Fig. 6 we show the distribution of σ_pred values for a sample of 400k jets. The network is trained on 100k jets with an extended range p_{T,j} = 500 ... 900 GeV. We see a clear maximum around σ_pred ≈ 5 GeV, with a large tail towards large uncertainties. It is induced by the constraint that no network should quote an uncertainty close to zero.
The jet property we can relate to the σ_pred behavior is the number of particle-flow constituents. As mentioned before, we cover up to 100 constituents for jets without ISR. Their effect on top tagging is discussed for instance in Ref. [20]. The center panel of Fig. 6 shows how the number of constituents in the test sample jets peaks at around 25, but with a tail extending to 60. Jets with a larger quoted uncertainty have significantly more constituents. The same information is shown in the right panel, where we see how the average number of constituents increases with the quoted statistical uncertainty. The reason for this pattern is that also within the training sample the number of constituents will peak around 25, limiting the number of training jets with higher constituent numbers. We note that we could use the same argument using the jet mass.

Frequentist approach
From a practical point of view it is crucial to validate the Bayesian network using a frequentist approach. We do this by showing that predictions from many trainings of a deterministic network reproduce our Bayesian network results for the statistical uncertainty σ_pred.
For the deterministic networks we use the same architecture as for the Bayesian network. The loss function of the deterministic networks is the negative log-likelihood given in Eq.(8), and we fix the L2-regularization to match the Bayesian network in Eq.(7), which for a Gaussian prior corresponds to an L2 coefficient of 1/(2 N σ_prior²), where N is the total training size and σ_prior = 0.1 is our prior width. We then train 40 deterministic networks on statistically independent samples, which we sample from the total of 2.2M training jets. The ensemble of deterministic networks then predicts a mean and a standard deviation, in analogy to Eq.(10). The difference between the Bayesian evaluation and the frequentist networks is that we replace the integral over weights with a sum over independent networks. For deterministic networks we need to avoid over-training. An over-trained set of networks will underestimate σ_stoch, while the spread represented by σ_pred increases. However, it is not guaranteed that these two effects compensate each other for finite training time. This is why we introduce dropout for each inner layer with a rate of 0.1. This value is a compromise between network performance and over-training. Unlike in our earlier study [28] we do not use a MAP modification of the Bayesian network.
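A sketch of this frequentist cross-check, assuming a list of independently trained deterministic networks with the same per-jet [mean, width] output convention as above (widths already SoftPlus-activated); the weight integral of Eq.(3) is replaced by a sum over trainings.

```python
import numpy as np

def ensemble_uncertainties(models, jets):
    """Frequentist analogue of Eqs.(3)-(6): the integral over weights is
    replaced by a sum over independently trained deterministic networks."""
    outs = np.array([np.asarray(m(jets)) for m in models])  # (n_models, n_jets, 2)
    means, widths = outs[..., 0], outs[..., 1]
    pt_mean = means.mean(axis=0)                         # ensemble central value
    sigma_stoch = np.sqrt((widths ** 2).mean(axis=0))    # averaged widths
    sigma_pred = means.std(axis=0)   # spread of central values across trainings
    return pt_mean, sigma_stoch, sigma_pred
```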
In Fig. 7 we compare the Bayesian and frequentist uncertainties for different training sample sizes. While the results agree well for properly trained networks or large training samples, the frequentist approach slightly underestimates the uncertainty for small training samples. The plateau value of σ_stoch depends on the chosen dropout value. Accounting for this effect we see that the training-size-dependent σ_pred and the plateau value of σ_stoch agree well between the Bayesian network and the frequentist sanity check.

Systematics and calibration
In our original paper [28] we have shown that the Bayesian setup propagates uncertainties from statistical and systematic limitations of the training data through a neural network. In addition to the usual output the Bayesian network provides event-by-event error bars. A limitation we encounter in Ref. [28] is that forcing the network output onto a closed interval, like a probability p ∈ [0, 1], strongly correlates the central value and the error bars in the network output. This makes it difficult to track systematic uncertainties. We circumvent this problem by extracting the transverse momentum, which does not live on a closed interval. In the previous section this allowed us to decompose σ_tot into a statistical component, σ_pred, and a systematic component, σ_stoch. What we still need to study is the actual output distribution of the Bayesian network, p(p_T|C), and how it compares to the truth information from the test data.

Variance of training data and σ_stoch
In the upper left panel of Fig. 8 we show the correlation of p_{T,t}^truth and p_{T,j}. The orange curves represent the maximum and the 68% CL interval in 20 GeV bins. The corresponding maximum and 68% CL interval of the BNN output are illustrated in blue. Both confidence intervals are constructed by requiring equal functional values at both ends. In the lower left panel we see why the two sets of curves agree very poorly: for the narrow p_{T,j} slice the p_{T,t}^truth distribution is far from Gaussian, while the Bayesian output in our naive approach is forced to be Gaussian, as seen in Eq.(10).
From Sec. 2 we know that it is not necessary to assume that the Bayesian network output is Gaussian. As a simple generalization we can replace the two-parameter Gaussian form of p(C|ω) in Eq.(9) with a mixture of Gaussians,

p(p_T|ω, C) = Σ_i α_{i,ω} N(p_T; ⟨p_T⟩_{i,ω}, σ^{(i)}_{stoch,ω})   with   Σ_i α_{i,ω} = 1 ,

so that the network output from Eq.(10) becomes a weighted sum of Gaussians.

Figure 8: Upper: p_{T,t}^truth vs p_{T,j} including its 68% CL around the maximum; in blue we show the BNN results. Lower: p_{T,t}^truth distribution for a narrow slice in p_{T,j}. From left to right we approximate p_{T,t}^truth with one, two, and three Gaussians.
To guarantee Σ_i α_{i,ω} = 1 we use SoftMax as an activation function for α_{i,ω} and the SoftPlus function for σ^{(i)}_{stoch,ω} to ensure positive values. In the center and right sets of panels in Fig. 8 we see what happens if we use two or three Gaussians, specifically with the parameters averaged over weights and jets in a bin. For three Gaussians the BNN output and the p_{T,t}^truth distribution agree perfectly. The corresponding parameters are shown in Tab. 2.

Table 2: Parameters of the Gaussian mixture for the narrow slice of Fig. 8, specifically p_{T,j} = 600 ... 620 GeV.
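A minimal sketch of such a mixture likelihood, assuming three Gaussian components packed into one output layer; SoftMax normalizes the α_{i,ω} and SoftPlus keeps the widths positive, as described above. The packing convention and names are illustrative.

```python
import math
import tensorflow as tf

N_COMP = 3  # number of Gaussian components

def mixture_neg_log_likelihood(y_true, y_pred):
    """Negative log-likelihood of a Gaussian mixture; y_pred packs, per jet,
    N_COMP raw weights, N_COMP means and N_COMP raw widths."""
    alpha = tf.nn.softmax(y_pred[:, :N_COMP])           # sum_i alpha_i = 1
    mean = y_pred[:, N_COMP:2 * N_COMP]
    sigma = tf.math.softplus(y_pred[:, 2 * N_COMP:])    # positive widths
    gauss = tf.exp(-0.5 * ((y_true - mean) / sigma) ** 2) \
            / (sigma * math.sqrt(2.0 * math.pi))
    return -tf.reduce_mean(
        tf.math.log(tf.reduce_sum(alpha * gauss, axis=-1) + 1e-12))
```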
Technically, we follow Sec. 2 in extracting σ_stoch and σ_pred independently of the form of the underlying assumption. Two aspects render this computation slightly expensive: the integration over all weights and, if required, the combination of different predictions in one p_{T,j} bin. On the other hand we know that σ_pred ≪ σ_stoch and we can always use narrow bin sizes. This means that in both cases we can replace the integrals by simply averaging over the parameters of the Gaussian mixture model. This implementation is computationally less expensive and gives us simple analytic expressions from which we extract the maximum and 68% CL interval.
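The averaging shortcut relies on the closed-form moments of a Gaussian mixture; a short sketch under the same illustrative conventions:

```python
import numpy as np

def mixture_mean_width(alpha, mu, sigma):
    """Closed-form mean and width of a Gaussian mixture, used instead of the
    full weight integration once sigma_pred << sigma_stoch."""
    mean = np.sum(alpha * mu)
    var = np.sum(alpha * (sigma ** 2 + mu ** 2)) - mean ** 2
    return mean, np.sqrt(var)
```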

Noisy labels
A crucial question in experimental physics is how we include a systematic uncertainty, for instance on the jet energy scale, in the training procedure. We can understand such an energy calibration when we remind ourselves that the jets in the calibration sample come with a measured reference value for their energies and the corresponding error bar, and that the calibration sample in our case is the training sample. There are two ways (1A and 1B below) we can include the error on the calibration measurements in our analysis:
1A. fix the label or 'true energy' and smear the jets in the training sample; or
1B. fix the jets and smear the continuous label in the training sample;
2. train the Bayesian network on the smeared label-jet combination;
3. extract a systematics error bar for each jet in the test sample.
In Ref. [28] we have followed option 1A and encountered some practical and numerical problems when tracing the corresponding systematics to the network output. In this study we shift to the less standard and yet straightforward option 1B. We assume that jet calibration incorporates external information on the training sample, be it another measurement, a theory requirement (on-shell Z-decays), or a MC prediction. This information defines a label together with a corresponding error bar. This means we train our network on a fixed sample of jets with a smeared label representing the full reference measurement. In this approach we can trivially include additional uncertainties from pre-processing the training data, like running a jet algorithm on the Z-sample, removing underlying event and pile-up, etc. As a side effect our setup also allows us to capture possible transfer uncertainties, whenever our test sample cannot easily be linked to the training sample. In the ML literature such uncertainties are referred to as out-of-sample errors.
To illustrate and test our setup we smear p_{T,t}^truth, the label in the training data, with Gaussians of width σ_smear, for instance 4% and 10% of the label value. In Fig. 9 we see that for a small amount of smearing the non-Gaussian shape of Fig. 8 remains, so we use two Gaussians in the BNN. For sizeable Gaussian smearing we see that the resulting distributions all assume a Gaussian shape and we can stick to the single-Gaussian standard BNN. In both cases the distribution of the BNN output and the (smeared) label p_{T,t}^truth agree almost perfectly.
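A minimal sketch of this label smearing, assuming relative Gaussian noise on the truth label; the smearing fractions are the ones quoted above.

```python
import numpy as np

def smear_labels(pt_truth, rel_sigma, rng=None):
    """Gaussian smearing of the training labels, e.g. rel_sigma = 0.04
    or 0.10 for a 4% or 10% uncertainty on the reference measurement."""
    rng = rng or np.random.default_rng()
    return pt_truth * (1.0 + rel_sigma * rng.normal(size=pt_truth.shape))
```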
From the previous sections we know that the uncertainty reported by the BNN includes a statistical uncertainty vanishing with an increasing amount of training data and a systematic uncertainty representing the stochastic nature of the training data. When we introduce another uncertainty induced by smeared labels we expand Eq.(4) to

σ_tot² = σ_pred² + σ_stoch,0² + σ_cal² ,

added in quadrature because of the central limit theorem. The baseline value σ_stoch,0 is defined as σ_stoch in the limit of no smearing. In Fig. 10 we show how σ_cal correlates with the input σ_smear over a wide range of scale uncertainties. As usual, the error bar represents the standard deviation from five independent trainings. This correlation shows that our network picks up the systematic uncertainties from smeared training labels perfectly. We note that, as before, this analysis does not require a Gaussian shape of the network output.

Figure 9: Upper: ⟨p_T⟩ vs p_{T,j} including its 68% CL around the maximum, after adding 4% (left) and 10% (right) Gaussian noise on the top momentum label. In blue we also show the BNN error estimate. Lower: corresponding p_{T,t}^truth distribution for a narrow slice in p_{T,j}.

Figure 10: Correlation between σ_stoch, as given by the Bayesian network, and the smearing σ_smear applied to the label in the training data, for p_{T,j} = 600 ... 620 GeV. The baseline σ_stoch,0 is defined as σ_stoch in the limit of no smearing. The error bars indicate the standard deviation from five independent trainings.
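Extracting the calibration component then amounts to subtracting the no-smearing baseline in quadrature; a one-line sketch under the same assumptions:

```python
import numpy as np

def sigma_cal(sigma_stoch, sigma_stoch_0):
    """Calibration component of the expanded Eq.(4): subtract the
    no-smearing baseline in quadrature."""
    return np.sqrt(sigma_stoch ** 2 - sigma_stoch_0 ** 2)
```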

Outlook
We have shown that Bayesian networks keep track of statistical and systematic uncertainties in the training data and translate them into a jet-by-jet error budget, for instance in a momentum measurement. Outside particle physics it is not unusual to treat uncertainties as a smearing of labels, whereas in particle physics we usually model them by smearing the input data. We show that smearing labels is a natural, feasible, and self-consistent strategy in combination with deep learning. An advantage of this approach is that the treatment of uncertainties is moved from evaluation time to training time, and networks trained this way accurately report central values as well as systematic uncertainties.
We have shown that the corresponding Bayesian networks allow us to cleanly separate statistical and systematic uncertainties. In addition, the smeared labels are ideally suited to translate uncertainties from reference or calibration data to the network output.
Technically, we have modified the Bayesian network approach of Ref. [28] to include non-Gaussian behavior. This step is crucial for modeling systematic uncertainties in general.
We emphasize that before this approach can be generally adopted, open questions such as multiple correlated uncertainties and the translation between input-uncertainties and label-uncertainties need to be answered. However, our first results show great promise for smeared labels describing uncertainties in particle physics applications of deep learning.

Acknowledgements: …mann for getting us into Bayesian neural networks. ML is funded through the Graduiertenkolleg Particle Physics Beyond the Standard Model (GRK 1940).

Appendix A: Comparison to smeared data
To further validate the proposed approach, Fig. 11 compares the performance of the BNN approach with a more traditional smearing of the input objects. For smearing the objects we use a Bayesian neural network trained on data without smearing and evaluate this network on a test dataset with modified inputs. Each jet in the test sample is smeared once up and once down; then the difference of the two network outputs is evaluated and divided by two. We then show the average in the given p_{T,j}-range. The BNN prediction is in good agreement with the modified inputs, giving additional confidence in the uncertainty predicted by the Bayesian network.
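A sketch of this cross-check under the stated assumptions: all constituent 4-vectors of a jet are scaled up and down by the relative shift, and half the difference of the two central network outputs serves as the per-jet uncertainty estimate; names are illustrative.

```python
import numpy as np

def input_shift_uncertainty(model, jets, rel_shift=0.04):
    """Half the spread of the central network output under an up/down
    rescaling of all constituent 4-vectors by rel_shift."""
    up = np.asarray(model(jets * (1.0 + rel_shift)))[:, 0]
    down = np.asarray(model(jets * (1.0 - rel_shift)))[:, 0]
    return 0.5 * np.abs(up - down)
```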