Real-time discrimination of photon pairs using machine learning at the LHC

ALP-mediated decays and other as-yet unobserved $B$ decays to di-photon final states are a challenge to select in hadron collider environments due to the large backgrounds that come directly from the $pp$ collision. We present the strategy implemented by the LHCb experiment in 2018 to efficiently select such photon pairs. A fast neural network topology, implemented in the LHCb real-time selection framework achieves high efficiency across a mass range of $4-20$ GeV$/c^{2}$. We discuss implications and future prospects for the LHCb experiment.


Introduction
The γγ final state is interesting for a variety of reasons. On one hand, the decay of a B 0 s or a B 0 meson to two photons remains unobserved and is described by an annihilation topology. It is sensitive to contributions from physics beyond the Standard Model of Particle Physics (SM) including a fourth generation [1], an extended Higgs sector [2] and SUSY [3]. Previous measurements by the Belle and BaBar collaborations have set limits of B(B 0 s → γγ) < 3.1 × 10 −6 at 90 % confidence level (CL) [4] and B(B 0 → γγ) < 3.3 × 10 −7 at 90 % CL [5], which are significantly above the SM predictions of B(B 0 s → γγ) ∼ (2 − 37) × 10 −7 and B(B 0 → γγ) ∼ (1 − 10) × 10 −8 [6].
Undiscovered particles known as Axion-Like Particles (ALPs) could also be accessed through this final state. ALPs are pseudo Nambu-Goldstone bosons, associated to spontaneously broken approximate symmetries, which appear in several models and can solve many of the SM problems [7]. Probing very small couplings of ALPs to the SM sets indirect constraints on the New Physics (NP) scale. For the type of model addressed in Ref. [7], ALPs couple to gluons (which allows their production at the LHC) or photons (which can be used for their detection) in the SM sector. The mass of ALPs can be arbitrarily below the NP scale. In particular, for ALPs with a mass in the range between 5 and 10 GeV/c 2 , the LHCb experiment, described in Ref. [8], has unique sensitivity for their discovery provided they can be selected by its trigger algorithms [7]. For masses below 5 GeV/c 2 , LHCb may have sensitivity through other decay channels, as described in Ref. [9].
The maximum rate at which events can be read out of the LHCb detector is imposed by the front-end electronics and corresponds to ∼ 1 MHz. In order to determine which events are kept, hardware triggers based on field-programmable gate arrays are used with a fixed latency of 4 µs. Information from the ECAL, HCAL, and muon stations is used in separate hardware-level (L0) triggers. As explained in Ref. [10], the strategy at L0 for the photon pairs under study here is similar to that of other radiative decays and relies on the Photon or Electron channels, based on inputs from the ECAL. All events selected by L0 are transferred to the High Level Trigger (HLT). The HLT is a software application, executed on an event filter farm, that is implemented in the same Gaudi framework [11] as the software used for the offline reconstruction. It consists of two levels: an initial selection of high energy and/or displaced single-or double-particle signatures (HLT1) and a second level (HLT2), in which full offline reconstruction is available, allowing for more complex searches to be performed [12].
The study of these purely neutral modes at LHCb is challenging, but the use of γ → e + e − conversions in the detector material, which happens for around 25 % of photons, provides additional information to reduce the background levels. With offline selections already in place since Run 1, this paper describes the trigger strategy adopted in Run 2, where a set of trigger selections were introduced to select the γγ signature for the case of zero, one, and two photon conversions, labelled as 0CV, 1CV and 2CV, respectively. An additional label LL and DD is used to distinguish photon conversions reconstructed as: • Long tracks (LL): when all possible tracking information from every tracking station is available, implying that the parent particle decayed within about 1 metre of the pp interaction point • Downstream tracks (DD): using only information from tracking stations different to the vertex locator, implying that the parent particle decayed after this.
The work presented in this paper builds on the strategy first introduced in 2015 to select only the B 0 (s) decay [10], in which a selection was put in place for 0CV, 1CV LL, 1CV DD, and 2CV (which includes both LL and DD combinations). The new approaches to select candidates in a wider mass range are described for the 0CV, 1CV LL, and 1CV DD topologies. The 2CV selection remains as is described in Ref. [10]. Section 2 describes the method by which our signal decays are simulated, Section 3 describes the HLT1 strategy. Section 4 describes the HLT2 strategy, performance, and corresponding implementation. The prospects for the current dataset collected by LHCb in addition to that expected to be collected by the upgraded LHCb detector (during Run 3 of the LHC and beyond) are discussed in Section 5.

Simulating the signal B 0 s and ALP decays
In order to simulate B 0 s → γγ decays, pp collisions are generated using Pythia [13] with a specific LHCb configuration [14]. Decays of hadronic particles are described by EvtGen [15], in which final-state radiation is generated using Photos [16]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [17] as described in Ref. [18]. This allows accurate simulation of the different topologies of the decay, as photons may interact with the detector itself and subsequently decay to electron-positron pairs.
For the ALP signal, samples are generated using MadGraph v2.6 [19], with parameters taken from the ALP model described in Ref. [20]. The hadronization is performed by Pythia, and the rest of the simulation steps are identical to simulating the B 0 s → γγ decay. Three ALP masses are simulated: 5 GeV/c 2 , 10 GeV/c 2 , and 15 GeV/c 2 .
After the detector response has been simulated, the trigger reconstruction and associated selection requirements are simulated using data taking conditions similar to those of the 2017 LHCb running period, with mean interactions per bunch crossing of 1.3 and center-of-mass-energy of 13 TeV.

HLT1 strategy
The first trigger software level is required to reduce the input rate by a factor 10 from the output of the hardware trigger level. In order to achieve this, a search is made primarily for high transverse momentum tracks or tracks with a high impact parameter with respect to the primary vertex. A detailed description of the HLT1 selections for 2018 can be found in Ref. [10]: in the case of photon conversions with at least one electron reconstructed as a long track, a generic inclusive single-track, MVA-based trigger selection is used [12]; in all other cases, the HLT1 strategy relies on a custom reconstruction approach using information from the calorimeter collected in the hardware trigger stage. This latter reconstruction technique, imposed by the limited time available for decisions in HLT1, applies simple approximations to calculate the mass of the photon pair with a minimum E T requirement in a negligible time using only their energies as calculated in L0.
Two selections with a different set of requirements are used: a first one, referred to as HLT1(B), focused on the selection of B decays, and a second one, not included in Table 1: Selection applied in the Hlt1B2GammaGamma and Hlt1B2GammaGammaHighMass HLT1 trigger selection. Energies and masses given here are computed with 2 × 2 cell clusters.

Requirement HLT1(B) HLT1(ALP)
Ref. [10], with stricter E T requirements and with a wider mass range, which extends the reach to ALP masses above the B mass window (HLT1(ALP)). Their requirements are given in Table 1; it is worth noting that the E T requirements of the HLT1(ALP) are very close to the saturation of the LHCb ECAL when using L0 energy clusters, 1 and therefore the true mass reach is higher than the requirement given in the table. The efficiency of the two selections relative to all candidates passed by the L0 hardware trigger is given in Table 2.

HLT2 strategy
The second trigger software level performs a more complete event reconstruction. In HLT2, over 400 multibody decay signatures are searched for in parallel, with candidate selection information equivalent in quality to that used by analysts offline.

Training sample preparation
The first step in designing an HLT2 strategy is to collect representative samples of signal and background in order to train the neural network (NN) classifiers. For the case of the signal, i.e. B 0 s and ALP decays, simulated data is used, which is generated as described in Sec. 2. In order to describe the background, proton-proton collisions collected by the LHCb collaboration during 2017 are used, in which the high level trigger selected candidates randomly. Since the intention is to implement a NN inside the trigger software, and in order to generate the largest possible NN training sample, a set of loose requirements are applied to the samples mentioned above. These are inspired by those applied in the analysis of other radiative decays (e.g. those in Refs. [21][22][23]). In summary, • photons reconstructed as calorimeter energy clusters are required to have an energy above 6 GeV and a transverse energy with respect of the beam direction above 3 GeV; • photons reconstructed as electron pairs are required to have a transverse momentum in excess of 2 GeV, have a mass below 60 MeV/c 2 and be displaced with respect to the pp collision; • the sum of transverse momentum of the two photons must be in excess of 6.5, 5.5 and 5 GeV/c for the 0CV, 1CV and 2CV cases, respectively; and • diphoton candidates are required to have a combined transverse momentum above 3 GeV/c and, in the case of 2CV, to form a good vertex.
In the training of the NN, a random subset of the B 0 s → γγ sample is taken, such that efficiencies of the ALP signal (for the 3 different masses under consideration) and B 0 s decay remain similar. The yields of the samples are provided in Table 3.

Multi-layer perceptron training and implementation
A multilayer perceptron (MLP) was chosen for the NN updates. A Scikit-learn [24] implementation was preferred due to the relative simplicity of the topology that allows for quick evaluation in real-time environments in association with the NNDrone package [25]. Due to the small training dataset, a LBFGS solver method [26], which shows better results under these conditions, was used. To avoid overtraining a small (O(10 −5 )) regularization parameter was chosen. A simple hidden layer structure was implemented: three neurons in the first and two in the latter. The small dimension was primarily set to avoid large time complexity. Finally, for simplicity, the logistic activation function, which is also used in the output layers, was chosen for the hidden layers. In order to train the MLP for each topology, different variables are found to have different separation powers. We select those providing significant discrimination between signal and background. The following variables are used as a feature for one or more of the NN models: • The transverse momentum of the parent B 0 s or ALP candidate (X p T ).
• The impact parameter significance of the B 0 s or ALP candidate with respect to the best primary vertex (X IP χ 2 ), defined as the difference in χ 2 of a given PV reconstructed with and without the considered particle.
• The invariant mass of the electron-positron combination in a photon conversion.
• The probability that the photon candidate is not a π 0 meson based on a combination of calorimeter information [27] (γ prob).
• Asymmetry of the p T of the two photon candidates (p T asym).
• The output of a multi-variate classifier [27] trained using various inputs corresponding to calorimeter shape variables (γ shower shape).
The signal and background distributions for each topology are given in Appendix A. It should be remarked that, as mentioned above, our classifiers use as inputs the outputs of two other different classifiers, one designed generically to identify calorimeter photons (γ shower shape) and one to differentiate photons and π 0 mesons (γ prob). While, ideally, one could integrate the features used in these classifiers into our own one, making use of the ability of algorithms such as Deep Neural Networks as feature extractors, it was decided against this given the high level of complexity in the training of both γ shower shape and γ prob. The first accounts for the expected deposit shapes in different regions of the calorimeter and excludes the possibility that these are originated by a charged track. The second also relies on the energy value and the calorimeter zone comparing the expected energy deposit of a photon with respect to that of a π 0 . Although both classifiers were trained to maximize its sensitivity for B meson decays, potential extensions of this work could involve integrating all the features into a single classifier to be trained specifically with our signal samples. The samples are split into training and test data, where half of the data are used for training and the other half used for testing. Optimizations of this splitting, such as the use of techniques like k-folding [28], will be explored in future versions of this algorithm  to maximize the size of the training and test samples. The output distributions of the trained models for both the test and training samples are given in Fig. 1. Good agreement can be seen in all training and test comparisons, meaning the models show few signs of overtraining. To quantify this, p-values obtained with the Kolmogorv-Smirnov test [29] can be found in Table 4.

Performance
The performance of each of the models shown in terms of ROC curves is provided in Appendix B. Ultimately, the efficiency of the models depends on the chosen working point. This is driven by resource requirements. The chosen working point of each of the models is shown in Table 5, along with the rejections and efficiencies per sample, per decay topology.

Implementation in the real-time software stack
In order to apply the neural networks in the C++ software stack used to perform event selection in real time, a conversion must take place so that the models trained in the Python implementation are reproduced.
In order to do this, the NNDrone framework [25] is used to convert the network weights and bias parameters to JSON format. This allows reducing the processing time by an order of magnitude while keeping the same classifier performance. The weights and parameters are then read in by a dedicated tool which uses the JSON parameters to initialise a LWTNN network [30] that reproduces the original model exactly. The reconstructed LWTNN model has been validated to produce identical outputs from the same input values.

Prospects
As explained in Section 1, the trigger strategy described in this paper is essential to improve the sensitivity of LHCb to γγ final states, especially after 2018. In this section, we describe briefly the prospects for the two benchmark analyses studied: the search for the rare decay B 0 (s) → γγ and that of an ALP decaying to a pair of photons. It should be noted that the same final state could also be sensitive to other models, such as those including a composite Higgs together with light scalars [31].
During the full LHCb Run 2 data-taking, selections providing high efficiencies for the reconstruction of B 0 (s) → γγ decays were included (see Tab. 2 and Tab. 5). In the same regard, the efficiencies for an ALP with a mass close to that of the B (s) meson is also very high for that period. For the case of an ALP with a mass between ∼ 6 GeV/c 2 and ∼ 12 GeV/c 2 , only data collected in 2018 with the strategy described in this document provides significant sensitivity. A similar strategy to that used in 2018 is expected to be used during Run 3 of the LHC, in which LHCb will have been upgraded [32]. Concerning the trigger, the new design [33] including the removal of the first hardware level, should provide similar or higher efficiencies.
The efficiencies reported in this document, together with the fraction of triggered data provided in Ref. [10], are used to roughly estimate the expected sensitivity in both analyses. A γγ invariant mass resolution of ∼ 2.5% is assumed for these studies. Additional offline discrimination against the background can be achieved by using similar but more powerful classifiers than those available in HLT2. This additional discrimination is based on the use of larger training samples and variables whose reconstruction is too slow to be performed in real time. Additional background rejections of ∼ 90 % can then be achieved with associated signal efficiencies of ∼ 60 % for all the photon reconstruction categories included in this document.
For the B 0 s → γγ decay an upper limit B(B 0 s → γγ) 10 −5 at 90 % CL could be achieved using the Run 2 LHCb dataset. This is around two times the Belle limit, currently the most stringent. Assuming similar efficiencies and backgrounds in Run 3 of the LHC, a simple projection yields B(B 0 s → γγ) 4 × 10 −6 . This assumption might not hold if the ECAL performance is affected by the larger occupancy expected in Run 3. If a more optimistic background discrimination is assumed (95% background rejection for the same signal efficiency) upper limits of B(B 0 s → γγ) 6 × 10 −6 and B(B 0 s → γγ) 2 × 10 −6 could be achieved using the Run 2 and Run 3 LHCb datasets, respectively. A discovery, should the SM prediction hold, would probably need to wait until Run 4 or a potential Phase-II LHCb upgrade [34].
Concerning a search for ALPs, estimations agree with those of Ref. [7]. For the reasons explained previously, the sensitivity with the current dataset is more limited for an ALP with a mass in the ∼ 6 − 12 GeV/c 2 range. In the most sensitive region, decay constants below ∼ 0.3 TeV could be excluded using the LHCb Run 2 dataset. Keeping the same efficiency as in 2018 for Run 3 would provide an increase to ∼ 0.4 TeV for the ∼ 4 − 12 GeV/c 2 mass range. No other experiment is expected to contribute to the measurement in this mass range in near future.

Summary
Di-photon selections were first implemented in the LHCb trigger in 2015 focusing on the B 0 s decay. In this paper, we have detailed the selection modifications required to expand the search region to allow sensitivity to undiscovered ALPs in the 2018 trigger. In order to do this and remain within resource budgets, neural network models have been introduced using the NNDrone framework to ensure fast evaluation. This is the first time such models have been used to directly select multibody candidates in real time.

A Neural network feature distributions
The signal and background distributions for each topology are shown in Figs. 2, 3, and 4, respectively, where "signal" refers to the combination of all signal modes (i.e., B 0 s → γγ and the ALP at the three mass points of reference).

B ROC curve performances of the NN models
The performance of each of the models is shown in terms of ROC curves in Fig. 5, which display the signal efficiency versus background rejection power.