Deep learning analysis of the inverse seesaw in a 3-3-1 model at the LHC

The inverse seesaw is a genuine TeV scale seesaw mechanism. In it, active neutrinos with masses at the eV scale require lepton number to be explicitly violated at the keV scale, together with new physics, in the form of heavy neutrinos, at the TeV scale. It is therefore a phenomenologically viable seesaw mechanism, since its signature may be probed at the LHC. Moreover, it is successfully embedded into gauge extensions of the standard model such as the 3-3-1 model with right-handed neutrinos. In this work we revisit the implementation of this mechanism into the 3-3-1 model and employ a deep learning analysis to probe this setting at the LHC. As our main result, we find that if its signature is not detected in the next LHC run, at an energy of 14 TeV, then the vector boson $Z^{\prime}$ of the 3-3-1 model must be heavier than 4 TeV.


I. INTRODUCTION
Seesaw mechanisms [1][2][3][4] are seen as the simplest proposals to solve the long-standing problem of the smallness of the neutrino masses. Recently, researchers have focused their investigations on phenomenologically viable seesaw mechanisms, such as the inverse seesaw [4], since their signatures may be probed at the LHC [5].
The distinguishing aspect of the inverse seesaw (ISS) mechanism is that it is a genuine TeV scale seesaw mechanism. According to the original idea [4], its implementation requires the addition of six new neutrinos (N_{iR}, S_{iL} with i = 1, 2, 3) to the standard model particle content, composing the bilinear terms of [6], where m_D, M and µ are generic 3 × 3 complex mass matrices. These terms can be arranged in a 9 × 9 neutrino mass matrix in the basis (ν_L, N^C_L, S_L). Considering the hierarchy µ << m_D << M, the diagonalization of this 9 × 9 mass matrix provides the effective mass matrix for the standard neutrinos. The double suppression by the mass scale connected with M makes it possible for this scale to lie much below the one involved in the canonical seesaw mechanism [1][2][3]: standard neutrinos with masses at the sub-eV scale are obtained for m_D at the electroweak scale, M at the TeV scale and µ at the keV scale. In this case all six new neutrinos may develop masses around the TeV scale or less, and their mixing with the standard neutrinos is modulated by the ratio m_D M^{-1}. The core of the ISS mechanism is that the smallness of the neutrino masses is guaranteed by assuming that the µ scale is small and, in order to bring the heavy neutrino masses down to the TeV scale, it has to be at the keV scale.
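For reference, and up to transposition conventions, the terms and matrices referred to above take the standard ISS form:

```latex
% Bilinear terms of the inverse seesaw
\mathcal{L} \supset \bar{\nu}_L\, m_D\, N_R + \bar{N}_R\, M\, S_L
             + \frac{1}{2}\, \bar{S}^c_L\, \mu\, S_L + \text{h.c.}

% 9x9 mass matrix in the basis (\nu_L, N^C_L, S_L)
M_\nu \;=\;
\begin{pmatrix}
 0      & m_D & 0   \\
 m_D^T  & 0   & M   \\
 0      & M^T & \mu
\end{pmatrix}

% Effective light-neutrino mass matrix for \mu \ll m_D \ll M
m_\nu \;\simeq\; m_D\, (M^T)^{-1}\, \mu\, M^{-1}\, m_D^T
```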
In this regard, it was shown in [7] that the SU(3)_C × SU(3)_L × U(1)_N model with right-handed neutrinos (331RHN) [8] has the main ingredients for realizing the ISS mechanism. However, a probe of the ISS mechanism in the 331RHN at the LHC is still missing. The proposal of this work is to fill this gap and probe the ISS in the 331RHN at the LHC. For this purpose we review the model and the mechanism, and employ deep learning to probe the signature of the mechanism at the LHC by means of the production of these new neutrinos and their detection in the form of leptons as final products.
This work is organized as follows: in Sec. II we review the implementation of the ISS into the 331RHN and present the charged and neutral currents of interest for our analysis. In Sec. III we perform our analysis by applying deep learning techniques to probe both the ISS and the 331RHN. In Sec. IV we present our conclusions.

II. SOME ESSENTIAL POINTS OF THE MODEL AND OF THE MECHANISM
In order to implement the ISS mechanism into the 331RHN we have to add three left-handed neutral fermions, in singlet form, to the original leptonic content of the model, where a = 1, 2, 3 corresponds to the three families of leptons.
For completeness, we present the quark content. As is well known, in the quark sector two families must transform as anti-triplets in order to cancel anomalies. Here we make the following choice, where i = 1, 2, while the third family transforms as a triplet. The scalar sector keeps the original content. The gauge sector is composed of the standard bosons W^±_µ, Z_µ and the photon A_µ, plus five new ones: U^0_µ, U^{0†}_µ, W'^±_µ and Z'_µ. This particle content allows the following Yukawa interactions, where a, b = 1, 2, 3, i, j = 1, 2 and l, m, n = 1, 2, 3. For the sake of simplicity, we consider the charged leptons in a diagonal basis. Observe that the last line of this Lagrangian includes the terms that trigger the ISS mechanism.
As usual, we assume that only η^0, ρ^0 and χ^0 develop vacuum expectation values (VEVs) other than zero, and we consider the following expansions around the VEVs. With this set of VEVs, the last line of the Yukawa Lagrangian above provides the following mass terms for the neutrinos, where the 3 × 3 matrices are defined with M_{ab} and m_{Dab} being Dirac mass matrices, the latter being antisymmetric.
Considering the basis S_L = (ν_L, ν^C_L, N_L), we can write L_νmass in a compact form with the mass matrix M_ν having the texture that characterizes the ISS mechanism. The hierarchy M >> m_D >> µ provides a seesaw relation for the masses of the standard neutrinos. In order to see this it is useful to define the matrices M_D and M_R, so that we have a block matrix where M_R is supposed invertible. This matrix can be block diagonalized. For this purpose let us define the matrix W such that m_light = −M_D^T M_R^{-1} M_D and m_heavy = M_R. When we plug M_D and M_R^{-1} into m_light we obtain the canonical inverse seesaw mass expression for the standard neutrinos. Observe that the matrix in Eq. (18) is not diagonal; it is a block diagonal matrix. The diagonalization of the mass matrix in Eq. (16) is done through the unitary matrix V = W U, with U_PMNS being the PMNS matrix that diagonalizes m_light, while U_R diagonalizes m_heavy, and m_diag is the diagonal mass matrix with nine eigenvalues.
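Under the block assignments suggested by the text (a sketch; conventions may differ from the authors' by transposition), the seesaw relation follows directly:

```latex
M_D = \begin{pmatrix} m_D^T \\ 0 \end{pmatrix}, \qquad
M_R = \begin{pmatrix} 0 & M \\ M^T & \mu \end{pmatrix}, \qquad
M_R^{-1} = \begin{pmatrix} -(M^T)^{-1}\mu\, M^{-1} & (M^T)^{-1} \\
                           M^{-1} & 0 \end{pmatrix}

m_{\rm light} = -M_D^T\, M_R^{-1}\, M_D
             = m_D\, (M^T)^{-1}\, \mu\, M^{-1}\, m_D^T
```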
The matrix V connects the flavor basis S_L = (ν_L, ν^C_L, N_L)^T = (ν_L, ζ_L)^T with the physical one, which we call n_L = (n^0_{iL}, n^1_{kL})^T, where i = 1, 2, 3 and k = 1, 2, ..., 6. The relation between flavor and mass eigenstates is S_L = V n_L.
For simplicity, we define the matrix V in the following form. Returning to m_light, recall that G is an antisymmetric matrix, implying that one eigenvalue of the light neutrino mass matrix vanishes. Due to the non-unitarity of the mixing matrix V_νν, any set of values for the entries in G and G' that does the job must obey the constraints in Eq. (28) [10]. To simplify our task we consider v_η = v_ρ = v. Thus, the constraint v_η^2 + v_ρ^2 = (246 GeV)^2 implies v = 174 GeV. It is supposed that v_χ lies around the TeV scale; here we assume 5 TeV. We also consider µ = 0.3 I keV, where I is the identity matrix.
Regarding the Yukawa couplings G and G', we consider the scenario where G is diagonal but non-degenerate, and as an illustrative case we take the values below. With this set of values for G, G' and for the VEVs v, v_χ and µ presented above, the diagonalization of the mass matrix m_light in Eq. (26) furnishes the neutrino masses. Let us check whether these values of G and G' are in accordance with the non-unitarity constraint [10].
Substituting the values of G and G' into η yields a result which respects the bounds in Eq. (28).
Regarding the six new neutrinos, by diagonalizing m_heavy = M_R in Eq. (15), our illustrative example yields (n^1_{1L}, n^1_{6L}) with masses ∼ 373.28 GeV, (n^1_{2L}, n^1_{5L}) with masses ∼ 220.84 GeV and (n^1_{3L}, n^1_{4L}) with masses around ∼ 96.32 GeV. The degeneracy in mass is due to the simplicity of our illustrative example.
We have thus developed the basic aspects of the implementation of the ISS mechanism within the 331RHN and presented an illustrative example that recovers the current experimental results on neutrino oscillations.
Our wish now is to probe this scenario at the LHC. We do this by means of the production of pairs of heavy neutrinos, n^1_{iL}, and their subsequent detection in the form of leptons as main final products. The processes we study are mediated by the standard charged gauge boson W^± and by Z'. The neutral and charged currents of interest are presented below.
We present first the charged current with W^±, which is composed of the terms below. The neutral current interactions with Z' have two contributions. This is the set of interactions that matters for us here. In the first line of Eq. (34) we have the mixing matrix V_νN; such a pattern of mixing is due to the simple choice of the parameters G and µ. In the next section we are going to probe the signature of this mechanism by producing the lightest new neutrinos, n^1_{3L} and n^1_{4L}, at the LHC. Observe that, as (V_νN)_{13} and (V_νN)_{14} are null, these neutrinos do not form charged currents with electrons. For this reason the analysis done in the next section is based on the production of these neutrinos and their final products in the form of muons.
Concerning neutral currents, we also explore the direct production of Z' and its subsequent decay into a pair of n^1_{3L} or n^1_{4L}. The interactions that generate these processes are the last terms of Eqs. (35) and (36). Our illustrative example yields the values below for the mixing matrix V_NN, which along with Eq. (37) allows us to perform the analysis for this production.
Before going into the analysis, with the charged and neutral currents at hand, the first thing to do is to check whether our illustrative example obeys the constraint from the rare lepton flavor violating (LFV) process µ → eγ. Such a process is allowed by the second coupling in Eq. (34). The branching ratio for the process mediated by the six heavy neutrinos is given in [11]. The present values of the parameters entering this expression are found in [12]. Our illustrative example provides BR(µ → eγ) ≈ 1.4 × 10^{-13}. This is very close to the current bound, BR(µ → eγ) < 4.2 × 10^{-13} [13]. So, this case may be confirmed or excluded in the next run of the MEG experiment.

III. ANALYSIS OF THE PRODUCTION MECHANISM AND MAIN CHANNELS
There are two major production channels for the n^1_{iL} neutrinos. The first one is via the vector gauge boson W^±, which can be produced through the s-channel in a proton-proton collision.
In the particular case of our illustrative example, the W^± can further decay into a µ lepton and the neutrino n^1_{iL}. In turn, the n^1_{iL} can decay into µ and W^±. This channel can then have as final products three leptons plus missing energy (µ^± µ^∓ ℓ^± ν_ℓ) or two muons and two jets (µ^± µ^∓ jj).
The second production mechanism for the neutrinos n^1_{iL} is through the direct production of the Z' and its subsequent decay into a pair of n^1_{iL}. The final state for this type of channel will appear as a pair of highly boosted muons, a pair of leptons and missing transverse energy.

To simulate these processes, we generate a UFO [14] file using FeynRules [15]. This UFO file is later used by the MadGraph5 [16] package to produce the hard scattering processes we want to investigate. All the hard scattering events are further passed to Pythia version 8.1 [17] and Delphes [22] in order to hadronize and include detector effects, making the Monte Carlo pseudo-events as close as possible to the data produced by the LHC at 14 TeV.
A. pp → µ^± µ^∓ e^± ν_e channel: As mentioned earlier, this is one of the main production mechanisms for n^1_{iL} and is displayed in FIG. (1). To investigate this channel we generate 450000 events at 14 TeV center-of-mass energy. To stay safely away from infrared and collinear divergences, we apply the basic cuts of Eq. (40) at the generation level.
FIG. 1: Production of n^1_{(3,4)} at the LHC via the W channel.
We focus our investigation on the production of the lightest new neutrinos. Thus, we are going to analyze the channel with the decay chain for the neutrino given above. This choice allows us to reconstruct, with good accuracy, the full decay chain generated by the n^1_{iL}. Another reason for this choice stems from the fact that in our model the couplings between W^±, n^1_{3L} (n^1_{4L}) and µ are relatively large, allowing a sizable cross section for the production at the LHC. As a consequence of this choice, the main irreducible backgrounds are the channels listed below.
TABLE II: Kinematic (dimension-full) and angular (dimensionless) observables selected to study the channel pp → µ^± µ^∓ e^± ν_e. We include dimensionless observables in two different reference frames: the center-of-mass frame (top row) and the n^1_{iL} rest frame (bottom row), where θ_{i,j} is the angle between the respective particles from either the final state or the reconstructed objects, W, n^1_{iL}, and ∆R(i, j) is the separation in the η × φ plane, defined by √((∆φ)^2 + (∆η)^2).
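As a concrete reference, the ∆R separation defined in the table can be computed as in the following minimal Python sketch (the function names are ours, not those of the analysis code):

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into (-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta x phi plane: sqrt((dphi)^2 + (deta)^2)."""
    return math.hypot(delta_phi(phi1, phi2), eta1 - eta2)
```

The wrapping of ∆φ matters for back-to-back objects: two particles at φ = ±3.0 are separated by 2π − 6 ≈ 0.28, not 6.0.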
However, the number of background events remaining after the selection, even when we impose a cut window around the mass predicted for the n^1_{3L} (n^1_{4L}), completely buries our signal. To overcome this problem we make use of a deep learning algorithm trained to distinguish the signal from the main irreducible backgrounds using the observables described before. We present the details of the architecture and training methodology in Section III C.
FIG. 3: Kinematic (dimension-full) observables between selected particles in the center-of-mass and n^1_{iL} frames. The subscript indicates that the observable is taken in the n^1_{3L} (n^1_{4L}) reference frame. The blue region represents the kinematic distribution of the signal events, while the orange, green and red lines are the Ztb, WZ and tt+Z backgrounds, respectively. The met variable corresponds to the missing E_T vector direction.
FIG. 4: Angular (dimensionless) observables for the pp → µ^± µ^∓ e^± ν_e channel. Separation in the η × φ plane between selected particles in the center-of-mass and n^1_{iL} frames. The subscript indicates that the observable is taken in the n^1_{iL} reference frame. The blue region represents the kinematic distribution of the signal events, while the orange, green and red lines are the Ztb, WZ and tt+Z backgrounds, respectively. The met variable corresponds to the missing E_T vector direction.

B. Z' channel:
Another production mechanism for the n^1_{iL} is through the production and subsequent decay of Z', see FIG. (5). The W^± bosons are reconstructed from the final state electrons and the missing E_T. In our simulations we set the Z' mass to 4 TeV and the n^1_{3L} (n^1_{4L}) mass to 96.31 GeV, values consistent with the current limits [20,21] on the expected Z' mass. In FIG. 6 we display the cross section for a range of Z' masses against the n^1_{iL} ones. The region explored in this paper offers a sizeable cross section for the production of a Z' and its subsequent decay into n^1_{iL}. The main irreducible backgrounds are listed below. This channel contains six leptons as final state particles, four visible (µ^+, µ^-, e^+, e^-) and two invisible (ν_e, ν̄_e), which enlarges the number of observables we can use to distinguish the signal from the background. We choose the dimension-full and dimensionless variables of TABLE IV, and in FIGs. 7-9 we display the respective distributions.
FIG. 8: Angular (dimensionless) observables for the pp → µ^± µ^∓ e^± ν_e e^∓ ν_e channel. The blue region represents the angular distribution of our signal, while the orange and green lines are the ttZ and WWZ backgrounds. The subscript indicates that the observable is taken in the reconstructed n^1_{iL} reference frame.
FIG. 9: Angular (dimensionless) observables for the pp → µ^± µ^∓ e^± ν_e e^∓ ν_e channel. The subscript indicates that the observable is taken in the reconstructed n^1_{iL} reference frame.

C. Deep learning analysis: Methods and results
After we select the events and gather the kinematic and angular information, we can feed it into a Neural Network (NN) designed to properly separate signal from background. Due to the simplicity of our data set, which stores the information from the events as tables where each row corresponds to an event entry and the columns are the observables, we decided to work with a fully connected NN. However, we still have to choose some important parameters for the NN: number of layers, number of neurons, kernel initializers, etc. Choosing the correct parameters directly affects the efficiency of our NN, which translates into the significance of discovery, or exclusion, of the particles predicted by the model.
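As a dependency-free illustration of what "fully connected" means here (a sketch only; the actual analysis uses TensorFlow, and the layer sizes and initialization scale below are placeholders), the forward pass of such a classifier is:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FullyConnectedNN:
    """Stacked dense ReLU layers followed by a sigmoid output layer,
    one output unit per class."""

    def __init__(self, n_inputs, hidden=(512, 512, 512, 512, 512),
                 n_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        sizes = (n_inputs,) + tuple(hidden) + (n_classes,)
        # small random weights drawn from a normal distribution
        self.W = [rng.normal(0.0, 0.05, (a, b))
                  for a, b in zip(sizes, sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def predict(self, x):
        h = np.asarray(x, dtype=float)
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = relu(h @ W + b)                        # hidden layers
        return sigmoid(h @ self.W[-1] + self.b[-1])    # per-class score in (0, 1)
```

Each event row (the observables) goes in, and one score per channel comes out.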
This selection is often referred to as hyperparameter optimization. A first approach is to use "brute force" to tune the hyperparameters via a grid search, but the number of combinations, and the computational time to test each one of them, grows exponentially.
More efficient alternatives to grid search are random sampling or Gaussian process algorithms that learn the best hyperparameters. Another way to tackle this problem is to use genetic/evolutionary algorithms, as in Ref. [18].
To test the different architectures, as well as modifications and fine tuning of the parameters, we set up an evolutionary algorithm that explores the different combinations of parameters by creating a set of populations. In our case we restrict the population to 25 models and keep the top 5 models with the highest accuracy; after 5 rounds (generations) we obtain the top 3 architectures sorted by accuracy, and we select the best one to continue our analysis. This full process takes around 2 hours on an NVIDIA GTX 1070 GPU. We use TensorFlow 2.0 [23] to build, train and evaluate our models.
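A minimal sketch of such an evolutionary search (population of 25, top-5 survivors, 5 generations, as described above); the search space and the `fitness` function are illustrative stand-ins for the real train-and-evaluate step:

```python
import random

# Illustrative hyperparameter space (not the exact grid used in the paper)
SPACE = {
    "layers":  [2, 3, 4, 5, 6],
    "neurons": [64, 128, 256, 512],
    "l2":      [1e-5, 1e-6, 1e-7],
}

def random_model():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(model):
    """Copy a parent and resample one random hyperparameter."""
    child = dict(model)
    key = random.choice(list(SPACE))
    child[key] = random.choice(SPACE[key])
    return child

def fitness(model):
    # Stand-in for "train the NN and return its validation accuracy".
    return -abs(model["layers"] - 5) - abs(model["neurons"] - 512) / 512.0

def evolve(pop_size=25, keep=5, generations=5, seed=0):
    random.seed(seed)
    population = [random_model() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]                     # top-5 survivors
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - keep)]
    population.sort(key=fitness, reverse=True)
    return population[:3]                               # top-3 architectures
```

In the real pipeline, `fitness` trains and evaluates a TensorFlow model; everything else in the loop is unchanged.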
The best architecture and hyperparameters found by our genetic algorithm consist of a five-layer NN, each layer with 512 neurons and a Rectified Linear Unit (ReLU) activation function, with the exception of the top layer, which consists of 4 neurons, one for each channel analysed (µ n^1_{3L} (µ n^1_{4L}), Ztb, WZ, ttZ), and a sigmoid activation function. We also found that initial random weights sampled from a normal distribution and L2 regularization with a value of 10^{-7} give the best significance. A similar architecture was found for the n^1_{iL} pair-production channel.
Our data sets consist of tables where each row corresponds to an event entry and the columns are the kinematic and angular distributions described in the previous sections. Due to the selection criteria I and III we impose on the signal and background events, we ended up with an imbalanced number of events for each channel. This can lead the DNN model to over-fit towards the majority class, rendering the model unable to make correct predictions for the classes we are interested in. To overcome this problem we balance the original data set using the Synthetic Minority Over-sampling Technique (SMOTE) [24]; we first divide the original data set into 80% to generate the balanced data set and 20% to use as our validation set.
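The core of SMOTE is a simple interpolation between a minority-class event and one of its nearest minority neighbors. A minimal NumPy sketch of the idea (our own illustration, not the implementation used in the analysis):

```python
import numpy as np

def smote_oversample(X, n_new, k=5, seed=0):
    """Create n_new synthetic rows from the minority-class sample X by
    interpolating each picked row toward one of its k nearest minority
    neighbors -- the core idea of SMOTE [24]."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]    # indices of k nearest rows
    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X))                # pick a minority event
        nb = X[rng.choice(neighbors[j])]        # one of its neighbors
        gap = rng.random()                      # interpolation fraction in [0, 1)
        synthetic[i] = X[j] + gap * (nb - X[j])
    return synthetic
```

Because each synthetic event lies on a segment between two real minority events, the augmented sample stays inside the region populated by the original class.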
The closer this quantity is to one, the better we expect the backgrounds can be cleaned up for a given signal efficiency.
We are interested in obtaining not only the acceptance and rejection factors, but mainly the statistical significance of the signal. To do so we can use the predictions made by our NN to estimate the number of events expected for each of the analysed channels, and from them compute the Asimov significance, which depends on the integrated luminosity and on systematic uncertainties, often disregarded in machine learning studies. The Asimov estimate of significance [19], a well-established approach to evaluate likelihood-based tests of new physics taking into account the systematic uncertainty on the background normalization, can then be used for a more careful estimate of the signal significance at the training and testing phases of the construction of the classifier. The Asimov signal significance is given by
$$Z_A = \sqrt{2\left[(s+b)\ln\frac{(s+b)(b+\sigma_b^2)}{b^2+(s+b)\sigma_b^2} - \frac{b^2}{\sigma_b^2}\ln\left(1+\frac{\sigma_b^2\, s}{b\,(b+\sigma_b^2)}\right)\right]},$$
where, for a given integrated luminosity, s is the number of signal events, b is the number of background events, and σ_b is the uncertainty associated with the number of background events. In FIG. 11 we plot the estimated Asimov significance as a function of the classification score assigned by the NN. Despite the relatively high cross section for the process pp → W → µ n^1_{iL} and the 99% accuracy achieved by the NN, the overwhelming irreducible background of this channel dominates the uncertainties in the Asimov significance. This imposes a bigger challenge to anyone who intends to probe such a particle using this channel alone. Meanwhile, the process pp → Z' → n^1_{iL} n̄^1_{iL} offers a window to probe not only the n^1_{iL} but also the aforementioned Z' boson. The smaller background cross sections and the 100% accuracy achieved by the NN allow us to safely probe this channel and estimate a higher significance using the current LHC luminosity. Combining all these factors, if the Z' is not discovered in this channel, we can exclude this model with a Z' mass below 4 TeV using the current LHC luminosity. However, from FIG. (6) we still have a wide range of masses to explore, and the analysis developed so far can serve as the main guideline to constrain the parameters of the 331RHN.
We can project the Asimov significance for a range of luminosity values. In FIG. (12) we show the projected significance, with 1% systematic error, versus the expected luminosity. The bands correspond to the projected systematic uncertainties. Due to the systematic dominance over the W → µ n^1_{iL} channel, we can only achieve 3σ significance at 3000 fb^{-1}; yet, the projected significance for Z' → n^1_{iL} n̄^1_{iL} shows a better perspective, with 10.5σ of significance using the RUN-2 luminosity and around 33σ at 3000 fb^{-1}, demonstrating the sensitivity not only of the analysis we developed, but of the channel Z' → n^1_{iL} n̄^1_{iL} as well.

IV. CONCLUSIONS
In this work we revisited, in detail, the implementation of the inverse seesaw mechanism into the 3-3-1 model with right-handed neutrinos and then probed its signature, in the form of heavy neutrinos, at the LHC by means of deep learning techniques. The mass spectrum of these new neutrinos may vary from some hundreds of GeV up to the TeV scale.
Our analysis considered the production of such neutrinos by means of the processes pp → W^± → µ^± n^1_{(3,4)L} → µ^± µ^∓ e^± ν_e and pp → Z' → n^1_{(3,4)L} n̄^1_{(3,4)L} → µ^+ µ^- e^+ e^- ν_e ν̄_e. We applied deep learning techniques in conjunction with evolutionary algorithms and concluded that the second process is much more efficient than the first one. As the main result, the second process allows us to probe not only the signal of the ISS mechanism, but also the model in question, i.e., the 331RHN. According to our analysis, if the Z' is not discovered in this channel, we can exclude, at 95% confidence level (within 6σ), this model with a Z' mass below 4 TeV using the current LHC luminosity.