Applying Artificial Neural Network Hadron - Hadron Collisions at LHC

High Energy Physics (HEP) targeting on particle physics, searches for the fundamental par‐ ticles and forces which construct the world surrounding us and understands how our uni‐ verse works at its most fundamental level. Elementary particles of the Standard Model are gauge Bosons (force carriers) and Fermions which are classified into two groups: Leptons (i.e. Muons, Electrons, etc) and Quarks (Protons, Neutrons, etc).


Introduction
High Energy Physics (HEP) targeting on particle physics, searches for the fundamental particles and forces which construct the world surrounding us and understands how our universe works at its most fundamental level.Elementary particles of the Standard Model are gauge Bosons (force carriers) and Fermions which are classified into two groups: Leptons (i.e.Muons, Electrons, etc) and Quarks (Protons, Neutrons, etc).
The study of the interactions between those elementary particles requests enormously high energy collisions as in LHC [1][2][3][4][5][6][7][8], up to the highest energy hadrons collider in the world s =14 Tev.Experimental results provide excellent opportunities to discover the missing particles of the Standard Model.As well as, LHC possibly will yield the way in the direction of our awareness of particle physics beyond the Standard Model.
The proton-proton (p-p) interaction is one of the fundamental interactions in high-energy physics.In order to fully exploit the enormous physics potential, it is important to have a complete understanding of the reaction mechanism.The particle multiplicity distributions, as one of the first measurements made at LHC, used to test various particle production models.It is based on different physics mechanisms and also provide constrains on model features.Some of these models are based on string fragmentation mechanism [9][10][11] and some are based on Pomeron exchange [12].
Recently, different modeling methods, based on soft computing systems, include the application of Artificial Intelligence (AI) Techniques.Those Evolution Algorithms have a physical powerful existence in that field [13][14][15][16][17].The behavior of the p-p interactions is complicated due to the nonlinear relationship between the interaction parameters and the output.To understand the interactions of fundamental particles, multipart data analysis are needed and AI techniques are vital.Those techniques are becoming useful as alternate approaches to conventional ones [18].In this sense, AI techniques, such as Artificial Neural Network (ANN) [19], Genetic Algorithm (GA) [20], Genetic Programming (GP) [21 and Gene Expression Programming (GEP) [22], can be used as alternative tools for the simulation of these interactions [13][14][15][16][17][21][22][23].
The motivation of using a NN approach is its learning algorithm that learns the relationships between variables in sets of data and then builds models to explain these relationships (mathematically dependant).
In this chapter, we have discovered the functions that describe the multiplicity distribution of the charged shower particles of p-p interactions at different values of high energies using the GA-ANN technique.This chapter is organized on five sections.Section 2, gives a review to the basics of the NN & GA technique.Section 3 explains how NN & GA is used to model the p-p interaction.Finally, the results and conclusions are provided in sections 4 and 5 respectively.

An overview of Artificial Neural Networks (ANN)
An ANN is a network of artificial neurons which can store, gain and utilize knowledge.Some researchers in ANNs decided that the name ``neuron'' was inappropriate and used other terms, such as ``node''.However, the use of the term neuron is now so deeply established that its continued general use seems assured.A way to encompass the NNs studied in the literature is to regard them as dynamical systems controlled by synaptic matrixes (i.e.Parallel Distributed Processes (PDPs)) [24].
In the following sub-sections we introduce some of the concepts and the basic components of NNs:

Neuron-like Processing Units
A processing neuron based on neural functionality which equals to the summation of the products of the input patterns element {x 1 , x 2 ,..., x p } and its corresponding weights {w 1 , w 2 ,..., w p } plus the bias θ.Some important concepts associated with this simplified neuron are defined below.
A single-layer network is an area of neurons while a multilayer network consists of more than one area of neurons.
Let u i ℓ be the i th neuron in ℓ th layer.The input layer is called the x th layer and the output layer is called the O th layer.Let nℓ be the number of neurons in the ℓ th layer.The weight of the link between neuron u j ℓ in layer ℓ and neuron u i ℓ+1 in layer ℓ+1 is denoted by w ij ℓ .Let {x 1 , x 2 ,..., x p } be the set of input patterns that the network is supposed to learn its classification and let {d 1 , d 2 ,..., d p }be the corresponding desired output patterns.It should be noted that x p is an n dimension vector {x 1p , x 2p ,..., x np } and d p is an n dimension vector {d 1p ,d 2p ,...,d np }.The pair (x p , d p ) is called a training pattern.
The output of a neuron u i 0 is the input x ip (for input pattern p).For the other layers, the network input net pi ℓ+1 to a neuron u i ℓ+1 for the input x pi ℓ+1 is usually computed as follows: where O pj ℓ = x pi ℓ+1 is the output of the neuron u j ℓ of layer ℓ and θ i ℓ+1 is the neuron's bias value of neuron u i ℓ+1 of layer ℓ+1.For the sake of a homogeneous representation, θ i is often substituted by a ``bias neuron'' with a constant output 1.This means that biases can be treated like weights, which is done throughout the remainder of the text.

Activation Functions
The activation function converts the neuron input to its activation (i.e. a new state of activation) by f (net p ).This allows the variation of input conditions to affect the output, usually included as O p .
The sigmoid function, as a non-linear function, is also often used as an activation function.The logistic function is an example of a sigmoid function of the following form: where β determines the steepness of the activation function.In the rest of this chapter we assume that β=1.

Network Architectures
Network architectures have different types (single-layer feedforward, multi-layer feedforward, and recurrent networks) [25].In this chapter the Multi-layer Feedforward Networks are considered, these contain one or more hidden layers.Hidden layers are placed between input and output layers.Those hidden layers enable extraction of higher-order features.The input layer receives an external activation vector, and passes it via weighted connections to the neurons in the first hidden layer [25].An example of this arrangement, a three layer NN, is shown in Fig 1 .This is a common form of NN.

Neural Networks Learning
To use a NN, it is essential to have some form of training, through which the values of the weights in the network are adjusted to reflect the characteristics of the input data.When the network is trained sufficiently, it will obtain the most nearest correct output for a presented set of input data.
A set of well-defined rules for the solution of a learning problem is called a learning algorithm.No unique learning algorithm exists for the design of NN.Learning algorithms differ from each other in the way in which the adjustment of Δw ij to the synaptic weight w ij is formulated.In other words, the objective of the learning process is to tune the weights in the network so that the network performs the desired mapping of input to output activation.
NNs are claimed to have the feature of generalization, through which a trained NN is able to provide correct output data to a set of previously (unseen) input data.Training determines the generalization capability in the network structure.
Supervised learning is a class of learning rules for NNs.In which a teaching is provided by telling the network output required for a given input.Weights are adjusted in the learning system so as to minimize the difference between the desired and actual outputs for each input training data.An example of a supervised learning rule is the delta rule which aims to minimize the error function.This means that the actual response of each output neuron, in the network, approaches the desired response for that neuron.This is illustrated in Fig 2.
The error ε pi for the i th neuron u i o of the output layer o for the training pair (x p , t p ) is computed as: This error is used to adjust the weights in such a way that the error is gradually reduced.The training process stops when the error for every training pair is reduced to an acceptable level, or when no further improvement is obtained.A method, known as "learning by epoch", first sums gradient information for the whole pattern set and then updates the weights.This method is also known as "batch learning" and most researchers use it for its good performance [25].Each weight-update tries to minimize the summed error of the pattern set.The error function can be defined for one training pattern pair (x p , d p ) as: Then, the error function can be defined for all the patterns (Known as the Total Sum of Squared, (TSS) errors as: The most desirable condition that we could achieve in any learning algorithm training is ε pi ≥0.Obviously, if this condition holds for all patterns in the training set, we can say that the algorithm found a global minimum.
The weights in the network are changed along a search direction, to drive the weights in the direction of the estimated minimum.The weight updating rule for the batch mode is given by: ( ) ( ) where w ij s+1 is the update weight of w ij ℓ of layer ℓ in the s th learning step, and s is the step number in the learning process.
In training a network, the available input data set consists of many facts and is normally divided into two groups.One group of facts is used as the training data set and the second group is retained for checking and testing the accuracy of the performance of the network after training.The proposed ANN model was trained using Levenberg-Marquardt optimization technique [26].
Data collected from experiments are divided into two sets, namely, training set and testing set.The training set is used to train the ANN model by adjusting the link weights of network model, which should include the data covering the entire experimental space.This means that the training data set has to be fairly large to contain all the required information and must include a wide variety of data from different experimental conditions, including different formulation composition and process parameters.
Linearly, the training error keeps dropping.If the error stops decreasing, or alternatively starts to rise, the ANN model starts to over-fit the data, and at this point, the training must be stopped.In case over-fitting or over-learning occurs during the training process, it is usually advisable to decrease the number of hidden units and/or hidden layers.In contrast, if the network is not sufficiently powerful to model the underlying function, over-learning is not likely to occur, and the training errors will drop to a satisfactory level.

Introduction
Evolutionary Computation (EC) uses computational models of evolutionary processes based on concepts in biological theory.Varieties of these evolutionary computational models have been proposed and used in many applications, including optimization of NN parameters and searching for new NN learning rules.We will refer to them as Evolutionary Algorithms (EAs) [27][28][29] EAs are based on the evolution of a population which evolves according to rules of selection and other operators such as crossover and mutation.Each individual in the population is given a measure of its fitness in the environment.Selection favors individual with high fitness.These individuals are perturbed using the operators.This provides general heuristics for exploration in the environment.This cycle of evaluation, selection, crossover, mutation and survival continues until some termination criterion is met.Although, it is very simple from a biological point of view, these algorithms are sufficiently complex to provide strong and powerful adaptive search mechanisms.
Genetic Algorithms (GAs) were developed in the 70s by John Holland [30], who strongly stressed recombination as the energetic potential of evolution [32].The notion of using abstract syntax trees to represent programs in GAs, Genetic Programming (GP), was suggested in [33], first implemented in [34] and popularised in [35][36][37].The term Genetic Programming is used to refer to both tree-based GAs and the evolutionary generation of programs [38,39].Although similar at the highest level, each of the two varieties implements genetic operators in a different manner.This thesis concentrates on the tree-based variety.We will discuss GP further in Section 3.4.In the following two sections, whose descriptions are mainly based on [30,32,33,35,36,37], we give more background information about natural and artificial evolution in general, and on GAs in particular.

Natural and Artificial Evolution
As described by Darwin [40], evolution is the process by which a population of organisms gradually adapt over time to enhance their chances of surviving.This is achieved by ensuring that the stronger individuals in the population have a higher chance of reproducing and creating children (offspring).
In artificial evolution, the members of the population represent possible solutions to a particular optimization problem.The problem itself represents the environment.We must apply each potential solution to the problem and assign it a fitness value, indicating its performance on the problem.The two essential features of natural evolution which we need to maintain are propagation of more adaptive features to future generations (by applying a selective pressure which gives better solutions a greater opportunity to reproduce) and the heritability of features from parent to children (we need to ensure that the process of reproduction keeps most of the features of the parent solution and yet allows for variety so that new features can be explored) [30].

The Genetic Algorithm
GAs is powerful search and optimization techniques, based on the mechanics of natural selection [31].Some basic terms used are: • A phenotype is a possible solution to the problem; • A chromosome is an encoding representation of a phenotype in a form that can be used; • A population is the variety of chromosomes that evolves from generation to generation; • A generation (a population set) represents a single step toward the solution; • Fitness is the measure of the performance of an individual on the problem; • Evaluation is the interpretation of the genotype into the phenotype and the computation of its fitness; • Genes are the parts of data which make up a chromosome.
The advantage of GAs is that they have a consistent structure for different problems.Accordingly, one GA can be used for a variety of optimization problems.GAs are used for a number of different application areas [30].GA is capable of finding good solutions quickly [32].Also, the GA is inherently parallel, since a population of potential solutions is maintained.
To solve an optimization problem, a GA requires four components and a termination criterion for the search.The components are: a representation (encoding) of the problem, a fitness evaluation function, a population initialization procedure and a set of genetic operators.
In addition, there are a set of GA control parameters, predefined to guide the GA, such as the size of the population, the method by which genetic operators are chosen, the probabilities of each genetic operator being chosen, the choice of methods for implementing probability in selection, the probability of mutation of a gene in a selected individual, the method used to select a crossover point for the recombination operator and the seed value used for the random number generator.
The structure of a typical GA can be described as follows [41] In the algorithm, an initial population is generated in line 2.Then, the algorithm computes the fitness for each member of the initial population in line 3. Subsequently, a loop is entered based on whether or not the algorithm's termination criteria are met in line 4. Line 6 contains the control code for the inner loop in which a new generation is created.Lines 7 through 10 contain the part of the algorithm in which new individuals are generated.First, a genetic operator is selected.The particular numbers of parents for that operator are then selected.The operator is then applied to generate one or more new children.Finally, the new children are added to the new generation.The most significant differences in GAs are: • GAs search a population of points in parallel, not a single point • GAs do not require derivative information (unlike gradient descending methods, e.g.SBP) or other additional knowledge -only the objective function and corresponding fitness levels affect the directions of search • GAs use probabilistic transition rules, not deterministic ones • GA can provide a number of potential solutions to a given problem • GAs operate on fixed length representations.

The Proposed Hybrid GA -ANN Modeling
Genetic connectionism combines genetic search and connectionist computation.GAs have been applied successfully to the problem of designing NNs with supervised learning processes, for evolving the architecture suitable for the problem [42][43][44][45][46][47].However, these applications do not address the problem of training neural networks, since they still depend on other training methods to adjust the weights.

GAs for Training NNs
GAs have been used for training NNs either with fixed architectures or in combination with constructive/destructive methods.This can be made by replacing traditional learning algorithms such as gradient-based methods [48].Not only have GAs been used to perform weight training for supervised learning and for reinforcement learning applications, but they have also been used to select training data and to translate the output behavior of NNs [49][50][51].GAs have been applied to the problem of finding NN architectures [52][53][54][55][56][57], where an architecture specification indicates how many hidden units a network should have and how these units should be connected.
The process key in the evolutionary design of neural architectures is shown in Fig.
The topologies of the network have to be distinct before any training process.The definition of the architecture has great weight on the network performance, the effectiveness and efficiency of the learning process.As discussed in [58], the alternative provided by destructive and constructive techniques is not satisfactory.
The network architecture designing can be explained as a search in the architecture space that each point represents a different topology.The search space is huge, even with a limited number of neurons, and a controlled connectivity.Additionally, the search space makes things even more difficult in some cases.For instance when networks with different topologies may show similar learning and generalization abilities, alternatively, networks with similar structures may have different performances.In addition, the performance evaluation depends on the training method and on the initial conditions (weight initialization) [59].
Building the architectures by means of GAs is strongly reliant on how the features of the network are encoded in the genotype.Using a bitstring is not essentially the best approach to evolve the architecture.Therefore, a determination has to be made concerning how the information about the architecture should be encoded in the genotype.
To find good NN architectures using GAs, we should know how to encode architectures (neurons, layers, and connections) in the chromosomes that can be manipulated by the GA.Encoding of NNs onto a chromosome can take many different forms.

Modeling by Using ANN and GA
This study proposed a hybrid model combined of ANN and GA (We called it "GA-ANN hybrid model") for optimization of the weights of feed-forward neural networks to improve the effectiveness of the ANN model.Assuming that the structure of these networks has been decided.Genetic algorithm is run to have the optimal parameters of the architectures, weights and biases of all the neurons which are joined to create vectors.We construct a genetic algorithm, which can search for the global optimum of the number of hidden units and the connection structure between the inputs and the output layers.During the weight training and adjusting process, the fitness functions of a neural network can be defined by considering two important factors: the error is the different between target and actual outputs.
In this work, we defined the fitness function as the mean square error (SSE).The approach is to use the GA-ANN model that is enough intelligent to discover functions for p-p interactions (mean multiplicity distribution of charged particles with respects of the total center of mass energy).The model is trained/predicated by using experimental data to simulate the pp interaction.GA-ANN has the potential to discover a new model, to show that the data sets are subdivided into two sets (training and predication).GA-ANN discovers a new model by using the training set while the predicated set is used to examine their generalization capabilities.To measure the error between the experimental data and the simulated data we used the statistic measures.The total deviation of the response values from the fit to the response values.It is also called the summed square of residuals and is usually labeled as SSE.The statistical measures of sum squared error (SSE), where y ^i = b 0 + b 1 x i is the predicted value for x i and y i is the observed data value occurring at x i .The proposed GA-ANN hybrid model has been used to model the multiplicity distribution of the charged shower particles.The proposed model was trained using Levenberg-Marquardt optimization technique [26].The architecture of GA-ANN has three inputs and one output.The inputs are the charged particles multiplicity (n), the total center of mass energy ( s ), and the pseudo rapidity (η).The output is the charged particles multiplicity distribution (P n ). Figure 1 shows the schematic of GA-ANN model.

Results and discussion
The input patterns of the designed GA-ANN hybrid have been trained to produce target patterns that modeling the pseudo-rapidity distribution.The fast Levenberg-Marquardt algorithm (LMA) has been employed to train the ANN.In order to obtain the optimal structure of ANN, we have used GA as hybrid model.
Simulation results based on both ANN and GA-ANN hybrid model, to model the distribution of shower charged particle produced for P-P at different the total center of mass energy, s 0.9 TeV, 2.36 Tev and 7 TeV, are given in Figure 2-a, b, and c respectively.We notice that the curves obtained by the trained GA-ANN hybrid model show an exact fitting to the experimental data in the three cases.
Then, the GA-ANN Hybrid model is able to exactly model for the charge particle multiplicity distribution.The total sum of squared error SSE, the weights and biases which used for the designed network are provided in the Appendix A. In this model we have obtained the minimum error (=0.0001) by using GA.Table 1 shows a comparison between the ANN model and the GA-ANN model for the prediction of the pseudo-rapidity distribution.In the 3x15x15x1 ANN structure, we have used 285 connections and obtained an error equal to 0.0001, while the connection in GA-ANN model is 225.Therefore, we noticed in the ANN model that by increasing the number of connections to 285 the error decreases to 0.01, but this needs more calculations.By using GA optimization search, we have obtained the structure which minimizes the number of connections equals to 229 only and the error (= 0.0001).This indicates that the GA-ANN hybrid model is more efficient than the ANN model.

Conclusions
The chapter presents the GA-ANN as a new technique for constructing the functions of the multiplicity distribution of charged particles, P n (n, η, s ) of p-p interaction.The discovered models show good match to the experimental data.Moreover, they are capable of testing experimental data for P n (n, η, s ) that are not used in the training session.
Consequence, the testing values of P n (n, η, s ) in terms of the same parameters are in good agreement with the experimental data from Particle Data Group.Finally, we conclude that GA-ANN has become one of important research areas in the field of high Energy physics.

Appendices
The The optimized GA-ANN: The standard GA has been used.The parameters are given as follows: Generation = 1000, Population = 4000, probability of crossover = 0.9, probability of mutation = 0.001, Fitness function is SSE.A neural network had been optimized as 229 of neurons.

Figure 1 .
Figure 1. the three layers (input, hidden and output) of neurons are fully interconnected.

Lines 11 and 12
serve to close the outer loop of the algorithm.Fitness values are computed for each individual in the new generation.These values are used to guide simulated natural selection in the new generation.The termination criterion is tested and the algorithm is either repeated or terminated.

Figure 3 .
Figure 3. Overview of GA-ANN hybrid model.

Figure 4 .
Figure 4. ANN and GA-ANN simulation results for charge particle Multiplicity distribution of shower p-p.
Data collected from experiments are divided into two sets, namely, training set and testing set.The training set is used to train the GA-ANN hybrid model.The testing data set is used to confirm the accuracy of the proposed model.It ensures that the relationship between inputs and outputs, based on the training and test sets are real.The data set is divided into two groups 80% for training and 20% for testing.For work completeness, the final weights and biases after training are given in Appendix A.

Table 1 .
Comparison between the different training algorithms (ANN and GA-ANN) for the for charge particle Multiplicity distribution.