Improvements of Track Fitting with Well Tuned Probability Distributions for Silicon Strip Detectors

The construction of well tuned probability distributions is illustrated in a synthetic way; these probability distributions produce faithful realizations of the impact-point distributions for particles in silicon strip detectors. Their use in track fitting shows a drastic improvement: a factor of two, in the low-noise case, and a factor of three, in the high-noise case, with respect to the standard approach. The tracks are well reconstructed even in the presence of hits with large errors, with a surprising hit-discarding effect. The applications illustrated are simulations of the PAMELA tracker, but other types of trackers can be handled similarly. The probability distributions are calculated for the center-of-gravity algorithms, and they are very different from Gaussian probabilities. These differences are crucial to accurately reconstruct tracks containing high-error hits and to effectively discard excessively noisy hits (outliers). The similarity of our distributions to the Cauchy distribution forced us to abandon the standard deviation in our comparisons and to use instead the full width at half maximum. A set of mathematical approaches must be developed for these applications; some of them are standard in a broad sense, even if very complex. One, however, is essential and, in its absence, all the others are useless. Therefore, in this paper, we report the details of this critical approach. It extracts physical properties of the detectors and allows the insertion of the functional dependence on the impact point into the probability distributions. Other papers will be dedicated to the remaining parts.


Introduction
Arrays of micro-strips are fundamental parts of almost all recent high-energy physics experiments [1]; their excellent position resolution is essential for track recognition. Very sophisticated algorithms have been developed to reconstruct the tracks; they extract all the information released by particles crossing the sensitive area. The complexity of these systems is astounding, and special efforts are dedicated to the association of hits to tracks and to track selection in an environment with high noise and fake hits [2][3]. Track reconstruction is always performed with χ² minimization (often indicated as least squares) or with Kalman filters. Each of these methods assumes, implicitly or explicitly, identical Gaussian probability distribution functions (PDF) on a large set of sensor arrays, or invokes, as a weaker justification, the optimality of least squares among linear methods (the Gauss-Markov theorem). The advantages of these assumptions are evident: few details are needed, the equations to solve are linear and, in any case, acceptable solutions are obtained. The selection of Gaussian models and of the connected linear forms is an obliged step for such huge detectors.
Deviations from the pure Gaussian model are considered in refs. [4,5]. In those papers, the PDF are approximated as sums of Gaussians, one type for the cores and another for the tails of the distributions. The method of Gaussian sums can be tailored to preserve the main part of the Kalman filter and its linearity. The resulting increase in resolution opens the road to the exploration of more realistic forms, with the possibility of further fit improvements. This work introduces very specialized types of PDF for minimum ionizing particles (MIP). These PDF are finely tuned to contain many statistical properties of the MIP on silicon micro-strip sensors. Essential details of the hit-reconstruction algorithms and of the detector physics are inserted in the mathematical expressions used in track fitting, and our simulated fits show substantial improvements with respect to the least squares method. Hence, any move toward more realistic PDF yields an evident gain. Our PDF deviate strongly from Gaussian PDF or Gaussian sums; their unusual forms are imposed by the non-linearities of the most used hit-reconstruction algorithm: the center of gravity (COG), defined as (∑_i E_i l_i)/∑_i E_i (sometimes called "centroid"). The weighted average (another name for the COG) of the strip positions l_i, with weights depending on the strip signals E_i, gives a hit resolution much better than the strip size. The spreading of the particle signal over a few nearby strips is a key element of this improvement, and in some devices the spreading is enhanced with appropriate cross-talk. The importance of the hit reconstruction is evident: better positions give better fits.
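As a minimal sketch of the estimator (with hypothetical strip positions and signals, in units of the strip pitch):

```python
import numpy as np

def cog(strip_positions, strip_signals):
    """Center of gravity (weighted average) of the strip positions l_i,
    weighted by the strip signals E_i: (sum_i E_i*l_i) / (sum_i E_i)."""
    l = np.asarray(strip_positions, dtype=float)
    e = np.asarray(strip_signals, dtype=float)
    return float(np.sum(e * l) / np.sum(e))

# Hypothetical three-strip hit, positions in units of the strip pitch:
# the signal leaking into the neighbours pulls the estimate off the
# seed-strip center with sub-strip resolution.
print(cog([-1.0, 0.0, 1.0], [20.0, 100.0, 35.0]))
```

The asymmetry of the two lateral signals is what carries the sub-strip information.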
The gain produced by the signal spreading can be maximized with different COG algorithms using different numbers of strips. These algorithms can be tuned to any specific situation and are able to keep the noise to a minimum while maintaining the resolution. Each COG algorithm has its own set of systematic errors and a very different PDF; it is important to operate with similar algorithms on similar sets of events.
Our detector model is the PAMELA tracker [6] for MIP events. The PAMELA sensors are double-sided silicon strip detectors [7] of the type we used in the L3 micro-vertex detector [8]. Each side has strips with different properties. On one side, the readout electronics is connected to all the strips; this side has properties similar to conventional micro-strip detectors. On the other side, only every second strip is connected to the readout. The unconnected strip is left at a floating potential and spreads the charge released around it to the nearby readout strips. This side will be called the floating strip side. Thus, we have to study two sensor types with very different signal-to-noise ratios and charge distributions.
Given the complex development needed to complete the work, it will be split into various parts and published separately. Here, all the details that lead to our PDF will be neglected; in a broad sense these developments are standard [9], even if very complex. Section two describes a few properties of our PDF and their non-Gaussian form. In section three, a mathematical tool is developed that defines the average energy collected by a strip for any impact point. This is a key element that allows the association of a most probable set of impact points to each hit. Sections four and five are dedicated to the simulation of tracks on this two-sided silicon detector: the floating strip side with low noise and large charge spreading, and the normal strip side with high noise and small charge spreading.

Probability distributions
The PDF for the COG algorithms are fixed by the details of the COG calculation; the number of strips involved and the rules of strip selection are the most important. A detailed discussion of these aspects is reported in refs. [10,11,12], where special attention is devoted to the systematic errors introduced by the discretization of the charge distributions. In refs. [10,11] we handled the signals with the formalism widely used for the Shannon sampling theorem (or, with the present naming convention, the Whittaker-Kotel'nikov-Shannon sampling theorem [13]). In fact, a modern particle detector performs a sampling of the incident signals, and the natural methods to treat them are just those developed to reconstruct analog signals from their sampled forms. This well grounded formalism allows us to demonstrate properties that will be encountered in the following. One property, discussed in refs. [10,12], is the effect of strip suppression. The limitation on the number of strips is very beneficial for noise reduction, but these suppressions add typical systematic errors. The COG algorithms with an even number of strips have a set of forbidden values around the strip center; those with an odd number have forbidden values around the strip borders. For the n-strip COG, the amplitudes of the forbidden regions are proportional to the signal amplitudes escaping the n − 1 strips. Realistic PDF must accurately reproduce them.
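The forbidden region of the even-strip algorithms can be seen in a small numerical experiment; the Gaussian charge cloud and the noiseless unit-pitch strips below are illustrative assumptions, not the measured detector response:

```python
import numpy as np
from math import erf, sqrt

SIGMA = 0.4  # assumed width of the charge cloud, strip pitch = 1

def strip_signal(center, eps):
    """Fraction of a Gaussian charge cloud centered at eps collected by
    a unit-pitch strip centered at `center` (noiseless)."""
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((center + 0.5 - eps) / s)
                  - erf((center - 0.5 - eps) / s))

def cog2(eps):
    """Noiseless two-strip COG: seed strip plus the larger neighbour."""
    e_l, e_c, e_r = (strip_signal(c, eps) for c in (-1.0, 0.0, 1.0))
    if e_r >= e_l:
        return e_r / (e_c + e_r)       # neighbour on the right
    return -e_l / (e_c + e_l)          # neighbour on the left

outputs = np.array([cog2(e) for e in np.linspace(-0.49, 0.49, 197)])
# A band of COG values around the strip center is never produced:
print(np.min(np.abs(outputs)))  # well above zero
```

As the impact point crosses the strip center, the larger neighbour flips side and the noiseless output jumps over a band around zero, which is exactly the forbidden region discussed above.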

Probability distribution for two strip COG
In refs. [10,12] no attention was given to the signal fluctuations; now this aspect is central: the signals are (Gaussian) random variables, and the results of the COG algorithms operating on them are (non-Gaussian) random variables. The two strip COG will be our starting point; this algorithm has the lowest noise and is well suited around orthogonal incidence (our selected direction in the simulations). As we anticipated, the extraction of a usable form of the PDF for the two strip COG is a very complex development; we skip all these heavy details here and simply recall some key points that are essential in the following. We likewise skip the heavy details of the PDF for the three, four and five strip COG, even if the results for the three strip COG will be recalled.
By definition, the output of a COG algorithm is a ratio of combinations of independent random variables; hence the PDF of the COG values shares some similarity with the PDF of the ratio of two independent random variables. For example, such a PDF tends to infinity as a Cauchy distribution when Gaussian random variables are involved. Therefore, given our assumption (as usual) of Gaussian PDF for the strip signals, similar tails must be expected in our case as well. The two strip COG algorithm is more elaborate than the ratio of two random variables and, even if the Gaussian integrals in the PDF have no closed form, the characteristic heavy tails of the Cauchy distribution can be isolated from the very beginning. Similar results are encountered for the PDF of the COG with a higher number of strips: no closed form for the Gaussian integrals, and Cauchy-like tails.
For our heavy use, it is impossible to handle numerical integrations, and we are forced to find approximate analytical forms able to give excellent reproductions of the PDF. The most complete and accurate of these forms are very long, almost a printed page each, so we are obliged to publish them separately. We synthesize the properties of the PDF P_xg2(x) for the two strip COG (indicated by x_g2) in eq. 2.1. The function P_xg2(x), for the COG value x, depends on six constants: a_1, a_2, a_3 are the mean values of the Gaussian PDF, i.e. the noiseless charges collected by the strips; the other three constants σ_1, σ_2, σ_3 are the amplitudes of the Gaussian noise. The parameters of the central strip have index 2, and the indices 1 and 3 indicate the parameters of the right and left strips respectively. For x going to infinity, the x-dependence in the function F goes to one; hence P_xg2(x) is a slowly decreasing function similar to a Cauchy distribution, sharing with it an infinite variance. The divergence of the variance produces long tails in a finite set of data; these tails rule out the root mean square error as a useful parameter to compare the error distributions. For this reason, we will use the full width at half maximum (FWHM), which is well defined even for Cauchy-like distributions.
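The practical consequence of the Cauchy-like tails can be checked directly: for samples drawn from a distribution with infinite variance, the sample standard deviation is dominated by a few extreme outliers and does not stabilize, while the FWHM estimated from the histogram core does. A sketch with a standard Cauchy sample (built as a ratio of two Gaussians):

```python
import numpy as np

rng = np.random.default_rng(0)
# Standard Cauchy sample, built as the ratio of two independent Gaussians.
sample = rng.standard_normal(200_000) / rng.standard_normal(200_000)

# The sample standard deviation is dominated by a few huge outliers and
# does not converge (the variance of the distribution is infinite):
print(np.std(sample))

# The FWHM estimated from the histogram core is stable; the exact value
# for the standard Cauchy density 1/(pi*(1+x^2)) is 2:
counts, edges = np.histogram(sample, bins=400, range=(-10.0, 10.0))
centers = 0.5 * (edges[:-1] + edges[1:])
above = centers[counts >= counts.max() / 2.0]
fwhm = above.max() - above.min()
print(fwhm)
```

Rerunning with a different seed changes the standard deviation wildly while the FWHM stays near 2, which is the behavior that motivates the choice made in the text.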
The analytical approximations we derived for P_xg2(x) (and all the others with 3, 4, 5 strips) show very small differences with respect to the numerical integrations, around 10⁻⁵ or less. One of these expressions is illustrated by its comparison with hit simulations of the floating strip detector [12]. The incidence is orthogonal to the detector plane, and the MIP charge spreads over a few nearby strips. The non-Gaussian aspect of the COG probability is most evident for the impact point ε ≈ 0, indicated with a vertical black line in fig. 1. At this orthogonal incidence, the set of forbidden COG values around zero is reduced to a minimum. Even if the algorithm accumulates its largest position error in this region, it globally remains much better than the three strip COG, which is free of this gap. The addition of one strip drastically increases the noise and degrades the position reconstruction; for this reason the two strip COG is preferred. The continuous red line of fig. 1 is the result of the position algorithm called η₂ in ref. [12], a generalization of the η-algorithm of ref. [15]. The η₂-algorithm is derived from the COG distributions; it appreciably suppresses the COG systematic error discussed in ref. [10] and places the red line on the maximum of the {ε, x_g2} density distribution. On the contrary, a pure COG algorithm takes its value as an estimate of ε; its relation with ε is thus a straight line from {−0.5, −0.5} to {0.5, 0.5}, appreciably different from the point distribution and from the red line. The COG straight line is not reported in fig. 1 because we will never use the simple COG for position reconstruction.
For the x_g2 distribution, fig. 1 shows two branches around ε ≈ 0; these branches are produced by the noise, which increases the signal of the left strip for ε > 0 or the signal of the right strip for ε < 0.
A sample of P_xg2(x) is plotted in fig. 2; the data set is generated with the constants a_1, a_2, a_3, the noiseless strip signals for the selected ε-value, and identical standard deviations σ_1, σ_2, σ_3 of their Gaussian noise (4 ADC counts, as reported in ref. [14]). The line of P_xg2(x) completely overlaps the normalized x_g2-histogram of the data sample, confirming the quality of these analytical approximations and of our PDF. This plot reproduces the PDF precisely in the critical region of the two strip COG, where a forbidden region of x_g2-values is present in the noiseless algorithm. The noise dresses the two noiseless spikes, giving them the form of two shifted Gaussian-like curves, but the tails are Cauchy-like as in eq. 2.1. The most difficult testing point for our analytical P_xg2(x) is around ε = 0 for a MIP direction inclined by 20° with respect to orthogonal incidence: here the forbidden gap is very large (≈ 0.8τ), with two narrow bumps, but even in this case P_xg2(x) gives a perfect match with the data histogram. At these angles, the three strip COG is the correct selection, but our analytical form must work even here.
The simulated data are produced with uniform ionization along the particle path; the charge diffuses in a standard way up to its collection by the readout electrodes. This model neglects the non-uniform ionization along the particle path, but its insertion does not modify fig. 2. The fluctuation of the charge release appears to be well contained in the Landau distribution of the total collected charge.

The introduction of the impact point
The PDF of eq. 2.1 are defined for a fixed ε, as shown in fig. 2, and give the corresponding x_g2 distributions. This is the maximum that probability theory can give but, in spite of the work invested, these forms are useless for track reconstruction; any further improvement must go deeper into the sensor physics. In fact, useful PDF must associate sets of ε-values to the measured COG of each hit, the COG being the information given by the detector. The standard Gaussian model attributes an identical ε-distribution to each hit. This assumption looks very improbable just by observing fig. 1 at small x-values. If we cut fig. 1 with lines of constant COG, the thickness of the blue point distribution is very different for each selected value. The thickness becomes negligible around zero and increases drastically above and below zero; this region has the strongest deviations from a constant Gaussian PDF.
To complete our analytic approach to a well tuned PDF, we need the probability of ε at fixed x, along lines orthogonal to the black line of fig. 1. Therefore, eq. 2.1 must be extended to contain the functional dependence on ε. No ε-dependence can be supposed for the strip noises σ_1, σ_2, σ_3: they depend on the quality of the strip (noisy strips, average strips, etc.) and are obtained from the pedestal runs. On the contrary, the energies a_1, a_2, a_3 surely depend on the impact point ε, and a method must be defined to extract this functional dependence from the data.
Along the particle path, the ionization fluctuates, projecting its fluctuations onto the collecting strips; hence, well defined functions a_1(ε), a_2(ε), a_3(ε) do not exist. They must be defined as averages over the signal fluctuations. One procedure could be the use of the average signal distribution defined in refs. [10,12] with dx_gn(ε)/dε. But the functions a_1(ε), a_2(ε), a_3(ε) require the convolution of the signal distribution with an interval function as large as its period (a strip), and this convolution is always a constant function. A different path, avoiding this trivial result, must be found.

Brief synthesis of the η-algorithm
In the following, we will make ample use of our generalizations of the η-algorithm, especially in the extraction of the {a_k(ε)}-functions; hence a rapid synthesis of these extensions is in order. A key point for the effectiveness of the original η-algorithm is the compensation of the non-uniformity of the two strip COG histograms. To be precise, the original algorithm dealt with a special combination of two strip signals called the η-function, but the η-function can be transformed into a two strip COG; this justifies our constant reference to the COG.
It is easy to observe that a large set of MIPs, crossing a detector almost uniformly, produces non-uniform COG histograms; this means that the COG algorithm has privileged outputs (systematic errors). A good position reconstruction algorithm must produce uniform histograms, and this is just the result of the η-algorithm. The derivation of ref. [15] limited the algorithm to two strips and to symmetric signal distributions; this very improbable condition renders its use problematic for generic geometries. In ref. [12] we demonstrated how to extend the η-algorithm to any asymmetric signal distribution and any number of strips; for this we introduce the notation η_j-algorithms (j is the number of strips involved). In its simplest form, our extended η_j-algorithm is defined as follows: τ is the strip length, and Γ_p^j(x'_gj) is the periodic PDF for the j-strip COG x'_gj; it is given by a uniform illumination over the detector plane and normalized to one on a strip (essentially a normalized histogram divided by the bin size and repeated with the strip period). Therefore, the integral term is periodic and can be expressed as a Fourier series. The constant C_0 is 0 for symmetric signal distributions. The general expression for C_0 is reported in refs. [12,16], with other details about η_j(x_gj) (indicated there as ε_j(x_gj) and Γ_p(x_gj)). These extensions of the η_j-algorithms were successfully verified in a dedicated test beam experiment [14] with the use of a special set-up. The test beam evidenced yet another subtle systematic error, whose correction was discussed in ref. [16]. All these corrections are crucial for our PDF: the final result must place the maximum of the probability just along the continuous red line of fig. 1, where the largest COG population is evident. Any systematic error of the η₂-algorithm will introduce a shift from this optimal position and a corresponding error in the fitted tracks, with any type of fitting algorithm.
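The uniformizing step can be sketched in a few lines for the symmetric case (C_0 = 0): the corrected position is the cumulative distribution of the COG over one strip, rescaled to the strip length, which by construction makes the histogram of the outputs uniform. The COG sample below is a toy with an artificial non-uniformity, not a detector simulation:

```python
import numpy as np

def eta_correction(cog_values, nbins=200):
    """Build an eta-style estimator from a sample of COG values (strip
    pitch = 1, COG folded into [-0.5, 0.5)).  Symmetric case, where the
    additive constant C_0 is 0: the corrected position is the cumulative
    COG distribution rescaled to one strip, which makes the histogram of
    the outputs uniform by construction."""
    counts, edges = np.histogram(cog_values, bins=nbins, range=(-0.5, 0.5))
    cdf = np.concatenate(([0.0], np.cumsum(counts))) / counts.sum()
    return lambda x: np.interp(x, edges, cdf) - 0.5

# Toy COG sample with a strong systematic non-uniformity:
rng = np.random.default_rng(1)
cogs = 0.5 * np.sin(np.pi * rng.uniform(-0.5, 0.5, 100_000))
eta = eta_correction(cogs)
corrected = eta(cogs)
# After the correction, the histogram is (nearly) uniform on [-0.5, 0.5):
h, _ = np.histogram(corrected, bins=10, range=(-0.5, 0.5))
print(h / len(corrected))  # each bin close to 0.1
```

The asymmetric multi-strip generalization of the text adds the C_0 term and uses the j-strip COG density, but the cumulative-histogram mechanics are the same.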
The consistency of all our procedures will be verified in the following.

The definition of the strip energies
For the extraction of the functions a_1(ε), a_2(ε), a_3(ε), the appropriate formalism is that of ref. [10], with extensive use of the sampling theorems [13]. However, given the central role of these functions as our key to the sensor physics, we will follow a simpler approach where only the properties of the Fourier transform and of the Fourier series are involved.
Let us define our notation. The signal collected by a strip is obtained from the convolution of the strip response function g(x) with the average charge distribution φ(x − ε). The function φ(x − ε) is defined to have its COG coinciding with the impact point ε of the MIP, and it is normalized to one. This choice fixes ε to be the COG of the average primary ionization segment [12]. The average strip signals are given by the convolution:

f(x − ε) = ∫ g(x − x′) φ(x′ − ε) dx′ .    (3.1)

For any n ∈ Z, with τ the strip length, the value of the function f(x − ε) at x = nτ gives the signal collected by the strip with its center in nτ (sampling at nτ). The origin of the reference system is the center of the strip with the maximum energy (index 2); the other two strip centers are to the right and to the left, at positions +τ and −τ (indices 1 and 3). The short range of φ(x) and g(x) leaves very few non-zero f(nτ − ε); hence all our infinite sums will always converge. In the following, the strip length τ is always taken equal to one; its indication is kept to assure the right dimensions of the equations. The functions a_1(ε), a_2(ε), a_3(ε) are expressed as:

a_1(ε) = f(τ − ε) ,  a_2(ε) = f(−ε) ,  a_3(ε) = f(−τ − ε) .    (3.2)

With the shifts of eq. 3.2, the function f(−ε) produces all the functions a_j(ε).
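For a concrete, purely illustrative choice of g(x) as a unit top-hat and φ(x) as a Gaussian (not the measured PAMELA shapes), the convolution has a closed form in terms of error functions, and the shifts of eq. 3.2 give the three strip fractions:

```python
from math import erf, sqrt

TAU = 1.0     # strip pitch
SIGMA = 0.3   # assumed width of the average charge distribution phi

def f(x):
    """f(x) = (g * phi)(x) for a unit top-hat strip response g and a
    Gaussian phi: closed form of the convolution under these
    illustrative assumptions."""
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((x + 0.5 * TAU) / s) - erf((x - 0.5 * TAU) / s))

def a(j, eps):
    """Strip fractions: index 2 is the central strip, 1 and 3 the right
    and left neighbours at +tau and -tau (shifts of eq. 3.2)."""
    return f({1: TAU, 2: 0.0, 3: -TAU}[j] - eps)

eps = 0.2  # impact point, in units of the strip pitch
fractions = [a(j, eps) for j in (1, 2, 3)]
print(fractions)  # central strip dominates; right neighbour > left
```

For an impact point shifted toward the right neighbour, a_1(ε) exceeds a_3(ε) while a_2(ε) remains the largest, as expected for |ε| < 0.5.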

We will prove the following: if a generic signal distribution is convolved with an interval function whose size is the strip length or any of its multiples, the sum of the signals collected by all the strips is independent of the impact point, for any type of strip loss.
Let us consider all the strips at intervals T = Nτ (an N-fold multiple of τ). These strips are the only ones utilized; all the others are neglected. This selection generates an effective drastic loss that allows the exploration of the tails of the function f(x). The integration of the function f(kT − ε) (for k ∈ Z) with an interval function Π(x/T) of size T (Π(x) = 1 for |x| ≤ 1/2 and Π(x) = 0 for |x| > 1/2) will be our starting point to reconstruct f(x) (eq. 3.3). For future manipulations, it is better to rearrange eq. 3.3 as a convolution (eq. 3.4). The function ∑_{k∈Z} h(kT − ε_f) is obtained by shifting copies of the function h(−x) to all the intervals kT; this produces a periodic function of ε_f with period T. As we said, the short range of h(−x) assures the convergence of the infinite sum. Let us show that this sum is constant for any ε_f, as stated previously. Due to the periodicity, ∑_{k∈Z} h(kT − ε_f) can be expressed as a Fourier series, and the Poisson identity [17,18] gives the form of eq. 3.5, where H(−2πL/T) is the Fourier transform of h(x) at the points −2πL/T. By the convolution theorem, the Fourier transform of h(x) is the product of the Fourier transforms of f(x) and Π(x/T), respectively F(ω) and 2 sin(ωT/2)/ω. Due to eq. 3.1, F(ω) is the product G(ω)Φ(ω) of the Fourier transforms of g(x) and φ(x). The term 2 sin(ωT/2)/ω is zero for all ω = 2πL/T with L ≠ 0 and equal to T for L = 0. Therefore, all the terms depending on ε_f are zero, and ∑_{k∈Z} h(kT − ε_f) is constant, as stated. Due to our definition Φ(0) = 1, eq. 3.5 reduces to eq. 3.6. This result can also be obtained along a different path: with another change of variable in eq. 3.4, the sum ∑_{k∈Z} h(kT − ε_f) reduces to an integral of eq. 3.1 from −∞ to +∞, and the convolution theorem gives eq. 3.6. The COG built with the h(kT − ε_f) is given in eq. 3.7. Isolating the periodic part and inserting eq. 3.6, it becomes eq. 3.8, which can be converted into eq. 3.9. Again, the Poisson identity gives a generalized form of eq. 3.5, reported in eq. 3.10.
As defined in eq. 3.5, the function H(ω − 2πL/T) is the product of all the convolved functions, and its derivative is a sum of terms, each containing the derivative of a single factor. For ω → 0 and L ≠ 0, all the terms without the derivative of 2 sin(ωT/2)/ω (the Fourier transform of Π(x/T)) are zero. For L = 0 and ω → 0, the convolution theorem for the first momenta defines dH(ω)/dω as a sum of terms proportional to the first momenta of the convolved functions. The interval function Π(x/T) is symmetric, so its first momentum is zero. The first momentum of φ(x) is zero by our definition.
If g(x) is symmetric, its first momentum is zero as well; otherwise it gives the only non-zero term. For asymmetric g(x), the shift δ_g of the strip COG with respect to the strip center must be added. In this general case, x_g(ε_f) − ε_f takes the form of eq. 3.11. With the addition of the missing term F(0)/G(0), the sum on the right-hand side of eq. 3.11 becomes a Fourier series. But F(0)/G(0) must be 1. In fact, applying back the Poisson identity (defined in eq. 3.5 with our notations), it can be written in the form of eq. 3.12. The factor T, absent in eq. 3.11, must be inserted. Equation 3.12 underlines the possible presence of tail interference (called "aliasing" in signal theory) if the range of f(x) is larger than T. The aliasing can be limited by selecting a period T large enough to reproduce these tails correctly. In the absence of aliasing, a section of amplitude T and n = 0 of eq. 3.12 is given by eq. 3.13, which easily produces the a_j(ε) of eq. 3.2 with the appropriate shifts. Due to eq. 3.11, the normalization of dx_g(ε_f)/dε_f on a period T is just T; thus the remaining factor is normalized to one in any case. The normalization of g(x) is τ (one in our conventions) in the absence of loss, and it decreases with the loss. The fixed normalization of dx_g(ε_f)/dε_f does not allow the extraction of the average strip efficiency G(0) in the presence of a loss.
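The constancy proved above can be checked numerically. The sketch below assumes an illustrative f(x) (Gaussian charge cloud through a unit top-hat strip), convolves it with Π(x/T) by a trapezoidal rule, and verifies that the sum over the kept strips does not depend on the impact point, reflecting the vanishing of all the L ≠ 0 terms of the Poisson identity:

```python
import numpy as np
from math import erf, sqrt

T = 1.0      # sampling period (N = 1: every strip kept)
SIGMA = 0.3  # assumed charge-cloud width

def f(x):
    """Illustrative average strip signal: Gaussian cloud, unit strip."""
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((x + 0.5) / s) - erf((x - 0.5) / s))

def h(x, npts=801):
    """h = f convolved with the interval function Pi(x/T), evaluated
    with a trapezoidal rule."""
    u = np.linspace(-0.5 * T, 0.5 * T, npts)
    vals = np.array([f(x - ui) for ui in u])
    w = np.ones(npts)
    w[0] = w[-1] = 0.5
    return float((vals * w).sum() * (u[1] - u[0]))

def strip_sum(eps_f, kmax=4):
    """sum_k h(kT - eps_f): convolved signal summed over kept strips."""
    return sum(h(k * T - eps_f) for k in range(-kmax, kmax + 1))

sums = [strip_sum(e) for e in np.linspace(-0.5, 0.5, 11)]
print(max(sums) - min(sums))  # ~0: no dependence on the impact point
```

With these normalizations (no strip loss, T = τ = 1) the constant value of the sum is one, the normalization of f itself.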

The sliding window and its simulation
The form of eq. 3.3 suggests an experimental procedure to extract the functions {a_j(ε)}. A rectangular window, as wide as the period T and with one side parallel to the strips, is required. The window slides over the detector and, at each step, collects a fixed amount of signal from the events contained in its interior. With very small steps and a large number of collected events, this data set approximates a convolution with the function Π(x). The use of eq. 3.13 on the COG calculated with eq. 3.7 gives the unknowns.
In the absence of this special setup, the data collected in a standard test beam can be a valid substitute, but the result may contain some artifacts. The approximation of eq. 3.3 given in eq. 3.14 can substitute the sliding window. The terms f(kT − ε_i) are the charges collected by the selected strips at distance kT; their values fluctuate due to the mechanism of charge release. In the simulation, careful averages of eq. 3.14 and the use of the impact points are able to reproduce a_i(ε) identical to the starting ones. In test beam data or in a running experiment, the impact points are not available and the use of estimated values is mandatory. The COG values must be excluded, due to a linear correlation with the strip signals; in that case the {a_i(ε)}-functions turn out to be almost identical triangular functions. The outputs of the η₂(x_g2) and η₃(x_g3) algorithms give good results in the simulations. Their error distributions introduce small artifacts and distortions in the function f(x), but almost all of them can be corrected with a careful search for their origins. The final result shows very smooth shapes, unexpected for a Monte Carlo integration. After acquiring confidence with the method in the simulations, we apply it to the test beam data of ref. [14]. The test beam data are spread over a large number of different strips, but it is easy to gather the hits so as to have an identical maximum-signal strip; this collection simulates a uniform illumination of that strip. We need a uniform distribution over a larger strip set; fifteen strips are used in our reconstruction. An integer random number is selected for each hit, and all the elements of the hit are shifted by the corresponding number of strips. These translations simulate a uniform hit population on the selected strips.
The sliding window (five strips wide in this test) operates on this distribution, isolating the data whose estimated (η₂, η₃)-positions are contained in the window. This shuffling is done many times, and the results of the sliding window operations are averaged. The quality of the resulting functions a_i(ε) can be tested by a comparison with the histograms of x_g2 and x_g3, as illustrated in the following.
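The essence of the sliding-window average can be imitated numerically: averaging the signal of one strip over the hits whose impact point falls inside a window of width T approximates the convolution of the average strip signal f with Π. In this toy version, f is an illustrative Gaussian-cloud model and the true impact points stand in for the η-estimated ones:

```python
import numpy as np
from math import erf, sqrt

SIGMA, NOISE = 0.3, 0.05   # assumed cloud width and strip noise
rng = np.random.default_rng(2)

def f(x):
    """Illustrative noiseless strip signal (Gaussian cloud, unit strip)."""
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((x + 0.5) / s) - erf((x - 0.5) / s))

# Toy "test beam": impact points uniform over six strips; the strip
# centered at 0 collects f(-eps) plus Gaussian noise.
eps = rng.uniform(-3.0, 3.0, 300_000)
signal = np.array([f(-e) for e in eps]) + rng.normal(0.0, NOISE, eps.size)

def window_average(center, width=1.0):
    """Average strip signal over the hits inside the sliding window:
    approximates (1/T) * (f conv Pi) at the window position."""
    inside = np.abs(eps - center) <= width / 2
    return float(signal[inside].mean())

# Compare with the direct convolution integral at the same point:
u = np.linspace(-0.5, 0.5, 2001)
exact = float(np.mean([f(-(0.2 + ui)) for ui in u]))
print(abs(window_average(0.2) - exact))  # small
```

The averaging suppresses both the strip noise and the charge-release fluctuations, which is why the extracted shapes come out so smooth in the text.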

Extended probability distributions
The functions a_1(ε), a_2(ε), a_3(ε) introduce the impact point ε in our (page-long) PDF, giving eq. 3.15. The a_i(ε) are the fractions of the total signal collected by each strip, and the total (noiseless) signal E_t must be introduced explicitly in eq. 3.15. For simplicity, all the σ_j are taken identical and dropped from the notation, but the possibility of considering noisy strips is evident. In the application, each hit has the normalizing constant E_t given by the sum of the signals collected by the central strip and the two lateral ones. This total signal contains noise, but nothing better is available. Now the function P_xg2(x, E_t, ε), for each E_t, is a surface in the {x, ε}-plane. The measured COG of the hit, the x-value, cuts on this surface the ε-distribution that will be used in the maximum likelihood search. The normalization of P_xg2(x, E_t, ε) in x, for any a_j and E_t, imposes that the marginal probability P_xg2(ε) always be equal to one. This produces an effective uniform "illumination", consistent with our assumptions. The other marginal probability, H_xg2(E_t, x), is connected to the histogram of x_g2. In fact, the probability of x_g2 = x for a total event signal E_t is given by the integration of P_xg2(x, E_t, ε) over ε on a reasonable range of ε-values. The range of integration is not critical: it must cover all the ε-values that give an appreciable contribution to the integral and remain in the region where our a_j(ε) are well defined. We standardize this range to two strip lengths. The resulting H_xg2(E_t, x) must be calculated for x in the interval −0.5 < x ≤ 0.5 and averaged over the probability of E_t. This can be compared with the histogram of the data to test the quality of the functions {a_j(ε)}. The histograms are very sensitive to these functions, and further improvements could be introduced. As we said above, a lot of work has been dedicated to finding analytical approximations for the COG PDF.
These approximate analytical expressions, even if very long, are essential to speed up this comparison. Otherwise, each point of our calculated histogram would require multidimensional numerical integrations.
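The use of the extended PDF in a maximum-likelihood search can be sketched with a deliberately simplified stand-in: the a_j(ε) come from an illustrative Gaussian-cloud model (not the measured PAMELA functions), and the page-long P_xg2(x, E_t, ε) is replaced by independent Gaussian likelihoods of the three strip signals around E_t·a_j(ε). Only the mechanics (scan ε for a measured hit, maximize the likelihood) mirror the text:

```python
import numpy as np
from math import erf, sqrt

SIGMA, NOISE, E_T = 0.3, 4.0, 142.0   # cloud width; noise and signal in ADC

def f(x):
    """Illustrative average strip signal (Gaussian cloud, unit strip)."""
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((x + 0.5) / s) - erf((x - 0.5) / s))

def a(j, eps):
    """Strip fractions: index 2 central, 1 right (+1), 3 left (-1)."""
    return f({1: 1.0, 2: 0.0, 3: -1.0}[j] - eps)

def nll(eps, signals):
    """Stand-in negative log-likelihood: independent Gaussian noise on
    each strip around the mean E_t * a_j(eps).  The paper's non-Gaussian
    P_xg2(x, E_t, eps) would replace this product of Gaussians."""
    return sum((s - E_T * a(j, eps)) ** 2 / (2.0 * NOISE ** 2)
               for j, s in zip((1, 2, 3), signals))

# One simulated hit at true_eps, with Gaussian strip noise:
rng = np.random.default_rng(3)
true_eps = 0.15
signals = [E_T * a(j, true_eps) + rng.normal(0.0, NOISE) for j in (1, 2, 3)]

# Scan eps over one strip and keep the most likely value:
grid = np.linspace(-0.5, 0.5, 1001)
values = [nll(e, signals) for e in grid]
eps_hat = float(grid[int(np.argmin(values))])
print(eps_hat)  # close to 0.15
```

In a track fit, one such ε-likelihood per hit would enter the global maximization; the Cauchy-like tails of the real PDF are what de-weight outlier hits, which this Gaussian stand-in cannot reproduce.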
To evaluate the results of our PDF, simulations are produced as close to the data as possible. For this, the {a_j(ε)} are extracted from the data of a test beam [14] with sensors identical to those of the PAMELA tracker. The detector was formed by five layers of double-sided sensors without magnetic field. The slight differences among sensors are neglected, and all the data of similar sides are used together to increase the data sample.

Floating strip side
This side has the lowest noise (4 ADC counts) and the maximum of the released charge around 142 ADC counts. The functional relations of a_j(ε) and σ_j in eq. 3.15 are scale invariant, and we are free to measure them all in ADC counts, with x a pure number in units of strip length. The function f(−ε) for this side is illustrated in fig. 3. Its aspect is similar to that obtained in ref. [19] with a laser test; surprisingly similar is the lateral crosstalk around ±1.5. The dips around ±1 are small reconstruction artifacts. The shoulders around ±0.5 are typical of the floating strips [12]: they spread the charge to nearby strips, producing the two lateral peaks of the COG histograms.
Cutting fig. 3 at ±0.5 and ±1.5 and shifting the selected parts, the functions {a_j(ε), j = 1, ..., 5} are obtained. The first three a_j are shown in the left side of fig. 4; the other two are used in the x_g3 PDF, where five a_j are required. The accurate elimination of all the systematic effects discussed in refs. [12,16] renders a_2(ε) the strip with the maximum energy in the range −0.5 ≤ ε ≤ +0.5. Improper corrections give regions where a_2(ε) is not the maximum.
The functions {a_j(ε), j = 1, 2, 3} allow the extraction of the average charge released by a MIP. The numerical derivative of x_g3(ε), built with the functions {a_j(ε)}, gives a hint of this charge release. The two bumps in the right side of fig. 4 are typical of the floating strip sensors, which double the MIP shower. The smoothness and the absence of the artifacts discussed in ref. [12] are noticeable.
The histograms of x g2 and x g3 can be used to test the quality of the functions {a j (ε)}. The marginal probabilities H x g2 (x, E t ) and H x g3 (x, E t ), averaged over the released charge, should coincide with the histograms. The average over the released charge turns out to be of minor relevance, and to speed up the comparisons we use a fixed E t of 142 ADC counts, corresponding to its most probable value. The histogram of the x g2 data is very similar to H x g2 (x, E t ) (E t = 142 ADC), thus for the two-strip case we can assume a good quality of the functions {a j (ε)}. For the three-strip case a discrepancy is present around x g3 ≈ 0. The functions {a j (ε)} are probably acceptable even in this case, given the strong sensitivity of the histograms to these functions. Recall that the histograms are proportional to dε/dx g .

The simulated data
The three functions {a j (ε)} allow the production of simulated data very similar to the real data; the following steps synthesize the procedure.  All the generated values are separately saved for future use. In spite of the noise of step #3, the distributions of the simulated data are practically identical to the experimental ones. The second generation of the {a j }-functions turns out to be almost identical to the first one, but, although the differences are negligible, the calculated histograms have disagreements similar to those of fig. 5. This type of simulation does not allow the introduction of a non-uniform charge release along the particle path. Other simulations, produced with this feature, give results indistinguishable from those without it. The main reason of this insensitivity is probably the orthogonal incidence: the diffusion can spread the charge on the strips in a way very similar to a uniform release. The total charge accounts for the main part of the fluctuations; the remaining part adds in quadrature to the noise but, being probably small, remains invisible. In any case, increases of the σ i could absorb these fluctuations.
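The core of such a simulation can be sketched as below. The gaussian a_j shapes, the fixed released charge and the two-strip COG convention are assumptions made for the sake of a self-contained example; in the paper the fractions come from the test beam data and the charge fluctuates event by event.

```python
import numpy as np

rng = np.random.default_rng(0)

def a_j(eps, offset, width=0.3):
    """Toy signal fraction of the strip at integer `offset` (illustrative
    shape; the real {a_j} are extracted from the data)."""
    return np.exp(-(eps - offset)**2 / width)

def simulate_hits(n, e_t=142.0, noise=4.0):
    """Sketch of the simulation: uniform impact point, strip signals from
    the a_j fractions scaled by the released charge, additive gaussian
    strip noise. The noise draws are saved separately for reuse."""
    eps = rng.uniform(-0.5, 0.5, n)                     # impact points
    offsets = np.array([-1.0, 0.0, 1.0])                # three strips
    clean = e_t * a_j(eps[:, None], offsets[None, :])   # noiseless signals
    noise_draws = rng.normal(0.0, noise, clean.shape)   # saved for reuse
    signals = clean + noise_draws
    # two-strip COG: central strip plus its larger neighbour
    left, centre, right = signals.T
    x_g2 = np.where(right >= left,
                    right / (centre + right),
                    -left / (centre + left))
    return eps, signals, noise_draws, x_g2

eps, signals, noise_draws, x_g2 = simulate_hits(10000)
```

Saving `noise_draws` separately is what later allows the same tracks to be replayed at different noise amplitudes.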

Track reconstruction
All the simulated tracks are identical: straight lines, incident on the origin and directed orthogonally to the detector plane. A track is defined by five hits, hence the fit has three degrees of freedom, as for the PAMELA tracks in the magnetic field (but the magnetic field was absent in this test beam).
The simulated hits are divided in groups of five to produce the tracks. For the least squares, the exact impact position ε of each hit is subtracted from its reconstructed η 2 (x g2 ) position; in this way each group of five hits defines a track with our geometry and the error distribution of η 2 (x g2 ). We prefer the use of the η 2 (x g2 ) positions because this choice gives parameter distributions better than those obtained with the simple x g2 -COG positions (a frequent choice). The slight improvement is due to the reduction of the systematic errors present in the COG. In the {ξ , z}-plane the tracks have equation:

ξ = γ z + β .   (4.1)

By definition, all the tracks have γ = 0, β = 0, but, due to the noise, their fitted values are distributed around zero. For our PDF, no linear reduction is possible and we have to handle the non-linearity of the likelihood maximization. Therefore, the parameters { γ n , β n } of the track n are obtained minimizing L(γ n , β n ), defined as the negative logarithm of the likelihood with the PDF of eq. 3.15:

L(γ n , β n ) = − ∑ j ln P x g2 (x( j), E t ( j), γ n z j + β n + ε( j)) ,   (4.2)

with the sum running over the five hits j = 5n+1, . . . , 5n+5 of the track n; x( j), E t ( j) and z j are respectively: the x g2 -value of the two-strip COG, the total signal of the hit j, and the position of the j − 5n detector plane. The functional dependence γ n z j + β n + ε( j) is inserted in the ε-dependence of {a k (ε), k = 1, 2, 3}; ε( j) must be added to recover the shift that puts the impact points on tracks with {γ = 0, β = 0}. The minimizing algorithm is the standard MATLAB [21] fminsearch function. As a starting point of the minimum search, we could use the parameters given by the least squares results. But their precision is modest, and it is better to have a nearer starting point. For this we approximate the probability distributions at fixed x g2 with gaussians. For each hit, we calculate an effective variance ({σ e f f ( j) 2 }) and use it as the width of a supposed gaussian error.
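A minimal sketch of this minimization follows, with scipy's Nelder-Mead as the analogue of MATLAB's fminsearch. The Cauchy-like per-hit PDF, the plane positions and the width parameters are toy assumptions standing in for eq. 3.15 and the real detector geometry.

```python
import numpy as np
from scipy.optimize import minimize

z = np.array([0.0, 4.4, 8.9, 13.3, 17.8])           # plane positions (cm), toy values
x_obs = np.array([0.02, -0.01, 0.00, 0.03, -0.02])  # hit positions (pitch units), toy
w = np.array([0.03, 0.01, 0.02, 0.05, 0.03])        # per-hit width parameters, toy

def neg_log_like(params):
    """Toy stand-in for eq. 4.2: a Cauchy-like per-hit PDF whose
    location follows the track model gamma*z + beta."""
    gamma, beta = params
    resid = x_obs - (gamma * z + beta)
    return np.sum(np.log(1.0 + (resid / w)**2) + np.log(np.pi * w))

# Nelder-Mead is the scipy analogue of MATLAB's fminsearch
res = minimize(neg_log_like, x0=[0.0, 0.0], method="Nelder-Mead")
gamma_fit, beta_fit = res.x
```

In the paper the starting point x0 comes from the effective-variance fit rather than from the origin.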
The range of the integrals of P x g2 (x, E t , ε)ε 2 must be drastically limited to avoid the divergence of the Cauchy-like tails. The effective gaussian of each hit is centered in η 2 ( j) − ε( j) with a width σ e f f ( j). The distribution of the {σ e f f ( j)} is illustrated in fig. 6; the scale of this figure amplifies the variations. Much larger variations will be encountered in the following. The trend of the x g2 -histogram is easily recognizable in the {σ e f f ( j)} distribution. This is due to a relation between these two plots. In fact, {σ e f f ( j)} estimates the range of possible ε-values corresponding to an x g2 -value but, similarly for the histograms, the height at an x g2 -value is given by the ε-interval that produces the same x g2 . Thus, the highest {σ e f f ( j)} are located in the highest regions of the histogram, and the lowest {σ e f f ( j)} in the lowest regions. Here the assumption of uniform "illumination" is again essential.
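The truncated second moment can be sketched as below; the Cauchy-like toy PDF and the cut range [−0.5, 0.5] are illustrative assumptions, not the paper's actual P x g2 or limits.

```python
import numpy as np

def effective_variance(pdf, lo=-0.5, hi=0.5, n=2001):
    """Truncated second moment of a hit PDF around its mean; the cut
    range [lo, hi] tames the divergence of Cauchy-like tails."""
    eps = np.linspace(lo, hi, n)
    p = pdf(eps)
    norm = p.sum()
    mean = (eps * p).sum() / norm
    return ((eps - mean)**2 * p).sum() / norm

# Cauchy-like toy PDF: its untruncated variance would diverge
cauchy = lambda e: 1.0 / (1.0 + (e / 0.02)**2)
sigma_eff = np.sqrt(effective_variance(cauchy))
```

Without the truncation the integral grows linearly with the range, which is why the limits must be chosen explicitly.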
The effective gaussian approximations reduce the maximum likelihood search to linear equations; their solutions are used as starting points for the MATLAB fminsearch function to minimize eq. 4.2. In the following, we will call MIN-LOG the parameters { γ n , β n } given by the minima of eq. 4.2. With "effective variance" we will indicate the results obtained with the effective gaussian parameters {σ e f f ( j), η 2 ( j) − ε( j)} for each hit (a weighted least squares), and these two methods will be compared with the results of a least squares approach, often the baseline of track fitting. (The last two approaches are often called χ 2 minimization.) The Kalman filter is not essentially different from least squares; it has important advantages for its recursiveness in complex environments.
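The linear "effective variance" step is a standard weighted least squares for a straight line; a minimal sketch, with the 2×2 normal equations written out explicitly:

```python
import numpy as np

def weighted_line_fit(z, x, sigma):
    """Weighted least squares for the straight line x = gamma*z + beta
    (the 'effective variance' step), solved from the 2x2 normal
    equations; its output can seed the non-linear minimisation."""
    w = 1.0 / np.asarray(sigma)**2
    A = np.array([[np.sum(w * z * z), np.sum(w * z)],
                  [np.sum(w * z),     np.sum(w)]])
    b = np.array([np.sum(w * z * x), np.sum(w * x)])
    gamma, beta = np.linalg.solve(A, b)
    return gamma, beta
```

With all sigma equal this reduces to the plain least squares baseline; the per-hit σ e f f ( j) is what distinguishes the two methods.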
Figure 7 shows the drastic improvements of these two methods, the MIN-LOG and the use of the effective variances, with respect to the least squares method. The extraction of the FWHM is an annoying procedure, but we have no other way to compare distributions with large tails. The ratios of the FWHM of the least squares and the MIN-LOG are around a factor two. The detailed values are: the ratio of the FWHM for γ is 1.7, that for β is 2.17; the ratio of the maxima (MIN-LOG divided by least squares) is 2.3 for γ and 2.0 for β . The MIN-LOG has slightly better distributions of γ and β than those obtained with the effective variances. The strong similarity of the results is due to the good gaussian approximation of our PDF; here the differences of the tails are irrelevant. The drastic simplification to linear equations suggests this method as a viable alternative to the minimization of eq. 4.2, even if the computational complexity is comparable: the extraction of σ e f f ( j) is a time consuming procedure.
Our preference for exploring the distributions of the γ and β parameters is due to the acceptable similarity with the least squares results, and because we are directly interested in them as the aim of the fit. Another alternative would be the exploration of the residuals; their extensive use is due to their easy extraction from the data and to the assumption of a direct relation with the error PDF. Here the process is very complex, and the residual distributions turn out to be extremely different from those given by the least squares: our PDF produces very high peaks around zero due to the contacts of the tracks with good hits. Even if accessible from the data, the residuals do not show a clear connection to our preferred plots of fig. 7.
Another type of interesting residuals are the differences with respect to the exact points. These are easily produced by the simulations but, showing relations similar to those of fig. 7, they are discussed in the last subsection. The agreement of the effective gaussian approximations with our PDF is illustrated in fig. 8, where the two types of PDF are plotted together for the five hits of a track. For each hit, we observe different distributions, with the gaussian approximations very similar to our PDF. The third hit is very good, with the narrowest probability distribution, and the reconstructed track passes near to it. The first hit is almost discarded, having a wide probability distribution and a position evidently out of line. The other hits contribute to the slight bending of the track. Recall the differences of the tails of the PDF, invisible at this scale: they do not play a role on this sensor side, but are relevant on the other side.
To verify the consistency of our PDF, we can observe in fig. 8 the strict proximity of the maxima of the peaks for the narrow distributions. Recall that the sole elements obtained from {P x g2 (x( j), E t ( j), ε)} are the {σ e f f ( j)} of the effective gaussian PDF, whose widths are given by integrals of {P x g2 (x( j), E t ( j), ε)}ε 2 . The center of each gaussian is η 2 ( j) − ε( j), imposed by our virtual tracks {γ = 0, β = 0}. The maximum of the peak of P x g2 (x( j), E t ( j), ε) (shifted by the same ε( j)) is produced by the three functions a j (ε) inserted in the PDF. The functions a j (ε) are obtained with a set of transformations (from eq. 3.3 to eq. 3.13) over the averages of eq. 3.14, where the sliding window covers a large set of charges deposited in their corresponding positions η j (x g j ). These two completely different paths give very consistent results. At the same time, fig. 8 underlines the inconsistency of using the COG directly in the fit. In this case, the effective (or constant) gaussian acquires a variable shift with respect to its corresponding P x g2 (x( j), E t ( j), ε) distribution, and part of the COG systematic error enters the fit.

Normal strip side
In the normal strip side, all the strips are connected to the readout system. The charge spread is lower and the noise is higher than in the floating strip side. The strip noise of 8 ADC counts produces large shifts of the reconstructed points, and the long ranges of P x g2 (x, E t , ε) turn out to be relevant. The function f (−ε) is similar to an interval function; the slight rounding of the sides is due to a small charge diffusion on the adjacent strips. The large noise increases the reconstruction artifacts around ε ≈ ±1 and around ε ≈ 0. The forms of the {a j (ε)} appreciably modify the reproduction of the x g3 histogram, but they have a negligible effect on the x g2 histogram, as illustrated in fig. 10. The good consistency with the x g2 histogram is an indication that the {a j (ε)} could be well suited to an application using two strips at a time.
Even in this case, the energy fractions {a j (ε)} are used to produce simulated events with the steps of section 4.1. Now, the large noise forces us to modify point 3): the distribution of the scaling factors must be cleaned from the noise, and the agreement with the data can be achieved after

Reconstruction details
The improvement given by the minima of eq. 4.2 with respect to the least squares is appreciably larger than that of the floating strip side. The FWHM of the parameter distributions are better by a factor three with respect to the least squares approach, as shown in fig. 12. The precise values of the ratios of the FWHM are: 3.6 for the γ distributions and 4.8 for the β distributions. The ratios of the peak values are 2.6 for the γ distributions and 3.4 for the β distributions. We assume a conservative factor three for the global improvement of the minima of eq. 4.2 (MIN-LOG) with respect to the least squares method. The effective variances give parameter distributions a little wider than those obtained with eq. 4.2, but always much better than the least squares.
To understand the origin of these large improvements, we have to look at the hits and their reconstructed tracks, trying to recover some rules. An immediate explanation is given by the observation of the σ e f f distribution reported in fig. 11. If two hits of a track are in the regions with low σ e f f , they define almost completely the track parameters. The least squares method does not have this information: all the hits are equally important and the noise displays its full effect. Figure 13 illustrates this easy situation in detail for a track. The first two hits have very narrow and very high probability distributions (peaks around 31 and 44), and they determine completely the track parameters in the MIN-LOG method. The minimum search routine finds a well defined global minimum, and the reconstructed track has parameters very near to β = 0 and γ = 0. The last point has a large error, a small peak value of P x g2 (x( j), E t ( j), ε − ε( j)) (around 4.4) and a large effective variance σ e f f ; it looks completely excluded from the track reconstruction. The different scales of the horizontal axis (in cm) and the vertical axis (in µm) amplify the bending of the tracks. The approach with the effective variances reconstructs a track as good as that of the MIN-LOG for identical reasons. The least squares method strongly deviates from the exact track.
The explanation of the excellent results of the MIN-LOG based on the effective variances is misleading in some cases. Some hits have a small effective variance but large position errors; with linear equations, they are expected to transfer large errors to the reconstructed tracks. On the contrary, our probability distributions are very different from gaussian distributions, and their long tails allow the existence of hits whose positions are impossible in a gaussian model. For these events, the results of the MIN-LOG are surprising.

Worst hits and effective hit suppression
Let us fix the limit of our approach; the natural selection is addressed to the worst hits discussed above: hits with a narrow effective variance and a large position error. It is easy to observe in fig. 14 two groups of hits, with x g2 > 0.1 and ε < −0.2, and with x g2 < −0.1 and ε > 0.2, that belong to the sought set. They are well known because they form the long tails of the error distribution of the η 2 algorithm. In fact, the η 2 ( j)-positions are given by the intersection of the continuous red line with the straight lines of constant x g2 -values; thus the first group has η 2 > 0.3 and the second group has η 2 < −0.3, with position errors greater than 0.5 (more than 30 µm). A pure COG algorithm is slightly better for these hits, but it is worse for all the hits near the red line (the large majority). The dangerous effects of the high-error hits (outliers) are well known, and dedicated algorithms are used to attenuate their effects. We also expected large distortions of the tracks from the special emphasis that eq. 4.2 places on the higher part of P x g2 (x( j), E t ( j), ε). In fig. 14, the worst hit of the worst hit-set is that with ε > 0.4 and x g2 < −0.3. It has ε = 0.43, x g2 = −0.31 and η 2 = −0.42; in our definition of a track, its η 2 ( j) − ε( j) is shifted by −0.85 from zero (its exact position). The maximum of the probability distribution is in η 2 ( j) − ε( j) = −0.85, and the probability is almost all concentrated around this point, with a small effective variance ( fig. 15).
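The intersection that defines the η 2 position is the inversion of a monotone ε → x g2 map; a minimal sketch follows. The sinusoidal "red line" is a toy assumption; the real average curve is extracted from the data.

```python
import numpy as np

# toy monotone average COG curve x_g2(eps) (the "red line" of fig. 14);
# the real curve comes from the data and is S-shaped for these sensors
eps_grid = np.linspace(-0.5, 0.5, 201)
xg2_curve = 0.4 * np.sin(np.pi * eps_grid) + 0.1 * eps_grid

def eta2(xg2_measured):
    """eta_2 position: intersection of the constant-x_g2 line with the
    average curve, i.e. inversion of the monotone eps -> x_g2 map."""
    return np.interp(xg2_measured, xg2_curve, eps_grid)
```

A hit whose noisy x g2 falls on the wrong branch of this curve is mapped far from its true ε, which is exactly how the long η 2 error tails arise.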
The first run of the minimum search on eq. 4.2 gives track parameters that strongly deviate from their exact values (γ = 0, β = 0). The high probability of large shifts of the first point moves the track, the continuous blue line of fig. 15, far from its true position. Similarly, for the effective variance method, the small σ e f f obliges the track (dashed red line) to pass very near to the first point. These two results are easy to grasp simply by observing the forms of the PDF. The (superficial) strong similarity of the two PDF gives no chance to obtain a different result from the minimum search routine. A better result is given by the least squares method (dash-dotted magenta line), for its total independence from the form of the probability distributions.
In our first random scanning of simulated tracks, we obtained good or excellent reconstructions similar to fig. 13, and a few unpleasant results similar to fig. 15, but not so bad. However, being in each case consistent with the effective variance approximation, these bad results were considered an unavoidable defect of the method. The consistency removed the suspicion of any connection with the well known limitations of the minimum search routines on complex surfaces. In fact, the minimum search routine is initialized with the parameters obtained with the effective variances and it stops after a fixed number of steps. When the map of eq. 4.2 ( fig. 16) does not show any minimum for these parameters, it is natural to rerun the algorithm from this position to look for a minimum within a few more steps (this is an expected imprecision of the minimum search routine), still consistent with the effective variance result. On the contrary, the additional run gives the outputs of fig. 17. Here the track parameters are almost exactly around zero, as in our ideal configuration, and a nice minimum is evident. This result is very surprising because large-error hits always induce strong deviations of a track. The first result looked absolutely reasonable; complications from these high noise hits are expected and very difficult to isolate. Dedicated algorithms are sometimes able to attenuate these effects (outlier suppression), sometimes not. In the absence of these dedicated tools, we expected negligible corrections from the second run of the minimum search. But, without any special algorithm, our worst track becomes the best of the three illustrated. To exclude the possibility of a lucky accident, tracks containing other supposed worst hits were explored. Similar results were obtained at the small price of additional runs of the minimum search routine.
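The rerun strategy can be sketched as a restart loop around the simplex search; the restart count, step budget and tolerance below are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def restarted_search(fun, x0, max_restarts=5, tol=1e-10):
    """Rerun a Nelder-Mead search (the scipy analogue of fminsearch)
    from its own output until the minimum stops improving; this mirrors
    the additional-run strategy that recovered the worst-hit tracks."""
    x, prev = np.asarray(x0, dtype=float), np.inf
    for _ in range(max_restarts):
        res = minimize(fun, x, method="Nelder-Mead",
                       options={"maxiter": 200})
        if prev - res.fun < tol:
            break
        x, prev = res.x, res.fun
    return x, prev
```

Restarting rebuilds the simplex around the current point, which is what lets the search leave a plateau that the first fixed-budget run mistook for a minimum.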
We also tested the stability against a slight perturbation of the probability distributions; imprecisions or drifts of the a j -functions must always be considered. The minima of eq. 4.2 for tracks containing these worst hits move slightly, but they remain well positioned around their excellent values.

Tentative explanation
The full explanation of these effective hit selections is not easy. It has to do with non-linear problems that depend in an essential way on all the other hits of the track. The long range of our PDF (eq. 3.15) is implied: the tails give a non-negligible probability even to far placed points. In the build up of the maximum of the likelihood, the function P x g2 (x( j), E t ( j), ε) of the worst hit does not go to zero rapidly enough to exclude any maximum near the other hits. On the contrary, a narrow gaussian goes to zero too fast and forces the maximum to be near its center. Figure 18 clearly illustrates the large differences of the effective gaussian and P x g2 (x( j), E t ( j), ε) in this case; in fig. 15 they look similar. As shown in the left part of fig. 18, P x g2 (x( j), E t ( j), ε) has a long tail and a small peak around zero. The presence of a second peak is a characteristic property of eq. 2.1, as illustrated in fig. 2. At increasing ε, the functions {a j (ε)} drive the lowest bump of fig. 2 to lower values of x g2 and decrease its height, reproducing a fading cloud of hits at negative x g2 . The constant-x g2 plane of fig. 18 cuts the lowest part of this set of bumps, producing a small peak. For this reason, the track parameters obtained with the insertion of the worst hit are slightly better than those obtained without it. The main result is produced by the remaining four hits, and this simulates the suppression of the worst hit. It would be nice if this type of hit suppression were effective even with fake hits. In any case, the advantage of using well tuned PDF is evident.
After the isolation of this surprising event, we found other types of strange events, where the minima of eq. 4.2 seem to know the exact answer while the other two methods are in the dark. Thus, our good intention to present a worst track reconstruction cannot be concluded. The set of data we expected to be a worst case turns out to be good or excellent. Truly bad tracks seem to be produced when three or more hits have some fake alignment; the track parameters tend to be polarized in that direction, giving an unsatisfactory result, but consistent with the least squares method.

Different Noise Amplitudes
To test the stability of our approach and the characteristics of the worst hits, we scale the saved random numbers of the additive noise and explore the track reconstruction with different noise amplitudes. All the other parameters of the simulation are kept fixed, thus we can follow the noise effect on each track and the modifications of the distributions of the track parameters. The noise values explored are 4, 6, 8, 10, 12, 14 ADC counts (or, better, an average signal-to-noise ratio (S/N) of 40, 27, 20, 16, 13, 11). The FWHM of the distributions of γ and β increases with increasing noise, as expected, but the ratios of the FWHM of the two approaches (MIN-LOG and least squares) are almost constant, with a slight degradation at increasing noise.
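The rescaling of the saved noise can be sketched in two lines; the array shape (tracks × hits × strips) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
# unit-width noise draws, generated once and saved
# (1000 tracks, 5 hits, 3 strips -- illustrative shape)
saved_noise = rng.normal(0.0, 1.0, size=(1000, 5, 3))

# rescaling the *same* draws lets every track be followed across the
# noise amplitudes with all the other simulation inputs kept fixed
noise_sets = {n: n * saved_noise for n in (4, 6, 8, 10, 12, 14)}
```

Because the draws are identical up to a scale factor, any change in a reconstructed track between two noise levels is attributable to the amplitude alone.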
We see that the track of fig. 17 starts to assume its form at a noise level of 6 ADC counts (S/N = 27). The effective variance method gives a track that deviates largely from its exact position just at this noise level. The least squares method gives a bad reconstruction at any noise, and our MIN-LOG is almost exact at any noise. The large error of the first hit is due to the two-strip COG algorithm: in some cases, the noise increases the value of the left strip beyond that of the right strip, giving a change of sign to x g2 . This type of error is rare for noise up to 8 ADC counts (S/N = 20), but becomes frequent at higher noise.

Comparisons and generalizations
The exploration of two types of sensors, a single direction and no magnetic field, is a small fraction of all the conditions encountered in real experiments, but it is impossible to cover additional configurations, especially in the absence of true data. In any case, we can extract some indications for other configurations.
One evident outcome of this approach is the relation of the COG systematic errors with the effective variances. Figures 11 and 14 illustrate clearly this condition: the COG systematic errors increase when the red line of fig. 14 nears the constant-x g2 lines, and these regions have high effective variances, as shown in fig. 11. Thus, the exploration of the systematic COG error can give some hints of the probable improvements given by the use of our PDF with respect to the least squares. The trends of the COG errors were carefully analyzed in refs. [10, 11, 12], and those results can be helpful even now. In any case, to a flatter COG histogram correspond least squares results that near the MIN-LOG results, but the MIN-LOG will be better for the proper handling of the signal amplitude.
Other results can be read in fig. 6 and fig. 11 and their "microscopic" visualization of the σ e f f -distributions on the strip: the standard resolution estimate of ref. [1] (σ x ∝ τ/(S/N)) tends to be very optimistic for the two sensor sides discussed here, confirming the general suggestion of ref. [20]. In fact, the large majority of the σ e f f -values are well above the line at 0.03 in fig. 6, and similarly in fig. 11 for the 0.06-line. But, being an order of magnitude estimation, it can be a useful parameter for projects and decisions. If this constant value is extended to a fit, the least squares result is obtained. The superiority of our σ e f f -distribution is evident: it contains a lot of information, essential to handle these complex physical processes, and a better description of the strip statistical properties is transferred to the fit.
The gain in resolution of our procedure can be used to mitigate the reconstruction defects of the functions {a j (ε)}: the hit-positions, defined by the η 2 , η 3 algorithms, introduce small artifacts that can be further reduced with better position determinations. Figure 19 illustrates the improvement of the hit-positions given by the reconstruction of the track with our PDF, compared with the input distribution given by the η 2 -algorithm. In fig. 19, we also report (with the blue line) the distributions of the differences of the least squares with respect to their exact positions. These distributions turn out to be similar to or worse than the distributions of the input points. The redundancy of the track points looks ineffective (or worse) in this reconstruction method. These results, surely obtained in many other simulations, are very similar to the application of the least squares method to points with Cauchy distributions. The x g2 -COG, as input for the least squares, produces difference distributions lower than those with the η 2 input, but appreciably better than the wide and flat distribution of the differences (x g2 ( j) − ε( j)).
The production of δ -rays is neglected, but we have to consider two main cases: forward and non-forward δ -rays. At orthogonal incidence, forward δ -rays do not appreciably modify the signal distribution except for an increase of the total collected signal, and this is inserted properly. The non-forward δ -rays introduce a large error in the hit reconstruction but, as for the worst tracks of section 5.2, an effective discarding of this type of hits is probable.
The multiple scattering effect is relevant for low momentum tracks; our finer details are suited for high momentum tracks. For low momentum tracks, the multiple scattering can be inserted by convolving its distribution with our PDF for each hit.
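A per-hit convolution of this kind can be sketched as follows; the gaussian kernel is an assumed approximation of the projected scattering distribution (the real one has non-gaussian tails), and the grid and width are illustrative.

```python
import numpy as np

def add_scattering(eps, pdf_vals, theta_sigma):
    """Convolve a hit PDF, sampled on the uniform grid `eps`, with a
    gaussian multiple-scattering kernel of width theta_sigma (projected,
    in the same strip-pitch units; the gaussian core is an assumption)."""
    de = eps[1] - eps[0]
    k = np.arange(-4 * theta_sigma, 4 * theta_sigma + de, de)
    kernel = np.exp(-k**2 / (2 * theta_sigma**2))
    kernel /= kernel.sum()                 # preserve normalisation
    return np.convolve(pdf_vals, kernel, mode="same")

eps = np.linspace(-0.5, 0.5, 201)
delta = np.zeros_like(eps)
delta[100] = 1.0                           # point-like toy hit PDF
smeared = add_scattering(eps, delta, 0.02)
```

For high momentum tracks theta_sigma shrinks and the kernel tends to a delta, recovering the unconvolved PDF.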
The introduction of the magnetic field requires at least another parameter in the minimum search of eq. 4.2. The plots of the minima are more difficult to produce, with the additional difficulty of following the projections in three planes. In any case, improvements can be expected even with the magnetic field.

Conclusions
Some aspects of our well tuned PDF are synthetically discussed with the results of simulations of track reconstructions. The complete forms of our analytical expressions for the PDF cannot be reported here; they will be discussed in coming papers with the mathematical details of their derivations. A crucial argument is treated with due completeness: it allows the extraction of the average strip energies as functions of the MIP impact point. Without these special functions all our PDF would be useless. To produce realistic simulations, we extracted the average strip energies from test beam data with double-sided silicon strip sensors, identical to those of the PAMELA tracker. Part of the simulated hits, based on these strip energies, are used to produce virtual tracks for the comparison of fitting methods. The virtual tracks are composed of five random hits, arranged to have identical direction and impact point. Our PDF were completed with another set of strip energy functions, derived again from all the simulated hits. The corresponding non-linear approaches show drastic improvements with respect to the least squares method. We find distributions of track parameters with an improvement (FWHM) of a factor two for the low noise side, and a factor three for the high noise side. The factor three of the higher noise side is truly unexpected: we expected a result surely less than in the low noise case, supposing the high noise a disturbance able to render all the methods similar. This nice result obliged us to analyze our outputs with the maximum care. A set of tracks was explored individually to check the consistency of the reconstructions, and various details were plotted to verify the correctness. This deep analysis allowed the isolation of another curious effect: an apparent hit selection that tends to avoid hits too far from the average of the remaining data.
The long tails of our PDF allow this suppression, which is impossible with the effective variance or with the least squares. The approximations of our PDF with gaussians introduce drastic suppressions of the tails of the distributions. Good results cannot be expected when the long ranges with low probability are essential. When the PDF tails are unimportant, the effective variances turn out to be very useful; they are extracted from our PDF as parameters for each hit (effective gaussian PDF). The corresponding linear approximations produce a slight degradation of the distributions of the track parameters, partly due to the tail suppression. With these limitations in mind, the effective variances can be easily introduced in running experiments. In many cases, it is just the form of the effective variance distributions that gives a partial explanation of the improvements in the high noise side of the sensors. Along the strip, a very anisotropic effective-variance distribution is observed. An appreciable region around the border shows very small effective variances; hence, if a track has two hits in these regions, the fit has a high probability of being excellent.
These results are evidently simulations, built as near to the data as possible. It would be nice to have a dedicated test beam to confirm them, as done in ref. [14] for our correction of ref. [12]. With parallel tracks, the direction distributions obtained with the different approaches can be compared to estimate the true improvements.
The use of our well tuned PDF has a high computational price, and a large set of detector parameters must be extracted from the data. We prefer the real data, but it is possible that well calibrated simulations could be viable substitutes. To give an idea of the computational complexity, we produced a few thousand lines of MATLAB [21] instructions and not many fewer lines of MATHEMATICA [22] outputs. Part of these developments are redundant or utilized for cross checks, and a selection of the essential elements can produce appreciable reductions. The consistency of these long developments is verified by the near coincidence of the peaks of our PDF with the approximate gaussian PDF. Here we limit ourselves to a single incidence angle; similar procedures must be done for a set of incidence angles covering the acceptance of the tracker.