Single event upset studies using the ATLAS SCT

Single Event Upsets (SEU) are expected to occur during high luminosity running of the ATLAS SemiConductor Tracker (SCT). The SEU cross sections were measured in pion beams with momenta in the range 200 to 465 MeV/c and proton test beams at 24 GeV/c but the extrapolation to LHC conditions is non-trivial because of the range of particle types and momenta. The SEUs studied occur in the p-i-n photodiode and the registers in the ABCD chip. Other possible locations for SEU were not investigated in this study. Comparisons between predicted SEU rates and those measured from ATLAS data are presented. The implications for ATLAS operation are discussed.


Introduction
The ATLAS SemiConductor Tracker (SCT) [1] has been operating successfully at the LHC. The SCT on-detector readout ASICs were designed to be sufficiently radiation tolerant to survive the expected fluences and doses corresponding to 10 years of LHC operation. The front-end readout ASIC, the ABCD, was designed in a radiation-hard process [2]. The other two on-detector ASICs (DORIC4A and VDC) are the receiver and driver for the optical links [3]. They were designed in the AMS 0.8 µm BiCMOS process, but only bipolar npn transistors were used, and it was verified that they were sufficiently radiation tolerant [4]. However, at the high luminosity of LHC operation, the system also has to tolerate Single Event Upsets (SEUs) without a significant loss of data. SEUs are expected to occur in the on-detector p-i-n diodes that receive the Timing, Trigger and Control (TTC) data for the front end modules. A comparison between the expected rate for these SEUs based on test beam studies and the rate of SEUs measured in ATLAS operation is given in section 2, and a similar comparison for SEUs in the ABCD DAC registers is given in section 3. All the results shown in this paper are for the SCT barrel modules only. The strategies to minimize the data loss in ATLAS resulting from SEUs are briefly reviewed in section 4. The implications for the operation of tracking detectors at the high luminosity LHC (HL-LHC) are reviewed in section 5. Finally, conclusions are given in section 6.

SEU in p-i-n diodes
The TTC data for the SCT is sent from the counting room to the SCT front end modules using optical links [3]. The optical signal is received by an epitaxial silicon p-i-n diode. This diode has an active diameter of 350 µm and the depth of the intrinsic region is 15 µm [5].
The SEU rates were studied for optoelectronics of the type used in the SCT, using beams of different particle types over a range of momenta [6]. The device under test consisted of a readout chain of the p-i-n diode, the DORIC4A amplifier and the optical data link consisting of a VCSEL driver and VCSEL [3]. The part of the system that would be expected to be most sensitive to SEU is the p-i-n diode, because it has a relatively large active volume and because, before the signal is amplified, a smaller energy deposition is sufficient to create an SEU. This expectation is confirmed by the test beam measurements of SEU rates versus current in the p-i-n diode [6]. The threshold energy deposition for producing SEUs (at $I_{\mathrm{pin}}$ = 100 µA) is 2.3 MeV [6]. The energy deposited in the depleted depth of the p-i-n diode by a minimum ionizing particle (MIP) at normal incidence is 6 keV, which is two orders of magnitude lower than the SEU threshold. Therefore MIPs are not expected to create SEUs, and this was confirmed by tests with a 660 MBq $^{90}$Sr source [6]. This is also consistent with neutron tests, in which a negligible SEU cross section was found for 7 MeV neutrons but a measurable SEU cross section was found for 14.1 MeV neutrons, for pions in the momentum range between 300 MeV/c and 405 MeV/c and for protons with a momentum of 465 MeV/c [6]. The SEU cross sections for these optical links were measured for low energy π and p beams at PSI using a simple loop-back system to measure the Bit Error Rate (BER) with and without beam [6]. There was no measurable BER without beam, so the measured BER with beams was used to determine the SEU cross sections. The SEU cross sections were measured as a function of the current ($I_{\mathrm{pin}}$) in the p-i-n diode. In the AC coupled system the energy required to cross the discriminator threshold increases with the value of $I_{\mathrm{pin}}$, and therefore the SEU cross section decreases rapidly with increasing values of $I_{\mathrm{pin}}$.
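As a rough cross-check of the MIP energy deposit quoted above, the value can be estimated from the stopping power of silicon. The sketch below assumes a mean minimum-ionizing stopping power of about 1.66 MeV cm²/g and a density of 2.33 g/cm³; these are standard tabulated values, not numbers from this paper, and the function name is illustrative. It uses the mean energy loss, whereas the most probable loss in such a thin layer is somewhat lower.

```python
# Rough cross-check of the MIP energy deposit in the p-i-n diode.
# Assumed inputs (standard tabulated values for silicon, not from this paper):
DEDX_MIN_SI = 1.66   # MeV cm^2 / g, mean stopping power at minimum ionization
RHO_SI = 2.33        # g / cm^3, density of silicon


def mip_deposit_kev(depth_um: float) -> float:
    """Mean energy (keV) left by a MIP at normal incidence in depth_um of silicon."""
    depth_cm = depth_um * 1e-4
    return DEDX_MIN_SI * RHO_SI * depth_cm * 1e3  # MeV -> keV


deposit_kev = mip_deposit_kev(15.0)   # 15 um intrinsic region [5]
threshold_kev = 2.3e3                 # SEU threshold at I_pin = 100 uA [6]
ratio = threshold_kev / deposit_kev

print(f"MIP deposit ~ {deposit_kev:.1f} keV, threshold / deposit ~ {ratio:.0f}")
```

Under these assumptions the deposit comes out close to 6 keV and the threshold is a few hundred times larger, in line with the two orders of magnitude stated in the text.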
It is not possible to measure the BER directly during ATLAS operation; however, the SEU rate can be inferred from the occurrence of trigger synchronization errors. The TTC system sends the Level 1 Accept (L1A) signal to the master ABCD ASIC on the front end modules. The ABCD has a 4-bit counter for L1A and these 4 bits are returned to the Read Out Drivers (RODs) in the data stream. The RODs perform a synchronization test by comparing these 4 bits with the 4 LSBs of the full L1A count received from the trigger system. (Note that the L1A signal sent by the TTC system to the front end modules is 3 bits long ('110'), whereas the counter in the ABCD, which counts the number of L1As received, has 4 bits.) The L1A signal sent to the ABCDs is defined to be '110'; since the system sends 0s between triggers, a single bit error can cause the loss of an L1A signal but cannot generate a spurious L1A. (A single 0 to 1 bit error can change a '110' into a '111' and therefore cause the loss of an L1A, but it can never change a sequence of 0s into a '110'.) The loss of an L1A means that the module's 4-bit L1A counter fails to increment, and the counter remains out of synchronization with the RODs until it is reset. A counter reset and re-synchronization is expected to occur at most 60 seconds after the onset of the SEU; losses of synchronization of the L1A counter that last longer than that are therefore rejected. From the observed rate of these longer periods of loss of synchronization, the maximum bias in the measurement is 13%.
Cases in which two or more candidate SEUs were detected in the same module in a given run were rejected in order to avoid spurious SEU candidates. The possible bias associated with this procedure was less than 6%. SEU candidates from affected modules are allowed in different runs as long as there is no second candidate in the same module and the same run. The 'smoking gun' signature for genuine SEUs is that the error rate should scale with the flux through the module. It was first checked that there were no errors when the detector was read out at high rate in the absence of beam. The correlation between the measured SEU rate and the cluster occupancy in a barrel module in a given run is shown in figure 1. The cluster occupancy is a good proxy for particle flux, as the rate of noise hits is negligible. This plot uses the spread in occupancy over the different barrel layers as well as the variations in occupancy that arose from different runs having different luminosities. The plot shows the expected linear correlation between SEU rate and particle flux.
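The L1A synchronization test and the single-bit-error argument can be illustrated with a toy model. The sketch below is not the ABCD command decoder: it assumes a simple start-bit framing in which the first '1' after idle 0s opens a 3-bit field, and all function names are hypothetical. It checks that a single 0 to 1 upset never creates an L1A from idle bits, that it can destroy a genuine '110', and that a missed L1A shows up in the 4-bit comparison.

```python
# Toy model of the L1A synchronization test. The start-bit framing used here is
# an assumption for illustration; it is not the actual ABCD command decoder.

def count_l1a(stream: str) -> int:
    """Count L1A commands ('110') in a bit stream that idles at '0'.

    A '1' after idle bits is treated as the start of a 3-bit field; the field
    is consumed whether or not it decodes as an L1A.
    """
    count, i = 0, 0
    while i < len(stream):
        if stream[i] == "1":
            if stream[i:i + 3] == "110":
                count += 1
            i += 3
        else:
            i += 1
    return count


def flip_0_to_1(stream: str, pos: int) -> str:
    """Return the stream with the bit at pos forced to '1' (a 0 -> 1 upset)."""
    return stream[:pos] + "1" + stream[pos + 1:]


def in_sync(module_counter: int, rod_l1a: int) -> bool:
    """ROD check: 4-bit module counter versus the 4 LSBs of the full L1A count."""
    return (module_counter & 0xF) == (rod_l1a & 0xF)


idle = "0" * 12
spurious = [count_l1a(flip_0_to_1(idle, p)) for p in range(len(idle))]

one_l1a = "0000110000"
after_upset = [count_l1a(flip_0_to_1(one_l1a, p)) for p in range(len(one_l1a))]

print("max L1As created from idle bits:", max(spurious))
print("L1As seen after each single upset:", after_upset)
print("in sync after one missed L1A out of 20:", in_sync(19, 20))
```

In this model an upset just before or on the last bit of the '110' loses the trigger, after which the module counter stays one behind the ROD until it is reset.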
The second signature for genuine SEUs is that the rate should decrease with increasing values of $I_{\mathrm{pin}}$. The minimum signal to cause a bit error in an AC coupled system is half the peak current, i.e. the mean current $I_{\mathrm{pin}}$. The minimum energy deposition ($E_{\mathrm{min}}$) required to produce an SEU is therefore proportional to $I_{\mathrm{pin}}$ [5]. The probability of depositing an energy ($E_{\mathrm{pin}}$) in the active volume of the p-i-n diode that is greater than $E_{\mathrm{min}}$ is a rapidly falling function of $E_{\mathrm{min}}$. Therefore the SEU cross section is expected to fall rapidly with $I_{\mathrm{pin}}$, as observed [6]. This causes the distribution of $I_{\mathrm{pin}}$ for modules, weighted by the number of SEUs observed in a module, to be biased towards lower values of $I_{\mathrm{pin}}$. This effect is clearly demonstrated in figure 2. Using exponential fits to the measured SEU cross section data [6], the predicted distribution of $I_{\mathrm{pin}}$ for modules weighted by the number of SEUs observed could be calculated. The weighted histogram agrees well with the distribution of $I_{\mathrm{pin}}$ for modules with SEU (black); in particular there is a shift of the weighted distribution towards lower $I_{\mathrm{pin}}$ values.
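The shift towards lower currents can be illustrated with a small numerical sketch. It assumes an exponential cross section with a made-up normalization and slope (the actual fit parameters from [6] are not reproduced here), equal fluence for every module, and an invented list of module currents; all names are hypothetical.

```python
import math

# Hypothetical exponential parametrization of the SEU cross section.
SIGMA0 = 1.0      # arbitrary units (hypothetical normalization)
SLOPE_UA = 50.0   # hypothetical exponential slope in uA


def seu_cross_section(i_pin_ua: float) -> float:
    """Cross section falling exponentially with the p-i-n current (in uA)."""
    return SIGMA0 * math.exp(-i_pin_ua / SLOPE_UA)


def weighted_mean(values, weights):
    """Mean of values with the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented module currents in uA; equal fluence per module is assumed, so the
# expected number of SEUs per module is proportional to the cross section.
i_pin_ua = [80.0, 100.0, 120.0, 150.0, 200.0, 250.0]
weights = [seu_cross_section(i) for i in i_pin_ua]

plain_mean = sum(i_pin_ua) / len(i_pin_ua)
seu_weighted_mean = weighted_mean(i_pin_ua, weights)

print(f"unweighted mean: {plain_mean:.1f} uA, SEU-weighted mean: {seu_weighted_mean:.1f} uA")
```

Because the weights decrease with current, the SEU-weighted mean is always below the unweighted one, which is the qualitative behaviour described for figure 2.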
Having established that we are measuring SEUs in ATLAS operation, we can use the data to study the dependence of the SEU rate on the incidence angle of the particles. The measured SEU cross section is shown as a function of incidence angle in figure 3. No significant angular dependence is observed. Finally, a simple calculation of the absolute SEU rate can be made and compared to the measured rate.
On a module by module basis, the predicted number of SEUs in a given run is calculated as

\[ N_{\mathrm{pred}}(\mathrm{SEU}) = \sigma(I_{\mathrm{pin}}) \, F , \]

where $\sigma(I_{\mathrm{pin}})$ is taken from a fit to beam test measurements and $F$ is the fluence received per module. In principle the fluence could be taken from FLUKA simulations; however, this requires a detailed knowledge of the bunch structure in the LHC, which changed during the run. Therefore the fluence was estimated from the module occupancy in the given run. The individual predictions are summed over all analysed runs and all barrel modules to give a total prediction. The procedure was applied to a data set corresponding to an integrated luminosity of 7.8 fb$^{-1}$ and gave a prediction of 1900 SEUs for the barrel modules. In addition to the systematic errors given in this section, there is a large but difficult to quantify systematic error associated with the variation of the SEU cross section with particle type and energy. Considering the large systematic uncertainties involved, the prediction is in good agreement with the observed number of 2504.

JINST 9 C01050
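The summation over modules and runs described above can be sketched as follows. The exponential form and its two parameters are placeholders for the fit to the beam test data in [6], and the example records are invented; only the numbers 1900 and 2504 are taken from the text.

```python
import math


def sigma_fit(i_pin_ua: float, sigma0_cm2: float, slope_ua: float) -> float:
    """Hypothetical exponential fit of the SEU cross section (cm^2) versus I_pin (uA)."""
    return sigma0_cm2 * math.exp(-i_pin_ua / slope_ua)


def predicted_seus(records, sigma0_cm2: float, slope_ua: float) -> float:
    """Sum sigma(I_pin) * F over (i_pin_ua, fluence_cm2) records.

    Each record stands for one module in one run; the fluence would be
    estimated from the module occupancy in that run.
    """
    return sum(sigma_fit(i, sigma0_cm2, slope_ua) * f for i, f in records)


# Invented records and fit parameters, for illustration only.
records = [(100.0, 1.0e11), (150.0, 2.0e11), (200.0, 1.5e11)]
n_pred = predicted_seus(records, sigma0_cm2=1.0e-10, slope_ua=50.0)
print(f"toy prediction: {n_pred:.2f} SEUs")

# Comparison quoted in the text for the barrel modules.
observed, predicted = 2504, 1900
print(f"observed / predicted = {observed / predicted:.2f}")
```

The observed number exceeds the prediction by roughly 30%, which is small compared with the systematic uncertainty from the dependence of the cross section on particle type and energy.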

SEU in ABCD registers
The SEU cross sections in ABCD registers were studied using 24 GeV/c proton (CERN PS) and 200 MeV/c $\pi^+$ (PSI) beams [7]. As the ABCD registers do not allow for read-back of values, a more indirect determination of SEU rates was used. This involved using the mask register to look for evidence of bit flips. It is assumed that the SEU cross sections are the same for all the registers in the ABCD. The average SEU rate using 200 MeV/c $\pi^+$ was defined as the fluence required per SEU and was measured to be $3.7 \times 10^{13}\ \pi^+/\mathrm{cm}^2/\mathrm{SEU}$. However, there was evidence for batch to batch variations. The procedure used was effectively measuring the SEU cross section for '0 to 1' bit flips. As for SEUs in the p-i-n diodes, SEUs in the ABCD registers cannot be directly identified during ATLAS operation. However, an indirect determination of SEUs in the ABCD DAC registers is possible. There is one such register per ABCD and it determines the threshold for the discriminator to register a hit. This is a common threshold across the 128 channels (there are also registers to allow for some channel to channel variations but they do not affect this analysis) [2]. A '0 to 1' bit flip will tend to decrease the hit efficiency of all the channels read out by the corresponding ABCD; however, this effect would be very difficult to detect. On the other hand, a '1 to 0' bit flip will increase the noise occupancy. The fifth bit is normally set to '1', and if an SEU causes it to be flipped to a '0' the threshold would go below the baseline. This would then result in a very high occupancy until the register was reset. The occupancies per ABCD were defined as the mean number of strips fired, averaged over 10 events, and are shown in figure 4. As expected the mean occupancy is very low, but there is a spike at 128 strips, i.e. every strip is hit in each event. The lower peaks correspond to cases in which some low occupancy events are combined in the average with very high occupancy events.
Candidate SEU events were identified by a sequence of events with average occupancy above a threshold of 100, and the effect of varying this threshold was studied. ABCDs which gave more than one SEU candidate event in a given run were excluded from this analysis.
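The candidate selection can be sketched as a simple two-threshold scan over the averaged occupancies of one ABCD. This is a minimal illustration, not the analysis code: the function name and the example values are invented, the start threshold of 100 is the one quoted above, and the end threshold of 20 is the fixed lower threshold used in this analysis to close a sequence.

```python
def find_candidates(occupancy, start_threshold=100.0, end_threshold=20.0):
    """Return (start, end) index pairs of high-occupancy sequences.

    A sequence starts at the first sample above start_threshold and ends at
    the first later sample below end_threshold (end index is exclusive).
    """
    candidates = []
    start = None
    for i, occ in enumerate(occupancy):
        if start is None:
            if occ > start_threshold:
                start = i
        elif occ < end_threshold:
            candidates.append((start, i))
            start = None
    if start is not None:
        # Sequence still open at the end of the run.
        candidates.append((start, len(occupancy)))
    return candidates


# Invented occupancies (mean strips fired per ABCD, averaged over 10 events).
occupancy = [0.5, 0.3, 64.2, 128.0, 128.0, 128.0, 70.0, 0.4, 0.2]

candidates = find_candidates(occupancy)
print(candidates)

# ABCDs with more than one candidate in a run are excluded from the analysis.
keep_abcd = len(candidates) == 1
print(keep_abcd)
```

Lowering `start_threshold` makes the sequence start earlier (at the 64.2 sample in this example), which is why the dependence of the candidate count on this threshold has to be studied.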
To demonstrate that these candidates are genuine SEUs, the rate was measured as a function of fluence in a similar way to that used for SEUs in the p-i-n diode (section 2). The results for barrel modules are illustrated in figure 5 and also show the expected linear relationship. The data for the last two bins appear to be lower than expected from the linear fit. Part of this effect is due to a bias in the analysis caused by the requirement that there only be one SEU per ASIC per run, but this effect is too small to account for the magnitude of the deviation.
The effect of varying the threshold on the strip occupancy used to identify the start of a sequence of events with an ABCD in a high occupancy state was studied, and the results are shown in figure 6. A fixed lower threshold of 20 was used to determine the end of such a sequence. As it is not possible to lower the threshold that identifies the start of such a sequence too far without being biased by non-SEU effects, a linear fit was used to extrapolate the measured rate to zero threshold on the occupancy. A quadratic fit was used to estimate a systematic error from the fitting procedure. The result of the fit for a data set corresponding to 20.3 fb$^{-1}$ was 3046 ± 100, where the error quoted is the systematic error on the fitting procedure.
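The extrapolation to zero threshold can be illustrated with an ordinary least-squares straight line. The data points below are invented and exactly linear so that the result is easy to check; they are not the values of figure 6. The quadratic fit used for the systematic error is not reproduced in this sketch.

```python
def linear_fit(xs, ys):
    """Unweighted least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: number of candidates found for each start threshold.
thresholds = [60.0, 80.0, 100.0, 120.0]
n_candidates = [2700.0, 2600.0, 2500.0, 2400.0]

slope, intercept = linear_fit(thresholds, n_candidates)
print(f"slope = {slope:.2f} per strip, extrapolation to zero threshold = {intercept:.0f}")
```

The intercept is the estimate of the number of SEUs at zero threshold; the difference between this value and the intercept of a quadratic fit would give the fit systematic described in the text.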
Having established that these are genuine SEUs, we can study the SEU cross section as a function of incidence angle to the module (see figure 7). This shows an increase in SEU cross section with increasing incidence angle. This feature would be expected if the SEUs were caused by MIPs as the path length through the register will increase with incident angle. However we expect these SEUs to be dominated by nuclear interactions causing large local energy depositions. Therefore the SEU cross section will depend on the volume of the active device but would not be expected to be so sensitive to the incident angle.
A simple prediction for the observed number of events is given by

\[ N_{\mathrm{pred}}(\mathrm{SEU}) = \sigma(\mathrm{SEU}) \, F , \]

where $\sigma(\mathrm{SEU})$ is the measured SEU cross section for $\pi^+$ at 200 MeV/c and $F$ is the fluence. The rate of SEUs in the ABCD registers depends only on the total fluence and not on the bunch structure in the LHC (unlike the case for the SEUs in the p-i-n diode discussed in section 2). Therefore the fluence was taken from FLUKA simulations for the inner barrel layer and scaled to the integrated luminosity of this sample. The fluence for the other three layers was scaled to the inner layer using the ratio of cluster occupancies in the modules. The linear fit to figure 7 was used to correct the data so that it could be compared with the PSI data, which were all taken at an incidence angle of 79°. This procedure gave a prediction of $N_{\mathrm{pred}}(\mathrm{SEU}) = 1000$, which is a factor of 3 lower than the measured rate. There are large uncertainties in the prediction because of the unknown variation of the SEU cross section with energy. Some of the discrepancy might be due to batch to batch variations in the ABCD ASICs, and some might be due to the SEU cross section for '1 to 0' bit flips being higher than that for '0 to 1' bit flips.
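As a worked illustration of the numbers quoted in sections 3, the following sketch assumes that the cross section is simply the inverse of the measured fluence per SEU for 200 MeV/c pions, and compares the prediction with the extrapolated measurement of 3046:

```latex
\[
\sigma(\mathrm{SEU}) \approx \frac{1}{3.7\times10^{13}\ \mathrm{cm^{-2}}}
  \approx 2.7\times10^{-14}\ \mathrm{cm^{2}},
\qquad
\frac{N_{\mathrm{meas}}(\mathrm{SEU})}{N_{\mathrm{pred}}(\mathrm{SEU})}
  \approx \frac{3046}{1000} \approx 3.0 .
\]
```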

SEU mitigation strategies in ATLAS operation
The first mitigation strategy employed to minimize SEU in the p-i-n diodes was to use VCSELs in the RODs which gave a fibre-coupled power of greater than 700 µW. This resulted in large values of $I_{\mathrm{pin}}$, generally above 100 µA. This is much larger than required for error free operation of the links in the absence of beam, but it reduced the SEU rates because of the steeply falling cross section as a function of $I_{\mathrm{pin}}$ [6]. The second mitigation strategy was that if the DAQ detected a module with such an error, it automatically reset the event counters and pipelines for that module. The time elapsed between the error occurring and the module being reset was about 20 to 50 seconds (see figure 8(a)). In order to mitigate the effects of SEU in the ABCD registers, online histograms showing the occupancies of the ABCDs were available. This allowed operators to issue a command to fully reset a module for which this condition was detected. Secondly, automatic full module resets were issued every 30 minutes during data taking. The effect of these mitigation strategies is shown in figure 8(b). This shows a peak below 50 s corresponding to manual resets and a tail out to 30 minutes corresponding to the automatic resets.

Figure 7: SEU rate normalized by module occupancy versus angle of incidence for particles coming from the nominal centre of the detector for the 6 rings of the 4 barrel layers.

Finally, we can summarize the effect of these two sources of SEUs on the system performance by determining the fraction of data lost, with the mitigation strategies described in this section and without the use of any mitigation strategies. In order to quantify the meaning of not using any mitigation strategies, we define the following reference cases:
• For the p-i-n diode SEUs we consider the case that the L1A counters are only reset at the start of every run, as this would be required even in the absence of SEUs.
• For the ABCD register SEUs we consider the (extreme) case that the data is written to the registers at the start of operation and never refreshed.
The results in table 1 show that the mitigation strategies are successful in reducing the data loss to a negligible level. Note that this table does not reflect the mitigation strategy of using larger values of $I_{\mathrm{pin}}$ than would have been required for a low BER link in the absence of SEUs. As the SEU cross section is a rapidly falling function of $I_{\mathrm{pin}}$, this strategy was also very important in reducing the loss of data from SEUs.

SEU mitigation strategies for HL-LHC
For the high luminosity expected at the HL-LHC, the SEU rates would be significantly higher than at the LHC if no preventative strategies were used. The first strategy is to use triple gate redundancy for every register. Provided that the gates are sufficiently separated physically, a single SEU event can only affect one gate, and the probability of two gates being hit in a single bunch crossing would be negligible. It is not possible to eliminate SEUs in the p-i-n diodes and, after radiation damage, much lower values of $I_{\mathrm{pin}}$ will be used, tending to enhance the SEU rates further. Therefore full error correction will be essential for the TTC links. In the proposed Versatile Link [8], the GBTx ASIC [9] will use a forward error correction code that allows for errors in up to 16 consecutive bits. The use of this code has been studied with SEUs in p-i-n diodes in test beams and shown to reduce the SEU rates to a negligible level [10].
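The benefit of triple redundancy can be illustrated with a bitwise majority vote over three copies of a register. This is a generic sketch of the technique, assuming that the three copies are combined by a majority voter; it does not describe any specific HL-LHC ASIC design, and the register value is arbitrary.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three register copies."""
    return (a & b) | (a & c) | (b & c)


stored = 0b10110100                 # arbitrary 8-bit register value
copies = [stored, stored, stored]

# A single upset flips bit 4 in one copy; the vote still returns the stored value.
copies[1] ^= 1 << 4
after_single_upset = majority(*copies)
print(after_single_upset == stored)

# A second upset of the same bit in another copy defeats the vote, which is why
# the copies must be physically separated so that one event cannot hit two of them.
copies[2] ^= 1 << 4
after_double_upset = majority(*copies)
print(after_double_upset == stored)
```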

Conclusions
Test beam studies have shown that SEUs are expected in the SCT readout electronics. This paper gives the first published results of studies of SEU rates in ASICs and p-i-n diodes during LHC operation. Clear evidence of SEUs was observed. While it is difficult to make precise comparisons between SEU rates during ATLAS operation and in test beams, the predicted SEU rates were in approximate agreement with the measured rates. Mitigation strategies in the SCT readout have been employed to reduce the loss of data as a result of SEUs to a negligible level. Strategies to minimize the impact of SEUs for tracking detectors at the HL-LHC have been briefly reviewed.