High level verification of the VFAT3 ASIC for CMS GEM detectors

A front-end readout chip, VFAT3, was designed for the gas electron multiplier (GEM) muon detectors. GEMs were installed at the Compact Muon Solenoid (CMS) experiment of the Large Hadron Collider (LHC) at CERN for the high luminosity upgrade. The VFAT3 design uses 790 analog and 172 digital blocks which are highly integrated, so it is crucial to ensure that the different blocks work together and that the chip works as a whole. Mixed-signal simulation methods were used to verify the high-level functionality. Trigger latencies of 125, 150, 175 and 225 ns were found for front-end peaking times of 25, 50, 75 and 100 ns, respectively. The maximum trigger rate for reading out standard data packets was found to be 1.7 MHz. Results of the VFAT3 high-level verification are presented and the simulation methods described.


Introduction
The Large Hadron Collider (LHC) at CERN is planned to get a luminosity upgrade by 2027. The upgrade is known as the High Luminosity LHC (HL-LHC) and it is intended to increase the instantaneous luminosity of the collider from $2\cdot10^{34}$ cm$^{-2}$ s$^{-1}$ to $5\cdot10^{34}$ cm$^{-2}$ s$^{-1}$. This improvement will increase the rate of collisions and, due to the statistical nature of particle physics, help confirm current results and bring rarer and rarer interactions within reach of the collider. The upgrade of the collider also brings higher requirements for the detectors used in the experiment [1]. To meet these requirements, a decision was taken to upgrade the muon subsystem of the Compact Muon Solenoid (CMS) experiment with additional gas electron multiplier (GEM) detectors [2]. An existing front-end readout chip, VFAT2 (Very Forward ATLAS and TOTEM), which has been used for the GEM telescope of the TOTEM experiment, was considered for the new CMS GEM detector [3]. The chip design was optimized to be used for silicon, GEM and CSC detectors at TOTEM. Although the VFAT2 was used in GEM detectors at TOTEM, its shaping time is too short to collect the whole charge produced by particles crossing the new GEM detectors. Additionally, the chip is unable to cope with the increased particle rate and is not compatible with CMS requirements. Consequently, a new front-end chip had to be designed. The chip had to have a fast communication channel and a large internal memory to cope with the expected trigger rate from CMS. To be suitable for the GEM detectors, the chip shaping time had to be longer than in VFAT2 [2]. Due to placement in the forward region of CMS, radiation hardness was required. No commercial component with such specifications was found.

JINST 16 P02005
A decision was taken to design a front-end readout chip VFAT3. The chip had to have 128 analog input channels and a vast amount of configurable components, including internal biasing DACs, ADCs, a front-end with different peaking times and gains, a constant fraction discriminator, monitoring possibilities and data packet options [4]. To implement all of the functionality desired for the chip, a large number of both analog and digital design blocks were used. These blocks are closely integrated with each other, so mixed signal simulation methods were needed during the design of the chip. This allowed high level functional verification of the chip throughout the whole design phase.

VFAT3
As mentioned in the previous section, the VFAT3 chip has both analog and digital functional blocks. The blocks can be divided into four parts, namely: the analog front-end, which has 128 input channels and a constant fraction discriminator (CFD) for each channel [5,6]; the calibration, bias and monitoring (CBM) block, which handles the chip biasing and calibration [7]; the digital part, which has the communication port, internal SRAM memories, internal registry and control blocks [8]; the ADC-block, which has two ADCs used for chip monitoring.
The analog input channels have four peaking time settings (25 ns, 50 ns, 75 ns and 100 ns) and three gain levels (low, medium and high). The CBM unit provides the needed biasing for the input channels and the possibility to monitor these signals externally via the onboard ADC blocks. The digital block provides the slow control functionality of the chip and furnishes a 320 MHz GBT communication channel. The main functional blocks of the VFAT3 are shown in figure 1.
The aim of this paper is to present the high-level simulation results of the VFAT3 front-end chip. Section 2 starts with an introduction to mixed-signal simulation methods and then goes on to present details about the methods used in the VFAT3 high-level simulations. Section 3 presents the main results obtained from the simulations. These results include the trigger latencies at different peaking times and the overflow of the internal FIFO at different data packet sizes. The advantages of using mixed-signal simulation methods are discussed in section 4. Section 5 gives an overview of the methods used and the results obtained.

Mixed-signal simulation
Mixed-signal verification is a method used to verify the high-level functionality of an ASIC throughout the whole design phase. The method first appeared around the turn of the century, when the industry slowly started adopting it into its workflows. The high energy physics community continued to rely on the traditional verification method. However, in recent years the community has started to include this method as it has become more mature and standardized.
At the beginning of a typical top-down mixed-signal design process, a chip is composed of a set of analog and mixed-signal (AMS) and register-transfer level (RTL) models which describe the desired high-level functionality of the final ASIC. These models are mathematical and functional descriptions written in modeling languages such as Verilog-AMS and SystemVerilog. The models are later replaced by components which are described at transistor level or netlist level as the component designs are finalized. Using models on different abstraction levels allows the newly designed components to be verified as part of the whole system already in the early stages of the chip design [9,10,11,12].
The mixed-signal approach also comes with drawbacks. Running the top-level simulations with transistor-level descriptions of the components is computationally very demanding. To avoid long simulation times, the components can have descriptions with different levels of complexity: for instance, a computationally intensive transistor-level model for accurate and detailed simulations and a lightweight real number model (RNM, wreal) for simulations that are more functional in nature. Thus, when simulating a certain part of the chip in more detail, parts that are not strictly bound to that part can be replaced by functional models. The functional RNM models are designed to be run by digital simulation engines. Typically, signals in the digital domain can only have the values high or low, but the RNM extensions allow the signals to have a real number value. The RNM models do not require the use of analog solvers during the simulations, which improves the speed of the simulations significantly. The downside is that it is difficult to create models which accurately describe the operation of the analog blocks. This is why the RNM models are typically used when verifying high-level connections inside the mixed-signal models [13]. The use of RNM brings the verification to the digital domain, where powerful verification methods have been developed, such as the universal verification methodology (UVM). The UVM is a standardized verification methodology which offers testing libraries and easy reuse of developed verification blocks. However, the UVM is typically used for testing the digital domain and currently offers few features for the high-level verification of the analog and mixed-signal performance of the chip. For future work with mixed-signal simulations, the UVM could be considered since there is an ongoing effort to include more mixed-signal features into the UVM [14,15]. The basic concept of different abstraction levels and their simulation times is shown in figure 2 [16].
Since the chip can be simulated as a whole, mixed-signal high-level models allow the development of test routines and procedures for use in the verification of the physical chip. The routines can be designed before the physical chip arrives from the foundry and testing can start immediately after the arrival of the chip [17]. Mixed-signal simulation methods were used during the design of VFAT3. The high-level simulations were run in addition to the analog and digital designers' own design block verification. The high-level models allowed the testing of cross-domain functions, such as the readout chains, which require the use of the whole analog front-end and the digital readout features. The mixed-signal model was also used to test some of the purely digital features of the chip, although the digital domain had already been tested thoroughly by the designers during the development.
For different sections of the chip, models of different levels of abstraction were used. For the analog blocks, three models were typically developed: a Verilog-AMS model to be used in place of the transistor level model while it is designed, a real number model to be used in high-level functional simulations, and the final transistor level model itself [18]. The different level models were provided by the designers, who also verified their designs more accurately on the transistor level. RTL models were used for the digital part of the chip until the netlist level description was available. After that, only the netlist was used for the simulation, since the simulation time difference between the RTL model and netlist level simulation is small.
The communication between the test bench and the model is mostly done through the VFAT3 main communication channel, the comm-port, which provides the clock signal and the two-way communication with the model. The test bench also reads out the signals from the eight outputs of the trigger unit, which are essential for the operation of the chip. For the 128 input channels of the chip, the test bench offers the possibility to inject a desired charge into any number of the channels.
For the simulation setup used for the VFAT3 register verification, an RTL-model was used for the digital section. Since the operation of the analog blocks is not critical for this verification, RNM models were used. These models are described at higher level, which allows faster simulation times. One simulation setup is shown in figure 3.
Figure 3. A setup used in simulation. Digital parts are described at RTL level and analog parts in wreal or SPECTRE models. The test bench mainly communicates through the VFAT3's comm-port, but it also reads the trigger output from the chip and can inject pulses into the input channels.
The model of the chip can also be used in the development of the test setup for the physical chip. By running the simulation environment and the developed test software alongside each other, the communication and test routines could be verified before the chip was submitted to the foundry.
For the co-development of the chip design and the physical test setup software, a custom communication protocol was implemented to allow the software to communicate with the environment where the chip simulation model was run. Interprocess communication (IPC) methods such as pipes and sockets are often used for the communication between the software and the simulation environment [19]. A plain text communication protocol had earlier been proposed for communication with the physical test bench [17]. To ease the co-development for both systems, the same protocol was also used for the simulations. The protocol is based on temporary files shared between the simulation environment and the test setup software. One file is for control commands between the environments and two files are used to implement the communication between the software and the chip model. The protocol is described in more detail in figure 4.
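The idea of a file-based exchange can be illustrated with a minimal Python sketch. The file names (`control.txt`, `to_chip.txt`, `from_chip.txt`), the one-line message format and the echoing stand-in for the chip model are assumptions made purely for illustration; the actual protocol is the one given in figure 4.

```python
import os
import tempfile


def write_message(path, text):
    """Write one plain-text message, replacing any previous content."""
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        handle.write(text + "\n")
    os.replace(tmp, path)  # rename so the reader never sees a partial file


def read_message(path):
    """Return the pending message and remove the file, or None if absent."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        text = handle.read().strip()
    os.remove(path)
    return text


def simulation_step(workdir):
    """Hypothetical stand-in for the simulation side: acknowledge the request."""
    request = read_message(os.path.join(workdir, "to_chip.txt"))
    if request is not None:
        write_message(os.path.join(workdir, "from_chip.txt"), "ACK " + request)


def exchange(workdir, command):
    """Test-software side: send a command and collect the reply."""
    write_message(os.path.join(workdir, "control.txt"), "RUN")  # control file
    write_message(os.path.join(workdir, "to_chip.txt"), command)
    simulation_step(workdir)  # in a real setup the simulator process does this
    return read_message(os.path.join(workdir, "from_chip.txt"))


with tempfile.TemporaryDirectory() as workdir:
    reply = exchange(workdir, "WRITE 0x10 0x00FF")
```

Using the same three files towards the simulation and towards the physical test bench is what lets the test software stay unchanged between the two targets.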

VFAT3 high-level verification results
VFAT3 high-level functionality was tested throughout the design phase. The simulations were run in the Cadence Virtuoso Analog Design environment, which has many integrated mixed-signal simulation functionalities. The high-level verification is mostly aimed at functions which include both analog and digital domains, such as calibration pulses, data packets, latency and monitoring. In addition, some of the essential digital features were verified with the model, such as synchronization, slow control registers and FIFO overflow. In-depth simulations and verification for the pure analog and digital domains were performed by the designers: different gate delays were used for the digital domain, and corner and Monte Carlo simulations for the analog domain. Test runs using the mixed-signal simulation model to verify the functionality of the chip are introduced in the following subsections.

Synchronization
As a first test, the synchronization functionality of the communication channel was tested. The aim of the procedure is to synchronize the chip with the LHC 40 MHz clock. When the communication line is synchronized with the LHC clock, the chip is able to read the incoming information bits correctly and reliably. The synchronization can be achieved by sending a unique synchronization character to the chip. When the chip receives three consecutive synchronization characters (CC_A), the chip communication channel adjusts its internal clock to match the phase of the synchronization characters. When the internal clock has been adjusted, the chip sends back a response character (SyncAck). To ensure that the chip is correctly synchronized, a synchronization verification character (CC_B) can be sent to the comm-port. When the chip receives the character, it triggers a response character (SyncVerifAck) after reception if the line is synchronized. Since the synchronization of the internal clocks is a purely digital feature, the accurate performance of the analog parts of the chip is not critical. This allowed the use of wreal models for the analog blocks, which in turn provided fast simulation times. Thus, the functionality of the communication channels was verified. The synchronization procedure is presented in figure 5.
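The handshake described above can be summarized as a small state machine. The following Python sketch is only a behavioural illustration: the class name, the use of strings for characters and the rule that any other character resets the count of CC_A characters are assumptions, not details taken from the chip design.

```python
class SyncModel:
    """Toy model of the comm-port synchronization handshake."""

    REQUIRED_CC_A = 3  # consecutive CC_A characters needed, as stated in the text

    def __init__(self):
        self.consecutive_cc_a = 0
        self.synchronized = False

    def receive(self, character):
        """Process one character and return the response, or None."""
        if character == "CC_A":
            self.consecutive_cc_a += 1
            if self.consecutive_cc_a == self.REQUIRED_CC_A:
                self.synchronized = True
                return "SyncAck"
            return None
        self.consecutive_cc_a = 0  # assumed: any other character breaks the run
        if character == "CC_B" and self.synchronized:
            return "SyncVerifAck"
        return None


chip = SyncModel()
responses = [chip.receive(c) for c in ["CC_A", "CC_A", "CC_A", "CC_B"]]
```

In this sketch a CC_B sent before synchronization produces no response, which mirrors the statement that SyncVerifAck is returned only if the line is synchronized.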

Slow control registers
There are 147 16-bit slow control registers in the VFAT3 chip which have read and write functionality. These registers are the interface through which the chip can be configured and operated. The registers control things such as the values of the internal bias DACs, the structure of the outgoing data packets, the setting of the internal calibration pulse and the reading of the values of the two internal ADCs. The register control inside the chip is based on a Wishbone structure. The values of the registers are read and written by sending IPbus packets to the chip through the communication port. The functionality of these registers can be verified by writing bit patterns to the registers and then reading them back and verifying that the bit patterns are equal. All 147 slow control registers of VFAT3 were verified to be functional. The verification of the registers requires only the digital blocks of the chip, so in this case it was also possible to use the wreal models for the analog blocks and gain faster simulation times.
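A write and read-back check of this kind can be sketched in a few lines of Python. The `RegisterFile` class is a hypothetical stand-in for the chip model's slow control interface (the real access goes through IPbus packets on the comm-port), and the four bit patterns are assumed examples rather than the patterns used in the actual verification.

```python
N_REGISTERS = 147       # number of slow control registers quoted in the text
REGISTER_MASK = 0xFFFF  # 16-bit registers

PATTERNS = (0x0000, 0xFFFF, 0xAAAA, 0x5555)  # assumed test patterns


class RegisterFile:
    """Hypothetical stand-in for the chip model's slow control interface."""

    def __init__(self, size):
        self._values = [0] * size

    def write(self, address, value):
        self._values[address] = value & REGISTER_MASK

    def read(self, address):
        return self._values[address]


def verify_registers(device, size, patterns):
    """Return a list of (address, written, read_back) for every mismatch."""
    failures = []
    for address in range(size):
        for pattern in patterns:
            device.write(address, pattern)
            read_back = device.read(address)
            if read_back != pattern:
                failures.append((address, pattern, read_back))
    return failures


failures = verify_registers(RegisterFile(N_REGISTERS), N_REGISTERS, PATTERNS)
```

Alternating patterns such as 0xAAAA and 0x5555 are a common choice because together they toggle every bit and expose shorts between neighbouring bits.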

Triggering and calibration
The VFAT3 chip has a triggering function which allows data packets to be triggered by sending a trigger character to the chip. VFAT3 also has an internal calibration functionality which allows calibration pulses of a certain charge to be injected into the channels under investigation. The pulse can be a voltage pulse, which is sent to the preamplifier through a series of capacitors, creating a delta-like pulse. This type of pulse is similar to a signal from a silicon detector. In this mode the polarity, the amplitude and the phase of the pulse can be controlled. The other type of pulse is a current pulse, which is fed straight to the preamplifier. This type of input is more similar to a signal obtained from a GEM detector. In this mode, the amplitude and the polarity of the pulse can be controlled. Testing this functionality requires the use of mixed-signal methods, since both the analog and the digital domains are in use. The calibration pulse is set up through the digital registers, the pulse is injected into the analog front-end and the output of the front-end is read through the digital domain.
In this case, depending on the required accuracy of the analog front-end, a wreal or Verilog-AMS model could be considered. Since the timing of the input signal was not critical, a wreal-model was used. The analog blocks in the CBM needed only to provide static signals to the front end, so they could be modelled with wreal-models. To test the internal calibration and triggering of the chip, the internal calibration pulse was set up for different channels and a data packet corresponding to the pulse was triggered from the chip. The triggering and calibration functions were found to function as expected.

Data packets
The default data packet from the chip includes a header, a bunch crossing counter value (BC), an event counter value (EC), the hit data from each of the 128 channels and a CRC value for the packet. VFAT3 has a vast array of options for modifying the size of the data packet. The size of the counters can be changed, the data can be zero-suppressed, and whole data packets can be suppressed into just the header byte if no hits are received. The chip also offers a more advanced zero suppression mode, the sequential partial zero suppression (SPZS). In the SPZS mode the channel hit data is divided into 16 partitions. The data section of the packet starts with a partition table, indicating which partitions have hits. The partition table is then followed by the data from the partitions which have hits. The registers controlling the data packet formatting are the following: P16, forcing the data to include only the partition summary (SPZS mode); PAR, defining the maximum number of sent partitions (SPZS mode); DT, for toggling the SPZS mode on and off; SZP, suppressing zero data packets; SZD, suppressing zero data fields; TT, time tag formatting; ECb, setting the EC size from one byte to three bytes; BCb, setting the BC size to two or three bytes. This verification procedure uses the full analog and digital readout chain, so mixed-signal methods were needed. Data packets are created and triggered inside the digital part of the chip, so a netlist-level model was used for the digital part. Calibration pulses were sent to the analog input channels to obtain hits for the data packets, but the accuracy of their performance is not that critical. This allowed the input channels and the CBM to be modeled with wreal models. The data packet formatting functionality was verified by going through every combination of the data packet formatting options and checking whether the structure of the packet was as expected. Two different situations are shown in figure 6.
Using the testing routine, all data packet formatting options were verified to be functional.
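The SPZS data section can be illustrated with a short Python sketch. It assumes that each of the 16 partitions holds 8 neighbouring channels, that bit p of the partition table marks partition p, and that partitions beyond the PAR limit are simply dropped while still being flagged in the table; the real bit ordering and truncation behaviour of VFAT3 may differ.

```python
N_CHANNELS = 128
N_PARTITIONS = 16
CHANNELS_PER_PARTITION = N_CHANNELS // N_PARTITIONS  # 8 channels, one byte


def spzs_encode(hit_channels, max_partitions=N_PARTITIONS):
    """Return (partition_table, data_bytes) for a set of hit channel numbers.

    Bit p of partition_table is set when partition p contains a hit.
    data_bytes holds one byte per transmitted partition, lowest first,
    truncated to max_partitions entries.
    """
    partition_bytes = [0] * N_PARTITIONS
    for channel in hit_channels:
        if not 0 <= channel < N_CHANNELS:
            raise ValueError(f"channel {channel} out of range")
        partition, bit = divmod(channel, CHANNELS_PER_PARTITION)
        partition_bytes[partition] |= 1 << bit

    partition_table = 0
    data_bytes = []
    for partition, value in enumerate(partition_bytes):
        if value:
            partition_table |= 1 << partition
            if len(data_bytes) < max_partitions:
                data_bytes.append(value)
    return partition_table, data_bytes


table, data = spzs_encode({0, 3, 127})
```

With hits in 15 partitions and `max_partitions=9`, the sketch returns only nine data bytes, which corresponds to the data loss situation shown on the right side of figure 6.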

FIFO overflow
VFAT3 is able to accept triggers at a rate of 40 MHz. It has an internal buffer memory that stores the triggered data packets before they are sent out of the chip. When triggers are received more often than data packets can be sent out, the FIFO buffer starts to fill up. When the buffer is full, it is unable to store more data until some of the data is sent out of the chip, and new data is rejected. This situation is referred to as an overflow of the buffer memory. The rate at which the FIFO buffer fills is a function of the trigger rate and the size of the data packet, which defines the time it takes to send out data. The point of overflow is defined as the number of triggers needed before the internal buffer fills up and starts to reject new data. Theoretically, the point of overflow can be formulated as follows:

$$N_{\mathrm{overflow}} = \frac{D}{1 - \dfrac{T}{S + 1}} \qquad (3.1)$$

where $D$ is the FIFO depth, $T$ is the trigger interval in BCs and $S$ is the size of the data packet in bytes. The trigger interval is given in units of the interval between LHC bunch crossings (BC), which is 25 ns for one BC. The overflow point was simulated with several different values of data packet size, with the FIFO depth being 512 and the interval 1 BC (25 ns). It was not possible to simulate all different data packet sizes since some of the formats need incoming data from the channels to be triggered. Results of the simulated point of overflow for different data packet sizes and the theoretical prediction are presented in figure 7. The simulated overflow points closely follow the expected value.
Figure 6. Two different data packet formatting options. On the left side, the packet has the default formatting. On the right side, a zero-suppression mode is activated. In this mode, the chip sends a table of the partitions which have hits. The partitions have also been limited to 9, which causes some data to be lost since 15 partitions have data. If a mismatch is found between the received data and the sent hits, the system raises an error.
An interesting parameter is the maximum trigger rate which the chip can maintain indefinitely without the FIFO overflowing. The default VFAT3 data packet size is 22 bytes. According to equation (3.1), the point of overflow goes to infinity when the trigger interval is 23 BC (575 ns). This corresponds to a trigger rate of 1.7 MHz. Trigger rates higher than that can be maintained only for a certain period of time. Trigger rates of 1.7 MHz and lower can be used indefinitely. This result was verified by using the simulation model. The model requires mostly the digital blocks of the chip, so all of the analog input channels and the blocks inside the CBM could be modeled using the wreal models.
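The arithmetic behind these numbers can be checked with a short Python sketch. It assumes that sending out a packet of S bytes occupies S + 1 bunch crossings, which is what the quoted limit of 23 BC for a 22-byte packet implies, and that the FIFO depth counts whole packets; both are assumptions of this sketch rather than statements about the chip internals.

```python
import math

BC_NS = 25.0  # one LHC bunch crossing in nanoseconds


def overflow_point(fifo_depth, trigger_interval_bc, packet_bytes):
    """Number of triggers before the FIFO overflows, or math.inf if it never does.

    Assumes one packet is sent out every (packet_bytes + 1) bunch crossings.
    """
    send_time_bc = packet_bytes + 1
    fill_per_trigger = 1.0 - trigger_interval_bc / send_time_bc
    if fill_per_trigger <= 0.0:
        return math.inf  # packets leave at least as fast as they arrive
    return fifo_depth / fill_per_trigger


def max_sustained_rate_mhz(packet_bytes):
    """Highest trigger rate that can be kept up indefinitely, in MHz."""
    interval_ns = (packet_bytes + 1) * BC_NS
    return 1e3 / interval_ns


n_default = overflow_point(512, 1, 22)     # triggers on every bunch crossing
rate_default = max_sustained_rate_mhz(22)  # default 22-byte packet
```

Under these assumptions a trigger on every bunch crossing fills a 512-deep FIFO after roughly 535 triggers, and the sustainable rate for the default packet is 1/(575 ns), about 1.74 MHz, in line with the 1.7 MHz quoted above.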

Latency
An important parameter of the chip is the latency between the moment when a signal is received at the analog front-end input and the moment when the trigger signal is sent from the chip. This is defined as the chip's "trigger latency", which is needed when pairing incoming data with the correct bunch crossing. The latency can be studied when the full analog and digital chain is included in the simulation. For the digital functionality of the chip, a netlist-level model was used to achieve more accuracy. The netlist was run by using typical delay values and checked by using maximum and minimum delays. Choosing the right model for the analog front-end was crucial. The model needed to represent the front-end as accurately as possible, but still be lightweight enough to be simulated alongside the digital domain. To optimize the accuracy and simulation time, the Verilog-AMS model was chosen for the input channels. Since the CBM in this case mainly provides static biasing signals, a wreal model was used. Latencies were verified by setting a certain peaking time for the input channel and injecting an input pulse into one of the channels. By observing the time between the input injection and the following trigger signal, the trigger latency was verified. During the simulation, several internal signals were also monitored for a better understanding of the internal timing. One simulation case is presented in figure 8. In the figure, two additional latencies are shown. The first one is the CMS latency, which describes the time between the injected signal and the transmission of the level one accept trigger signal (LV1A) to the chip. Here an arbitrary value was used. The second latency is the chip latency, which is defined as the used depth of the internal FIFO memory. The depth can be controlled by an internal slow control register. During operation, the chip latency should be set to such a value that the CMS LV1A trigger points to the correct event.
The trigger latency changes with different front-end peaking times. Using the simulation, all different peaking time settings were studied and the trigger latencies in bunch crossings (BC) were: 5 BC for a peaking time of 25 ns, 6 BC for a peaking time of 50 ns, 7 BC for a peaking time of 75 ns and 9 BC for a peaking time of 100 ns.
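Since one bunch crossing lasts 25 ns, these values translate directly into the latencies in nanoseconds quoted in the abstract. A minimal Python sketch of the conversion, using only the numbers given in the text:

```python
BC_NS = 25  # duration of one bunch crossing in nanoseconds

# Peaking time in ns -> trigger latency in bunch crossings, from the text.
LATENCY_BC = {25: 5, 50: 6, 75: 7, 100: 9}

latency_ns = {peaking: bc * BC_NS for peaking, bc in LATENCY_BC.items()}

for peaking, latency in latency_ns.items():
    print(f"peaking time {peaking:3d} ns -> trigger latency {latency} ns")
```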

Fixed latency trigger path
VFAT3 has eight trigger output signals, which send out the data from the input channels. The full trigger data is sent out every 25 ns. The channel data has two modes: the normal mode, where the data is sent as a fastOR of every two channels at 320 MHz; and the Double Data Rate (DDR) mode, which sends full granularity data at 640 MHz by sending data on the rising and falling edges of the 320 MHz clock. In the verification of the fixed latency trigger path, the full analog and digital chain is again needed. The digital part of the chip was modeled at the netlist level since it holds most of the functionality for the path. To reduce the simulation time, wreal models were used for the input channels and the CBM. This can be justified since the timing accuracy of the analog components is not crucial in this simulation. The simulation outputs of the two different output modes are presented in figure 9 and figure 10. By simulating the trigger signals with different input patterns, the fixed latency path was verified to be performing as expected.
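The two output modes can be cross-checked against the available bandwidth with a short Python sketch. The pairing of channels 2i and 2i+1 in the fastOR and the function names are assumptions made for illustration; the bit budget follows from the eight outputs, the 320 MHz clock and the 25 ns bunch crossing period given in the text.

```python
N_CHANNELS = 128
N_OUTPUTS = 8    # trigger output lines
BC_NS = 25       # bunch crossing period in ns
CLOCK_MHZ = 320  # trigger link clock


def fast_or(hits):
    """OR together each pair of neighbouring channels (assumed pairing 2i, 2i+1)."""
    if len(hits) != N_CHANNELS:
        raise ValueError("expected one hit flag per channel")
    return [hits[i] | hits[i + 1] for i in range(0, N_CHANNELS, 2)]


def bits_per_bc(ddr):
    """Bits that all trigger outputs together can carry in one bunch crossing."""
    edges_per_bc = CLOCK_MHZ * BC_NS // 1000  # 8 clock cycles per 25 ns
    if ddr:
        edges_per_bc *= 2  # data on both rising and falling edges
    return edges_per_bc * N_OUTPUTS


hits = [0] * N_CHANNELS
hits[5] = 1
hits[126] = 1
normal_word = fast_or(hits)
```

The normal mode carries 64 bits per bunch crossing, exactly the 64 fastOR bits, while the DDR mode carries 128 bits, one per channel.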

Monitoring
There are multiple current and voltage DACs inside the VFAT3 chip which allow the internal biasing of the analog blocks to be adjusted. The chip has an internal monitoring functionality which allows monitoring of several internal currents and voltages using the on-chip ADCs. There are two internal 10-bit successive-approximation-register (SAR) ADCs. One of them has a reference voltage derived from the internal VFAT3 bandgap, and one has a reference voltage tied to VDD through an external high precision resistor. The internal currents and voltages can be chosen for monitoring with the internal slow control registers. The value can then be read out through the internal ADCs. By running this monitoring procedure combined with changing the internal DAC values, the functionality of the internal DACs and the monitoring block was verified.
Mixed-signal methods were essential in the verification of the monitoring feature. The DACs, the monitoring multiplexer and the ADCs are analog components, which are controlled through the digital registers. In this case, the blocks inside the CBM and the monitoring ADCs needed to be simulated by using Verilog-AMS models. Since there was no need to send signals through the analog inputs, they could be modeled by using the wreal models.
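For readers unfamiliar with the SAR principle, the following Python sketch shows an idealized 10-bit successive-approximation conversion. It is a textbook illustration only: the function name and the 1.0 V reference are assumptions, and nothing here describes the actual VFAT3 ADC circuit or its Verilog-AMS model.

```python
def sar_convert(v_in, v_ref, n_bits=10):
    """Successive-approximation conversion of v_in against v_ref.

    Bits are decided from most to least significant: each trial bit is kept
    when the resulting DAC voltage does not exceed the input.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        v_dac = v_ref * trial / (1 << n_bits)
        if v_dac <= v_in:
            code = trial
    return code


V_REF = 1.0  # assumed reference voltage in volts, for illustration only
mid_scale = sar_convert(0.5, V_REF)
full_scale = sar_convert(1.0, V_REF)
zero = sar_convert(0.0, V_REF)
```

Stepping a bias DAC through its range and checking that the converted code follows monotonically is the kind of cross-domain check that the monitoring verification relies on.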

Discussion
The use of mixed-signal simulation methods proved to be a valuable tool in the development of the VFAT3 chip. It allowed the high-level functionality of the VFAT3 to be verified throughout the design phase. The analog and digital domains in the chip have a complex array of interconnects between them. These connections are a common source of errors in a typical chip design. However, by using the mixed-signal simulation methods, all of these connections could be verified by using both domains in the same simulation model. Many of the presented simulation cases required the use of the whole readout chain from the analog input to the digital outputs. The simulations would not have been possible using the old method of simulating analog and digital blocks in their respective domains.
The biggest challenge in the use of mixed-signal methods was the number of different models that needed to be handled and constantly updated. Creating Verilog-AMS and RNM models which accurately describe an analog block can be a time-consuming and tedious task. The models need to be constantly updated as the chip design progresses, and this can take time away from the actual chip design process. Minimum, maximum and typical delays were used when the digital netlist was used in time-critical simulations. The analog models did not take into account corners and Monte Carlo simulations. This is something that could be improved in future work, and models could be defined for analog blocks with the possibility to use different scenarios. When the mixed-signal UVM methods become more mature, they will offer more possibilities for the automation of the verification tasks. In this work, some manual setup changes were still required between tests.
In addition, the use of mixed-signal methods enabled estimation of the user experience already during the design of the chip. This in turn allowed the design team to decide whether changes were needed to improve the compatibility of the chip with the external DAQ systems. Significantly, the main functionality of the VFAT3 chip was verified before the submission to the foundry.
The use of the simulation models made it possible to start developing test systems and routines for the physical chip during the design and manufacturing phase. The design of the test software was already started before submission by running the software along with the simulation models to test the communication and test routines developed for the software.
Due to the co-development of the mixed-signal simulations and the physical test bench, a working test environment was ready to be used for the characterization and verification of the physical chip [17]. The physical chip was fully verified and characterized, and was found to be working as predicted by the simulations [7,20].

Conclusions
A front-end ASIC, VFAT3, was designed for a future detector upgrade of CMS. The VFAT3 design has 790 analog and 172 digital design blocks which have complex interconnections between them. Due to this complexity, it was decided to use mixed-signal simulation methods to ensure the high-level functionality of the chip before the submission to the foundry. With the high-level simulations, the major functionalities of the chip were verified to be working. The latencies for different front-end peaking times are: 5 BC for 25 ns, 6 BC for 50 ns, 7 BC for 75 ns and 9 BC for 100 ns. The maximum trigger rate for reading out standard data packets was confirmed to be 1.7 MHz, which matches the theory. The simulation models were used to start the design of the test environment used for the physical chip before it was received from the foundry.