Slice-Based Analog Design

Advances over the last decades in electronic design automation (EDA) for digital integrated circuits (ICs) have led to a robust set of tools and methodologies that automate almost all low-level phases of the digital design workflow. In contrast, analog IC design remains a mostly manual, time-consuming and knowledge-intensive process. The number of design iterations can be greatly reduced by using realistic value tables through the $g_{m}/I_{D}$ design technique; however, the process still remains time-consuming and error-prone, with an end result of limited applicability beyond the scope of the initial specifications. The slice-based design methodology, first introduced in this paper, is a new approach to analog IC design, suitable for implementation in EDA tools, that aims to reduce the amount of time and expertise required from the user. This methodology, inspired by the $g_{m}/I_{D}$ design technique, is based on the use of pre-designed circuit cells, which can be connected in parallel to scale important performance metrics. Although not limited to any particular fabrication process, the present paper explores the application of the proposed design methodology to CMOS technologies, in the context of a particular target application: low-noise charge-sensitive amplifiers (CSAs) used for instrumentation in particle physics experiments. The methodology was successfully applied and validated through the design, fabrication and testing of a CSA with configurable noise performance.


I. INTRODUCTION
The proliferation of consumer electronics has been a driving factor in the advancement of integrated circuit (IC) design towards increasingly complex circuits and ever smaller process technologies. The move towards design complexity has been aided by a mature and widely available set of tools for Electronic Design Automation (EDA) in the digital domain. To take advantage of these tools, circuit functions are implemented in the digital domain whenever possible. In stark contrast, analog IC design lacks the automation tools that facilitate the design process, and remains essentially handcrafted by analog designers, on technologies typically optimized for digital applications. Due to this comparative disadvantage, it is typically the development cycle of the analog blocks that bottlenecks the design process of complex Systems on a Chip (SoC), even though they occupy only a small fraction of the chip area [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Gian Domenico Licciardo.
A typical electronic design flow for analog and mixed-signal integrated circuits follows a top-down approach. Three levels of abstraction can be readily identified during the design process: the system level, where system specifications are set and functional blocks are identified; the circuit level, where circuit schematics are designed for each functional block; and the layout level, where the circuit layout for each functional block is designed, followed by floorplanning, placement and global routing to generate the layout of the entire system. Simulation and verification steps are performed at each level to account for undesired effects and detect potential problems. If the design fails to meet specifications at some point in the design flow, redesign iterations are performed.
The circuit-level design is particularly challenging, as it often requires a custom optimized design, which is typically an underconstrained problem, with many degrees of freedom, and many often conflicting performance requirements that must be taken into account [2]. To solve this problem effectively and produce an optimized design, an analog designer is required to have an advanced knowledge of device behavior, circuit topologies and design trade-offs. For these reasons, the analog design process is generally perceived to be less systematic, more heuristic, and much more knowledge-intensive than digital design [3].

A. ABOUT SYSTEMATIZATION AND AUTOMATION
While EDA tools for analog design have not reached the level of maturity to be widely adopted, Computer-Aided Design (CAD) tools have been fundamental to tackle the design flow for decades. An analog designer will routinely use circuit simulators (e.g. LTspice [4]), layout editing environments (e.g. Virtuoso [5]) and verification tools (e.g. Calibre [6]) to reach an optimized design.
The optimization of design productivity of analog ICs through EDA tools has provided fertile ground for research since the mid-1980s [3], and continues to be an active research topic, as the productivity gap between analog and digital circuits has not been satisfactorily closed. Three distinct hierarchical levels are identified in the literature for Analog Design Automation (ADA) [3]: topology selection, where the most appropriate circuit topology is selected based on the given specifications; specification translation, where high-level specifications are mapped into sub-blocks, and at the lowest level, into device sizes; and layout generation, the creation of the geometrical layout of the low-level sub-blocks and the place and route of these sub-blocks at a higher level. Comprehensive surveys that describe the historical evolution of these topics can be found in [7] and [1].

B. THE $g_{m}/I_{D}$ DESIGN METHODOLOGY
At the circuit level, analog designers rely heavily on hand analysis and circuit simulators to derive low-entropy expressions [8] suitable for design. Among the techniques that bring some degree of systematization to the process, the $g_{m}/I_{D}$ technique stands out [9], [10]. The technique relies on the use of $g_{m}/I_{D}$ as a design variable, which is a measure of the level of inversion of a transistor, and on the use of tables of $g_{m}/I_{D}$-dependent parameters built from precise simulation results, both of which contribute to a more insightful approach to the design process.
The basics of the $g_{m}/I_{D}$ methodology are simple. Let us consider a transistor of width $W$ and length $L$ biased at a certain operating point, with a gate-to-source voltage $V_{GS}$, drain current $I_{D}$, transconductance $g_{m}$, and gate-to-source capacitance $C_{gs}$. If an identical transistor is connected in parallel, the equivalent device will have a width of $2W$, drain current of $2I_{D}$, transconductance of $2g_{m}$, and gate-to-source capacitance of $2C_{gs}$, while the length $L$, the gate-to-source voltage $V_{GS}$, and the ratio $g_{m}/I_{D}$, a measure of the level of inversion of the channel, remain unchanged. Large values of $g_{m}/I_{D}$ are associated with subthreshold and weak inversion operation, whereas small values are associated with strong inversion operation.
There are several ratios that can be expressed as a function of $g_{m}/I_{D}$, including the transit frequency $\omega_{T}$ (commonly defined as $g_{m}/C_{gs}$), the drain current density $I_{D}/W$, and even the normalized noise $\overline{I_{n}^{2}}/I_{D}$ [11], [12]. The analysis presented in this paper can be extended and applied to any linear circuit whose performance metrics can be described by $g_{m}/I_{D}$.
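The scaling argument for parallel-connected transistors can be checked numerically. The following Python sketch is only an illustration: the `OperatingPoint` class, the `parallel` helper and all numeric values are assumptions made for this example and are not taken from any process data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Operating point of one transistor (all values are hypothetical)."""
    width: float   # W, in metres
    length: float  # L, in metres
    v_gs: float    # gate-to-source voltage, in volts
    i_d: float     # drain current, in amperes
    g_m: float     # transconductance, in siemens
    c_gs: float    # gate-to-source capacitance, in farads

    @property
    def gm_over_id(self) -> float:
        """Inversion-level indicator g_m/I_D, in 1/V."""
        return self.g_m / self.i_d

    @property
    def omega_t(self) -> float:
        """Transit frequency g_m/C_gs, in rad/s."""
        return self.g_m / self.c_gs

    @property
    def current_density(self) -> float:
        """Drain current density I_D/W, in A/m."""
        return self.i_d / self.width


def parallel(op: OperatingPoint, n: int) -> OperatingPoint:
    """Equivalent device for n identical transistors connected in parallel."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return OperatingPoint(
        width=n * op.width,   # widths add up
        length=op.length,     # length is unchanged
        v_gs=op.v_gs,         # node voltages are unchanged
        i_d=n * op.i_d,       # currents add up
        g_m=n * op.g_m,       # transconductances add up
        c_gs=n * op.c_gs,     # capacitances add up
    )


single = OperatingPoint(width=10e-6, length=1e-6, v_gs=0.9,
                        i_d=100e-6, g_m=1e-3, c_gs=20e-15)
double = parallel(single, 2)
print(single.gm_over_id, double.gm_over_id)
print(single.omega_t, double.omega_t)
```

In this sketch, the ratios `gm_over_id`, `omega_t` and `current_density` are the same for `single` and `double`, while the absolute quantities double.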

C. OVERVIEW OF THIS WORK
The present work explores a technique for analog design, namely the slice-based design technique, suitable for implementation in EDA tools at the circuit and layout levels. It does not borrow concepts and techniques traditionally used in the ADA literature, and was instead inspired by the $g_{m}/I_{D}$ design technique.
In order to explore the proposed design technique and as a proof of concept, particle physics instrumentation was selected as the target application, and an IC was designed, fabricated and tested, using a 0.5-µm CMOS technology. The IC prominently includes a configurable charge-sensitive amplifier (CSA), and the main metric used to test the circuit was noise performance.
Although the analysis presented in this work is based around CMOS device models, it can be extended to include bipolar transistors and be applied in the design of BiCMOS ICs. The selection of the CMOS technology for the design of the configurable CSA was merely due to availability.

II. SLICE-BASED DESIGN METHODOLOGY
The slice-based design methodology is a new approach to analog design, introduced here, which aims to help reduce the amount of work and expertise required from the analog designer. The methodology involves the use of a library of optimized circuits to cover different regions of the design space. The circuits in this library are indivisible cells, hereafter referred to as slices, that can be connected in parallel in order to scale important performance metrics. Through the careful selection of the correct slice and number of parallel-connected slices, a wide range of specifications can be met with minimal time investment from the designer. Furthermore, the use of characterized circuit cells minimizes performance uncertainty, reducing the number of ASIC spins necessary to reach an optimized design. Thus, the design of a library of optimized and fully characterized circuit slices is a precondition for this methodology to be of any use.
Given the difficulty of assessing the applicability and practicality of the proposed design methodology to any arbitrary application, it was decided to limit the scope of the analysis to a particular target application: amplifiers used in particle physics instrumentation. This analysis can be adapted to other applications as well.

A. THE EFFECTS OF CONNECTING CIRCUITS IN PARALLEL
The basis of the slice-based design technique is the parallel connection of previously optimized complex circuit blocks, in order to meet load, noise and other relevant specifications that scale with parallel connection, at the expense of power consumption and die area. Neither the idea of connecting circuits in parallel to increase drive capability nor the trade-off between power consumption and noise performance is new in IC design [13], [14]. However, the idea of connecting large and complex circuits in parallel in order to scale circuit performance, as an approach to the design process, has not been found in the literature.

1) THE GENERAL CASE
When an arbitrary linear circuit (namely, circuit A) is connected in parallel to an identical copy of itself, i.e. each of the $N$ nodes of the circuit is connected to the corresponding node of the identical copy, some figures of merit and quantities of the resulting circuit (namely, circuit B) change, whereas others stay the same. The concept is illustrated in Figure 1. For example, all the $N$ node voltages remain unchanged, whereas all $M$ branch currents are doubled:
$$V_{k}^{B} = V_{k}^{A}, \quad k = 1, \ldots, N; \qquad I_{k}^{B} = 2 I_{k}^{A}, \quad k = 1, \ldots, M$$
As a consequence of this, all impedance elements $Z_{k}$ are halved whereas all admittance elements $Y_{k}$ are doubled. The latter includes transconductances as well:
$$Z_{k}^{B} = \frac{Z_{k}^{A}}{2}, \qquad Y_{k}^{B} = 2 Y_{k}^{A}, \qquad g_{m,k}^{B} = 2 g_{m,k}^{A}$$
This applies to both explicit passive elements (e.g. resistors, capacitors and inductors) and equivalent node impedances. Naturally, with the doubling of branch currents, power consumption is doubled as well. The operating point of all MOS devices stays the same, as the $g_{m}/I_{D}$ ratio remains unchanged.
The open-loop gain of an amplifier can be expressed as the product of the circuit effective transconductance $G_{meff}$ and the output resistance $R_{out}$. On the parallel connection of identical circuits, the former increases and the latter decreases, while the open-loop gain remains constant:
$$A_{v}^{B} = G_{meff}^{B} R_{out}^{B} = \left(2 G_{meff}^{A}\right) \left(\frac{R_{out}^{A}}{2}\right) = A_{v}^{A}$$
The same is true for the amplifier bandwidth. The equivalent capacitance of the dominant pole increases twofold, while the equivalent resistance seen by the capacitor is halved:
$$\omega_{p}^{B} = \frac{1}{\left(R^{A}/2\right) \left(2 C^{A}\right)} = \frac{1}{R^{A} C^{A}} = \omega_{p}^{A}$$
However, it is common for the bandwidth of a circuit to be set by an externally connected load. As long as the load is also parallel-connected, the bandwidth is maintained; otherwise, the bandwidth changes. Nonetheless, the resulting parallel-connected amplifier has twice the drive capability of the single circuit.
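The behavior of the gain and of the dominant pole under parallel connection can be sketched numerically. The Python function below is a hedged illustration only: `scaled_amplifier`, its parameters and the numeric values are assumptions for this example, and the amplifier is reduced to a single-pole model with an internal capacitance `c_int` that scales with the number of slices and an external capacitance `c_ext` that does not.

```python
import math


def scaled_amplifier(n, g_meff, r_out, c_int, c_ext=0.0):
    """Gain and dominant pole of n identical single-pole slices in parallel.

    g_meff, r_out and c_int describe ONE slice (hypothetical values);
    c_ext is an external load that does not scale with n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    g_total = n * g_meff          # transconductances add up
    r_total = r_out / n           # resistances are divided by n
    c_total = n * c_int + c_ext   # only the internal capacitance scales
    gain = g_total * r_total
    pole_hz = 1.0 / (2.0 * math.pi * r_total * c_total)
    return gain, pole_hz


# Without external load: gain and bandwidth are both independent of n.
print(scaled_amplifier(1, 1e-3, 1e6, 1e-12))
print(scaled_amplifier(4, 1e-3, 1e6, 1e-12))

# With a fixed external load: the gain is unchanged, the pole moves up.
print(scaled_amplifier(1, 1e-3, 1e6, 1e-12, c_ext=10e-12))
print(scaled_amplifier(4, 1e-3, 1e6, 1e-12, c_ext=10e-12))
```

The second pair of calls illustrates the caveat stated in the text: when the load is not parallel-connected, the bandwidth is no longer preserved.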

3) NOISE ANALYSIS
In terms of noise, the effects of connecting circuits in parallel are more involved. Let us consider an arbitrary linear circuit with a single noise generator. The simplest example is a resistor, which generates thermal noise. Its power spectral density (PSD) is directly proportional to its resistance when expressed as a voltage variance, whereas when expressed as a current variance, it is inversely proportional to its resistance:
$$\overline{V_{n}^{2}} = 4 k T R, \qquad \overline{I_{n}^{2}} = \frac{4 k T}{R}$$
where $k$ is the Boltzmann constant and $T$ is the absolute temperature. When connecting two identical copies of a circuit in parallel, all equivalent resistance values are halved, which in turn means that voltage noise power is halved as well, while current noise power is increased by a factor of two. Since voltage signals remain unchanged, the signal-to-noise ratio (SNR), expressed as a ratio of squared voltage signals, is increased by a factor of two. And since current signals double, current signal power quadruples, and the SNR, expressed as a ratio of squared current signals, is increased by a factor of two as well.
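As a quick numerical check of the resistor example, the Python sketch below evaluates the standard thermal-noise expressions $4kTR$ and $4kT/R$ for a resistor and for two such resistors in parallel. The function name, the resistance value and the default temperature of 300 K are assumptions chosen for illustration.

```python
K_B = 1.380649e-23  # Boltzmann constant, in J/K


def thermal_noise_psd(resistance_ohm, temperature_k=300.0):
    """Return (voltage PSD in V^2/Hz, current PSD in A^2/Hz) of a resistor."""
    v2 = 4.0 * K_B * temperature_k * resistance_ohm
    i2 = 4.0 * K_B * temperature_k / resistance_ohm
    return v2, i2


r_single = 10e3               # one 10 kOhm resistor (hypothetical value)
r_parallel = r_single / 2.0   # two identical resistors in parallel

v2_a, i2_a = thermal_noise_psd(r_single)
v2_b, i2_b = thermal_noise_psd(r_parallel)

# The voltage signal is unchanged, so the voltage-domain SNR gain is v2_a / v2_b.
snr_gain_voltage = v2_a / v2_b
# The current signal doubles (power x4) while the current noise power doubles.
snr_gain_current = 4.0 / (i2_b / i2_a)
print(snr_gain_voltage, snr_gain_current)
```

Both SNR gains evaluate to a factor of two, in agreement with the argument above.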
Let us consider now the case of a single MOSFET, with a drain current of $I_{D}$, as the noise generator. It can be shown [11] that MOSFET voltage and current noise, including thermal noise (for strong inversion), shot noise (for weak inversion) and flicker noise, can be normalized and expressed as $\overline{V_{n}^{2}} = \hat{V}_{n}^{2}/I_{D}$ for voltage noise, and as $\overline{I_{n}^{2}} = \hat{I}_{n}^{2} \cdot I_{D}$ for current noise, where $\hat{V}_{n}^{2}$ and $\hat{I}_{n}^{2}$ are normalized voltage and current power spectral densities, which can be expressed as a sole function of $g_{m}/I_{D}$. An example of this can be seen in Table 1, which shows the normalized PSD $\hat{V}_{n}^{2}$ for the different noise processes of a MOSFET, expressed as functions of $g_{m}/I_{D}$. In other words, for a constant $g_{m}/I_{D}$ value, MOSFET voltage noise variance is inversely proportional to the drain current, while current noise variance is directly proportional to the drain current. When connecting two identical copies of a circuit in parallel, the equivalent transistor drain current doubles while the $g_{m}/I_{D}$ value remains unchanged, which in turn means that voltage noise variance is halved, while current noise variance is increased by a factor of two. As in the resistor example, the SNR is increased by a factor of two, for both voltage and current noise.
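The normalization can be illustrated for one noise process. The Python sketch below assumes the textbook long-channel, strong-inversion model for gate-referred thermal noise, $4kT\gamma/g_{m}$ with $\gamma = 2/3$; this model, the function names and the numeric values are assumptions made for this example and are not necessarily the expressions listed in Table 1.

```python
K_B = 1.380649e-23  # Boltzmann constant, in J/K


def normalized_thermal_vn2(gm_over_id, gamma=2.0 / 3.0, temperature_k=300.0):
    """Normalized gate-referred thermal noise PSD, in V^2*A/Hz.

    Assumes the long-channel model V_n^2 = 4*k*T*gamma / g_m, rewritten as
    (4*k*T*gamma / (g_m/I_D)) / I_D, so the normalized term depends only
    on g_m/I_D.
    """
    return 4.0 * K_B * temperature_k * gamma / gm_over_id


def thermal_vn2(gm_over_id, i_d, gamma=2.0 / 3.0, temperature_k=300.0):
    """Gate-referred thermal noise PSD, in V^2/Hz, for a drain current i_d."""
    return normalized_thermal_vn2(gm_over_id, gamma, temperature_k) / i_d


gm_over_id = 10.0   # 1/V, hypothetical operating point
i_single = 100e-6   # drain current of one device, in amperes

vn2_one = thermal_vn2(gm_over_id, i_single)
vn2_two = thermal_vn2(gm_over_id, 2.0 * i_single)  # two devices in parallel
print(vn2_one, vn2_two, vn2_two / vn2_one)
```

Because $g_{m}/I_{D}$ is held constant, doubling the drain current halves the voltage noise PSD in this model.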
Let us consider now a linear circuit with an arbitrary number of noise generators. The total noise on the output node of the circuit can be computed as follows:
$$\overline{V_{n,out}^{2}}(f) = \sum_{i} \left| H_{i}(f) \right|^{2} \, \overline{V_{n,i}^{2}}(f)$$
where $\overline{V_{n,i}^{2}}(f)$ is the PSD of each individual noisy device, and $H_{i}(f)$ is the transfer function from each individual noise source to the output. Since the noise generators, transistors or resistors, are independent, their noise contributions are uncorrelated and are added in quadrature. For two parallel-connected circuits, it is immediately apparent that, since the transfer functions remain unchanged and the PSD of each individual noisy device is halved, the noise power measured at the output node of the circuit is also halved. The same is also true for the total integrated noise of the circuit, that is, the integral of the PSD over frequency, as long as the circuit bandwidth remains constant.
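The quadrature sum of uncorrelated noise sources can be written in a few lines. The Python sketch below evaluates the output noise at a single frequency; the function name, the list of `(|H_i|, PSD_i)` pairs and their values are hypothetical, and the division by `n_slices` encodes the assumption that each voltage noise PSD scales inversely with the number of parallel-connected slices while the transfer functions stay the same.

```python
def output_noise_psd(sources, n_slices=1):
    """Spot output noise PSD for n_slices parallel-connected circuit copies.

    sources: iterable of (h_mag, v2_psd) pairs for ONE slice, where h_mag is
    the magnitude of the transfer function to the output and v2_psd is the
    voltage noise PSD of that source, in V^2/Hz.
    """
    if n_slices < 1:
        raise ValueError("n_slices must be a positive integer")
    # Uncorrelated sources add in power; each PSD is divided by n_slices.
    return sum((h_mag ** 2) * (v2_psd / n_slices) for h_mag, v2_psd in sources)


# Three hypothetical noise sources of a single slice.
slice_sources = [(100.0, 4e-18), (20.0, 9e-18), (5.0, 1e-17)]

for n in (1, 2, 4, 8):
    print(n, output_noise_psd(slice_sources, n))
```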

B. THE DESIGN METHODOLOGY

1) THE PRE-DESIGN STAGE
A prerequisite to the application of the slice-based design methodology is the compilation of a library of optimized and fully characterized slices. Each slice will include a transistor-level schematic, physical layout and documentation. This task is done by an analog designer through the standard analog design workflow, with all its inherent difficulties. The difference is that, once the design is done, it can be re-used in the future, as it was designed from the ground up for scalability.
At the circuit level, each slice will be optimized individually to meet a set of specifications, so that different slices cover different corners of the design space, e.g. maximum gain-bandwidth product, minimum noise, etc., in the case of amplifiers. Each device in the slice will have its operating point defined by its current and its $g_{m}/I_{D}$. The small-signal performance of the slice can be computed as a function of the resistances, transconductances and capacitances of individual devices. These equations can include node impedances, poles, effective transconductances and input-referred noise, among others, and will be part of the documentation for the slice. These equations can be recomputed and tabulated for increasing branch currents as the slices are connected in parallel. Currents can only increase in integer multiples of the unit slice current, as the slice is the indivisible unit. Arbitrary scaling of a particular slice is still possible at the circuit level in order to achieve an optimized result for a particular application; however, this would require a custom layout. For each slice, the corresponding bias circuit can be either integrated into the slice itself, or made into a separate, independent slice.
At the layout level, each slice needs to be designed from the ground up in a way that facilitates the parallel connection of multiple layout blocks. One such way is shown in Figure 2(a). This scheme represents a top view of the circuit layout, where each slice can be seen as a two-dimensional object, implemented in a rectangular shape, with inputs and outputs on the sides, and all internal node connections running vertically, i.e. parallel to the y-axis. It may be convenient during the design process to reserve one or multiple metal layers for the internal node traces used for parallel connection, and to focus on a compact design on the remaining layers. This approach favors functionality and simplicity, and serves as a proof of concept for the proposed design methodology. The optimal geometry for objectives such as intra-cell device matching or minimum die area is outside the scope of the present document and requires further analysis.

2) THE DESIGN STAGE
With a library of optimized circuits at hand, the design process can begin. The IC designer will pick a pre-designed slice according to the required specifications, and will scale it by connecting a number of copies in parallel, in order to meet load, noise and other relevant specifications that scale with parallel connection. Once the slice and the number of parallel-connected circuits have been selected, the design can be validated through SPICE simulations. Should any incremental change be required on the slice, it is possible to tweak the biasing circuit to adjust all the currents. Through the use of $g_{m}/I_{D}$ curves, the new current density $I_{D}/W$ can be computed for all devices, from which the $g_{m}/I_{D}$ value can be obtained, and all performance equations can be re-evaluated. This results in a circuit with a new set of specifications, which must therefore be carefully evaluated by the designer.
VOLUME 9, 2021
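The selection of the number of parallel-connected slices can be sketched as a small search. The Python function below is a hypothetical helper, not part of any EDA tool: it assumes that the noise power of a slice scales as $1/N$ and its power consumption as $N$, as discussed for parallel-connected circuits, and it returns the smallest integer number of slices that meets a noise target.

```python
def select_slice_count(noise_single, noise_target, power_single, max_slices=64):
    """Smallest number of parallel slices whose noise power meets the target.

    noise_single: noise power of ONE slice (any consistent unit, e.g. nV^2/Hz)
    noise_target: maximum acceptable noise power, in the same unit
    power_single: power consumption of ONE slice, in watts
    Returns (n, total_power). All inputs are hypothetical library data.
    """
    if noise_single <= 0 or noise_target <= 0:
        raise ValueError("noise values must be positive")
    for n in range(1, max_slices + 1):
        # Noise power scales as 1/n, power consumption as n.
        if noise_single / n <= noise_target:
            return n, n * power_single
    raise ValueError("target not reachable within max_slices")


n, total_power = select_slice_count(noise_single=9.0, noise_target=2.5,
                                    power_single=1.0e-3)
print(n, total_power)
```

In this example three slices give a noise power of 3.0, which misses the target of 2.5, so four slices are selected; the result would still need to be validated through simulation as described above.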
Having the circuit-level design, the next step is layout. Figure 2(b) shows how two slices can be stacked and abutted, and the same scheme can be extended to any number of slices. Likewise, if the biasing cell is not part of the circuit slice, it can be placed in the middle of the stack. This task can be efficiently automated through EDA tools, and by the use of a library of pre-designed and fully characterized circuit slices, the subsequent verification procedure for the resulting layout would become a mere sanity check. Through this design procedure, the time involved in the analog design blocks will be minimized, along with the associated uncertainties.

C. ADJUSTABLE PERFORMANCE
Another benefit of the slice-based design technique is the possibility of designing circuits that can scale dynamically according to real-time performance requirements. Through the use of switch banks to connect and power on different numbers of parallel-connected slices, the number of active copies of a circuit can be adjusted to meet changing specifications (e.g. load) while minimizing power consumption.

D. POTENTIAL ISSUES
There are some caveats with the layout implementation that become apparent after careful analysis. First, there are some inherent parasitic components in the stackable layout due to the traces that connect the parallel slices, which might have an effect on performance depending on the circuit. Second, depending on the number of parallel-connected slices, the distance between slices might become large enough for the mismatch related to process gradients to become significant. Third, mismatch might also cause voltage differences between nominally identical nodes, which would translate into current flow through the wires that connect the parallel-connected slices. The latter point is not exclusive to gradient-related mismatch, but can also occur due to size-related mismatch. Although a thorough analysis and understanding of each one of these caveats is desirable, only the effects of gradient-related mismatch are further studied in the present document, given its apparent relevance to the obtained measurement results. It is possible that, depending on the specific technology, circuit topology and application, other issues might become dominant, such as wire parasitics.

III. NOISE IN PARTICLE PHYSICS EXPERIMENTS
The design of an optimal front-end circuit for particle physics experiments is a well-understood but deeply complex problem, as it involves a wide range of considerations and parameters at the system, circuit and layout levels. Every new problem requires a fully custom solution. If any parameter in the application then changes, the design will no longer be optimal, either failing to meet some of the required specifications and/or consuming too much power. For these reasons, it is not efficient to reuse previous designs for a new problem, which makes particle physics instrumentation an attractive candidate to test the proposed design methodology. In this section, a brief summary of relevant concepts for particle physics instrumentation and noise analysis is presented.

A. ELECTRONICS FOR PARTICLE PHYSICS EXPERIMENTS
Although particle physics detector systems can take many different forms, their associated electronics perform the same basic functions [15]. The signal from the detector channel must be acquired, amplified, filtered and stored for subsequent analysis. A typical channel of a generic particle physics detector system includes the detector, an amplifier, a filter, an analog-to-digital converter (ADC), and a readout circuit [16]. Figure 3 shows a simplified block diagram for a generic detector channel.

B. THE ANALOG FRONT-END
The detector converts the energy deposited by an incident particle into an electrical signal, typically in the form of a finite amount of electrical charge $Q_{in}$ proportional to the absorbed energy. The front-end amplifier translates the electrical charge generated by the detector into a voltage signal. The charge-to-voltage translation is done by transferring the charge $Q_{in}$ from the nonlinear capacitance of the detector $C_{D}$ to a known capacitor $C_{F}$. The output voltage $V_{out}$ of the amplifier is given by $V_{out} = Q_{in}/C_{F}$, and the gain of the amplifier is naturally measured in [V/C] or [F$^{-1}$]. The most common preamplifier implementation consists of a voltage amplifier with a capacitor in negative feedback configuration. The resulting feedback circuit is a charge-sensitive amplifier (CSA), which has been extensively studied in the literature related to particle physics instrumentation [12], [17]-[20].
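A short worked example may help fix the orders of magnitude of the charge-to-voltage conversion. The Python sketch below applies $V_{out} = Q_{in}/C_{F}$ to an input charge expressed in electrons; the function name and the values of 10,000 electrons and 1 pF are assumptions for illustration, and the amplifier is treated as ideal.

```python
Q_E = 1.602176634e-19  # elementary charge, in coulombs


def csa_output_voltage(n_electrons, c_f):
    """Ideal CSA output step, in volts, for an input charge of n_electrons.

    c_f is the feedback capacitance in farads. Finite open-loop gain and
    the reset element are ignored in this sketch.
    """
    q_in = n_electrons * Q_E
    return q_in / c_f


v_out = csa_output_voltage(n_electrons=10_000, c_f=1e-12)
print(f"V_out = {v_out * 1e3:.3f} mV")
print(f"charge gain = {1.0 / 1e-12:.3e} V/C")
```

With these hypothetical values the output step is about 1.6 mV, which illustrates why the noise performance of the front-end is the central design concern.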
The output of the amplifier is fed to a filter, the primary function of which is to improve the signal-to-noise ratio (SNR) by applying a bandpass response that favors the signal while attenuating the noise [16]. Given that the filter also changes the time response of the input signal, the terms pulse shaper and filter are often used interchangeably. The pulse shaper is typically an analog block, either time-invariant or time-varying, which sets both the speed and the total noise of the output signal before digitization. Given that the output of the CSA is a voltage step, the pulse shaper is typically characterized by its step response $g(t) = u(t) * h(t)$, where $u(t)$ is the Heaviside step function and $h(t)$ is the impulse response of the filter. In a typical pulse-shaping filter, the pulse shape $g(t)$ has a clearly defined maximum value at $t = \tau_{P}$, referred to as the peaking time.
Figure 4 shows the schematic of a typical front-end circuit. The detector is modeled as a current signal source in parallel with a capacitance $C_{D}$. The CSA is modeled as a voltage amplifier of gain $A(j\omega)$, input capacitance $C_{gg}$, and feedback capacitor $C_{F}$. The pulse-shaping filter is modeled as a transfer function $H(j\omega)$. The resistor $R_{rst}$ shown in Figure 4 represents the reset element of the circuit, whose purpose is to discharge the feedback capacitor. This reset element can be either a gate-controlled switch, for an instantaneous discharge, or a resistor, for a continuous-time discharge.

C. EQUIVALENT NOISE CHARGE
One common figure of merit used to describe the noise performance of a front-end circuit is the equivalent noise charge (ENC) [21], measured in number of electrons. It is defined as the number of electrons of input charge necessary to produce an output signal-to-noise ratio (SNR) equal to 1:
$$ENC = \frac{\sqrt{\overline{V_{O,N}^{2}}}}{v_{o,e}}$$
where $\overline{V_{O,N}^{2}}$ is the total integrated output-referred noise of the analog front-end, and $v_{o,e}$ is the peak amplitude of the output signal produced by a single electron of input charge in the absence of noise:
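The ENC definition translates directly into a ratio between the rms output noise and the output amplitude per electron. The Python sketch below is a minimal illustration; the function name and the example values (1 mV rms of output noise and 2 µV of peak output per electron) are assumptions, and $v_{o,e}$ is simply taken as a given input rather than derived from the circuit.

```python
import math


def equivalent_noise_charge(v2_out_noise, v_peak_per_electron):
    """ENC in electrons.

    v2_out_noise: total integrated output-referred noise, in V^2
    v_peak_per_electron: peak output amplitude for one electron, in volts
    """
    if v2_out_noise < 0 or v_peak_per_electron <= 0:
        raise ValueError("invalid noise or signal amplitude")
    # SNR = 1 when the signal amplitude equals the rms output noise.
    return math.sqrt(v2_out_noise) / v_peak_per_electron


enc = equivalent_noise_charge(v2_out_noise=(1e-3) ** 2,
                              v_peak_per_electron=2e-6)
print(f"ENC = {enc:.1f} electrons")

# Halving the output noise POWER lowers the ENC by a factor of sqrt(2).
enc_half = equivalent_noise_charge(0.5 * (1e-3) ** 2, 2e-6)
print(enc / enc_half)
```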

D. NOISE IN PULSE PROCESSORS
Let us consider the schematic for noise analysis presented in Figure 5. There are two noise sources in the circuit: the detector noise, with a PSD of $\overline{I_{Det}^{2}}$; and the amplifier noise, modeled as two correlated noise sources, with a PSD of $\overline{V_{A}^{2}}(j\omega)$ and $\overline{I_{A}^{2}}(j\omega) = |j\omega C_{gg}|^{2} \cdot \overline{V_{A}^{2}}(j\omega)$, respectively. The amplifier noise PSD includes the input-referred contributions of all noisy devices in the amplifier.
The detector introduces shot noise into the circuit, whereas the amplifier introduces both thermal and flicker noise. Both the detector shot noise and the amplifier thermal noise are white noise processes, while the amplifier flicker noise is frequency-dependent ($1/f$). Thus, the PSD of the amplifier noise can be written as a function of two independent noise terms, as follows: where $K_{F}$ is the flicker noise coefficient, and $A_{F}$ is the flicker noise exponent. To simplify the analysis, it will be assumed that $A_{F} = 1$.
Through straightforward circuit analysis, it is possible to refer all the noise contributions to the output of the circuit in Figure 5, and by integrating the resulting expression over frequency, the total integrated output noise can be computed, yielding the following expression: where $C_{D+gg} = C_{D} + C_{gg}$ is the total shunt capacitance at the input node, and are the normalized noise coefficients for parallel (shot), white and flicker noise, respectively [19]. The function $g_{n}(t)$ is a time-normalized (i.e. with a peaking time of $t = 1$) version of $g(t)$. These coefficients are independent of the timing parameter of the filter $\tau_{P}$, which has been made explicit in (12), and thus can be conveniently tabulated. A table for a variety of commonly adopted linear time-invariant (LTI) filters can be found in [19].

IV. APPLICATION TO THE DESIGN OF A CONFIGURABLE CSA
In order to evaluate practical design considerations and measure the real-world performance of a circuit designed using the proposed design technique, a front-end circuit for particle physics experiments was implemented on a custom integrated circuit and a printed circuit board (PCB). The chip, which was designed in a 0.5-µm CMOS technology, prominently includes a configurable CSA, to measure the scaling behavior of the output noise as an increasing number of amplifier slices are connected in parallel.
A. SYSTEM-LEVEL DESIGN
Figure 6 shows the block diagram of the testing system used to measure the noise performance of the integrated circuit. The block diagram illustrates, for the most part, a typical implementation of a pulse processing chain. Figure 6 also shows the naming convention to be used in the rest of this document: the IC, the test board, and the test system, in increasing hierarchical order. The circuit was tested without a detector, which was instead emulated through an explicit large capacitance $C_{D}$ on the test board, and a pre-charger circuit to inject a precise amount of electrical charge. The amount of charge deposited by the pre-charger is controlled by a 12-bit digital-to-analog converter (DAC) located on the test board.
Figure 7 shows the circuit block diagram of the chip. The circuit can be divided into four functional blocks: the pre-charger circuit, the configurable amplifier, the feedback network and the output buffer. The combination of the amplifier and the feedback network comprises the CSA.
The configurable amplifier consists of eight identical amplifier slices that can be connected in parallel via switches. The connection of the different amplifier slices is done in thermometer mode, that is, where the numbers correspond to the numbering of the amplifier slices shown in Figure 7.
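A thermometer-coded selection can be sketched in a few lines. The Python function below is an illustration under a stated assumption: that enabling $n$ slices connects slices 1 through $n$ and leaves the remaining ones disconnected. The function name and the boolean switch representation are hypothetical and do not describe the actual control interface of the chip.

```python
def thermometer_code(n_active, n_total=8):
    """Switch states for n_total slices with the first n_active connected.

    Returns a list of booleans; element k corresponds to slice k + 1.
    Assumes slices are always enabled in ascending order (thermometer mode).
    """
    if not 1 <= n_active <= n_total:
        raise ValueError("n_active must be between 1 and n_total")
    return [k < n_active for k in range(n_total)]


for n in range(1, 9):
    states = thermometer_code(n)
    print(n, "".join("1" if s else "0" for s in states))
```

Each additional step connects exactly one more slice and never disconnects a previously connected one, which is the defining property of a thermometer code.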
The CSA is very sensitive to load capacitances, which can limit the amplifier bandwidth, cause instability, and introduce output slewing. A voltage buffer was added on the signal path to prevent excessive loading on the CSA output.

B. CIRCUIT-LEVEL DESIGN
The CSA was designed around the folded-cascode topology, with an N-channel MOSFET input device. Figure 8 shows the schematic of the single-ended, NMOS-input folded-cascode amplifier used in the design of each individual amplifier slice of the integrated circuit. It consists of 5 transistors: the input transistor $M_{I}$, the folding transistor $M_{F}$, the cascode for the input and folding transistors $M_{CF}$, the load transistor $M_{L}$, and the cascode for the load transistor $M_{CL}$.
In the folded-cascode topology, the output DC operating voltage, commonly referred to as signal baseline in particle physics instrumentation, is defined by the gate-to-source voltage $V_{GS}$ of the input device when connected in a DC negative feedback configuration. This value is near the device threshold voltage $V_{th}$ for an NMOS input transistor.
Figure 9 shows the schematic of the bias circuit used to generate the bias voltages $\{V_{F}, V_{CF}, V_{CL}, V_{L}\}$ for the folded-cascode amplifier. The bias circuit was included in each amplifier slice, instead of using a single bias circuit for the equivalent amplifier composed of all parallel-connected slices. This was mainly done to favor simplicity of implementation. A single bias resistor $R_{B}$ was implemented off-chip, using a potentiometer, to bias all the amplifier slices at the same time. This means that all amplifier slices are always powered on, even when they are not connected in parallel.

1) GAIN, TRANSCONDUCTANCE AND OUTPUT RESISTANCE
The low-frequency, open-loop gain of the amplifier $A_{v}$ can be computed as the product of the effective transconductance $G_{meff}$ and the output resistance $R_{out}$ of the amplifier. Through straightforward circuit analysis, and assuming that the transistors have large intrinsic gain ($g_{m} r_{o} \gg 1$), the values of $G_{meff}$ and $R_{out}$ can be computed to be:
$$G_{meff} \approx g_{mI} \quad (16)$$
$$R_{out} \approx \left(g_{mCL} \, r_{oCL} \, r_{oL}\right) \parallel \left(g_{mCF} \, r_{oCF} \left(r_{oI} \parallel r_{oF}\right)\right) \quad (17)$$
From (16) and (17) it can be inferred that, if two amplifier slices are connected in parallel, the gain of the equivalent amplifier is the same as the gain of a single slice, given that $G_{meff}$ increases in the same proportion that $R_{out}$ decreases.
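The gain invariance of the folded-cascode amplifier can be verified numerically. The Python sketch below evaluates the output resistance expression (17) and assumes the standard folded-cascode approximation $G_{meff} \approx g_{mI}$ for the effective transconductance; the function names, the dictionary layout and every small-signal value are hypothetical and chosen only for illustration.

```python
def par(a, b):
    """Parallel combination of two resistances."""
    return a * b / (a + b)


def folded_cascode(params, n=1):
    """Return (G_meff, R_out, A_v) for n parallel-connected slices.

    params holds the small-signal values of ONE slice:
      params["gm"]: transconductances of M_I, M_CL and M_CF, in siemens
      params["ro"]: output resistances of M_I, M_F, M_CF, M_CL, M_L, in ohms
    """
    gm = {k: n * v for k, v in params["gm"].items()}  # g_m scales with n
    ro = {k: v / n for k, v in params["ro"].items()}  # r_o scales with 1/n
    g_meff = gm["I"]                                  # assumed G_meff ~ g_mI
    r_load = gm["CL"] * ro["CL"] * ro["L"]
    r_fold = gm["CF"] * ro["CF"] * par(ro["I"], ro["F"])
    r_out = par(r_load, r_fold)
    return g_meff, r_out, g_meff * r_out


slice_params = {
    "gm": {"I": 1e-3, "CL": 0.5e-3, "CF": 0.5e-3},
    "ro": {"I": 100e3, "F": 100e3, "CF": 200e3, "CL": 200e3, "L": 100e3},
}

for n in (1, 2, 8):
    g, r, a = folded_cascode(slice_params, n)
    print(n, g, r, a)
```

The intrinsic gains $g_{m} r_{o}$ are unchanged by the scaling, so $R_{out}$ falls as $1/n$ while $G_{meff}$ grows as $n$, and the product stays constant.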

2) FREQUENCY RESPONSE
The frequency response of the open-loop folded-cascode amplifier it typically dominated by a single pole in the output node, defined by the output resistance R out and a load capacitance C L .
Let us consider Figure 10, which shows the small-signal schematic of the closer-loop CSA. The transfer function of this circuit can be computed to be: where represents static attenuation, as 1 − γ ol represents the static error due to the finite open-loop gain of the amplifier, and are the zero and non-trivial pole of the transfer function.
Considering a large open-loop gain A v = G meff R out , (21) can be approximated as From (22) it can be inferred that, unlike the amplifier gain, the closed-loop bandwidth of the CSA does not remain constant when two identical circuit copies are connected in parallel, and instead increases due to the change in G meff , although the exact proportion is not immediately apparent. This is due to the CSA bandwidth being in part set by external capacitances, such as the detector capacitance C D , the external component to the load capacitance C L , and the feedback capacitance C F , which do not change as an increasing number of slices are connected in parallel.
Regardless, the speed of the overall circuit is set by the peaking time of the pulse-shaping filter, which limits the speed of the CSA. It is desirable that the CSA time constant be significantly smaller than the peaking time, so as not to slow down the nominal speed of the circuit.

3) INPUT-REFERRED NOISE
The amplifier input-referred noise can be expressed as a sum of the input-referred contributions of all individual transistors, as follows:

$$V_{A}^{2} = \sum_{i} V_{n,i}^{2}\, H_{i}^{2} \qquad (23)$$

where $V_{n,i}^{2}$ is the gate-referred noise power of each transistor, and $H_{i}^{2}$ is the low-frequency, open-loop transfer function of each individual transistor to the input of the folded-cascode amplifier. The frequency response of each individual transfer function was assumed to be dominated by external factors. The gate-referred noise power $V_{n,i}^{2}$ can include thermal and flicker noise processes.
As an example, the transfer function of each of the five transistors of the NMOS-input folded-cascode amplifier can be computed to be: From (24)-(28) it can be observed that, if two amplifier slices are connected in parallel, the resulting transfer function towards the input node for each individual transistor would not change. Conversely, the gate-referred PSD of each transistor would be halved, and consequently, the total input-referred noise would be halved as well.
In practice, the bias circuit also introduces noise into the circuit, unless large bypass capacitors are added on the DC bias nodes. In the case of the integrated circuit that was designed, the bias circuit does have a considerable effect on the total noise at the output, but given that it is included in each slice, it does not compromise the validity of the results.

Figure 11 shows the schematic of the parallel-connection scheme used for the amplifier slices in the IC. All internal nodes of the amplifier, i.e. all nodes with the exception of the input and the output, are connected between adjacent slices using a bank of CMOS switches controlled via a single control signal $sw_n$. As for the input and output nodes, each slice is connected to and disconnected from a common wire using CMOS switches, so that all amplifier slices see the same signal path. The CMOS switches were designed to have a low series resistance.

5) PULSE-SHAPING FILTER
A linear time-invariant (LTI) CR-2RC pulse-shaping filter was implemented on the test PCB using discrete components. The peaking time of the filter is τ P = 20 µs. Two gain stages are also included to compensate for the attenuation introduced by the filter, so that the peak value of the unit step response of the filter g(t) is equal to 1.

6) NOISE PSD AND BANDWIDTH SCALING
When two CSA slices are connected in parallel, the effective transconductance of the resulting circuit is doubled, while the equivalent capacitances that set the bandwidth remain mostly unchanged. As a result, the bandwidth of the amplifier increases. Conversely, the contribution of each individual transistor to the voltage noise spectral density is halved. It follows that, if noise were measured directly at the CSA output, the PSD of the noise would decrease while the bandwidth of the circuit would increase, resulting in no obvious improvement in the total integrated noise performance.
This problem can be circumvented, as a typical CSA has a pulse-shaping filter connected in cascade. The filter limits the bandwidth of the circuit, resulting in a clear reduction in CSA noise when additional slices are connected in parallel. Thus, in order to characterize the noise performance of the integrated circuit, noise was measured at the output of the CR-2RC filter.
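The trade-off described above can be illustrated with a rough numerical sketch. It assumes white noise only, a single-pole CSA whose bandwidth scales exactly with k (the upper bound of the scaling), and a filter approximated by a fixed equivalent noise bandwidth; the function name and all numbers are hypothetical.

```python
import math


def integrated_noise(k, psd_single, bw_single, filter_enbw=None):
    """Integrated output noise power (V^2) for k parallel slices.

    psd_single  : white-noise PSD of one slice (V^2/Hz)
    bw_single   : -3 dB bandwidth of one slice (Hz)
    filter_enbw : equivalent noise bandwidth of a following filter (Hz),
                  or None when noise is taken directly at the CSA output
    """
    psd = psd_single / k                        # PSD drops with k
    csa_enbw = (math.pi / 2.0) * k * bw_single  # single-pole ENBW, grows with k
    enbw = csa_enbw if filter_enbw is None else min(csa_enbw, filter_enbw)
    return psd * enbw


PSD_1 = 1e-16      # hypothetical single-slice PSD, V^2/Hz
BW_1 = 1e6         # hypothetical single-slice bandwidth, Hz
FILTER_ENBW = 1e4  # hypothetical filter noise bandwidth, Hz

for k in (1, 2, 4, 8):
    direct = integrated_noise(k, PSD_1, BW_1)
    filtered = integrated_noise(k, PSD_1, BW_1, FILTER_ENBW)
    print(f"k={k}: direct={direct:.3e} V^2, filtered={filtered:.3e} V^2")
```

In this simplified picture the unfiltered noise power does not improve with k, whereas the filtered noise power falls as 1/k because the filter fixes the bandwidth.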
C. IMPLEMENTATION
Figure 12 shows a partial chip micrograph, which includes the configurable CSA implemented on the integrated circuit. Figure 13 shows a partial chip floorplan: the left side of the image contains the configurable amplifier, consisting of individual amplifier slices and switch banks; and the right side contains the output buffer, the feedback network and the pre-charger circuit. Figure 14 shows the layout of a single amplifier slice, which was designed to include both the folded-cascode amplifier and the bias circuit, to match both amplifier and bias devices locally. The reference current for the amplifier slices is generated off-chip using a potentiometer. The dimensions of the slice are 421.5 µm × 177.3 µm.

2) AMPLIFIER SLICE AND PARALLEL CONNECTION
The layout of the amplifier slice was inspired by digital cell layout design, with clearly separated voltage rails on opposing sides. Figure 14 shows a two-dimensional, top view of the amplifier slice, where the inputs and outputs run horizontally, parallel to the x-axis, and the internal node traces run vertically, parallel to the y-axis. The above considerations allow for abutting the slices by mirroring the orientation of the layout. However, in order to test a varying number of parallel-connected slices, this approach was not used in the design of the chip. Figure 15 shows the approach used in the chip for the parallel connection of the different slices. A switch bank, with one CMOS switch corresponding to each node, is used as the interface between the slices. This allows adjacent slices to be connected and disconnected, and the performance of the equivalent circuit to be adjusted accordingly. Since the corresponding voltage rails are physically separated between slices, no layout mirroring is necessary.
The addition of the switch bank effectively increases cell pitch to 276.6 µm.

V. EXPERIMENTAL RESULTS
In order to test the configurable CSA, two types of measurements were defined: average waveform measurements to assess general functionality, and statistical measurements of the output voltage to characterize noise performance.
To simplify notation, the number of parallel-connected slices, which can vary from 1 to 8 on the chip, will be referred to as k.
A. CSA STEP RESPONSE
Figure 16(a) shows the step response of the CSA for different values of k. The output of the voltage buffer was measured with an oscilloscope to analyze the step response of the CSA, averaging 8,192 identical events to remove noise.

1) STEP AMPLITUDE
The expected step amplitude for an injected charge $Q_{in}$ is given by: where $\gamma_{ol}$ is the static attenuation of the amplifier, and $C_{gg}$ is the gate capacitance of a single amplifier slice. The static attenuation has a very weak dependency on k through the total gate capacitance $k \cdot C_{gg}$. In practice, for the designed chip, the detector capacitance is much larger than the total gate capacitance ($C_D \gg k \cdot C_{gg}$), and furthermore, the amplifier gain is large enough that the static attenuation is very close to unity.
From Figure 16(a) it can be seen that the step amplitude of the measured results decreases slightly for increasing values of k. This can be explained by a parasitic component to the feedback capacitance $C_F$ that scales with k, namely $C_{FP,T} = k \cdot C_{FP}$, where $C_{FP}$ has a value of roughly 21 fF. With this consideration, (29) can be rewritten as: Besides the slight decrease in amplitude as a function of k, the behavior of the step amplitude of the CSA complies with predictions from hand calculations and simulations.
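To get a feel for the size of this effect, the sketch below evaluates the step amplitude with the parasitic term added to the feedback capacitance. It assumes the ideal charge-amplifier relation (amplitude equal to $Q_{in}$ divided by the total feedback capacitance) and a static attenuation of exactly one; the injected charge, the choice of $C_F$ = 1 pF and the function name are illustrative assumptions.

```python
def step_amplitude(q_in, c_f, k, c_fp=21e-15, attenuation=1.0):
    """Magnitude of the CSA output step for k parallel slices.

    Assumes |V| = attenuation * Q_in / (C_F + k * C_FP), i.e. the ideal
    charge-amplifier relation with a parasitic feedback term that grows with k.
    """
    return attenuation * q_in / (c_f + k * c_fp)


Q_IN = 1e-15  # hypothetical injected charge: 1 fC
C_F = 1e-12   # feedback capacitance: 1 pF

ideal = Q_IN / C_F
for k in range(1, 9):
    v = step_amplitude(Q_IN, C_F, k)
    drop = 100.0 * (1.0 - v / ideal)
    print(f"k={k}: {v * 1e3:.4f} mV ({drop:.1f}% below Q_in/C_F)")
```

With these assumed values the amplitude falls monotonically with k, which matches the qualitative trend described for Figure 16(a).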

2) BANDWIDTH
The importance of the bandwidth of a CSA is primarily in its interaction with the bandpass pulse-shaping filter, which limits the bandwidth of the circuit to minimize noise. As long as the amplifier is significantly faster than the filter, the exact bandwidth of the amplifier is irrelevant. Regardless, it is worth analyzing the behavior of the bandwidth to assess whether it behaves as expected. Also, the analysis might be useful to apply the slice-based design methodology to other target applications.
The expected bandwidth for the single-pole approximation of the CSA is defined by (22), and can be rewritten as an explicit function of k, as follows: where $G_{meff}$ is the effective transconductance of a single amplifier slice, and $C_L$ is the load capacitance, which is determined by the sum of the output capacitance of the CSA (which scales with k) and the input capacitance of the buffer. From (32) it can be inferred that, if two amplifier slices are connected in parallel, the bandwidth increases by at most a factor of two, which occurs when the denominator remains constant due to the dominance of non-scaling capacitances. Under any other condition, the proportion in which the bandwidth scales is not obvious, and depends on the capacitance values.

Figure 16(b) shows the step response of the CSA for different values of k with normalized amplitude. It can be seen that, in practice, the output of the CSA appears to behave as a second-order circuit, as there is a small amount of overshoot for some of the curves. The presence of overshoot indicates that the circuit is behaving as an underdamped second-order system, meaning that the two poles of the circuit are complex conjugates. It also appears that the response of the amplifier goes from being overdamped, or at least underdamped with a damping factor ($\zeta$) very close to unity, to being notably underdamped as more slices are connected in parallel.
This effect was also observed in post-layout simulations, and can be explained by the presence of a secondary, non-dominant pole in the circuit, due to a large parasitic shunt capacitance on the node that corresponds to the drain of the folding transistor $M_F$. Let us consider the circuit shown in Figure 17, where $R_x = r_{oI} \parallel r_{oF}$ (34). This circuit is a small-signal schematic of the folded-cascode amplifier used in the chip, but with the addition of an explicit capacitance $C_x$ on the drain of the folding transistor $M_F$. The term $C_x$ in (35) includes the parasitic capacitances from gate-to-drain ($C_{gd}$) and drain-to-body ($C_{db}$) for transistors $M_I$ and $M_F$, and the parasitic capacitances from gate-to-source ($C_{gs}$) and source-to-body ($C_{sb}$) for the transistor $M_{CF}$. From Figure 17 it is possible to compute a simplified expression for the damping factor by considering large intrinsic gain values for transistors $M_I$, $M_F$ and $M_{CF}$, as follows: It can be observed that the dominant pole of the amplifier (22) appears in the damping factor (since $G_{meff} \approx g_{mI}$), which can be rewritten as a function of $p$. From (38) it can be observed that the numerator does not change as additional slices are connected in parallel, since the capacitance and the transconductance scale in the same proportion. In contrast, the denominator ($|p|$) gets increasingly larger as more slices are connected in parallel.
Through numerical evaluation of the damping factor, using the corresponding amplifier parameters, it was observed that the circuit indeed goes from being overdamped to being underdamped for increasing values of k, so the behavior of the bandwidth of the amplifier falls within expectations.
An important conclusion from the analysis of the bandwidth of the CSA is that, when certain circuit parameters remain static due to externally connected circuits on the inputs and outputs, the scaling behavior of the circuit can become non-trivial. Consequently, it is important to have a good understanding of the target application to successfully use the slice-based design technique.

B. NOISE MEASUREMENTS
For each discrete value of k ranging from 1 to 8, and for different values of $C_F$ ranging from 1 pF to 8 pF (in increments of 1 pF), the noise of the CSA was measured at the output of the CR-2RC filter with the 16-bit analog-to-digital converter (ADC) of the test board, and stored for subsequent analysis. Figure 18 shows a semi-logarithmic plot with the results of the noise measurements of the configurable CSA, for all combinations of k and $C_F$. Each marked point corresponds to the variance of 75,000 voltage samples. The data points are joined by straight lines to form curves, to better appreciate the scaling tendencies of the noise.

1) TOTAL INTEGRATED NOISE
Since the noise measurements of the test system were done in the absence of a detector, the noise term in the ENC equation (12) associated with $I_{Det}^{2}$ can be neglected, and only the amplifier contributions need to be considered. With this consideration, the total integrated noise at the output of the pulse-shaping filter can be expressed as a voltage variance, as follows: The effects of the limited bandwidth of the filter are included in the filter parameters $\tau_P$, $N_{Wn}$ and $N_{Fn}$, all of which remain constant when multiple CSA slices are connected in parallel.
From (39), it is possible to express the total integrated noise as an explicit function of k, as follows: where $I_D$ is the drain current of an arbitrary transistor (e.g. $M_I$), $\hat{V}_{A,W}^{2}$ is the normalized input-referred thermal noise of the amplifier, and $\hat{K}_F$ is the normalized input-referred flicker noise coefficient. The normalized noise terms are solely a function of the $g_m/I_D$ of the relevant transistors, and the current ratios between the drain current of the transistors and the normalization current. The selected normalization current is irrelevant to the analysis, as long as it is properly included in the input-referred noise terms.
Let us consider a simple example, where the input-referred noise is dominated by transistors $M_I$ and $M_F$. From (23), (24) and (25) it is possible to write an expression for the input-referred white noise, as follows: If we multiply both sides of (41) by the drain current of the input transistor $I_{DI}$, it is possible to write a normalized expression for the input-referred amplifier white noise $\hat{V}_{A,W}^{2} = V_{A,W}^{2} \cdot I_{DI}$, as follows: where $\hat{V}_{A,W}^{2}$ is solely a function of the $g_m/I_D$ values of the transistors, and the current ratios between the drain current of the transistors and the normalization current. A very similar analysis can be done for the normalized flicker noise coefficient $\hat{K}_F$.
A large explicit capacitor $C_D$ was implemented on the test board, much larger than the total gate capacitance $C_{gg}$ of the configurable CSA (i.e. $C_D \gg C_{gg}$). With this consideration, it can be observed from (40) that the total integrated noise of the amplifier is inversely proportional to the number of parallel-connected slices (i.e. $\propto 1/k$).

2) NOISE SCALING
From Figure 18 it can be observed that the scaling tendencies of the noise as a function of k appear to be insensitive to the value of the feedback capacitance $C_F$. This follows from (40), since the capacitive term appears as an independent factor with a very weak dependency on k ($C_{D+gg}(k) \approx C_D$), and thus as an additive constant in the semi-logarithmic plot.
Given that all curves scale as functions of k in almost identical fashion, it is not relevant which curve is used to analyze the scaling of the noise. Let us consider Figure 19, which also includes a fitted curve for the expected behavior. It can be observed that the noise scaling closely follows the expected behavior ($\propto 1/k$), but it is not a perfect fit.
The deviation of the curves with respect to the expected behavior can be explained by gradient- and size-related device mismatch. The effects of device mismatch on the proposed slice-based design methodology are explored in detail in Section VI. In particular, a Monte Carlo simulation of the CSA with plausible values for the mismatch variances was performed, the results of which are presented in Section VI-F2, to confirm whether it was possible to replicate results similar to the ones obtained with the IC.
Each one of the iterations of the Monte Carlo simulation of Section VI-F2 corresponds to a single realization of the chip under the influence of device mismatch. The integrated noise curves as a function of the number of parallel-connected slices were plotted for all 2000 realizations and analyzed through visual inspection. In most cases, the mismatch has very little impact on the behavior of the noise and the scaling. However, in some outlier cases, there were more obvious deviations with respect to the expected noise scaling. A couple of examples are presented in Figure 20, which were selected specifically because of their similarity to the measured curves.
The above explanation is not conclusive; it is only the most likely explanation given the available information. To further test this hypothesis, a larger number of chips manufactured on different wafers (so that the process gradients are randomized and uncorrelated) would be necessary. Unfortunately, only a very small number of chips were available for testing, all of which most likely shared a wafer.

VI. INTERPRETATION OF THE RESULTS THROUGH MISMATCH
Mismatch is the performance difference between two or more devices on a single integrated circuit [22], although the term can also be used to refer to performance differences between real and nominal devices. Differences in device performance can be attributed to parameter variations due to imperfections in the manufacturing process. Intradie parameter variation can be divided into two categories: systematic variations due to parameter gradients along the surface of the die, which are dependent on the distance between devices; and random variations, which are dependent on device size.
The effects of parameter gradients are typically accounted for at the design stage, and can be minimized with the use of proper layout techniques, such as symmetry and common-centroid, to properly match critical transistors. The slice-based design technique is particularly susceptible to gradient-related mismatch. By using pre-existing cells, the application of layout techniques between critical transistors on different slices becomes impossible. Additionally, if a large number of slices are connected in parallel, nominally identical transistors can be separated by large distances.
The effects of parameter variations on the performance of circuits using the proposed design technique are analyzed in the present section. A simple model for the scaling of mismatch parameters for multiple parallel-connected slices is presented, followed by the results of Monte Carlo simulations to assess the validity of the model.

A. MISMATCH MODEL
The mismatch model presented by Pelgrom et al. [23] models the normalized standard deviation of parameter P between two nominally identical MOS transistors on the same die, of width W and length L, separated by a distance D from centroid to centroid, as

$$\frac{\sigma^2(\Delta P)}{P^2} = \frac{A_P^2}{WL} + S_P^2 D^2 \qquad (43)$$

where $A_P$ is the area proportionality constant for parameter P and accompanies the size-dependent term, while $S_P$ describes the variation of parameter P with spacing and accompanies the distance-dependent term. The values of $A_P$ and $S_P$ are typically provided by chip manufacturers for specific parameters, for the sake of mismatch calculations. Due to the random nature of mismatch, and the complexity of the analysis for large circuits, mismatch analysis is well suited for Monte Carlo simulations to assess circuit performance. The random size-dependent term in (43) is straightforward to include in a circuit simulation tool, e.g. SPICE. Having identified the transistor mismatch parameters, a normally-distributed random variable needs to be added to each parameter on each transistor of the simulated circuit netlist, properly scaled by the standard deviation provided by the manufacturer and by device size. The systematic distance-dependent term in (43) is less straightforward to include in a circuit simulator.
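As a quick illustration of how the two terms of the model compare, the sketch below evaluates the normalized mismatch standard deviation for a device pair. It assumes the usual form of Pelgrom's model, in which the normalized variance is $A_P^2/(WL) + S_P^2 D^2$; the function name and the coefficient values are hypothetical and not taken from any process design kit.

```python
import math


def pelgrom_sigma(a_p, s_p, width, length, distance):
    """Normalized standard deviation of the mismatch between two devices.

    a_p      : area coefficient (relative units times length unit)
    s_p      : spacing coefficient (relative units per length unit)
    width, length, distance : same length unit throughout (e.g. micrometers)
    """
    size_term = a_p ** 2 / (width * length)
    distance_term = (s_p * distance) ** 2
    return math.sqrt(size_term + distance_term)


A_P = 0.01  # hypothetical: 1 %*um
S_P = 2e-6  # hypothetical: 2 ppm per um
W, L = 10.0, 1.0

for d in (0.0, 276.6, 8 * 276.6):
    sigma = pelgrom_sigma(A_P, S_P, W, L, d)
    print(f"D={d:7.1f} um: sigma(dP)/P = {100 * sigma:.4f} %")
```

The distance used in the loop is a multiple of the cell pitch quoted earlier, only to give the example a familiar scale.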

B. PHYSICAL INTERPRETATION OF THE DISTANCE TERM IN PELGROM'S MISMATCH MODEL
A simple physical interpretation and mathematical model for the distance-dependent term in (43) is presented in [24], which is summarized in the present section.
Let us consider an arbitrary layout, where the central coordinates for each transistor $M_i$ are known to be $(x_i, y_i)$. Let us consider an arbitrary parameter subject to mismatch, namely P. Let us assume that, for a given die, it is possible to approximate the gradient of parameter P along the die by a plane, as follows:

$$P(x, y) = A x + B y + C$$

where

$$C = P_{Nom} + P_{Off}$$

and A and B are random numbers. The value $P_{Nom}$ represents the nominal value of parameter P, while $P_{Off}$ represents a systematic offset for the given die with respect to the nominal value.
Consider two transistors, namely $M_i$ and $M_j$, located at positions $(x_i, y_i)$ and $(x_j, y_j)$, respectively. The mismatch between transistors $M_i$ and $M_j$ caused by gradient effects is given by

$$\Delta P_{ij} = A (x_i - x_j) + B (y_i - y_j)$$

For a large number of realizations, and taking A and B as uncorrelated, the variance of the gradient-related mismatch between two transistors can be computed to be

$$\sigma^2(\Delta P_{ij}) = \sigma^2(A)\,(x_i - x_j)^2 + \sigma^2(B)\,(y_i - y_j)^2 \qquad (47)$$

Assuming symmetry along the axes in the random planes, i.e. no preferred direction, it can be stated that $\sigma(A) = \sigma(B)$. Under this consideration, (47) can be rewritten as

$$\sigma^2(\Delta P_{ij}) = \sigma^2(A)\, D_{ij}^2 \qquad (48)$$

where $D_{ij}$ is the distance between transistors $M_i$ and $M_j$. Comparing equations (43) and (48) reveals that

$$\frac{\sigma^2(A)}{P^2} = \frac{\sigma^2(B)}{P^2} = S_P^2$$

So, for a given $S_P$ provided by the manufacturer, two random variables can be readily computed to generate a random plane to model the gradient-related systematic parameter variations of a die.
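The random-plane interpretation can be checked with a small Monte Carlo experiment. The sketch below draws the two plane slopes as independent zero-mean normal variables with standard deviation $S_P$ (the normal distribution is an assumption; the text only states that they are random) and works with a parameter normalized to P = 1. The function name and numerical values are illustrative.

```python
import math
import random
import statistics


def gradient_mismatch_sigma(s_p, pos_i, pos_j, runs=20000, seed=1):
    """Monte Carlo estimate of the gradient-related mismatch sigma.

    Each run draws one random plane (slopes a, b) for a die and evaluates
    the parameter difference between the two transistor positions.
    """
    rng = random.Random(seed)
    dx = pos_i[0] - pos_j[0]
    dy = pos_i[1] - pos_j[1]
    deltas = []
    for _ in range(runs):
        a = rng.gauss(0.0, s_p)  # slope along x for this die
        b = rng.gauss(0.0, s_p)  # slope along y for this die
        deltas.append(a * dx + b * dy)
    return statistics.pstdev(deltas)


S_P = 0.01  # hypothetical normalized spacing coefficient per unit length
M_I_POS, M_J_POS = (0.0, 0.0), (3.0, 4.0)

estimate = gradient_mismatch_sigma(S_P, M_I_POS, M_J_POS)
expected = S_P * math.dist(M_I_POS, M_J_POS)
print(f"Monte Carlo sigma = {estimate:.5f}, S_P * D_ij = {expected:.5f}")
```

The estimate should approach $S_P \cdot D_{ij}$ regardless of the direction of the separation, which is the isotropy argument made in the text.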

C. MISMATCH PARAMETERS
For mismatch modeling, critical parameters can be categorized into two types: process and electrical [22]. Process parameters are physically independent parameters that control the electrical behavior of a device, e.g. carrier mobility µ, whereas electrical parameters are those that are of interest to a designer, e.g. transistor transconductance g m .
All electrical parameters are subject to mismatch, in the sense that they deviate from their nominal value. However, a limited number of independent process parameters, directly affected by the manufacturing process, are the underlying cause of variations of electrical parameters.
Let us consider an electrical parameter $e(p)$ that is a function of n independent process parameters $p = \{p_1, p_2, \cdots, p_n\}$. The variance of the electrical parameter $e(p)$ is related to the variance of the independent parameters p through the propagation of uncertainty relationship [22], as follows:

$$\sigma^2(e) = \sum_{i=1}^{n} \left( \frac{\partial e}{\partial p_i} \right)^2 \sigma^2(p_i) \qquad (50)$$

There are usually a small number of transistor mismatch parameters that are considered to be dominant. In [23] two main ones ($V_{t0}$ and $\beta$) and a secondary one ($\gamma$) were suggested, derived from traditional square-law transistor models. With the evolution of transistor models, more accurate mismatch models have since been developed, and ones that utilize additional mismatch parameters, or different mismatch parameters altogether, have been proposed [22], [25], [26].
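As a worked illustration of the propagation of uncertainty, the sketch below compares the first-order analytic result with a Monte Carlo estimate for a transconductance that depends on $\beta$ and $V_{t0}$. The square-law expression $g_m = \beta (V_{GS} - V_{t0})$ is an assumption made here for the example, and all names and numerical values are hypothetical.

```python
import math
import random
import statistics


def gm(beta, v_gs, v_t0):
    """Square-law transconductance in strong inversion (assumed model)."""
    return beta * (v_gs - v_t0)


def gm_sigma_analytic(beta, v_gs, v_t0, sigma_beta, sigma_vt0):
    """First-order propagation of uncertainty for g_m(beta, V_t0)."""
    d_gm_d_beta = v_gs - v_t0  # partial derivative with respect to beta
    d_gm_d_vt0 = -beta         # partial derivative with respect to V_t0
    return math.sqrt((d_gm_d_beta * sigma_beta) ** 2
                     + (d_gm_d_vt0 * sigma_vt0) ** 2)


def gm_sigma_monte_carlo(beta, v_gs, v_t0, sigma_beta, sigma_vt0,
                         runs=20000, seed=3):
    """Monte Carlo estimate with independent normal parameter variations."""
    rng = random.Random(seed)
    samples = [gm(rng.gauss(beta, sigma_beta), v_gs, rng.gauss(v_t0, sigma_vt0))
               for _ in range(runs)]
    return statistics.pstdev(samples)


# Hypothetical values: beta in A/V^2, voltages in volts.
BETA, V_GS, V_T0 = 1e-3, 0.8, 0.5
SIGMA_BETA, SIGMA_VT0 = 1e-5, 3e-3

analytic = gm_sigma_analytic(BETA, V_GS, V_T0, SIGMA_BETA, SIGMA_VT0)
simulated = gm_sigma_monte_carlo(BETA, V_GS, V_T0, SIGMA_BETA, SIGMA_VT0)
print(f"analytic = {analytic:.3e} S, Monte Carlo = {simulated:.3e} S")
```

For small relative variations the two numbers should agree closely, since the first-order expansion neglects only the product of the two deviations.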
Nonetheless, the purpose of the current analysis is not to produce perfectly accurate results, but to gain insight into the effects of device mismatch on circuit performance when using the proposed design technique. To favor simplicity, only the mismatch parameters β and V t0 will be considered.

D. MISMATCH MODEL FOR SLICE-BASED DESIGN

1) PARAMETER VARIATION MODEL
Let us consider a circuit layout consisting of a two-dimensional array of k equidistant, horizontally aligned, vertically abutted, equally oriented analog cells, separated by a distance $D_{cc}$. Figure 21 shows a graphical representation of this cell configuration. Let us consider an arbitrary transistor in the first cell, namely $M_1$, and the corresponding transistors in the other cells, which will be referred to as $M_j$ for cell j.
Let us consider an arbitrary parameter subject to mismatch between nominally identical transistors, namely P. For transistor $M_j$, the realization of parameter P will be

$$P_j = P_{Nom} + P_{Off} + (\Delta P_G)_j + (\Delta P_R)_j \qquad (51)$$

where $P_j$ has been decomposed into a nominal value $P_{Nom}$ for the given transistor, a systematic offset $P_{Off}$ for the given die, and two mismatch components, $(\Delta P_G)_j$ and $(\Delta P_R)_j$. The two mismatch components $(\Delta P_G)_j$ and $(\Delta P_R)_j$ represent gradient, distance-dependent variations, and random, size-dependent variations, respectively. To compute distance-related mismatch between devices, transistor $M_1$ will be considered as the reference. For simplicity, transistor $M_1$ will be considered to be located at coordinates (0, 0). All gradient-related variations for this transistor will be considered to be included in the systematic offset term $P_{Off}$, common to all devices on the die.
As explained in Section VI-B, the distance-dependent term $(\Delta P_G)_j$ can be written as the product of a plane gradient and the distance between transistors, as follows

$$(\Delta P_G)_j = G_P \cdot D_{1j} \qquad (52)$$

where $G_P$ is the plane gradient for parameter P in the direction of transistor separation, i.e. direction vector $\hat{y}$ in Figure 21, and $D_{1j}$ is the distance between transistors $M_1$ and $M_j$. For this particular formulation, the value of $G_P$ would correspond to B in (46), as the cells are horizontally aligned. For the given assumptions, cell-to-cell distance and transistor-to-transistor distance are equivalent. Therefore, the distance between transistors $M_1$ and $M_j$ can be written as a function of $D_{cc}$, as follows: With these considerations, the mismatch parameter $P_j$ can be rewritten as: where $P_{Die} = P_{Nom} + P_{Off}$, equivalent to C in (46), has a constant value for a given die and arbitrary origin selection.

2) PARAMETER SCALING
Certain electrical parameters have a linear relationship with the device width, e.g. drain current $I_D$, long-channel current factor $\beta$, transconductance $g_m$ and output conductance $g_o$, among others. Let us consider that parameter P falls within this category, and let us consider the equivalent transistor when k cells are connected together in parallel. The equivalent mismatch parameter P for this equivalent transistor can be written as an explicit function of k, as follows: The middle term of (55) is the sum of the first k positive integers, and can be replaced by the following explicit formula: To simplify the analysis, only intradie relative differences between individual devices will be considered, and $P_{Die}$ will be referred to simply as P for the sake of notation. With this consideration, $P_{eq}(k)$ can be written in a normalized form, as follows: where $kP$ is the nominal value of parameter P for k parallel-connected transistors on a given die. Further insight can be obtained from (57) by analyzing the variance of the expression for a large number of realizations. Given that all the additive random effects are uncorrelated, the individual variances are added in quadrature: The variance of the random variables can be written using the notation shown in (43), as follows:

$$\frac{\sigma^2(G_P)}{P^2} = S_P^2 \qquad (59)$$

$$\frac{\sigma^2(\Delta P_R)}{P^2} = \frac{A_P^2}{2WL} \qquad (60)$$

where the factor of 2 in the denominator of (60) accounts for the fact that each transistor deviates from a nominal transistor on the die. In contrast, the formulation presented in (43) models the relative variation between two devices subject to mismatch. Using this notation, the expression shown in (58) can be rewritten as: A close inspection of (61) reveals that it is a direct application of Pelgrom's mismatch model. An increase in k linearly increases the equivalent width of the device, so the size-dependent term is inversely proportional to k.
At the same time, an increase in k increases the average distance between the individual devices in the array, so the distance-dependent term is proportional to $k^2$. For a given value of $D_{cc}$, this expression ties together both gradient and random variations using a single variable k.
From a design perspective, the expression shown in (61) reveals that the slice-based analog design technique is particularly susceptible to gradient-related variations, since for a given cell library there is no way to reduce cell pitch. There is a trade-off between improvements on device performance, e.g. noise, and uncertainty due to mismatch effects with respect to the expected performance, both of which increase with k.
A single value of k can be calculated for which the variance of the two mismatch effects are matched. This is relevant to assess which is the dominant effect on a given design. The value of k for which the two additive terms in (61) are equal can be calculated by solving the following expression: There are three roots for the polynomial expression shown in (62), only one of which is strictly positive and real, and therefore of physical significance. The resulting expression is long and unintuitive, and does not offer any insight into circuit operation, thus it is not shown.
Let $k^*$ be the solution from (62), which can be numerically evaluated for a given problem. For $k < k^*$, random size-dependent variations have higher mismatch variance. For $k > k^*$, gradient distance-dependent variations have higher mismatch variance.
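The opposite trends of the two mismatch contributions can be reproduced with a short Monte Carlo sketch. It assumes a parameter normalized to P = 1, one random gradient per die, independent random deviations per slice, and slice j placed at a distance $(j-1) \cdot D_{cc}$ from the reference slice, which is one reading of the layout description. The function name and all values are hypothetical.

```python
import random
import statistics


def equivalent_sigma(k, sigma_r, sigma_g, d_cc, runs=5000, seed=7):
    """Normalized sigma of the equivalent parameter for k parallel slices.

    sigma_r : per-slice random (size-dependent) normalized deviation
    sigma_g : normalized gradient deviation per unit distance
    d_cc    : cell-to-cell distance
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        gradient = rng.gauss(0.0, sigma_g)  # one gradient per simulated die
        total = 0.0
        for j in range(k):                  # slice j sits at distance j * d_cc
            total += 1.0 + gradient * j * d_cc + rng.gauss(0.0, sigma_r)
        samples.append(total / k)           # normalize by the nominal value k * P
    return statistics.pstdev(samples)


SIGMA_R = 0.02   # hypothetical random mismatch of one slice (2 %)
SIGMA_G = 0.001  # hypothetical gradient (0.1 % per cell pitch)
D_CC = 1.0       # distance expressed in units of cell pitch

for k in range(1, 9):
    rnd = equivalent_sigma(k, SIGMA_R, 0.0, D_CC)
    grad = equivalent_sigma(k, 0.0, SIGMA_G, D_CC)
    both = equivalent_sigma(k, SIGMA_R, SIGMA_G, D_CC)
    print(f"k={k}: random={rnd:.5f}, gradient={grad:.5f}, combined={both:.5f}")
```

Under these assumptions the random contribution shrinks as slices are added while the gradient contribution grows, so the value of k at which the two columns cross plays the role of $k^*$.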

E. A SIMPLE EXAMPLE OF NOISE MISMATCH
As shown in the Appendix, it is possible to express the thermal noise of k parallel-connected transistors as an explicit function of k, as follows: where This analysis shows how mismatch coefficients affect the noise of a slice-based design.

F. MONTE CARLO SIMULATIONS
Two different Monte Carlo simulations were carried out to evaluate the effects of mismatch on parallel-connected slices. The first of the two is a single transistor simulation, in order to assess the applicability of the proposed mismatch model in (61) to noise mismatch. The second simulation is of the configurable CSA used in the design of the IC, to gain insight into the behavior of the circuit when mismatch becomes relevant.
For both simulations, three simulation scenarios were considered:

1) Random variations only
2) Gradient variations only
3) A combination of the two effects
The values of $A_{V_{t0}}$, $A_{\beta}$, $S_{V_{t0}}$ and $S_{\beta}$ were arbitrarily selected to favor the clarity of the plotted results. For each scenario, and for each value of k, a total of 2,000 points were simulated.

1) SINGLE-DEVICE SIMULATION
The results of the Monte Carlo simulation are presented in Figure 22. A fitted curve with the k-dependent scaling model is also included on the plots, computed using the nonlinear least-squares method. From the plots, it can be seen that the scaling of the noise mismatch as a function of k closely follows the proposed mismatch model, (61) and (64). Thus, for a single equivalent transistor, the model appears to be valid and applicable.

2) CSA SIMULATION
The results of the Monte Carlo simulation are presented in Figure 23 for all simulation scenarios. A fitted curve of the single-device parameter scaling model is also included, computed using the nonlinear least-squares method.
The results of simulation scenario #1, shown in Figure 23(a), indicate that the mismatch scaling model (61) appears to properly predict the behavior of the output integrated noise of the CSA when only size-dependent mismatch is considered, even though it was derived from a single transistor. This is an indication that perhaps it is possible to derive a generalized model, and requires further analysis.
The results of simulation scenario #2, shown in Figure 23(b), indicate that model (61) partially predicts the mismatch scaling behavior, although it is less accurate when gradient-related mismatch is considered. It appears as though the scaling of gradient-related mismatch as a function of k is not exactly quadratic, and it is underestimated for larger values of k.
A closer inspection of the results reveals that the histogram of the noise variance, an example of which is shown in Figure 24, is not normally distributed, and instead has a right-skewed distribution. The reason for this effect is unclear, but it certainly contributes to the poorer predictive capability of (61), when contrasted with size-related variations.
The results of simulation scenario #3, as a combination of the other two scenarios, again show that the model is an adequate, although not perfect, predictor of the mismatch behavior of the CSA, even though it was derived for a single transistor. The model can then serve as a basis to assess the behavior uncertainty due to mismatch for relevant electrical parameters.

VII. CONCLUSION
This work introduces the slice-based design methodology, a new approach to analog integrated circuit design, suitable for implementation in EDA tools, that aims to reduce the time and expertise required from the user. This methodology is based on the use of pre-designed, optimized and fully characterized circuit cells, namely slices, which can be connected in parallel to scale important performance metrics.
The proposed design methodology was validated through the design of a configurable CSA, and the main metric used to test the circuit was noise performance. The experimental results show that it is possible to easily and effectively reduce circuit noise by connecting multiple amplifier slices in parallel, without a considerable effect on the nominal operation of the amplifier.
There are some caveats that are highly problem-specific, related to the non-scaling parameters of the particular target application. In the case of the CSA, non-scaling capacitances have an effect on the amplifier bandwidth, which increases as more slices are connected in parallel. Unless the bandwidth of the amplifier is limited, this increase raises the total integrated noise at the output, negatively affecting the main performance metric.
Additionally, the stackable layout approach is particularly susceptible to gradient-related device mismatch, which can become relevant depending on cell pitch and the number of parallel-connected slices. The effects of mismatch translate to uncertainty on the expected performance of the circuit, but this uncertainty can be quantified and assessed with the appropriate mismatch models and Monte Carlo simulations.

APPENDIX
APPLICATION OF THE MISMATCH MODEL TO THERMAL NOISE
Let us consider an arbitrary transistor circuit, for which the noise performance is dominated by the input device. Let us assume that the input device is operating in strong inversion and that thermal noise is the dominant noise process. The total input-referred integrated noise of the circuit is given by:

where $f$ is the integration bandwidth. For simplicity, let us assume that only the mismatch of the input transistor has an effect on noise performance. Mismatch analysis done by hand for a multiple-transistor circuit can become very impractical and unintuitive, and Monte Carlo simulations are better suited to analyze circuit performance.
Under these assumptions, and using the propagation of uncertainty relationship presented in (50), the mismatch-related variations of the total noise can be computed:

where the first factor can be identified as $\left(V_n^2\right)^2$, therefore

Furthermore, the transconductance of the input transistor can be expressed as:

where the overdrive voltage $V_{OV}$ is a linear function of $V_{t0}$ [27]:

The propagation of uncertainty relationship presented in (50) can be applied to (70) to compute the variance of $g_m$ as a function of the mismatch parameters $\beta$ and $V_{t0}$, resulting in the following relation:

Thus, the values of $S_{g_m}$ and $A_{g_m}$ can be written as

For a circuit composed of $k$ nominally identical circuit slices connected in parallel, the mismatch-related variations of the noise performance can be expressed as an explicit function of $k$, by considering (61) and (69), as follows:
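The intermediate steps of this derivation, up to the relative variance of $g_m$, can be sketched under textbook assumptions: a thermal noise density of $4 k_B T \gamma / g_m$ for the input device, a square-law transconductance $g_m = \beta V_{OV}$ with $V_{OV} = V_{GS} - V_{t0}$, first-order propagation of uncertainty, and uncorrelated mismatch in $\beta$ and $V_{t0}$. The symbols $k_B$, $T$, $\gamma$ and $V_{GS}$ are introduced here for illustration only, and the expressions below are a sketch under these assumptions rather than a reproduction of the numbered equations.

```latex
\begin{align*}
V_n^2 &= \frac{4 k_B T \gamma}{g_m}\, f \\
\sigma^2\!\left(V_n^2\right)
  &\approx \left(\frac{\partial V_n^2}{\partial g_m}\right)^{2} \sigma^2(g_m)
   = \left(\frac{4 k_B T \gamma f}{g_m}\right)^{2} \frac{\sigma^2(g_m)}{g_m^2} \\
\frac{\sigma^2\!\left(V_n^2\right)}{\left(V_n^2\right)^2}
  &\approx \frac{\sigma^2(g_m)}{g_m^2} \\
g_m &= \beta V_{OV}, \qquad V_{OV} = V_{GS} - V_{t0} \\
\frac{\sigma^2(g_m)}{g_m^2}
  &\approx \frac{\sigma^2(\beta)}{\beta^2} + \frac{\sigma^2(V_{t0})}{V_{OV}^2}
\end{align*}
```

Under these assumptions, the relative spread of the noise power equals the relative spread of $g_m$, which is why a mismatch model written for the transconductance of a single device carries over directly to the noise performance.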