Low-Cost Indirect Measurements for Power-Efficient In-Field Optimization of Configurable Analog Front-Ends with Self-X Properties: A Hardware Implementation

This paper presents a practical implementation and measurement results of power-efficient chip-performance optimization that uses low-cost indirect measurement methods to support self-X properties (self-calibration, self-healing, self-optimization, etc.) for the in-field optimization of analog front-end sensory electronics in XFAB 0.35 µm complementary metal oxide semiconductor (CMOS) technology. The performance of a reconfigurable, fully differential indirect current-feedback instrumentation amplifier (CFIA) is intrinsically optimized by applying a single sinusoidal test stimulus and measuring the total harmonic distortion (THD) at the output. To enhance the optimization process, the experience replay particle swarm optimization (ERPSO) algorithm is used as an artificial intelligence (AI) agent, implemented at the hardware level, to optimize the performance characteristics of the CFIA. The ERPSO algorithm extends the selection procedure of the classical PSO methodology with an experience replay buffer that reduces the likelihood of becoming trapped in local optima. Furthermore, the CFIA circuit integrates a simple power-monitoring module that assesses the power consumption of each candidate solution, so that a power-efficient and reliable configuration can be reached. The optimized chip showed an approximately 34% increase in power efficiency while achieving a target THD of −72 dB for a 1 Vp-p differential input signal at 1 MHz, consuming approximately 53 mW. Preliminary tests on the fabricated chip with the default configuration pattern extrapolated from post-layout simulations revealed unacceptable CFIA behavior. Nevertheless, the proposed in-field optimization successfully restored the circuit's performance, resulting in a robust design that meets the performance achieved in the design phase.


Introduction
The integration of machine learning (ML) and artificial intelligence (AI) with other emerging technologies, such as cloud computing, big data analytics, cyber-physical systems, and the industrial internet of things (I(I)oT), is revolutionizing the industrial sector in a transformation commonly known as Industry 4.0 [1][2][3][4]. This transformation depends predominantly on acquiring, analyzing, and interpreting data from smart sensors and I(I)oT devices [5]. Consequently, it is crucial to develop accurate and reliable sensors and sensory electronics [6] that can collect, process, and transfer data to the primary processing unit [7]. However, the performance and long-term reliability of these sensors and their readout electronics face significant challenges due to static and dynamic variations and aging [8,9]. To overcome these challenges, Industry 4.0 emphasizes self-X properties, such as self-configuration, self-optimization, self-calibration, and self-diagnosis [10], in automation technology; these are new design principles for the effective and autonomous control of the manufacturing process [11]. Self-X properties are essential for ensuring the reliability and performance of smart sensory electronics in Industry 4.0, enabling a higher level of control and coordination across the entire value chain of products [12].
Static variations, particularly for analog sensory electronics in modern-node complementary metal oxide semiconductor (CMOS) technology, result from process imperfections in semiconductor fabrication [13,14]. Additionally, package mechanical stress and the heat generated during die molding in the assembly process can cause severe mismatches in the characteristics of on-chip devices [15]. Dynamic variations are caused by environmental fluctuations, power-supply voltage changes, and thermal drift resulting from IC self-heating [16]. Aging, which refers to the deterioration of device characteristics over time, poses ongoing challenges in accurately modeling and forecasting circuit performance and long-term reliability during design and simulation [17,18]. This consideration is essential when evaluating and implementing robust circuits for critical applications under actual operating conditions across the IC lifecycle [19].
Chip-level dynamic calibration of analog and mixed-signal ICs is realized by using configurable circuit elements as calibration or tuning knobs together with system performance evaluation setups [20][21][22][23][24]. As shown in Figure 1, this approach uses evolvable hardware (EHW), i.e., configurable electronic hardware that can be self-configured using ML and AI techniques such as metaheuristic optimization algorithms [25,26]. The evolutionary processing unit (EP) [27] runs the EHW to enable self-X properties for the system. The self-X methodology is beneficial because it allows sensory electronic systems to be calibrated even after chip packaging. Nonetheless, EHW resources may constrain the dynamic performance of analog circuits owing to increased parasitic effects. Additionally, EHW consumes more die area and requires additional optimization time.

Likewise, the overhead of the performance evaluation setup is critical for smart sensory electronic systems (SSEs) with self-X properties in terms of system complexity, cost, and the time required to measure different quantities [8,28]. After acquiring Moortec, Synopsys developed Silicon Lifecycle Management (SLM), which is based on on-chip sensing devices and corresponding control loops integrated into complex systems-on-chip (SoCs), together with long-term data collection [29][30][31][32]. On-chip measurement setups can be classified into two categories according to the evaluation criteria for the intended performance parameters. The direct performance measurement method offers higher accuracy and precision at the expense of increased design complexity and chip area [8]. In contrast, the indirect measurement (IM) method relies on the statistical correlation between different performance characteristics, enabling the simultaneous estimation of multiple system performance parameters from simple test stimuli [8,28,33,34,35].

This paper continues our previously published work [36] now that the manufactured chip has been received from the XFAB foundry; it was fabricated in standard 0.35 µm CMOS technology through the EUROPRACTICE MPW service. For the design and verification process, we used the Cadence tools, also provided by EUROPRACTICE. Figure 2a shows the layout of the chip, and the chip micrograph is shown in Figure 2b. The chip is coated with passivation and a top metal layer to protect the die surface; hence, the die details are not visible. The chip is assembled in a CPGA 100 package. It is a multi-project chip (MPC) comprising amplitude- and spike-domain analog front-end circuits with self-X properties (AFEX). The MPC cells will serve as the foundation for an advanced universal sensor interface with self-X properties, referred to as the USIX chip. However, this paper focuses solely on the amplitude domain, which comprises three essential blocks: the indirect current-feedback instrumentation amplifier (CFIA), the CFIA with digital offset autozeroing (CFIA2), and the programmable analog filter. Our former work was based on an extrinsic evaluation of the proposed methodology, whereas the present study emphasizes the intrinsic, or hardware, implementation of in-field optimization for the amplitude-domain AFEX, in particular the configurable CFIA. The CFIA is a key component of the AFEX, designed to process wide-input-range sensor signals with adjustable gain and dynamic performance, and it serves as the test vehicle for in-field optimization. A four-row (4 × 48-bit) shadow register is integrated into the chip as a memory unit for the CFIA to store and pipeline the configuration patterns from the optimization unit. One row is active at a time and supplies the CFIA with the current optimization solution, while the other three store the subsequent solutions in the background.
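The row-pipelining scheme described above can be sketched behaviorally. This is a minimal Python model under our own assumptions: the class and method names (`ShadowRegister`, `background_rows`, `hot_swap`) are hypothetical, and the rule that the active row is never overwritten in place is an assumption of the sketch, not a documented property of the chip.

```python
# Hypothetical behavioral model of the 4 x 48-bit CFIA shadow register (not the RTL).
ROWS = 4
BITS_PER_ROW = 48


class ShadowRegister:
    """One row drives the CFIA; the remaining rows can be preloaded in the background."""

    def __init__(self) -> None:
        self.rows = [[0] * BITS_PER_ROW for _ in range(ROWS)]
        self.active = 0  # index of the row currently applied to the CFIA

    def background_rows(self) -> list:
        """Rows that may be written without touching the active configuration."""
        return [r for r in range(ROWS) if r != self.active]

    def write_row(self, row: int, bits: list) -> None:
        # Assumption of this sketch: the active row is never overwritten in place.
        if row not in self.background_rows():
            raise ValueError(f"row {row} is active or out of range")
        if len(bits) != BITS_PER_ROW or any(b not in (0, 1) for b in bits):
            raise ValueError(f"expected {BITS_PER_ROW} binary configuration bits")
        self.rows[row] = list(bits)

    def hot_swap(self, row: int) -> list:
        """Activate a preloaded row and return the pattern now seen by the CFIA."""
        if not 0 <= row < ROWS:
            raise IndexError(row)
        self.active = row
        return self.rows[row]
```

In an optimization loop, the next candidate patterns would be written to `background_rows()`, and `hot_swap()` would then switch the CFIA to a preloaded solution without a new serial transfer.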
To reduce the complexity of the performance evaluation setup, a cost-effective indirect measurement method based on total harmonic distortion (THD) is implemented to optimize the performance of the CFIA circuit using a single test stimulus.
To carry out in-field performance optimization, an AI agent is embedded in the automatic test equipment (ATE) for system-performance optimization. In this work, the AI agent is placed at the hardware level, as close as possible to the device under test (DUT). Our previously proposed experience replay particle swarm optimization (ERPSO) [37] was selected as the AI-based optimizer and implemented on the field programmable gate array (FPGA) board from Red Pitaya [38], which functions as an edge computing device. With regard to long-term chip reliability and power efficiency, an indirect power-monitoring module (PMM) is integrated with the THD-based optimization methodology.
Mixtrinsic evolution, originally introduced in [39], uses a population of particles in an optimization algorithm that includes both intrinsic and extrinsic individuals. Previous work at the authors' institute [40] extended this idea by performing complex measurements, such as open-loop gain, phase margin, and output resistance, extrinsically with the SPICE simulator, while running simple measurements, such as common-mode range, output-voltage swing, and offset, intrinsically. However, this approach is not always suitable, for two reasons. First, simulating circuit modules is less accurate than measuring the physical hardware of the DUT. Second, simulation requires a significant amount of time, depending on the available processing power.
The primary objective of this work is to address three challenges in chip-performance optimization and reconfigurable analog circuits. The first challenge is to demonstrate a low-cost evaluation setup for chip-performance optimization, that is, a cost-effective method for evaluating chip performance that reduces the overall cost of the evaluation process. The second challenge is to reduce the chip area of configurable SSEs while preserving flexibility. This requires selectively applying configurability to the critical circuit elements that substantially influence SSE performance. The third challenge is to provide a reliable and power-efficient optimization method for reconfigurable analog circuits, which involves developing a simple PMM that improves circuit efficiency and enhances the device's long-term reliability. This study aims to advance the development of highly efficient and robust chips that are compatible with Industry 4.0 applications and adhere to the specifications outlined in the Association for Sensors and Measurement (AMA) vision [6]. The remainder of this paper is structured as follows: Section 2 provides an overview of the proposed methodology. Section 3 details the experimental setup and presents the measurement results of the fully differential CFIA. Lastly, Section 4 offers conclusions and highlights avenues for future research.

Figure 3 displays a block diagram of the proposed method for cost-effective indirect performance measurement in smart sensory electronic systems. The method uses a reconfigurable, fully differential CFIA as a test vehicle for intrinsic evaluation. A sinusoidal signal with predefined amplitude and frequency is generated by the onboard digital-to-analog converter (DAC) of the Red Pitaya and applied to the CFIA for optimization purposes.
Subsequently, the output response of the CFIA is sampled using the high-speed analog-to-digital converter (ADC) of the Red Pitaya board. The THD is then evaluated from the sampled response, which helps predict most of the CFIA's characteristics simultaneously. This approach relies on the fact that design imperfections in parameters such as slew rate (SR), gain-bandwidth product (GBW), input common-mode range (ICMR), effective number of bits, full-power bandwidth, and signal-to-noise ratio (SNR) translate into nonlinear distortion at the output of the closed-loop amplifier. Unlike the optimization of digital evolvable hardware such as FPGAs, transistor-level optimization of evolvable analog circuits can produce harmful solutions, such as excessive currents that may permanently damage the DUT or shorten its lifetime. To address this issue, enhance long-term reliability, and improve the CFIA's power efficiency, we incorporated a low-cost indirect PMM into the THD-based optimization methodology. Finally, we chose our previously proposed experience replay particle swarm optimization (ERPSO) [37], a modified version of PSO, as the optimization unit.
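The THD evaluation step can be illustrated with a short sketch. It assumes NumPy, a Hann window, and that THD is the ratio of the summed power of the 2nd to 6th harmonics to the power of the fundamental; the function name `thd_db` and the ±3-bin peak integration are illustrative choices, not the FPGA/PS implementation used in this work.

```python
import numpy as np


def thd_db(samples, fs, f0, n_harmonics=5):
    """Estimate THD in dB of a sampled sine with fundamental f0 from its FFT."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    window = np.hanning(len(x))               # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x * window))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def peak_power(f):
        k = int(np.argmin(np.abs(freqs - f)))          # nearest FFT bin
        lo, hi = max(k - 3, 0), min(k + 4, len(spectrum))
        return float(np.sum(spectrum[lo:hi] ** 2))     # integrate the window main lobe

    p_fund = peak_power(f0)
    # Harmonics 2 .. n_harmonics + 1 that lie below the Nyquist frequency.
    harmonics = [h * f0 for h in range(2, n_harmonics + 2) if h * f0 < fs / 2]
    p_harm = sum(peak_power(f) for f in harmonics)
    return 10.0 * np.log10(p_harm / p_fund)            # power ratio -> dB
```

Assuming a 125 MS/s sampling rate and a 1 MHz stimulus, a record that holds an integer number of periods (for example 12,500 samples, i.e., 100 periods) places the fundamental exactly on an FFT bin and keeps leakage low.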

Indirect Current-Feedback Instrumentation Amplifier (CFIA)
The instrumentation amplifier (in-amp) is the key component of the AFE circuitry for sensor signal interfacing and conditioning [41]. Three primary topologies exist for implementing in-amp circuits [42]: the capacitively coupled chopper-stabilized in-amp (CCIA) [43], the traditional in-amp based on three operational amplifiers (op-amps), and the indirect current-feedback in-amp (CFIA) [44]. The CFIA employs the active feedback amplifier topology [45], also referred to as a differential-difference amplifier (DDF) [46], which offers several advantages, such as high input impedance, high open-loop DC gain, and broad bandwidth [47]. Compared with the 3-opamp in-amp, the CFIA is more area- and power-efficient because its input and feedback transconductance stages share a single output-driver stage [41].
A notable feature of the CFIA is that the common-mode voltage of the input stage is separated and isolated from that of the feedback stage by two balanced differential stages [48]. This allows sensors whose common-mode voltage differs from the CFIA output common-mode voltage to be connected directly [45]. The input and feedback transconductance stages convert voltage signals into current signals while rejecting the common-mode voltage, resulting in a higher CMRR than that of the 3-opamp in-amp [49]. Furthermore, mismatches in the feedback resistors only cause a closed-loop gain error [50] and do not degrade the CMRR. Depending on the input-stage type (NMOS or PMOS), the CFIA can amplify sensor voltages close to either supply rail, which makes it suitable for conditioning various sensor types. Nevertheless, the CFIA faces two issues associated with the DDF core amplifier. The first is gain inaccuracy caused by the mismatch between the input and feedback transconductances. This can be tackled by using the same type of differential transistors for both the input and feedback stages, with meticulous attention to layout matching during physical implementation.
Additionally, cascoded bias currents can achieve a higher degree of matching. The second issue arises from the limited input differential range of the input transconductance stage, which operates in open loop [46]. This limitation becomes particularly problematic when interfacing high-dynamic-range sensors, such as magnetoresistive sensors [51]. To address this issue, the authors of [52] proposed a wide-input-range fully differential CFIA based on a fully balanced DDF [53]. This approach achieved outstanding dynamic performance and wide-input-range capability concurrently.
To enable self-X characteristics, we incorporated configurability into the critical components of the CFIA circuit, that is, the elements that significantly influence its performance. These sensitive elements were identified by running the optimization algorithm on the CFIA with extrinsic optimization techniques [54][55][56] and analyzing the resulting CFIA performance. They function as design tuning knobs [57], as illustrated in Figure 4, where they are denoted by the arrow symbol. The tuning knobs consist of digitally weighted, scalable arrays governed by configuration bits from the optimization algorithm. A four-row shadow register memory stores the configuration bits and allows hot swapping between saved solutions to reduce the optimization time. The proposed design provides a programmable GBW by adjusting the compensation capacitors according to the stability requirements of the selected gain. Eight discrete gain levels are available: 1, 2, 4, 8, 16, 32, 64, and 128. Moreover, programming the bias current enables control over the amplifier's dominant and nondominant poles, which is highly beneficial for restoring the stability of the CFIA should it become unstable. More details about the CFIA circuit schematic are provided in the next section.

Figure 5 illustrates the primary components of the CFIA, including the PMM. The amplifier employs a buffered class-AB topology [58,59]. A common-mode feedback (CMFB) amplifier from [60] keeps the CFIA's output common-mode voltage around the target voltage (VCM). To maximize the output dynamic range, VCM is set to the midpoint of the supply voltage, i.e., 1.65 V. Because an N-well CMOS technology is used, the bulk terminals of all NMOS transistors are connected to ground, while those of PMOS transistors are connected to VDD, unless otherwise specified.
The power-down scheme is shown in blue. For simplicity, the biasing circuit and the programmable current source are not displayed. The adjustable input and feedback transconductance stages (Gm1 and Gm2, respectively) are shown in Figure 6. Each comprises three selectable stages that can be multiplexed according to the common-mode and differential input voltage ranges. Stage 1 is appropriate for high-dynamic-range sensor signals centered around the midpoint of the CFIA supply range. Stages 2 and 3 incorporate degeneration resistors and are beneficial when the sensor's common-mode voltage approaches VDD or GND, respectively. Details of the transistor sizes can be found in our previous work [36,57].

Figure 5. Schematic CFIA diagram integrating the power monitoring module.

In the printed circuit board (PCB) domain, a prevalent technique for measuring circuit current is to monitor the voltage drop across a small current-sense resistor (CSR) placed in the primary supply-voltage rail. This method employs a differential amplifier and an ADC [61]. The voltage drop across the CSR should not substantially reduce the circuit's headroom voltage when high currents flow through it. Moreover, the alternating current drawn by the circuit during dynamic operation modulates the voltage drop at the same frequency as the operation. Consequently, the power supply rejection ratio (PSRR) must be considered when this method is used to assess the power of an amplifier operating at high frequency. When transferring this method to integrated circuits, note that it measures power on the primary supply rails, which may be shared by several cells. Hence, separate power rings should be used to measure the power consumption of individual cells.
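The arithmetic behind the CSR method is simple enough to show directly. The sketch below assumes an ideal sense amplifier and hypothetical component values (a 0.5 Ω sense resistor and a 5 mV drop on a 3.3 V rail); the function name `csr_measurement` is hypothetical. It also reports the supply voltage left to the cell, which is the headroom concern raised above.

```python
def csr_measurement(v_supply, v_drop, r_sense):
    """Quantities derived from the DC drop across a series current-sense resistor."""
    current = v_drop / r_sense                  # Ohm's law across the CSR
    cell_voltage = v_supply - v_drop            # headroom actually left for the circuit
    return {
        "current_a": current,
        "cell_voltage_v": cell_voltage,
        "supply_power_w": v_supply * current,   # power drawn from the rail
        "cell_power_w": cell_voltage * current, # power delivered to the cell itself
    }
```

With these illustrative values the cell draws 10 mA, but its supply is already reduced by 5 mV, which is why the sense resistor must stay small when currents are high.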

Power Monitoring Module (PMM)
In some instances, only detection of a power threshold is necessary, and continuous measurement is not essential. For this purpose, the authors of [62] proposed a basic method for detecting maximum power using a simple current-sense sensor. However, this approach is unsuitable when the power of different optimization solutions must be compared. The authors of [63] proposed an alternative technique for indirectly estimating the DC power of the CFIA. As illustrated by the green components in Figure 5, this method mirrors scaled-down copies of the currents in the power-intensive branches into a current-starved ring oscillator [64]. The drawn current, and hence the power dissipation, is thereby encoded as an oscillation frequency. The existing digital processing unit in smart sensory electronics can easily interpret the generated signal. Because the output frequency is proportional to the drawn current, this method can not only detect the power threshold but also provide a reasonable approximation of the power consumption of different optimization solutions.
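To first order, a current-starved ring oscillator with N stages, node capacitance C, and supply V_DD oscillates at f ≈ I / (N · C · V_DD), so a scaled current mirror makes the frequency proportional to the monitored current. The sketch below encodes this textbook approximation; the mirror ratio, stage count, capacitance, and function names are hypothetical, and a real PMM is characterized by measurement rather than by this formula.

```python
def ring_osc_frequency(i_total, mirror_ratio, n_stages, c_node, v_dd):
    """First-order current-starved ring oscillator: f = I_osc / (N * C * V_DD)."""
    i_osc = i_total / mirror_ratio          # scaled-down copy of the monitored current
    return i_osc / (n_stages * c_node * v_dd)


def power_from_frequency(f, mirror_ratio, n_stages, c_node, v_dd):
    """Invert the model to estimate the DC power P = V_DD * I_total."""
    i_total = f * n_stages * c_node * v_dd * mirror_ratio
    return v_dd * i_total
```

Because the model is linear in current, doubling the monitored current doubles the frequency; in practice, offsets and nonlinearity make a measured calibration of the PMM necessary.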

Experience Replay Particle Swarm Optimization (ERPSO)
The ERPSO algorithm enhances the selection procedure of the classical PSO algorithm by randomly selecting historical global best positions from an experience replay buffer (ERB), which addresses the complex objective space of SSEs [37]. The ERB concept, commonly used in reinforcement learning [65,66], leverages accumulated historical experience to improve convergence accuracy. In ERPSO, the ERB is an archive of previously visited global best positions; by drawing on this prior knowledge instead of relying solely on recent experience, it reduces the likelihood of being trapped in local minima. The ERB selection process employs an adaptive epsilon-greedy algorithm to balance exploration and exploitation [67].
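The ERB and its epsilon-greedy selection can be sketched as a small data structure. The FIFO capacity, the duplicate check, and the linear epsilon decay in `adaptive_epsilon` are our assumptions, and all names are hypothetical; the adaptive schedule used in [37] may differ.

```python
import random


class ExperienceReplayBuffer:
    """Archive of previously visited global-best positions (assumed FIFO capacity)."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity
        self.items = []                       # list of (position, cost) tuples
        self.rng = random.Random(seed)

    def add(self, position, cost):
        entry = (tuple(position), cost)
        if entry not in self.items:           # skip exact duplicates
            self.items.append(entry)
        if len(self.items) > self.capacity:   # drop the oldest entry
            self.items.pop(0)

    def sample(self):
        return self.rng.choice(self.items)[0]


def adaptive_epsilon(iteration, max_iter, eps_start=0.5, eps_end=0.05):
    """Assumed linear decay: replay the archive often early, exploit gbest later."""
    frac = iteration / max(max_iter - 1, 1)
    return eps_start + (eps_end - eps_start) * frac


def select_guide(gbest, erb, epsilon, rng):
    """Epsilon-greedy: current gbest with prob. 1 - epsilon, a replayed gbest with prob. epsilon."""
    if erb.items and rng.random() < epsilon:
        return erb.sample()
    return tuple(gbest)
```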
A flow diagram of the proposed design methodology is presented in Figure 7. The process begins with a random initialization of the particles' velocities and positions. Subsequently, a fast Fourier transform (FFT) is applied to the output signal of the reconfigurable amplifier, to which a sinusoidal signal with a known fundamental frequency is applied as the test stimulus, and the THD value is calculated from the output spectrum. The power consumption of the solution is then estimated using the built-in indirect PMM. The power consumption and THD values serve as the fitness values (cost functions) of the ERPSO algorithm. In the subsequent phase, the ERPSO algorithm updates the personal and global best positions where required.
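The flow of Figure 7 can be condensed into a compact, self-contained loop. In this sketch, positions are continuous values in [0, 1] rather than configuration bits, `evaluate` stands in for the hardware measurement (configure the CFIA, measure THD and PMM frequency, combine them into one cost), and the inertia and acceleration coefficients are common textbook values, not the settings used on the Red Pitaya. The function name `erpso` is hypothetical.

```python
import random


def erpso(evaluate, dim, n_particles=10, max_iter=50, w=0.7, c1=1.5, c2=1.5,
          eps_start=0.5, eps_end=0.05, seed=0):
    """Minimal ERPSO-style loop: PSO plus epsilon-greedy replay of past global bests."""
    rng = random.Random(seed)
    # 1) Random initialization of positions and velocities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    # 2) Evaluate every particle (on hardware: configure, measure THD and power).
    cost = [evaluate(p) for p in pos]
    pbest = [p[:] for p in pos]
    pbest_cost = cost[:]
    g = min(range(n_particles), key=lambda i: cost[i])
    gbest, gbest_cost = pos[g][:], cost[g]
    archive = [gbest[:]]                    # experience replay buffer

    for it in range(max_iter):
        eps = eps_start + (eps_end - eps_start) * it / max(max_iter - 1, 1)
        for i in range(n_particles):
            # 3) Epsilon-greedy choice of the social guide.
            guide = rng.choice(archive) if rng.random() < eps else gbest
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (guide[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 1.0)  # clamp to bounds
            c = evaluate(pos[i])
            # 4) Update personal and global bests; archive every new global best.
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
                    archive.append(gbest[:])
    return gbest, gbest_cost
```

On hardware, each continuous position would additionally have to be quantized to the configuration bits of the tuning knobs before it is written to the shadow register.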
To balance the trade-off between exploration and exploitation, we modified the velocity-update equation (VUE) of the conventional PSO algorithm to incorporate previously visited global best positions, with their selection governed by the epsilon-greedy algorithm. With a probability of 1 − ε, the particles aim to converge quickly toward the global optimal solution; in this case, the VUE of the conventional PSO algorithm is used. Otherwise, with a probability of ε, the ERPSO algorithm randomly selects a historical global best solution from the ERB to mitigate premature convergence. This iterative process continues until the maximum number of iterations is reached.
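Written in the standard inertia-weight PSO form, the rule described above can be expressed as follows. Here x_i, v_i, and p_i are the position, velocity, and personal best of particle i, g is the current global best, and w, c_1, c_2, r_1, r_2 are the usual PSO inertia weight, acceleration coefficients, and uniform random numbers; these symbols are assumed here, and the exact form used in [37] may differ.

```latex
\begin{aligned}
v_i^{t+1} &= w\,v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(\hat{g}^{t} - x_i^{t}\right),
\qquad x_i^{t+1} = x_i^{t} + v_i^{t+1},\\
\hat{g}^{t} &=
\begin{cases}
g^{t}, & \text{with probability } 1-\varepsilon,\\
g_k \in \mathrm{ERB},\ k \text{ drawn uniformly at random}, & \text{with probability } \varepsilon.
\end{cases}
\end{aligned}
```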

Intrinsic Implementation and Architecture of the Self-X System
A block diagram of the intrinsic implementation for in-field optimization of the CFIA using the indirect measurement method is displayed in Figure 8. Two Red Pitaya boards are used in this experimental setup. The first (FPGA board 1) is responsible for data acquisition, THD calculation using the FFT, and transmitting data to the server. The second (FPGA board 2) implements the ERPSO, the serial transfer of the configuration pattern to the CFIA, and the frequency measurement of the power-monitoring module output. Since the analog outputs of the Red Pitaya board are referenced to 0 V, the outputs of FPGA board 1 are DC level-shifted by 1.65 V to match the input dynamic range of the CFIA, which operates from a single 3.3 V supply. Alternatively, the dynamic ranges of the FPGA board and the CFIA chip could be matched using a transformer balun, such as the PWB2010 from Coilcraft, or an active DC level shifter based on wide-bandwidth fully differential amplifiers, such as the LMH6553 from Texas Instruments or the LTC6363 from Analog Devices. However, a transformer would restrict the experiment to higher frequencies, while the active shifter was avoided to eliminate any uncertainty associated with adding another analog block to the signal chain of the prototype demonstration. Figures 9 and 10 show the detailed implementation of the proposed self-X architecture for the CFIA circuit on Red Pitaya boards 1 and 2, respectively. The binary files required to implement this architecture on the Red Pitaya boards are generated with the Vivado Design Suite from Xilinx. The RF DACs of the Red Pitaya boards generate the fully differential stimulus signals used to assess the CFIA circuit, while the RF ADCs acquire its output response. Both the ADC and DAC have a resolution of 14 bits.
The ERPSO is executed on the Red Pitaya board 2, while the Red Pitaya board 1 is responsible for carrying out the THD measurement.
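The stimulus generation can be sketched as follows. The sketch assumes a ±1 V full-scale range for the 14-bit DAC with signed codes, and models the external 1.65 V level shift by simply adding VCM to each leg; the function names are hypothetical. Each leg swings a quarter of the differential peak-to-peak value, so that the difference of the two complementary legs reaches 1 Vp-p.

```python
import math

DAC_BITS = 14
DAC_FULL_SCALE_V = 1.0  # assumed +/-1 V output range of the DAC


def dac_code(v):
    """Quantize a voltage to a signed 14-bit code, clamped to the code range."""
    max_code = 2 ** (DAC_BITS - 1) - 1
    code = round(v / DAC_FULL_SCALE_V * max_code)
    return max(-max_code - 1, min(max_code, code))


def differential_stimulus(vpp_diff, f0, fs, n, v_cm=1.65):
    """Complementary sine pair whose difference swings vpp_diff peak-to-peak."""
    amp = vpp_diff / 4.0                   # per-leg peak amplitude
    codes_p, codes_n, legs = [], [], []
    for k in range(n):
        s = amp * math.sin(2 * math.pi * f0 * k / fs)
        codes_p.append(dac_code(s))
        codes_n.append(dac_code(-s))
        legs.append((v_cm + s, v_cm - s))  # leg voltages after the 1.65 V DC shift
    return codes_p, codes_n, legs
```

For a 1 MHz tone at an assumed 125 MS/s update rate, 125 samples cover exactly one period.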

Workflow of the Optimization Process
The optimization workflow, illustrated in Figure 11, is similar to Synopsys' performance optimization benchmarking platform [31]. The assessment unit comprises two Red Pitaya boards with the associated ADCs/DACs. The optimization goal is to find the best THD value with minimum power consumption using an agglomerative multi-objective optimization approach. The scalable elements in the CFIA circuit serve as tuning knobs. The algorithm reconfigures the system by passing the configuration pattern to the CFIA. In the next step, the output response of the CFIA is measured. The algorithm continues in this loop until the end condition is reached. The results are reported at the end of the optimization process.
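One common way to aggregate two objectives into a single fitness value is a weighted sum with a penalty for missing a constraint. The sketch below is such a scalarization under hypothetical weights and a hypothetical power reference; only the −72 dB THD target is taken from this work, and the function name `aggregated_cost` is hypothetical, not the cost function of the ERPSO implementation.

```python
def aggregated_cost(thd_db, power_mw, thd_target_db=-72.0, p_ref_mw=80.0,
                    w_thd=1.0, w_power=0.5, penalty=10.0):
    """Scalarize (THD, power) into one cost to minimize (hypothetical weights)."""
    thd_term = thd_db / abs(thd_target_db)        # more negative THD -> lower cost
    power_term = power_mw / p_ref_mw              # normalized DC power
    miss_db = max(0.0, thd_db - thd_target_db)    # dB by which the THD target is missed
    return (w_thd * thd_term
            + w_power * power_term
            + penalty * miss_db / abs(thd_target_db))
```

Normalizing both terms keeps the weights interpretable, and the penalty term strongly disfavors configurations that miss the THD target.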

Figure 11. Performance optimization workflow for smart sensory electronics.
The optimization process begins with Red Pitaya board 2 serially writing the particle values of the ERPSO to the shadow register of the CFIA. The CFIA is powered down while data are written to the shadow register to avoid undefined transition states. Once the writing is complete, the CFIA is turned on, and Red Pitaya board 1 is notified via the server to start the THD calculation for the corresponding ERPSO particle solution.
For the THD calculation, Red Pitaya board 1 applies the fully differential sinusoidal stimulus to the CFIA input with its onboard RF DAC and acquires the output response with its RF ADC. The acquired samples are then written to the shared dynamic random-access memory (DRAM) of the Red Pitaya board through an Advanced eXtensible Interface (AXI) stream-to-memory-mapped IP. The controller module sets an acknowledgment flag to inform the processing system (PS) of the Red Pitaya board that the acquisition is complete. The THD is then calculated from the acquired samples on the PS side of Red Pitaya board 1, and the resulting value is passed to Red Pitaya board 2 via the server for use by the ERPSO algorithm. Next, the ERPSO activates the power-monitoring module and measures the output frequency of the CFIA's power-monitoring circuit to indirectly estimate the DC power consumption of the corresponding solution. Note that power monitoring is deactivated during the THD measurement to prevent transient switching pulses from coupling into the analog outputs. The optimization process continues until the maximum number of iterations is reached.

Figure 12 presents the experimental laboratory setup of the proposed methodology. The four-layer PCB prototyping board was designed using Autodesk Eagle. Separate power and ground planes, with decoupling capacitors placed near the chip's power pins, were added to improve the system's noise performance.
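The per-particle sequence described above can be summarized as an ordered list of steps. The sketch below uses a hypothetical `RecordingBench` object that only logs each step and returns canned readings; the step names and `evaluate_particle` are illustrative, and in the real setup these steps are split between the two Red Pitaya boards and the server.

```python
class RecordingBench:
    """Hypothetical stand-in for the boards and chip: logs steps, returns canned readings."""

    def __init__(self, thd_db, pmm_hz):
        self.log = []
        self.readings = {"board1_measure_thd": thd_db,
                         "board2_read_pmm_frequency": pmm_hz}

    def call(self, step, *args):
        self.log.append(step)
        return self.readings.get(step)


def evaluate_particle(bench, pattern):
    """One ERPSO evaluation in the order described in the text."""
    bench.call("cfia_power_down")               # avoid undefined states while shifting
    bench.call("write_shadow_register", pattern)
    bench.call("cfia_power_up")
    bench.call("pmm_disable")                   # keep oscillator edges off the outputs
    thd = bench.call("board1_measure_thd")      # stimulus, capture, DRAM via AXI, FFT
    bench.call("pmm_enable")
    freq = bench.call("board2_read_pmm_frequency")
    bench.call("pmm_disable")
    return thd, freq
```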

Shadow Register Verification
The first verification step supplies the CFIA circuit with the default configuration pattern obtained from the post-layout extrinsic evaluation described in [36]. The configuration data are transferred serially from the Red Pitaya to the shadow register of the CFIA at 1 kbit/s, using a transfer mode similar to mode 0 of the serial peripheral interface (SPI) protocol. In this arrangement, the Red Pitaya acts as the master and the chip as the slave. The clock idles at a logic-low level, the shadow register samples data on the rising edge, and data transitions occur on the falling edge. Four bits control the read and write operations on the shadow register: two bits for writing and two bits for reading. Additionally, the most significant bit (MSB) of the register is connected to the "Dout_Debug" pin of the chip, which is used to debug the serial data of the register, as presented in [36]. Figure 13 illustrates the debugging process: the data initially written to the first row of the register are read back correctly after the write operation on all four rows has completed, demonstrating the successful transfer of the configuration data.
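The mode-0 transfer can be modeled at the bit level. The sketch assumes MSB-first shifting into a 48-bit register whose first bit is exposed on Dout_Debug, and it omits the four memory-control bits and any framing because their exact format is not given here; `spi_mode0_waveform`, `ShiftRegisterSlave`, and `transfer_time_s` are hypothetical names.

```python
def spi_mode0_waveform(bits):
    """Master side, SPI mode 0: SCLK idles low, MOSI changes while SCLK is low,
    and the slave samples on the rising edge. Returns (sclk, mosi) pairs."""
    wave = []
    for b in bits:
        wave.append((0, b))   # SCLK low: place the next bit on MOSI
        wave.append((1, b))   # rising edge: the slave samples MOSI
    wave.append((0, bits[-1] if bits else 0))  # return to the idle-low state
    return wave


class ShiftRegisterSlave:
    """Hypothetical 48-bit shift register that samples MOSI on SCLK rising edges."""

    def __init__(self, width=48):
        self.bits = [0] * width
        self._last_sclk = 0

    def clock(self, sclk, mosi):
        if sclk == 1 and self._last_sclk == 0:    # rising edge detected
            self.bits = self.bits[1:] + [mosi]    # shift toward the MSB end
        self._last_sclk = sclk

    @property
    def dout_debug(self):
        return self.bits[0]                       # MSB, as routed to Dout_Debug


def transfer_time_s(n_bits, rate_bps=1000):
    """Minimum time to shift n_bits at the given bit rate."""
    return n_bits / rate_bps
```

At 1 kbit/s, one 48-bit row therefore takes at least 48 ms, and all four rows at least 192 ms, before the control bits are counted.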

CFIA Testing Using the Default Configuration
Although the circuit functioned correctly in simulation with the RC-extracted netlist and passed PVT (process, voltage, and temperature) verification using Monte Carlo (MC) and worst-case (WC) simulations over an extended industrial temperature range (−40 °C to 85 °C) with ±10% supply voltage variation, actual measurements revealed that it suffers from instability. This instability may be attributed to shifts in device characteristics induced by the fabrication and packaging process, even though process variations of up to 6 sigma were considered during simulation.

Figure 13. Shadow-register-function verification using the debugging pin.

Figure 14 illustrates the MC post-layout simulation used to evaluate the CFIA phase margin (PM), an indicator of unity-gain closed-loop stability, in the default configuration. The evaluation used 500 samples and a Gaussian distribution to emulate the actual process profile. Both process and mismatch variations were considered for the entire CFIA circuit during the MC run. As depicted in the figure, the CFIA exhibited a safe PM even at the extreme corners, achieving a 100% yield for a target PM above 45 degrees. Throughout the test, a 15 pF capacitive load and a 10 kΩ resistive load were connected to each differential output. Note that the default configuration uses only the two least significant bits of the configurable compensation capacitor and consumes reduced power in the output stage. Consequently, there is headroom for further PM enhancement, although the simulation results did not indicate that it was necessary. Figure 15 shows the behavior observed at the outputs when both inputs are tied to the DC common-mode voltage (VCM) of 1.65 V. The input capacitance of the Rohde & Schwarz mixed-signal oscilloscope (MSO) is 14 pF in the ×10 channel setting with 10 MΩ input impedance, which falls within the load capability of the designed CFIA.
The output signals nevertheless convey valuable information: they are even and in phase, indicating that the symmetrically balanced layout achieves a high common-to-differential-mode rejection ratio. As a result, the differential output signal (Vout_diff) exhibits a reduced oscillation amplitude, illustrating the advantage of a fully differential circuit in cancelling common signals down to noise amplitude levels [68]. However, the oscillatory behavior at the output indicates that the CFIA cannot respond linearly to the input signal. Figure 16 demonstrates this nonlinearity by showing the output DC characteristics of the CFIA in unity-gain configuration when the inputs are swept linearly from 0 to 3.3 V in steps of 33 mV; this behavior is compared with the characteristics obtained from post-layout simulation. Figure 17 shows the output transient response to a fully differential sinusoidal input signal with a 1 Vp-p amplitude and a frequency of 1 MHz, highlighting the extent of distortion in the time domain. Additionally, Figure 18 displays the differential output signal in the frequency domain, obtained by FFT. The output nonlinearity introduces harmonic distortion into the signal spectrum, so the CFIA nonlinearity is reflected in the measured THD value. In this case, a THD of −30 dB indicates substantial nonlinearity. This experiment was conducted for the first time on 15 chips received from the foundry, selected from a batch of 32 and numbered sequentially: chips 1 and 3 to 16 were tested, and all showed similar characteristics. This may be attributed to their origin from the same wafer. In other words, had the circuit been designed with fixed-size elements, the entire batch might have had to be discarded.
The significance of configurable circuits with self-X properties becomes evident in addressing such issues. In the subsequent experiment, the chip was therefore subjected to in-field optimization using the ERPSO algorithm to search for the configuration pattern that brings the CFIA into its optimum operating region.

PMM Characterization
Prior to the optimization, the power monitoring module (PMM) was evaluated by varying the CFIA bias current through the programmable current DAC and monitoring the corresponding output-pulse frequency of the module. The PMM is a current-to-frequency converter that generates a quasi-digital signal with a 50% duty cycle, as shown in Figure 19. The CFIA current was recorded with the current meter of the power supply unit (PeakTech 6181), which offers a resolution of 1 mA. A frequency-to-digital converter (FTD) was designed on the Red Pitaya to read the PMM output frequency and convert it to a decimal value; selected values are listed in Table 1. Upon detecting the first rising edge of the PMM output, the FTD begins counting and stops at the next rising edge, so that the counter value represents one signal period in clock cycles, from which the frequency follows. The FTD counter runs at 125 MHz, synchronized with the Red Pitaya system clock. Given that the maximum PMM frequency was determined to be below 10 MHz, the FTD resolution is adequate for this measurement.
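The period-counting principle of the FTD can be sketched in a few lines. This is a minimal model, assuming the counter accumulates whole 125 MHz clock cycles between two rising edges and truncates the fractional cycle; whether the real FTD includes the starting edge in its count is an implementation detail not stated here, and the function names are hypothetical.

```python
F_CLK_HZ = 125e6  # FTD counter clock (Red Pitaya system clock)


def ftd_count(f_signal_hz, f_clk_hz=F_CLK_HZ):
    """Whole clock cycles elapsed between two consecutive rising edges.

    The counter starts on one rising edge of the PMM output and is read on the
    next one, so it measures one period; the fractional cycle is truncated.
    """
    return int(f_clk_hz / f_signal_hz)


def ftd_frequency(count, f_clk_hz=F_CLK_HZ):
    """Frequency estimate recovered from a period count N: f = f_clk / N."""
    return f_clk_hz / count


def relative_resolution(count):
    """Relative frequency change caused by a one-count step, i.e. 1/N."""
    return ftd_frequency(count) / ftd_frequency(count + 1) - 1.0
```

For the 451.816 kHz signal of Figure 19, `ftd_count(451.816e3)` returns 276, so one count corresponds to a frequency step of roughly 0.36%. Because the step is about 1/N, the relative resolution improves as the PMM frequency decreases.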

Figure 19. Output signal of the integrated power monitoring module (f = 451.816 kHz).

Figure 20 shows that the power monitoring scheme exhibits adequate linearity, providing the optimization algorithm with the necessary CFIA power data. This enables the most power-efficient solution within the investigated range to be identified and selected. Note that the linearity plot is limited by the 1 mA resolution of the current measurement.
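The linearity of the PMM can be quantified with an ordinary least-squares fit of output frequency against measured current, together with the coefficient of determination R². The sketch below is a generic implementation; the function names and the idea of inverting the fit to map a frequency back to a current are illustrative assumptions, and no measured data from Table 1 or Figure 20 are reproduced.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, returned with R^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


def current_from_frequency(f, slope, intercept):
    """Invert the fitted line to estimate the current from a PMM frequency."""
    return (f - intercept) / slope
```

With xs as currents (for example in mA) and ys as PMM frequencies (for example in kHz), an R² close to 1 indicates the kind of linearity shown in Figure 20. Since the current meter resolves only 1 mA, the residuals of such a fit partly reflect that quantization rather than the PMM itself.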

CFIA Performance Optimization Using the Proposed Methodology
A fully differential sinusoidal signal with an amplitude of 1 Vp-p and a frequency of 1 MHz was generated by Red Pitaya 1 using the Direct Digital Synthesizer (DDS) IP block from the Xilinx Vivado IP catalog; this signal served as the test stimulus for the optimization process. The Red Pitaya ADC acquired the CFIA output at a sampling rate of 125 MS/s, giving 125 samples per period of the 1 MHz tone for the FFT-based THD calculation. The optimization algorithm employed 15 particles and 200 iterations. In the aggregated multi-objective formulation, a weight of 80% was assigned to the THD value and 20% to the monitored power. To reduce the influence of chance outcomes (lucky shots), the optimization was performed in 10 independent runs on chip number 1. Detailed information on the optimization algorithm can be found in [37,63]. The mean error-convergence curve of the optimization algorithm is shown in Figure 21. The CFIA was configured for unity gain, the most critical condition for stability. Figure 22 shows the FFT spectrum of the output for one of the solutions found by the algorithm. Upon completion of the optimization, the mean THD attained is −72 dB with a power dissipation of 55 mW, as shown by the error-bar graph in Figure 23a for the 10 independent trials. Figure 23b shows the optimization statistics for a single optimization run on each of 15 distinct chips. The sinusoidal output response in Figure 24 is free of oscillation. The slew rate measured from the pulse response is approximately ±11 V/µs. Furthermore, a step-response test was conducted to validate stability.
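The weighted-sum objective and the experience-replay idea can be sketched generically. The code below is not the authors' ERPSO (its details are in [37,63]); it assumes a continuous search space instead of the chip's discrete configuration bits, a hypothetical normalization of THD and power, and one simple interpretation of experience replay: a small buffer of the best positions found so far, from which the social attractor is occasionally drawn instead of the global best. All names and parameter values other than the 15 particles, 200 iterations, and 80/20 weights are illustrative.

```python
import random


def weighted_cost(thd_db, power_mw, p_max_mw, w_thd=0.8, w_power=0.2):
    """Weighted-sum cost, lower is better.

    Hypothetical normalization: THD mapped from [-100 dB, 0 dB] to [0, 1],
    power divided by an assumed maximum p_max_mw.
    """
    thd_term = (thd_db + 100.0) / 100.0
    power_term = power_mw / p_max_mw
    return w_thd * thd_term + w_power * power_term


def pso_with_replay(cost, dim, lo, hi, n_particles=15, n_iter=200,
                    buffer_size=20, replay_prob=0.2, seed=0):
    """Minimal PSO whose social attractor is sometimes taken from a replay buffer."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    # Replay buffer of (cost, position) experiences, kept sorted best-first.
    replay = sorted(((c, p[:]) for c, p in zip(pbest_cost, pbest)),
                    key=lambda e: e[0])[:buffer_size]

    for _ in range(n_iter):
        for i in range(n_particles):
            # Occasionally follow a replayed experience to diversify the search.
            if rng.random() < replay_prob:
                attractor = rng.choice(replay)[1]
            else:
                attractor = gbest
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (attractor[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                replay.append((c, pos[i][:]))
                replay.sort(key=lambda e: e[0])
                del replay[buffer_size:]
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

On the chip, `cost` would wrap a hardware measurement (program a configuration, acquire the output, compute THD and read the PMM frequency) and feed the results to `weighted_cost`; here it can be any Python function, for example a simple quadratic for testing.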
The rising and falling edges of the step response in Figure 25 indicate a phase margin exceeding 60 degrees. The DC characteristics in Figure 26 show the dynamic input range at unity gain. This wide differential range enables interfacing sensors with high-amplitude differential outputs, such as tunnel magneto-resistance (TMR) sensors. The AC response is assessed at various programmed gain settings, as illustrated in Figure 27. Note that the plotted gain is 6 dB lower than expected. This offset is not a concern: it stems from the measurement setup, in which a single-ended output was acquired for the Bode plot, and each single-ended output carries half of the differential signal, i.e., 20·log10(2) ≈ 6 dB less. With its class-AB complementary output stage, the CFIA achieves an output swing approaching the supply rails, as depicted in Figure 28. In this test, a small sinusoidal signal with an amplitude of 250 mVp-p and a frequency of 1 kHz was applied with the CFIA gain set to 16. The output-signal limit is attributable to the output stage rather than to the input characteristics.
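A common way to relate a measured step response to phase margin is to approximate the closed loop as a second-order system: the fractional overshoot gives the damping ratio ζ, and ζ gives the phase margin of the equivalent unity-feedback loop. The sketch below implements these textbook relations; it is an approximation assumed for illustration, not necessarily how the phase margin in Figure 25 was derived, and the function names are hypothetical. For example, ζ = 0.6 (about 9.5% overshoot) corresponds to a phase margin of roughly 59°, and ζ = 0.7 (about 4.6% overshoot) to roughly 65°.

```python
import math


def damping_from_overshoot(overshoot):
    """Damping ratio of a second-order system from fractional overshoot (0 < PO < 1)."""
    ln_po = math.log(overshoot)
    return -ln_po / math.sqrt(math.pi ** 2 + ln_po ** 2)


def phase_margin_deg(zeta):
    """Phase margin in degrees of the equivalent second-order unity-feedback loop."""
    return math.degrees(math.atan(
        2 * zeta / math.sqrt(math.sqrt(1 + 4 * zeta ** 4) - 2 * zeta ** 2)))
```

In practice, the overshoot is read from the step edge as (peak − final value) / (final value − initial value), then passed through both functions; a smaller overshoot maps to a larger phase margin.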
The preceding discussion suggests that the nonlinear behavior of a CFIA, or of any CMOS amplifier, can be estimated indirectly from measured THD values. This is primarily due to the statistical interdependence of the various performance characteristics of the CFIA [28]. For instance, in Figures 16 and 17, the CFIA exhibited oscillatory behavior at the output, and its input range was entirely nonlinear. Consequently, as observed in Figure 18, its THD was severely degraded (−30 dB) by harmonic distortion. After the self-X performance optimization loop, however, the THD of the CFIA improved significantly, which indirectly indicates a linear DC transfer characteristic, clean step and sinusoidal responses, and a flat frequency response, as shown in Figures 22, 24, 25, 27 and 28. This demonstrates in practice that the proposed THD-based optimization methodology can optimize most of the CFIA performance characteristics simultaneously. Table 2 compares the CFIA performance under extrinsic and intrinsic evolution. The intrinsic differential DC gain is estimated indirectly from the closed-loop gain error, because the feedback network cannot be disconnected from the amplifier core. Overall, the extrinsic and intrinsic results clearly differ; as mentioned earlier, this discrepancy is attributed to the performance shift after manufacturing. One possible cause is the inductance of the package leads and bonding wires, which may drive the CFIA into oscillation; moreover, this is our first prototype chip fabricated in the XFAB technology. The fourth column of the table shows the CFIA performance with the default configuration obtained from the extrinsic optimization process.
However, because the CFIA is unstable and oscillates under this configuration, its performance cannot be characterized accurately. Nonetheless, the proposed optimization approach successfully identified a configuration pattern that restores satisfactory CFIA operation. This difference in configuration patterns also accounts for the divergence between simulated and measured power: the algorithm seeks a stable solution by using higher currents to push the first nondominant pole, associated with the CFIA driver stage, away from the unity-gain frequency. Because the output stage uses fixed-size transistors, its associated poles can only be shifted by increasing the transconductance (gm) through a higher current. The algorithm also tends to increase the compensation capacitance, which further explains the lower measured slew rate. The compensation capacitor ranges from 0.35 pF to 2.35 pF in steps of 0.25 pF. The mean value obtained from the extrinsic evaluation was 0.85 pF, whereas the intrinsic evaluation yielded a mean of 2 pF. The primary purpose of this work was to develop a software and hardware concept for reconfigurable electronics that enables degraded circuits to recover with minimal cost for the performance-evaluation setup.

Figure 26. Output DC characteristics of the stable CFIA with unity gain configuration after the optimization.

The optimization process was repeated without power monitoring; the resulting mean CFIA power consumption was 80 mW for the same THD of −72 dB. Incorporating power monitoring therefore yielded a 34% increase in power efficiency.
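Two quantitative statements in this paragraph can be checked with first-order arithmetic. Assuming the slew rate of the compensated stage follows the textbook approximation SR ≈ I/C_C (an assumption; the exact CFIA topology may differ), raising the mean compensation capacitance from 0.85 pF to 2 pF would reduce the slew rate by roughly 2.35 times at constant current. The 34% figure corresponds to the relative power saving, using the approximately 53 mW quoted for the optimized chip in the literature comparison:

```latex
\mathrm{SR} \approx \frac{I}{C_C}, \qquad
\frac{C_{C,\mathrm{intr}}}{C_{C,\mathrm{extr}}} = \frac{2\,\mathrm{pF}}{0.85\,\mathrm{pF}} \approx 2.35

\frac{P_{\mathrm{w/o\,PMM}} - P_{\mathrm{PMM}}}{P_{\mathrm{w/o\,PMM}}}
= \frac{80\,\mathrm{mW} - 53\,\mathrm{mW}}{80\,\mathrm{mW}} \approx 0.34
```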
This offers significant advantages, especially for applications with limited power resources, such as sensor nodes powered by energy harvesting or batteries. Reducing the current also extends the device lifetime by keeping within the current-density limits of the on-chip interconnects, since excessive current could cause interconnect failure through electromigration. To validate the deviation between the designed and manufactured chips, a solution from the intrinsic optimization process was transferred to the extrinsic evaluation stage. The difference is most apparent in the power consumption: this configuration draws 15 mA in the intrinsic run but 24 mA in the extrinsic evaluation. Table 3 presents the CFIA performance with this imported configuration for comparison with the values in Table 2. Note that the extrinsic evaluation for this measurement was conducted at the typical (nominal) process corner, whereas the actual fabrication inherently differs. To make the comparison more realistic, a Monte Carlo simulation was performed around this solution. Even so, the simulated power dissipation remained outside the range observed in the intrinsic evaluation, as shown in Figure 29.
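Converting the two currents into power makes them easier to relate to the power figures quoted elsewhere. Assuming the entire current is drawn from the single 3.3 V supply used for the chip:

```latex
P = V_{DD}\, I:\qquad
3.3\,\mathrm{V}\times 15\,\mathrm{mA} = 49.5\,\mathrm{mW}, \qquad
3.3\,\mathrm{V}\times 24\,\mathrm{mA} = 79.2\,\mathrm{mW}
```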

It is important to highlight that the sensory measurement is interrupted while the device is being optimized. Two approaches could nevertheless enable continuous measurement. The first, suited to low-frequency sensor signals such as a TMR sensor measuring the speed of a rotating shaft, implements a real-time operating system (RTOS) or a time-triggered embedded system (TTES) on the Red Pitaya board, allowing the calibration and measurement processes to be interleaved. The second, applicable to high-frequency sensor signals, adopts a ping-pong strategy in which one CFIA undergoes optimization while the other remains active and fully operational. Regarding the recent literature, the authors of [44] developed a fully differential CFIA in 180 nm CMOS technology for biomedical impedance-spectroscopy applications. With a fixed gain of four, their CFIA achieved a −3 dB bandwidth of 5.83 MHz and a slew rate of 8.3 V/µs while driving a 1.33 pF capacitive load. The measured THD was −38 dB for a differential signal range of 60 mVp-p at 10 kHz, and the authors reported that the performance degrades as the differential range approaches 100 mVp-p. Operating from a single 1.8 V supply, the circuit consumed a total power of 4.795 mW. In our study, the CFIA achieved a THD of −72 dB at a 1 Vp-p differential voltage and 1 MHz. As depicted in Figure 27, the −3 dB bandwidth of our CFIA is approximately 3 MHz at a gain of four, with a slew rate of 11 V/µs while driving a capacitive load of nearly 15 pF. In addition, the proposed CFIA offers eight programmable gain levels but consumes 53 mW from a 3.3 V supply. Owing to its programmable devices, our CFIA occupies a layout area of 1.6 mm × 0.38 mm, compared with 119.5 µm × 254.6 µm in [44].
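The hand-over rule of the ping-pong strategy can be illustrated with a small scheduling sketch. The channel names `CFIA_A` and `CFIA_B` and the per-slot convergence flags are hypothetical; the sketch models only the rule that the freshly optimized channel takes over measurement once its optimization has converged, not any Red Pitaya firmware.

```python
def ping_pong_schedule(converged_flags, channels=("CFIA_A", "CFIA_B")):
    """Return (measuring, optimizing) channel pairs, one per time slot.

    converged_flags[t] is True when the channel being optimized in slot t has
    finished; the two channels then swap roles, so one CFIA is always
    available for measurement.
    """
    measuring, optimizing = channels
    log = []
    for converged in converged_flags:
        log.append((measuring, optimizing))
        if converged:
            # Hand-over: the freshly optimized channel takes over measurement,
            # and the previously measuring channel is queued for optimization.
            measuring, optimizing = optimizing, measuring
    return log
```

Gating the swap on convergence, rather than on a fixed time slot, keeps a channel from entering service before its optimization has finished, at the cost of a variable measurement period per channel.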

Among commercial CFIA chips, the LMP8358 from Texas Instruments provides a bandwidth of 8 MHz at a gain of 10 while driving a 10 pF capacitive load and a 10 kΩ resistive load. The device offers seven programmable gain levels, i.e., 10, 20, 50, 100, 200, 500, and 1000, configured through a parallel or SPI interface. Its configurable compensation capacitor is set automatically according to the selected gain to optimize the bandwidth. Capable of processing a differential signal of ±100 mVp-p, the LMP8358 consumes an average power of 6.27 mW. With an offset voltage below 10 µV and high available gain, the chip is well suited to interfacing low-frequency differential sensor signals, even those with weak amplitudes. The MCP6N11 CFIA from Microchip, on the other hand, offers five programmable gain levels, i.e., 1, 2, 5, 10, and 100. Its unity-gain bandwidth is 500 kHz, and the device operates from a supply voltage of 1.8 V to 5.5 V. When powered by a 3.3 V supply, the MCP6N11 consumes 2.64 mW. In addition, the chip supports a rail-to-rail input differential range.

Conclusions
A cost-effective and power-efficient approach was used for the intrinsic evolution of the configurable CFIA. The primary focus is to reduce the complexity of the performance-evaluation setups required to support AFEs with self-X properties for in-field optimization. Initial testing prior to the intrinsic optimization, using the default configuration obtained from extrinsic optimization on post-layout simulations, revealed degraded performance and unexpected instability of the CFIA circuit. The in-field optimization, based on THD and power monitoring, then successfully discovered the optimal configuration for linear operation of the CFIA using the ERPSO algorithm. This outcome underscores the benefits of implementing sensory electronic circuits with self-X properties for yield optimization: without the self-X capability, this manufactured batch might have been discarded, wasting the significant costs of its fabrication. The ERPSO algorithm is implemented at the hardware level using Red Pitaya FPGA boards, whose DACs and ADCs were used during the optimization to stimulate the CFIA circuit and acquire its output. The THD optimization approach proved to be an effective means of reducing the total number of assessment units required to optimize the performance of the CFIA or any other type of linear circuit. This is primarily due to the statistical correlation between the various performance characteristics of the amplifier and the measured THD value. It was also observed that even an unstable circuit condition is reflected in a degraded THD value. Nevertheless, a pulse test was conducted at the end of the optimization process to ensure stability. Apart from this final check, the optimization relied on a single sinusoidal stimulus, which proved to be an efficient method for improving amplifier performance.
The power-monitoring technique was employed to help the ERPSO algorithm identify power-efficient solutions within the explored search space. This significantly improves the power efficiency of the solution, ultimately leading to a longer device lifetime and better energy utilization. The CFIA is optimized for a 1 MHz signal frequency and a 1 Vp-p dynamic input range. The achieved average optimized THD is −72 dB, with 34% higher power efficiency than the optimization without power monitoring. An output swing approaching rail-to-rail was achieved thanks to the push-pull output stage. The chip was designed in the XFAB 0.35 µm technology. The optimization in this work was conducted under static conditions, specifically at room temperature; future research will run the optimization across industrial temperature ranges using BINDER climate chambers. Future work will also examine the use of the CFIA for interfacing low-frequency, high-dynamic-range sensor signals, such as TMR sensors, where the optimization will aim at reducing power consumption, since a smaller bandwidth suffices to achieve the desired THD at these lower frequencies.

Acknowledgments:
The authors would like to thank the DAAD (Deutscher Akademischer Austauschdienst) for sponsoring the PhD researchers. We also thank EUROPRACTICE for providing design tools and MPW fabrication services for our prototype chip and research activity. The reported work, and in particular the chip manufacturing, was made possible by funding from the BMBF (German Federal Ministry of Education and Research) within the program SElekt_I40 and the concluded consortium project MoSeS-Pro, subproject 'Robuste adaptive integrierte Sensorelektronik und Informationsverarbeitung mit Self-X-Eigenschaften für zuverlässige Systeme der Industrie 4.0' (Robust adaptive integrated sensor electronics and information processing with self-X properties for reliable Industry 4.0 systems), grant no. 16ES0425, which is gratefully acknowledged.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: