A CMOS Temperature Sensor with a Smart Calibrated Inaccuracy of ±0.11 °C (3σ)

This paper presents a BJT-based smart CMOS temperature sensor. The analog front-end circuit contains a bias circuit and a bipolar core; the data conversion interface features an incremental delta-sigma analog-to-digital converter (ADC). The circuit uses chopping, correlated double sampling, and dynamic element matching to mitigate the effects of process variation and nonideal device characteristics on measurement accuracy. Furthermore, based on the principle of charge conservation, the utilization of the ADC dynamic range is increased. We propose a neural network that uses a multilayer convolutional perceptron to calibrate the sensor output. With this algorithm, the sensor achieves an inaccuracy of ±0.11 °C (3σ), improving on the ±0.23 °C (3σ) achieved without calibration. We implement the sensor in a 0.18 µm CMOS process, occupying an area of 0.42 mm². It achieves a resolution of 0.01 °C and has a conversion time of 24 ms.


Introduction
Temperature sensors are widely applied in instrumentation acquisition, environmental monitoring, and overheating protection, among other applications. With the increasing number and diversity of applications, the requirements for sensors have become increasingly stringent. Therefore, temperature sensors that generate a readily interpretable digital output without requiring additional configuration circuitry are highly desirable. Such sensors are predominantly constructed using cost-effective standard CMOS processes that enable the integration of the analog front end (AFE), data conversion, and interface circuitry onto a single chip.
In recent years, a range of temperature sensors based on BJTs have emerged, employing various readout interfaces to enhance energy efficiency. These interfaces include the continuous-time (CT) delta-sigma ADC [7], the SAR ADC [8], and the zoom ADC [9]. However, when factors such as clock jitter, circuit complexity, and area are considered together, the discrete-time (DT) delta-sigma ADC offers a favorable combination of these properties.
Furthermore, the accuracy achievable by bipolar transistors is ultimately limited by variations in base-emitter voltage resulting from process spread and mechanical stress. Trimming and calibration procedures are necessary to compensate for such variations. In addition to single-point calibration, alternative approaches such as two-point calibration and heater-assisted voltage calibration have been proposed [10][11][12]. However, for applications requiring high accuracy, these conventional calibration techniques tend to be time consuming and, consequently, expensive.
This paper presents a BJT-based smart CMOS temperature sensor that operates over a temperature range of −45 °C to 125 °C. The sensor uses chopping, correlated double sampling (CDS), and dynamic element matching (DEM) [13] to minimize errors caused by op-amp, transistor, and current mirror mismatches. Additionally, proportional capacitive amplification is used to improve the utilization of the dynamic range of the incremental delta-sigma analog-to-digital converter (ADC). Digital curvature compensation mitigates the nonlinearity of the transistor base-emitter voltage. The proposed calibration method uses a multilayer convolutional perceptron neural network to improve the inaccuracy from ±0.23 °C (3σ) without calibration to ±0.11 °C (3σ). The remainder of the paper is structured as follows: Section 2 describes the topology of the sensor and briefly explains the operating principle of BJT-based sensors. Sections 3-5 present the circuit and algorithm implementation, followed by experimental results in Section 6. Finally, Section 7 summarizes the conclusions drawn from this study.

Operating Principle
BJTs are suitable for ratiometric measurement schemes [14], because they can generate an accurate voltage proportional to absolute temperature (PTAT) and a temperature-independent bandgap reference voltage. Herein, a digital circuit is used to assist the analog circuit in implementing the temperature sensor. Using digital circuitry for some complex circuit functions provides area and power consumption benefits while simplifying nonlinear compensation and calibration.
Pertijs et al. [5] proposed a BJT-based temperature sensor that uses a custom delta-sigma ADC. Instead of generating a temperature-independent reference voltage as the ADC reference, the sensor dynamically adjusts VBE and ∆VBE based on the ADC output, indirectly generating the desired quantized output result µ. Equations (1) and (2) express this process. This scheme aims to address temperature curvature compensation by adjusting the ratio of VBE and ∆VBE. Although this approach offers benefits, the practical implementation of the scaling factor α remains challenging.
This study uses a temperature sensor that takes the ∆VBE voltage as the input signal and the VBE voltage as the quantized reference voltage of the delta-sigma ADC, as shown in Figure 1. The output of the delta-sigma ADC is ∆VBE/VBE, which can be further processed through digital circuitry to obtain the desired PTAT temperature signal µ, as given in Equation (3). This design offers flexibility in implementing the scale factor α in digital circuits. However, the maximum quantization result of this method is approximately 0.1 at temperatures between −45 °C and 125 °C, which results in insufficient utilization of the ADC dynamic range.
The design therefore uses a fourfold amplified ∆VBE voltage as the input to the delta-sigma ADC to enhance the utilization of the ADC dynamic range. We record the quantization result of the ADC as Y = 4·∆VBE/VBE. The temperature-dependent signal µ can then be expressed as in Equation (4), where the constant has a value of α/4; Equation (4) can be readily implemented in digital circuits.
As shown in Figure 2, the use of Y = 4·∆VBE/VBE allows for greater utilization of the dynamic range of the ADC.
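As a rough numerical check of the dynamic-range argument, the following sketch evaluates X = ∆VBE/VBE and Y = 4·∆VBE/VBE at three temperatures and recovers µ from either quantity. It assumes the common form µ = α·∆VBE/(VBE + α·∆VBE), which gives µ = αX/(1 + αX) = (α/4)Y/(1 + (α/4)Y). The linear VBE model, its coefficients (`vbe0`, `slope`), and the value of `ALPHA` are illustrative assumptions and are not taken from this design.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K


def delta_vbe(temp_k, p=5.0):
    """Ideal PTAT voltage for a current density ratio p."""
    return K_OVER_Q * temp_k * math.log(p)


def vbe(temp_k, vbe0=1.25, slope=2.0e-3):
    """Assumed linear CTAT model of VBE (hypothetical coefficients)."""
    return vbe0 - slope * temp_k


def mu_from_x(x, alpha):
    """mu computed from the unamplified ADC result X = dVBE / VBE."""
    return alpha * x / (1.0 + alpha * x)


def mu_from_y(y, alpha):
    """mu computed from Y = 4 * dVBE / VBE using the constant alpha / 4."""
    c = alpha / 4.0
    return c * y / (1.0 + c * y)


# Assumed alpha: chosen so that VBE + alpha * dVBE is flat for the model above.
ALPHA = 2.0e-3 / (K_OVER_Q * math.log(5.0))


def adc_ratios(temp_c):
    """Return (X, Y) at a temperature given in degrees Celsius."""
    temp_k = temp_c + 273.15
    x = delta_vbe(temp_k) / vbe(temp_k)
    return x, 4.0 * x


if __name__ == "__main__":
    for temp_c in (-45.0, 25.0, 125.0):
        x, y = adc_ratios(temp_c)
        print(f"{temp_c:7.1f} C  X={x:.4f}  Y={y:.4f}  "
              f"mu={mu_from_y(y, ALPHA):.4f}")
```

Under these assumptions X stays near 0.1 at the hot end of the range, while Y uses roughly four times as much of the ADC input range and still remains below full scale; both routes yield the same µ.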

Analog Front-End Circuits
The analog front-end circuit used herein, as illustrated in Figure 3, comprises two major parts: the bias circuit and the bipolar core [15]. Two substrate PNPs, QLB and QRB, biased by different currents, generate two voltage signals, VBEbias2 and VBEbias1, respectively, and a ∆VBEbias voltage in the primary circuit [16]. In the expressions for these signals, the coefficient p is the ratio of the collector current densities of the two BJTs, and IC represents the collector current. Additionally, it can be derived that ∆VBEbias is PTAT, where ε is the process-dependent nonlinearity factor, k is Boltzmann's constant, q represents the electron charge, and T is the absolute temperature.
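For reference, the PTAT behavior of ∆VBEbias follows from the ideal exponential BJT characteristic. The derivation below is a textbook sketch rather than the exact expressions of this design: the saturation current density J_S and the collector current densities J_C1 and J_C2 are symbols introduced here, and the nonlinearity factor ε is left out because its placement in the original expressions is not stated in the text.

```latex
\begin{aligned}
V_{BE1} &\approx \frac{kT}{q}\,\ln\!\left(\frac{J_{C1}}{J_S}\right), \qquad
V_{BE2} \approx \frac{kT}{q}\,\ln\!\left(\frac{J_{C2}}{J_S}\right), \\
\Delta V_{BE} &= V_{BE2} - V_{BE1}
  = \frac{kT}{q}\,\ln\!\left(\frac{J_{C2}}{J_{C1}}\right)
  = \frac{kT}{q}\,\ln(p).
\end{aligned}
```

In this idealized form the saturation current cancels, so ∆VBE depends only on the ratio p and on absolute temperature; for p = 10 the slope is about 0.198 mV/K.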
The op-amp establishes a feedback loop in the bias circuit that clamps its input terminal voltages to be equal. This applies ∆VBEbias across the bias resistor RBIAS, thus generating the IBIAS necessary for the circuit's operation.
The bias circuit produces IBIAS, which is then replicated through the current mirror and delivered to the transistors in the bipolar core. This generates the ∆VBE voltage and the VBE voltage, which together form the temperature signal and serve as the ADC input signal and the reference voltage, respectively.

Bias Current Design
The bipolar transistor is connected as a diode to reduce base current leakage. This connection changes the relationship between the current gain and the transistor currents. In the resulting expression, IC1 and IC2 denote the collector currents of the two bipolar transistors, IE1 and IE2 represent the emitter currents, and β1 and β2 are the current gains. Additionally, the collector and emitter current ratios of the two substrate PNPs are denoted as pc and p.
Ensuring that β1 and β2 are equal is necessary to maintain a stable IE bias ratio. This in turn requires β to be highly stable over the intended current operating range. Figure 4 shows how β varies with IC current density at different temperatures in the 0.18 µm CMOS process.
There is only a slight variation in the value of β within the temperature range of −55 °C to 125 °C in this process, with the value ranging from 1.5 to 4.
Selecting appropriate values for the current density ratio (p) and bias current (IBIAS) involves a comprehensive evaluation of multiple factors, including noise, capacitive load driving capability, and circuit robustness. The bias currents of the two substrate PNPs in the bias circuit are 1.8 and 9 µA, respectively; with an area ratio of 2:1, this gives a current density ratio (p) of 10. The bias currents of the two substrate PNPs in the bipolar core were set to the same levels as those in the bias circuit. Because these transistors have equal areas (a ratio of 1:1), the current density ratio (p) is five. Additionally, the requirements of the layout design must be considered while selecting the number of BJTs to minimize mismatch errors. The finite value of β in this process would cause significant inaccuracies in the output results. Therefore, a resistor with a value of RBIAS/m is connected in series with the base of the high-current biased BJT to make VBEbias independent of the β value, eliminating the effect of limited current gain.
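To illustrate why equal current gains matter, the sketch below uses the standard relation IC = IE·β/(1 + β) to compute how the collector current ratio pc departs from the emitter current ratio p when the two transistors have different β, and the resulting shift in ∆VBE. The function names and the example β values are hypothetical; only the range of β (1.5 to 4) comes from the text.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K


def collector_ratio(p_emitter, beta_low, beta_high):
    """Collector current ratio pc for an emitter current ratio p.

    Uses IC = IE * beta / (1 + beta) for each transistor; beta_low and
    beta_high are the gains of the low- and high-current devices.
    """
    alpha_low = beta_low / (1.0 + beta_low)
    alpha_high = beta_high / (1.0 + beta_high)
    return p_emitter * (alpha_high / alpha_low)


def delta_vbe_error(p_emitter, beta_low, beta_high, temp_k):
    """Deviation of dVBE (in volts) from its ideal value (kT/q) * ln(p)."""
    pc = collector_ratio(p_emitter, beta_low, beta_high)
    return K_OVER_Q * temp_k * math.log(pc / p_emitter)


if __name__ == "__main__":
    # Hypothetical gain pairs inside the 1.5 to 4 range reported for this process.
    for beta_low, beta_high in ((3.0, 3.0), (3.0, 3.3), (1.5, 4.0)):
        err_uv = 1e6 * delta_vbe_error(5.0, beta_low, beta_high, 300.0)
        print(f"beta = ({beta_low}, {beta_high}): dVBE error = {err_uv:.1f} uV")
```

With equal gains the two factors β/(1 + β) cancel and pc equals p; any difference between β1 and β2 appears directly as a ∆VBE error, which is why the low, current-dependent β of this process has to be compensated.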

DEM
The primary source of error in ∆VBE is a current ratio mismatch between the two current branches [16]. For a current density ratio mismatch of ∆p, the resulting absolute error in ∆VBE is proportional to absolute temperature. A precision of ∆p/p ≈ 0.1% can be achieved by carefully designing the layout of the current sources [17]. To further enhance the accuracy of the current source in the analog front-end circuit, the ALL-DEM [18] approach is employed herein for dynamic element matching of the current mirror. The fundamental principle of this approach is to dynamically swap circuit devices so that device mismatch is averaged out, reducing the overall mismatch between devices.
The bias circuit and bipolar core contain two blocks, each with six unit current sources, to ensure DEM matching. In the ϕ1 phase of the sampling period, the left bipolar transistor in each block is biased by a unit current source selected in turn, while the remaining five unit current sources bias the right bipolar transistor. In the second phase (ϕ2) of the sampling period, the bipolar core uses double CDS to exchange the PNP bias, creating a fresh set of VBE and ∆VBE voltages. This process mitigates the mismatch of the bipolar transistors and eliminates their area mismatch (∆r) as expressed in Equation (9), where ∆VBEA and ∆VBEB are generated before and after CDS, respectively; taking the difference between these two ∆VBE values entirely cancels the error caused by ∆r. This process is repeated during the next sampling period until the current mirrors of both the first and the second blocks have completed one cycle of biasing the bias circuit and the bipolar core. The current mirrors of the first and second blocks are then swapped, and the cycle continues in a loop. Averaging the ∆VBE voltage eliminates most of the error, leaving only the high-order residual error that remains after DEM.
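The averaging effect of rotating the unit current sources can be checked numerically. In the sketch below, each of six mismatched unit currents takes its turn biasing one transistor while the other five bias the second one, and the logarithm of the current ratio (which sets ∆VBE) is averaged over all six states. The mismatch values in `UNIT_CURRENTS` are hypothetical, and the model ignores every other error source.

```python
import math


def dem_log_ratios(unit_currents):
    """ln(current ratio) for each DEM state.

    In state i, source i biases one transistor and all remaining sources
    together bias the other transistor.
    """
    total = sum(unit_currents)
    return [math.log((total - i_unit) / i_unit) for i_unit in unit_currents]


def dem_errors(unit_currents):
    """Return (per-state errors, averaged error) relative to the ideal ln(n - 1)."""
    ideal = math.log(len(unit_currents) - 1)
    per_state = [r - ideal for r in dem_log_ratios(unit_currents)]
    averaged = sum(per_state) / len(per_state)
    return per_state, averaged


# Hypothetical normalized unit currents with mismatches of up to 0.3 %.
UNIT_CURRENTS = [1.002, 0.999, 1.0015, 0.997, 1.0005, 1.001]

if __name__ == "__main__":
    per_state, averaged = dem_errors(UNIT_CURRENTS)
    worst = max(abs(e) for e in per_state)
    print(f"worst single-state error in ln(ratio): {worst:.3e}")
    print(f"error after averaging all states:      {averaged:.3e}")
```

The first-order mismatch terms cancel in the average, so only second-order terms remain, which is consistent with the statement that DEM leaves only high-order errors.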

Modulator Topology
The ADC presented herein comprises an incremental delta-sigma modulator and a sinc³ digital filter [19]. The modulator is implemented using a second-order single-bit CIFF structure; its topology is shown in Figure 5. The effective number of bits (ENOB) of an incremental delta-sigma ADC with this structure follows the expression given in [20]. Considering the limited OTA gain, the digital filter type, and other relevant factors, this paper uses N = 600 clock cycles per conversion and a sampling clock frequency of f = 25 kHz.

Modulator Circuit Implementation
The circuit block diagram and timing diagram of the incremental delta-sigma modulator are presented in Figures 6 and 7. To enhance the dynamic range utilization of the incremental delta-sigma modulator, the ∆VBE sampling capacitor is set to four times the VBE sampling capacitor, which amplifies ∆VBE [21]. The charge acquired from the sampling process during one sampling period therefore scales with four times ∆VBE, and an additional multiplication by two accounts for double sampling. The feedback voltage is sampled using a unit sampling capacitor. The incremental delta-sigma modulator maintains charge balance through negative feedback control by setting the reference voltage to VBE and maintaining the average charge at zero.
Sensors 2023, 23, 5132
The output Y of the ADC, which represents the average value of the code stream output from the ADC, is then Y = 4·∆VBE/VBE.
To mitigate the impact of nonideal factors such as offset and noise on the quantization accuracy of the delta-sigma ADC, autozeroing and chopping are used herein, because the temperature signal is a low-frequency signal close to DC and is significantly affected by DC offset and flicker noise.
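The charge-balance argument can be illustrated with a deliberately simplified behavioral model. The sketch below is a first-order, single-bit charge-balancing loop, not the second-order CIFF modulator of this design, and it uses a plain average instead of the sinc³ filter. It assumes that the feedback charge is double sampled in the same way as the signal charge; the voltages passed to `simulate_charge_balance` are hypothetical room-temperature values.

```python
def simulate_charge_balance(delta_vbe, vbe, n_cycles=600, c_unit=1.0):
    """Average bitstream value of a first-order charge-balancing loop.

    Signal charge per cycle:    2 * (4 * C) * dVBE  (4x capacitor, double sampling)
    Feedback charge per '1' bit: 2 * C * VBE        (unit capacitor, assumed doubled)
    """
    q_signal = 2.0 * (4.0 * c_unit) * delta_vbe
    q_reference = 2.0 * c_unit * vbe
    integrator = 0.0
    ones = 0
    for _ in range(n_cycles):
        # Single-bit quantizer: request feedback whenever charge has built up.
        bit = 1 if integrator > 0.0 else 0
        integrator += q_signal - bit * q_reference
        ones += bit
    return ones / n_cycles


if __name__ == "__main__":
    d_vbe, v_be = 0.0414, 0.65  # hypothetical values in volts
    y = simulate_charge_balance(d_vbe, v_be)
    print(f"simulated Y = {y:.4f}, ideal 4*dVBE/VBE = {4.0 * d_vbe / v_be:.4f}")
```

Because the integrator stays bounded, the fraction of '1' bits converges to the ratio of signal charge to feedback charge, that is, to 4·∆VBE/VBE, with an error that shrinks as the number of cycles grows.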

Multilayer Convolutional Perceptron Calibration Network
To further reduce sensor errors, this study proposes a multilayer convolutional perceptron (MLCP) neural network algorithm that leverages one-dimensional convolution (Conv1d) within the hidden layer to extract relevant features based on the principles of a multilayer perceptron [22]. The MLCP architecture proposed herein is as follows.
As shown in Figure 8, the MLCP network, similar to BP neural networks [23], comprises three distinct components: the input, hidden, and output layers. The input and output layers include two linear layers while the hidden layer comprises four Conv1d layers. First, the sensor data are fed into the network via the input layer and processed by the Conv1d hidden layer to extract relevant features. A linear transformation is applied as preprocessing at the input layer to improve the subsequent network processing and the fitting capability of the model. The linear transformation is given as follows:
$$X = AV + B$$
where X represents the output, V represents the input, and A and B represent the weight and bias matrices, respectively. Two linear transformations were applied in preprocessing to account for the calibration temperature and to align with the subsequent network nodes. The formula for Conv1d is given as follows:
$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) * \mathrm{input}(N_i, k)$$
where * denotes the valid cross-correlation operator, N denotes batch size, C denotes the number of channels, L denotes the length of the signal sequence, and k denotes the convolution depth. For the dataset considered here, L = 1 and N = 1. The convolution kernels have a size of one with no padding and a stride of one. Because of the characteristics of the temperature sensor output data, using one-dimensional convolution in the hidden layer can improve accuracy and enable better learning of the correlation information between nodes. Finally, the output layer uses the processed input sensor data to generate the final output.
We used Monte Carlo simulations to generate the output results of 1000 sensors within a temperature range of −45 °C to 125 °C. During training, we used the output data of 100 sensors with the same temperature as that in the input. Training ran for 1000 epochs. The learning rate was reduced from 10⁻³ to 10⁻¹⁰ using the cosine annealing strategy, with three warm-up epochs during which the learning rate was held constant. The Adam optimizer was used to minimize the loss, where the loss function was the mean squared error (MSE), calculated as follows:
$$\mathrm{MSE}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 \quad (18)$$
where x represents the input, y denotes the target, and n is the number of samples.
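The learning-rate schedule described above can be sketched in a few lines. This is one plausible reading, not the authors' training code: it assumes that the rate is held at 10⁻³ during the three warm-up epochs and then follows a single half-cosine down to 10⁻¹⁰ at the last epoch. The function name and parameter names are hypothetical.

```python
import math


def learning_rate(epoch, total_epochs=1000, warmup_epochs=3,
                  lr_max=1e-3, lr_min=1e-10):
    """Constant warm-up followed by cosine annealing from lr_max to lr_min."""
    if epoch < warmup_epochs:
        return lr_max
    # progress runs from 0 at the first annealed epoch to 1 at the last epoch.
    progress = (epoch - warmup_epochs) / (total_epochs - 1 - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 2, 3, 250, 500, 750, 999):
        print(f"epoch {epoch:4d}: lr = {learning_rate(epoch):.3e}")
```

The same shape is available in common deep learning frameworks as a built-in scheduler; the explicit formula is shown here only to make the decay from 10⁻³ to 10⁻¹⁰ concrete.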

Experimental Results
The temperature sensor was implemented using a 0.18 µm CMOS process with six metal layers, resulting in an active area of 0.422 mm². Furthermore, to increase flexibility, the decimation filter and digital back end were implemented off chip. The sensor operates at a 1.8 V supply voltage and a 25 kHz clock frequency, consuming a current of 201 µA. It takes 24 ms to complete 600 measurement cycles. Figure 9 illustrates the power spectrum of the bitstream of the modulator, highlighting its effective noise-shaping capability [24]. Figure 10 shows the quantization error of the same ADC. During the sensor test, a high-precision thermostat tank filled with silicone oil was used to maintain a constant temperature in the test environment. Furthermore, a high-precision PT100 platinum resistor served as the reference temperature sensor. Figure 11 depicts the sensor test setup and chip micrographs.
To determine the temperature error of the sensors, 18 sensors from a single batch were mounted in a dual in-line package (DIP) and measured over a temperature range of −45 °C to 125 °C. Figure 12 shows the temperature error of the 18 samples before trimming. The 3σ spread over the range of −45 °C to 125 °C is ±0.23 °C.
To enhance the accuracy of the sensor, we used the MLCP neural network algorithm for sensor calibration, which yielded the calibration results plotted in Figure 13. As shown in Figure 13, the maximum error decreased from ±0.23 °C (3σ) to ±0.11 °C (3σ), with a maximum error of less than 0.06 °C within the commonly used temperature range of 0 to 100 °C, thereby validating the effectiveness of the MLCP neural network model. To further evaluate the performance of the MLCP model, four additional models were implemented in this study: A_Linear, which replaced the hidden layer with a linear connection; B_Wide, which doubled the dimensionality in the hidden layer; C_Deep, which added two extra hidden layers; and D_Less, which doubled the nodes in the output layer.
All models were trained and tested in the same environment, and the final results are presented in Figure 14. Figure 14 shows that, despite having fewer parameters (Param) than the other algorithms, the developed model achieves the highest accuracy and a low maximum error. This result demonstrates the effectiveness of the proposed model design.
Table 1 summarizes the performance of the proposed sensor and compares it to other temperature sensor technologies. Reference [25] offers a cost advantage owing to its small area; however, it requires a high-frequency clock (fs = 20 MHz) for proper operation, which limits its application environment. References [26,27] employ BJTs to generate PTAT voltages and then use ADC acquisition for temperature conversion. However, this approach has drawbacks in resolution and accuracy compared to the methodology proposed in this paper. Reference [12] presents notable advantages in terms of power consumption and accuracy. However, the inherent complexity of its circuit structure and the need for a third-order fit to obtain results pose significant challenges in terms of implementation complexity and cost. The temperature sensor presented here achieves a resolution of 0.01 °C and consumes 275.4 µW of power at a 1.8 V supply. Its resolution FOM is 661 pJ·K². The sensor offers high resolution and accuracy, albeit at a higher power consumption level than other sensors. Furthermore, its accuracy is as good as ±0.06 °C (3σ) within the commonly used temperature measurement range.
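The reported figure of merit can be reproduced from the numbers quoted in this section, assuming the widely used definition of the resolution FOM as energy per conversion multiplied by the square of the resolution. The function names below are illustrative.

```python
def conversion_time(n_cycles, clock_hz):
    """Conversion time in seconds for n_cycles at the given clock frequency."""
    return n_cycles / clock_hz


def resolution_fom(power_w, t_conv_s, resolution_k):
    """Resolution FOM in J*K^2: energy per conversion times resolution squared."""
    return power_w * t_conv_s * resolution_k ** 2


if __name__ == "__main__":
    t_conv = conversion_time(600, 25e3)           # 600 cycles at 25 kHz
    fom = resolution_fom(275.4e-6, t_conv, 0.01)  # 275.4 uW, 0.01 degC resolution
    print(f"conversion time = {t_conv * 1e3:.1f} ms")
    print(f"resolution FOM  = {fom * 1e12:.0f} pJ*K^2")
```

With 600 cycles at 25 kHz the conversion takes 24 ms, and the quoted power and resolution give a value that rounds to the reported 661 pJ·K².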

Conclusions
The BJT-based smart calibration temperature sensor was designed and verified in a 0.18 µm CMOS process. The sensor mitigates the mismatch of the current mirror cells and the BJTs through the ALL-DEM and CDS techniques, and it reduces the inaccuracy caused by the offset voltages of the amplifier and the ADC through chopping. Additionally, charge conservation is used to improve the utilization of the ADC dynamic range, thereby enhancing the system's robustness. To calibrate the output data and enhance the sensor's accuracy, this paper proposes the MLCP neural network algorithm, which reduces the sensor's inaccuracy from ±0.23 °C (3σ) to ±0.11 °C (3σ) across a temperature range of −45 °C to 125 °C. Within the commonly used temperature measurement range of −35 °C to 100 °C, the accuracy is as good as ±0.06 °C (3σ). The sensor has an effective area of 0.42 mm², and its conversion time is 24 ms.

Conflicts of Interest:
The authors declare no conflict of interest.
