A Highly Efficient and Linear mm-Wave CMOS Power Amplifier Using a Compact Symmetrical Parallel–Parallel Power Combiner With IMD3 Cancellation for 5G Applications

This paper presents a fully integrated linear power amplifier (PA) in a 65-nm CMOS process for mm-wave 5G applications. The proposed linear PA employs a compact symmetrical 4-way parallel–parallel power combiner with a third-order intermodulation distortion (IMD3) cancellation method to achieve high linear output power with a high power-added efficiency (PAE). An on-chip 4-way parallel–parallel power combiner, which combines the output power from eight unit PAs, is designed with a compact footprint ($241\,\mu\text{m} \times 241\,\mu\text{m}$). Conventional power combiners based on series power-combining transformers show poor symmetry in the amplitude and phase of the input impedance among unit PAs owing to the parasitic effects of the power combiners. In contrast, the proposed parallel–parallel power combiner, which is based on parallel power-combining transformer structures, shows good symmetry among unit PAs. Moreover, an IMD3 cancellation method using the parallel–parallel power combiner is proposed in this work. The proposed IMD3 cancellation method can support high-order modulation signals without increasing the complexity and reduces the dependence on digital predistortion (DPD). Consequently, the proposed linearization method obtains a high linear output power ($P_{\mathrm{OUT}}$) and PAE without DPD. The PA in 65-nm CMOS demonstrates a saturated output power ($P_{\mathrm{SAT}}$) of 23.2 dBm, a 15.9-dB power gain, a 1-dB compressed output power ($P_{\mathrm{O,1dB}}$) of 22 dBm, and a peak PAE of 33.5% at 28 GHz. The measured error vector magnitude (EVM) with 100 Msym/s of 256/512-QAM is −31.2/−32.1 dB with an average output power of 18.02/17.73 dBm, an average PAE of 17.6/16.1%, and an adjacent channel power ratio (ACPR) of −30/−33.1 dBc without DPD.
To the best of the authors’ knowledge, the proposed PA demonstrates high output power with the highest PAE performance supporting 256/512-QAM compared to the recently published fully integrated mm-wave 5G CMOS PAs.


I. INTRODUCTION
In the past several decades, the demands for high data rates, a massive Internet of Things, and ultra-reliable low latency have driven the tremendous development of mobile technologies [1]-[11]. With the roll-out of the new generation of wireless communication technology, i.e., 5G applications, expected soon, various mm-wave bands ranging from 24 to 80 GHz have been considered and evaluated for 5G systems. According to the Federal Communications Commission (FCC), the 28 GHz band has emerged as one of the candidates among multiple mm-wave frequency bands for 5G systems in most countries [12], [13].
The associate editor coordinating the review of this manuscript and approving it for publication was Vittorio Camarchia.
To keep up with these needs, mm-wave transceiver integrated circuits (ICs) for 5G systems have been intensively studied. Moreover, to fulfill these demands, orthogonal frequency-division multiplexing (OFDM) signals with high-order quadrature amplitude modulation (QAM) at mm-wave frequencies are widely used in 5G systems thanks to their large bandwidths and high data rates. Because the linearity requirements of mm-wave 5G systems are stringent, mm-wave power amplifiers (PAs) based on III-V compound semiconductors, which benefit from a high supply voltage and high mobility, are usually implemented to generate a higher output power with higher linearity, supporting OFDM with high-order QAM [14]-[18]. Even though a PA based on III-V compound semiconductors is an attractive solution, it is not suitable for system-on-chip (SoC) integration. Hence, PA design using CMOS technology, with its low cost and high level of integration, is in the spotlight for high-level system integration. However, CMOS technology has a low device breakdown voltage and a lossy substrate, which may degrade the output power and power-added efficiency (PAE). Additionally, as 5G OFDM signals with high-order QAM have high peak-to-average power ratios (PAPRs), the linear output power ($P_{\mathrm{OUT}}$) and PAE of a CMOS PA are significantly degraded when the PA is required to operate at large back-off levels to satisfy linearity requirements. Thus, the design of a CMOS PA that achieves a high linear output power with a high PAE remains a bottleneck.
To efficiently combine the output power from four or more unit PAs with a compact size in mm-wave PAs, transformer-based power-combining techniques (e.g., transformer-based radial structures, series power-combining transformers (SCTs), and modified SCTs) have been widely adopted [19]-[29], [58], [59]. In a transformer-based radial structure, the outputs of the transformers from the unit PAs are connected to radial distribution networks as a power combiner. However, this technique may increase layout complexity and lead to close proximity between the input and output lines [27]-[29], [48]. The SCT topology causes imbalances among the unit PAs owing to the parasitic capacitance between the primary and secondary windings, and the input impedance provided to each unit PA decreases as the number of unit PAs increases [21], [24], [48]. To mitigate these issues, a series (SCT)-parallel power combiner has recently been proposed [21], [24]. However, an SCT-based combiner can still cause imbalances among the unit PAs. In this work, a parallel-parallel power combiner is proposed to reduce the imbalances among the unit PAs while keeping a small size. Two parallel power-combining transformers (PCTs) are combined with a parallel T-line. In addition, the balance characteristics of the input impedances in the SCT and PCT are analyzed in the presence of the unwanted parasitic capacitance between the primary and secondary windings.
Typically, to improve linearity, the external linearization technique of digital predistortion (DPD) with a loop-back path is frequently used in transmitters to correct amplitude and phase distortion by pre-distorting the input stimulus with the inverse of the distortion. However, this technique can require increased DPD complexity [41]-[44]. Recently, several PAs with linearization techniques (e.g., transformer-based amplitude-to-phase modulation (AM-PM) correction, inductive source degeneration, PMOS varactor AM-PM compensation, and second-harmonic termination) have been proposed for mm-wave 5G applications [30]-[33], [46]. A transformer-based AM-PM correction technique provides optimum neutralization of the gate-drain capacitance ($C_{gd}$) in a common-source (CS) configuration by adopting a reconfigurable transformer based on a tunable coupling coefficient [30]. Inductive source degeneration was proposed to improve the amplitude-to-amplitude modulation (AM-AM)/AM-PM response of the output stage [31]. An AM-PM compensation scheme with a PMOS varactor is used at the input of the amplifying stages to reduce the variation in the input capacitance [32]. One-stage differential PAs with harmonic control circuits are proposed to minimize the second harmonics produced by the CS amplifier at the drain and source [33], [46]. The AM-AM/AM-PM characteristics or intermodulation (IM) distortions have been analyzed and improved in these works. Nevertheless, these PAs may increase design complexity or require additional circuits for linearization. To improve linearity without adding design complexity, an IMD3 cancellation method using a parallel-parallel power combiner is proposed. This paper presents a fully integrated 28-GHz linear CMOS PA using a compact symmetrical 4-way parallel-parallel power combiner with an IMD3 cancellation method to obtain a high linear $P_{\mathrm{OUT}}$ with a high PAE.
The implemented PA delivers a saturated output power ($P_{\mathrm{SAT}}$) of 23.5/23.2 dBm, a power gain of 16.6/15.9 dB, and a peak PAE of 35.5/33.5% at 27.3/28 GHz, respectively. This paper is an extension of [48]. In [48], the concept of a symmetrical 4-way parallel-parallel power combiner and the IMD3 cancellation method is introduced. In this paper, the detailed design and analysis of the proposed PA are presented. This paper is organized as follows. Section II covers a comparative analysis of the proposed PCT-based power combiner and the SCT-based power combiner. Section III introduces the operation of the IMD3 cancellation method and presents the detailed architecture and implementation of the proposed linear PA. In Section IV, the measurement results of the proposed mm-wave 5G CMOS PA are presented. Finally, this paper is concluded in Section VI.

II. ANALYSIS OF SYMMETRICAL PERFORMANCE FOR PROPOSED POWER COMBINER
Before delving into the proposed 4-way parallel-parallel power combiner, we review previously reported power combiner schemes. Fig. 1(a), (b), and (c) show conceptual diagrams of the transformer-based radial structure, the SCT, and the modified SCT, respectively. In addition, Fig. 1 includes the unwanted parasitic capacitors. In Fig. 1, all power combiner configurations are considered for combining the output power from eight unit PAs. Thus, the eight unit PAs are connected to the primary windings. As mentioned in Section I, because the intersection of the input and output lines may cause stability issues and a complex layout is required, the case of Fig. 1(a) is excluded from the analysis.
For the dotted terminals between the primary and secondary windings in Fig. 1(b) and (c), the minus side of the primary winding is connected through the unwanted parasitic capacitors to the middle point of the secondary winding, while the other minus side of the primary winding is directly connected to ground. Thus, this configuration with unwanted parasitic capacitors causes an imbalance in the amplitude and phase of the input impedance at each primary winding. In particular, as the operating frequency increases, the parasitic capacitors affect the overall performance dramatically. The proposed 4-way parallel-parallel power combiner for mm-wave 5G applications is shown in Fig. 1(d). As shown in Fig. 1(d), the proposed power combiner employs PCTs and a parallel T-line. In the proposed power combiner, two unit PAs are connected to each primary winding of the two PCTs; i.e., the output powers from eight unit PAs are combined by two PCTs to generate a high output power. In addition, the outputs of the two PCTs are summed by a parallel T-line combiner. In contrast to the other combiners shown in Fig. 1(b) and (c), the configuration of each primary side with respect to the unwanted parasitic capacitors is identical in this parallel-parallel power-combining approach, which minimizes the imbalances.
Additionally, the topologies shown in Fig. 1(b), (c), and (d) can be considered as two-stage power-combining topologies. As shown in Fig. 1, two-stage power-combining topologies comprise a first stage and a second stage. An SCT is adopted in Fig. 1(b) and (c) for the first stage of the power-combining topology, while a PCT is used in the first stage of Fig. 1(d).
In the second stage of the power-combining topology shown in Fig. 1(b), two SCTs are connected in series at the secondary winding, resulting in an SCT topology with four primary windings. In contrast, for the second stage of the power-combining topology in Fig. 1(c) and (d), two SCTs or two PCTs are connected in parallel. The parallel connection in the second stage can be estimated simply, as this configuration does not cause an imbalance between the two SCTs or PCTs, owing to the symmetrical physical geometry. Thus, only the SCT and PCT configurations in the first stage of the power-combining topology are covered in the analysis of the input impedance of these structures. Many research works have been dedicated to the analysis of SCTs and PCTs [51]-[54]. However, these works focused on the characteristics of the input impedance and passive efficiency, especially at lower gigahertz frequencies.
Thus, high-frequency effects owing to parasitic capacitances were not considered. In this work, the imbalances of the input impedances at each primary winding in the SCT and PCT are analyzed with the unwanted parasitic capacitors between the primary and secondary windings. Fig. 2 shows the equivalent circuits for the single-ended cases of the PCT and SCT with the unwanted parasitic capacitors. The circuits in Fig. 2 are modeled by the net inductances $L_{1a}$/$L_{1b}$ and $L_{2a}$/$L_{2b}$ for the primary and secondary windings, including the equivalent series resistors $R_{1a}$/$R_{1b}$ and $R_{2a}$/$R_{2b}$. $M_a$/$M_b$ are the mutual inductances, and $C_a$/$C_b$ are the parasitic capacitors between the primary and secondary windings. In the case of the PCT with the equivalent circuit shown in Fig. 2(a), the input and output voltages are described by (1), where $R_L$ is the output resistance, which is typically 50 Ω.
Applying Kirchhoff's current law (KCL) at the primary winding node in Fig. 2(a) defines $I_{L1}$. In addition, applying KCL at the secondary winding node in Fig. 2(a) defines $I_{L2}$. The input impedance in Fig. 2(a) can be expressed as $Z_{IN1} = R_{IN1} + jX_{IN1}$, where $R_{IN1}$ and $X_{IN1}$ are the real and imaginary parts of $Z_{IN1}$, respectively. The relationship between the voltages and currents for the PCT is given by (7). Because a closed-form expression for $Z_{IN1}$ in the summarized form of (6) is too complex to derive, $Z_{IN1}$ is calculated and plotted using MATLAB by solving the linear matrix equation (7) together with $V_2 = R_L I_2$. Fig. 2(b) shows the equivalent circuit of the SCT, in which the input and output voltages can be derived using the same approach.
By considering $j\omega C_a V_{ca} = I_{ca}$ and $j\omega C_b V_{cb} = I_{cb}$, applying KCL to the nodes in Fig. 2(b) gives the current relations of the SCT. Additionally, applying Kirchhoff's voltage law (KVL) to each primary winding node in Fig. 2(b) gives the corresponding voltage relations. If $Z_1 = R_1 + j\omega L_1$ and $Z_2 = R_2 + j\omega L_2$, then $V_{1a}$, $V_{1b}$, and $V_2$ can be expressed using (8), (9), and (10). The currents $I_1$ and $I_2$ in the SCT are given by (16), which can be derived with $I_{L1a}$, $I_{L2a}$, $I_{L1b}$, and $I_{L2b}$ by using (10) and (11). The relationship between the voltages and currents for the SCT is given by (17). Additionally, to obtain the calculated $Z_{IN1a}$ and $Z_{IN1b}$ in Fig. 2(b), (17) and $V_2 = R_L I_2$ are also solved using MATLAB.
To plot the calculated $Z_{IN1}$, $R_L$ is set to 50 Ω, and all lumped elements are determined by considering the physical layout geometries. In Fig. 2(a) and (b), the self-inductances are selected as $L_1$ = 219 pH and $L_2$ = 210 pH. Moreover, if $R_{1a} = R_{1b} = R_1$ and $R_{2a} = R_{2b} = R_2$, the parasitic resistances $R_1$ and $R_2$ are calculated using $R_1 = \omega L_1/Q_1$ and $R_2 = \omega L_2/Q_2$, where $Q_1$ and $Q_2$ are 15 and 17, respectively. In addition, if $C_a = C_b = C$, the parasitic capacitors between the primary and secondary windings are set to $C$ = 60 fF. To check the effect of the parasitic capacitor in the PCT and SCT, the calculated input impedances of the PCT and SCT are plotted in Figs. 3 and 4. Here, the calculated real and imaginary parts of the input impedances for the PCT and SCT are obtained via (7) and (17) using MATLAB. For the PCT, the input impedance of one PA is identical to that of the other PA over the frequency range of interest, which indicates that symmetrical performance among PAs can be achieved. On the other hand, for the SCT, the input impedance mismatch increases with frequency, so the input impedances differ from one PA to another and depend on each other.
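The element values quoted above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration (the paper itself solves the full matrix equations (7) and (17) in MATLAB). The function names are hypothetical, the evaluation frequency of 28 GHz is an assumption, and the last step uses the textbook input impedance of a single loaded pair of coupled inductors with a coupling factor of about 0.8, deliberately ignoring the parasitic capacitance and the second primary winding.

```python
import math

F0 = 28e9                  # assumed evaluation frequency (Hz)
L1, L2 = 219e-12, 210e-12  # self-inductances quoted in the text (H)
Q1, Q2 = 15.0, 17.0        # quality factors quoted in the text
C_PAR = 60e-15             # parasitic capacitance quoted in the text (F)
R_LOAD = 50.0              # load resistance (ohm)
K = 0.8                    # approximate coupling factor (assumed for this sketch)


def series_resistance(omega, inductance, q):
    """Equivalent series resistance R = omega * L / Q."""
    return omega * inductance / q


def loaded_transformer_zin(omega, l1, l2, r1, r2, k, r_load):
    """Input impedance of one primary coupled to a resistively loaded secondary.

    Simplified two-winding model: Zin = Z1 + (omega * M)^2 / (Z2 + RL).
    Parasitic capacitance and the second primary are NOT modeled here.
    """
    m = k * math.sqrt(l1 * l2)
    z1 = complex(r1, omega * l1)
    z2 = complex(r2, omega * l2)
    return z1 + (omega * m) ** 2 / (z2 + r_load)


omega = 2 * math.pi * F0
r1 = series_resistance(omega, L1, Q1)
r2 = series_resistance(omega, L2, Q2)
x_c = 1.0 / (omega * C_PAR)  # magnitude of the parasitic reactance
z_in = loaded_transformer_zin(omega, L1, L2, r1, r2, K, R_LOAD)

print(f"R1 = {r1:.2f} ohm, R2 = {r2:.2f} ohm")
print(f"|X_C| at 28 GHz = {x_c:.1f} ohm")
print(f"simplified Zin = {z_in.real:.1f} + j{z_in.imag:.1f} ohm")
```

The parasitic reactance comes out at roughly 95 Ω at 28 GHz, which is comparable to the impedance levels in the network and is consistent with the point that the inter-winding capacitance cannot be neglected at mm-wave frequencies.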
The proposed power combiner comprises two PCTs and a T-line parallel combiner, adding the $P_{\mathrm{OUT}}$ of eight unit PAs; its 3D view is shown in Fig. 5. Each PCT combines four unit PAs, and the T-line parallel combiner combines the two PCTs. The proposed combiner in Fig. 5 has four differential input ports (P1, P2, P3, and P4) and one single-ended output port (P5). The back-end-of-line (BEOL) process of the 65-nm CMOS technology provides a 1.45-µm-thick aluminum pad layer (AP), a 3.4-µm-thick copper layer (M8), and a 0.9-µm-thick copper layer (M7). In addition, the spacing between AP and M8 is similar to the spacing between M8 and M7 (0.8 and 0.74 µm, respectively). In the PCT, to achieve a high magnetic coupling factor ($k$), one primary winding (Primary1) is located below the secondary winding, while the other primary winding (Primary2) is located above the secondary winding, resulting in a vertical geometry. This configuration also reduces the occupied die area. The additional T-line parallel combiner can be simply added, thus minimizing the additional combining loss. The size of the proposed power combiner is 241 µm × 96 µm, which is optimized by considering the frequency band of interest. A metal width of 8 µm is chosen for each primary and secondary winding.
To evaluate the performance of the proposed power combiner, an electromagnetic (EM) simulation was performed. As shown in Fig. 6(a), the simulated self-inductances and Q factors of Primary1 and Primary2 are very close to each other. Additionally, the magnitudes of the transmission coefficients (S51, S52, S53) and the phase differences (|∠S51 − ∠S52|, |∠S52 − ∠S53|) are shown in Fig. 6(b). Typically, phase and magnitude imbalances in the combining networks degrade the power gain, output power, and PAE of the PA [48]. However, as shown in Fig. 6(a) and (b), the ports of the proposed combiner exhibit very little deviation from each other because of its symmetrical geometry. Thus, applying the proposed power combiner to the PA design is expected to yield high performance. The simulated $k$ is approximately 0.8 both between Primary1 and the secondary winding and between Primary2 and the secondary winding, as shown in Fig. 6(c). A device parasitic capacitor ($C_P$) is added to the primary sides, as shown in Fig. 5. At 28 GHz, the simulated overall insertion loss of the 4-way parallel-parallel combiner is −0.70 dB, which indicates that the proposed power combiner has a low passive loss while occupying a small die area. Thus, the proposed compact 4-way parallel-parallel power combiner adds the $P_{\mathrm{OUT}}$ from multiple unit PAs with low loss and symmetrical phases/amplitudes. Fig. 7 presents the input impedances ($Z_{IN1}$, $Z_{IN2}$, $Z_{IN3}$, and $Z_{IN4}$) at each port of the proposed power combiner. As indicated in Fig. 7, the simulated input impedances also exhibit only small differences from each other. Thus, the symmetrical performance of the proposed power combiner is verified by both the simulation and calculation results.
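To put the quoted insertion loss in perspective, the short Python sketch below converts a loss in decibels into the fraction of power that passes through the combiner. The helper name is hypothetical, and the calculation assumes the 0.70-dB figure (reported as −0.70 dB) is the total passive loss at 28 GHz.

```python
def passive_efficiency(insertion_loss_db):
    """Fraction of input power delivered by a passive network with the given loss (dB)."""
    return 10 ** (-abs(insertion_loss_db) / 10)


eta = passive_efficiency(0.70)
print(f"passive efficiency = {eta * 100:.1f}%")
```

Under this assumption, about 85% of the power delivered by the unit PAs reaches the output port.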
For the study of nonlinearity, the nonlinear outputs of a nonlinear system are usually analyzed, where a sinusoidal input signal can be assumed as in (18). In (18), $A$, $V_{GS}$, and $\omega$ are the magnitude of the input signal, the gate-source voltage, and the angular frequency, respectively. A nonlinear form can be obtained by applying the Taylor series to the drain current versus $v_{GS}$, as in (19), where $g_{mn}$ is the nonlinearity coefficient of the nonlinear system output. In the last term of (19), $g_{m3}$ is the dominant contributor to IMD3 generation [34]-[38], [41]. With the $g_{m3}$ characteristics represented by impulses, the third-order products of $i_{ds}$, i.e., $i_{ds,3}$, can be expressed using the residual term in (19), as in (20), where $K_i$ and $V_i$ are the magnitudes and voltage positions of the impulses $\delta(\cdot)$, respectively. The multi-gated transistor (MGTR) method is widely used to achieve IMD cancellation between MGTRs, in which one of the MGTRs is biased in the class-AB region and the others are biased in the class-C region. Furthermore, IMD3 cancellation methods between unit PAs in power-combining structures (e.g., SCTs and single- and two-winding transformers) have been proposed and verified to improve the linearity [49], [50]. Additionally, if an expanded version of (18)-(20) is applied to two stages, the input/output characteristics of the two stages, i.e., the driver stage (DA) and power stage in cascade, can be obtained. To improve the linearity of the two stages, an anti-phase method was proposed [38]-[40]. The anti-phase method cancels the nonlinearity of the power stage using the DA, which has a positive sign of $g_{m3}$. However, to satisfy the stringent requirements for high-order QAM and OFDM signals from low to high output power regions, the mm-wave 5G CMOS PA still requires an effective linearization method. Fig. 8 shows the simulated transfer-function derivatives of the transistor (width of 42 µm) versus the gate-source voltage ($V_{GS}$).
Here, $g_{mi}$ is the $i$th-order transconductance. As shown in Fig. 8, the sign of the $g_{m3}$ impulse, which is either positive or negative, changes with $V_{GS}$. The $g_{m3}$ near a $V_{GS}$ of 0.3 V has a positive sign, whereas the $g_{m3}$ near a $V_{GS}$ of 0.45 V has a negative sign. Thus, $g_{m3}$ is initially positive (below a $V_{GS}$ of 0.38 V), and as $V_{GS}$ moves to the high-voltage region, $g_{m3}$ becomes negative.
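For readers who want the expansion referred to in (18) and (19) in concrete form, a standard textbook version is sketched here. This is not a reproduction of the paper's equations: the cosine drive, the unnormalized coefficients (no factorial scaling), and the equal-amplitude two-tone input are assumptions made for illustration.

```latex
% Assumed notation for illustration; the paper's exact forms of (18)-(20) may differ.
\begin{align}
  v_{GS}(t) &= V_{GS} + A\cos(\omega t) \\
  i_{ds} &\approx g_{m1}\,v_{gs} + g_{m2}\,v_{gs}^{2} + g_{m3}\,v_{gs}^{3} \\
  % Two-tone drive: v_{gs} = A(\cos\omega_1 t + \cos\omega_2 t)
  i_{ds,3}\big|_{2\omega_1-\omega_2} &= \tfrac{3}{4}\,g_{m3}\,A^{3}\cos\big((2\omega_1-\omega_2)t\big)
\end{align}
```

In this simplified form the IMD3 current is proportional to $g_{m3}$, so its sign follows the sign of $g_{m3}$ at the chosen bias, which is the property the bias-dependent cancellation relies on.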
To improve linearity without adding design complexity, the IMD3 cancellation method is proposed, as shown in Fig. 9. The unit PAs connected to one PCT (PCT1) are biased in the class-AB region; thus, with a two-tone signal, they generate a negative IMD3 current ($I_{1,\mathrm{IMD3}}$). In contrast, because the unit PAs connected to the other PCT (PCT2) are biased closer to the class-C region, a positive IMD3 current ($I_{2,\mathrm{IMD3}}$) is generated, as shown in Fig. 9. At the T-line parallel combiner, the fundamental currents ($I_{1,\mathrm{Fund}}$ and $I_{2,\mathrm{Fund}}$) of PCT1 and PCT2 are added, while $I_{1,\mathrm{IMD3}}$ and $I_{2,\mathrm{IMD3}}$ cancel each other, resulting in good linearity. Additionally, to effectively cancel the nonlinearity coefficients in the two nonlinear stages (DA and power stages), the DAs are biased closer to the class-C region using the anti-phase method; thus, positive IMD3 currents ($I_{\mathrm{DA,IMD3}}$) are generated. Accordingly, the negative IMD3 currents generated by the PCTs in the power stage can be canceled using the positive IMD3 currents from the DAs.
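The cancellation principle can be illustrated numerically with a toy behavioral model. The Python sketch below is not the paper's simulation: each branch is assumed to be a memoryless cubic polynomial, the coefficient values and tone placement are arbitrary, the function names are hypothetical, and the combiner is idealized as a direct sum of the two branch outputs.

```python
import cmath
import math


def two_tone(n, k1, k2, amp):
    """Two equal-amplitude tones placed on integer DFT bins k1 and k2."""
    return [amp * (math.cos(2 * math.pi * k1 * i / n)
                   + math.cos(2 * math.pi * k2 * i / n)) for i in range(n)]


def cubic_path(x, a1, a3):
    """Memoryless branch model y = a1*x + a3*x^3; the sign of a3 sets the IMD3 sign."""
    return [a1 * v + a3 * v ** 3 for v in x]


def tone_amplitude(samples, k):
    """Amplitude of the component on DFT bin k of a real-valued record."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / n


def imd3_dbc(samples, k_fund, k_imd3):
    """IMD3 level relative to the fundamental tone, in dBc."""
    return 20 * math.log10(tone_amplitude(samples, k_imd3)
                           / tone_amplitude(samples, k_fund))


N, K1, K2, AMP = 1024, 50, 55, 0.1  # illustrative values only
K_IMD3 = 2 * K1 - K2                # lower IMD3 product

x = two_tone(N, K1, K2, AMP)
branch1 = cubic_path(x, a1=1.0, a3=-2.0)  # branch with a negative third-order term
branch2 = cubic_path(x, a1=1.0, a3=+1.6)  # branch with a positive third-order term
combined = [p + q for p, q in zip(branch1, branch2)]  # idealized current summation

for name, y in (("branch 1", branch1), ("branch 2", branch2), ("combined", combined)):
    print(f"{name}: IMD3 = {imd3_dbc(y, K1, K_IMD3):.1f} dBc")
```

Because the fundamental terms add while the third-order terms partially cancel, the combined output shows a much lower IMD3 level than either branch alone. In a real PA the third-order coefficients vary with drive level, which is why the cancellation in the paper appears as bias-dependent sweet spots rather than a constant improvement.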
First, to validate the proposed linearization concept, a two-tone simulation was performed only for the power stage, at a center frequency of 28 GHz with a tone spacing of 100 MHz. Fig. 10 shows the simulated IMD3 characteristics versus the output power for various gate bias voltages of the power stage. The proposed power combiner is included in the simulation of the power stage. The gate bias voltage of the power stage is swept from 0.25 to 0.65 V. As shown in Fig. 10, with a lower bias, the IMD3s in both the low and middle output power regions are at a high level, resulting in poor linearity. In contrast, with a higher bias, the IMD3s are maintained at a low level over most of the output power range. Thus, using a higher gate bias voltage for the power stage is expected to achieve better linearity. However, if the bias point is too high, achieving a high PAE is difficult. Thus, the selection of the bias point is important with regard to both the IMD3 and PAE. Additionally, sweet-spot generation and movement depending on the gate bias voltage are also observed in Fig. 10. As shown in Fig. 10, the sweet spots are generated at approximately 0.35-0.5 V. Accordingly, we can estimate that the suitable gate bias voltage for the power stage is approximately 0.35-0.5 V. The optimum gate bias voltage of the power stage is chosen to maintain the IMD3s below −30 dBc over an output power range of 0-14 dBm. Hence, a gate bias voltage of 0.5 V is selected for the power stage to satisfy this condition.
Next, the simulated IMD3s of the two stages are plotted in Fig. 11. The simulated PA consists of a two-stage structure. The DA includes an inter-stage matching network. The gate bias voltage of the DA is swept from 0.15 to 0.65 V, while the gate bias voltage of the power stage is fixed at 0.5 V. The simulation results indicate the cancellation effect of IMD3s from the power stage.
For a selected gate bias voltage of 0.3 V for the DA, the IMD3s are significantly suppressed in the output-power range of 0-14 dBm. The selected gate bias voltage of 0.5 V for the power stage is close to the class-AB region; thus, negative IMD3s are generated. Meanwhile, the obtained gate bias voltage of 0.3 V for the DA is in the class-C region, resulting in positive IMD3s. Thus, in the two cascaded nonlinear stages, the negative IMD3s generated by the power stage in the class-AB region are clearly canceled by the positive IMD3s from the DA in the class-C region [34]-[38], [41]. As mentioned previously, the cancellation effect of the anti-phase method is clearly demonstrated by the simulation results of the two stages, as shown in Fig. 11.
The simulated IMD3 performances of the proposed CMOS PA for a proof of concept of the IMD3 cancellation method in four different cases are shown in Fig. 12. The four cases are classified by the bias conditions of the DAs and the unit PAs connected to the two PCTs, as shown in Table 1. In case 1, the DAs are biased at 0.3 V, and the unit PAs connected to the two PCTs are biased at 0.5 V; these values are taken from the results in Fig. 11. In other words, to improve the IMD3 performance, the anti-phase method is applied in case 1. In case 2, the DAs are biased at 0.45 V, while the unit PAs connected to the two PCTs are biased at 0.5 V. In case 3, compared to case 1, the gate bias voltage of the unit PAs connected to the two PCTs is slightly increased to approach the class-A region. Finally, to improve the linearity over a wide range of $P_{\mathrm{OUT}}$, the proposed IMD3 cancellation method is applied together with the anti-phase method in the proposed case. With the proposed IMD3 cancellation method, the DAs are biased at 0.3 V, which contributes to generating positive IMD3s. Additionally, the unit PAs connected to PCT1 are biased at 0.35 V, thus generating positive IMD3s, while those connected to PCT2 are biased at 0.55 V, resulting in negative IMD3s, as shown in Fig. 8. As mentioned previously, in the proposed power-combiner structure, the IMD3s from the unit PAs connected to the two PCTs cancel each other. Additionally, the IMD3s in the two-stage PA can be canceled using the anti-phase method between the DAs and the power stage. As shown in Fig. 12, most cases satisfy an IMD3 of −30 dBc up to an output power of 10 dBm. However, to achieve a high linear output power up to 16 dBm, further improvements are required. In case 1, the IMD3s at output powers of up to 15 dBm are effectively canceled. Nevertheless, further suppression of the IMD3s in the high-output-power region is still needed to achieve high linearity at a higher output power.
Compared with the other cases, the output power of the PA in the proposed case is improved from 12 to 19 dBm with an IMD3 performance of −30 dBc by using the proposed IMD3 cancellation method. The sweet spots generated in the proposed case can be pushed toward a high output power, resulting in good linearity. Hence, the proposed IMD3 cancellation is effective for achieving high linearity over a wide output-power range. In addition, to evaluate the linearity improvement with the proposed IMD3 cancellation method, the phases of the IMD3 components of the output currents of PCT1 and PCT2 in the proposed case are compared with those in case 1, as shown in Fig. 13. The phases of the IMD3 components of the two output currents in case 1 are similar, so no cancellation occurs at the T-line combiner in case 1. In contrast, the phase difference between the IMD3 components of the output currents of PCT1 and PCT2 in the proposed case is approximately 180° from an output power of 16 dBm to 19 dBm. Thus, the IMD3 components of the output currents from the two PCTs cancel each other. The proposed case demonstrates lower IMD3 in the high-output-power region.
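The role of the 180° phase difference can be quantified with a simple phasor sum. The Python sketch below is an illustration only: it assumes two IMD3 current phasors that add linearly at the combiner, and the function name and the amplitude-ratio parameter are hypothetical.

```python
import cmath
import math


def residual_ratio_db(phase_diff_deg, amp_ratio=1.0):
    """Residual of two summed phasors relative to their in-phase sum, in dB.

    phase_diff_deg: phase difference between the two IMD3 phasors (degrees).
    amp_ratio: magnitude of the second phasor relative to the first.
    """
    phi = math.radians(phase_diff_deg)
    residual = abs(1.0 + amp_ratio * cmath.exp(1j * phi))
    in_phase = 1.0 + amp_ratio
    if residual == 0.0:
        return float("-inf")
    return 20 * math.log10(residual / in_phase)


for deg in (0, 90, 170, 180):
    print(f"phase difference {deg:3d} deg -> {residual_ratio_db(deg):8.1f} dB")
```

With equal magnitudes, a phase difference of 170° already suppresses the summed IMD3 by about 21 dB relative to the in-phase case, which shows why an approximately anti-phase relationship is sufficient for a useful improvement.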
To test the wideband performance, two-tone simulations with tone spacings of 200 MHz and 400 MHz were performed, as shown in Figs. 14 and 15. For the two-tone test with a tone spacing of 200 MHz, the simulated IMD3s of the proposed case are maintained below approximately −30 dBc up to an output power of 17 dBm. For the case with a tone spacing of 400 MHz, the bias point of the proposed case is slightly tuned (PCT1: 0.37 V and PCT2: 0.58 V) for better performance. As the tone spacing increases, the IMD3 performance is slightly degraded. However, the trends of the simulated results with wider tone spacings agree with those of the simulated results with a tone spacing of 100 MHz. Fig. 16(a) and (b) show comparisons between the power gain and PAE of the proposed IMD3 cancellation method and those of the other cases. In Fig. 16(a), the power gain of the proposed case is slightly lower than those of the other cases. However, because the gate bias voltages of the DAs and power stage for the proposed technique are lower than those of the other cases, the proposed IMD3 cancellation method exhibits excellent performance for improving the PAE, as shown in Fig. 16(b). With this method, the low bias voltages of the DAs and the power stage reduce the current consumption of the proposed PA, while the power gain of the PA is sacrificed. As explained previously, the proposed PA using the IMD3 cancellation method demonstrates improved IMD3s and PAE under the determined bias condition. The simulated AM-PM results are shown in Fig. 16(c) for the four different cases. Compared with the other cases, the AM-PM response of the proposed case is flatter as the output power increases. Fig. 17 shows the load-pull simulation results for a cascode single-ended configuration with an input power of 5 dBm. The simulated output power and PAE contours are plotted with $V_{GS}$ = 0.35 V and $V_{GS}$ = 0.55 V, which are applied in the proposed case.
The optimum performances for output power and PAE are slightly different for the two cases. However, the optimum impedance with $V_{GS}$ = 0.35 V is close to that with $V_{GS}$ = 0.55 V. Thus, the output combiner was not optimized separately for the different gate bias voltages in this design. With further optimization, the PAE and output power could be slightly improved. Fig. 18 shows a detailed schematic of the proposed mm-wave two-stage power amplifier. The proposed mm-wave PA is composed of five functional blocks: the input-matching network, DA, inter-stage matching network, power stage, and output-matching network. For the single-ended input and output, the input and output networks are required to provide both the balun function and matching.

B. DESIGN OF PROPOSED mm-WAVE POWER AMPLIFIER
For the input-matching network, two input transformers are designed to generate two differential signals. The input transformer has a 1:1 turn ratio, and the widths of the primary and secondary windings are 8 µm. The M8 and M7 layers are used for the primary and secondary windings in the design of the input-matching networks, respectively. The size of the input-matching network is 180 µm × 72 µm. The DA is designed using the CS topology for a 1.2-V supply voltage. Each differential DA drives two differential unit PAs, which have the same transistor size (42 µm/65 nm). Two 4-way inter-stage transformers are used to drive the eight unit PAs. Because the output impedance of the unit DA is typically higher than the input impedance of a unit PA, the topology of the series power divider is used for the 4-way inter-stage transformer. The size of the inter-stage matching network is 354 µm × 65 µm. For the PA design, a differential cascode configuration is adopted for a 2.4-V supply voltage to prevent source degeneration and breakdown of the transistor. The size of the common-gate device is identical to that of the CS device, i.e., 128 µm/65 nm. In addition, the differential DA and the CS device in the PA include two neutralization capacitors ($C_N$s) for high reverse isolation. Because an increase in the $C_{gd}$ of the CS transistor degrades the stability and power gain of the PA, the stability issues of the CS transistor can be solved by adding the $C_N$ [44], [45].
To reduce the effect of bonding wires and create a solid ground plane, four VDD and six ground pads are placed on the chip. Additionally, the ground pads, including the input/output ground pads, and the ground plane are connected to each other, and the ground pads are bound by strips composed of the M8 and AP layers. The proposed power combiner is used as the output-matching network to combine the output power from the unit PAs and convert the differential signals into a single-ended signal.

IV. MEASUREMENT RESULTS
The proposed mm-wave PA using a 4-way parallel-parallel combiner with IMD3 cancellation was implemented in a 1P8M 65-nm CMOS technology that supports an ultra-thick metal (UTM) layer and an aluminum redistribution layer (RDL). Fig. 19 shows a chip photograph of the implemented PA, which has a size of 0.87 mm × 0.9 mm, including all pads. The chip incorporates the PA cores and input/output passive networks. Because all the matching circuits are integrated, there are no external matching components apart from the off-chip bypass capacitors. To validate the performance of the PA, the implemented PA is measured using on-wafer probing with dc wire bonds to an external FR4 printed circuit board (PCB), as shown in Fig. 19. Fig. 20 shows the experimental setup for the S-parameter, large-signal continuous-wave (CW), and modulation measurements. The S-parameters are measured using an Anritsu 37369c vector network analyzer with an input power of −30 dBm. For the large-signal CW and modulation measurements, a Rohde & Schwarz (R&S) SMW200A signal generator provides the input signal and an R&S FSVA3030 spectrum analyzer measures the output power. Fig. 21 shows the simulated and measured S-parameters. There is little deviation between the simulated and measured results, and their trends are similar. The measured peak S21 is 16.6 dB, and the −3 dB bandwidth is 3.2 GHz, centered at approximately F 0 = 27.3 GHz. The input reflection coefficient (S11) is below −15 dB at F 0 and remains below −10 dB over the range of 25-31.1 GHz, so the input is well matched over the target band. Fig. 22(a) shows the measurement results for the CW power sweep at 27.3/28 GHz. The PA delivers a saturated P OUT (P SAT ) of 23.5/23.2 dBm with a power gain of 16.6/15.9 dB while achieving a PAE SAT of 35.5/33.5% at 27.3/28 GHz. P SAT is defined as the P OUT at the 5-dB compression point. Additionally, the measured and simulated CW results at 28 GHz are plotted in Fig. 22(b).
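The relation between the reported P SAT, gain, and PAE can be illustrated with a short calculation. The following is a minimal sketch, not a reported measurement: it assumes that the quoted 15.9-dB power gain at 28 GHz is the small-signal value, so that the gain at P SAT is 5 dB lower (per the 5-dB compression definition), and it solves PAE = (P OUT − P IN)/P DC for the dc power. The function and variable names are hypothetical.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def implied_dc_power_mw(p_out_dbm: float, gain_db: float, pae: float) -> float:
    """Solve PAE = (P_out - P_in) / P_dc for P_dc (all powers in mW)."""
    p_out = dbm_to_mw(p_out_dbm)
    p_in = dbm_to_mw(p_out_dbm - gain_db)
    return (p_out - p_in) / pae


# Values quoted in the text for 28 GHz.
P_SAT_DBM = 23.2
POWER_GAIN_DB = 15.9   # ASSUMPTION: treated as small-signal gain
COMPRESSION_DB = 5.0   # P_SAT is defined at the 5-dB compression point
PAE_SAT = 0.335

# ASSUMPTION: gain at P_SAT = small-signal gain minus the compression.
gain_at_sat_db = POWER_GAIN_DB - COMPRESSION_DB

p_dc_mw = implied_dc_power_mw(P_SAT_DBM, gain_at_sat_db, PAE_SAT)
drain_eff = dbm_to_mw(P_SAT_DBM) / p_dc_mw

print(f"implied dc power: {p_dc_mw:.0f} mW")
print(f"implied drain efficiency: {100.0 * drain_eff:.1f} %")
```

Under these assumptions the implied dc power is an inference from the quoted figures only; it should not be read as a measured supply power.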
The large-signal characteristics of the PA are also measured over the frequency range of 25-30 GHz. As shown in Fig. 22(c), the peak CW performance is obtained at 27.3 GHz. For the frequency range of 26-28 GHz, the measured P SAT is >22 dBm with a PAE of 30%. Additionally, the measured 1-dB compressed output power (P O,1dB ) is above 20 dBm over the range of 26-28 GHz. The 3-dB bandwidths for both P SAT and P O,1dB are more than 5 GHz. Because wideband matching methods were not applied in this work, the bandwidth performance can be improved by incorporating the wideband matching methods in [55]-[57]. Comparisons of the measured IMD3 are shown in Fig. 23(a). The IMD3 characteristics are measured using a two-tone signal centered at 28 GHz with a tone spacing of 100 MHz in four different cases, including the proposed linearization bias point. Compared with the other three cases, the proposed IMD3 cancellation method clearly produces less IMD3 over a wide range of P OUT . The IMD3 of the PA is maintained under −30 dBc up to a P OUT of 19.0 dBm. This implies that the linear output power of the PA at an IMD3 of −30 dBc is improved by 6.5 dB. The proposed PA achieves a high P O,AVG with a high PAE by employing the parallel-parallel power combiner with the IMD3 cancellation method.
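The principle behind a two-tone IMD3 measurement and its cancellation at a combining node can be sketched numerically. The example below is an illustration only and does not model the implemented circuit: each unit PA is represented by a memoryless polynomial y = a1·x + a3·x³, and the two paths are assumed to have third-order coefficients of opposite sign (as could arise from different gate biases). All coefficient values, tone bins, and function names are hypothetical.

```python
import cmath
import math

N = 1024            # samples per record (hypothetical)
K1, K2 = 100, 110   # integer tone bins, so the DFT has no leakage


def two_tone(amp: float) -> list:
    """Two equal-amplitude tones at bins K1 and K2."""
    return [amp * (math.cos(2 * math.pi * K1 * t / N)
                   + math.cos(2 * math.pi * K2 * t / N)) for t in range(N)]


def tone_amplitude(y: list, k: int) -> float:
    """Peak amplitude of the component at bin k (single-bin DFT)."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * t / N) for t, v in enumerate(y))
    return 2.0 * abs(acc) / N


def imd3_dbc(y: list) -> float:
    """Lower IMD3 product (2*K1 - K2) relative to the K1 fundamental, in dBc."""
    return 20.0 * math.log10(tone_amplitude(y, 2 * K1 - K2) / tone_amplitude(y, K1))


def unit_pa(x: list, a1: float, a3: float) -> list:
    """Memoryless third-order polynomial model of one unit PA (ASSUMPTION)."""
    return [a1 * v + a3 * v ** 3 for v in x]


x = two_tone(0.1)
y_a = unit_pa(x, 10.0, -1.0)   # hypothetical compressive path
y_b = unit_pa(x, 10.0, +0.8)   # hypothetical expansive path
y_sum = [0.5 * (u + v) for u, v in zip(y_a, y_b)]   # ideal combining

imd3_a = imd3_dbc(y_a)
imd3_b = imd3_dbc(y_b)
imd3_sum = imd3_dbc(y_sum)

print(f"IMD3 path A: {imd3_a:.1f} dBc")
print(f"IMD3 path B: {imd3_b:.1f} dBc")
print(f"IMD3 combined: {imd3_sum:.1f} dBc")
```

In this simplified model the combined output has an effective third-order coefficient equal to the average of the two paths, so the IMD3 product is reduced while the fundamental is essentially unchanged. A real PA also exhibits higher-order and memory effects that this sketch ignores.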
Next, the PA is characterized using a 256/512-QAM signal (7.5-dB PAPR with a 100-MSym/s symbol rate, limited by the measurement equipment). Fig. 23 (b) Fig. 24 depicts the constellation and the corresponding output-power spectrum with 256/512-QAM at 27.3/28 GHz. Table 2 presents a comparison of the PA developed in this work with other state-of-the-art CMOS PAs for 5G applications. Among the PAs in the table, the proposed CMOS PA demonstrates a high linear P OUT with the highest PAE, supporting 256/512-QAM modulation without DPD.
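For reference, the modulation figures can be converted into more familiar units. The sketch below assumes the usual convention that EVM in dB equals 20·log10 of the rms EVM ratio, and it computes the raw (uncoded) bit rate from the symbol rate and the QAM order. The EVM values are those quoted in the abstract; the function names are hypothetical.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert EVM in dB to rms percent (ASSUMPTION: EVM_dB = 20*log10(EVM_rms))."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


def raw_bit_rate_mbps(symbol_rate_msym: float, qam_order: int) -> float:
    """Uncoded bit rate: symbol rate times bits per symbol (log2 of the QAM order)."""
    return symbol_rate_msym * math.log2(qam_order)


SYMBOL_RATE_MSYM = 100.0
EVM_DB = {256: -31.2, 512: -32.1}   # values quoted in the abstract

for order, evm_db in EVM_DB.items():
    print(f"{order}-QAM: EVM = {evm_db_to_percent(evm_db):.2f} % rms, "
          f"raw rate = {raw_bit_rate_mbps(SYMBOL_RATE_MSYM, order):.0f} Mb/s")
```

With 8 and 9 bits per symbol, the 100-MSym/s signals correspond to raw rates of 800 and 900 Mb/s for 256-QAM and 512-QAM, respectively.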

V. CONCLUSION
In this paper, a mm-wave CMOS PA based on a 4-way parallel-parallel power combiner with an IMD3 cancellation method is proposed in 65-nm CMOS technology to achieve a high linear output power and a high PAE. To demonstrate the superiority of the proposed power combiner, two types of power-combining transformers, SCT and PCT, are comprehensively analyzed and compared with regard to their balance performance by considering the unwanted parasitic capacitors. Additionally, the proposed IMD3 linearization method enhances the linear output power and PAE without a large back-off from P SAT . The proposed CMOS PA demonstrates a highly linear P OUT with the highest PAE, supporting 256/512-QAM modulation without DPD. This is achieved by combining the mm-wave PA design for 5G applications, the proposed output-power combiner, and the linearity improvement method. To the best of the authors' knowledge, the presented PA demonstrates a high linear P OUT with the highest PAE, supporting 256/512-QAM, among the recently published fully integrated mm-wave 5G CMOS PAs [48].