Enabling Efficient Real-Time Calibration on Cloud Quantum Machines

Noisy intermediate-scale quantum (NISQ) computers are widely used for quantum computing (QC) via quantum cloud providers. Among them, superconducting quantum computers, with their high scalability and mature processing technology based on traditional silicon chips, have become the preferred platform for most commercial companies and research institutions developing QC. However, superconducting quantum computers suffer from fluctuations due to noisy environments. To maintain reliability for every execution, calibration of the quantum processor is critically important. During the long procedure of calibrating physical quantum bits (qubits), quantum processors must be taken offline. In this work, we propose a real-time calibration framework (RCF) to execute quantum program tasks and calibrate in-demand qubits simultaneously, without interrupting quantum processors. Across QASMBench, a widely used NISQ benchmark suite, RCF achieves up to 18% reliability improvement for applications. For reliability on different physical qubits, RCF achieves an average gain of 15.7% (up to 36.7%). For cloud quantum machines, throughput can be improved by up to 9.5 tasks per minute (6.5 on average) relative to the baseline calibration time. In conclusion, RCF offers a reliable solution for large-scale, long-serving quantum machines.


I. INTRODUCTION
Quantum computing (QC), as a potential alternative to classical computing, has received intensive research attention in recent years. The current noisy intermediate-scale quantum (NISQ) computing model [46] is the most practical carrier for executing QC for high-performance computing (HPC) [8], [36], [38], [49], as well as providing an efficient cloud strategy for users to explore quantum problems [15]. A significant amount of research effort has been invested in QC to improve its execution efficiency and accuracy, from high-level quantum simulation [12] to low-level quantum processor architectures [18] and quantum gate computing [25], [45], [58]. Superconducting QC is one of the most mature solutions: it uses the traditional chip manufacturing process (coating, etching, and exposure on a silicon base) to create a platform for carrying quantum bits (qubits), which makes it highly scalable. This technology has gained widespread popularity. Recently, IBM announced the launch of a superconducting quantum chip with 433 qubits, and Google has made significant advancements in several basic science fields with its superconducting quantum chip, Sycamore, introduced in 2018.
In contrast to conventional systems, the process factors involved in the manufacturing of superconducting quantum chips and quasi-particle noise in the environment not only slow down processing but also cause errors and inaccuracies in quantum computations. More intuitively, process factors and quasi-particle noise can significantly affect the performance of qubits, reducing the fidelity of quantum logic gates and leading to errors and inaccuracies in quantum computations. To obtain accurate results in quantum applications, it is crucial to perform high-precision quantum gate operations and develop efficient quantum control mechanisms in the physical system. In addition to improving the chip manufacturing process and reducing ambient noise during the operation of superconducting quantum computers, the use of physical experimental techniques and scheduled maintenance policies can effectively reduce errors. For superconducting quantum computers, there are mainly three kinds of errors: decoherence error, gate operation error, and measurement error. Note that initialization is a crucial step in the execution of quantum circuits, and it involves preparing qubits in a specific state that represents the input to the circuit, which can be a source of the three errors. More importantly, since the quantum system is always coupled with a noisy environment, this imperfect coupling will cause significant control parameters to shift randomly all the time.
As the core part of superconducting quantum computers, transmon superconducting quantum chips mainly follow two designs: fixed frequency and tunable frequency. Frequency-fixed qubits require adjusting parameters such as the amplitude of the qubit driving pulse, while frequency-tunable qubits may require adjusting the voltage applied to the Z-line to find the magnetic flux corresponding to the desired frequency. The specific design scheme and drive form of the two are introduced in Section II. Due to the entanglement between qubits, it is difficult to determine which qubit is experiencing parameter drift from the low-fidelity results of quantum algorithms alone. Furthermore, the problem of parameter drift is compounded by the fact that it occurs randomly on small time scales, making it impossible to quickly locate the qubit where the error occurred once multiple qubits are active. Additionally, for certain multiqubit quantum logic gates, ignoring the large and complex set of parameters involved in calibration can let errors caused by drifting parameters accumulate over time, making the problem more complex and difficult to trace. These interfering factors have led computer science researchers to realize that, when running quantum circuits on superconducting quantum computers, the number of task repetitions must be increased to improve the accuracy of the results in the face of parameter shift on the quantum chip. On the other hand, researchers have proposed many comprehensive bottom-level calibration techniques for these interference factors. For example, one general and widely used calibration adjusts shifted parameters back to their defaults [60]. But it is usually expensive to calibrate all physical qubits, and this requires a large number of parameters. Besides, it usually blocks the entire task queue on the quantum cloud during the long calibration phase (e.g., the daily calibration in IBM Q machines [2]).
As the number of qubits integrated on quantum chips increases to the next order of magnitude, it is foreseeable that the blocking of quantum cloud services caused by parameter-calibration shutdowns will become more severe [13]. One possible solution is to fuse the qubit measurement and control system with the cluster that provides cloud services, but this still cannot address the errors caused by large-scale qubit parameter drift while cloud services are being provided. Little research discusses how to make these calibration techniques more practical and efficient: for example, a calibration that adjusts only the most significant parameters and calibrates qubits on demand without interrupting task execution on cloud quantum machines. Such a scheme could provide not only a significant reliability and throughput improvement, but also a better experience for users. To address these challenges, we propose a practical calibration framework, named the real-time calibration framework (RCF), for cloud quantum machines to exploit concurrent calibration (calibration of the drift of the qubit drive frequency) and task execution (the user's quantum program), as shown in Fig. 1. Overall, the contributions of this work are as follows.

FIGURE 1. Concurrent calibration and task execution. Fifteen qubits are classified into two classes and executed concurrently. One class executes a quantum application, and the other performs a calibration circuit. Qubits of the two classes are measured simultaneously. We use this simultaneous calibration as a signal for when to perform full device calibration, without the need for periodic calibration checks.
1) We characterize the reliability variance of cloud quantum machines by executing a set of quantum applications during the day, with a detailed analysis of the individual reliability (accuracy) of different physical qubits. To further analyze the performance effect of using the same physical qubits for a long time without calibration, we propose a noise model and conduct experiments to characterize quantum machines' reliability.
2) Based on the characterization results, we propose RCF, which supports concurrent partial calibration and task execution on cloud quantum machines. RCF combines a machine-learning model to predict qubit reliability fluctuation, which can not only select and calibrate qubits likely in demand for parameter-shift correction, but also provide relatively high-accuracy physical qubits to execute quantum applications without interruption.
3) We conducted calibration experiments based on frequency calibration using a ten-qubit frequency-fixed quantum chip, using a set of single-qubit-gate randomized benchmarks as verification indicators. By increasing the frequency of the experiment, we were able to simulate the online operation process and demonstrate the advantages of this calibration scheme. More importantly, we explain the experimental results obtained through the quantum cloud by comparative tests on real quantum computers.
4) In conclusion, to exploit concurrent calibration and task execution on cloud quantum machines, RCF incorporates the following four major novelties: a) calibration and concurrent task execution; b) real-time selection for qubits; c) connectivity-aware calibration and execution; d) segmented calibration (SC).
We evaluate RCF on the Qiskit Aer pulse simulator [4] with a set of popular quantum applications from QASMBench [32]. Based on the experimental results, RCF achieves up to 18% reliability improvement for applications. For reliability on different physical qubits, RCF achieves an average gain of 15.7% (up to 36.7%). Also, the throughput of cloud quantum machines can be improved by up to 9.5 tasks per minute (TPM) (6.5 on average), a predefined metric to evaluate execution efficiency relative to the baseline calibration time. Furthermore, experimental results from a 10-qubit fixed-frequency superconducting quantum computer show that the average fidelity of single-qubit gates calibrated using the RCF model is significantly better than the general case over a three-day timescale. In conclusion, RCF has good scalability and robustness to help people design and use quantum cloud machines.

A. QUANTUM COMPUTING
A qubit is a fundamental unit of data in QC. A qubit is any two-level quantum system with associated basis states |0⟩ and |1⟩, which act as analogs to the classical 0 and 1 states of a bit. Unlike a classical bit, an admissible quantum state can be expressed as a normalized linear combination of these bases: α|0⟩ + β|1⟩, with |α|² + |β|² = 1, as seen in Fig. 2. Operations on qubits take the form of unitary operations, the physical realization of which is termed a "gate." Most nonoptical modern devices are limited to a small number of measurement operators, in part due to physical limitations imposed by the topology of the quantum chip itself, as in the case of superconducting quantum chips. To execute QC applications, most algorithms are synthesized into a series of ideal gates, which are then implemented in hardware as a set of noisy operations on physical qubits. These qubits are then measured, and the resulting probability distribution over the set of measurement operators can be used to infer a classical result. However, some hybrid quantum algorithms, such as the variational quantum eigensolver (VQE), have been found to be surprisingly robust against noisy operations, and their execution is not affected as much as one would expect. These algorithms possess a natural resilience to coherent errors during execution.
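As a concrete illustration of the state notation above, the following sketch (plain NumPy; the helper names are ours, not the paper's) builds a normalized qubit state α|0⟩ + β|1⟩ and applies the Born rule to obtain measurement probabilities:

```python
import numpy as np

def make_qubit_state(alpha: complex, beta: complex) -> np.ndarray:
    """Return the normalized state vector for alpha|0> + beta|1>."""
    state = np.array([alpha, beta], dtype=complex)
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("alpha and beta cannot both be zero")
    return state / norm

def measurement_probabilities(state: np.ndarray) -> tuple[float, float]:
    """Born rule: P(0) = |alpha|^2 and P(1) = |beta|^2."""
    return abs(state[0]) ** 2, abs(state[1]) ** 2

# An equal superposition (alpha = beta = 1 before normalization)
# satisfies |alpha|^2 + |beta|^2 = 1 and yields P(0) = P(1) = 1/2.
plus = make_qubit_state(1, 1)
p0, p1 = measurement_probabilities(plus)
```

Normalization is what makes the amplitudes interpretable as probabilities; any scaling of (α, β) by a nonzero constant describes the same physical state.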
Commercial quantum computers typically contain between 5 and 128 qubits, as seen in current prototypes from providers such as Google, IBM, Origin, Rigetti, and IonQ [41]. As shown in Fig. 3, users can access these quantum computers by submitting quantum tasks to a queue through cloud services. Spatial and temporal noise is usually caused by the complexity of fabrication, the difficulty of manipulating gates, and external interference, and this noise can result in different types of errors when executing applications [30], [41], [42]. Due to the high error rates and the lack of fault tolerance capability in quantum computers, programs must be executed multiple times, with each execution being called a trial. Also, collecting statistics is a natural part of running a quantum computer. To obtain a reliable estimate of the outcome probabilities, it is necessary to run the same circuit multiple times. While noise can affect the accuracy of the statistics, it only increases the amount of data needed to achieve the desired level of precision. The reliability of quantum computers can be reflected by many metrics after executing applications, such as the probability of a successful trial (PST) [13].
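The PST metric mentioned above can be computed directly from the measurement counts returned after many trials. A minimal sketch (the `pst` helper and the example counts are illustrative, not the paper's code):

```python
def pst(counts: dict[str, int], correct: str) -> float:
    """Probability of a successful trial (PST): the fraction of shots
    whose measured bitstring equals the known correct answer."""
    total = sum(counts.values())
    return counts.get(correct, 0) / total if total else 0.0

# Hypothetical counts from 8192 shots of a circuit whose reference
# (error-free) output is the bitstring '1011'.
counts = {"1011": 7372, "1010": 512, "0011": 308}
reliability = pst(counts, "1011")
```

PST only applies to benchmarks with a unique reference output; distribution-valued benchmarks need a different metric, as discussed for qft(15) later in the paper.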

B. SUPERCONDUCTING QUANTUM COMPUTERS AND ERRORS
Superconducting QC is indeed a mature solution for building quantum computers, and transmon qubits are widely used as the basic building blocks in superconducting quantum computers. Transmon qubits are highly designable, and their properties can be adjusted by changing the capacitance and Josephson junction structure. Transmon qubits have a long decoherence time, and they are easy to couple and scale up. The basic circuit of a transmon qubit is an LC resonant circuit formed by an inductor connected in parallel with a capacitor. By replacing the inductance with a Josephson junction, a two-level system can be created to form qubits. There are two types of transmon qubits (as shown in Fig. 4): frequency-tunable and frequency-fixed. The frequency-tunable transmon qubits, such as those used in Google's Sycamore, can avoid frequency collisions in large-scale, qubit-integrated chips. On the other hand, frequency-fixed transmon qubits, such as those used in IBM's Falcon, have one less Josephson junction and can avoid frequency drift problems. However, they still need to calibrate the single-qubit gate drive pulse.
On superconducting quantum computers, errors from the following five categories are the major bottlenecks that affect reliability.
1) Operational errors result from gate operations during quantum computations, including single-qubit and two-qubit gate errors.
2) Measurement errors occur when qubits are measured at the end of quantum computations, e.g., a qubit in the |1⟩ state is read as the |0⟩ state.
3) Coherence errors occur because qubits can only maintain their state for a short duration, the coherence time. Coherence errors therefore limit the number of gate operations that can be executed per qubit on a superconducting quantum machine.
4) Initialization errors occur during initialization, a crucial step in the execution of quantum circuits that prepares qubits in a specific state representing the input to the circuit. Initialization can be a source of operational, measurement, and coherence errors.
5) Cross-talk errors happen when the state of a qubit, or of a resonator between qubits, affects the state of an adjacent qubit or resonator.
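As a toy illustration of the second category, the following sketch flips readout bits with fixed probabilities; the error rates and the model itself are illustrative, far simpler than calibrated hardware noise models:

```python
import numpy as np

def apply_readout_error(bits: np.ndarray, p01: float, p10: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Toy measurement-error model (illustrative, not the paper's):
    a true 0 is read as 1 with probability p01, and a true 1 is read
    as 0 with probability p10."""
    flip = np.where(bits == 1,
                    rng.random(bits.size) < p10,
                    rng.random(bits.size) < p01)
    return np.where(flip, 1 - bits, bits)

# 100k shots of a qubit prepared in |1>: with p10 = 0.03, roughly 3%
# of the readouts come back as 0.
rng = np.random.default_rng(1)
read = apply_readout_error(np.ones(100_000, dtype=int), 0.01, 0.03, rng)
readout_error = 1.0 - read.mean()
```

Asymmetric p01/p10 is common in practice, since relaxation during readout preferentially turns |1⟩ into |0⟩.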
To reduce errors in quantum computations, compilers in superconducting quantum computers have a process called "transpilation," which provides several strategies to map program qubits in a program onto physical qubits in superconducting quantum computers with specific qubit topology and native gate operations. As the errors fluctuate depending on the physical qubits and environments, they change over time [59]. To fundamentally improve reliability and mitigate error rates in superconducting quantum computers, qubit parameters, like drive pulse parameters and qubit frequency, are necessary to calibrate [29].

C. BASICS OF QUANTUM CALIBRATION
After the superconducting quantum chip is placed in an extremely low-temperature environment, it needs to go through a series of parameter adjustments to finally run various gate operations. However, these parameters, obtained experimentally, cannot be used for a long time. Due to the defects during the quantum chip design process and environmental impact, these parameters may be shifted and need to be adjusted to maintain chip performance [57], [61].
Next, we briefly introduce the principles of several calibration experiments that yield superconducting qubit parameters. Consider, in the interaction picture and under the rotating-wave approximation, a drive with Hamiltonian H_d = (Ω(t)/2)(cos ψ σ_x + sin ψ σ_y) applied to a two-level system H_q = (ω_q/2)σ_z. The corresponding quantum state rotates on the Bloch sphere by an angle γ = ∫Ω(t)dt around the rotation axis given by the vector r = (cos ψ, sin ψ, 0). As long as we can continuously adjust γ = ∫Ω(t)dt and ψ, we can realize a rotation gate about any axis in the X-Y plane by any angle. In transmon qubits, inductive or capacitive coupling is generally used to realize σ_x and σ_y. To make the drive intensity continuously tunable, we give the microwave a smooth envelope; by increasing the drive intensity, i.e., increasing the voltage amplitude of the microwave output by the DA board, the qubit can be flipped about an axis in the X-Y plane, completing the Rabi experiment.
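The rotation just described can be written down directly: because the axis operator r·σ squares to the identity, the matrix exponential exp(-i(γ/2) r·σ) has a simple closed form. A minimal NumPy sketch (function names are ours):

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def xy_rotation(gamma: float, psi: float) -> np.ndarray:
    """Unitary for a rotation by angle gamma about the Bloch-sphere axis
    r = (cos psi, sin psi, 0). Since (r . sigma)^2 = I, the exponential
    reduces to cos(gamma/2) I - i sin(gamma/2) (r . sigma)."""
    axis = np.cos(psi) * SX + np.sin(psi) * SY
    return np.cos(gamma / 2) * I2 - 1j * np.sin(gamma / 2) * axis

# A gamma = pi rotation about X (psi = 0) is a pi pulse: it flips |0>
# to |1>, the population transfer swept out in a Rabi experiment.
ket0 = np.array([1, 0], dtype=complex)
ket1 = xy_rotation(np.pi, 0.0) @ ket0
p_excited = abs(ket1[1]) ** 2
```

Sweeping γ (via the drive amplitude or duration) traces out the Rabi oscillation used to calibrate the π-pulse amplitude.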
Through the Ramsey experiment, we can calibrate the qubit drive frequency to the targeted frequency within a threshold (e.g., 10 kHz). The experimental method is as follows. First, a π/2 gate is applied to bring the qubit to the equatorial plane of the Bloch sphere. The qubit then evolves freely for a period of time, during which it accumulates a phase corresponding to its internal state. Then, another π/2 gate is applied, whose phase is the product of the set fringe frequency and the free-evolution time. By sweeping the free-evolution time between the two π/2 gates, the Ramsey oscillation can be observed. Fig. 5 shows a Ramsey pulse sequence; the Ramsey frequency is obtained by running several such pulse sequences, plotting the resulting data points, and fitting an oscillatory signal to them. To calibrate the qubit frequency accurately in this way, we first set a Ramsey oscillation (fringe) frequency and let the phase accumulate, i.e., we continuously change the phase of the last π/2 pulse, measure the Ramsey data, and fit the actual Ramsey oscillation frequency; the absolute value of the difference between the two is the difference between the set fringe frequency and the true qubit frequency. With a basic iterative scheme, the true qubit frequency can be obtained to within the error threshold.
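The fitting step above can be sketched with synthetic data; every number here (fringe frequency, detuning, T2, noise level) is illustrative, and `scipy.optimize.curve_fit` stands in for whatever fitter the lab software actually uses:

```python
import numpy as np
from scipy.optimize import curve_fit

def ramsey(t, f, t2, amp, offset):
    """Damped Ramsey oscillation: excited-state population vs.
    free-evolution time t."""
    return amp * np.exp(-t / t2) * np.cos(2 * np.pi * f * t) + offset

# Synthetic scan standing in for real Ramsey data. The fringe frequency
# is set to 1.0 MHz, but the qubit is detuned, so the observed
# oscillation runs at 1.25 MHz.
rng = np.random.default_rng(0)
t_us = np.linspace(0.0, 4.0, 81)        # free-evolution time, microseconds
f_set, f_true = 1.0, 1.25               # MHz
data = ramsey(t_us, f_true, 15.0, 0.5, 0.5)
data += rng.normal(0.0, 0.01, size=t_us.size)   # shot noise

popt, _ = curve_fit(ramsey, t_us, data, p0=[1.2, 10.0, 0.5, 0.5])
f_fit = abs(popt[0])
# |fitted - set| is the difference between the set fringe frequency and
# the true qubit frequency, i.e., the correction to apply.
detuning_mhz = abs(f_fit - f_set)
```

In an iterative scheme, the drive frequency is updated by this detuning and the scan repeated until the residual falls below the threshold (e.g., 10 kHz).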
In order to correct the single-qubit gate phase error and the leakage into higher energy levels that those levels cause, we need to optimize the adiabatic performance of the drive. In general, this can be achieved by amplifying and accumulating the phase error of the driving waveform or by correcting its weight coefficient. The process of amplifying the phase error is shown in Fig. 6.

III. CHARACTERIZATION OF QUBITS RELIABILITY UNDER CALIBRATION
As mentioned previously, superconducting quantum computers need to calibrate qubit parameters frequently during long-term operation to reduce the error rate of quantum circuit execution. Depending on the structural design of the superconducting quantum chip, the qubit parameters that are prone to drift differ. To cover as many types of superconducting qubits as possible, we chose the qubit frequency calibration experiment as the main experiment for our real-time calibration protocol. (Of course, qubit frequency calibration alone cannot adjust qubits to their optimal state.) Qubit frequency is an important parameter that determines quantum logic gates and intuitively reflects the basic state of superconducting qubits. For superconducting qubits with a fixed frequency, however, the qubit frequency should not shift, and this physical fact seems to conflict with the experimental data we obtained through quantum cloud services. We discuss this problem in Section V and experiment from the ground up on a real superconducting quantum computer. In this section, we first undertake experiments to quantify qubit reliability on NISQ machines and assess the influence of the current daily calibration on physical qubit reliability. We observe that the reliability of quantum machines changes dramatically throughout the day, reaching a high peak only for a short period after calibration. Then, we discuss the reliability of various physical qubits in quantum computers and which of them need to be calibrated over time. Finally, we test the stability of physical qubits by executing a set of quantum applications without calibration and examining the potential performance loss.

A. RELIABILITY OF QUANTUM MACHINES UNDER DAILY CALIBRATION
The results obtained by running applications on quantum machines can be used to assess machine reliability. We use a 15-qubit IBM quantum computer, the IBM Q16 Melbourne, to test the reliability of quantum machines. Quantum applications are chosen from the QASMBench suite [32], a popular high-level benchmark suite for NISQ evaluation and simulation. These benchmarks can be parameterized by the number of qubits (n), as follows.
1) hs(n): The n-bit/qubit hidden-shift algorithm, which determines the constant by which the input of one oracular function is shifted relative to the input of another; n qubits are measured at the end.
2) bv(n): The Bernstein-Vazirani algorithm [6]. It encodes an n-bit string in a function and reads out (n + 1) qubits.
3) qft(n): The n-bit/qubit quantum Fourier transform algorithm. It is used as a fundamental block in many other quantum algorithms; n qubits are measured.
Figs. 7 and 8 demonstrate the reliability of IBM Q16 when executing hs(14) and bv(14) with 8192 shots in each run across three days. We use the metric PST to quantify reliability when running hs(14) and bv(14) because they provide a reference output (the higher the PST, the better the reliability), and the time interval CT(n) to denote the elapsed time between the last calibration and subsequent task execution. As shown in Table 1, both PSTs fluctuate over time, with peaks attained after calibration: 4.4%, 4.7%, and 4.5% in hs(14), with three short time intervals CT1 ≈ 30 min, CT2 ≈ 20 min, and CT3 ≈ 25 min; and 21.7%, 21%, and 20.2% in bv(14), with three time intervals CT4 ≈ 10 min, CT5 ≈ 35 min, and CT6 ≈ 60 min. However, the PSTs cannot be sustained at a high level and rapidly deteriorate as time passes after calibration [decreasing by more than 55% and 61% in hs(14) and bv(14), respectively]. Fig. 9 demonstrates the reliability of IBM Q16 when executing qft(15) with 8192 shots in each run. Correct results produced by qft(15) are not unique and follow a probability distribution after many runs. Therefore, in Fig. 9 we use the standard deviation (STDEV) between the result probabilities of the error-free and erroneous cases to demonstrate the reliability of the investigated quantum machine; a lower STDEV implies higher reliability. As can be seen from the figure, the reliability reaches its highest value, i.e., the lowest STDEV of 0.107, 0.105, and 0.109, after calibrations with CT7 ≈ 30 min, CT8 ≈ 20 min, and CT9 ≈ 15 min. Nonetheless, the STDEV increases by up to 19% for qft(15) as time goes by. From these investigations, we observe a strong case for employing much more frequent calibration than the existing daily calibration scheme to maintain quantum machine reliability.
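One plausible reading of this STDEV metric is the spread of the per-outcome differences between the error-free and measured output distributions; the sketch below follows that reading (our interpretation, with hypothetical numbers, not the paper's exact definition):

```python
import numpy as np

def distribution_stdev(ideal: dict[str, float],
                       measured: dict[str, float]) -> float:
    """Standard deviation of per-outcome probability differences between
    the error-free (ideal) and measured distributions; lower is better."""
    keys = sorted(set(ideal) | set(measured))
    diffs = np.array([measured.get(k, 0.0) - ideal.get(k, 0.0) for k in keys])
    return float(np.std(diffs))

# Hypothetical two-qubit example: noise moves 10% of the probability
# mass from the correct outcomes into a spurious one.
ideal = {"00": 0.5, "11": 0.5}
measured = {"00": 0.45, "11": 0.45, "01": 0.10}
stdev = distribution_stdev(ideal, measured)
```

Unlike PST, this metric needs the full ideal distribution, which is why it suits qft(15), whose correct output is not a single bitstring.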

B. RELIABILITY OF PHYSICAL QUBITS THROUGHOUT THE DAY
On a quantum machine, physical qubits run quantum applications. We examine physical qubits' reliability on 5-qubit IBM Q Santiago and 15-qubit IBM Q16 Melbourne to learn more about the factors that determine a machine's reliability.
The PST, or reliability, of physical qubits varied considerably but stayed within a range over time, as shown in Figs. 10 and 11. For example, in Fig. 10, the PST of qubit 0 ranges from 92.5% to 98.9%, while the PST of qubit 1 ranges from 90.4% to 96.3%. Thus, calibrating qubit 1 can improve its reliability by up to 6.5%, compared to 6.9% for qubit 0, due to differences in physical parameters. In Fig. 11, the PST of qubit 9 can be enhanced from 59.5% to 88.2%, a considerable 48.2% improvement, whereas the PST of qubit 10 can only be improved from 83.5% to 93.3% (an 11.7% improvement), having stayed at a fairly high value most of the time. As a result, the range of PST for executing applications on distinct quantum machines with numerous physical qubits may be substantially diverse, and calibrating certain qubits may provide a significant improvement. In addition, because qubit reliability stays within a given range and most qubits (besides qubit 3 at timestamp 18 in Fig. 10) exhibit comparable reliability changes over time, the reliability fluctuation per qubit is approximately consistent. Furthermore, because calibrating all qubits is costly and time consuming, the data suggest that only a subset of them needs to be calibrated, provided the range is reduced to a manageable level using an effective selection-based calibration method.

C. ERROR MODELS
We characterize quantum machines' reliability by running a series of quantum applications sequentially to further investigate the performance effect of using the same physical qubits for a long period without calibration. Due to the restrictions of default daily calibration and queue-based processing in IBM Q quantum computers, we conduct this research on the Qiskit Aer simulator, where the noise model copies all configuration and default data from the real backend IBM Q16; it can therefore accurately simulate execution using the initial calibration data. To build the frequency drift model, we tracked the time-dependent frequency drift of an available one-qubit pulse machine (ibmq_armonk), i.e., the default frequency with its time-dependent frequency drift (see Fig. 12). For other qubits with different parameters (e.g., phase accumulation time, default frequency), we model their time-dependent frequency drifts by referring to the frequency drift model in [57] to obtain a range for the estimated drift, following the time-dependent variations tracked on the real machine. In [57], the estimated qubit frequency shift is

δ = (± arccos(2P1 − 1) + 2πk + π/2) / (2πτ)

where P1 is the measured excited-state population, τ is the phase accumulation time, and k is an integer branch index. We also adopt the statistical relation from [57] that a 2% frequency drift incurs an average 18% ± 5% increase in single-qubit gate errors. Fig. 13 shows that the quantum machine's PST reduces by 17% after performing 100 consecutive hs(4) tasks, and decreases by 15% after performing 110 bv(4) jobs in a row. As a result, after executing numerous jobs without qubit calibration, quantum machines' reliability is substantially impacted, with continuous frequency drift as the predominant factor.
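The frequency-shift estimate from [57] can be evaluated as a small helper; this is a sketch, and the branch choice (sign and k) is an assumption that must come from prior knowledge of the drift direction and magnitude:

```python
import numpy as np

def estimated_frequency_shift(p1: float, tau: float,
                              k: int = 0, sign: int = 1) -> float:
    """Estimated qubit frequency shift (Hz) from the excited-state
    population p1 measured after a phase-accumulation time tau (s),
    following delta = (+/- arccos(2*P1 - 1) + 2*pi*k + pi/2) / (2*pi*tau)
    as in [57]. The branch (sign, k) is assumed known."""
    return (sign * np.arccos(2.0 * p1 - 1.0)
            + 2.0 * np.pi * k + np.pi / 2.0) / (2.0 * np.pi * tau)

# Illustrative numbers: P1 = 0.5 after tau = 1 microsecond gives
# arccos(0) = pi/2, so delta = (pi/2 + pi/2) / (2*pi * 1e-6) = 500 kHz.
delta_hz = estimated_frequency_shift(0.5, 1e-6)
```

Longer accumulation times τ make the estimate more sensitive but increase the ambiguity over k, which is why the branch must be fixed from prior drift bounds.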

E. OPPORTUNITIES AND KEY NOVELTIES
From the aforementioned observations, we propose RCF, which incorporates the following novelties to improve reliability and job throughput on NISQ machines.

1) NOVELTY 1: CALIBRATION AND CONCURRENT TASK EXECUTION
The state-of-the-art calibration mechanism can block the cloud queue for a long time (e.g., up to 4 h in [61]). On the one hand, it must adjust numerous parameters for each qubit; on the other hand, it needs to calibrate all qubits. However, some parameters are stable and not easily changed (e.g., pulses in [10], the parameter φ in the PhaseFSimGate operation [1]). Also, qubits' status changes over time within various ranges, and some qubits can always be utilized in a good status (e.g., qubit 10 in Fig. 11), i.e., only specific qubits must be calibrated, with a lesser level of accuracy, at a time. Therefore, we present a scalable method for calibrating substantial frequency drift in a subset of qubits while task execution is not disrupted, which allows real-time calibration so that all qubits can dynamically maintain a consistently high-reliability state.

2) NOVELTY 2: SELECTION FOR QUBITS
To distinguish qubits for calibration from those for task execution, we offer a real-time selection mechanism based on the frequency drift and the reliability fluctuations predicted by a machine-learning model. For quantum-chip designers and researchers with access to the underlying controls, the method is highly scalable, allowing them to train and extend the model with data acquired during testing and to add factors unique to their quantum-chip design.

3) NOVELTY 3: TOPOLOGY-AWARE CALIBRATION AND EXECUTION
Qubit connectivity and machine topology affect qubit reliability. On the one hand, if qubits are not physically coupled, swap operations have a considerable impact on the reliability of two-qubit operations. On the other hand, if several qubits in the two classes (task execution and calibration) are coupled, the two channels may affect each other through additional unintended crosstalk [13], [48]. We propose an approach that optimizes channel locality and minimizes swap operations, maximizing locality, and hence reliability, for both task execution and calibration.
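One simple way to make the partition topology-aware is to count coupling-map edges that straddle the two classes and pick the calibration subset that minimizes them. A brute-force sketch (the coupling map, qubit counts, and helper names are hypothetical; real chips would need a smarter search):

```python
from itertools import combinations

def cross_edges(coupling_map, exec_qubits, calib_qubits):
    """Count coupled pairs straddling the two classes; each such edge is
    a potential crosstalk channel between task execution and calibration."""
    e, c = set(exec_qubits), set(calib_qubits)
    return sum(1 for a, b in coupling_map
               if (a in e and b in c) or (a in c and b in e))

def best_calibration_set(coupling_map, n_qubits, exec_qubits, n_calib):
    """Pick the calibration subset with the fewest edges into the
    execution class (brute force; fine for small chips)."""
    free = [q for q in range(n_qubits) if q not in exec_qubits]
    return min(combinations(free, n_calib),
               key=lambda s: cross_edges(coupling_map, exec_qubits, s))

# Hypothetical 6-qubit line topology 0-1-2-3-4-5: with the task on
# qubits {0, 1, 2}, calibrating {4, 5} shares no edge with the task,
# whereas {3, 4} would touch it through the (2, 3) coupler.
line = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
chosen = best_calibration_set(line, 6, [0, 1, 2], 2)
```

Keeping the execution class contiguous also reduces the swap operations mentioned above, since logically adjacent program qubits can be mapped to physically coupled ones.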

4) NOVELTY 4: SEGMENTED CALIBRATION (SC)
If the calibration procedure takes longer than expected, task execution is hampered (e.g., 2 h for IBM daily calibration [61], compared with large quantum applications, which can require several hours even for compiling [11]). In addition, reliability is harmed by coherence errors across the various channels. Therefore, to handle this efficiently, we propose segmenting the complete calibration so that the calibration time can be varied against job execution under distinct accuracy scenarios. To anticipate task execution time, we divide the total elapsed time by the number of decomposed native gate operations in the compilation. We put in place a delay mechanism to ensure that calibration and task readout can happen at the same time. In some unusual instances, we can ensure the continuity of this SC by modifying the task-privilege weighting and prioritizing calibration.
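The time-estimation and segmentation steps above can be sketched as follows; the per-gate durations and the equal-split policy are our illustrative assumptions, not the paper's tuned values:

```python
def estimate_task_time_us(gate_counts: dict[str, int],
                          gate_time_us: dict[str, float],
                          shots: int) -> float:
    """Estimate a compiled task's total run time (microseconds) from its
    native-gate counts and per-gate durations."""
    single_shot = sum(n * gate_time_us[g] for g, n in gate_counts.items())
    return single_shot * shots

def segment_calibration(calib_total_us: float, task_us: float) -> list[float]:
    """Split a long calibration into equal segments no longer than one
    task, so calibration and task readout can be aligned in time."""
    n_segments = max(1, int(-(-calib_total_us // task_us)))  # ceiling division
    return [calib_total_us / n_segments] * n_segments

# Illustrative durations: 35 ns single-qubit (sx) and 300 ns two-qubit
# (cx) gates, over 1000 shots.
task_us = estimate_task_time_us({"sx": 120, "cx": 40},
                                {"sx": 0.035, "cx": 0.3}, shots=1000)
segments = segment_calibration(50_000.0, task_us)
```

Bounding each segment by one task's duration is what lets a long calibration interleave with the job stream instead of blocking it.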

IV. REAL-TIME CALIBRATION
Our RCF for cloud NISQ machines aims at concurrent job execution (user's quantum program) and frequency calibration (calibration to the drift of qubits drive frequency). As Fig. 14 shows, RCF has four stages: precompilation, qubits selection, compilation, and execution. It partitions qubits into two classes to execute frequency calibration or finish a quantum program for users, and qubits from both classes can be executed without interruption at the same time, i.e., concurrent task execution and calibration.

A. PRECOMPILATION
Unlike conventional compilation, which only fetches one task from the task queue at a time, this precompilation fetches many tasks from the task queue at the same time, to accelerate the entire compilation.
1) It records the qubits needed for the tasks and computes the maximum number of qubits that can be calibrated without interrupting task execution.
2) It transpiles the tasks into the native gate operations of the target cloud NISQ machine (e.g., counting the one-qubit and two-qubit gates).
In this stage, RCF reads tasks and records their logical qubits in a buffer. Note that the number of loaded tasks is based on the offline profiling data and a suitable processing rate.
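The precompilation stage above can be sketched as follows; the task format, function names, and batch-size handling are illustrative assumptions, not the authors' implementation:

```python
from collections import Counter

def precompile_batch(task_queue, total_qubits, batch_size):
    """Fetch several tasks from the queue at once, record the logical
    qubits they need, count their decomposed native gates (one-qubit vs
    two-qubit), and derive the maximum number of physical qubits that can
    be calibrated without interrupting task execution."""
    batch = [task_queue.pop(0) for _ in range(min(batch_size, len(task_queue)))]
    needed = set()
    gate_counts = []
    for task in batch:
        needed.update(task["qubits"])  # logical qubits the task uses
        counts = Counter(g["arity"] for g in task["gates"])
        gate_counts.append({"1q": counts.get(1, 0), "2q": counts.get(2, 0)})
    max_calibratable = total_qubits - len(needed)
    return batch, gate_counts, max_calibratable

tasks = [
    {"qubits": [0, 1], "gates": [{"arity": 1}, {"arity": 2}]},
    {"qubits": [2],    "gates": [{"arity": 1}]},
]
batch, counts, free = precompile_batch(tasks, total_qubits=15, batch_size=2)
print(free)  # 12 qubits remain available for calibration
```

In a real system, the batch size would come from the offline profiling data and processing rate mentioned above.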

B. QUBITS SELECTION
As the number of qubits increases, the probability of quantum errors in NISQ machines grows. In a quantum system, errors can arise from various factors such as environmental noise, imperfect gate operations, and coupling between qubits; as the number of qubits in the system increases, so does the likelihood of errors from any of these factors [22], [47]. Also, due to quantum entanglement, the types of errors that occur, and the calibration experiments that must be run, become more complicated. The state-of-the-art daily calibration solution is significantly inefficient: it stalls the task queue and decreases cloud throughput, and reliability worsens when many tasks are executed without in-time calibration. Therefore, qubits selection decides which in-demand qubits should be calibrated and which high-accuracy qubits should execute the task. In other words, it computes the calibration necessity of different physical qubits based on two selection mechanisms: single-qubit and two-qubit selection.

1) SINGLE-QUBIT SELECTION
As shown in Fig. 15, single-qubit selection contains three steps to obtain a single weight (SW) based on the physical attributes of each qubit: executing a Ramsey fringe experiment, using a long short-term memory (LSTM) model to predict the qubit's reliability fluctuation, and calculating a weight from the two previous steps. The Ramsey fringe experiment is used to detect the actual drive frequency, which is frequently used to calculate the potential energy and reliability loss of a qubit. A large difference between the actual drive and resonance frequencies, or qubit frequency variation (QFV), calls for calibrating the frequency drift. However, in some conditions (e.g., when executing different quantum applications), the qubit's reliability may be high or acceptable even when the difference is substantial. For example, if the application is VQE, which has a natural resilience to coherent errors during execution, the reliability may still be good even when QFV is significant. To mitigate this problem and precisely find the qubits that require calibration, RCF incorporates reliability features from NISQ machines: since reliability fluctuation per qubit is related to usage time, applications, and quantum machines, RCF uses an LSTM model (as described in Section III-C) to predict the qubit reliability variation (QRV), or reliability fluctuation γ_t, for the next time point of a qubit.

Algorithm 1: Predict QRV Based on Profiling Statistics and Quantum Benchmarks.
Input: profile_stats, benchmarks
Output: QRV
1: lstm_model ← train_lstm_model(profile_stats)
2: qubit_use_time ← run_benchmarks(benchmarks)
3: QRV ← get_qrv(lstm_model, qubit_use_time)
4: return QRV

This prediction is based on the qubit's current usage time, measured from the qubit's last calibration. In detail, as shown in Algorithm 1, it predicts the reliability variation of a qubit at a given time point from profiling statistics of QRV, obtained by randomly executing a set of quantum benchmarks on a NISQ machine with sufficient use time. Taking Fig. 11 as an example, imagine that the QRV of qubit 2 (Q2) on IBM Q16 is predicted by the LSTM model to be a negative value of −8% at the current time point 5 (from 44% to 36% at the next time point 6), and that the current QFV of the qubit is 5%, calculated from the Ramsey experiment by the formula QFV = |f_a − f_r| / f_r, where f_a and f_r are the actual drive frequency and the resonant frequency of the qubit, respectively. Then, the SW from the single-qubit selection is obtained as SW = |QFV − QRV|.
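The SW computation, combining the Ramsey-derived QFV with the LSTM-predicted QRV, can be reproduced numerically; the frequency values below are illustrative, chosen to match the worked example in the text:

```python
def single_weight(f_actual, f_resonance, qrv):
    """SW from single-qubit selection: QFV = |f_a - f_r| / f_r from the
    Ramsey fringe experiment, QRV from the LSTM prediction, and
    SW = |QFV - QRV|."""
    qfv = abs(f_actual - f_resonance) / f_resonance
    return abs(qfv - qrv), qfv

# Worked example from the text: QFV = 5% and predicted QRV = -8%.
sw, qfv = single_weight(f_actual=5.25e9, f_resonance=5.0e9, qrv=-0.08)
print(round(qfv, 2), round(sw, 2))  # 0.05 0.13
```

A larger SW indicates a larger mismatch between the measured drift and the predicted reliability trend, i.e., a higher calibration necessity.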

2) MULTIQUBIT SELECTION
Many qubits may have frequency drift at the same time, and the number of qubits available for users to perform tasks will be significantly decreased if all of them are calibrated. Therefore, after single-qubit selection, RCF needs to maximize locality and improve the performance of two-qubit gate operations on the NISQ machines. One possible method is to use reinforcement learning models to make a completely automated decision. However, it is difficult to compress the model size, and a long decision time prevents reading the task execution results on time. To make the approach more scalable, RCF uses a more effective strategy based on the physical connection and fidelity of qubits and gate operations: Considering that there are different design schemes for superconducting quantum computers due to packaging technology restrictions, each qubit's capability when used in a two-qubit gate is not the same, and it is limited by the physical connection of qubits when performing some two-qubit operations, such as a controlled-Z gate.
The two-qubit gate's reliability is critical for the mapping and compilation of quantum circuits, as it accounts for the majority of gate operation errors on NISQ machines. To reflect the performance of NISQ machines to the greatest extent, RCF uses two key reference factors to calculate the calibration necessity of qubits: the maximum physical connection number of qubits and the two-qubit gate fidelity. As shown in Algorithm 2, the multiqubit selection can be expressed as follows: First, it executes single-qubit selection to get SW. Then, according to the state information of these qubits, such as the connection of physical qubits, the two-qubit gate's fidelity, it computes a final weight to reflect the overall calibration necessity. Finally, a sort to these weights is performed for physical qubits, which represents the calibration value. As a result, by sorting these qubits' calibration values, RCF can obtain a queue of qubits to be calibrated. Since the precompilation provides a maximum number of qubits to be calibrated, the calibration efficiency will be improved. Thus, multiqubit selection can meet user tasks' needs, assign qubits to the queue, which are sorted by the calibration value, and map the user tasks to the qubits with better performance in the compilation system. In this way, the NISQ machine can be successfully divided into the calibration area and task execution area, with maximum locality and suitable calibration necessity.
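A minimal sketch of the weight computation and sorting in multiqubit selection is shown below, following the weight formula in Algorithm 2 as printed, Weight = SW · e^(connections + gate fidelity); the dictionary keys and sample values are assumptions for illustration:

```python
import math

def calibration_queue(qubits):
    """Combine each qubit's single weight (SW) with the summed fidelity
    of its two-qubit gates and its connection count on the topology, then
    sort to obtain the queue of qubits to be calibrated."""
    weighted = [
        (q["id"], q["sw"] * math.exp(q["connections"] + q["gate_fidelity"]))
        for q in qubits
    ]
    # Higher weight means higher calibration necessity, so sort descending.
    return sorted(weighted, key=lambda t: t[1], reverse=True)

qubits = [
    {"id": 0, "sw": 0.30, "connections": 2, "gate_fidelity": 1.5},
    {"id": 1, "sw": 0.05, "connections": 2, "gate_fidelity": 1.5},
]
print([qid for qid, _ in calibration_queue(qubits)])  # [0, 1]
```

The precompilation stage caps how many qubits from the head of this queue are actually calibrated, so task execution is never starved of qubits.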

C. SC AND COMPILATION
The current NISQ machine requires simultaneous measurement of qubits to reduce noise from different channels as well as coherence errors [13], [19], [24]. However, the execution time of a task and of the calibration process may differ, requiring an efficient mechanism for concurrent measurement. If the execution time of the task is longer, RCF uses the delayed instruction scheduling proposed in [13], where all qubit measurements are performed only after both task execution and calibration conclude. If the calibration time is longer, it will significantly affect the measurement of the task. What is more, due to noise and other influences, a single calibration often fails to reach the highest performance. Therefore, SC is utilized instead of a single calibration, where the calibration to the highest accuracy is segmented into several iterations.

Algorithm 2: Multi-Qubit Selection.
Require: obtain task information from the precompilation
for each qubit in qubits do
    Get QFV of the qubit
    Get QRV from the LSTM
    Compute the single weight: SW = |QFV − QRV|
    Update the SW
end for
return the SW
Require: read the calibration data of the qubits
for each qubit in qubits do
    Sum the fidelity of its two-qubit gates
end for
return the gate fidelity
for each qubit in qubits do
    Sum the number of the qubit's connections on the topology
end for
return the connections
for each qubit in qubits do
    Weight = SW · e^[(connections) + (gate fidelity)]
    Sort weight
end for
return the calibration queue

In detail, each SC has three steps, as shown in Fig. 16: sampling, manipulation of the resonance frequency, and accuracy verification. The Ramsey fringe is the accuracy parameter in sampling used to reach the current qubit frequency. Once it is set, the corresponding sampling step and time are fixed to ensure the sample is correct.
For example, if the fringe is 10 MHz, the delay needs to be long enough (e.g., 1000 ns) to collect sufficient Ramsey oscillation data to fit the qubit frequency, and the step length should not be less than 10 ns. If the drift range of the qubit currently being calibrated cannot be obtained, the next calibration is required once the qubit frequency drift exceeds the set fringe value. The qubit is then steered to the resonance frequency of its driving pulse by a gradient optimization algorithm [3]. In addition, by increasing the qubit's weight coefficient, the calibration interval of the qubit can be shortened, which prevents the qubit frequency from drifting over a large range again. Finally, another Ramsey fringe experiment is used to verify the calibration accuracy by examining the difference between the targeted resonance and the calibrated qubit frequency [50].
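The SC loop (sample, manipulate, verify) can be sketched as a damped feedback iteration. `read_frequency` and `apply_correction` are placeholders for the hardware-facing Ramsey and pulse routines, the 0.5 damping factor is an assumption, and the convergence behavior of this toy model is not the paper's measured data:

```python
def segmented_calibration(read_frequency, apply_correction, f_target,
                          tolerance, max_segments):
    """Each segment samples the qubit frequency (standing in for a Ramsey
    fringe experiment), verifies accuracy against the target, and otherwise
    applies a damped gradient-style correction toward resonance."""
    for segment in range(1, max_segments + 1):
        f_now = read_frequency()                    # sampling
        if abs(f_target - f_now) / f_target <= tolerance:
            return segment, f_now                   # accuracy verification
        apply_correction(0.5 * (f_target - f_now))  # manipulation step
    return max_segments, read_frequency()

state = {"f": 4.75e9}  # 5% below a 5 GHz target frequency
segments, f = segmented_calibration(
    lambda: state["f"],
    lambda step: state.update(f=state["f"] + step),
    f_target=5.0e9, tolerance=0.02, max_segments=10)
print(segments)  # 3 segments in this toy model
```

As in the paper's Fig. 27, a larger initial drift requires more segments before the accuracy check passes.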
Similar to prevalent gate-based compilers (e.g., IBM Qiskit [4]), RCF involves three primary phases in the compilation: transpiling an application to machine-based native gate operations, mapping logical qubits to physical qubits, and scheduling gate operations. Besides, to predict task execution time, RCF needs to compute the elapsed time of all native gate operations. Ref. [20] shows the library of the compiler's gate set and the corresponding pulse durations for different gates, where the runtime of circuits under gate-based compilation is indexed to these pulse durations.
In logical-physical qubits mapping, physical qubits from the execution qubit class will be constructed as a new backend topology, and the input task will be mapped by the state-of-the-art mapper Sabre [33] to achieve high execution reliability.
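The pulse-duration-indexed runtime prediction can be sketched as a lookup-and-sum. Only the 298.7 ns "cx" duration for qubits (0, 1) is quoted in the paper's results; the other table entries are made-up placeholders:

```python
# Pulse-duration table in ns, indexed by (gate name, qubit tuple).
PULSE_NS = {("cx", (0, 1)): 298.7, ("x", (0,)): 35.6, ("rz", (0,)): 0.0}

def predict_runtime_ns(circuit):
    """Index every decomposed native gate to its pulse duration and sum
    along the schedule. A serial sum is a simplification; a real scheduler
    would overlap gates acting on disjoint qubits."""
    return sum(PULSE_NS[(name, qubits)] for name, qubits in circuit)

circ = [("x", (0,)), ("cx", (0, 1)), ("x", (0,))]
print(round(predict_runtime_ns(circ), 1))  # 369.9 ns
```

This predicted time is what lets RCF balance the number of SC segments against the task's expected duration.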

D. CALIBRATION WITH TASK EXECUTION
After the accuracy parameters (e.g., fringe frequency) for SC are set and the compilation for task execution is finished, the task and the SC are executed by the different qubit classes, and the results are measured concurrently at the end. The targeted calibration accuracy (e.g., SC with the highest accuracy) may not be reached after several SC segments, but the reliability of some qubits may already be adequate. This indicates that they can serve the next task without executing the remaining SC, improving efficiency. To decide whether qubits need to perform the remaining SC, RCF verifies the SC's accuracy based on the Ramsey experiment and predicts the qubit reliability variance. This step is the same as the single-qubit selection in the qubits-selection stage and can be combined with it to avoid repeated Ramsey experiments.
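The decision of whether a qubit must run its remaining SC segments can be expressed as a reuse of the single-qubit-selection check; the threshold is an assumed tuning knob, not a value from the paper:

```python
def skip_remaining_sc(f_actual, f_resonance, predicted_qrv, threshold):
    """After some SC segments, rerun the Ramsey experiment to get
    QFV = |f_a - f_r| / f_r, combine it with the LSTM-predicted QRV exactly
    as in single-qubit selection, and skip the remaining segments when the
    resulting weight is already small enough."""
    qfv = abs(f_actual - f_resonance) / f_resonance
    return abs(qfv - predicted_qrv) <= threshold

# A qubit whose drift is nearly corrected can serve the next task at once.
print(skip_remaining_sc(5.005e9, 5.0e9, 0.0, threshold=0.02))  # True
```

Sharing this computation with the qubits-selection stage is what avoids the repeated Ramsey experiments mentioned above.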

V. EXPERIMENTAL METHODOLOGY
In this section, we briefly introduce the benchmarks, system configuration, and the metrics used in this work.

A. BENCHMARKS
To evaluate the reliability of quantum machines and physical qubits under our proposed real-time calibration, we select a set of prevalent quantum applications from the QASMBench suite [32]. The number of qubits used in the benchmarks is compatible with real-time calibration, and their detailed statistics are summarized in Table 2; most of them have one reference output (except qft(n), whose results follow a distribution).

B. SYSTEM CONFIGURATION
Existing open-source cloud quantum machines (such as IBM Q16) do not support flexible, user-defined calibration, which limits our framework. If we tested our framework on a superconducting quantum computer that does not provide open ports and is only used for experimental exploration, we might not have enough data to train the predictive part of RCF. Therefore, we executed experiments with 8192 shots on the Qiskit Aer simulator [4] to simulate IBM Q16 with 15 qubits (the topology is shown in Fig. 17) and evaluate our RCF. It can precisely simulate real quantum machines with initial calibration data and supports flexible noise models. To further demonstrate the usefulness of our framework, we selected an experimental superconducting quantum computer for testing; however, instead of the real-time prediction part, we focused on timing calibration. By combining these two forms of experimentation, we aim to prove the efficiency of our framework as a solution.

C. 10-QUBIT SUPERCONDUCTING QUANTUM COMPUTER
We provide a reasonable explanation for why frequency drift still occurs on superconducting quantum chips with fixed frequencies, based on test results obtained through the IBM Quantum Cloud. To test our framework, we used a real quantum computer with ten transmon qubits, where each qubit is independent of the others and has a fixed frequency. Prior to the experiment, we obtained the basic parameters of the superconducting qubits experimentally and confirmed that they meet the characterization conditions; the superconducting quantum machine and quantum chip are shown in Figs. 18 and 19. In the test device, the mixing chamber temperature of the dilution refrigerator (Bluefors XLD400) was maintained at 9.6 mK ± 0.3 mK during the test, and the phase noise (typically at 1 kHz frequency offset from 5 GHz) of the high-frequency arbitrary waveform generation modules we use is less than −90 dBc.
Since the qubits in this superconducting quantum chip are not connected by capacitors, they cannot perform two-qubit gate operations. Therefore, we used only the single-qubit random benchmarking results as the basis for the experiment. We also added other experiments to the calibration framework to confirm the drifting qubit parameters. Although the test protocol changed, this single-qubit calibration method is consistent with the calibration framework we ran on the IBM Quantum Cloud Platform, enabling us to explain the test conclusions of the quantum cloud platform and perform secondary verification.
The experiment was conducted over a six-day period on the real superconducting quantum computer. The environment was kept stable throughout to ensure that the qubits ran in a consistent state without other interference. The experimental cycle was divided into two parts. First, we conducted the qubit frequency calibration experiment using RCF and verified it using the single-qubit random benchmarking experiment. After recalibrating all the qubits, we conducted a combination of the Rabi experiment (π pulse correction) and the qubit frequency calibration experiment. We repeated the RCF and calibration process, collected the relevant data, and compared the results.

D. EVALUATION METRICS FOR RELIABILITY
On NISQ machines, due to noise and high error rates, an application is executed multiple times to achieve acceptable reliability; collecting statistics is a natural part of running a quantum computer, and obtaining a reliable estimate of the outcome probabilities requires many runs. However, in some applications like qft(4) and qft(15), the results follow a distribution in which all qubits are in an equal superposition of the ground and excited states. To evaluate the reliability of these applications, we use the STDEV metric over the results distribution. As IBM does not provide an exact time for which the task queue is blocked during calibration, we mix the benchmarks into the task queue to estimate the increased throughput. Therefore, we define a throughput-per-minute (TPM) metric to evaluate the average throughput improvement; the total improved throughput is the product of TPM and the calibration time

TPM = Number of Executed Tasks / Elapsed Time.    (2)
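The two metrics can be computed directly; the STDEV helper below is a sketch of the metric as we read it (it assumes a counts dictionary as returned by a Qiskit-style backend), not the authors' exact definition:

```python
import statistics

def tpm(executed_tasks, elapsed_minutes):
    """Throughput per minute, as in (2): executed tasks / elapsed time."""
    return executed_tasks / elapsed_minutes

def stdev_of_counts(counts, shots):
    """Reliability proxy for distribution-style benchmarks such as qft(n):
    the standard deviation of the measured outcome probabilities, where a
    lower value is closer to the expected uniform distribution."""
    probs = [c / shots for c in counts.values()]
    return statistics.pstdev(probs)

print(tpm(65, 10))                                      # 6.5
print(stdev_of_counts({"00": 4096, "11": 4096}, 8192))  # 0.0
```

A perfectly uniform distribution gives a STDEV of zero, so smaller values indicate higher reliability for qft-style benchmarks.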

VI. RESULTS
In this section, we compare the reliability of applications running on the baseline IBM Q16 and on our proposed RCF, as well as the potential throughput improvement.

A. RELIABILITY IMPROVEMENT

1) IMPROVEMENT OF OVERALL PST
As shown in Fig. 20, across different benchmarks, the applications with one reference result achieve a significant PST improvement under RCF [12% in hs(4), 8% in hs(8), 18% in hs(12), 14% in bv(3), and 15% in bv(4) and toffoli(3)]. In hs(12), reliability increases by up to 18.3% (from 7% to 25.3%), and the average reliability improvement is 14% across the six quantum benchmarks compared with baseline execution. The scalability of RCF is also significant, as the reliability improvement for the same quantum application grows with the number of qubits.

2) IMPROVEMENT OF STDEV
For the other two benchmarks, qft(4) and qft(15), we evaluate reliability by STDEV instead of PST, as they have multiple reference results and each one is correct. Fig. 21 demonstrates that RCF yields a 32.6% improvement in STDEV for qft(4), as shown by the distribution outputs, and similarly increased reliability for qft(15). Note that the improvement of STDEV can be much higher than that after the current daily calibration, due to the concurrent execution of applications with real-time calibration in the experiments.

3) RELIABILITY IMPROVEMENT OF QUBITS
On the baseline, the reliability of physical qubits is not stable and stays good only for a short time after daily calibration. We executed hs(8) and hs(12) multiple times at random to profile the average reliability of physical qubits on the baseline and on RCF. For hs(8), as shown in Fig. 23, the PST improvement can be up to 36.7% (to 88.2%) for Q3 (qubit 3), and RCF achieves a 15.7% reliability improvement over all qubits on average. Similarly, for hs(12), as shown in Fig. 24, the PST improvement can be up to 23.5% (from 57.6% to 81.1%) and 20.9% (from 69.6% to 90.5%) for Q2 (qubit 2) and Q8 (qubit 8), respectively, and RCF achieves a 9% reliability improvement over all qubits on average. Therefore, the qubits keep relatively higher reliability compared with the baseline.

B. EXECUTION TIME PREDICTION
Fig. 25 shows the predicted and actual execution time across eight benchmarks. The predicted runtimes of circuits under gate-based compilation are indexed to the pulse durations from the Pulse simulator (e.g., the "cx" gate pulse duration for qubits (0, 1) is 298.7 ns). Executing qft(15) is the most time consuming (2511 s), containing 540 gates with a 4.9% prediction error. The lowest prediction error is 2.4% for bv(3), and the average prediction error is 5.8%, demonstrating the efficiency of the task prediction function in RCF.

C. THROUGHPUT IMPROVEMENT
On IBM cloud quantum machines, the state-of-the-art daily calibration blocks the task queue for several hours to calibrate the significant parameters of all qubits, severely affecting throughput. In RCF, tasks and frequency-drift calibration can be executed synchronously on different qubits, improving throughput without expensive task downtime. As shown in Fig. 26, the TPM can reach up to 9.5 (6.5 on average) in RCF, and the total throughput increases by 780 tasks on average if the daily calibration time is 120 min.

D. SEGMENTED CALIBRATION (SC)
Limited by the current measurement and control equipment, which must simultaneously read the running results of different bits on the chip, we adopt an SC method (each SC takes about 7 s in our experiments), which enables the user's task and the calibration job to be mixed, compiled, and run at the same time. Frequency accuracy is measured by the difference between the calibrated and the targeted resonance drive frequency. As shown in Fig. 27, with the accuracy parameter (Ramsey fringe) and the gradient descent algorithm, the frequency accuracy of calibrated qubits varies with frequency drift. If the drift is small (e.g., 5% and 10%), frequency accuracy can be increased to more than 98% within two SCs. Conversely, qubits with higher frequency drift (e.g., 25%) need more segments to reach the targeted frequency for higher frequency accuracy.

E. ANALYSIS OF TEST RESULTS
By comparing the experimental results on a real superconducting quantum computer, we discovered that the frequency drift observed in fixed-frequency superconducting qubits during qubit frequency calibration arises because the π pulse parameters used in the calibration experiment have drifted. This causes the calibrated qubit frequency to differ from the real qubit frequency. The problem can be avoided by adding a Rabi experiment to correct the π pulse parameters before the qubit frequency calibration. As shown in Table 3, after we corrected the π pulse, there is no more "frequency drift," which indicates that this frequency drift comes from the drift of the π pulse parameters used in the qubit frequency calibration experiment. However, even when the π pulse parameters are calibrated and the fixed frequency of the superconducting qubits eventually returns to the correct value, the single-qubit random benchmarking characterization shows that other parameters are still drifting. Therefore, while we provide an RCF, adding more calibration experiments to the framework while preserving efficiency will be our next research goal.

VII. RELATED WORK
Reducing errors and improving the reliability of NISQ machines is the most prevalent area of research in QC. The following near-term research directions are in three aspects.

B. QUANTUM DEVICE OPTIMIZATION
To tolerate noise and faults, several device-specific techniques have been proposed to mitigate errors with significant reliability gains [14], [16], [17], [27]. The authors in [9] and [40] proposed using trapped-ion QC; trapped ions have relatively long coherence times, which means that the qubits are long-lived and have lower decoherence error than current NISQ machines.

C. CALIBRATION
Calibration is a fundamental method to stabilize and improve the physical status of NISQ machines; researchers are developing specific hardware to control calibration or efficient calibration techniques [5], [21], [28], [29], [37], [51] to reduce calibration overheads. Ref. [10] proposed several optimized pulses to tolerate parameter drift to reduce calibration frequency. However, the state-of-the-art calibration method (e.g., IBM daily calibration [2]) lacks scalability and efficiency, stalling the task queue for several hours [61] to calibrate all qubits on cloud NISQ machines.

VIII. CONCLUSION
In this article, we propose RCF for cloud NISQ machines, the first scalable hardware and software codesign that enables concurrent user-task execution and qubit calibration without interruption. RCF reduces calibration overhead by calibrating only the specific physical qubits that are in a bad status and by applying an efficient calibration strategy. It also improves throughput on cloud-based NISQ machines by avoiding stalling the task queue for calibration. We evaluate RCF on the IBM Aer simulator against the baseline calibration model, with a range of widely adopted quantum benchmarks from the QASMBench suite. For reliability on the NISQ machine, our RCF achieves an average improvement of 14.5% (up to 18%) over the state-of-the-art calibration methodology. For reliability on different physical qubits, RCF achieves an average gain of 15.7% (up to 36.7%) with hs(8) and 9% (up to 23.5%) with hs(12). For cloud throughput, it can be improved by up to 9.5 TPM (6.5 on average) based on the baseline calibration time.