Nonlinear Effects in Superconducting Thin Film Microwave Resonators

We discuss how reactive and dissipative non-linearities affect the intrinsic response of superconducting thin-film resonators. We explain how most, if not all, of the complex phenomena commonly seen can be described by a model in which the underlying resonance is a single-pole Lorentzian, but whose centre frequency and quality factor change as external parameters, such as readout power and frequency, are varied. What is seen during a vector-network-analyser measurement is series of samples taken from an ideal Lorentzian that is shifting and spreading as the readout frequency is changed. According to this model, it is perfectly proper to refer to, and measure, the resonant frequency and quality factor of the underlying resonance, even though the swept-frequency curves appear highly distorted and hysteretic. In those cases where the resonance curve is highly distorted, the specific shape of the trajectory in the Argand plane gives valuable insights into the second-order physical processes present. We discuss the formulation and consequences of this approach in the case of non-linear kinetic inductance, two-level-system loss, quasiparticle generation, and a generic model based on a power-law form. The generic model captures the key features of specific dissipative non-linearities, but additionally leads to insights into how general dissipative processes create characteristic forms in the Argand plane. We provide detailed formulations in each case, and indicate how they lead to the wide variety of phenomena commonly seen in experimental data. We also explain how the properties of the underlying resonance can be extracted from this data. Overall, our paper provides a self-contained compendium of behaviour that will help practitioners interpret and determine important parameters from distorted swept-frequency measurements.


Introduction
Superconducting thin-film microwave resonators are being developed for a wide range of applications. For example, in astronomy, large arrays of Kinetic Inductance Detectors (KIDs) are 1 arXiv:2001.02540v1 [cond-mat.mes-hall] 6 Jan 2020 being developed for ultra-low-noise measurements (100-800 GHz) of the polarisation state of the cosmic microwave background radiation [1,2,3], to carry out galaxy surveys in the submillimetre-wave region [4,5,6,7], and for energy and time resolved optical and x-ray photon counting experiments in high energy astrophysics [8,9]. Arrays of superconducting resonators coupled to Superconducting Quantum Interference Devices (SQUIDs) provide a convenient way of reading out large arrays of ultra-low-noise devices that are not themselves easily multiplexed, such as Transition Edge Sensors [10,11,12]. In quantum computing, superconducting resonators are being coupled to tunnel junctions to create qubits [13], and to embedded spin systems to create memory elements [14]. More generally, thin-film superconducting resonators are a natural system for exploring chip-based Quantum Electrodynamics (QED) [15,16], and are being realised in exotic combinations, such as superconducting electromagnetic resonators coupled to micromechanical cantilevers for studying quantum-statistical processes [17].
Not only are the applications varied, the physical realisations are diverse. Superconducting resonators can take the form of microstrip or coplanar transmission lines, shaped conductors in waveguide cavities, or even lumped-element components based on thin-film inductors and capacitors. The metallic films usually take the form of Ultra High Vacuum (UHV) deposited normal metals or superconductors (Nb, Al, Ta, Ti, NbN, NbTiN) laid down on dielectric substrates (Si, SiN, and Sapphire). The conductors can also take the form of proximitised multilayers (TiAl, TiAu, MoAu) for customising the bulk properties of films, and the substrates can be irradiated (nitrogen-vacancy centres in diamond) or surface implanted with dopants (P,Bi) to enable spin-system coupling [14].
A crucial point is that when superconducting resonators are measured, they often do not behave in a simple linear way having a near-perfect Lorentzian response, but instead show transmission and reflection coefficients that display peculiar shapes in the complex plane. Moreover, their behaviour changes as the readout power is increased, and often the resonance curves switch hysteretically between two stable states as the readout frequency is swept up and down. These effects can vary significantly between two notionally identical devices, emphasising the importance of fabrication methods and conditions. It follows that although a device may be designed on the basis of near-ideal behaviour, the actual behaviour is influenced strongly by the non-ideal characteristics of the materials used. Understanding these 'second order' effects is an essential feature of any development programme, particularly when quantum-limited operation is sought.
In this paper, we review the theoretical description of superconducting resonators, and show how a simple model based on the notion of power and energy dependent resonance frequency and quality factor can account for a wide variety of phenomena seen. We show that a considerable amount of physical information is contained in the behaviour of the quality factor, not just in the resonant frequency, as external parameters, such as the readout power, are changed. In fact, particular shapes in the complex plane are characteristic of different physical mechanisms, and it is highly desirable to be able to identify these easily when carrying out experiments, or when, say, characterising films and geometries. We describe a range of methods for extracting physical information from distorted resonance curves, which can then be used for optimising performance, and for predicting operational aspects of behaviour such as optimal readout power, small signal nonlinearity, and noise.

Preliminaries
2.1 Definitions of key symbols used throughout the paper ν r Resonant frequency.
ν r,0 Resonant frequency in the limit of zero readout power.
ν Measurement/readout frequency. P r Applied power at readout port.

U
Total energy stored in the resonator.
P t Total power loss from the resonator.
P d Power dissipated internally in the resonator. The difference between P t and P d is the power loss to the readout circuit.
Q n General notation for quality factor contribution from a particular loss mechanism.
q n Normalised quality factor q n = Q n /Q c , where Q c is the coupling quality factor.
Q t Total resonator quality factor, accounting for all losses.
Q i Internal (or intrinsic) quality factor resulting from all internal losses (P d ).

P c,n
Scale power for a nonlinear effects due to a particular mechanism. In this paper n = tls, qp and nl, corresponding to quasiparticle losses, TLS losses and losses due to the power-law model process.
U c,n Scale energy for a nonlinear effects due to a particular mechanism.
x 'Realised' fractional detuning, as defined by (ν − ν r )/ν r . Here ν r is the instantaneous value of the resonant frequency. Because ν r can vary depending on the energy stored or dissipated in the resonator, x and x 0 are only equal in the absence of reactive nonlinearities or in the limit of zero readout power. It is the realised fraction detuning that determines the measured S parameters.
y Defined as y = Q c x. Scaling x by Q t yields the realised detuning as measured in linewidths from the resonator: when Q t x = 1 the readout frequency is tuned a resonance-width above the centre frequency. Since we assume Q c is fixed and by definition Q t ≤ Q c , y specifically corresponds to the maximum value Q t x can take for all readout powers.
y 0 Defined as y 0 = Q c x 0 . The applied detuning measured in linewidths.
T c Superconducting critical temperature.
n qp Quasiparticle number density in the active volume of the device.
n qp,th n qp,th is the value of n qp in the limit of zero readout power, i.e. arising from thermal processes alone.
n * Value of n qp,th at which Q qp = Q c .
n ω Number density of pair-breaking phonons with energy in excess of twice the superconducting gap energy, 2∆, in the active volume of the resonator.
n ω,th n ω,th is the value of n ω in the limit of zero readout power, i.e. arising from thermal processes alone.
V Volume of the active region of the device.
τ pb , τ l , R 0 , r Parameters in the Rothwarf-Taylor model. τ pb is the pair-breaking lifetime, τ l is the timescale on which pair-breaking phonons are lost to scattering, R 0 is the quasiparticle recombination rate and r is the efficiency with which dissipated readout power is converted to pair-breaking phonons.
Surface impedance of the superconductor. R s and X s are the resistive and reactive components, respectively.

Quality factor
Quality factor is a well known measure of energy loss in resonant circuits. When the loss is due to a combination of dissipative processes, it is common to define a Q-like measure for each of the processes. However, a range of conventions exist, and so in this section we outline the terminology that will be used in this paper.
Let U be the energy stored in a resonator having resonant frequency ν r . If the average total power dissipated is P t , then the overall quality factor Q t is defined by For a resonator coupled to an external circuit, P t includes the energy lost to that circuit. Now assume that the total loss is due to a number of different dissipative processes, such that P t = n P n . Then where Q n = 2πν r U P n are effective quality factors, or equivalently the actual quality factor when only the n'th loss is present. The total internal quality factor, Q i characterises losses 'internal' to the resonator, in the sense they would still exist if the resonator were isolated from the readout circuit. Q i may comprise contributions from several microscopic processes: Ohmic loss and dielectric loss are examples. Q i is also commonly referred to as the unloaded [18] or intrinsic [19] quality factor.
The total coupling quality factor, Q c , is associated with the power lost from the resonator to the readout circuit. This loss is a pure feature of coupling and exists by virtue of reciprocityif energy can be transferred into the resonator, it can also be transferred out of the resonator.
Q c may also comprise loss by several mechanisms, e.g. to different ports of a multiport readout system. Based on these definitions, we can always make the division Q t in this instance is also sometimes referred to as the loaded -Q of the device [18]. A device is said to be undercoupled or overcoupled if Q c > Q i or Q c < Q i , respectively. Throughout this paper, we will use lower-case q to denote a quality factor normalised to the coupling quality factor: q n is a measure of the degree to which power lost through mechanism n compares with the power lost to the readout circuit.

Microwave scattering parameters of common resonator circuits
Consider a device comprising a resonator embedded in, and lightly coupled to, a lossless, reciprocal, multiport readout circuit. Temporal coupled mode theory [20] can be used to show that the microwave scattering parameters {S mn } at the external ports of the overall circuit have the general form Γ is the scattering matrix of the isolated readout circuit, K is a symmetric coupling matrix, and x the realised fractional detuning, where ν is the readout frequency. We will also refer to the realised detuning y in coupling-Q linewidths, which we define by Strictly we are making a single-pole approximation by neglecting the contribution from the pole at ν = −ν r , requiring Q t 1.
(6) describes a very wide range of devices, but for illustrative purposes we will use the specific example of an embedding circuit having 2 external ports.
In the case of a short-circuited λ/4 'resonator', with a series coupling capacitor, connected in parallel with a through transmission line, the equivalent circuit takes the form of Figure 1 (a), and the scattering elements of the whole device become and which displays a maximum in reflection S 11 = S 22 = −Q t /Q c and a minimum in transmission An optimally coupled resonator Q t = Q c displays near ideal behaviour, reducing the transmitted signal to zero at resonance. This is a good model of many devices, such as kinetic inductance detectors (KIDs), independent of the specific physical realisation [21].
In the case of a λ/2 'resonator', with two series coupling capacitors, connected in series with a through transmission line, the equivalent circuit takes the form of Figure 1 (b), and the scattering elements of the whole device become superconducting resonator superconducting resonator Figure 1: (a) the LCR tank represents the superconducting resonator, which could be a shorted quarter-wave superconducting transmission line [22,6], an open-ended half-wave line [23,24] or an implementation in discrete components [25,26]. The LCR tank is lightly capacitively shuntcoupled across the readout transmission line, giving a null in transmission (S 21 and S 12 ) on resonance. (b) the LCR tank circuit represents a superconducting half-wave line that is open at both ends. This is lightly capacitively coupled in series in the readout line, giving a maximum in transmission on resonance. Inductively coupled implementations of both designs are also possible [27,28]. and which displays a minimum in reflection S 11 = S 22 = 1 − Q t /Q c and maximum in transmission S 12 = −Q t /Q c at resonance, illustrating the duality of parallel-and series-resonant circuits.
In many devices, one seeks a resonant notch that approaches zero, or a resonant peak that approaches unity, and in both of these cases, the coupling quality factor must be chosen to dominate the losses, which limits the operating Q t to a value lower than that implied by Q i .

Non-linear behaviour
Non-linear behaviour manifests itself as variations in the {S mn } as the amplitude of the readout signal is changed. For example, swept-frequency measurements of complex-valued scattering parameters with a vector network analyzer (VNA) can lead to traces that vary with readout power. Numerous distorted and hysteretic resonance shapes can occur [29,30,31]. Here we explain many of the observed effects, and in particular consider the broad category of nonlinear behaviour that can be described as a dependence of the resonance frequency and/or quality factor on the power dissipated P d internally (as distinct from the total power flowing out of the resonator, P t , which also includes the coupling loss): ν r (P d ) and Q n (P d ) respectively. It is clear that the dissipated power can be calculated once the scattering parameters are known.
In some cases, such as heating, the dependence on P d is direct. However, it follows from (1)- (3) that U and {P n } can all be expressed in terms of P d provided the {Q n } are known, and so the resonant frequency and quality factor can be written in terms of P d even for mechanisms that do not involve heating directly. We will refer to changes in resonant frequency with dissipated power, ν r (P d ), as reactive non-linearities, as they are primarily caused by changes in the reactive elements of a resonator. This will be illustrated for specific cases later. Equivalently, we will refer to changes in quality factor with dissipated power, Q n (P d ), as dissipative non-linearities, as they are primarily caused by changes in the resistive elements of a resonator. In this context, we will make two assumptions: (i) The coupling quality factor exhibits no non-linear behaviour, which is true for most devices because the coupling is via a near-perfect capacitance, selfinductance, or mutual inductance. Modifying the forthcoming analysis to relax this assumption is not in itself difficult, but adds a significant algebraic overhead that distracts from the main results. (ii) The scattering parameters are described by the functional form given in (6), but with nonlinearity occurring through ν r (P d ) and Q n (P d ) under all conditions. Physically, this corresponds to the situation where the circuit topology remains constant, and it is only the component values that change with readout signal level. Within this framework, the values of {S mn } can be found for a given applied signal level through finding self consistent solutions to (6), and ν r (P d ) and Q n (P d ). Indeed it is this generic mechanism that creates, under different conditions, many of the physical phenomena seen.

Non-linearity in the Argand plane
A characteristic of linear resonant circuits is that the scattering parameters all trace out circular paths in the Argand plane as a function of frequency: only the centres and radii change with the circuit topology and circuit parameters. This behaviour occurs because expressions having the form of (6) constitute bilinear maps.
To illustrate this feature consider S 21 for a parallel resonant circuit in the linear regime, as shown by the blue (solid) lines in Figure 2. The left diagram shows the data in the Argand plane, while the right diagram shows the equivalent plots of transmission magnitude and phase as a function of the detuning in linewidths, y 0 , relative to the resonant frequency with infinitesimal readout power. From (10) we can derive and where (13) implies S 21 is constrained to lie on a circle, with C the centre. θ, as defined, is the angle  ] versus y 0 on the right. y 0 is the applied detuning in (minimum) linewidths relative to the resonant frequency at infinitesimal readout power. Blue (solid) lines show the ideal linear behaviour, with the circles indicating a set of evenly space frequency points. Green lines on the right show the behaviour when the non-linearity is purely reactive, with the diagonal crosses indicating a set of evenly spaced frequency points (same points indicated on the left). Solid lines show the curve measured sweeping down in frequency, while the dashed lines show the curve on sweeping up. Red (dashed) line shows the behaviour for a hypothetical, purely dissipative, non-linearity, with the horizontal crosses indicating a set of evenly spaced frequency points. Note that S 21 traces clockwise with increasing frequency. subtended by S 21 at C as measured anticlockwise from the real axis; (14) therefore describes the motion of S 21 around the circle as a function of frequency. The blue circles in Figure 2 indicate the value of S 21 at a set of evenly spaced frequency points spanning the resonance with S 21 moving clockwise around the circle as a function of frequency.
Non-linear behaviour can result in the resonance circle becoming distorted. First we note that for purely reactive nonlinear behaviour, with Q t invariant over a sweep, (13) still constrains S 21 to lie on a circle. The motion of S 21 around the circle with frequency may change, with the green diagonal crosses in Figure 2 indicating, for example, how the frequency points corresponding to the blue circles might move. Hysteresis with sweep direction may also be observed, and some points of the circle may even become inaccessible [29]. The radius and centre of the circle contain important information, even though the resonance curve is hysteretic. (13) shows that only nonlinear dissipative behaviour can distort S 21 from a circular path. For example, the red (dashed) lines in Figure 2 show hypothetical curves for a device where Q t decreases with dissipated power, causing the effective radius of the 'circle' to decrease closer to resonance. In fact, two characteristic circles seem to be present. In addition, (14) indicates that dissipative non-linearities can also influence the rate at which S 21 moves around the circle in the same way as reactive non-linearities. The preceding discussion applies equally well to any scattering parameter of any device described by (6). In what follows we will show that different dissipative processes produce characteristic distortions, making the shapes, radii, and centres of resonance 'circles' powerful diagnostics of underlying physical mechanisms.
3 Distortions in swept-frequency S-parameter measurements 3

.1 Origin of distortion
Consider an idealised model of a swept-frequency S-parameter measurement with a VNA or homodyne readout system [21]. The device under test is a two-port non-linear resonator of the type described in Section 2.2, with generalised S-parameters given by (9) and (10). Assume that all S-parameters and power-wave amplitudes are defined relative to reference impedance Z 0 .
A sinusoidal voltage source with frequency ν and real output impedance Z 0 is used to drive the resonator at port 1 and a load of impedance Z 0 is connected to port 2. Under these conditions, S 11 = b 1 /a 1 and S 21 = b 2 /a 1 are the scattering parameters referenced to Z 0 , and a 1 , b 1 and b 2 are the measured complex amplitudes of the incident travelling wave at port 1, outgoing wave at port 1 and outgoing wave at port 2, respectively. Assume that the source frequency is swept to measure S 11 (ν) and S 21 (ν) while keeping the readout power P r = |a 1 | 2 constant.
If the resonator is driven into a non-linear regime, the variation in the dissipated power with frequency will generally result in distortion of the measured data compared with (9) and (10).
Visually, we will record resonance curves that look like the red and green lines on the right in Arg(S 21 ) (deg) Figure 3: The black lines in each plot shows the measured, distorted, resonance trace. Each of the coloured curves indicate the behaviour of the 'underlying' single-pole resonance as the trace is swept out. Each of these curves is calculated assuming constant quality factor and resonant frequency equal to Q i (P d ) and ν r (P d ) of the resonator at the points indicated by the markers on the black lines. By definition, each coloured curve intersects the black line at the location of the corresponding marker. The second set of markers, lying purely on the coloured lines, have been included simply to allow each underlying curve to be matched to the corresponding intersection point more easily.
in our framework.
The dissipated power is the difference between the outgoing power at ports 1 and 2 and the incoming power at port 1, Using (9) and (10) to substitute for the S-parameters, we obtain where p = P d /P r is the normalised power dissipation, η = q −1 i = Q c /Q i is the normalised internal dissipation factor and y is the detuning in linewidths as defined in Section 2.2. (17) indicates that the dissipated power peaks sharply at 2η/(1+η) 2 as the source frequency is tuned through resonance, and falls to zero either side. In the same notation However, η and y are both functions of the dissipated power through their dependence on Q i and the resonant frequency, with y also dependent on the readout frequency. Since the incident readout power is fixed in a frequency sweep, we can alternatively express this as a dependence on normalised dissipated power: η(p) and y(ν, p), respectively. When the source frequency is changed to a new value, the dissipated power (and with it Q i and the resonant frequency) evolves to a new equilibrium. It follows from (17) that the normalised dissipated power in the final state, p 0 , must satisfy the condition at the readout frequency ν. The dynamical process by which the circuit moves to the equilibrium condition depends on the physical realisation, and an example has been described by Thompson [31]. In the subsequent discussion we will assume that ν is always swept slowly enough that (19) is satisfied at all points, for example that there are no thermal delays, and we will use the notation p, rather than p 0 , without confusion. Note that there may be multiple solutions of (19), in which case hysteretic behaviour can occur.
The proceeding discussion indicates how quality factor and resonant frequency can become functions of the measurement frequency, giving distorted resonance curves of the kind shown in Figure 2. There is a simple visualisation of the process: Figure 3. At each measurement frequency, the circuit has a simple Lorentzian resonance, and the measurement simply samples one point on this resonance. If the measurement frequency is changed, the underlying resonance curve changes, giving rise to a new sample taken from a new Lorentzian. Thus the observed shape is merely a manisfistation of the fact that a simple underlying Lorentzian is sweeping through the sample points taken: the underlying curve being swept out, as defined by (9) and (10), changes as we proceed through the swept-frequency measurement process. Crucially, the origins of the distortions lie in translations and rescalings of the underlying linear resonance, and this puts constraints on the observed behaviour. In fact, certain features of the linear resonant behaviour carry over to even highly distorted curves, as we will now show.
This model emphasises why the experimenter does not usually have direct control over the detuning x as given by (7): they can set ν, but in the presence of reactive non-linearities they may not know ν r . We will refer to x throughout as the 'realised' detuning at a particular frequency. It is x that is used in (9)-(12) to calculate S, and which determines the underlying resonance curve at a point as illustrated in Figure 3. However, it is still often useful to express a readout frequency as a detuning. To do so we can use the limiting value ν r,0 of the readout frequency at zero (or sufficiently low) readout power as our reference frequency. Accordingly, we define the 'applied' detuning x 0 as The concepts of applied and realised detuning will prove particularly useful in the next section.

Point of zero realised detuning
The point of zero realised detuning, x = y = 0, occurs when the measurement frequency is  (9) and (10), equivalent conditions can be determined that will depend on Γ mn and K mn in (6).
In the case of a non-linear resonator, we must look for the point in the sweep where y(ν, p) = 0. Here, the readout frequency is equal to the resonant frequency of the underlying resonance.
We will now show that aspects of the conditions (i)-(iii) carry over to distorted, and even hysteretic, cases. Again we will assume the measurement arrangement of Section 3.1, and that the S-parameters of the device under test are given by (9) and (10). The same methods can be applied to other types of device to derive equivalent conditions.
Consider the phase-shift on transmission through the non-linear resonator, as given by the argument of S 21 . The distorted curve is generated from (10)  A possible source of confusion occurs experimentally when a device exhibits switching. For example, the green dashed curve in Figure 2 appears to pass through zero near y 0 ≈ −1.2, but in actual fact the device is merely changing state, and the response is discontinuous: y = 0. In practice, it should be easy to identify such cases because they coincide with similar discontinuities in R and T .
To determine the stationary points of R and T for a non-linear resonator, we must calculate their derivatives with respect to the readout frequency. It follows from (16) and (18) that and Taking the total derivatives of (21) and (22) with respect to ν and then using the chain rule we dη dp dp dν (23) and where we have suppressed the dependence of η on p in the notation for convenience.
By taking the total derivative of (19) with respect to ν, we can obtain the follow condition involving dp/dv dp However, it also follows by partial differentiation that Using (26) to substitute for dy/dν in (25) and then solving the resulting equation for dp/dv, we obtain dp dν = − 4p 2 κy η ∂y ∂ν p where According to (23), (24) and (27), the derivatives can therefore be written as and (29) and (30) indicate that R and T are stationary with respect to the sweep frequency at the point of zero-detuning of a non-linear resonator y = 0, as for a linear device.
To evaluate the nature of the stationary point in each case we need to take a further derivative and evaluate the result at y = 0. Differentiating (29) using the chain rule, discarding terms proportional to y and noting that dp/dv = 0 at y = 0, we obtain (31) and (32) indicate that R is still minimised and T is maximised at zero realised detuning provided the content of each square bracket is positive. Violation of the latter conditions requires non-linear dissipation, because dη/dp would need to be significantly different from zero. (29) and (30) also show that R and T can also be stationary if the contents of the square bracket in each expression are zero. Unlike for a linear resonator, we can therefore no longer automatically assume that any stationary point in R and T is a point of zero realised detuning.
However, notice that the contents of the square bracket can only be zero for one or other of (29) and (30) at any time. Therefore if R and T are stationary simultaneously, or the phase is also zero, we can still identify the point as corresponding to zero realised detuning.
Being able to identify the point of zero realised detuning using the conditions above is particularly convenient for parameter extraction, even under highly nonlinear conditions. Most obviously, we know that if the point of zero realised detuning is at measurement frequency ν, However, it follows from (17) and (18) with y = 0 that we can also calculate Q i (P d ) and P d from the S-parameters at the zero realised detuning point using and Thus the internal quality factor and dissipated power follow from measurements of the scattering parameters at the point of zero realised detuning, which are real, even for a nonlinear device.
This technique can be used to great effect (Section 8).

Other stationary points
It is instructive to consider the other cases where R and T can be stationary with frequency, as these might, potentially, be confused experimentally with the case y = 0. For both R and T , the only other circumstance when this can occur is when the contents of the square brackets in (23) and (24) are zero. In the case of R, this requires 2p η dη dp which corresponds to the situation where the change in p/η due to the change in readout frequency is cancelled out by the corresponding change in η due to nonlinear behaviour. It is straightforward to show that for a simple power model given by η = αp n , (36) can only be satisfied if n = 1. Furthermore, when n = 1 the condition is actually satisfied for all p, and so R becomes independent of readout frequency. This behaviour would be easily distinguished from the case where y = 0.
Similarly, in the case of T we require For the power law model used above, this condition can be satisfied at a spot power provided n > 2. However, the case where n > 2 is a very strong nonlinearity, which we will see in Section 6.5 produces a high level of distortion of the resonance shape. As a result, it is unlikely we would confuse a stationary point resulting from this effect with one resulting from realising zero detuning.

Kinetic inductance
To this point the analysis has been general, making no assumptions about the origins of the physical mechanisms that cause the resonant frequency and line width to depend on readout power, and perhaps other variables such as temperature. In superconducting films, kinetic inductance introduces a reactive nonlinearity. Kinetic inductance is the circuit-theoretic representation of energy stored in the inertial motion of Cooper pairs. It has the beneficial effect that distributed resonators based on superconducting films are physically smaller than resonators based on normal metals. However, for large currents I, the kinetic inductance is nonlinear: where I * 1 and I * 2 are scaling currents. This nonlinearity can be used to create superconducting devices, such as travelling wave parametric amplifiers [32], but in the context of resonators, it leads to a redistribution of frequency points on the resonance circle, as shown by the green crosses in the left plot of Figure 2, and can cause hysteretic switching, as shown in the right plot.
Strictly, the inclusion of nonlinear inductance leads to complicated periodic forms for the voltage, current and inductance, but using the expression ν r = (LC) −1/2 ; keeping only the quadratic term in (39); concentrating on those spectral components that are at the same frequency as the readout tone; and using the stored energy as a proxy for the square of the average  [29]. Blue (solid) line shows the solution of (42) for a = 5 and the dotted line y = y 0 for comparison. Red arrows indicate the trajectory of the resonator in the (y 0 , y)-plane when y 0 is swept in the negative direction from a large, positive, starting value. Dashed-green arrows show the opposing case where y 0 is instead swept in the positive direction from a large negative value. Lines with matching format in the inset show the variation in |S 21 | with y 0 in each case where ν r,0 is the resonance frequency in the low-energy limit, and U c,kin scales the size of the nonlinear effect. Swenson's model [29] assumes that the resonant frequency decreases linearly with stored energy U , and has been found experimentally to provide a good description of certain non-linear reactive behaviour in superconducting resonators [29,33,34]. The internal quality Q i is, according to the model, constant, and so the system has only reactive nonlinearity.
Substituting (40) into (7) and then Taylor expanding assuming small U/U c,kin , we find the detuning y, as defined by (8), becomes where y 0 = Q c x 0 is the applied detuning relative to ν r,0 , as defined by (20).
Not only does kinetic inductance redistribute the frequency points on the resonance circle, it can cause hysteretic switching. One consequence is that a point of zero detuning may not be found anywhere during a swept frequency measurement. We can illustrate this effect as follows.
Using (17) to substitute for p, we obtain where a = Q 3 t P r /πν r,0 Q c U c,kin is Swenson's non-linearity parameter in our notation. Note that y as defined in Swenson's paper [29] corresponds to y/(1+η) in our formulation, as they measure linewidths relative to Q t rather than Q c . For given applied detuning y 0 , (42) can be solved to find the realised detuning y and entered into (9) and (10).
For values of a > 4 √ 3/9 (Appendix A), y is not monotonic in y 0 and different resonant curves are obtained depending on whether the measurement frequency is swept up or down. This is illustrated for a = 5 in Figure 4, which reproduces part of Figure 2 from [29]. When the readout frequency is swept up, the resonator follows the trajectory in the (y 0 , y)-plane indicated by the dotted-green arrows. Critically, at y 0 = (1 + η)u + 0 the value of y jumps discontinuously from (1 + η)u + to some higher value. Similarly, when the readout frequency is swept down it follows the trajectory shown by the solid-red arrows and y jumps discontinously from u − to some smaller value at y 0 = y − 0 . The inset of Figure 4 shows the corresponding curves of |S 21 | versus y 0 /(1 + η).
It is possible for y to skip through the point of zero-detuning in one of these jumps; whether it does so depends on the values of u + and u − , as well as the value of u afterwards. It can be seen from Figure 4 that the jump points correspond to stationary points of y 0 as a function of y. Taking the derivative of (42) with respect to y and then setting dy/dy 0 equal to zero, we find u + and u − must satisfy Since the non-linearity parameter is always positive, (43) implies that u + and u − are both always negative. Viewing Figure 4 from the perspective of y as a function of y 0 , it is apparent that y is always guaranteed to pass through zero detuning (y = 0) on a downward sweep from well above resonance: extrema then appear in the magnitudes of the scattering parameters. However, on an upward sweep, the resonator may jump to a positive or negative value of detuning, depending on the precise shape, and extrema will only appear in the former case. Note that if instead the resonant frequency increases with stored energy (e.g. as observed in the higher temperature data in [24]), this behaviour would be reversed. The shape of measured hysteretic resonance curves therefore change in specific ways, revealing key information about the underlying nonlinearities.
Effects of this kind are seen routinely in experimental resonance curves. Some further useful results concerning the locations of the switching points are described in Appendix A.

Two-level systems
In low-temperature superconducting resonators, two mechanisms are found to be dominant sources of dissipative nonlinearity. The first relates to the presence of Two Level Systems (TLS) in deposited bulk and unintended surface oxides (such as SiO 2 ), and the second relates to heating and pair breaking in the films that make up the transmission lines.
TLSs occur in amorphous insulating materials where local configurational changes in the atoms that make up the material lead to changes in stored mechanical energy. According to the low-temperature TLS model, a system can tunnel between one configurational state and another, introducing a new low-energy degree of freedom into the dynamical behaviour [35,36,37].
The TLS model has been highly effective at describing the low temperature behaviour of heat capacity, sound speed, and acoustic attenuation. If, additionally, the TLSs have an electric dipole moment, they can contribute significantly to the electromagnetic properties, leading to an enhanced dielectric constant, which may have a dissipative part due to energy being carried away by elastic waves. TLSs have proven extremely successful at explaining empirical data for detuning, loss and noise in thin-film superconducting resonators [38,39,40,41,42].
In most practical devices, the exact nature and locations of the TLSs are not known, and it is usual to imagine some density of TLSs having an assumed energy distribution. Detailed theoretical models exist for the real and imaginary parts of the dielectric constant, but for our purposes the important features are as follows: (i) The dielectric constant has two parts, one of which is due to the coupling of the TLSs to the phonon system, which acts as a thermalising reservoir, and the other is caused by resonant transitions between TLS states. (ii) The first relaxation process gives a complex dielectric constant that is independent of field strength, and leads to damped linear-resonator behaviour. (iii) The second resonant process has a real part that depends only weakly on field strength, giving a weak reactive nonlinearity, and a lossy imaginary part that depends strongly on field strength, giving a strong dissipative nonlinearity.
For a sufficiently strong field, the resonant energy states can be driven to have equal populations, and the losses become zero. For parameterisation, it is sufficient to know that where U is the energy stored in the electric field, U c,tls characterises the energy at which the TLSs saturate, and Q −1 tls,min characterises the maximum power loss. This expression should be compared with the functional form in (40), where U c,kin characterises the energy at which nonlinear inductance starts to become significant. In resonators of practical importance U c,kin > U c,tls , allowing for some intermediate range of readout power where linear resonator behaviour can be found. This is usually regarded as the 'sweet spot', U c,kin > U > U c,tls , for device operation.
In (44) U can be replaced by either the internal resonator power P int = 2πν r U (different to P r or P d ) [40] or the square |E| 2 of some measure E of the electric field strength in the capacitive part of the resonator [43]. All three forms of (44) are equivalent, but we choose to work with U because it can be defined in a geometry independent manner, with all details of the design of the device absorbed into U c,tls .
Consider a resonator where TLSs are the only source of nonlinearity. Using the definition of the internal quality factor, and (17), it can be shown that the total energy U stored in the resonator is where P r is the incident readout power. This expression is true for both the series and shunt single-pole resonant circuits. It is immediately obvious from (44) and (46) that Q tls depends on P r . However, the functional form of the relationship is difficult to obtain. (46) cannot simply be used to calculate U as an input to (44), as Q t is itself a function of Q tls . The two equations must instead be solved as a pair of nonlinear simultaneous equations.
Experimental studies to verify (44) have avoided this difficulty by exploiting the fact that the value of Q t measured to calculate Q tls can be used to convert P r to U (or actually usually P int ) [40,43]. However, there are many situations where is it valuable to calculate Q t as a function of P r , for example when explaining experimental data directly or when designing a device. To our knowledge this problem has not been addressed in the literature, so we will do so in the next section.

Large signal model and numerical solution
Assume that the nonlinear behaviour of TLSs only affects the dissipative response of the resonator, so the detuning x is fixed. Using (44), the total resonator quality factor is where u = U/U c,tls and Q −1 i,max represents any other sources of internal loss that ultimately limit the achievable quality factor. It is convenient to rewrite (47) in the form where are, respectively, the smallest and largest values Q t can take, r = Q t,min /Q tls,min = (Q t,max − Q t,min )/Q t,max , and measures the state of the TLS system under applied power. 0 ≤ r, α ≤ 1 by definition. α = 0 and 1 correspond to the limits where the TLSs are fully unsaturated and saturated, respectively.
To determine the steady-state behaviour, we must solve for α at the readout power level given known x, r, ν r , Q c and Q t,min .
Substituting (46) into (49), we find that the determination of α can be posed as the fixedpoint problem where is a dimensionless nonlinearity parameter, is a scale power and is the quantity normally referred to as the detuning efficiency [21].
By definition, ξ tls ≥ 0 and 0 ≤ χ ≤ 1. The advantage of putting the problem in this form is that certain fixed-point theorems can be applied to its solution. A full discussion is given in Appendix B, but the key results can be summarised as follows. First, we can show that (50) always has one unique solution satisfying the physical constraint 0 < α < 1, which precludes the existence of hysteretic effects due to the action of TLS alone. Second, we can show that the iterative sequence defined by always converges to this solution in the limit n → ∞ provided the sequence is started from a 0 = 0 + . this striking behaviour in a number of our own devices having high levels of TLS loss.

Simulated behaviour
This switch-on behaviour and associated distortion is illustrated in Figure 7, which shows calculated resonance curves for different values of ξ tls assuming the same device parameters as in Figure 6. The cases ξ tls = 0.1 (magenta line) and 10 3 (red line) correspond to the limits where the TLS are fully unsaturated and fully saturated, respectively (as can be seen from Figure 6).
In the case ξ tls = 0.1, the resonance curve is too shallow to be seen on the graph scales we have used. In these regimes the behaviour of the amplitude and phase as a function of frequency is indistinguishable from that of a linear device, as we will see explicitly when we consider the resonance curves in the Argand plane. worst case regime, the distortion in amplitude is relatively slight, although there is a stronger effect in the phase. Such distortion may still affect the fitting of (10) to experimental curves; in particular, we might expect a good fit to either the width or depth, but not both simultaneously.
The distortion of the resonance curve is most apparent in the Argand plane. Figure 8 shows calculated response in the Argand plane, using the same parameters and colours of Figure   7. The resonance curves form circles when the TLSs are either fully unsaturated or saturated shape is seen. The formation of this teardrop shape is a result of Q tls , and therefore the radius of the resonance circle (Section 2.5), decreasing rapidly as the measurement signal is tuned off resonance and the energy stored in the resonator decreases. We have seen this effect in many of our microstrip devices and Figure 9 shows typical measured data. In this case the device was a half-wave resonator and the microstrip with a 2 µm wide, 400 nm thick, Nb trace, 500 nm sputtered SiO 2 dielectric layer and a 150 nm thick Nb ground plane (T c ≈ 8 K). The measurements were taken at 110 mK.

Quasiparticle heating
In superconducting resonators, Ohmic dissipated readout power can have a marked effect on resonance curves, even when the readout frequency is well below the pair-breaking energy gap of the material. Multiple sequential photon absorption events, starting with a thermal population, can pump the quasiparticle system into a highly non-equilibrium state, which loses energy to the phonon system of the underlying material. The application of readout power effects both the energy distribution of the quasiparticles and their number density. The complex processes by which the quasiparticle and phonon energy spectra are modified in the presence of subgap photons have been studied at the microscopic level by Goldie [44], and the predictions have subsequently been found to be in excellent agreement with experimental results [24]. In the context of resonator dynamics, a key observation is that the consequential macroscopic behaviour can be described by a reduced model where the quasiparticles are ascribed an effective temperature above their physical temperature. The power dissipated by the readout signal effectively heats the quasiparticles [30], and an equilibrium state is formed when the heating power is balanced by the cooling power flow to the phonons [44,45]. This electrothermal model has been used to account for both large-signal [30,31] and small-signal [46] device behaviour.
Here we introduce an alternate, but equivalent, macroscopic model based on the Rothwarf-Taylor equations [47], which replaces the effective temperature with the total quasiparticle number. We will show this model is closely related to the electrothermal model, but is advantageous for our application because it allows approximate forms for Q i as a function of P r to be obtained easily for comparison with experimental results.

Description of the model
Our primary aim is to calculate how the internal quality factor Q qp varies with applied readout power. In the limit where the operating temperature is well below the critical temperature of the superconductor T c (usually taken as T /T c < 0.1), and the resonant frequency is well below the pair-breaking frequency, Mattis-Bardeen theory predicts Q qp to be inversely proportional to the number density of quasiparticles n qp in the active part of the resonator (see Appendix C for proof, also noted by McCarrick [33]). For our purposes, it will be convenient to express this relationship in the form The scaling factor n * absorbs the effects of temperature, frequency and resonator geometry, and can be recognised as the quasiparticle density at which Q qp = Q c . Choosing Q c as the characteristic scale for Q qp will be advantageous later when we consider how the actual power dissipated in the resonator relates to the applied readout power P r .
The 'active part' of the resonator in this context is determined by the current distribution.
By definition, (3), Q qp is inversely proportional to the total Ohmic power dissipation in certain volumes, V 1 , V 2 , . . . V N , of the superconducting device. In the temperature-frequency range of interest, the resistivity of a superconductor is small and approximately proportional to the local quasiparticle density (Appendix C). Hence we expect where J is the local induced current density and V i . . . dτ denotes the volume integral over V i .
(57) indicates Q qp will be predominantly determined by n qp in the region of highest current density; for example, nearest the shorted end of a quarter-wave resonator. Similarly, most of the power will be dissipated in the same region. Consequently, it is sufficient to only consider the evolution of n qp in high-current regions when determining Q qp to first order.
To determine how n qp depends on P r , our starting point is the Rothwarf-Taylor equations [47] in the form n qp is the number density of quasiparticles in the active volume of the resonator, n ω is the number density of pair-breaking phonons in the same volume, and n ω,th is the value of n ω in thermal equilibrium, Γ r = 0 (no forcing). τ pb is the pair-breaking time, R 0 is the quasiparticle recombination rate and τ l is the lifetime of a pair-breaking phonon in the absence of interactions with the quasiparticle system. Γ r is the rate at which pair-breaking phonons are generated by the readout signal.
We are interested in the steady-state behaviour, and so we set ∂ t n qp and ∂ t n ω equal to zero.
(59) can then be used to eliminate n ω in (58), and we find the steady-state value of n qp must satisfy R 0 n 2 qp = 2τ l τ pb Γ r + 1 τ l n ω,th .
A further simplification is possible if recognise that n qp must be equal to the expected value thermal n qp,th when Γ r = 0, so 2n ω,th /τ l = R 0 n 2 qp,th . (60) can therefore be re-expressed as where it has been shown that where T , ∆ and N 0 are respectively the temperature, gap energy and single spin density of states at the Fermi surface of the superconductor [46].
As of yet we have not said anything about how the generation rate is related to quasiparticle number density and readout power. As a first approximation, we assume where P qp is the total power dissipated in the quasiparticle system, V is the volume of the active part of the resonator and r is a generation efficiency.
To relate P qp to the applied readout power P r , we must consider both the effects of the resonator circuit and the division of power between the different loss mechanisms. Let where Q other collects together all other internal losses in the resonator. By definition, where P d is the total power dissipated in the resonator given by Due to the way in which experimental data is often taken, we assume x = 0 in the subsequent analysis. However, it is also straightforward to apply the model for finite x and to also account for distortion caused by the resonant frequency changing with n qp (δx ∝ n −1 qp ), but we shall not do so here.
Combining (63)-(66), (56) can be used to rewrite (67) in terms of quasiparticle number densities instead of quality factors. Doing so, and substituting the result into (61), yields which must be solved to find n qp in equilibrium. (68) can be rearranged into a quartic equation in n qp , and must generally be solved numerically, as will be discussed in subsequent sections.
However, first consider the relationship between this model and previous models of quasiparticle heating in superconducting resonators.

Relation to effective temperature models
(60) suggests that the dynamics of n qp near equilibrium can be described by the rate equation with the implication being that recombination dominates the loss mechanisms. For operating temperature T 0 well below the critical temperature of the superconductor and small enough n qp , the total energy U qp of the quasiparticle system is ≈ n qp V ∆: see Thomas [46]. Further, let us use (62) to assign an effective temperature T qp to the quasiparticles which makes the expected thermal value equal to the nonequilibrium value n qp . Multiplying (69) through by V ∆ and using (62) to replace n qp and n qp,th with expressions in terms of effective temperatures results in the energy balance equation for P in = 2τ l Γ r ∆/τ pb and κ 0 = 2R 0 n 0 V 2πk b ∆ 2 . (70) reproduces the effective temperature and superconducting cooling curve model developed in the series of papers [30,44,31,45,46].
The model introduced in this paper can therefore be viewed as a reformulation of the existing microscopic electrothermal model, but the approach taken here is favoured because it simplifies some of the subsequent mathematics.
It is interesting also to compare the model proposed here with that from Section 5.6.4 of Zmuidzinas [21]. His model is based on the empirical observation that the quasiparticle relaxation time τ saturates at τ max as T /T c is reduced. Given an assumed dependence Zmuidzinas derives, in our notation, a total generation rate where n τ τ max = 1/R 0 . This differs from the total generation rate in (69) by the term linear in n qp , so we expect the models to diverge in the regime n qp ≈ n qp,th . Since we will be mainly concerned with the regime where n qp n qp,th , we will not dwell on this difference. However, in Section 5.4 will show that in our model n qp,th limits at n qp,min as the temperature is reduced, as a result of readout power heating. This gives rise to the behaviour described by (71), without the need to impose a limited relaxation time.

Full solution
for normalised variables n = n qp /n * and q qp,th = Q qp,th /Q c , where Q qp,th = n * Q c /n qp,th is the quality factor expected from thermal quasiparticles alone and is a scaling power. (73) can be solved numerically using a root-finding algorithm and selecting for the roots that satisfy the physical requirements that n must be real and greater than or equal to zero. In all the simulations described here, this procedure yielded a single solution. either Q qp,th or Q other , with no apparent difference in the shape of the curve depending on the source of the limiting value. In the sections that follow we will derive simplified forms for Q qp as a function of applied readout power in a number of relevant cases.

Behaviour of an over-coupled device
n qp,min = 4ητ l P r τ pb n * V R 0 (77) satisfies the physical requirement that n qp ≥ 0.
(76) indicates that n qp will not decrease indefinitely as device temperature is reduced in the presence of a readout signal. Instead it reaches a minimum value n qp,min corresponding to an excess population maintained by readout power dissipated in the device. At first this may seem counter-intuitive; if the losses are decreased to zero, where does the dissipated power to both initiate and then maintain this population come from? The answer is the influence of the resonator circuit. The right-hand side of (75) indicates the electrical behaviour of the resonator provides positive feedback in the overcoupled-limit: a small increase in n qp produces an increase in generation rate, tending to further increase n qp . Consequently, the generation of a few quasiparticles -for example, by a noise process or optical event-is sufficient to start the process. The loss from this process then provides sufficient dissipated power to sustain the population.

Behaviour of an under-coupled device
A device is over-coupled if Q i Q c . If quasiparticles again dominate the internal loss in the resonator, (56) now requires n qp n * and (68) can be approximated by The quasiparticle term on the right-hand side is inverted compared with the over-coupled case, (75), and so the resonator power provides negative feedback: an increase in n qp reduces the rate at which quasiparticles are generated.
(78) can be rearranged into a cubic equation and an analytic solution is possible, however we will make a further simplification. We will assume n qp n qp,th , which would correspond experimentally to the case where the measured Q i is much smaller than would be predicted on the basis of an assumed thermal population of quasiparticles. We therefore approximate n 2 qp − n 2 qp,th ≈ n 2 qp , in which case (78) yields Substituting (79) and P r = GP VNA into (56) and taking the logarithm of the result, we obtain the prediction which may be readily compared with experimental data. represented by the green triangles. The measurements were taken at 100 mK using the method described in Section 8. VNA power is a proportional measure of the readout power P r applied at the device.

Comparison with experiment
The device with the lower value of Q c (green triangles) is under-coupled at even the lowest readout powers and should, therefore, be in the regime discussed in Section 5.5. The dashed red line in Figure 11 shows a fit of the straight-line model (80) to the corresponding data with the intercept as a free variable. The model can indeed be seen to provide a good account of the behaviour of q i with readout power. As an additional test, we also attempted fitting both the gradient and intercept simultaneously using linear regression. This gave a value for the reciprocal of the gradient of 26±0.3 dBm, which is close to but slightly below the value 30 dBm in (80). However, this is consistent with the fact the gradient of 1/30 dBm −1 is the limit for very high powers and that the actual gradient approaches it from above, as shown in Figure 10. The discussion above already indicates the quasiparticle heating model gives a reasonable account of the behaviour of each device individually. However, we can go further and relate the values measured between devices. The behaviour shown in Figure 10 is universal, therefore any horizontal offset between the two sets of points in Figure 11 should result purely from the difference in the scaling powers P c,qp of the devices. Given they are of similar design and composition, (74) indicates the ratio of the scaling powers should be proportional to the ratio of n 2 * for the two devices. However, by definition n * is inversely proportional to Q c under the same conditions. Consequently, given the measured values of Q c we should expect P c,tls for the strongly coupled device (blue circles) to be approximately fifteen times that for the more weakly coupled device (green triangles). In turn, this translates into a predicted shift of 12 dB between the two datasets in Figure 11 at similar values of Q i /Q c . As can be seen, this is a very good description of what is actually observed.
6 Power law models 6

.1 Model and method of solution
In previous sections, we considered the effects of TLS and quasiparticle heating. For these specific mechanisms we are able to calculate the functional form of the quality factor with respect to dissipated power, and explain what was seen experimentally. Often, however, we are in the converse situation: we have measured a set of distorted resonance curves and we would like to determine, or at least infer, the functional form of the underlying physical process. In this section, we will describe a power-law model, which helps to develop an intuition for functional forms that produce specific shapes in distorted resonance circles.
Let Q i be decomposed into a power-independent contribution Q other and a contribution Q nl from nonlinear dissipative processes, where Now assume that Q nl has a simple power-law form, dependent on the power P nl dissipated by the nonlinear process: where n and d are positive integers (meaning that the exponent is always a rational number) and P c,nl is a parameter that determines the readout power level at which any nonlinear behaviour is seen. In physical terms, this model describes a process where the dissipation increases with dissipated power; heating, for example.
We have assumed that Q nl depends directly on P nl rather than the total power dissipated in the device, P d , because this condition is expected to be more reflective of real processes. For example, consider the quasiparticle heating model of Section 5. In this case, sub-gap readout photons are able to indirectly break pairs because the power P qp dissipated in the quasiparticle system is reprocessed into pair-breaking phonons. Breaking pairs increases the quasiparticle number, which in turn increases dissipation and decreases Q qp ; hence Q qp decreases with P nl .
However, we would not expect power dissipated in the dielectric or elsewhere to have the same effect (at least in the absence of significant heating). Thus the correct dependence is Q qp (P qp ) in this case, not Q qp (P d ). A counter example, would be if a device is poorly thermally anchored to its refrigerator, and then all of the dissipated power would lead to a change in temperature, and loss.
The value of P nl at a given value of y and P r can be found as follows. From (3) it follows that P nl is related to the total power dissipated in the resonator by P nl = Q i P d /Q nl , and so using (17) and the notation of previous sections, In the steady state, P nl must satisfy (83) for Q i given by (81) and (82). This condition can be expressed as the fixed point problem where ρ = P nl /P c,nl , ρ r = P r /P c,nl and which provides a way of calculating a set of possible values of P nl .
Although we could solve (84) by iteration, as in Section 4.1, here there is a better alternative.
The condition x = h(x) can be rearranged into the form where κ = x 1/d . It can now be seen that the fixed points of h(x) correspond to the nth powers of the roots of the polynomial in κ on the left-hand side of (86). As a result the full set of fixed points can be quickly found using a polynomial root-finding algorithm, which are common in mathematical software packages. It also follows that h(x) has at most 2n + d unique fixed points.
Given the set of fixed points, how can we determine which corresponds to the realised value of ρ? As a first step, fixed points that correspond to unphysical solutions can be eliminated: as a normalised power, ρ must be purely real and greater than or equal to zero. If multiple possibilities remain, which fixed point is realised at the operating point will depend on the stability of the corresponding state and the history of the device. Unstable states will not be realised in practice. If multiple stable states remain, then how the device has been prepared becomes important. For example, when a parameter is being swept, each time it changes the resonator will tend to move to which ever of the new states is closest to its previous state with respect to P nl .
Normally the stability of a state would be assessed in relation to some potential equation in the underlying physical model. This is not possible here, and so we adopt as our stability condition the requirement that the iterative sequence x n+1 = h(x n ) started near enough the fixed point x = x 0 will converge to x 0 as n tends to infinity. The physical motivation is that the iterative process mirrors how the resonator will move to the new operating point when a parameter is changed, or, perhaps more importantly, how it will move back to the state if perturbed from it. The only difference is that, in reality, the process is continuous and limited by the dynamical times of the resonator.
The stability condition is equivalent to requiring |h (x)| < 1 for x 0 − δ − < x < x 0 + δ + for some δ − and δ + > 0, where x 0 is the fixed point and h (x) = dh/dx. As a result, it is impossible for the fixed point to correspond to a stable solution if |h (x 0 )| ≥ 1. Differentiating (85), it is x 0 = 0 and n > d We see that ρ = 0 is never a stable state for finite ρ r if n < d. As far as we can tell, h(x) is a relatively well behaved function for n ≥ d, so we make the assumption it is sufficiently smooth that if |h (x 0 )| < 1 we can also find a small region around x 0 for which |h (x)| is also < 1. Hence the stability conditions become: i) If x 0 = 0, stability requires ii) If x 0 = 0 and n > d then x 0 always corresponds to a stable state. iii) If x 0 = 0 and n = d, then for stability requires iv) If x 0 = 0 and n < d, the corresponding state is always unstable.
Finally, it is useful to consider the limiting behaviour of the model when y = 0 and ρ r → ∞.
This is relevant to measurements of resonance depth as a function of applied readout power. In Q nl ∝ P −1/2 nl (b) Behaviour for n = 1, d = 2 and q other = 10 8 . ρ r = 1/2 4 = 0.0625 for the orange line and quadruples between lines as the circles get smaller, terminating in ρ r = 8 for the yellow line. Note that the range of values of ρ r shown is larger than in (a), i.e. the circle shrinks less rapidly as the applied power is increased. Figure 12: Distorted resonance circles calculated using the model of Section 6.1. In both plots the dashed black line shows a circle of radius 0.5 centered on S 21 = 0.5, which would be the expected behaviour of a highly over-coupled device (q i >> 1).
that we can make the approximation in which case The resulting expression for the depth of the resonance is which is a simple power law. Figures 12a and 12b show simulated resonance 'circles' in the Argand plane resulting from frequency sweeps at different readout power levels, for n/d = 1/3 and n/d = 1/2 with q other = 10 8 . These illustrate typical behaviour when n/d < 1. In both cases, the size of the resonance circle is observed to decrease with applied readout power. At high powers the trajectory becomes distinctly non-circular and it is evident that it would not be possible to fit a single-pole model with fixed q i to the data. Decreasing d is observed to have two effects. First, we see that the rate at which the size of the circle shrinks increases; in Figure 12a the difference in ρ r between neighbouring lines is a factor of four, while in Figure 12b it is only a factor of two. This is consistent with (92). Second, the circle is seen to become more asymmetric. Finally, we draw attention to the fact that at high powers the radius of the circle is reduced at even high values of y. This is a result of the fact the solution ρ = 0 is always unstable for n/d1. As we will see shortly, the behaviour is very different when n/d > 1.

Power law exponent equal to one
In the case n = d, yielding Q nl ∝ P nl , the model has an analytic solution. (86) reduces to the cubic equation with up to three unique solutions. As factored it can be immediately seen that one solution is ρ = 0. The other two solutions, ρ = ρ + and ρ = ρ − , follow by solving the quadratic equation that results when the contents of the parentheses is set equal to zero, yielding Of the three solutions, only ρ = 0 and ρ = ρ + correspond to possible physical states as ρ − is negative for all ρ r and y. Further, ρ + is only positive if ρ r is greater than a threshold power ρ t , where Following the stability analysis of the previous section, (89), it is straightforward to show that ρ t also corresponds to the power threshold for ρ r at which the solution ρ = 0 transitions from being stable state to an unstable state. Hence we might expect ρ = 0 for ρ r ≤ ρ r and ρ = ρ + for ρ r > ρ r . However, strictly we should also check ρ = ρ + corresponds to a stable state, as the resonator may simply become unstable above the threshold power. This requires we demonstrate (88) is always true for ρ = ρ + when ρ + > 0. Applying the triangle inequality to the numerator on the left-hand side of (88) gives The condition ρ + > 0 can be rearranged to show (1 + q −1 other ) 2ρ r − (2y) 2 ≥ (1 + q −1 other ) 2 , which when applied to (96) and (97) implies However, ρ + > 0 also implies (2y) 2 − (1 + q −1 other ) 2 > 2ρ r , so we have succeeded in showing ensuring (88) is true and therefore that ρ = ρ + is stable state for ρ + > 0. Hence, in conclusion we find (100) completely determines how the steady-state behaviour of the resonator changes in response to readout power. Consider the trajectory of S 21 in the Argand plane as a function of y; (100) is used to calculate q t , then the result is substituted into (10). After some rearrangement, it can be be shown that S 21 satisfies below threshold and above it. (101) and (102) both describe circular paths in the Argand plane, as illustrated in Figure 13. The red (solid) circle shows the curve described by (101), for q −1 other = 0 in this case, which is simply the resonance circle that would be traced out by a purely linear device. The green (dashed) circles show the circles described by (102) for different values of ρ r . These are centred on S 21 = 1 and have radius 1/ √ 2ρ r . Figure 13 can be used to understand the trajectory of S 21 of the resonator as y is swept from −∞ to +∞. A device that is below threshold for all y, i.e 2ρ r ≤ (1 + q −1 other ) 2 , will trace out the red circle clockwise, starting at ending at S 21 = 1. If 2ρ r > (1 + q −1 other ) 2 , the resonator will be above threshold for at least some values of y. However, it must start below threshold and so S 21 begins on the green circle, moving clockwise from S 21 = 1. It will continue along the red (solid) circle until the intersection with the circle for the above threshold solution for ρ r ; at this point 2ρ r = (1 + q −1 other ) 2 + (2y) 2 . A further increase in y moves the device above threshold, so S 21 starts to move clockwise around the green (dashed) circle. This gives rise to a sharp point of inflection in the path. S 21 will continue along the green (dashed) circle until it intersects the red (solid) circle again, at which point it drops below threshold again and traces the red (solid) path back to S 21 = 1 at y = ∞. The blue (thick solid) line illustrates the overall path for ρ r = 2, illustrating the characteristic distortion pattern associated with the model.
The analysis above can also be linked back to earlier results. Using (102) and (10), above threshold we have i.e. R = |S 11 | 2 is maintained at a fixed value by feedback. This is exactly as was predicted in Section 3.3.

Experimental observations
We have observed the remarkable behaviour described in Section 6.3 in many resonators. One such device is the resonator with the higher Q c out of the two NbN devices described previously, in Section 5.6. circles; in the data this transition is softer than the model predicts. By using the full power law model we found that this behaviour can be reproduced by using a value of n/d close to but slightly less than one.   device. Given the model, we would expect At a fit of this model to the data, allowing P c,nl to vary, is shown by the blue line in Figure   14b. The agreement between model and data is again very good. However, if anything, the gradient of the data is slightly shallower than the model would predict. This would suggest a value of n/d slightly less than one, which is consistent with the observations of the shape of the resonance 'circles'. As described in Section 5.6, there is strong evidence the underlying physical mechanism is quasiparticle heating in this case. However, it has also been shown that superconducting weak links can play a role in nonlinear behaviour in NbN resonators [48].

Power law exponent greater than one
The behaviour for n/d > 1 is significantly different and much more complicated than the other cases, as illustrated by the plots in Figure 15. These plots show a set of simulated curves for different ρ r for the case n = 3 and d = 2. Figure 15a shows the measured resonance curves in the Argand plane, while Figure 15b shows the measured amplitude of S 21 as a function of the applied detuning.
Below ρ r ≈ 1.1, ρ = 0 is the only solution. For ρ r = 1.25 (the orange line in Figure 15a), we see the formation of a feature near y = 0. When this feature is viewed on a plot of amplitude versus detuning, it appears as a small peak in S 21 at the bottom of the resonance trough ( Figure   15b). As ρ r is increased further this feature opens out and folds back on itself, leading to shapes reminiscent of those for the case n/d ≤ 1, e.g. the purple and brown curves. However, when ρ r is further increased we see a surprising new feature arise where near y = 0 where the device switches back to the state ρ = 0 in the region where dissipation should be strongest. This suggests there is a high power state at which the dissipative state can effectively switch itself off; the rate of increase in dissipation with ρ r is sufficient that the dissipated power actually begins to fall with increased ρ r , so the dissipation cannot sustain itself.
What is not clear from Figure 15a is that the trajectory of S 21 in the Argand plane also becomes discontinuous. This is better illustrated by Figure 15b, which shows |S 21 | as a function of y for ρ r = 2, 4, 8 and 16. As can be seen, there are now step discontinuities in |S 21 | in the wings of the resonance feature. These occur where S 21 departs from the circle for ρ = 0 in the Argand plane.
What may complicate the observation of such behaviour in practice is the fact the state ρ = 0 is also always stable for n/d > 0. As discussed before, which state the device ends up in will depend on how the device has been prepared, e.g. is y or power being swept? Without further detailed analysis it is not possible to say what method, if one exists, is needed to see the unusual behaviour shown.
Similar step discontinuities to those shown in Figure 15a has been observed by Abdo [48] in a set of NbN resonators. In addition, they observe hysteresis around these steps with sweep direction. This latter behaviour can be explained by the resonator switching from a state with ρ > 0 to the one with ρ = 0 at the first transition point, then remaining in this state as it passes through the location of the second discontinuity. They also see the on-resonance transmission initially increase with increasing readout power, then jumping suddenly to a fixed, higher, value; this is consistent with the behaviour predicted by Figure 15 if the device were a transmission resonator. They attribute this behaviour to either weak-link formation in the NbN grain structure or, alternatively, the formation of localised hot spots.

Simultaneous action of several mechanisms
We have considered each nonlinear process acting in isolation, but in some cases, it is the interaction between different processes that determines behaviour. As an example, consider a resonator limited by TLS loss. The results of Section 4 when taken alone suggest that the quality factor can be improved by increasing the readout power so as to saturate the TLS.
However, at some point as the readout power is increased, quasiparticle heating may become significant, resulting in the quality factor decreasing as the power is increased further. The maximum achievable quality factor is determined by the interplay of the two processes, and their relative characteristic power scales. This 'sweet spot' is the operating regime often chosen for best device performance. In extreme cases, we have observed that quasiparticle heating can prevent TLS saturation, and so the quality factor only decreases as power is applied.
Given the importance of these effects, it is valuable to consider how the models presented can be modified to include interactions. The procedure is conceptually straightforward, but computationally involved. A single variable fully characterises the 'state' of the nonlinearity for each process considered: U for the reactive non-linearity and TLS loss, n qp for quasiparticle heating, and P nl for a general physically unidentified nonlinearity. Further, for a particular set of readout conditions the value of this state parameter is found by solving a single equation, often an equilibrium or self-consistency condition: (49), (68) and (82). It is therefore possible to model several processes acting together by solving these equations simultaneously, replacing the Q other term in the individual models by the contributions from other processes. We have developed a convenient conceptual framework for structuring these calculations and easily including new processes. However, space precludes a full description of the method and an exploration of the rich set of behaviours that results. Instead they will be detailed in a companion publication [49].

Extracting behaviour from data
Finally, we indicate how key parametric information can be determined easily from experimental data. It is normally straightforward to record a set of swept-frequency resonance curves at different readout power levels using a VNA or homodyne readout system. The difficulty lies in extracting the underlying nonlinear behaviour when the resonance curves become distorted.
In other words, distorted resonance curves are merely manifestations of the change in the resonance frequency and Q of the underlying simple Lorentzian resonance changing as the readout frequency and power are varied. In principle, we could fit a full nonlinear model of the type described in Section 4-7 and obtain the associated physical parameters, but to do so we need to know the expected nonlinear behaviour in advance. Additionally, as the model becomes more complex so does the fitting process. Section 3, however, motivates a different approach.
The aim is to directly extract the quality factor and resonant frequency at zero realised detuning, for different readout power levels. To do so, we must ensure that the swept-frequency measurements pass through the point of zero realised detuning. This is discussed in Section 3.4, and the process is normally straightforward; for example, if the resonant frequency is known to decrease with applied power, the frequency must be swept downwards when the resonance curves are measured. Next we must identify the point of zero realised detuning in each resonance curve. The rules derived in Section 3.2 can be used to do so: this is as simple as finding the extrema in the transmission gain or point of zero phase shift. Finally, having located the point, the resonant frequency follows from the readout frequency, and the quality factors from the measured S-parameter using (9)- (12). This process is repeated to give the key parameters as a function of applied power.
This method has several attractive features. First, the data and processing needed are straightforward. Second, it is applicable to highly distorted curves, and can therefore be used over wide power ranges. In other words, it is still possible to extract mathematically meaningful, and physically well-defined, resonance frequencies and quality factors, even though the measured resonance curves switch hysteretically, and bear no resemblance to simple Lorentzians. Third, by definition we know the realised detuning at which the parameters were obtained, and this makes it straightforward to convert the applied readout power into the quantities that control the nonlinear behaviour.
As an example, consider a resonator exhibiting a mixture of reactive and dissipative nonlinear behaviour. Assume that the reactive nonlinearity results in Duffing-like behaviour with an increasingly negative frequency shift at high readout powers. To apply our parameter-extraction scheme a set of swept-frequency resonant curves would be recorded at different readout power levels, being careful to sweep the readout frequency downwards in each measurement, which is in the opposite direction to the usual VNA settings. The recorded data would then be processed by first removing any experimental artefacts, such as gain-and phase slopes. The maximum in transmission gain of each resonance curve would be located, checked against phase, and used to calculate values of ν r and Q t /Q c at the corresponding readout power and x = 0 via (33) and (34). The data shown in Figures 11 and 14 was taken in this manner.

Conclusions
Superconducting thin-film resonators are used extensively in many applications. They can take a variety of physical forms, and can be fabricated using a wide range of materials, including proximitised superconducting multilayers. From a device perspective, it is usually assumed that the resonator alone acts as an a near-ideal linear device, exhibiting a perfect response in the form of a Lorentizian notch or peak. In reality this simple behaviour is rarely seen, and non-linear behaviour becomes apparent when the readout power is increased to optimise some aspect of overall device performance.
We have discussed how reactive and dissipative non-linearities can, and do, change the intrinsic response of thin-film resonators considerably, leading to complex behaviour that can mask or degrade the primary device-operation being sought. At its most minor, resonance curve distortion can indicate heating, which may increase the noise generated by the device; at its most significant, resonance curve distortion can be associated with hysteretic switching between different stable states, and the operating point can depend on the order in which the external parameters are changed.
We have shown that most, if not all, of the complex phenomena commonly seen in experiments can be described by a model in which the underlying resonance is a single-pole Lorentizian, but whose centre frequency and quality factor change depending on the energy stored in the resonator and/or the power dissipated in various physical processes. What is seen experimentally are samples taken from an ideal resonance curve that is moving and changing width as external parameters, such as readout frequency and power, are swept. According to this model, it is perfectly proper to refer to, and to measure, the Q of the underlying resonance, even though the swept frequency curves appear highly distorted and perhaps hysteretic. Indeed, there is a great deal of information contained in the parametric dependence of the Q of the underlying resonance, not just in the resonant frequency. In those cases where the resonance curve is highly distorted, the shape of the trajectory in the Argand plane gives valuable insights into the physical processes present.
Kinetic inductance is an example of a reactive nonlinearity, which leads to a shift in the resonance frequency, and eventually hysteretic switching, but the trajectory in the complex plane remains circular. The point of zero detuning is important, and can still be found from zero crossings and stationary points in the transmission and reflection amplitudes, as for a linear device. Two Level Systems in oxides primarily introduce a dissipative nonlinearity. We have described a fixed point method for calculating measured resonance curves, and shown how the trajectory in the Argand plane takes on a characteristic 'tear drop' shape. We have also shown that TLSs cannot produce hysteresis, but they lead to a phenomenon, seen experimentally, where an apparently absent resonance suddenly switches on as the readout power is increased.
Quasiparticle heating leads to a completely different kind of dissipative nonlinearity. Sub-gap readout photons change the energy distribution and number density of quasiparticles, which themselves change the dissipation factor. We have presented a model based of the Rothwarf Taylor equations that gives a simple expression for the internal quality factor as a function of readout power. This formulation leads to a scheme in which resonator dynamics is described by a quartic equation, and we discussed the stabilities of the roots of this equation under different coupling conditions. We find different behaviours in the undercoupled and overcoupled cases, due to the existence of negative and positive feedback respectively in the quasiparticle generation process. Crucially, the trajectory in the complex plane takes on a highly characteristic two-part piecewise circular form. In this case, the points of zero detuning can be identified directly, and the quality factor of the underlying resonance found. Finally, we introduced a generic power law model, where the internal quality factor depends on the dissipated power raised to the power of a rational number. This generic model captures the key features of specific dissipative non-linearities, but additionally leads to insights into how general dissipative processes create characteristic forms of behaviour in the Argand plane. We have found these insights to be highly valuable when interpreting the rich variety of behaviour seen experimentally in different kinds of device.
B Proofs relating to the TLS model B.1 Proof that solution of (50) exists Let I denote the interval [0, 1], which corresponds to the range of values of α, where α = f (α).
We will use square brackets to denote an interval limit that includes the end point and curved brackets to indicate a limit that excludes the end point. For example, the interval [a, b] of x corresponds to a ≤ x ≤ b and [a, b) to a ≤ x < b. Given definitions (51) and (54), it is straightforward to show that for the problem in hand We know ξ tls ≥ 0, 0 ≤ r ≤ 1 and 0 ≤ χ ≤ 1, so df /dα ≥ 0 for any real α. B.2 Proof of convergence of (55) and physical uniqueness of solution for x = 0 We will make use of the following fixed-point theorem: if a function g(x) maps an interval I into itself and |dg/dx| < 1 for x ∈ 1, then g(x) has a unique fixed point x = f (x) that is the limit n → ∞ of the sequence x n = g(x n−1 ) for x 0 ∈ I. This is the one-dimensional form of Banach's fixed-point theorem. In Section B.1 we showed f (α) maps the interval I into itself, so to prove (55) converges we only need to consider the conditions on the derivative.
If x = 0, then χ = 1 for all α. With χ = 1 in (117), we can define three cases to cover all possible physical situations. Case 1 is where ξ tls > r 2 , so df /dα < 1 for all α. If instead ξ tls ≤ r 2 , it is straightforward to show that df /dα < 1 if α is less than regime in which superconducting resonators are employed, the Mattis-Bardeen [51] equations for σ can be approximated by and Here σ n is the normal state conductivity and ∆ 0 the superconducting gap energy at absolute zero. These results can understood physically in terms of a two-fluid model. In the regime considered the dominant charge carriers are the Cooper pairs, which move without scattering and hence do not contribute the real part of the conductivity, σ 1 . Instead, their inertia manifests itself as an inductance like term as described by σ 2 (kinetic inductance). However, some fraction of the Cooper pairs are broken into quasiparticles, either by thermal processes or by external forcing. This loss of Cooper pairs reduces the inductive response, as described by the second term in (123). In addition, the quasiparticles behave electrically approximately like normal state Drude model electrons, leading to a resistance contribution proportional to n qp : (122).
We must now link Q with σ. In the case of a lumped element device, this is relatively straightforward. This is because the superconductor film is normally used in a regime where it is electrically thin and the contribution from geometric reactance is small, so it can be approximated as an impedance Z given by where t is the film thickness and N sq is the length of the superconducting trace expressed in squares. Z constitutes the parallel inductance L and resistance R in (b) of Figure 1. Using the normal result for the quality factor of a parallel tank circuit, we find Making use of (123) we then have Q −1 ∝ n qp , as assumed in (56).
In the case of a transmission line resonator of length l, if γ is the complex propagation constant of waves on the line then it can be shown that Strictly this expression accounts for both Ohmic and dielectric losses; in what follows we will assume there are only Ohmic losses so Q i = Q qp . If the metallisation of a transmission line is superconducting, the series impedance per unit length of line, Z, is modified to where L g is the inductance per unit length in the case of PEC conductors, Z s = R s + iX s is the surface impedance of the superconductors and g is a geometrical factor. The shunt admittance per unit length is the same as the PEC case. In general, Z s is a non-trivial function of σ.
However, for most resonators of practical interest |X s | R s and we may approximate where the factor κ f = gX s /(2πiνL g + gX s ) is normally referred to as the kinetic inductance fraction of the superconducting line. Zmuidzinas [21] has shown that if σ 2 σ 1 then where κ g is a scaling factor that varies in magnitude between 1/3 and 1 depending on the thickness of the film and whether or not it is in the extreme anomalous limit. Combining (126), (128) and (129) we again obtain the approximation Q qp ∝ n qp .