Narrowband Searches for Continuous and Long-duration Transient Gravitational Waves from Known Pulsars in the LIGO-Virgo Third Observing Run

Isolated neutron stars that are asymmetric with respect to their spin axis are possible sources of detectable continuous gravitational waves. This paper presents a fully coherent search for such signals from eighteen pulsars in data from LIGO and Virgo ’ s third observing run ( O3 ) . For known pulsars, ef ﬁ cient and sensitive matched-ﬁ lter searches can be carried out if one assumes the gravitational radiation is phase-locked to the electromagnetic emission. In the search presented here, we relax this assumption and allow both the frequency and the time derivative of the frequency of the gravitational waves to vary in a small range around those inferred from electromagnetic observations. We ﬁ nd no evidence for continuous gravitational waves, and set upper limits on the strain amplitude for each target. These limits are more constraining for seven of the targets than the spin-down limit de ﬁ ned by ascribing all rotational energy loss to gravitational radiation. In an additional search, we look in O3 data for long-duration ( hours – months ) transient gravitational waves in the aftermath of pulsar glitches for six targets with a total of nine glitches. We report two marginal outliers from this search, but ﬁ nd no clear evidence for such emission either. The resulting duration-dependent strain upper limits do not surpass indirect energy constraints for any of these targets.


INTRODUCTION
Continuous gravitational waves (CWs) are quasimonochromatic signals expected to be ever-present in the data of gravitational-wave (GW) detectors such as Advanced LIGO (Aasi et al. 2015a) and Advanced Virgo (Acernese et al. 2015). While the observation of transient GWs from compact binary coalescences has become nearly commonplace (Abbott et al. 2021b), CWs have yet to be detected as of the third observing run (O3). One of the most enticing and commonly sought after sources of CWs is a rapidly spinning, asymmetric neutron star (NS); see Sieniawska & Bejger (2019) and Haskell & Schwenzer (2020) for recent reviews. In the case of a triaxial NS, CW emission occurs at twice the rotation frequency of the star.
Many NSs are observed as pulsars by radio, X-ray or γ-ray telescopes (Lorimer & Kramer 2012). Pulsars can often be timed extremely precisely-in the best cases the arrival of new pulses can be predicted to within tens of nano-seconds. This precision can enable exciting science, including sensitive tests of general relativity (Wex & Kramer 2020), placing constraints on the equation of state of the dense matter inside NSs (Lattimer & Prakash 2004;Kramer & Wex 2009;Ho et al. 2015), and using deviations in timing residuals of pulsars to search for a stochastic gravitational-wave background (Verbiest et al. 2016;Arzoumanian et al. 2020;Goncharov et al. 2021). Detecting GWs from spinning NSs would add a completely new messenger to the study of these extreme objects (Glampedakis & Gualtieri 2018;Haskell & Schwenzer 2020).
In this paper, we search in LIGO-Virgo O3 data (taken in 2019-2020) for CWs from NSs that have been observed as pulsars and precisely timed in either the radio or X-ray bands. We identify eighteen promising candidates for which the observed spin-down (negative frequency derivative) implies indirect limits on CW emission that fall within a factor of 3 of the expected sensitivity of the search. Some other analyses assume the phase of CWs to be locked to the rotational phase of the crust of the star as observed by electromagnetic (EM) telescopes (Aasi et al. 2014;Abbott et al. 2017aAbbott et al. , 2018Abbott et al. , 2019aAbbott et al. , 2020bAbbott et al. , 2021cAshok et al. 2021;Nieder et al. 2020). Here we relax that assumption, and allow for the frequency of rotation, and its derivative, to differ from the EM-observed values by a small factor: the so-called "narrowband" search approach (Abbott et al. 2008;Aasi et al. 2015b;Abbott et al. 2017bAbbott et al. , 2019bAshok et al. 2021;Nieder et al. 2020). We use two separate analysis pipelines, the 5n-vector (Astone et al. 2014;Mastrogiovanni et al. 2017) and the frequency-domain Fstatistic (Jaranowski et al. 1998;Wette et al. 2018b) pipelines, to perform phase-coherent searches for CWs on O3 data over our widened parameter space. Using two separate pipelines allows us to cross-check results between pipelines, compare limits set by the two methods, and increase confidence in any potential detection by requiring both pipelines to see the same signal.
A scenario where GW and EM emission have similar, but slightly different phase evolution is plausible, e.g., when there is differential rotation between the rigid crust and superfluid parts of the star. Possible observational evidence for this comes, for example, from pulsar glitches (Link et al. 1992(Link et al. , 2000Lyne et al. 2000;Fuentes et al. 2017;Haskell 2017): sudden spin-up events that are often followed by an exponential relaxation back to the simple spin-down scenario. In many models (Haskell & Melatos 2015), the superfluid and non-superfluid components build up a lag with respect to one another, and a glitch represents the sudden re-coupling of the two components (Anderson & Itoh 1975;Lyne et al. 2000).
Glitches are also directly relevant for GW searches: first, some of our CW search targets glitched during O3. For these, we perform separate phase-coherent searches covering the parts of O3 before and after the glitches. Second, it is also possible that glitches trigger increased GW emission in their aftermath (van Eysden & Melatos 2008;Bennett et al. 2010;Prix et al. 2011;Melatos et al. 2015;Singh 2017;Yim & Jones 2020). Hence, we also perform additional searches for long-duration transient CW-like signals with durations from a few hours to four months after observed glitches during (or shortly before) O3, covering nine glitches from six pulsars.
The rest of this paper is structured as follows. We discuss our selection of target pulsars, and the EM observations that we use to guide our search in Sec. 2 and the O3 GW data set in Sec. 3. In Sec. 4 we describe our analysis methods. We present the results of the two CW searches in Sec. 5. We then cover the detailed setup and the results of the search for GW emission in the aftermath of pulsar glitches in Sec. 6. In Sec. 7 we conclude the paper with a discussion of the results obtained by all three analyses and their astrophysical implications. Throughout the rest of the paper we will often refer to the analyses using the F-statistic and 5nvector pipelines search over all data as the two "CW" searches, and the search for long-duration transients after glitches as the "transient search."

ELECTROMAGNETIC DATA AND TARGET SELECTION
We start with timing solutions from EM observations of pulsars (also referred to as ephemerides), and search in a narrow band in frequency and spin-down, as further described below in Sec. 4. The widths of the frequency and spin-down search bands are usually larger than the uncertainty in the timing solutions. The observations we use were made in radio and X-ray bands, and were provided by the following observatories: the Canadian Hydrogen Intensity Mapping Experiment (CHIME) (Bandura et al. 2014;Amiri et al. 2021), University of Tasmania's Mount Pleasant Observatory 26 m telescope, the 42 ft telescope and Lovell telescope at Jodrell Bank, the MeerKAT telescope (observations made as part of the MeerTime project, Bailes et al. 2020), the Nançay Decimetric Radio Telescope, the Neutron Star Interior Composition Explorer (NICER) (Gendreau 2016) and the UTMOST timing program with the Molonglo Observatory Synthesis Telescope (MOST) (Bailes et al. 2017;Jankowski et al. 2018;Lower et al. 2020). The Tempo (Nice et al. 2015) and Tempo2 (Edwards et al. 2006) timing packages were used to fit the model parameters and provide timing solutions.
We select our targets for the CW searches in a similar manner to Abbott et al. (2019b), based on the sensitive frequency band of the Advanced LIGO and Virgo detectors and availability of precise ephemerides over the duration of O3 from EM observations. We have analyzed all isolated pulsars, except for one, with a rotation frequency between 10 and 350 Hz and for which the spin-down limit falls within a factor of 3 of the expected sensitivity of the full network over the course of O3. This frequency range includes all the high-value targets identified in Abbott et al. (2021a) for which it could be possible to go below the spin-down limit. Pulsars in this frequency range would produce CWs between 20 Hz and 700 Hz. The only pulsar we do not analyze that satisfies these criteria is PSR J0537-6910, which was analyzed in detail using a narrowband approach to search for r-mode emission in Abbott et al. (2021d) and using a targeted approach in Abbott et al. (2021c). Searches lasting up to 120 days during inter-glitch periods for this pulsar are performed by the post-glitch transient search presented in Sections 4.4 and 6.1.
We estimate the expected sensitivity as where S(f ) is the power spectral density as a function of frequency for the LIGO Hanford, LIGO Livingston and Virgo detectors (H1, L1 and V1 respectively) and T obs is the observing time assuming 11 months of data with a duty cycles of 75%, 77% and 76% respectively. The factor Θ ∼ 30 encodes the scaling of typical narrowband searches but is a function of the number of templates employed in the analysis (Astone et al. 2014).
On the other hand, the spin-down limit is an indirect upper limit (UL) on the GW amplitude assuming all of a pulsar's rotational energy loss comes in the form of GWs (Jaranowski et al. 1998;Prix 2009 where I 38 is the star's moment of inertia in units of 10 38 kg m 2 , f spin is its rotation frequency, and |ḟ spin | is the absolute value of its spin-down rate (Abbott et al. 2019b). We have computed spin-down limits according to the most recent distance estimates given in the ATNF catalog (Manchester et al. 2005), version 1.64, and extrapolating the rotational frequencies and spindown rates to O3. For several pulsars, we have used more recent distance estimates from the literature. In Table 1 we present a list of targets along with the ranges of GW frequency and spin-down parameters and corresponding number of templates covered for the two pipelines used to conduct the CW searches. We discuss our parameter range choices for each pipeline in Sec. 4. The observatory yielding the timing solution we use for each source is noted in the far right column.
For this reason, we perform two separate searches, before and after the estimated glitch epoch, which are identified in Table 1 with suffixes "bg" and "ag" for "before glitch" and "after glitch" respectively.
For PSR J0534+2200, we use O3 data until the last observation before the glitch. The analysis after the glitch uses data from ten days after the glitch epoch, accounting for the estimated relaxation time, until the end of O3.
For PSR J1105-6107, we only search after the glitch since the glitch occurs nearly at the start of the run. The after-glitch search uses data from two days after the estimated glitch until the end of O3. For PSR J1813-1749, we perform two separate searches: one fully-coherent search across the full O3 duration, assuming the pulsar did not glitch, and a search before the glitch. No search after the glitch is performed since the glitch occurs near the end of the EM timing model (although a post-glitch transient search is conducted assuming the timing model remains valid within uncertainties; see below). Further details about these glitches will also be discussed in Sec. 6.
For the post-glitch transient search, we have used information from the glitch catalogs maintained at ATNF (Hobbs et al. 2021) and Jodrell Bank (Espinoza et al. 2011;Basu et al. 2022;Shaw et al. 2021). We have selected six pulsars with GW frequencies f > 15 Hz and with glitches observed during (or shortly before) O3: PSR J0534+2200, PSR J0537-6910, PSR J0908-4913, PSR J1105-6107, PSR J1813-1749, and PSR J1826-1334. Another pulsar within our frequency band of interest, PSR J2021+3651, was observed to glitch during O3 by Jodrell Bank, but the glitch time uncertainty is too large to make our transient search setup feasible. As mentioned before, for PSR J1813-1749 it is not certain if a glitch actually occurred (Ho et al. 2020a), but we perform an opportunistic search here with its assumed parameters.
The ephemerides for PSR J0534+2200 and PSR J1826-1334 were provided by Jodrell Bank; a detailed discussion of the PSR J0534+2200 glitch is given in Shaw et al. (2021). Ephemerides for PSR J0908-4913 and PSR J1105-6107 are derived from UTMOST radio observations; these are discussed further in Lower et al. (2019Lower et al. ( , 2020. A potential ambiguity in timing solutions from periodically-scheduled surveys like UTMOST, particularly affecting glitch size estimates, was discussed by Dunn et al. (2021), but for these two targets it has been confirmed that the provided values are correct. For PSR J1813-1749 and PSR J0537-6910 we use NICER X-ray observations, as reported in Ho et al. (2020a,b) and Abbott et al. (2021c). Details for all targets will be listed in Sec. 6.1.

GW DATA
We analyze GW strain data taken at the LIGO Hanford Observatory (H1), LIGO Livingston Observatory (L1) and Virgo (V1), during their O3 observing run. O3 consisted of two separate sections, separated by a month-long commissioning break. O3a ran from 2019 April 1, 15:00 UTC until 2019 October 1, 15:00 UTC. O3b ran from 2019 November 1, 15:00 UTC, to 2020 March 27, 17:00 UTC. See Buikema et al. (2020) and Acernese et al. (2019) for general descriptions of the performances of the LIGO and Virgo detectors respectively during O3. The calibration of this data set and its uncertainty budget are described in Sun et al. (2020Sun et al. ( , 2021 and Acernese et al. (2022a).
Data preparation for all searches starts with removing times of large transient noise whose cause is known, referred to as "CAT1" vetoes (Davis et al. 2021;Acernese et al. 2022b) from calibrated strain data.  search discussed in Sec. 4.3 and 5v stands for the 5n-vector method discussed in Sec. 4.2. The suffixes "bg" and "ag" refer to "before" and "after" glitch respectively, while "full" corresponds to the full run in the case of PSR J1813-1749. This procedure removed 0.7 days out of 246 days of H1 data, 2.2 days out of 256 days of L1 data, and 0.46 days out of 251 days of V1 data. The data are then cleaned to remove some known monochromatic or quasi-monochromatic detector artifacts that can be effectively monitored by external sensors, like signals injected for calibration, or contamination from power mains (Davis et al. 2019;Viets & Wade 2021;Sun et al. 2020Sun et al. , 2021Acernese et al. 2022a,b). The H1 and L1 data near the 60 Hz power mains frequency are also cleaned using a non-linear filtering method outlined in Vajente et al. (2020).
The data from L1 and H1 are then "gated" to remove further large transient artifacts, which removes an additional 1.0 days of data from L1 and 0.62 days of data from H1 (Zweizig & Riles 2021;Davis et al. 2021).
There are some differences in how each search pipeline uses these data, as will also be discussed in detail in Sec. 4. The 5n-vector search creates Short Fourier Databases (SFDBs) from the time domain data. SFDBs consist of a collection of short-duration (1024 s long) fast Fourier transforms (FFTs), overlapped by half and Tukey-windowed with a window shape parameter of α = 0.001. During the construction of the SFDBs, an extra time-domain cleaning (Astone et al. 2005 , in addition to those discussed in the previous paragraph) is applied to identify delta-like time-domain disturbances. The F-statistic and transient pipelines create 1800 s Tukey-windowed short Fourier transforms (SFTs) with a window shape parameter of α = 0.001, and then extract a narrow frequency range around the frequency band for each pulsar.
Additional narrow lines with identified instrumental origin are listed in Goetz et al. (2021) and Piccinni et al. (2021); we use these for pipeline-level cleaning or postprocessing vetoes, as discussed in later sections.

Signal model
With all three pipelines, we search for quasimonochromatic GW signals with fixed intrinsic amplitude at approximately twice the pulsar rotation frequency, f ≈ 2f spin , allowing for some deviation in the narrowband search approach. The general frequency evolution model for such GWs, following the standard model for a pulsar's f spin as a function of time t at the pulsar, is For most pulsars, no glitches have been observed during O3, and hence only the first term for the long-time spin-down evolution is relevant: with a reference time T ref and up to N frequency derivatives f (k) . We will also useḟ ,f and ... f for the first three derivatives in this paper. The three search methods in this paper have different maximum N they can handle, as discussed in Appendix A.
For glitching pulsars, the additional term is (or multiple such terms for pulsars with multiple glitches), where Θ(x) is the Heaviside step function, T gl is the glitch time, ∆f (k) gl with M ≤ N are the permanent jumps in the frequency and its derivatives, δf R is the part of the frequency jump that relaxes back, and τ R is the relaxation time. Relaxation is not necessarily observed for all glitches. The glitch term is typically a small correction on top of f Taylor (t ).
Building on this frequency evolution model, the full GW signal received by a detector is where F +,× (t; α, δ, ψ) is the detector response to + and × polarized GWs received at the detector at time t, coming from right ascension α and declination δ with polarization angle ψ (Jaranowski et al. 1998). Timing corrections for translating from t to t are discussed below.
The amplitude of each GW polarization can be written as where ι is the inclination angle of the pulsar, φ 0 is the initial phase, h 0 is an overall dimensionless strain amplitude as discussed below, and Φ(t; {f (k) }, α, δ) is the integrated phase from the timing model of Eq. (3).
To convert from t at the source to t at the detector, several timing corrections typically need to be addressed: proper motion of the source, binary orbital motion, and the motion of the detectors around the solar system barycenter. Proper motions between the pulsar and the solar system can usually be ignored (Covas 2020) for these searches, and in this analysis we do not cover sources in binary orbits as in, e.g., Abbott et al. (2021aAbbott et al. ( ,e, 2022; Ashok et al. (2021). However, the sky position dependent correction between solar system barycenter and detector frame is a crucial part of the analysis methods described in the following, and is implemented in all cases (Jaranowski et al. 1998).
For GWs from a non-axisymmetric star, the amplitude h 0 can be written as where d is the distance to the NS, f is the GW frequency, and I zz is the principal moment of inertia. The ellipticity, , is given by In summary, our CW signal model depends on two sets of parameters, the frequency-evolution parameters λ = {f (k) , α, δ}, and the amplitude parameters A = {h 0 , cos ι, ψ, φ 0 }. The signal model for the longduration transient search is also based on the one discussed here, but with an additional window function in time, as will be discussed below in Sec. 4.4. For both CW pipelines, we search over a narrow range of GW frequencies and first spin-down terms centered on their respective central estimates from the source ephemeris: f = 2f spin (1 ± κ) andḟ = 2ḟ spin (1 ± κ), where we take κ = 2 × 10 −3 . This value is intended to allow for physical offsets between the frequency of EMobserved pulsar rotation and the GW emission process, as discussed in the introduction and in more detail in Abbott et al. (2008Abbott et al. ( , 2019b. The frequency and spin-down changes due to glitches offer evidence of how large of a physical offset we might expect. While glitches with ∆f (1) gl /ḟ spin 2 × 10 −3 are observed occasionally, this value of κ encompasses most glitches that have been observed (see, e.g., Fuentes et al. 2017;Lower et al. 2021) and increasing this value significantly to encompass all observed glitch sizes would not be computationally feasible. The full widths we allow for both parameters, ∆f = 4κf spin and ∆ḟ = 4κḟ spin , are shown in Table 1 for each target we consider. Higherorder frequency derivatives are handled differently by each pipeline, which we discuss in the next two subsections and in Appendix A.

5n-vector narrowband pipeline
The 5n-vector narrowband pipeline uses the 5n-vector method of Astone et al. (2010Astone et al. ( , 2012 to search for CWs in a narrow frequency and spin-down range around the source ephemerides. This method searches for the characteristic modulations imprinted by the Earth's rotation on a GW signal, splitting it into 5 harmonics. The search is applied for the n detectors present in the network. For more details on the implementation of the method we use here, see Mastrogiovanni et al. (2017).
The starting point of the analysis is a collection of 1024 s long FFTs built as explained in the previous section. For each target, we extract a frequency band large enough to contain the Doppler modulation of every template that we consider. When higher-order frequency derivatives are used to fit the EM data, the corresponding GW spin-down terms (includingf ) are fixed to twice the values provided in the source ephemerides.
For each target, we then correct for the Doppler modulation in the time domain using a non-uniform downsampling to 1 Hz. Using this time-series, we then apply two matched filters (for the two CW polarizations) and construct a detection statistic S combining coherently the matched filter results from all detectors (Mastrogiovanni et al. 2017) for each template in the f andḟ space. The detection statistic is defined as where A +/× are the Fourier components of F +/× (t;n, ψ = 0) and H +/× are GW amplitude estimators built from the Fourier components of the data and A +/× . The template bank bin size in frequency is given by the inverse of the time-series duration and the spin-down bin size is given by its inverse squared. The final step is to select the local maximum of the detection statistic in every 10 −4 Hz band over all spindown values considered. Within this set of points in the parameter space, we select as outliers those with a p-value (defined as the tail probability of the detection statistic noise-only distribution) below a 1% threshold, taking into account the number of templates applied in the f -ḟ space. The noise-only distribution of the detection statistic used to estimate the threshold for candidate selection is built using all the templates excluded from the analysis in each 10 −4 Hz band. The detection statistic value corresponding to the threshold for candidate selection is extrapolated from the noise-only distribution using an exponential fit of the tail.
Finally, to compute the 95% confidence level ULs h 95% 0 on the CW amplitude in the case of no detection, we perform software injections with simulated GW signals in each 10 −4 Hz sub-band to estimate the value of h 0 at which we achieve 95% detection efficiency at a falsealarm probability of 1%.

Frequency-domain F-statistic pipeline
We also use the frequency-domain F-statistic pipeline to search for CWs in a narrow frequency and spin-down range around the source ephemerides. The F-statistic is a matched filter which is maximized over the amplitude parameters A of the quasi-monochromatic signals (Jaranowski et al. 1998;Cutler & Schutz 2005); see Tenorio et al. (2021a) for a review covering its applications.
For each target we search over the range of GW frequency f and spin-downḟ , with matched-filter templates chosen using the multi-detector F-statistic metric (Prix 2007;Wette et al. 2008), which we call through the LALSuite (LIGO Scientific Collaboration 2021) program lalapps Weave (Wette et al. 2018b). If the second spin derivativef spin is non-zero, we also search over the rangef = 2(f spin ± 3σf spin ) where σf spin is the 1 σ uncertainty given in the source ephemeris. Higher order frequency derivatives, up to f (4) , are fixed to the EMmeasured values. Because of constraints on the pipeline infrastructure, if the source ephemeris for a target includes derivatives above f (4) , we refit the arrival times with fewer frequency derivatives. This results in slightly different ephemerides being used for this search, and for the 5n-vector search. We discuss targets for which this is the case in Appendix A.
We place templates with a maximum matched-filter mismatch from a putative source of 0.02, which leads to a very dense grid compared to the one used by the 5nvector search. This can be seen by comparing the final two (non-reference) columns of Table 1. The analysis uses a set of 1800 s Tukey-windowed SFTs over the full data span for all three detectors, and produces a detection statistic, which we denote 2F, for each matchedfilter template. See Sec. 3 for a full discussion of data processing and data quality cuts made before generation of the SFTs.
Following Tenorio et al. (2022), we construct 10 4 randomly-chosen batches of templates from our search results for each pulsar, and fit a Gumbel distribution to the distribution of the maxima of these batches. We can then propagate this Gumbel distribution based on the total number of templates to estimate the tail probability (p-value) of the largest template across the full search for that target. We outline our implementation of this method more fully in Appendix B.1.
If any templates have a p-value of less than 1% and corresponding large 2F values, we perform follow-up studies of these templates. A list of frequencies of known narrow spectral artifacts, f l , and their nominal widths, w l , are collated for all three detectors (Goetz et al. 2021;Piccinni et al. 2021). If |f l − f large | < w l , where f large is the frequency of the large-2F template, then we veto that template. We also look at the single-detector 2F value calculated for each detector. If the individual 2F value for such a template in a single detector is larger than the 2F value calculated using the whole network, then we also veto that large-2F template (Keitel et al. 2014;Leaci 2015). Any remaining large-2F templates with p < 1% are then flagged for further follow-up, and could be considered candidates for detection (subject to further studies).
In the absence of a detection, we set ULs at 95% confidence, h 95% 0 , for each target using the software-injection scheme set out in, e.g., Abbott et al. (2007).

Post-glitch transient search
The search for long-duration transient CW-like signals from glitching pulsars is motivated by the idea that some of the excess energy liberated in a glitch could correspond to transient changes in the quadrupole moment of the star and be radiated away in GWs ( To search for such signals, we use the transient Fstatistic method introduced by Prix et al. (2011) and previously applied to LIGO O2 data in searches for GWs from glitches in the Crab and Vela pulsars (Keitel et al. 2019). The idea is to search for long-duration transients that are "CW-like" in the sense of following the standard quasi-monochromatic CW signal model from Sec. 4.1 with only an additional window function (t; t 0 , τ ) in time applied: where the transient parameters T consist of the window shape, signal start time t 0 , and a duration parameter τ . As in Keitel et al. (2019), we limit ourselves here to the simplest case of rectangular windows, meaning the signal is exactly a standard CW that starts at time t 0 and is cut off at time t 0 + τ , with no additional amplitude evolution. Some form of amplitude decay would be expected for a realistic signal linked to post-glitch relaxation (Yim & Jones 2020). As demonstrated in Prix et al. (2011) and Keitel et al. (2019), the signal-to-noise ratio (S/N) loss from using rectangular windows in a search for exponentially decaying signals is mild, while using exponentially windowed search templates would mean a much higher computational cost (Keitel & Ashton 2018 Figure 1. The red solid, blue dashed, and purple dotted curves show the expected sensitivities for H1, L1, and V1 respectively, using Eq. 1. The blue pentagons indicate the median 95% CL ULs from the 5n-vector search across all 10 −4 Hz sub-bands for each source. The black crosses indicate 95% CL ULs from the F-statistic search, which are set across the full search range for each target. The orange triangles indicate the spin-down limit, h sd , with error bars that reflect uncertainty in the distance to each source. In a few cases the error bars are smaller than the size of the markers. We discuss and compare these limits in more detail in Sec. 7. We do not show uncertainties on ULs here, although we discuss uncertainties due to the UL-setting method as well as calibration uncertainties in Sec. 7. The data for this figure are available online (LIGO Scientific Collaboration et al. 2022).
lalapps ComputeFstatistic v2 program which was used before, e.g., in searches for CWs from supernova remnants (Aasi et al. 2015c;Abbott et al. 2019c;Lindblom & Owen 2020). For each target, we perform a search at fixed sky location over a grid in f (k) parameters, with the number of spin-down terms depending on the pulsar ephemerides, going up to at most a ... f term. At each grid point λ, the standard multi-detector F-statistic is calculated over the maximum duration of interest. Intermediate per-SFT quantities from this calculation, the so-called F-statistic atoms, are kept. As described in Prix et al. (2011), partial sums of these atoms are evaluated in a loop over a range of {t 0 , τ }. The resulting matrix of F mn = F(t 0m , τ n ) statistics gives the likelihoods of transient CW-like signals in each time range. We marginalize F mn over uniform priors on t 0 and τ , obtaining the Bayes factor B tS/G for transient CW-like signals against Gaussian noise as our detection statistic. As demonstrated by Prix et al. (2011), B tS/G improves detection efficiency compared to taking the maximum of F mn , and as demonstrated by Tenorio et al. (2022), it also produces cleaner background distributions on real data. The computational cost of these searches scales linearly with the number of λ templates and with the product of the numbers of t 0 and τ values, and is dominated by the partial summing steps over the latter two parameters. See Prix et al. (2011) and Keitel & Ashton (2018) for details.
As this is only the second search thus far for longtransient GWs of this type (after Keitel et al. 2019), we describe its practical setup in more detail in Sec. 6.1. Table 2. UL results on CW strain from the 5n-vector (5v) and F-statistic (F) pipelines. Note-We show the ULs on strain amplitude at 95% confidence for both pipelines, h 95 0 , the spin-down limit for each target h sd along with the implied limit on ellipticity for each pipeline, and the limit on ellipticity implied by the spin-down limit sd . Finally, we also show the ratio of the 95% confidence UL and the spin-down limit. We surpass the spindown limit for seven targets: PSR J0534+2200, PSR J0835-4510, PSR J1105-6107, PSR J1813-1749, PSR J1913+1011, PSR J1952+3252, and PSR J2229+6114. We do not show uncertainty on ULs here, although we discuss uncertainty due to the UL-setting method as well as calibration uncertainty in Sec. As described below, we find no significant candidates, although both the F-statistic and 5n-vector CW searches find outliers requiring further follow up after the vetoes in Sec. 4 were applied. Hence, we set observational ULs on the GW strain from each target, constraining GW emission below the spin-down limit on seven of the eighteen target pulsars. All UL results are shown in Fig. 1, and listed in Table 2. We give details of the results from both pipelines, including outliers that were followed up, in the rest of this section.

CW SEARCH RESULTS
When discussing ULs in the following section, it should be noted that physically meaningful constraints on the GW energy emission are set when the ULs are lower than the spin-down limit (i.e., when the limit has been surpassed ). Ellipticity constraints for pulsars whose spin-down limit is not surpassed would imply a |ḟ spin | higher than the one observed.
Finally, distance estimates used for the spin-down limits are given in Table 2. These are either from the ATNF pulsar catalogue (based on dispersion measures), or from the literature. For the pulsar PSR J1813-1749, different models of the electron density in the Galaxy yield different distance estimates (Camilo et al. 2021). We have used the more optimistic estimate d = 6.2 ± 2.4 kpc, although a different model gives a more pessimistic estimate of d = 12 kpc.  The 5n-vector pipeline found outliers only for two targets: PSR J1828-1101 and PSR J1838-0655. Figures 2 and 3 show the distribution of outliers in the searched frequency band (marginalized over the spindown plane), along with the power spectral density (PSD) for each of the three detectors in that band. For PSR J1828-1101 we find 10 outliers around 27.7116 Hz with p-values between 10 −5 − 10 −2 , and for PSR J1838-0655 we find 13 outliers around 28.3134 Hz with p-values between 10 −3 − 10 −2 . The nominal p-values we report here assume underlying Gaussian noise, an assumption that can break down in the presence of instrumental artifacts.
These outliers are all due to noise disturbances identified in H1 data. For PSR J1828-1101, the PSD for each of the three detectors around the outliers are shown in the top panel of Fig. 2  are produced by a known noise line at 27.71 Hz with width of 0.02 Hz caused by a resonance in one of the suspended optics at H1. For PSR J1838-0655 the outliers are due to an identified broad noise line of unknown origin (Goetz et al. 2021). The top panel of Fig. 3 shows the PSD for each of the three detectors around these outliers. As can be seen, various broad noise disturbances are present in H1 data in the frequency band of PSR J1838-0655. Even if the origin of these noise lines is unknown, we can confidently exclude them as astrophysical CW signals since, if they were real, they would have been present also in the L1 data which is more sensitive. In Sec. 5.2 we also perform a dedicated follow-up for the candidates due to these noise disturbances showing that they are only visible in H1 data.
The median ULs across the 10 −4 Hz sub-bands, h 95% 0 , for each pulsar are shown in column 3 of Table 2 and plotted with blue pentagons in Fig. 1  The orange markers indicate the outliers found. We were unable to veto this explicitly due to the known line, but after follow-up detailed in Appendix C it is clear that these outliers are caused by an artifact that affects the first two weeks of O3.
8. We discuss and compare these limits to those of the F-statistic pipeline, to previous searches for the same targets, and to the spin-down limits in Sections 5.3 and 7.

Frequency-domain F-statistic pipeline
The F-statistic pipeline identified outliers with pvalue < 0.01 that could not be vetoed with the methods described in Sec. 4.3 for two targets, PSR J1813-1749 bg and PSR J1838-0655. As described in Sec. 4.3, a first round of vetoes was made where we rejected outliers that were close to known narrow spectral artifacts. This resulted in vetoing outliers in PSR J1828-1101, which are shown in the bottom panel of Fig. 2 for completeness and comparison with the 5n-vector search. We also vetoed outliers with a 2F value that was larger running on data from a single detector than when running on the full network. In Figs. 3 and 4 we show the PSD in the top panel in the search band for the remaining outliers for PSR J1838-0655 and PSR J1813-1749 bg respectively, while the search pipeline statistic, the F-statistic discussed in Sec. 4.3, is shown in the bottom panel(s).
The outliers for PSR J1813-1749 bg are associated with an artifact that contaminates the first two weeks of O3. While nearby, it is outside the nominal width of the line specified in Goetz et al. (2021), which can be seen in Figure 4, and is therefore unlikely to be explicitly caused by that line. However, in Appendix C, we perform a follow-up study that shows that the outliers in this band increase in significance when running on data only from the first two weeks of the run. This behavior is inconsistent with a CW signal, and so we veto these outliers.
The remaining outliers from PSR J1838-0655, shown in Fig. 3, are in the same frequency range as those for the 5n-vector pipeline. These outliers are all near a clear disturbance in H1, and the 2F value for all of them is larger when running only on H1 data than when running only on L1 data. However, L1 is significantly more sensitive than H1 in the frequency bands of interest. In Appendix C we detail a follow-up study on the remaining outliers. We show, based on a realistic set of injections, that for a true signal it is very unlikely to have a larger 2F value when running only on H1 data than when running only on L1 data, due to the difference in the sensitivity for the two detectors.
One pulsar, PSR J1856+0245, was in a frequency band that had significant instrumental disturbances, and so we could not reliably estimate the background distribution, and do not report ULs (or search results) for it. We discuss this case in Appendix B.1.
Given that there are no remaining candidates that exceed our follow-up threshold, we set ULs at 95% confidence on the strain amplitude, h 0 , of a potential CW signal produced by our targets. The UL, calculated across the full parameter space, h 95% 0 , for each pulsar is shown in column 4 of Table 2 and plotted with black crosses in Fig. 1. The ratio h 95% 0 /h sd is shown in column 7 of Tab. 2, and the limit on the ellipticity of each pulsar is shown in column 9. We discuss these limits in detail, and in the context of indirect limits and the 5n-vector results, in Section 7.

Comparison between pipelines
The ULs from the F-statistic pipeline are, on average, 19% higher than those set by the 5n-vector pipeline. They are not directly comparable, however, as the 5nvector pipeline results are the median across 10 −4 Hz sub-bands, while the F-statistic limits are set across the whole parameter space for each pulsar. A more equitable comparison is to compare the largest UL across all subbands for the 5n-vector, which is on average 9% lower than the ULs set by the F-statistic pipeline.
A direct comparison would be very involved due to the differing pre-treatment of the data and methods used by the two pipelines. However, there are a few clear methodological differences, which could explain the difference in ULs, which we highlight below.
The most obvious difference is in the effective "trials factors" used in generating the thresholds for followup, which are then used for the threshold of "detection" when performing the software injections used to calculate ULs. Looking at a larger number of templates reduces the effective mismatch between templates and a signal, but also increases the likelihood of finding a larger detection statistic due to a noise fluctuation. The F-statistic search uses significantly more templates than the 5n-vector pipeline in nearly all cases. Fitting the relationship between the ratio of templates used in the two pipelines to the ratio of their ULs, we find that for an equal number of templates, the F-statistic search would have limits ∼ 5% higher than the maximum UL compared to the 5n-vector search. This is consistent with the scaling found in Astone et al. (2014). Inevitably, these differences are a reflection of the full range of choices made for the two pipelines, which also include how thresholds for follow-up are set (e.g., the method outlined in Appendix B.1 versus extrapolating the tail of the background distribution), the cleaning procedures used in preparing the data, as well as the densities of the templates used to perform the searches.

Search setup
For all six targets in the post-glitch transient searches, we use SFTs of duration 1800 s (as discussed in Sec. 3) from both LIGO detectors and Virgo, except for PSR J0908-4913 and PSR J1826-1334. The frequency bands searched for these two targets are affected by broad-band instrumental disturbances in Virgo. For this analysis, we additionally clean narrow instrumental lines with known instrumental origin (Goetz et al. 2021;Piccinni et al. 2021) before the searches by replacing affected SFT bins with Gaussian noise matching the PSD in the surrounding range.
Observed parameters and search setups for the five pulsars that have a single known glitch during our time of interest are summarized in Table 3. For the "Big Glitcher" PSR J0537-6910 (Middleditch et al. 2006;Ho et al. 2020b), which was previously considered in searches for persistent CWs in the inter-glitch periods by Fesik & Papa (2020) and Abbott et al. (2021c,d), we perform four separate searches following four observed glitches as illustrated in Fig. 5. Corresponding parameters are listed in Table 4.
We derive the ranges of frequency evolution parameters λ and transient parameters T to be covered in each search from the ephemerides for that pulsar, the uncertainty δT gl in the glitch time T gl , and the availability of GW data relative to T gl . We set a maximum signal duration of τ max = 120 days to be searched for each glitch, guided both by observed glitch recovery times being typically on shorter timescales (Lyne et al. 2000;Espinoza et al. 2011;Yu et al. 2013;Yim & Jones 2020;Lower et al. 2021) and by computational cost constraints (Keitel & Ashton 2018). Signal start times t 0 cover T gl ± max(δT gl , 1 d) and the reference time T ref for the frequency and spin-down parameter grids (again assuming GW emission near twice the rotation period) is set to the earliest SFT timestamp in this range. On each side of the nominal f (k) values extrapolated to T ref , the grids cover the maximum of (i) 0.001 times the nominal value (this is one quarter of the width used by the CW searches as described in Sec. 4.1), (ii) the 1σ fit uncertainties from the ephemerides propagated to T ref , and (iii) the glitch step size in that parameter. We place the grids using the metric from Prix (2007) and the template placement algorithm from Wette (2014), as implemented in lalapps ComputeFstatistic v2 and lalpulsar (LIGO Scientific Collaboration 2021), with a mismatch parameter of 0.2. The algorithm can add some additional grid points outside the nominal ranges to reduce mismatch near the edges. But for targets witḧ f and higher terms in their timing solution, we strictly limit template placement in all spin-down terms to the nominal range, to limit computational cost. Our resolution in T is dt 0 = dτ = 2 T SFT = 3600 s. Due to implementation details of the F-statistic (Prix 2015), the minimum duration is τ min = 2 T SFT .
In some special cases we modified the default setup to match the availability of GW data, explaining some apparent anomalies in the configurations listed in Tables  3 and 4. The glitch from PSR J0908-4913 happened in October of 2019, during the maintenance break between O3a and O3b, a month for which no GW data is available. We still search for the standard set of τ up to 120 days after this glitch, but fix t 0 to the first available SFT timestamp after the break -any templates with a different t 0 in the regular band around the glitch time would yield identical detection statistics -and are insensitive to shorter-duration signals in this case. For PSR J1826-1334, O3 ended 84 days after the nominal glitch time, so we shorten the analysis accordingly.
For PSR J0537-6910, as illustrated in Fig. 5 and following the numbering from Ho et al. (2020b), we perform searches covering the durations from each of glitches 5-7 until the next glitch, and for glitch 8 we Note-This table contains information about the five pulsars with a single post-glitch transient analysis. See Table 4 for the special case of PSR J0537-6910. Distances are taken from the ATNF catalog unless indicated otherwise with a footnote, and all other parameters are derived from the ephemerides as discussed in the text. ∆f gl , ∆ḟ gl and ∆f gl are the glitch step sizes. For each search, the reference time T ref is chosen to match the earliest SFT data timestamp in T gl ± min(∆T gl , 1 d), and the GW frequency and spin-down parameters f ,ḟ etc. are extrapolated to this time. ∆f , ∆ḟ etc. refer to their corresponding search band widths. N λ is the total number of frequency-evolution parameters covered by each search. search until the end of O3. Glitch 5 happened in late March of 2019, shortly before the start of O3, and hence, as for PSR J0908-4913, t 0 is fixed to the data start time.

Results
To determine whether or not there are interesting outliers in the results of each search, for most targets we again follow the procedure described in Tenorio et al. (2022) to empirically estimate the distribution of the expected highest outlier. For two targets, PSR J1105-6107 and PSR J1826-1334, the number of templates is too low for this method to deliver robust results. Instead, for these targets we use spatial off-sourcing  to estimate the background distribution and set a threshold. The specific implementations for both cases are explained in Appendix B.2.
For eight of the nine searches, there are no candidates above threshold. The background distribution from the search after the eighth glitch of PSR J0537-6910 is less clean and requires an additional notching step, as discussed in Appendix B.2. We then find two marginal outliers at about 2 and 1 standard deviations respectively from the mean of the estimated extreme-value distribu-  Table 3 for details on the listed parameters.
tion (4% and 14% tail probability). The best-fit signal durations are about 60 and 45 days, respectively. Appendix D lists the full parameters of these outliers and describes additional analyses to follow them up. In summary, we cannot associate either outlier with any known spectral artifact or single-detector feature in the data. They pass several consistency tests and vetoes, and behave as weak CW-like candidates are expected to in a fully phase-coherent targeted follow-up (Dupuis & Woan 2005;Pitkin et al. 2017) over the corresponding subsets of O3 data. Nevertheless, we do not consider these as promising detection candidates for two main reasons: first, they would imply a signal with far higher amplitude than allowed by the estimated glitch energy for this target (see discussion of ULs below). Second, their significance is low, compared also with the 1% tail probability thresholds used by the two CW pipelines in this paper. This is reinforced by follow-up 5n-vector fixed-duration analyses, which recover local S-statistic peaks at the parameters of each outlier, but not as the loudest candidates in the band. Still, we conservatively use a higher threshold (from including the outliers in the background estimate with no notching) in the UL procedure for PSR J0537-6910 glitch 8.
Having identified no convincing detection candidates above threshold for any of the targets, we set ULs on the GW strain after each glitch. As in the search, we assume CW-like signals that are constant over the signal duration τ , and we evaluate the ULs as a function of τ . To do so, we simulate signals following Eq. (11) with λ parameters and t 0 drawn uniformly from within the search ranges, several discrete steps in τ and h 0 , and the remaining amplitude parameters randomized over their natural ranges. We add these signals to the original SFTs without the spectral cleaning (as described in Sec. 6.1), redo the cleaning step to ensure any signals falsely dismissed due to cleaning are properly accounted for, and perform small searches over a portion of the original λ grid (0.1 mHz in f and the closest spin-down points) and the full T range. The h 95% 0 ULs are obtained from a sigmoid fit to the recovery fraction p det above the B tS/G threshold from the main search, as a function of injected h 0 at fixed τ . Results for all targets are shown in Fig. 6.
We compare these empirical ULs to an indirect constraint derived in Prix et al. (2011) according to the two-fluid model of NS glitches (Lyne et al. 2000). In this model, the excess superfluid energy liberated in a glitch gives a UL to the total GW energy, which corresponds to a τ -dependent strain UL Strain ULs h 95% 0 obtained for the six glitching pulsars targeted with the long-duration transient search, as a function of signal duration τ (dashed curves with points representing the injection set). The four searches following glitches in PSR J0537-6910 are shown in separate panels for readability. Uncertainties from finite injection sample size are at the 3-5% level, comparable to or lower than amplitude uncertainty from detector calibration as discussed in Sec. 7. The indirect glitch excess energy ULs from Eq. (12), originally derived in Prix et al. (2011) and based on the estimated size of each glitch and distance to the pulsar as listed in Tables 3 and 4 Qualitatively similar ULs also hold for other glitch models such as crust-cracking starquakes (Middleditch et al. 2006); see again Prix et al. (2011) for details. For showing this indirect UL in Fig. 6, we have assumed a fiducial moment of inertia I = 10 38 kg m 2 and distances to each target as listed in Tables 3 and 4.
As can be see in Fig. 6, our observational ULs do not reach below the indirect energy constraint for any of the targets. While the sensitivity of O3 was improved over O2, none of the O3 targets had such a favorable combination of large frequency step size and small distance as the one from the Vela pulsar during O2 (Palfreyman 2016;Sarkissian et al. 2017;Palfreyman et al. 2018). Our closest result to the benchmark is for PSR J1105-6107, which had the second largest glitch of the selected targets in terms of ∆f gl /f , and is at a closer distance and more favorable frequency than the only other target with a larger glitch (PSR J1826-1334). For this glitch, the closest the τ -dependent ULs get to the indirect energy constraint is within a factor of 1.6. As discussed, these empirical ULs are valid specifically for GW signals following the simple CW-like model of Eq. (11), whereas many other models for GW emission following pulsar glitches are conceivable.

CONCLUSION
We have used data from the LIGO-Virgo observing run O3 to search for CW signals from eighteen known pulsars, allowing for a mismatch between the frequencies of EM and GW emission, and for long-duration CW-like transients from six glitching pulsars. All statistically significant outliers were associated with instrumental disturbances and vetoed, and so we set ULs on the GW strain amplitude from each target.
The narrowband CW results obtained here complement the more constraining limits obtained from perfectly targeted searches on the same O3 data (Abbott et al. 2021a), where the GW frequency was assumed to be at exactly once or twice the EM-measured rotation frequency. With respect to the narrowband search done with the first half of O3 (Abbott et al. 2020b), we have improved our ULs by a factor of ∼ 35% for PSR J0534+2200 and ∼ 10% for PSR J0711-6830 and PSR J0835-4510, using a parameter space ∼ 6.5, 1.2 and 7.1 larger for these three pulsars. For the remaining pulsars which were also searched for using the same method in O2, the limits presented here are a factor of 2-3 lower than those set in Abbott et al. (2019b). We surpass the spin-down limits for seven pulsars, including PSR J1105-6107 and PSR J1913+1011, for the first time using the narrowband method.
The best upper limit set across the whole frequency band is h 0 2.3 × 10 −26 for PSR J0711-6830 at roughly 364 Hz using the 5n-vector search. We also set the best limit on pulsar ellipticity with PSR J0711-6830 at 1.7 × 10 −8 . However, neither of these limits surpass the indirect spin-down limits for this pulsar, which are a factor of ∼ 2.2 lower than the upper limits for the search. For pulsars with h 95% 0 < h sd , the best h 95% 0 is h 0 3.5×10 −26 for PSR J1913+1011 and best ellipticity limit is 2.7 × 10 −5 for PSR J0534+2200. Our long-duration transient search results for signals following nine glitches in six pulsars significantly extend the target set for such searches over the two glitches from two pulsars previously covered with O2 data by Keitel et al. (2019), demonstrating the flexibility of the search method. We report two marginal outliers from the search for the last glitch of PSR J0537-6910 during O3, but do not consider them as significant detection candidates. We have set duration-dependent strain ULs for each post-glitch search and compared the results with the indirect glitch energy UL benchmark established in Prix et al. (2011), but not reached this benchmark for any of these targets, with PSR J1105-6107 coming closest to it.
Although we do not explicitly show systematic error or statistical uncertainty on upper limits presented in Figs. 1 and 6 or Tab. 2, their impact on these results is discussed here. Statistical uncertainty due to finite sampling when calculating limits is ∼ 3% for the Fstatistic CW search and 3-5% for the post-glitch transient search. For the 5n-vector search, uncertainty due to which part of the frequency band is sampled (i.e., typical distance from median UL across all 0.1 mHz subbands) is ∼ 2.4%. Meanwhile, systematic error in the calibration of the O3 strain data is quantified in Sun et al. (2020Sun et al. ( , 2021 for LIGO and Acernese et al. (2022a) for Virgo. The error depends upon the time at which data was collected and upon the frequency band of the measurement. The maximum estimated systematic error in absolute magnitude of the strain across all frequencies and time is 7% for the LIGO detectors in O3a, 11% for the LIGO detectors in O3b, and 5% for Virgo, though typically the error is smaller than these levels. Additionally, calibration systematics are not correlated between detectors, and so it is likely that the absolute error in our combined measurements would be less than these values.
We expect that with future upgrades of the LIGO-Virgo-KAGRA network (Abbott et al. 2020a) we can further improve GW results on the known pulsar population, and constrain emission scenarios and nuclear physics both for quiescent-state pulsars and those perturbed by glitches.

ACKNOWLEDGEMENTS
This material is based upon work supported by NSF's LIGO Laboratory which is a major facility fully funded by the National Science Foundation. The authors also gratefully acknowledge the support of the Science and Technology Facilities Council (STFC) of the United Kingdom, the Max-Planck-Society (MPS), and the State of Niedersachsen/Germany for support of the construction of Advanced LIGO and construction and operation of the GEO 600 detector. Additional support for Advanced LIGO was provided by the Australian Research Council. The authors gratefully acknowledge the Italian Istituto Nazionale di Fisica Nucleare (INFN), the French Centre National de la Recherche Scientifique (CNRS) and the Netherlands Organization for Scientific Research (NWO), for the construction and operation of the Virgo detector and the creation and support of the EGO consortium. The authors also gratefully acknowledge research support from these agencies as well as by the Council of Scientific  We would like to thank all of the essential workers who put their health at risk during the COVID-19 pandemic, without whom we would not have been able to complete this work.
We acknowledge that CHIME is located on the traditional, ancestral, and unceded territory of the Syilx/Okanagan people. We are grateful to the staff of the Dominion Radio Astrophysical Observatory, which is operated by the National Research Council of Canada. CHIME is funded by a grant from the Canada Foundation for Innovation (CFI) 2012 Leading Edge Fund (Project 31170) and by contributions from the provinces of British Columbia, Québec and Ontario. The CHIME/FRB Project, which enabled development in common with the CHIME/Pulsar instrument, is funded by a grant from the CFI 2015 Innovation Fund (Project 33213)  We acknowledge support from EPSRC/STFC fellowship (EP/T017325/1), ANID/FONDECYT grants 1171421 and 1211964, and NASA grants 80NSSC19K1444 and 80NSSC21K0091. This work is supported by NASA through the NICER mission and the Astrophysics Explorers Program and uses data and software provided by the High Energy Astrophysics Science Archive Research Center (HEASARC), which is a service of the Astrophysics Science Division at NASA/GSFC and High Energy Astrophysics Division of the Smithsonian Astrophysical Observatory.

A. DIFFERENCES IN PULSAR EPHEMERIDES USED FOR DIFFERENT PIPELINES
Due to variations in code infrastructure for the F-statistic and 5n-vector pipelines, there were some cases where we used ephemerides with different parameterizations to search for the same source. In general, the lalapps Weave code used for the F-statistic CW search can search up to the fourth derivative in frequency, and the automatic template placement will work best if the reference time at which the frequency and spin-downs are defined is during or close to the time at which we are conducting the search. Similarly, the transient search (also based on the F-statistic, but using the lalapps ComputeFstatistic v2 code) can search up to ... f . On the other hand, the 5n-vector pipeline has no such limitations on the number of frequency derivatives.
In several cases, which we document below, the EM ephemerides used for the 5n-vector search included higher-order frequency derivatives to improve the phenomenological fit and reduce the effects of timing noise. In those cases, we refit the ephemerides using at most a fourth frequency derivative for the F-statistic pipelines and we then used the simple heuristic from Eq. 11 of Prix & Itoh (2005) to verify that the effect of the increased timing residuals on our matched-filter analysis was less than the maximum mismatch used to place templates for the search. The affected pulsars were: • PSR J0534+2200: The Crab pulsar was originally fit with twelve frequency derivatives, with an average timing residual of TRES = 135.0 µs, and this ephemeris was used for the 5n-vector search. For both F-statistic-based pipelines we search with up to two frequency derivatives with an average timing residual of TRES = 482.382 µs.
• PSR J0835-4510: The Vela pulsar was originally fit with seven frequency derivatives, with an average timing residual of TRES = 137.1 µs, and this ephemeris was used for the 5n-vector search. The F-statistic CW pipeline ran with a search up to four frequency derivatives with an average timing residual of TRES = 133.2 µs.
• PSR J1809-1917: This pulsar was originally fit with five frequency derivatives, with an average timing residual of TRES = 1766.4 µs, and this ephemeris was used for the 5n-vector search. The F-statistic pipeline ran with a search up to four frequency derivatives with an average timing residual of TRES = 489.4 µs. In this case, it is likely that the timing residuals are smaller because the reference time used for the fits was moved to the middle of the observation time. In this appendix, we outline the details of the method used for estimating the distribution of the expected largest outlier for the frequency-domain F-statistic CW search. We then set a threshold corresponding to a p-value of 1% under the assumption that no signal is present in the data. We use the distromax method introduced by Tenorio et al. (2022), with some slight modifications due to the specific situation at hand.
We want to estimate the distribution of the maximum value of 2F for the full search for a single pulsar, under the assumption that no signal is present. To be conservative, we will exclude bands corresponding to known disturbances when estimating this distribution, and as such we will have outliers that exceed our threshold to follow up that are associated with the disturbances. However, this means that we will not set our threshold accounting for the disturbances, which would generally push them upwards and potentially miss a signal. The procedure is as follows: 5. Split the remaining 2F values into 10 4 random batches of size ∼ 2 × 10 3 . Find the maximum 2F in each of those batches, and fit a Gumbel distribution to that set of maxima.
6. Propagate that Gumbel distribution based on the number of batches (Tenorio et al. 2022), and the total number of templates (i.e., the number of templates used before we perform notching, since this is the number of templates used to perform the search) to obtain a distribution for the maximum of the full search for that individual pulsar.
What is left is a distribution on the maximum value of 2F under the assumption that no signal is present, and we can then integrate this distribution to find the value of 2F that corresponds to a p-value of 1%. Any of the 2 × 10 7 original templates with p <1% are then subject to first the vetoing procedure described in Sec. 4.3, and if they pass this procedure, then they are flagged for more extensive follow-up.
With the exception of one pulsar, PSR J1856+0245, the notching procedure above removes less than 40% of the frequency band. In the case of PSR J1856+0245, the known and unknown lines notching procedure removes the full frequency band, and so we do not search for CWs from this pulsar with the F-statistic pipeline.

B.2. Transient search
The basic approach for most targets in this analysis is the same as for the CW search, following the distromax method introduced by Tenorio et al. (2022): we fit a Gumbel distribution to the measured maxima of our detection statistic B tS/G over random subsets ("batchmaxes"). We then propagate the parameters of this distribution considering the full number of templates. However, we then set the threshold less deep in the tails of the distribution than in the CW search, namely at the mean plus one standard deviation of the propagated Gumbel distribution. This same threshold is used both for candidate identification and later in the ULs procedure. The results from the seven transient searches where we apply this method are generally relatively clean, as indicated by the batchmax histograms and goodness of the Gumbel distribution fits, and so we do not employ any notching of disturbances across the board. However, for glitches 7 and 8 of PSR J0537-6910 additional features appear in the batchmax histograms and degrade the Gumbel fit. In both cases, with three iterations of the automated notching procedure from Tenorio et al. (2022) we can produce clean batchmax histograms. For glitch 7, the threshold on B tS/G is quite robust under notching (changing from 11.3 to 11.0), and no outliers are recovered. On the other hand, for glitch 8, the threshold is lowered sufficiently (from 12.5 to 10.5) which reclassifies the largest values at two frequencies as marginal outliers. Follow-up of these outliers is detailed in Appendix D.
For PSR J1105-6107 and PSR J1826-1334 the number of templates is too small to obtain robust Gumbel fits with the distromax method. Instead, we use the off-sourcing approach : we indirectly estimate the background distribution by rerunning the full search 1000 times on different sky positions but with otherwise identical settings. This is feasible in these cases as the two searches are very cheap, less than a single core hour per run. By off-sourcing on the sky we sample combinations of the same data that are statistically independent from the templates used in the main search. The practical implementation here matches that of Tenorio et al. (2021b): we keep declination fixed and change only right ascension α, excluding a part of the sky closer than 0.5π to the pulsar to ensure statistical independence. We then fit a Gumbel distribution to the set of the highest B tS/G values from each off-sourced search and again set the threshold at the mean plus one standard deviation of this distribution.

C. FOLLOW-UP OF REMAINING OUTLIERS FOR CW SEARCHES
In this section, we discuss follow-up of outliers found when searching in the direction of PSR J1838-0655 and PSR J1813-1749 (before glitch) with the F-statistic. The same region of parameter space that produced outliers for PSR J1838-0655 in the F-statistic search, also produced outliers in the 5n-vector search.
The F-statistic results of PSR J1813-1749, searching before the potential glitch, show an outlier at 44.62443216 Hz, close to a known line at H1 (see Fig. 4). However, the outlier frequency is outside the nominal width of the line specified in Goetz et al. (2021), and is therefore unlikely to be caused by that specific artifact. On April 16, 2019, the frequencies of the calibration lines were changed to improve detector calibration which altered the character of the persistent narrowband artifacts seen in H1 data (Sun et al. 2020). The known line in Fig. 4 is an example of one such artifact. To test whether this candidate is caused by a separate, but similar artifact, we run the search from the start of the run until only April 16, 2019. In Fig. 7 we show that, zooming in on the frequency range around the outliers, quite a few templates have significantly higher values of 2F than the outliers do for the full run. This behavior is inconsistent with a CW signal, and so we veto these outliers.  In Fig. 3 we show that the outliers associated with PSR J1838-0655 are near an unknown disturbance at H1. The 2F values running only on H1 data are larger for all of these candidates than they are when running on only L1 data, despite L1 being more sensitive in this frequency band. To test whether this is reasonable for a signal, we use the 1454 software injections that were detected above the search threshold when calculating ULs. We plot the 2F calculated using L1 on the y-axis and 2F calculated using H1 on the x-axis of Fig. 8 for all injections (blue to yellow varied colors), and for the outliers (red). The dashed line indicates where these two values are equal. Only 2 out of 1454 injections cross the red line, meaning that if we veto any candidates to the right of the red line, we incur a false dismissal of 0.1%. Therefore, we veto this outlier in both the F-statistic and the 5n-vector searches. and τ ML correspond to the location of the max 2F value in the Fmn map. The posterior estimates (assuming flat priors, as for the B tS/G statistic) are within 1 hour of the ML values, with the exception of τ for outlier A which is about 7 hours longer than the ML value. The tail probability refers to the propagated Gumbel distribution following the distromax method with three notching iterations. Note-These values are all obtained from the Fmn maps evaluated at the λ parameters as listed for each outlier in Table 5 Table 5. Another template with B tS/G marginally above threshold is not offset by more than a single bin in any parameter from outlier A; we therefore do not follow that one up separately.
First, we checked for any evidence that the outliers may be caused by instrumental artifacts. There are no spectral artifacts of identified or unidentified origin listed in the relevant frequency band in Goetz et al. (2021) for LIGO nor Piccinni et al. (2021) for Virgo. Comparing the time-frequency tracks of the outliers with single-detector spectrograms (Weldon et al. 2019) also reveals no suspicious structures. As a multi-detector consistency check (Keitel et al. 2014;Leaci 2015), we recomputed the F mn maps at each outlier's λ points for each individual detector. The corresponding max 2F and B tS/G values and best-fit (t 0 , τ ) pairs are collected in Table 6. Virgo does not recover these candidates, as expected from its much lower sensitivity in this frequency range, and returns unrelated short-duration peaks instead. But the H1 and L1 results are fully consistent with each other and with the multi-detector result.
Other possible vetoes against an astrophysical origin include checking whether or not the detection statistics would increase when turning off Doppler modulation inside the F-statistic code ("DMoff"; Zhu et al. 2017), or searching at a different sky position (off-sourcing, Isi et al. 2020). With a DMoff re-run of the full search band, we recover neither of the outliers as a local maximum, and the overall maximum at log 10 B tS/G = 10.1143 is lower, meaning that the outliers pass this check. For off-sourcing, we perform 1000 analyses at different α (at least 0.5π away from PSR J0537-6910) over 5 mHz wide sub-bands of the original search. This does not turn out to veto the outliers either, as we do not find extended sets of similar or larger B tS/G values at many other sky positions and similar frequencies, as could be expected from some types of instrumental artifacts. However, we can once again fit a Gumbel distribution to the maxima of these off-sourced searches and propagate to the corresponding expected distribution over the whole search bank taking into account the missing trials factor of 25. This results in tail probabilities of 2% and 3% for the two outliers, which are lower than the 4% and 14% from distromax but still above the 1% threshold used by the two CW search pipelines.
We also followed up both outliers with four separate pipelines: (i) grid-less follow-up with PyFstat (Ashton & Prix 2018;Ashton et al. 2021;Keitel et al. 2021;Tenorio et al. 2021b); (ii) the F-statistic CW pipeline using lalapps Weave, as described in Sec. 4.3; (iii) the 5n-vector method as described in Sec. 4.2; and (iv) the targeted time-domain Bayesian method (Dupuis & Woan 2005;Pitkin et al. 2017), as also used for the targeted searches in Abbott et al. (2021a). In (i) we use the same implementation of the transient F-statistic as in the main search. But for (ii)-(iv), we treat the candidates as putative CW signals of fixed length, fixing the start time t 0 and end time t 0 + τ of these analyses based on the outlier parameters from Table 5.
The PyFstat package Keitel et al. 2021) contains an implementation of Markov-Chain Monte Carlo (MCMC) grid-less follow-up (Ashton & Prix 2018;Tenorio et al. 2021b) using the ptemcee sampler (Foreman-Mackey et al. 2013;Vousden et al. 2015) and the same F-statistic code in LALSuite (LIGO Scientific Collaboration 2021; Wette 2020). We do not employ this here as a veto or for significance estimation, but only to check that the candidates can be recovered independently of the original search grid setup, when putting Gaussian priors on the λ parameters around their previously recovered values, and now also allowing variations in sky position. These PyFstat analyses recover consistent max 2F values and locations.
The lalapps Weave pipeline also uses the same underlying F-statistic implementation. The main difference is to not search over different signal start times and durations, instead fixing start time t 0 and end time t 0 +τ and only searching over λ parameters. As expected, this follow-up recovers F-statistic peaks consistent with the outlier parameters.
With the 5n-vector method and a similar follow-up setup as for lalapps Weave, we again recover local peaks at the frequency of each outlier. But over the given data span, this follow-up also recovers other peaks with higher S-statistic values across the probed 0.45 Hz bands and spin-down ranges. Thus, it does not confirm the outliers as candidates of interest. For outlier A, there are 15 peaks with higher detection statistic (corresponding to 0.36% of the top level candidates in every 10 −4 Hz sub-band and marginalized over the spin-down region). For outlier B, there are 5 peaks with higher detection statistic (0.15% of the top level candidates in every 10 −4 Hz sub-band and spin-down region).
Finally, the targeted time-domain Bayesian method fixes the signal duration as well as the λ parameters, and performs nested sampling over the four amplitude parameters A. For both outliers, this returned Bayes factors of about 500 in favor of a coherent signal in all three detectors as opposed to noise or incoherent signals in the individual detectors. However, this fully targeted analysis does not provide any additional assessment of statistical significance. In addition, the estimated strain amplitudes h 0 from this follow-up, 4.5 +1.5 −1.4 ×10 −26 and 5.2 +1.7 −1.7 ×10 −26 at 95% credible intervals, are a factor 10 above the indirect glitch energy UL from Eq. 12 assuming a signal from PSR J0537-6910 and with the NICER-observed glitch size.