Relative Moment Tensor Inversion for Microseismicity: Application to Clustered Earthquakes in the Cascadia Forearc

The relative abundance of small earthquakes affords significant opportunities for improved understanding of regional seismotectonics; however, determining moment tensors for such events recorded on regional networks is complicated by low signal-to-noise ratios, sparse station sampling and complex wave propagation at short periods. We build upon previous work in designing a multiple-event, simultaneous mo-menttensorinversionschemeforsmallearthquakesthatemploysconstraintsfromP-wavepolarities,relative amplitudesofP-andS-wavesrecordedatcommonstations,andlocalmagnitudeestimates.Ourmethoddoes notrequire a priori knowledge of a reference moment tensor. High-fidelity polarity and relative amplitude data are recovered using principal component decomposition of clustered-event waveforms. These data are employed within a multi-stage iterativeframeworktoinvert formoment tensors and incorporatelocalmagni-tudeinformation. Syntheticexamplesemployingasfewasfourhigh-qualityandspatially-distributedstations yield accurate moment tensor estimates. We demonstrate our approach on a cluster of seismicity near San JuanIsland,Washington,USA,withintheCascadiaforearc. Ourresultsareconsistentwithpreviouscharacter-ization


Introduction
Moment tensors (MTs) characterize the size and radiation pattern of seismic events approximated as point sources.Accordingly, they offer fundamental information on source type (e.g., shear, implosion/explosion or hybrid mechanisms) and, in particular, the style (normal, reverse, strike-slip) of shear faulting that describes most earthquakes.When moment tensors from many events are available for a given region, they reveal the complexity of fault systems, magnitude-frequency statistics and the nature of the tectonic stress regime, and consequently are useful in seismic hazard assessment.
Moment tensors are routinely and automatically computed for larger events, usually M W ≥ 4 (e.g., Ekström et al., 2012).The predominant approaches are based on centroid moment tensors (CMTs) that involve the modelling of long-period (≥∼ 20s) waveforms using Green's functions that are typically computed for radially stratified Earth models.Events smaller than M W 4 generally possess low signal-to-noise ratios in the long-period band, and are poorly modelled using 1-D Earth models, a tendency that becomes increasingly pronounced with decreasing magnitude.This circumstance is unfortunate given that smaller events occur with greater frequency, such that accurate modelling of their moment tensors could provide improved characterization of fault systems and stress variability.The classic approach to source characterization of small events involves the recovery of the focal mechanism, a binary representation of the P-wave radiation pattern assuming a shear-faulting (double-couple) source (e.g., Shearer, 2009).First-motion P-polarities are identified for a given event across an array of stations encompassing diverse azimuth and take-off angles.Thereafter, two orthogonal planes bounding quadrants of positive and negative polarity are determined, often employing grid search methods.Although straightforward in principle, P-wave polarity assignment can be ambiguous and time-consuming in practice.Furthermore, the event-station geometry controls the uncertainties of the recovered focal mechanism and is frequently insufficient to constrain a unique solution.P-polarities are sometimes augmented with S-to-P amplitude ratios to provide further constraint on focal mechanisms.Hardebeck and Shearer (2003) note only rather modest improvements in focal mechanism recovery, likely due to large measurement uncertainties (see however Shelly et al. (2022)), and increased sensitivity to the typically poorly constrained P and S-velocity and attenuation models.
Consideration of multiple events in spatial proximity affords additional opportunities for constraints on moment tensors using relative P and S amplitudes (Dahm, 1996) that are analogous to those afforded by traveltime double differences in the determination of relative hypocenters (Waldhauser and Ellsworth, 2000).In particular, the use of a common mode normalizes propagation effects such that the relative amplitudes are sensitive primarily to the differences in source strength and orientation.Plourde and Bostock (2019) have extended Dahm's original approach by introducing a general treatment for inclusion of S-waves and employing principal component analysis in the measurement of precise relative amplitudes.Their approach has been applied to yield improved understanding of the mechanics of deep Wadati-Benioff seismicity in Alaska (Plourde and Bostock, 2019;Drolet et al., 2022).Other recent studies incorporating relative amplitude measurements include the work of Cheng et al. (2023) and Zhang (2023).
In this study, we present an improved version of the relative moment tensor inversion method of Plourde and Bostock (2019) tailored for small earthquakes, that incorporates P-wave first motion polarities and removes the requirement of a known reference MT.In addition, our method solves for relative or absolute magnitudes depending on availability of local magnitudes.In the following section, we provide an overview of the general relative moment tensor method and a detailed description of our implementation.We present synthetic examples to gauge the performance of our method, and apply it a dense cluster of seismicity in the northern Cascadia forearc.

Methodology
We begin this section by introducing the general assumptions underpinning body-wave relative moment tensor inversion and then follows with a detailed description of our implementation.

Assumptions
The usual assumptions made in developing relative moment tensor inversion using body waves are depicted in Fig. 1 and rationalized as follows: 1) Ray theory governs wave propagation, implying that direct arrivals can be traced back to the focal sphere.This information is utilized for take-off, azimuth, and incident angle calculations.2) The immediate source region representing the focal sphere is assumed to be homogeneous and isotropic, ensuring no perturbation in seismic radiation patterns due to heterogeneity or anisotropy.This as-sumption allows the use of standard far-field expressions for the radiation of direct body waves.3) The Green's function is the same for pairs or triples of events, requiring events to be clustered such that the event-station distance is much greater than the interevent distance.This requirement ensures the normalization of propagation effects (Green's function) in event comparisons.4) Direct P-and S-waves are assumed to be isolated from other arrivals within shortduration windows, to avoid contamination by multipathing.This assumption facilitates precise relative amplitude calculation and, for P-waves, allows for improved polarity determination.5) It is assumed that slip is unidirectional, and that the source time function is normalized to a common bandlimited signature for all events.This assumption involves low-pass filtering seismograms below their respective source corner frequencies, effectively removing the temporal source signature in event comparisons.

Relative Constraints
Under the assumptions laid out above, we can directly relate the relative amplitudes of P or S-waves of multiple events measured at a given receiver to their relative amplitudes on the focal sphere that surrounds their common source location (see Plourde and Bostock (2019), for more details).In particular, for P-waves with polarization constrained to a single direction (e.g., parallel to the ray for isotropic media), we can relate the displacement waveforms u i (x, t) measured at a receiver at x in direction i for a pair of events a, b with differing moment tensors as (1) while at the focal sphere, a similar relation holds where γ i represents the direction cosine (and so incorporates the take-off angle and azimuth) at the focal sphere, R ab is a geometric correction to compensate for the difference in source locations (necessarily close to 1) and e.g., M a ij is the moment tensor for source a.Each relative P-wave measurement thus provides a constraint on the moment tensors of two events.S-wave polarizations are constrained to lie within a plane and hence exhibit 2 degrees of freedom (vs 1 for P).Accordingly, we relate relative amplitude waves across triples of events c,d,e at the receiver as: while at the focal sphere, the relation is Note that (4) yields 2 independent constraints; hence each S-wave triple places 2 constraints on 3 moment tensors.
The relative amplitude coefficients A ab , B cde , B ced defined above can be accurately determined using principal component analysis as described in Plourde and Bostock (2019), and we expect the highest fidelity results for 3-component waveforms.Equations ( 2) and (4) represent linear constraints on the MT elements incorporated within a large simultaneous system.In the ideal case, where all possible pairs and triples are formed for N events and K stations, there are N (N − 1)K/2 possible P-wave constraints and 2(N (N − 1)(N − 2)K/6) possible S-wave constraints on 6N unknown MT elements.However, these equations alone create an under-determined linear system because the only information used is relative and so a) the signs of the MTs are not uniquely determined (i.e., the negative of a solution vector of MTs remains a solution vector), and b) the absolute scale (or equivalently, the absolute scalar moments) is not determined.Therefore, extra information is required to solve the system.This information can be supplied in the form of one or more a priori known, high-quality reference MTs (e.g., Plourde and Bostock, 2019;Drolet et al., 2022).However, accurate MTs are typically not available for small events and poorly constrained estimates may lead to biassed solutions (Dahm, 1993).Instead, we will incorporate P-polarities and local magnitudes to constrain the polarity of the radiation pattern and scaling of the seismic moments, respectively.

The Algorithm
In order to recover reliable MTs using relative MT inversions, we adopt the 9-step framework depicted in Fig. 2 and summarized below.

Step 1 -Waveform Processing
The first step is to collect all displacement waveforms, preferably with instrument response removed in the case that individual stations have experienced changes in equipment over time.Ideally waveforms should be recorded at 3-component stations for S-waves and at 1or 3-component stations for P-waves.One must then determine the optimal frequency band for waveform comparison over which signal-to-noise ratio is maximized.Morever, the chosen band must lie below the event corner frequency to remove details of rupture and so validate the point-source approximation (assumption 5).Waveforms with a signal-to-noise ratio below 1 are discarded.
Next, waveforms must be accurately aligned.This step is crucial since misalignment can significantly impact the relative amplitude measurement, and, for example, a half-cycle skip will yield an incorrect sign.For this purpose we employ a multi-channel alignment technique that extends the work of VanDecar and Crosson (1990) to S-waves generated by sources with variable MTs (Bostock et al., 2021), by exploiting principal component analysis.The alignment is systematically performed for all events recorded at a given station (or instrument, if accurate response removal is not achieved) for a given (P or S) mode.Only waveforms with accurate alignment conforming to the rank 1 (P) or rank 2 (S) principal component description are accepted for the next stage.When this alignment technique is applied to a large number of events, it is generally straightforward to visually discern and correct misaligned events.

Step 2 -Relative Amplitude Measurement
Only pairs or triplets of events in sufficiently close proximity, and so sharing a similar ray path, will have their relative amplitude computed.We define a shared path value (ψ) between events a and b as: where e.g., x a is the source-to-receiver vector for event a. Event comparisons with ψ values below a default threshold of 0.6 are discarded.
To calculate relative amplitudes, we exploit principal components which mitigate the effect of noise by comparing only the common signal between pairs or triplets of waveforms.Waveforms are bandpass-filtered to the highest optimal frequency band found in the previous step within the pair or triplet.Details on how to calculate the relative amplitude are found in Plourde and Bostock (2019).The misfit between the reconstructed waveform using the relative amplitude and the original waveform is calculated and assigned as a quality value.Relative amplitudes associated with high misfits (default greater than 0.6) are discarded.Furthermore, misfit values are used to weight the confidence of the relative amplitudes in step 6.
Use of 3-component waveforms for determining Pwave relative amplitudes is recommended, as they enable minor rotation adjustment of one waveform a pair into the orientation of the other, thereby improving measurement accuracy.However, when events are extremely close to one another compared to the eventstation distance, the improvement is negligible; thus, it is possible to use single-component stations.Note that 3-component recordings are still more relevant for Swave relative amplitudes due to the two degrees of freedom in their polarizations.
While it is possible to employ either P-or S-wave relative amplitudes alone within relative MT inversion, it is preferable to use both modes when the data permit.In particular, in environments characterized by high levels of random noise affecting both P-and S-wave phases equally, the algorithm demonstrates superior performance when both phases are included.

Step 3 -P Polarity Measurement
As in traditional focal mechanism determination, the acquisition of P-wave first motion polarity presents challenges for small regional events.To improve polarity estimation, we incorporate it within the principal component analysis.We measure on the Z-component as it typically exhibits the most pronounced first motion, although, in principle, the E-or N-components can also be employed.For each station, a waveform stack (X) is created by weighting the waveform of each event i by its first principal component expansion coefficient Y i1 as follows, using the singular value decomposition: Here the matrix W in (6) contains the filtered, amplitude-normalized and aligned waveform section, whereas Ŵki in ( 8) is its unfiltered (or filtered to as wide a band as signal levels allow) counterpart.Each column represents a different event i, whereas the rows represent discrete time samples k.We employ a band-pass (minimum-phase) Bessel filter that reduces phase shift relative to e.g., the corresponding Butterworth filter.The expansion coefficient Y i1 implicitly includes the sign of the waveform for event i such that the weighted waveforms on the right-hand side of (8) constructively interfere, yielding a high SNR stacked trace X k , whereas the inclusion of unfiltered waveforms reduces potential ambiguity in polarity assignment introduced by band limitation.The user then only has to visually assign the first motion polarity of the stacked trace X k to uniquely define the polarities of all individual events i that follow from their expansion coefficients, i.e. (9) Step 4 -Event Cull After these processing steps, some events may not have sufficient constraints to fulfill our quality control criteria and are, therefore, removed.More specifically, we require each event to have measured relative amplitudes for at least four different stations that represent independent source-receiver geometries.As a rule of thumb, we require the minimum of stations to equal the number of independent MT elements to be resolved (4 for DC, 5 for isotropic and 6 for full MT) when noise levels are low.
Step 5 -Geometrical Factors Our linear equations for relative amplitudes (2,4) require the direction cosine and, if desired, geometric corrections for source separation, which are obtained using the event and station locations assuming a velocity model.The directions cosines specify the ray exit points at the focal sphere for a given velocity model under our assumption 1.

Step 6 -Linear System Assembly
We construct a linear system incorporating equations ( 2) and ( 4) for all relevant event combinations.As the number of constraints increases rapidly with the number of events (O(N 2 ) for P, O(N 3 ) for S), but the constraints only involve 2 or 3 MTs per equation, we employ sparse matrices to store information efficiently.The resultant system is represented as Am = b, where matrix A incorporates misfit weighted equations pertaining to both relative and, if available, MTs for reference events.Vector b consists primarily of zero entries (with the exception of equations linked to the reference event(s) if available), and vector m encompasses all independent moment tensor elements.The length of vector m depends on whether the recovered moment tensors are to be full (i.e.include isotropic component) or deviatoric, resulting in dimensions of either 6 or 5 times the number of events N , respectively.A comprehensive description of the system construction methodology can be found in Plourde and Bostock (2019).

Step 7 -System conditioning
Large linear systems are difficult to solve accurately if not properly conditioned.Therefore, we perform scaling of A and b, to reduce numerical error.We scale our elements by events and by equations.In case one or multiple reference events are included within the system, we apply magnitude scaling for the reference event equations.The event and, as relevant, reference event scaling factors are retained for reversing scaling of the MT elements after solution of the linear system.
Step 8 -System Solution Different strategies are employed to solve the linear system dependent on the nature of the MT polarity (i.e.reference MT vs. P polarities) and model (DC vs. deviatoric / full) constraints.There are two cases that require attention for solution.First, the absence or inclusion of reference event(s) will dictate if the linear system is under-determined or over-determined.Secondly, if DC solutions are desired, a non-linear constraint is added to the system, and a gradient descent optimization approach is adopted.Our solution algorithm is structured into four distinct stages to accommodate the different cases.A summary of these stages is presented in Fig. 3, while detailed explanations of each stage are provided in subsequent subsections.For the present, we do not solve for tensile MTs without CLVD component, but note that this could be accomplished by modifying stage 2 with the projected Newton descent technique used by Plourde and Bostock (2019).

Step 8 -Stage 1: Model Initialization
At this stage, we introduce an initial moment tensor guess solution (m ∅ ) for our linear system, which plays a pivotal role when employing gradient-descent optimization as required to recover DC MTs.A good initial guess will significantly improve the convergence of our algorithm and so diminish computation time.An intuitive way of securing an m ∅ would be to solve the linear system (Am ∅ = b).In instances where one or more reference events are included, this approach is possible because the system Am ∅ = b is over-determined.
However, when reference events are not incorporated, we encounter an under-determined linear system.Accordingly, we temporarily augment the linear system by appending an equation: 1000x T m ∅ = m k = 1000, where x represents a unit column vector.The new equation forces the kth element of m ∅ to a value of 1 and therefore fixes the sign of all MTs.The new system is It is important to note that this procedure does not guarantee the correct polarity of the solution m k ∅ as reflected by the measured P polarities; thus, we perform a check.If less than 50% of the P polarities agree with those predicted by m k ∅ ; the sign of . We further recommend discarding m k ∅ when the P polarity agreement is less than 60% to avoid polarity reversals during the iterative solution.In addition, low polarity agreement may signify that the corresponding element m k k is either near zero or that some polarity information is incorrect.This manner of initial guess selection should force m ∅ closer to the eventual desired solution than a purely random selection.Consequently, we resolve the linear system A k m k ∅ = b k for each k, always requiring that m k ∅ satisfy at least 60% of the P polarities constraints, thereby generating M N m ∅ initial solutions (where M = 5, 6 for deviatoric and full moment tensors, respectively, and N is the number of events) to be used in subsequent stages.Note that in the further steps, the augmented system is not used.Step 8 -Stage 2: Iterative Process At this stage, we determine whether an iterative approach (gradient descent) is necessary for the inversion process.In particular, if the desired MTs are full or deviatoric, an iterative approach is unnecessary, and m is set equal to m ∅ .Conversely, if DC MTs are desired, we employ a gradient descent approach to retrieve the optimal m.To do so, we have to define a non-linear objective function satisfying the relative amplitude and the DC constraints.
DC solution implies that a MT is characterized by only four independent elements, in contrast to the six independent elements that must be specified for a full MT.One degree of freedom is lost by requiring an nonisotropic source (no volume change), which is enforced mathematically by setting the MT trace to 0. This condition is addressed during the construction of the relevant matrices (step 6).Another degree of freedom is lost by prohibiting a compensated-linear-vector-dipole (CLVD) component, which is achieved by forcing one of the MT eigenvalues to 0.
To effectively apply this latter constraint, we employ the star norm, also referred to as the nuclear norm.The star norm is defined as the sum of the singular values, mathematically represented as: Here, σ i is the ith singular value of M. For a square symmetric matrix such as M, the singular values correspond to the absolute values of the eigenvalues.Since σ 1 ≥ σ 2 ≥ σ 3 and σ 1 = σ 2 for a DC source, the star norm is minimized when σ 3 = 0.In the more general case of our linear system, we wish to minimize the sum of the star norm of the MTs for all events.Accordingly, we define a modified solution for our objective function as the sum of a quadratic misfit term and a penalty term that enforces DC MTs for all sources: where α is a tradeoff parameter that weighs the importance of the DC constraint over the relative constraints.By default, the value is set to 10.
During the gradient descent, m is updated as m l+1 = m l − µ δm at each iteration l with its gradient δm calculated with respect to the objective function f (m) and where µ quantifies the step size in the gradient direction.The gradient is calculated using the autograd function from the python package PyTorch (Paszke et al., 2019).To improve convergence, the step size µ varies through the iterations.For each iteration, µ is divided by two until the penalty term, computed from m l+1 = m l − µ δm, decreases.If this criterion is met on the first try, µ is increased by 1.5 for the next iteration.
The iteration process stops when one of the following conditions is met: 1) the DC solution is achieved within an error margin, 2) the step size µ becomes too small, 3) a maximum number of iterations is reached or 4) the improvement for the past 200 iterations is very small.Upon completion, the solution is converted to the closest perfect DC solution.In the case of condition 3), the solution can not be trusted and thus is discarded.This situation is generally due to noisy data rendering the system ill-conditioned.
Note that we prefer the star norm over the Langrangier multiplier method used by Plourde and Bostock (2019) to set the determinant of each event to 0. Our technique is faster and can be used when no reference events are given (when A is under-determined).As a final remark, when solving for DC solutions using reference events, all reference events should be supplied as DC; otherwise, instabilities may arise.
This stage is applied to each m k ∅ , and is easily performed in parallel.

Step 8 -Stage 3: Final Solution
In the presence of a reference event(s) only one solution, the final solution, makes it to this stage.However, in the absence of a reference event, multiple solutions are evaluated corresponding to the different initial guesses m k ∅ .Most solutions are local minima that characterize the presence of noise and, for DC solving, our non-linear optimization function.These solutions can be assessed by plotting their percentage of incorrect polarities vs. the relative constraint misfit ||Am k ∅ ||, the two values we are trying to minimize.The optimal solution should lie near the lower left corner of this plot, where both the number of incorrect polarities and the misfit are near zero.The solution with the lowest relative constraint misfit (||Am||) within 95-percentile solutions with the lowest percentage of incorrect polarities becomes our final solution.Specification of the misfit percentile may vary depending on the number of events, so the lowest misfit solution is chosen among a range of solutions.

Step 8 -Stage 4: Quality Control
This stage is only applied in the absence of reference event(s).Here, we identify stable event solutions by selecting a subset of possible solutions from our misfit vs. incorrect polarity plot.We select the most likely solution by taking the bottom left solutions delimited by 95-percentile (or chosen-percentile) solutions with the most polarity agreement and the 60-percentile solutions with the lowest misfit.Subsequently, the Kagan angles between these solutions and the final solution are calculated if the solutions are DC.The Kagan angle is a metric that quantifies the minimum angle necessary to rotate one DC focal mechanism into another orientation.Its numerical value spans a range from 0 • for absolute agreement between the two mechanisms, to 120 • signifying complete disagreement (Kagan, 1991).Typically, when the Kagan angle falls below 60 • , it is regarded as indicative of a good correspondence (Pondrelli et al., 2006).The Kagan angle standard deviation for each event serves as an uncertainty proxy, representing a measure of solution stability.Events are considered 'stable' if their standard deviation is below 20 • .Solutions with a standard deviation between 20-30 • are classified as 'likely', since there uncertainties are larger.Events with a standard deviation above 30 • are considered 'bad' and are discarded for further steps which improve the magnitude scaling accuracy.For non-DC solutions, we employ the cross-correlation value between the six independent elements of the normalized MTs as a proxy agreement between MT radiation patterns, instead of the Kagan angle.A value of 1 would represent the same radiation pattern, whereas a value of -1 would constitute the MT with opposite polarity.Standard deviations of cross-correlation values below 0.15 are considered 'stable', between 0.15 and 0.2 'likely' and above 0.2 'bad'.

Step 9 -Solution Scaling
The MTs must be rescaled after solution, by the inverse of the scaling factors applied in step 7.In the case of the use of a reference event(s), the recovered MTs are now properly referenced to their seismic moments M 0 .In the absence of reference event(s), all MTs values are relative, but by considering magnitude information can be rescaled to their (approximately) correct M 0 values.
In most cases, only local magnitudes will be provided.Local magnitudes (M L ) can be converted to moment magnitudes (M W ) using empirical relations (e.g., Shearer et al., 2006): Note that this conversion is based on California earthquakes, and more appropriate conversions may be derived for other regions.Moment magnitudes are then converted to seismic moments using the defining formula from Hanks and Kanamori (1979): When a catalogue of seismic magnitudes is available, a square system is assembled to recover optimal scalar moments for all events.The linear system comprises the constraints on M 0i for each event i from our relative MT inversion along with scalar moment estimates M CAT 0i based on catalogue magnitudes which need not be available for all N events.For example, the system for 4 events with catalogue magnitudes provided for events 1 and 2 is written as: This system ensures that relative scalar moments ratios determined from the relative MT solution are honored whereas local magnitude information supplies the optimal scaling.Since the matrix in ( 14) is square (i.e.not overdetermined), no weighting of the equations is necessary.Although other scalar moment reconstructions are possible (e.g., Hayward and Bostock, 2017), we have found through synthetic tests that the above reconstruction yields superior accuracy across the different schemes considered.

Synthetic Examples
To demonstrate the efficacy of our algorithm, we showcase its performance through synthetic examples.In this section, we concentrate on scenarios without reference events since scenarios with reference events have already been examined in Plourde and Bostock (2019) and Drolet et al. (2022).
Our examples comprise a cluster of 20 earthquakes randomly located within an area of 25km 2 with depths varying between 20 and 25km.The event area is enclosed within a 120km x 120km area that includes two station configurations.Four/six stations are randomly distributed within this larger area and employed to study DC/full MT recovery, respectively.The MTs for each event are randomly defined with logarithmically distributed magnitudes between Mw 1 and 3.For simplicity, we employ a 1-D homogeneous velocity model so that relevant angular quantities are easily defined.Synthetic amplitudes are contaminated by up to ±20% of random noise, and 10% of the polarities are reversed from their correct values and polarities close to nodal planes are removed.Only 10% of the total possible Pand S-wave relative amplitude information (76/114 Pand 456/684 S-comparisons for DC/full MT) is included within the data set to simulate a real-case scenario.The largest event magnitude is assumed known and used to scale MTs to their scalar moments.
Fig. 4 displays the MT inversion results for the setup described above for a typical realization.Since no reference event is employed, many candidate solutions are produced.The final solution is the solution with the lowest relative amplitude misfit within 80-percentile solutions with the lowest percentage of incorrect polarities.Note that this figure does not display the best synthetic results but, rather, more typical results demonstrating a typical range in uncertainty.The magnitudes and radiation patterns are in good agreement for both scenarios (DC/full MTs).
To demonstrate the robustness of our algorithm, we repeat the previous scenario 1,000 times with different random number seeds.Fig. 5 displays the results.Step 8.4 (Quality Control) was not applied to simplify comparison so that unstable events can be part of the final solution.Generally, a larger error in radiation pattern accompanies the largest error in scalar moment recovery.There is no obvious bias toward under/overestimating magnitude for DC solutions.However, a strong bias toward overestimating the moment magnitude is noticeable for full MT solutions.We have not found an explanation for this bias yet.Ninety percent of the median errors are below 16.7 • /0.88 for DC/full MT solutions, respectively, indicating good agreement.Al- most all median magnitude errors lie within ±0.1 for DC and full MTs.Figs S1 and S2 are similar to Fig. 5, but with 2 extra stations added and with 10 added events, respectively.As expected, increasing the number of stations significantly improves the accuracy of the recovered MTs.The spatial coverage also affects results: improved coverage yields more accurate solutions.Increasing the number of does not appear to significantly affect the quality of the results while the relative amplitude errors remain the same.

Real Case Example: San Juan Cluster
In this section, we apply our moment tensor inversion algorithm to one of the largest forearc seismicity concentrations in northern Cascadia.The earthquake cluster is located near the Canada-U.S. border along the northeast coast of the San Juan Islands Archipelago, Washington State (see Fig. 6), and we examine a suite of more than 700 crustal earthquakes (maximum depth 30km) with local magnitudes ≤ 3.2 recorded between January 2000 and December 2020 which were well recorded by regional stations.We further note that the largest event ever recorded within this region was an M L 3.4 earthquake on September 24, 1997.(U.S. Geo- The upper crust of the San Juan Island Archipelago (comprising San Juan, Orcas, Lopez and numerous smaller islands) is composed of a 5-10 km thick mid-Cretaceous sequence of nappes and thrust faults referred to as the San Juan-Cascade Thrust System.Brown (2012) suggests that the nappes were formed through initial accretion and subduction-zone metamorphism south of the current location, followed by uplift, exhumation, orogen-parallel northward transport as a forear sliver component, and ultimately obduction at the current site over the truncated southern end of Wrangellia and the Coast Plutonic Complex.However, the nature of the deeper crust beneath the nappes and within which the majority of the seismicity likely occurs, is unconstrained.Fig. 7 displays the event timeline and identifies the most pronounced sequences by their dates.Half of these sequences are associated with an initial mainshock, whereas the other half behave more like earthquake swarms (a group of events occurring closely in time and space without a single dominant shock).Swarms have often been interpreted to be caused by fluid (Vidale and Shearer, 2006) or magmatic activity (Hill, 1977).

Data Set and Processing
We employed the event catalogue from Merrill et al. ( 2022) and stations within 200 km of San Juan Island (Fig. 6) from diverse networks.Waveforms were downloaded from data centers of the Incorporated Research Institutions for Seismology (IRIS) Center and Natural Resources Canada.Phase picks for stations in Canada were obtained from Merrill et al. (2022) as a mix of manually and automatically detected phases.All picks were visually checked and revised as needed.Phases at U.S. stations were manually picked and represent approximately half of the arrival time catalogue.First motion P-wave polarities were primarily determined using the technique described in step 3, but some were also manually picked if a clear and consistent polarity was observed across 4 frequency bands ([1, 6],[2, 7],[3, 8],[0.5, 8]Hz).If inconsistency was noted between manual and automatic polarities, those measurements were discarded.

Earthquake Relocation
After augmenting the arrival-time phase catalogue, we relocated earthquakes utilizing the Double-Difference Hypocenter Locations (hypoDD) program developed by Waldhauser and Ellsworth (2000).For this relocation, we generated a 1-D velocity model as the average of the Merrill et al. (2022) 3-D velocity model centred on the cluster and extending out to a radius of approximately 50 km (see Fig. S3 for previous location).The 549 relocated earthquakes are displayed in Fig. 8.
HypoDD relocation places most of the earthquakes at depths between 10-23 km within reasonably tightly con- strained subclusters that also tend to be clustered in time.Very few events are located above 10km within the nappe structures described by Brown (2012).Further-more, the seismicity in aggregate displays a northward dip which was previously noted by Balfour et al. (2012), whereas the east-west profile displays some clusters as near-vertical planar structures.

Moment Tensor Inversion
We follow the steps outlined in section 2.3 using our default parameters.Take-off and azimuth angles were obtained from the hypoDD outputs.Missing values from the hypoDD file (especially for P-wave first motion) were calculated using our 1-D velocity model and the python package obspy.taup(Crotwell et al., 1999).
A total of 350 events passed all quality control steps and were admitted to the double-couple MT inversion.We selected a subset of 20 solutions based on the 95th percentile polarity and 60th-percentile of amplitude constraint misfits, to assess uncertainty in our final solution of 350 moment tensors as described in step 8.4 (Quality Control).Fig. 9 displays the solution distribution in relative residual versus polarity space.There are 171 stable events, 134 likely events and 45 bad events.Of the 305 (stable and likely) events retained for analysis, 23 events had a local magnitude (calculation based on S-waves) reported by the Geological Survey of

Stress Inversion
We cluster the individual MTs into 6 geographic groups using a K-means algorithm and perform stress inversions using the Vavryčuk (2014) method.Our 305 best (stable and likely) events, with their uncertainties, were employed in the inversion.Results are displayed in Fig 12.
In their study of the regional (southwest B.C. and northwest Washington) stress regime based on focal mechanism data, Balfour et al. (2011) showed that orientation of the maximum horizontal compressive stress (σ Hmax ) varies with distance to the trench: along the Vancouver Island coast, σ Hmax is approximately margin-normal due to stress accumulation along the locked subduction zone interface; whereas 100-150 km inland of the coast, σ Hmax is margin parallel due to the northward push of the clockwise rotation of Oregon block (McCaffrey et al., 2000).However, the local stress orientation recovered for the San Juan cluster (Balfour et al., 2011) deviates from the regional pattern in a counter-clockwise direction, with σ Hmax oriented northwest-southeast.In their study, Balfour et al. (2011) used 38 earthquakes from the San Juan Island cluster with depths ranging from 0.03 to 24.5 km that occurred between August 1980 to October 2009.Their data set is almost independent of that employed here (317 earthquakes), with only one shared event (2009/06/25).We note the close agreement (∼ 3 • ) between their recovered stress orientation and our average (in white) solution for the entire MT suite.

Discussion
The San Juan earthquake cluster represents a dense concentration of seismic activity apparently associated with a local perturbation to the regional stress regime (Balfour et al., 2011).It comprises spatiotemporally segregated subclusters that, in aggregate, outline a broadly north dipping fault network (Fig. 8C).The temporal sequences associated with the individual subclusters are typically not preceded by large magnitude foreshocks, but rather comprise persistent low-magnitude events.Moreover, several of the temporally confined subclusters appear to form near-vertical structures (see eastwest profile in Fig. 8B).These observations are consistent with swarm behavior, suggesting fluid migration as the underlying cause for seismicity (e.g., Vidale and Shearer, 2006).We interpret the source of fluids to be the underlying, dehydrating Juan de Fuca plate.
A range of independent observations supports this interpretation.The San Juan cluster represents one of the largest concentrations of forearc seismicity in the region, a part of Cascadia that has been imaged in a number of geophysical studies.The tomographic models of  2023).The stereonet shows the direction of all stress axes (σ 1 :compression; σ 2 : neutral; σ 3 :tension) with the same colour code as map.Savard et al. (2018) and Merrill et al. (2022) locate the San Juan cluster in a zone of low Vp/Vs whereas magnetotelluric work (Rippe et al., 2013;Wannamaker et al., 2014;Calvert et al., 2020) characterizes the crust here as highly conductive.Both observations favour an interpretation involving the presence of fluids.A correlation between seismicity and low Vp/Vs has been observed throughout the Puget Sound -Strait of Juan de Fuca region (Savard et al., 2018;Merrill et al., 2022) and may manifest, at least partially, the presence of silica precipitated by fluids rising from the slab (Ramachandran et al., 2006;Audet and Bürgmann, 2014;Bostock et al., 2019).Furthermore, the MOCHA magnetotelluric experiment that imaged the forearc/arc between southern Washington and northern California (Egbert et al., 2022) has revealed conductivity anomalies interpreted to represent a fluid concentration peak between 17.5 and 30 km depth controlled by crustal structure and metamorphic processes.This depth range matches that of the San Juan cluster (majority of the events between 13 and 23 km depth).
Another interesting correspondence involves the location of the San Juan cluster landward of a pronounced deep tectonic tremor concentration, as mapped by Armbruster et al. (2014) (see Fig. 13), marking the downdip limit of tremor beneath the eastern Haro Strait.As such, it represents a local example of the more general tendency for forearc seismicity and tremor to be anticorrelated in their respective epicentral (and depth) distributions in Cascadia (e.g., Kao et al., 2009;Boyarko and Brudzinski, 2010).Tremor is thought to occur where fluids are trapped at near-lithostatic pressures in the vicinity of the plate boundary (Audet et al., 2009;Rubinstein et al., 2008).Bostock et al. (2019) have proposed that the complementary epicentral distribu-tions of tremor and forearc crustal seismicity results from a landward breach of the permeability seal responsible for maintaining high fluid pressures in the tremor zone, plausibly due to i) eclogitization of oceanic crust which involves a ∼ 10% volume reduction, and/or ii) high strain associated with along-strike curvature in the subducting slab together with increased slab dip, that allows entry of fluids into the overlying crust and the promotion of fluid-driven swarm seismicity like that proposed here below San Juan Island.This interpretation is supported by the resistivity dip profiles across the Cascadia margin compiled by Wannamaker et al. (2014), indicating high conductivity zones emanating from 35-40 km depth near the downdip termination of tremor and evolving landward, in some instances as here, into crustal depths.We remark, however, that according to Fig. 7 there appears to be no obvious temporal correlation between the occurrence of tremor and seismicity below San Juan Island.If the general north-dipping geometry of the seismicity clusters evident between ∼ 13-23 km depth in Fig. 8c is extrapolated updip, it would surface in proximity to the trace of the Devils Mountain Fault Zone (DMFZ) or South Lopez-San Juan (SL-SJF).The DMFZ and its westward extension into the Leech River Fault (LRF) on southern Vancouver Island are believed to be currently active (Johnson et al., 2001;Morell et al., 2017), and potentially capable of generating M6-7 earthquakes.At the surface, the DFMZ is reported to dip at between 45 • to 75 • to the north (Johnson et al., 2001).It is therefore conceivable that the deeper reaches of seismicity present below San Juan Island (Fig. 8) are associated with this structure.However, the more limited, shallower (< 13 km depth) events do not follow this extrapolation but instead occur directly above the deeper seismicity suite and may indicate intersection of the fault network with steeper secondary structures extending to the surface.We further note that Lidar and bathymetry data (Greene and Barrie, 2022) reveal that the San Juan Island Archipelago, including San Juan Island in particular, are crisscrossed by multiple fault strands at the surface.Helium isotope ratios measured at hotsprings to the south in Puget Sound, where the Seattle Fault crosses Bainbridge Island, provide evidence that slabderived fluids do reach the surface along major crustal faults (McCrory et al., 2016).Additionally, on Vancouver Island near the transition from the DMFZ to the LRF, mineralization is interpreted as resulting from the distillation of volatiles and solutes from the Juan de Fuca plate (Fyfe et al., 1987).Moreover, Greene and Barrie (2022) have identified pockmarks and a mud volcano in the proximity of the DMFZ, providing evidence of fluid seepage in the region (see their Fig.10).However, Johnson et al. (2022) have analyzed methane emissions in gas plumes issuing from major fault traces on the southern Puget Sound seafloor and do not record an obvious thermal or chemical signature that would favour a deepseated slab origin.

Conclusions
We have modified the relative moment tensor inversion approach of Plourde and Bostock (2019) to incorporate constraints from P-wave first motion polarities, thereby resolving the moment-tensor polarity ambiguity that exists in the absence of available reference moment tensors.This adaptation is particularly important for studying small earthquakes, where conventional moment tensor algorithms based on waveform modelling may fail due to the complexities of high-frequency wave propagation.Moreover, we tackle the challenge of moment tensor scaling or, equivalently, scalar moment recovery, through a post-inversion step utilizing available local magnitude information.The efficacy of the algorithm is demonstrated through both synthetic examples and application to a dense concentration of seismicity beneath San Juan Island in Washington State, in the Cascadia forearc.
The principal stress directions obtained from the moment tensors recovered for the San Juan Island dataset exhibit little variability across subclusters and are consistent with findings from a previous, regional focalmechanism study conducted by Balfour et al. (2011), using an independent dataset.This agreement provides increased confidence in the observation that the stress regime local to the San Juan Island region is significantly perturbed from the regional trends defined in Balfour et al. (2011).Moreover, other characteristics of the seismicity, namely the temporal dependence of moment magnitudes, hypocentral distribution, and association with low Poisson's ratio and high conduc-tivity imaged in previous structural studies are consistent within an interpretation involving seismic swarms.More specifically, we attribute the San Juan seismicity to migration of fluids originating from the dehydration of the underlying subducting Juan de Fuca plate.These findings collectively enhance our understanding of earthquake processes in the Cascadia forearc.

Figure 1
Figure 1 Schematic diagram summarizing the assumptions required to apply the relative moment tensor inversion.

Figure 2
Figure 2 Summary of the relative moment tensor algorithm.Steps, inputs, and output are in yellow, green, and red boxes.Detailed descriptions of each step are given in section 2.3.

Figure 3
Figure 3 Summary of the four solving stages.Each of the stages is described in the main text.

Figure 4
Figure 4 Synthetic examples as described in the main text.A,B,C) show the event-station distribution.Red triangles are stations used for both inversions, whereas green triangles are only used for the full MT inversion.Moment tensors are plotted in lower and southern hemisphere representation for B and C) respectively and scale with their magnitude.The full MT solutions are displayed; the closest DC solutions are used for the DC inversion.The lower panel shows the DC MT (D) and full MT (E) solutions.The true solution is blue, whereas the red contours show the recovered solution.True magnitude is labelled in black and recovered magnitude in gray, below each MT.Dots are polarities used in the inversion.Filled/empty dots are associated with up/down first motion; dots highlighted in yellow are incorrect polarities used for the inversion.For the DC or full solution, the Kagan angle or cross-correlation between the true and the recovered solutions is indicated below the magnitude.

Figure 5
Figure 5 Synthetic results for 1,000 realizations DC MTs (left) and full MTs (right).Note that the normalized cross-correlation axis on the right figure has been flipped for better visual comparison.The magnitude difference is calculated as the truth minus the recovered moment magnitude.

Figure 6
Figure 6 Seismic stations used for MT inversion of local seismicity near San Juan Island.Blue, red and purple triangles are stations that contributed relative amplitudes, polarities, and both, respectively.Black dots are earthquakes from Merrill et al. (2020, 2022).The gray dashed line is the Canada-U.S. border.The purple lines are the Moho contour of the subducting Juan a Fuca Plate from Bloch et al. (2023).The red box defines the study area.

Figure 7
Figure 7 Temporal distribution of San Juan Island seismicity.Each vertical bar represents a seismic event; a darker region signifies a shorter recurrence interval.Middle panel in red displays events comprising the San Juan cluster; more pronounced sequences are identified by their starting date.Red dots are events with recovered magnitudes from the MT inversion (see axis at right for M W scale). Top/bottom panels in black/blue are the timelines for earthquakes/tremors surrounding the San Juan seismicity.The surrounding area is 9 times larger than the study region and is identified in Fig. 13. Background seismicity does not include events from the San Juan cluster.Tremors between 2003 and 2007 are high accuracy (waveform correlation) locations and times from Armbruster et al. (2014) whereas tremors after 2009 are from Pacific Northwest Seismic Network (PNSN) (envelope correlation) catalogue.Note that the deployment of the Polaris array (2002-02-26 to 2006-07-04) considerably increased the number of stations in the region and hence decreased earthquake detection thresholds over the period.

Figure 8
Figure 8 Earthquake distribution color-coded by year.A) Top view with a black cross indicating the cluster center.B) East-west cross-section centred on the cluster.C) Southnorth cross-section centred on the cluster.

Figure 9
Figure 9 Optimal solution selection.Blue dots represent all recovered solutions.Green dots are the subset used for the uncertainty assessment based on 95th-polarity and 65th-linear constraint percentiles.The red dot is our preferred final solution.Note that the x-axis is the residual calculated using the conditioned matrix A from step 7 and m is not scaled to absolute value.The inset is a histogram of the Kagan angle standard deviation showing the distribution of stable (green), likely (yellow) and poor (red) MTs.The final solution agrees with 83% (1624/1948) of the polarity measurements and possesses an amplitude constraint misfit of 1.82.Figs S4 to S13 show the individual MT solutions along with their numbers of constraints and uncertainties.Of the 305 (stable and likely) events retained for analysis, 23 events had a local magnitude (calculation based on S-waves) reported by the Geological Survey of

Figure 10
Figure 10 Stable and likely MTs colour-coded by depth with zoom-in sections.All MTs are shown in lower hemisphere representation.Map region shown on Fig. 6.

Figure 12
Figure 12Stress inversion results.Maximal horizontal compressive stress directions are plotted as colored arrows corresponding to the contributing event epicenters (colored dots) displayed on map of epicenters.Note that the green events represent the shallow group (< 12 km).The white arrows represent the average horizontal compressive stress using stable and likely MT solutions, and compare closely with that reported byBalfour et al. (2011)  plotted in black.Black epicenters represent earthquakes for which MTs were not retrieved.Purple lines are the 52 and 56 km oceanic Moho contours fromBloch et al. (2023).The stereonet shows the direction of all stress axes (σ 1 :compression; σ 2 : neutral; σ 3 :tension) with the same colour code as map.

Figure 13
Figure 13 Seismicity Map.The red dots are the earthquake part of the San Juan cluster.Blue dots are tremors between 2003 and 2007 relocated by Armbruster et al. (2014).Gray dots are other earthquakes detected in the area.The orange box identifies the area used for Fig. 7.The oceanic Moho from Bloch et al. (2023) is displayed in purple lines.The bold black lines are the Devil's Mountain Fault Zone (DMFZ) (Barrie and Greene, 2018), Skipjack Island Fault Zone (SJIFZ), the South Lopez-San Juan Fault (SLSJF) and the Leech River Fault (LRF).Solid black lines are other mapped faults in the region Greene et al. (2018); Greene and Barrie (2022).