Target Trial Emulation: Does surgical versus non-surgical management of cranial cruciate ligament rupture in dogs cause different outcomes?

Target trial emulation applies design principles from randomised controlled trials to the analysis of observational data for causal inference and is increasingly used within human epidemiology. Using anonymised veterinary clinical data from the VetCompass Programme, this study applied the target trial emulation framework to determine whether surgical (compared to non-surgical) management for cranial cruciate ligament (CCL) rupture in dogs causes improved short- and long-term lameness and analgesia outcomes. The emulated target trial included dogs diagnosed with CCL rupture between January 1, 2019 and December 31, 2019 within the VetCompass database. Inclusion in the emulated trial required dogs aged ≥ 1.5 and < 12 years, first diagnosed with unilateral CCL rupture during 2019 and with no prior history of contralateral ligament rupture or stifle surgery. Dogs were retrospectively observed to have surgical or non-surgical management. Informed by a directed acyclic graph derived from expert opinion, data on the following variables were collected: age, breed, bodyweight, neuter status, insurance status, non-orthopaedic comorbidities, orthopaedic comorbidities and veterinary group. Inverse probability of treatment weighting (IPTW) was used to adjust for confounding, with weights calculated based on a binary logistic regression exposure model. Censored dogs were accounted for in the IPTW analysis using inverse probability of censoring weighting (IPCW). The IPCWs were combined with IPTWs and used to weight each dog's contribution to binary logistic regression outcome models. Standardised mean differences (SMD) examined the balance of covariate distribution between treatment groups. The emulated trial included 615 surgical CCL rupture cases and 200 non-surgical cases.
The risk difference for short-term lameness in surgically managed cases (compared with non-surgically managed cases) was −25.7% (95% confidence interval (CI) −36.7% to −15.9%) and the risk difference for long-term lameness was −31.7% (95% CI −37.9% to −18.1%). The study demonstrated the application of the target trial framework to veterinary observational data. The findings show that surgical management causes a reduction in short- and long-term lameness compared with non-surgical management in dogs.


Introduction
Abbreviations: CCL, cranial cruciate ligament; CI, confidence interval; DAG, directed acyclic graph; EPR, electronic patient record; IP, inverse probability; IPCW, inverse probability of censoring weighting; IPTW, inverse probability of treatment weighting; IQR, interquartile range; RCT, randomised controlled trial; RD, risk difference; RR, risk ratio; SMD, standardised mean difference; TPLO, tibial plateau levelling osteotomy; TTA, tibial tuberosity advancement.
Veterinarians are increasingly encouraged to apply 'evidence based principles' in their clinical decision-making (Holmes and Cockcroft, 2004) but the paucity of relevant and reliable published evidence on veterinary interventions has limited the potential clinical welfare gains for dogs (Dean, 2017). Randomised controlled trials (RCTs), along with their synthesis in the forms of systematic reviews and meta-analyses, are considered the gold standard method for assessing the effectiveness of treatment interventions and are a valuable source of information on which to base clinical decisions (Balshem et al., 2011; Wareham et al., 2017). The primary advantage of RCTs is that randomisation of the exposure aims to ensure that the treatment effect is unconfounded because the groups (treatment and controls) should be balanced in both observed and unobserved confounders (Pfeiffer, 2010). However, randomisation only addresses confounding at baseline, whilst post-randomisation confounding can still occur. Additionally, RCTs often require large groups, can be costly and ethically challenging, the required duration can be long if time from intervention to outcome is prolonged and the eligibility criteria may result in trial participants not representing the wider population of interest (Pfeiffer, 2010). Therefore, observational data are increasingly recognised as a valuable alternative resource for estimating real-world causal effects, especially in the absence of available randomised experiments (or as a complement to them), and might even reflect the "usual conditions" under which a treatment would be taken more accurately (Maringe et al., 2020).
Causal inference refers to an intellectual discipline that considers the assumptions, study designs, and estimation strategies that allow researchers to draw causal conclusions based on data (Hill and Stuart, 2015). In simpler terms, causal inference describes the process of drawing a conclusion that a specific treatment or exposure (i.e., intervention) was the "cause" of the effect (or outcome) that was observed (Frey, 2018). When analysing observational data to answer causal questions, an observational study can be conceptualised as a conditionally randomised experiment, given the measured covariates (Hernán and Robins, 2020). Causal inference from large observational databases can be viewed as an attempt to emulate the randomised experiment (the target experiment or target trial) that would answer the question of interest (Hernán and Robins, 2016). Hernán and Robins (2016) have described a framework for research into comparative effectiveness using large observational databases that aims to make the target trial explicit, i.e. the "target trial emulation framework". In broad terms, the framework explicitly defines the "target trial" as the trial you would like to conduct if it were feasible and then describes how to emulate this target trial using observational data. Observational data are subject to various confounding, selection and information biases that can result in underestimation or overestimation of the effect of interest (Dohoo et al., 2009; Hammerton and Munafò, 2021). Target trial emulation, in conjunction with other statistical techniques such as inverse probability of treatment weighting (IPTW; defined in the methods), aims to address these possible biases.
The VetCompass database holds large volumes of anonymised veterinary clinical data that could be used to answer causal questions of interest under the assumption of no unobserved confounding, i.e., no unmeasured variables affecting both exposure and outcome. Cranial cruciate ligament (CCL) rupture is one of the most frequent specific causes of lameness in dogs (Johnson et al., 1994) and is clinically managed either surgically or non-surgically (Kirkness, 2020). Surgical management is considered the gold standard treatment for CCL rupture (Fauron and Perry, 2017), with multiple surgical techniques available that aim to stabilise the affected stifle and return the dog to full function where possible (Kirkness, 2020). The range of surgical techniques can be broadly categorised into two groups based on their approach: osteotomy-based techniques and suture-based techniques (ACVS, 2021). Of the osteotomy-based techniques, tibial plateau levelling osteotomy (TPLO) and tibial tuberosity advancement (TTA) are the most commonly performed, whilst extra-capsular suture stabilisation (also known as "lateral suture") is the most common of the suture-based techniques (ACVS, 2021).
It is reported, based on limited evidence, that the clinical outcomes for dogs managed surgically are superior to those for dogs managed non-surgically (Comerford et al., 2013; Wucherer et al., 2013), although reasonable functional outcomes for small breed dogs managed non-surgically have been reported (Pond and Campbell, 1972; Vasseur, 1984). However, outcomes in dogs across the range of sizes managed surgically versus non-surgically have not previously been directly compared.
Two previous retrospective studies evaluated non-surgical management alone for CCL rupture in dogs. A US-based study on non-surgically managed dogs (n = 85) attending a referral veterinary medical teaching hospital reported that 85.7% of dogs weighing ≤ 15 kg compared with 19.3% of dogs weighing > 15 kg were considered clinically normal or improved after an average follow-up of 3 years (Vasseur, 1984). An older UK-based study of non-surgically managed dogs (n = 107) reported that 90% of small breed dogs compared with 78% of large breed dogs had no detectable lameness reported by the owner, although the time to assessment was not clearly specified (Pond and Campbell, 1972). Inference from both studies is limited by relatively small sample sizes, and neither study compared non-surgical to surgical management. Additionally, it is possible that the dogs in these older studies are no longer fully representative of the current UK dog population. A more recent study evaluated short-term and long-term outcomes for overweight dogs with bodyweight > 20 kg treated surgically (n = 21) compared with non-surgically (n = 19) for CCL rupture. A successful outcome was defined as ground reaction force > 85% of the value for a clinically normal dog and owner questionnaire responses indicating an improvement of ≥ 10% in lameness and quality of life scores between pre- and post-intervention questionnaires. A successful outcome was reported in 68%, 93%, and 75% of dogs managed surgically and 47%, 33%, and 64% of dogs managed non-surgically at 12, 24, and 52 weeks after enrolment in the study, respectively (Wucherer et al., 2013), suggesting improved outcomes in dogs managed surgically, albeit less marked at 52 weeks. However, the study included only 40 dogs and focused on overweight, larger breed dogs, limiting inference to the overall canine population.
Using anonymised veterinary clinical data from the VetCompass Programme (VetCompass, 2019), the present study aimed to implement the target trial emulation framework to answer causal questions using first-opinion veterinary observational data. More specifically, this study aimed to compare the effects of surgical management relative to non-surgical management on short- and long-term lameness outcomes for CCL rupture in dogs. Duration of analgesia prescription was also assessed as a secondary outcome. Whilst an RCT might be possible to conduct for this specific research question, there would be ethical concerns in randomising dogs to either surgical or non-surgical management when surgery is still considered "gold standard" and a body size difference in outcome may exist (Comerford et al., 2013; Wucherer et al., 2013; Fauron and Perry, 2017). Likewise, the other more general disadvantages of RCTs previously mentioned (cost, duration, representativeness, etc.) would apply in this setting (Pfeiffer, 2010).
Based on previous evidence (Pond and Campbell, 1972; Vasseur, 1984; Wucherer et al., 2013), it was hypothesised that surgical management causes a reduction in the presence of lameness at short- and long-term follow-up compared with non-surgical management. Since the study utilises observational data, target trial emulation and causal inference analyses were adopted to attenuate the resulting biases. The degree to which explicit target trial emulation from human observational studies aligns with results from randomised controlled trials is not fully known, but the evidence is encouraging (Hernán et al., 2008; Cain et al., 2016; García-Albéniz et al., 2017; Labrecque and Swanson, 2017; Admon et al., 2019). It has been stated that the principles of target trial emulation should be adopted regardless, as they encourage known good practices (Labrecque and Swanson, 2017). To the authors' knowledge, this is one of the first studies to adopt the target trial emulation framework for veterinary observational data, and the first to focus on estimating the treatment effect of surgical versus non-surgical management of CCL rupture in dogs. It will provide veterinarians and owners with an evidence base to guide their clinical decision-making in relation to CCL rupture management in dogs.
C. Pegram et al.

Data source and power calculation
VetCompass collates de-identified electronic patient record (EPR) data from primary-care veterinary practices in the UK for epidemiological research (VetCompass, 2019). The CCL study population included all available dogs under primary veterinary care at clinics participating in the VetCompass Programme during 2019. Dogs under veterinary care were defined as those with at least one EPR (free-text clinical note, treatment or bodyweight) recorded during 2019. Available data fields included a unique animal identifier along with species, breed, date of birth, sex, neuter status, insurance status and bodyweight, and also clinical information from free-form text clinical notes, summary diagnosis terms (The VeNom Coding Group, 2019) and treatment with relevant dates. VetCompass operates under an 'opt-out' client consent process for data-sharing; therefore, data from dogs whose owners have opted out are not included (VetCompass, 2023).
The study design applied target trial emulation (Hernán and Robins, 2016). Sample size calculations estimated that approximately 224 surgical cases and 54 non-surgical cases were required, at a minimum, assuming 30% of surgical cases and 50% of non-surgical cases had long-term lameness, 80% power, 95% confidence and a ratio of surgical to non-surgical cases of 4:1 (Vasseur, 1984; Christopher et al., 2013; Taylor-Brown et al., 2015), using the ClinCalc Sample Size Calculator (www.clincalc.com). The 4:1 ratio was estimated from a previous primary-care study that reported that 69-78% of dogs aged < 12 years with CCL rupture were managed surgically (Taylor-Brown et al., 2015). Ethics approval was obtained from the RVC Social Science Ethical Review Board (reference number SR2018-1652).

Case definition, finding and covariates
Incident cases were included in the study, with these cases defined as dogs that were first diagnosed with CCL rupture in either stifle between January 1, 2019 and December 31, 2019. Candidate cases were identified by applying search terms relevant to the diagnosis and management of CCL rupture in the clinical notes (acl, ccl, cranial draw*, cruciate rupture, cruciate ligament, tta, tplo, lateral sut*, extracapsular sut*) (Taylor-Brown et al., 2015). The search findings were merged, and a subset of candidate cases, randomly presented through the online database using the RAND function in SQL Server (Microsoft Learn, 2022), had their clinical notes examined manually in detail to identify whether they met the case definition (Pegram et al., 2023). For dogs that met the case definition, demographic data were extracted automatically from the VetCompass database, with further data relating to clinical management extracted manually from the EPR (Table 1).
Based on existing evidence and expert opinion, a directed acyclic graph (DAG) encapsulating beliefs about the causal relationships relevant to the question of interest was constructed using DAGitty version 3.0 (Textor et al., 2016) (Fig. 1). The DAG was used to identify which variables should be controlled for (Greenland et al., 1999); therefore, data on the following variables were collected: age, breed, bodyweight, overweight status, neuter status, insurance status, non-orthopaedic comorbidities, orthopaedic comorbidities and veterinary group (Fig. 1). Long-term lameness was evaluated as the primary outcome, with short-term lameness and analgesia prescription as secondary outcomes. Based on existing evidence and expert opinion, the same covariate set was believed to apply for analgesia prescription and short-term lameness as for long-term lameness.

Bodyweight
The median of all bodyweight (kg) values recorded for each dog after reaching 18 months old.

Overweight status
Overweight status required information recorded within the electronic patient record indicating that the dog was obese or overweight within the year prior to CCL rupture diagnosis (Rolph et al., 2014; Pegram et al., 2021).

Neuter status
The final available electronic patient record neuter value.

Insurance status
Status at the final available electronic patient record.

Veterinary group
The practice groups involved in the study.

Orthopaedic comorbidities at diagnosis
Orthopaedic comorbidity as diagnosed by a veterinary surgeon at or within one month prior to CCL rupture diagnosis.
Included as a distinct disorder if diagnosed in at least 10 dogs managed either surgically or non-surgically, otherwise recorded as "other comorbid disorders".

Non-orthopaedic comorbidities at diagnosis
Presence of non-orthopaedic comorbidities diagnosed at or within one month prior to CCL rupture diagnosis.

Lameness
Binary outcome (presence or absence). Short-term lameness was defined as evidence of lameness in the limb diagnosed with CCL rupture reported in the electronic patient record at 6 weeks to 4.5 months after diagnosis of CCL rupture. Long-term lameness was defined as evidence of lameness in the limb diagnosed with CCL rupture reported in the electronic patient record at 7.5 to 16.5 months after diagnosis.

Analgesia
The final prescription date for analgesia dispensed specifically for CCL rupture was extracted, and prescription at 3, 6 and 12 months after diagnosis was recorded as a binary outcome.

Target Trial specification and emulation
A target trial of interest was specified and emulated using EPR data (Danaei et al., 2013; Hernán and Robins, 2016, 2020). The protocols of the target trial, and the trial emulation, are summarised in Table 2.

Descriptive analysis
Demographic data were described. The date of follow-up for descriptive analysis was the date of the final available EPR. Continuous variables were assessed graphically for their distribution and summarised using median, interquartile range (IQR) and range if non-normally distributed. The chi-square test was used to compare categorical variables, and the Student's t-test or Mann-Whitney U test was used for univariable comparison of continuous variables between management groups, as appropriate (Kirkwood, 2003).

Statistical analysis of the emulated trial
Lameness and prescription of analgesia at follow-up were compared between dogs managed surgically and dogs managed non-surgically for CCL rupture. For the analysis to be causal, the assumptions of consistency, no interference, no unobserved confounding and positivity should hold (Hernán and Robins, 2020). The consistency assumption implies that a dog's potential outcome under their observed exposure history is the outcome that will actually be observed for that dog (Rehkopf et al., 2016), i.e. the values of treatment under comparison correspond to well-defined interventions that correspond to the versions of treatment in the data (Hernán and Robins, 2020). No interference refers to the assumption that the potential outcomes of one dog are unaffected by the treatment assignment of other dogs (Hudgens and Halloran, 2008). Positivity refers to the assumption that the probability of receiving each treatment conditional on measured covariates is greater than zero (Hernán and Robins, 2020). Positivity violations occur when certain subgroups (defined by a combination of covariates) in a sample rarely or never receive some treatments of interest. Positivity violations can be diagnosed through basic descriptive analyses, examination of the distribution of IP weights and standardised mean differences (discussed in more detail below) (Petersen et al., 2012).
To emulate randomisation at the baseline point of diagnosis with CCL rupture, the following variables were considered sufficient to control for confounding: age, breed, bodyweight, insurance status, neuter status, orthopaedic comorbidities at diagnosis, non-orthopaedic comorbidities at diagnosis, overweight status and veterinary group attended (as defined and categorised in Table 1 and based on the DAG in Fig. 1). Therefore, confounder selection was DAG-driven rather than data-driven. Although many methods for covariate selection exist, DAGs explicitly consider the role of each variable in relation to the exposure and outcome and demonstrate knowledge, theories and assumptions about the causal relationship between variables in a simple and transparent way (Tennant et al., 2021).
Inverse probability of treatment weighting (IPTW) was used to adjust for confounding. IPTW is a propensity score-based method, with the propensity score defined as the probability of treatment assignment conditional on observed baseline characteristics (Austin, 2011). For IPTW, a pseudo-population is created by weighting each individual in the population by the inverse of the conditional probability of receiving the treatment level they actually received (Hernán and Robins, 2020). For example, if an individual dog managed surgically received an IPTW of 4 (reflecting that, given their measured covariates, they were not very likely to receive surgery), their outcome would be "up-weighted" in the analysis: the pseudo-population would include three additional copies (four appearances in total) of this individual dog in the surgical management group. The goal is to balance covariates between the two treatment groups (Barter, 2017).
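The weighting described above can be illustrated with a minimal Python sketch. The propensity scores below are made-up values rather than study data, and the function name `iptw_weight` is hypothetical; in the study, the probabilities came from a logistic regression exposure model fitted in R.

```python
def iptw_weight(treated: bool, propensity: float) -> float:
    """Unstabilised IPTW: inverse of the probability of the treatment actually received.

    `propensity` is the estimated P(surgery | covariates) and must lie strictly
    between 0 and 1 (the positivity assumption).
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must be strictly between 0 and 1")
    p_received = propensity if treated else 1.0 - propensity
    return 1.0 / p_received


# Illustrative (made-up) dogs: (managed surgically?, estimated P(surgery | covariates))
dogs = [(True, 0.25), (True, 0.80), (False, 0.80), (False, 0.50)]
weights = [iptw_weight(treated, p) for treated, p in dogs]
```

The first dog was managed surgically despite an estimated 25% probability of surgery, so it receives a weight of 4, matching the worked example in the text.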
To derive the weights using IPTW, a logistic regression model was first fitted, with treatment (surgical versus non-surgical) as the outcome regressed on the main term confounding variables described above (Table 1). Biologically plausible interaction terms were added to the model and their effects on the Akaike information criterion (AIC) and standardised mean differences (SMDs) were assessed, with interaction terms included if they improved covariate balance (Austin and Stuart, 2015). The SMD examines the balance of covariate distribution between treatment groups (Zhang et al., 2019). For each covariate, SMDs were calculated pre- and post-IPTW, with SMD < 0.1 indicating good covariate balance between the two treatment arms (Austin, 2009; Yang and Dalton, 2014). The linearity assumption was assessed by visually inspecting the scatter plot between the continuous predictor (age) and the logit values (Osborne and Waters, 2002). The model generated predicted probabilities of receiving either treatment for each dog, which were then used to calculate stabilised inverse probability (IP) weights (Xu et al., 2010). When a dog's estimated probability of receiving the treatment it actually received is very low, an extreme weight is created. Extreme weights can increase the variability of the estimated treatment effect and potentially bias the results. Weights were stabilised by replacing the numerator (which is 1 in the unstabilised weights) with the crude probability of exposure (i.e. given by the propensity score model without covariates) (Allan et al., 2020; Chesnaye et al., 2022).
Censored dogs were accounted for in the IPTW analysis using inverse probability of censoring weighting (IPCW). IPCW was first developed in the 1990s by Robins and colleagues (Robins and Finkelstein, 2000) and aims to reduce bias introduced by informative censoring (Jiménez-Moro and Gómez, 2014), under the assumptions of exchangeability and correct model specification (Howe et al., 2011). IPCW compensates for censored subjects by giving more weight to subjects with similar characteristics who are not censored (Dong et al., 2020). To perform IPCW, a binary logistic regression model was fitted at each follow-up time point (short- and long-term lameness and analgesia prescription at 3, 6 and 12 months), with censoring as the outcome regressed on treatment and the confounding variables described. The model generated predicted probabilities of being censored, which were used to calculate IP of censoring weights. Thus, separate censoring models were created at each fixed time point. Different censoring models can be built according to the different reasons for censoring, e.g., missing outcome data due to death or administrative censoring, and censoring due to protocol deviation (Hernán and Robins, 2020). Due to the data structure in the current study, these two processes were combined; thus, the resulting effect estimate is the probability of the outcome given that the dog remains adherent to the treatment assigned during the grace period and remains alive or under veterinary care at the 3-, 6- and 12-month time points. These IP of censoring weights were combined (by multiplication) with the stabilised IP weights generated from IPTW and used to weight each dog's contribution to binary logistic regression outcome models (for the presence or absence of lameness at short- and long-term follow-up and the presence or absence of analgesia prescription at 3, 6 and 12 months) (Robins and Finkelstein, 2000; Robins et al., 2000; Hernán et al., 2001; Jiménez-Moro and Gómez, 2014; Hernán and Robins, 2020).
Separate models were created for each outcome (short-term lameness, long-term lameness and analgesia prescription at 3, 6 and 12 months), based on the weighted data and regressed on treatment. The robust (or sandwich) variance estimator was used to obtain valid standard errors (Zou, 2004). Effect modification was assessed by adding biologically plausible interaction terms to the outcome model (namely an interaction between management type and each of the following in turn: bodyweight, breed, age, overweight status, insurance status, orthopaedic comorbidities and non-orthopaedic comorbidities) and evaluating their effect on the confidence intervals and AIC.
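As a minimal sketch of how stabilised treatment weights and censoring weights can be combined by multiplication, the following Python example uses made-up probabilities. The function names are hypothetical, and the censoring weight is shown in its unstabilised form, which is an assumption for illustration; the study generated its weights in R.

```python
def stabilised_iptw(treated: bool, propensity: float, marginal_p_treated: float) -> float:
    """Stabilised IPTW: P(A = a) / P(A = a | covariates) for the treatment a received."""
    numerator = marginal_p_treated if treated else 1.0 - marginal_p_treated
    denominator = propensity if treated else 1.0 - propensity
    return numerator / denominator


def censoring_weight(p_uncensored: float) -> float:
    """Unstabilised IPCW for a dog that remained uncensored at the time point."""
    return 1.0 / p_uncensored


def combined_weight(treated: bool, propensity: float,
                    marginal_p_treated: float, p_uncensored: float) -> float:
    """Combine the treatment and censoring weights by multiplication."""
    return (stabilised_iptw(treated, propensity, marginal_p_treated)
            * censoring_weight(p_uncensored))


# Made-up example: a surgically managed dog with P(surgery | covariates) = 0.25,
# a crude P(surgery) of 0.75 and P(uncensored | treatment, covariates) = 0.8.
sw = stabilised_iptw(True, 0.25, 0.75)       # 0.75 / 0.25, versus 1 / 0.25 unstabilised
w = combined_weight(True, 0.25, 0.75, 0.8)   # stabilised weight times 1 / 0.8
```

Stabilisation shrinks this dog's weight from 4 to 3, which illustrates why stabilised weights tend to be less extreme. Only dogs that remain uncensored at a given time point contribute to the corresponding outcome model.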
Following assessment of the DAG, evaluation of the distribution of combined IP treatment and censoring weights, the effect of interaction terms in the propensity score model and assessment of effect modification in the outcome model, the SMDs were used as a means of model evaluation. The SMDs compare the distribution of measured baseline covariates between treated and untreated subjects, assessing whether the propensity score model has been adequately specified (Austin, 2011; Zhang et al., 2019).
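For a binary covariate, the SMD can be computed from the (weighted) proportions in each treatment arm. The sketch below is illustrative only: the data and weights are made up, the function names are hypothetical, and it covers the binary case only (continuous covariates would use weighted means and variances instead).

```python
from math import sqrt


def weighted_proportion(values, weights):
    """Weighted proportion of a 0/1 covariate."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def smd_binary(x_treated, w_treated, x_control, w_control):
    """Standardised mean difference for a binary covariate between two arms."""
    p1 = weighted_proportion(x_treated, w_treated)
    p0 = weighted_proportion(x_control, w_control)
    pooled_sd = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    if pooled_sd == 0:  # covariate constant in both arms
        return 0.0 if p1 == p0 else float("inf")
    return (p1 - p0) / pooled_sd


# Made-up insurance indicator (1 = insured) in each arm
insured_surgical = [1, 1, 1, 0]
insured_non_surgical = [1, 0, 0, 0]

# Before weighting: every dog counts once
smd_before = smd_binary(insured_surgical, [1, 1, 1, 1],
                        insured_non_surgical, [1, 1, 1, 1])

# After weighting with made-up IP weights that up-weight under-represented dogs
smd_after = smd_binary(insured_surgical, [1, 1, 1, 3],
                       insured_non_surgical, [3, 1, 1, 1])
```

In this toy example the weighted proportions are equal in both arms, so the post-weighting SMD falls below the 0.1 threshold used in the text, whereas the unweighted SMD does not.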
The variables for which corresponding SMDs were > 0.1 were added to outcome models as independent variables. To ensure that the estimand was marginal, these outcome models were used to predict potential outcomes under both surgical management and non-surgical management, and the mean differences calculated (a weighted g-computation).
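The weighted g-computation step can be sketched as follows. This is a simplified illustration, not the study's implementation: in place of a weighted logistic regression it uses a saturated model (the weighted outcome proportion within each treatment-by-covariate cell), a single binary covariate stands in for breed, the predictions are averaged over the weighted sample, and all data, weights and function names are made up.

```python
from collections import defaultdict


def cell_risks(records):
    """Weighted outcome risk in each (treatment, covariate) cell.

    Each record is (treatment, covariate, outcome, weight), with 0/1 coding.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for a, x, y, w in records:
        numerator[(a, x)] += w * y
        denominator[(a, x)] += w
    return {cell: numerator[cell] / denominator[cell] for cell in denominator}


def marginal_risk(records, risks, a):
    """Average predicted risk if every dog had received treatment `a`.

    Raises KeyError if a (treatment, covariate) cell is empty, i.e. positivity fails.
    """
    total_weight = sum(w for _, _, _, w in records)
    return sum(w * risks[(a, x)] for _, x, _, w in records) / total_weight


def risk_difference(records):
    """Marginal risk difference: surgical (1) minus non-surgical (0)."""
    risks = cell_risks(records)
    return marginal_risk(records, risks, 1) - marginal_risk(records, risks, 0)


# Made-up records: (surgery, covariate, lame at follow-up, combined IP weight)
records = [
    (1, 0, 0, 1.0), (1, 0, 1, 1.0), (1, 1, 0, 1.0), (1, 1, 0, 1.0),
    (0, 0, 1, 1.0), (0, 0, 1, 1.0), (0, 1, 1, 2.0), (0, 1, 0, 2.0),
]
rd = risk_difference(records)
```

Predicting every dog's outcome under both treatments and then averaging is what makes the resulting risk difference marginal rather than conditional on the added covariate.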
There may be some confounders in a study that are unknown or not measured and hence unobserved. Sensitivity analysis can examine the extent to which results are affected by values of unmeasured variables (Thabane et al., 2013). E-values were computed in the current study, with an E-value defined as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and outcome, conditional on the measured covariates, to fully explain away a specific treatment-outcome association. This technique does not require specification of the prevalence of unmeasured confounders and does not make assumptions about their nature (VanderWeele and Ding, 2017). An online E-value calculator has been developed, in which the following are specified prior to calculation: outcome type, estimate type, number of diseased and non-diseased exposed dogs, number of diseased and non-diseased unexposed dogs, alpha level for confidence interval and the true causal effect to which to shift the estimate. The number of diseased and non-diseased (exposed and unexposed) dogs was calculated using the adjusted risks calculated after IPTW (VanderWeele and Ding, 2017; Mathur et al., 2018).
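For a risk ratio RR ≥ 1, the E-value of VanderWeele and Ding (2017) is RR + sqrt(RR × (RR − 1)); for RR < 1 the reciprocal is taken first. The Python sketch below illustrates this formula with made-up risk ratios. The function names are hypothetical, and the study itself used the online calculator rather than this code.

```python
from math import sqrt


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:  # protective estimates: take the reciprocal first
        rr = 1.0 / rr
    return rr + sqrt(rr * (rr - 1.0))


def e_value_for_limit(rr: float, lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (1 if the CI includes 1)."""
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if rr > 1 else upper)


# Made-up example: RR = 2.0 with 95% CI 1.5 to 2.7
point = e_value(2.0)                        # 2 + sqrt(2)
limit = e_value_for_limit(2.0, 1.5, 2.7)    # 1.5 + sqrt(0.75)
```

A larger E-value means that a stronger unmeasured confounder would be needed to explain away the observed association.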
Missing data were handled using the missing-indicator method, which uses a dummy variable in the statistical model to indicate whether the value for that variable is missing (Burton and Altman, 2004; Miettinen, 2012). The missing-indicator method assumes that the confounder variable is only a confounder (simultaneously associated with treatment and outcome) when observed, and not when missing. Additionally, it was assumed that there is no interaction between the missing indicator and the fully observed confounder in the true propensity score. The missing-indicator method is biased under standard missing at random assumptions; however, in the propensity score context, the missing-indicator method can be used in a principled way (Blake et al., 2020). Data were checked for internal validity (i.e., for consistency between variables available within the study) and cleaned in Excel (Microsoft Office Excel 2013, Microsoft Corp.), with analyses conducted using R version 4.0.2 (R Core Team, Vienna, Austria). The "ipw" package was used to generate IP weights (and validated manually) (van der Wal and Geskus, 2011), with code for IPCW derived from Hernán and Robins (2020). The "survey" package was used for binary logistic regression outcome modelling (Oberski, 2014).
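The missing-indicator method described above can be sketched in a few lines of Python. The bodyweight values are made up, the function name is hypothetical, and the fill value of 0 is an arbitrary choice for illustration; both resulting columns would then enter the propensity score model.

```python
def missing_indicator(values, fill=0.0):
    """Return (filled_values, indicator) for a covariate with missing entries.

    Missing entries (None) are replaced by a constant `fill` value, and the
    indicator is 1 where the original value was missing and 0 otherwise.
    """
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Made-up bodyweights (kg); None marks a dog with no recorded bodyweight
bodyweight = [12.5, None, 30.0]
bodyweight_filled, bodyweight_missing = missing_indicator(bodyweight)
```

Because the indicator absorbs the difference between dogs with and without a recorded value, the particular fill constant does not change the fitted probabilities in a model that includes both columns.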

Results
The denominator population included 2,250,741 dogs under primary veterinary care in the VetCompass database during 2019. CCL search terms yielded 32,372 candidate cases, of which 4,122 (12.7%) were manually reviewed by the lead author. Of these, 815 (19.8%) were confirmed CCL cases eligible for the emulated trial, i.e., fitting the inclusion and exclusion criteria. Specifically, the emulated trial included 615 (75.5%) surgical cases and 200 (24.5%) non-surgical cases. Of the surgical cases, the most common surgical methods used were TPLO (266; 43.3%), lateral suture (139; 22.6%) and TTA (114; 18.5%). There were 19 (9.5%) non-surgical cases that went on to have surgery within the 12-month follow-up period and were censored at the date of surgery. There were no dogs initially assigned to surgical management that died before having surgery. There were 3/815 (0.4%) dogs that died within the grace period, with evidence in the EPRs of non-surgical management at baseline, and so were included in the non-surgical management group. Likewise, 4/815 (0.5%) dogs were lost to follow-up within the grace period that had not had surgery (and with evidence that non-surgical management was recommended at baseline), and so were included in the non-surgical management group. The median follow-up time from first diagnosis (given a cut-off time of 12 months) was 12.0 months (IQR 7.1-9.3, range 0.0-12.0) and did not significantly differ in dogs managed non-surgically (median 12.0 months, IQR 8.2-12.0, range 0.3-12.0) compared to dogs managed surgically (median 12.0 months, IQR 6.6-12.0, range 0.0-12.0) (p = 0.560). The median time between diagnosis of CCL rupture and date of surgery was 7.0 days (IQR 0.0-15.0, range 0.0-30.0).

Model evaluation
The standardised mean differences between groups for each of the covariates pre- and post-IPTW are shown in Table 5. Other than breed, the standardised mean differences in the weighted sample were all below 0.1 for each covariate, indicating well-balanced groups post weighting. Therefore, breed was added to weighted outcome models to further control for any residual confounding (with the results in Table 6 including the adjustment for breed). These outcome models were then used to predict the potential outcomes under surgical management or non-surgical management to obtain marginal risk differences.
The median propensity score in dogs treated non-surgically was 0.66 (range 0.10-0.96), whilst the median propensity score in dogs treated surgically was 0.83 (range 0.21-0.98). The range of stabilised IP weights (when treatment weights were combined with censoring weights for each outcome) was 0.23-15.0. The linearity assumption for age as a continuous covariate was met. Compared to the model including main terms only, inclusion of biologically plausible interaction terms did not improve model fit. Based on the SMDs, only breed did not sufficiently balance between treatment arms. Inclusion of interaction terms did not improve balance of breed (i.e., SMD was consistently > 0.1). Likewise, the decrease in the absolute value of SMDs for some covariates, when interaction terms were included, did not outweigh the increase in SMDs for other covariates. Therefore, the final models included the following covariates to generate propensity scores: age, bodyweight, overweight status, neuter status, insurance status, non-orthopaedic comorbidities, orthopaedic comorbidities and veterinary group. Management and breed were included as separate terms in the outcome models. Significant effect modification was not evident.

Emulated trial results
After balancing covariates between the surgical and non-surgical dogs using IPTW, and accounting for censoring using IPCW, surgical management reduced the risk of short- and long-term lameness compared with non-surgical management. Specifically, the risk difference for short-term lameness in dogs treated surgically versus non-surgically was −25.7% (95% CI −36.7 to −15.9), whilst the risk difference for long-term lameness was −31.7% (−37.9 to −18.1). Additionally, surgical management reduced the risk of analgesia prescription, with a risk difference for analgesia prescription in dogs treated surgically versus non-surgically of −38.9% (−44.0.7 to −28.1) at 3 months, −34.1% (−40.4 to −24.0) at 6 months and −32.7% (−38.9 to −18.3) at 12 months following diagnosis (Table 6).
There was no significant interaction between bodyweight (< 15 kg versus ≥ 15 kg), management type and lameness outcomes. The interaction term was fitted in the outcome model on the weighted data; it was not significant and did not improve model fit (AIC). Specifically, the risk difference for short-term lameness in dogs < 15 kg treated surgically versus non-surgically was 8.9% (95% CI −29.3 to 47.1), whilst the risk difference for dogs ≥ 15 kg was −3.2% (−40.5 to 34.1). The risk difference for long-term lameness in dogs < 15 kg treated surgically versus non-surgically was 23.1% (95% CI −12.9 to 59.0), whilst the risk difference for dogs ≥ 15 kg was −1.4% (−39.5 to 36.7).
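As a hedged illustration of how combined weights yield a marginal risk difference, the sketch below multiplies treatment and censoring weights and takes the weighted proportion with the outcome in each arm. This is a simplification: the study fitted weighted logistic outcome models that also adjusted for breed and then predicted potential outcomes, whereas a weighted proportion corresponds to an outcome model containing management alone. All names and data are hypothetical.

```python
def weighted_risk(outcome, weights):
    """Weighted proportion of dogs with the outcome (1 = lame, 0 = not lame)."""
    return sum(y * w for y, w in zip(outcome, weights)) / sum(weights)

def risk_difference(outcome, treated, iptw, ipcw):
    """Marginal risk difference (surgical minus non-surgical) using IPTW x IPCW."""
    combined = [t * c for t, c in zip(iptw, ipcw)]
    risks = {}
    for arm in (0, 1):
        y = [o for o, a in zip(outcome, treated) if a == arm]
        w = [wt for wt, a in zip(combined, treated) if a == arm]
        risks[arm] = weighted_risk(y, w)
    return risks[1] - risks[0]

# Toy data (hypothetical): three surgical and three non-surgical dogs,
# all with an observed outcome.
treated = [1, 1, 1, 0, 0, 0]
outcome = [0, 0, 1, 1, 1, 0]
iptw = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
ipcw = [2.0, 2.0, 1.0, 1.0, 2.0, 2.0]

rd = risk_difference(outcome, treated, iptw, ipcw)
print(f"Risk difference: {rd:.3f}")
```

A negative value indicates a lower weighted risk in the surgical arm, matching the sign convention used for the estimates in Table 6.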

Sensitivity analysis
E-values were calculated to determine the risk ratio that an unmeasured confounder would need to have (conditional on the measured covariates) to fully explain away the treatment-outcome associations. Surgical treatment was used as the baseline group; therefore the E-values represent the risk ratio in dogs treated non-surgically compared to surgically. The E-values ranged from 2.92 to 5.35 (with lower confidence intervals 2.37-4.14) (Table 7).

Discussion
To the authors' knowledge, this is one of the first studies to adopt the target trial emulation framework for veterinary observational data, whilst also the first to focus on estimating the treatment effect of surgical versus non-surgical management of CCL rupture in dogs. IPTW in conjunction with IPCW identified that surgical management reduces the risk of short- and long-term lameness compared with non-surgical management. Additionally, surgical management resulted in a reduced risk of analgesic prescription at 3, 6 and 12 months compared with non-surgical management.
Historically, regression adjustment has been used more frequently than IPTW to account for differences in measured baseline characteristics between treated and untreated subjects (Austin, 2011). However, IPTW has become increasingly popular to adjust for confounding in observational studies, with a number of theoretical advantages proposed (Austin, 2011; Elze et al., 2017; Ali et al., 2019; Austin et al., 2021). First, IPTW allows for estimation of the marginal treatment effect, i.e. the average effect of treatment on the population (Armitage and Colton, 1998; Austin, 2011), whilst conventional covariate adjustment estimates the conditional treatment effect, i.e. the average effect of treatment on the individual. Therefore, if the objective of an observational study is to answer the same question as an RCT, the marginal effect is often of greater interest (Austin, 2011). Second, the standardised mean differences compare the distribution of measured baseline covariates between treated and untreated subjects, assessing whether the propensity score model has been adequately specified (Austin, 2011; Zhang et al., 2019). Conversely, goodness-of-fit tests used for conventional covariate adjustment do not determine the degree to which fitted regression models have successfully eliminated systematic differences between treated and untreated subjects (Austin, 2011).
After balancing covariates between the surgical and non-surgical dogs using IPTW, accounting for censoring using IPCW and adjusting further for breed, the marginal risk for short-term lameness and long-term lameness in dogs treated surgically was reduced by 25.7% and 31.7% respectively compared to non-surgically managed dogs. Clinically, these results highlight that surgical management of CCL rupture causes improved short- and long-term lameness outcomes compared to non-surgical management, and to a similar extent at both time points. That said, the absolute risks for short- and long-term lameness in dogs treated surgically (33.9% and 16.3% respectively) are not negligible and provide a benchmark for veterinarian-owner decision making. These results reflect findings from a previous study in which subjective lameness outcomes at 3 and 12 months respectively were improved by 20.6% and 11.4% in dogs managed surgically compared with those managed non-surgically (Wucherer et al., 2013). The long-term lameness improvement was less marked at 12 months in the previous study than in the current study; however, this previous study used descriptive analysis methods only and was restricted to just 40 overweight dogs weighing > 20 kg.

Table 6
Adjusted risk and corresponding risk difference (RD) (and 95% confidence intervals) for lameness and analgesia outcomes in dogs under UK primary veterinary care managed surgically (n = 615) or non-surgically (n = 200) for cranial cruciate ligament (CCL) rupture. The estimates represent the risk in dogs managed surgically compared with dogs managed non-surgically using inverse probability of treatment and censoring weighting to adjust for confounding.
There is some prior evidence that dogs weighing 15 kg or less have reasonable short- and long-term lameness outcomes when managed non-surgically (Pond and Campbell, 1972; Vasseur, 1984); however, this effect by bodyweight was not evident in the current study. This may highlight surgical management of CCL rupture as superior, regardless of bodyweight, bringing into question the pervading veterinary view that smaller dogs have less requirement for surgery (Comerford et al., 2013; Taylor-Brown et al., 2015). However, reduced study power due to missing outcome data in the current study may in part explain this finding, which further studies could help to clarify.
The findings for analgesia prescription in the current study reflect those of lameness outcome, with the risk difference for analgesia prescription in dogs treated surgically versus non-surgically −38.9% at 3 months, −34.1% at 6 months and −32.7% at 12 months following diagnosis. Likewise, the counterfactual risks of short-term lameness and analgesia prescription at 3 months in dogs treated surgically and non-surgically are relatively similar, as are the counterfactual risks of long-term lameness and analgesia prescription at 12 months. This finding suggests analgesia prescription could be used as a useful and reliable proxy measure for lameness. Whilst the risk difference for lameness slightly increased from short- to long-term follow-up, the risk difference for analgesia prescription reduced, i.e. was greatest at 3 months following diagnosis. It is generally recommended that dogs are prescribed analgesia for a period of two weeks following surgery for CCL rupture to account for post-operative pain, provided there are no post-operative complications affecting this decision (Gruen et al., 2014). Whilst there is no definitive guidance for analgesia prescription in dogs managed non-surgically for CCL rupture, there is evidence that veterinarians opt for an initial 6-12 week period of recommended analgesia use (Vasseur, 1984; Comerford et al., 2013). Therefore, this might explain the slightly increased risk difference in analgesia prescription at 3 months relative to 6 and 12 months, as uncomplicated surgical cases are likely to cease analgesic treatment by two weeks after surgery, with residual usage more likely to relate to CCL-related pain rather than post-operative pain.
The current study shows that, on average, surgical management leads to reduced lameness and analgesic prescription outcomes compared with non-surgical management. This finding is in line with the beliefs of many veterinarians about a substantial clinical benefit to dogs from cruciate surgery. All surgical methods were considered in the current study, and whilst the evidence for choosing one technique over another is limited (Bergh et al., 2014), further studies could evaluate clinical outcomes between different surgery types. Based on the SMDs, there was evidence of balance between the surgical and non-surgical groups for all covariates, other than breed. If there is evidence of imbalance (i.e. SMD > 0.1), mis-specification of the propensity score model should be evaluated (Zhang et al., 2019). It was reasoned that both breed and bodyweight could affect treatment decision-making; therefore breed was retained in the propensity-score model, but additionally added to the outcome model to further improve the control for confounding.
The limitations of this study mirror previous VetCompass studies, and are largely based on the nature of retrospective analysis of electronic patient record data, including issues related to missing and misclassified data and application of a case definition to the data available (O'Neill et al., 2014). However, we used appropriate statistical methods to account for missing data and loss to follow-up. Expert opinion was sought in construction of the DAG; however, it is possible that unmeasured confounders could influence the risk differences calculated. E-values were calculated to quantify the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and outcome, conditional on the measured covariates, to fully explain away the treatment-outcome effects (VanderWeele and Ding, 2017). The E-values ranged from 2.92 to 5.35, with the lowest value (2.92) for short-term lameness as an outcome. Therefore, the risk difference for short-term lameness of −25.7% (and corresponding risk ratio of 1.76) could be explained away by an unmeasured confounder that was associated with both treatment (management of CCL rupture) and outcome (short-term lameness) by a risk ratio of 2.92, but weaker confounding could not do so (VanderWeele and Ding, 2017). E-values should be interpreted in context and with other strengths and weaknesses of the study and design (VanderWeele and Ding, 2017), but these results suggest that the causal interpretation is strongest for long-term lameness as an outcome (E-value 5.35), whilst more moderate for short-term lameness.
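The relationship between the reported risk ratio and its E-value can be checked with a short sketch using the standard E-value formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)), applied after inverting any risk ratio below 1. The function name is hypothetical; the input of 1.76 is the risk ratio for short-term lameness quoted above.

```python
from math import sqrt

def e_value(risk_ratio):
    """E-value for a point estimate on the risk ratio scale."""
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))

# Risk ratio for short-term lameness (non-surgical versus surgical).
short_term = e_value(1.76)
print(f"E-value: {short_term:.2f}")  # prints "E-value: 2.92"
```

The same function applied to the lower confidence limit of a risk ratio gives the E-value for the confidence interval; a risk ratio of exactly 1 returns an E-value of 1, meaning no unmeasured confounding is needed to explain a null association.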
We used a one-month grace period in the current study to ensure the treatment strategies were realistic, i.e. dogs will not often have surgery for CCL rupture on the day of diagnosis. However, a consequence of this is that data for an individual dog can be consistent with both treatment strategies during the grace period, i.e. they can be non-surgically managed whilst still having the opportunity to be surgically managed within 4 weeks. If a dog dies or is lost to follow-up during this period, randomly assigning the dog to one strategy or the "clone-and-censor" approach have been proposed to avoid bias (Hernán and Robins, 2016). A small number of dogs died or were lost to follow-up within the grace period in the current study (n = 7), and were assigned to the non-surgical arm. Although the outcomes were at 3, 6 and 12 months (so by definition could not occur within the grace period), the missing outcome data for these dogs could bias the results in either direction. In the present study, censoring weights were calculated at fixed time points. Censoring weights can be modelled in different ways depending on the study design, namely using a binary logistic regression model, a pooled logistic regression model or, in the case of time-to-event data, a Kaplan-Meier estimator or Cox proportional hazards model. Additionally, different forms of censoring, e.g. loss to follow-up and protocol deviation, can be modelled separately if the data structure and proportion of censored patients allows (Matsouaka and Atem, 2020; Murray et al., 2021). It should be noted that IPCW based on a fixed binary logistic regression model, with different types of censoring combined, can be less informative than other approaches, albeit less likely to result in extreme weights (Matsouaka and Atem, 2020).
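To make the idea of censoring weights at a fixed time point concrete, the following minimal sketch gives each dog with an observed outcome a weight of one over the probability of remaining uncensored within its stratum. It is an assumption-laden simplification: the stratum proportion stands in for the fitted binary logistic regression model used in the study (the two coincide only for a model with a single categorical covariate), the weights are unstabilised, and all names and data are hypothetical.

```python
from collections import defaultdict

def censoring_weights(stratum, observed):
    """Unstabilised IPCW: 1 / P(uncensored | stratum); censored dogs get 0."""
    totals = defaultdict(int)
    uncensored = defaultdict(int)
    for s, o in zip(stratum, observed):
        totals[s] += 1
        uncensored[s] += o
    weights = []
    for s, o in zip(stratum, observed):
        p_uncensored = uncensored[s] / totals[s]
        # Dogs with an observed outcome stand in for censored dogs like them.
        weights.append(1 / p_uncensored if o == 1 else 0.0)
    return weights

# Toy data (hypothetical): half of the non-surgical dogs are censored,
# none of the surgical dogs are.
arm = ["non-surgical"] * 4 + ["surgical"] * 2
observed = [1, 1, 0, 0, 1, 1]

ipcw = censoring_weights(arm, observed)
print(ipcw)
```

Each uncensored non-surgical dog receives a weight of 2 and so also represents one censored dog with the same characteristics, which keeps the weighted sample the same size as the original cohort.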

Conclusions
Overall, this study demonstrated the application of the target trial framework to veterinary observational data. CCL rupture was used as the condition of interest, with the findings showing that surgical management causes a reduction in short- and long-term lameness, and analgesic prescription, compared with non-surgical management in dogs of all sizes. These findings can inform discussions between veterinarians and owners when deciding on treatment for CCL rupture.

Fig. 1. Directed acyclic graph (DAG) based on existing evidence and expert opinion to estimate our belief about the total effect of management of cranial cruciate ligament rupture (surgical versus non-surgical) on short-term and long-term lameness. The same causal structure was believed to apply with analgesia prescription as the outcome of interest.

Table 1
Definition and categorisation of demographic and clinical data extracted from the anonymised electronic patient records of dogs diagnosed with cranial cruciate ligament (CCL) rupture (n = 815) attending primary-care veterinary practices in the VetCompass™ Programme in the UK.

Table 2
Specification and emulation of target trial to estimate the effect of surgical or non-surgical management for cranial cruciate ligament (CCL) rupture on lameness and analgesia prescription as outcomes.

Table 3
Count (%) of surgical cases (n = 615) and non-surgical cases (n = 200) for categorical variables recorded in dogs diagnosed with cranial cruciate ligament (CCL) rupture attending primary-care veterinary practices in the VetCompass Programme in the UK.

Table 4
Count (%) and unadjusted risk difference (RD) of lameness and analgesia outcomes in surgical cases (n = 533) and non-surgical cases (n = 176) of dogs diagnosed with cranial cruciate ligament (CCL) rupture attending primary-care veterinary practices in the VetCompass Programme in the UK. The RD estimates represent the risk in dogs managed surgically compared with dogs managed non-surgically. Dogs with missing outcome data (n = 231/815; 28.3% for short-term lameness, n = 370; 45.4% for long-term lameness, n = 106; 13.0% for analgesia prescription at 3 months, n = 189; 23.2% for analgesia prescription at 6 months and n = 357; 43.8% for analgesia prescription at 12 months) were excluded.

Table 5
Standardised mean differences (SMD) before and after applying inverse probability weighting. This table shows the SMD for each of the prespecified covariates pre and post weighting.

Table 7
Sensitivity analysis: E-values* (on the risk ratio scale, with lower confidence interval (CI)) for lameness and analgesia outcomes in dogs with cranial cruciate ligament (CCL) rupture under UK primary veterinary care managed non-surgically (n = 200) compared to surgically (n = 615). The risk differences for each outcome have been converted to a risk ratio for dogs managed non-surgically compared to surgically (n = 815).
* The E-value represents the risk ratio that an unmeasured confounder would need to have (conditional on the measured covariates) to fully explain away the treatment-outcome associations.
C. Pegram et al.