Key Summary Points

The objective of this systematic literature review (SLR) and network meta-analysis (NMA) was to evaluate the comparative efficacy and safety of crisaborole versus other topical pharmacologic treatments for mild-to-moderate atopic dermatitis (AD) among patients aged ≥ 2 years

Our search of Embase®, MEDLINE®, CENTRAL, and DARE using Ovid identified 894 articles published through 10 March 2020; after screening and the feasibility assessment, nine unique randomized clinical trials (RCTs) were deemed eligible for evaluation through NMA

Efficacy was evaluated using the Investigator’s Static Global Assessment (ISGA) score of clear (0) or almost clear (1) at 28–42 days with relative treatment effects expressed by hazard ratios (HR) with 95% credible intervals

Crisaborole 2% ointment was shown to be superior to pimecrolimus 1% cream and vehicle, and comparable to tacrolimus 0.1% or 0.03% ointment, in achieving the ISGA 0/1 score at 28–42 days

This evaluation of comparative efficacy of crisaborole 2% ointment further supports its use as an effective therapeutic option in patients aged ≥ 2 years with mild-to-moderate AD

Introduction

Atopic dermatitis (AD), a common chronic inflammatory skin disorder characterized by eczematous, lichenified lesions and intense pruritus, usually appears in childhood and is often associated with comorbidities such as asthma and allergic rhinitis [1,2,3]. AD affects 15–20% of children (< 18 years of age) and 1–3% of adults [4]. Most patients with AD suffer from mild-to-moderate disease [5,6,7].

The goal of AD management is the prevention and care of disease flares. US treatment guidelines recommend topical corticosteroids (TCSs) and/or topical calcineurin inhibitors (TCIs) as well as phototherapy for mild-to-moderate AD and immunosuppressants or biologics for moderate-to-severe/refractory disease [8]. Although there are safety concerns with the prolonged use of high-potency TCSs [9], a more significant problem is nonadherence to therapy because of fear of skin atrophy, which in turn leads to poor disease control [10]. While TCIs reduce AD severity, special warnings highlight a possible risk for lymphoma and skin cancer, and application site reactions may reduce their use [3, 8, 11].

Crisaborole is a nonsteroidal topical phosphodiesterase 4 inhibitor (PDE4i) that acts by regulating inflammatory cytokine production, which is overactive in patients with AD [12, 13]. Crisaborole was initially approved by the US Food and Drug Administration (FDA) in December 2016 for use as a topical treatment of mild-to-moderate atopic dermatitis in patients ≥ 2 years of age. In March 2020, the FDA approved a supplemental New Drug Application that expanded the use of crisaborole to include children ≥ 3 months of age. Crisaborole was approved in the European Union in March 2020 for the treatment of mild-to-moderate atopic dermatitis in adults and pediatric patients from 2 years of age with ≤ 40% body surface area affected.

Crisaborole was previously approved in Australia, Canada, and Israel. Crisaborole applied twice daily was shown to be effective in patients ≥ 2 years of age with mild-to-moderate AD and was associated with a low incidence of treatment-related/treatment-emergent adverse events (AEs) [14]. A recent systematic review and network meta-analysis for PDE4is versus vehicle has shown that topical PDE4is are more effective than vehicle alone for patients with mild-to-moderate AD [15]. Nevertheless, there is a need to compare crisaborole with other topical treatments and to synthesize available evidence from newly published randomized clinical trials (RCTs).

A systematic literature review and a network meta-analysis were performed to evaluate the comparative efficacy and safety of crisaborole versus other topical pharmacologic therapies for mild-to-moderate AD among patients aged ≥ 2 years.

Methods

Systematic Literature Review

Searches were conducted in MEDLINE (Ovid), Embase (Ovid), the Cochrane Central Register of Controlled Trials (CENTRAL; Ovid), and the Database of Abstracts of Reviews of Effects (DARE; Ovid) to identify English language articles published between inception and 10 March 2020 reporting RCTs for evaluation of possible treatments for patients with mild-to-moderate AD. This systematic literature review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16, 17]. The search strategies included a combination of controlled vocabulary terms as well as free-text search terms for disease and study designs of interest (Supplement Tables S1–S3). In addition, we hand-searched abstracts from the 2015–2018 scientific meetings of the American Academy of Dermatology and the European Academy of Dermatology and Venereology, as well as bibliographies of included publications and systematic reviews identified in the search.

Identification and Selection of Studies

The review was conducted using a prespecified protocol. Eligibility criteria were predefined using the Population, Interventions, Comparisons, Outcomes, and Study design tool (PICOS; Table S4). Two blinded, independent reviewers examined the citations; any discrepancies were resolved by a third reviewer. The primary outcome of interest was Investigator’s Static Global Assessment (ISGA) of 0/1 (clear/almost clear) at 28–42 days. Secondary outcomes of interest were AEs.

Data Extraction

The relevant information extracted from eligible studies included study design and methods, patient characteristics, intervention details (e.g., dosing, schedule, components of vehicle), and efficacy and safety outcomes, along with time points for outcome assessments. A single reviewer extracted data, and a second reviewer quality-assessed the data accuracy.

Quality Assessments

A risk-of-bias assessment was undertaken using the Cochrane tool, in accordance with the National Institute for Health and Care Excellence (NICE) single technology appraisal guidelines for evidence submissions [18, 19].

Feasibility Assessments

Prior to analysis, a feasibility assessment determined the availability of evidence and identified potential sources of heterogeneity. All studies were compared with respect to study- and patient-level characteristics, outcome definitions, and time points of evaluation.

Network Meta-Analysis

A network meta-analysis was performed to obtain relative treatment effects for achievement of ISGA 0/1 at 28–42 days. All analyses were conducted within a Bayesian framework [20] and involved a 100,000-iteration run-in (burn-in) phase followed by a 100,000-iteration phase for parameter estimation. All calculations were performed using OpenBUGS 3.2.3 [21]. Models using fixed effects and random effects on treatment effects were explored. Baseline risk regression was used to adjust for differences in vehicle response across RCTs; these differences were driven by variation in vehicle composition and by heterogeneity in patient characteristics. Baseline risk adjustment indirectly adjusted for heterogeneity in effect modifiers across RCTs [22]. Class-effects models with baseline risk regression used fixed effects across RCTs but random effects for treatments within class; classes included crisaborole, vehicle, and non-crisaborole treatments. Model fit was explored by comparing the deviance information criterion (DIC) and the posterior mean of the residual deviance for fixed- and random-effects models [23]. The model with the lowest DIC was considered to be the best fitting. Hazard ratios (HRs) reflect the “hazard” of response; thus, HRs > 1.0 for comparisons between two treatments imply better performance for the first treatment. A detailed description of the statistical methods can be found in the Supplement.
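For orientation, a generic form of a clog-log model with baseline risk regression is sketched below. It follows the general structure described in the NICE Decision Support Unit guidance and is simplified to vehicle-controlled trials; it is not the authors' exact specification (which is given in the Supplement), and all symbols are notation assumed here for illustration.

```latex
% Illustrative notation (not taken from the article):
%   p_{ik}    probability of ISGA 0/1 in arm k of trial i
%   T_i       follow-up duration of trial i (28 to 42 days)
%   \mu_i     log hazard of response in the vehicle arm of trial i (baseline risk)
%   \delta_{ik} log hazard ratio of treatment k versus vehicle in trial i
%   \beta     slope relating the relative effect to the centered baseline risk
\operatorname{cloglog}(p_{ik}) = \log\bigl(-\log(1 - p_{ik})\bigr)
  = \log(T_i) + \mu_i + \bigl[\delta_{ik} + \beta\,(\mu_i - \bar{\mu})\bigr]\, I(k \neq 1)
```

Under this form, the hazard ratio for treatment k versus vehicle at the mean baseline risk is the exponential of the pooled log hazard ratio, so values above 1.0 favor treatment k.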

Compliance with Ethics Guidelines

This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors. The review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines.

Results

Systematic Literature Review

Study Selection

The search strategy identified 894 records after duplicates were removed, of which 212 were assessed for full-text eligibility after title/abstract screening. In total, nine RCTs (reported in eight publications, one of which reported two RCTs) were eligible for inclusion in the network meta-analysis (Fig. 1).

Fig. 1

PRISMA flow diagram of the systematic literature review. AD atopic dermatitis, ISGA Investigator’s Static Global Assessment, NMA network meta-analysis, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, RCT randomized clinical trial, SLR systematic literature review

Study Characteristics

The follow-up duration across all included studies ranged from 28 [14] to 42 days [24]. Only five RCTs reported the recruitment period, which ranged from 2001 to 2015. Sample sizes varied from 133 [25] to 764 [14] (Table S5).

Patient Characteristics

The average age ranged from 6.4 [26] to 39.1 [27] years, reflecting a mix of pediatric and adult populations (one RCT did not report mean age [25]). Per the inclusion criteria, all RCTs evaluated patients ≥ 2 years of age, except for Eichenfield et al. [24], which enrolled patients aged 1–17 years and reported an overall average patient age of 6.7 years. The largest share of RCTs (44%) included children and adolescents (2–17 years); an additional 33% included adult and pediatric patients. Six RCTs enrolled mixed mild-to-moderate populations, whereas one enrolled only patients with mild AD [26] and two enrolled only patients with moderate AD [27, 28]. Most studies (n = 8) defined disease severity according to ISGA. Three studies provided baseline Eczema Area and Severity Index scores, but none provided baseline SCORing of AD scores [24, 26, 27, 29]. Baseline percentage of body surface area affected was reported by seven RCTs and ranged from 11.1% [30] to 25.9% [24].

Treatment Characteristics

Treatments evaluated included crisaborole (PDE4i), pimecrolimus, and tacrolimus, 0.1% or 0.03% (TCIs), each administered as monotherapy. All doses administered in the RCTs were FDA-approved for the treatment of AD. Pimecrolimus is approved for mild-to-moderate disease, whereas tacrolimus (in both available concentrations) is approved for moderate-to-severe AD. All treatments were applied twice per day. Treatment durations ranged from 28 to 42 days.

Six RCTs were vehicle-controlled, and three included active comparators. The vehicles were formulated with different emollient properties, and none of the included RCTs reported on the contents of the vehicle or the proportion of the ingredients. The contents were assumed to be identical to the contents of the base used for the interventions.

Outcome Assessments

Clinical disease severity was assessed by ISGA. Three RCTs reported using a five-point ISGA scale, whereas four used a six-point scale, although scores of 0 = clear or 1 = almost clear were defined similarly across scales. Although ISGA was evaluated at various time points, the 28- to 42-day time point was the primary point of interest.

Risk-of-Bias Assessments

Most of the RCTs were of good quality, with a low risk of bias; there were few concerns regarding the level of bias (Supplement Figure S1).

Network Meta-Analysis Results

Six sets of network meta-analyses were performed. Because of the observed variation of vehicle effect (baseline risk), analyses adjusting for baseline risk were conducted with or without class effect. A complementary log-log (clog-log) link was used to adjust for different follow-up durations across RCTs. After fitting a variety of statistical models, the clog-log model with adjustment for baseline risk and class effects (model 1) was deemed the most appropriate because it had the lowest DIC and adjusted for baseline risk and class effects (see Table S6).
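The role of the clog-log link can be illustrated with a minimal sketch. It assumes a constant hazard of response over time (an exponential model), which is the assumption that makes the link appropriate; the daily hazard value is hypothetical and chosen only for illustration.

```python
import math


def cloglog(p: float) -> float:
    """Complementary log-log transform: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def response_probability(hazard_per_day: float, days: float) -> float:
    """P(response by `days`) under a constant hazard (exponential model)."""
    return 1.0 - math.exp(-hazard_per_day * days)


# Hypothetical constant daily hazard of reaching ISGA 0/1 (illustrative only).
hazard = 0.01

for days in (28, 42):
    p = response_probability(hazard, days)
    # cloglog(p) = log(hazard) + log(days), so subtracting log(days)
    # recovers the same log hazard regardless of follow-up length.
    log_hazard = cloglog(p) - math.log(days)
    print(f"{days} d: response = {p:.3f}, recovered hazard = {math.exp(log_hazard):.4f}")
```

The observed response proportion is higher at 42 days than at 28 days, yet the recovered hazard is identical, which is why trials with different follow-up durations can be pooled on this scale.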

Efficacy: ISGA 0/1 at 28–42 Days

For ISGA 0/1 at 28–42 days, the clog-log model adjusting for pooled evaluation time points, baseline risk, and class effects had the lowest DIC and a residual deviance suggesting good fit (Fig. 2). As expected, the baseline risk model found strong evidence of a relationship between vehicle effect and relative treatment effect versus vehicle (slope: − 0.89 [95% credible interval − 1.26 to − 0.47]) (Fig. 3, Table 1).
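To make the negative slope concrete, the sketch below shows how a baseline-risk-adjusted hazard ratio versus vehicle would shrink as the vehicle response rises. The slope (−0.89) is the reported posterior estimate; treating the reported crisaborole HR of 2.07 as the effect at the mean vehicle response, and the vehicle hazards themselves, are assumptions made here for illustration only.

```python
import math

BETA = -0.89                     # slope reported in the text
LOG_HR_AT_MEAN = math.log(2.07)  # assumption: reported HR applies at the mean vehicle response


def adjusted_hr(vehicle_log_hazard: float, mean_vehicle_log_hazard: float) -> float:
    """HR versus vehicle for a trial with the given vehicle-arm log hazard."""
    centered = vehicle_log_hazard - mean_vehicle_log_hazard
    return math.exp(LOG_HR_AT_MEAN + BETA * centered)


# Hypothetical mean vehicle hazard (per day) across trials.
mean_mu = math.log(0.010)

# Hypothetical trials with low, average, and high vehicle response.
for daily_hazard in (0.005, 0.010, 0.020):
    hr = adjusted_hr(math.log(daily_hazard), mean_mu)
    print(f"vehicle hazard {daily_hazard:.3f}/day -> HR vs vehicle = {hr:.2f}")
```

A trial with a more active vehicle leaves less room for the active treatment to show a relative benefit, which is the pattern the negative slope describes.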

Fig. 2

Evidence network for ISGA 0/1 at 28–42 days. For studies that reported data at both 28 and 42 days, the 28-day data were used in the analyses. AD atopic dermatitis, bid twice per day, d day, ISGA Investigator’s Static Global Assessment

Fig. 3

ISGA 0/1 at 28–42 days (model 1: clog-log model adjusted for baseline risk and class effects). ISGA Investigator’s Static Global Assessment

Table 1 Qualitative summary of safety data of included trials

Patients on crisaborole or tacrolimus, 0.1% or 0.03%, were more likely to achieve an ISGA 0/1 at 28–42 days versus vehicle (i.e., 95% credible interval did not include 1), with the greatest point estimate observed for the crisaborole comparison (HR: 2.07; 95% credible interval 1.76–2.36; probability HR above 1 [p better]: 100.0%); there was weak evidence of a difference between pimecrolimus and vehicle (1.28; 0.92–1.78; 93.5%). Patients on crisaborole were also more likely to achieve ISGA 0/1 versus pimecrolimus (1.62; 1.04–2.48; 98.3%). There was weak evidence of a difference between crisaborole and tacrolimus, 0.03% (1.35; 0.95–1.84; 95.7%) and no evidence of a difference with tacrolimus, 0.1%.
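The "p better" values can be roughly cross-checked from the reported point estimates and intervals. The sketch below assumes that the posterior of log(HR) is approximately normal; this is an approximation introduced here and not the authors' method, which would derive the probability directly from the posterior samples.

```python
import math


def prob_hr_above_one(hr: float, lower: float, upper: float) -> float:
    """Approximate P(HR > 1) from a point estimate and 95% interval,
    assuming log(HR) is roughly normally distributed."""
    sd = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(hr) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# (HR, lower, upper) as reported in the text; reported p better in comments.
comparisons = {
    "crisaborole vs pimecrolimus": (1.62, 1.04, 2.48),      # reported 98.3%
    "pimecrolimus vs vehicle": (1.28, 0.92, 1.78),          # reported 93.5%
    "crisaborole vs tacrolimus 0.03%": (1.35, 0.95, 1.84),  # reported 95.7%
}

for label, (hr, lo, hi) in comparisons.items():
    print(f"{label}: approx. P(HR > 1) = {prob_hr_above_one(hr, lo, hi):.1%}")
```

Small discrepancies from the reported probabilities are expected because the true posterior need not be symmetric on the log scale.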

Safety

A network meta-analysis of safety outcomes was infeasible because of differences in reporting of safety data for comparators (e.g., different thresholds used [AEs in ≥ 1%, ≥ 10%]), outcome definitions (e.g., definitions of withdrawal because of AEs), and study periods between RCTs (changes in reporting of outcomes data over time; older vs. newer RCTs). Additional reasons were outcomes not reported (difficult to determine whether an outcome is not reported because of the threshold/definitions or the outcome not occurring) and the overall sparsity of safety data reported in the included RCTs. Misalignment in the type of data reported could bias the results of any comparative quantitative analyses and might lead to under- or overestimation of results. Therefore, safety results are described qualitatively.

Overall Adverse Events

The rates of overall AEs ranged from 15.4% [26] to 55.6% [28]. The rates of patients reporting at least one treatment-emergent AE with crisaborole (29.3% and 29.4%) were similar to the rates in the vehicle groups (19.8% and 32.0%) [14]. Rates of overall AEs reported for tacrolimus, 0.03%, ranged from 15.4% [26] to 55.6% [25] across three RCTs, and the rate for tacrolimus, 0.1%, was 32.7% in one RCT [27]. For pimecrolimus, rates ranged from 16.6% [26] to 44.0% [24] across three RCTs.

Common Adverse Events

Frequently reported AEs were application site burning/stinging, upper respiratory tract infections, skin infections, and erythema (Table 1). The incidence of application site burning/stinging varied across studies and depended on the outcome definition: some studies included pain or warmth, whereas others reported only burning or stinging. Rates of application site pain AEs were 6.2% [AD-301] and 2.7% [AD-302] versus 1.2% for vehicle in each study [14], 1.9% for tacrolimus, 0.03% versus 1.8% for pimecrolimus [26], and 3.1% for tacrolimus 0.1% versus 0% for pimecrolimus [27]. Only three RCTs reported the rates of upper respiratory tract infections (2.0% [14] to 14.2% [24]). The incidence of skin infections across all RCTs was generally low (Table 1). The incidence of erythema ranged from 0% [14, 28] to 18.9% [29], but with various definitions of erythema (Table 1).

Discussion

This systematic literature review and network meta-analysis were undertaken to evaluate the comparative effectiveness and safety of crisaborole versus other topical pharmacologic therapies for the treatment of mild-to-moderate AD. In the systematic literature review, no studies were identified that compared crisaborole to other active treatments. Consequently, a network meta-analysis indirectly compared treatments for which no head-to-head trials were available and synthesized available evidence across treatments. No studies of TCSs were identified that reported data on ISGA 0/1; therefore, they were not included in the network meta-analysis.

With respect to efficacy, slightly different versions of the ISGA scale were used among the RCTs. The crisaborole trials used a five-point ISGA scale as an endpoint, whereas other trials evaluated a six-point ISGA scale. Despite this, disease severity measured by baseline ISGA reported across the RCTs seemed to be comparable, with most patients having baseline ISGAs of 2–3 (mild-to-moderate). We have assumed that the “clear” (ISGA = 0) and “almost clear” (ISGA = 1) categories are similar for both scales for analysis purposes because treatment response is defined similarly across both scales. A high response in the vehicle arm in the crisaborole trials was observed with respect to ISGA 0/1, which was greater than that seen in the vehicle arms of most RCTs that evaluated other topical therapies. This suggests that vehicle preparations in some of the RCTs do not have as many therapeutic benefits as those administered in crisaborole RCTs. Heterogeneity in patient characteristics [22], differences in the season when trials were conducted [31], and differences in potency between creams and ointments [32] may have modified observed treatment effects. Properties of vehicle formulations may affect drug delivery and efficacy, as well as drug tolerance profiles [33]. Some vehicle excipients have a more pronounced therapeutic effect on the skin and can improve clinical appearance and skin barrier function directly [33].

There was strong evidence that patients treated with crisaborole or tacrolimus, 0.1% or 0.03%, were more likely to achieve ISGA 0/1 at 28–42 days than those receiving vehicle. Furthermore, there was evidence that patients treated with crisaborole were more likely to achieve ISGA 0/1 at 28–42 days than those treated with pimecrolimus 1%. Although there was weak evidence of a difference between crisaborole 2% and tacrolimus 0.03%, and no evidence of a difference with tacrolimus 0.1% in model 1, all point estimates favored crisaborole.

Our findings are roughly consistent with other reported network meta-analyses on crisaborole in patients with mild-to-moderate AD; however, this may be limited given that other studies did not adjust for baseline risk (variation in efficacy rates for vehicle) [34]. The Institute for Clinical and Economic Review (ICER) report suggested that pimecrolimus was trending as superior to crisaborole [34]. However, the results of their analyses showed wide credible intervals and showed no or little evidence of any possible difference in efficacy between treatments. Although the authors of the ICER report noted there was a substantial difference in baseline risk across RCTs regarding treatment response for vehicles, they did not adjust for this in their analyses. The NICE Decision Support Unit recommends regression on baseline response as a means of adjusting for heterogeneity where appropriate [35], and, in the present case, the credible interval for the interaction term was far from zero, with a slope of − 0.89 and a 95% credible interval of − 1.26 to − 0.47. In the Drug Effectiveness Review Project review, significantly more patients had treatment response with crisaborole than with vehicle [36]. The authors of this report also did not perform any adjusted analyses. As stated previously, a recent systematic literature review and network meta-analysis for PDE4is that included crisaborole and other PDE4is versus vehicle showed that topical PDE4is, particularly crisaborole, were more effective than vehicle alone [15].

Safety outcomes were not analyzed by means of a network meta-analysis in the present study because this was deemed inappropriate for a variety of reasons (e.g., difference in outcome definitions, sparsity of data). Therefore, the results for safety were only described qualitatively, and no definite conclusions regarding relative safety of crisaborole versus TCIs could be drawn. Caution should be taken in the interpretation of naive comparisons because no formal comparative (indirect) assessments were made.

A key strength of our study is the application of network meta-analysis methods to address the need for comparative efficacy evidence. The systematic literature review was performed in accordance with published guidelines, and the network meta-analysis was based on well-established Bayesian methodology [20, 37, 38]. Our systematic review and network meta-analysis were rigorous, used sophisticated statistical models, and reached conclusions that have not been previously documented. Heterogeneity was addressed, where possible, to fulfill the homogeneity assumption necessary for the network meta-analysis. A comprehensive feasibility assessment was conducted a priori, including an evaluation of the clinical heterogeneity between trials, which showed that studies were similar for many of the characteristics of interest. Baseline risk regression was performed to adjust for differences in vehicle response and heterogeneity in treatment effects across trials.

There are several limitations to this study. First, the interval for the primary time point of interest was wide at 28–42 days. Because efficacy for interventions may change with prolonged use, this is a potential source of heterogeneity and may have impacted the results for this outcome (i.e., ISGA 0/1 at 28–42 days). To control for this variability in follow-up time, a clog-log model was applied for ISGA 0/1 at the 28- to 42-day time point.

A second limitation concerns the efficacy data: confounding factors may be present, and data for some other efficacy outcomes that are important in AD were not available. The efficacy difference may not be generalizable to some real clinic settings, as there may be other confounding factors associated with the use and benefits of active treatment in real clinic settings (e.g., access issues). It was also not possible to fully explore all potential confounders by means of subgroup analyses. Further adjustment for differences in baseline characteristics could not be explored using meta-regression techniques because of the limited number of studies available for comparators. Also, some important efficacy outcomes could not be evaluated because of data limitations (e.g., pruritus reduction, quality of life benefit).

A third limitation was that safety outcomes could only be described qualitatively. Network meta-analysis for safety was inappropriate because of sparse data across studies, including differences in outcome definitions used, in reporting of data for comparators, issues with outcomes not reported, and differences in study period.

Finally, there are no head-to-head trials comparing crisaborole with other active treatments, so treatments could only be compared indirectly using network meta-analysis. Results should be interpreted with caution and cannot replace a direct head-to-head evaluation.

Conclusion

This network meta-analysis showed that crisaborole was superior to vehicle and pimecrolimus and comparable to tacrolimus, 0.1% or 0.03%, in achieving ISGA 0/1 at 28–42 days in patients aged ≥ 2 years with mild-to-moderate AD. In the crisaborole pivotal studies (AD-301/AD-302), crisaborole was shown to be well tolerated, with low rates of treatment-related AEs. More research is needed to establish the comparative efficacy of crisaborole with respect to other key clinical efficacy outcomes, including other severity scales (e.g., Eczema Area and Severity Index [EASI], the SCORing Atopic Dermatitis [SCORAD]) and assessments of pruritus severity, in addition to other patient-relevant outcomes (e.g., quality of life and functional status).