The Banff 2022 Kidney Meeting Work Plan: Data-driven refinement of the Banff Classification for renal allografts

The XVIth Banff Meeting for Allograft Pathology was held in Banff, Alberta, Canada, from September 19 to 23, 2022, as a joint meeting with the Canadian Society of Transplantation. In addition to a key focus on the impact of microvascular inflammation and biopsy-based transcript analysis on the Banff Classification, further sessions were devoted to other aspects of kidney transplant pathology, in particular T cell–mediated rejection, activity and chronicity indices, digital pathology, xenotransplantation, clinical trials, and surrogate endpoints. Although the output of these sessions has not led to any changes in the classification, the key role of Banff Working Groups in phrasing unanswered questions, and coordinating and disseminating results of investigations addressing these unanswered questions was emphasized. This paper summarizes the key Banff Meeting 2022 sessions not covered in the Banff Kidney Meeting 2022 Report paper and also provides an update on other Banff Working Group activities relevant to kidney allografts.

current Banff Classification. 1 This Banff 2022 Kidney Meeting Work Plan summarizes progress reports, discussions, and subsequent work plans developed by the various Banff Working Groups (Tables 1-6; Supplementary Tables S1-S10), with more detail given for those topics that were covered in sessions and talks and widely commented on at the Banff 2022 meeting: T cell-mediated rejection, activity and chronicity indices, digital pathology and machine learning, xenotransplantation, and use of the Banff Classification in clinical trials and observational investigations. The Banff Working Group activities (Table 1) represent the scientific engine driving the constant data- and evidence-based refinement of the international Banff Classification for Allograft Pathology.

T cell-mediated rejection and the role of inflammation in areas of interstitial fibrosis and tubular atrophy (i-IFTA)
In a session on tubulointerstitial inflammation at the Banff 2022 meeting, 2 of the more problematic diagnoses in the Banff Classification, chronic active T cell-mediated rejection (caTCMR) and Borderline (suspicious) for acute TCMR, were critically appraised. In addition, a set of questions for further investigation was outlined by the TCMR Working Group (Table 2).

caTCMR
Tubulointerstitial (grade I) caTCMR was introduced in the Banff Classification in 2017 and updated in 2019, based on the strong association between i-IFTA and graft loss, and evidence of its association with underimmunosuppression and previous active TCMR. 2,3 However, i-IFTA is noted not only in patients with previous TCMR but also in those with previous or concurrent pyelonephritis, polyomavirus-associated nephropathy, recurrent or de novo kidney diseases, and AMR. This nonspecificity of the i-IFTA lesion justified a definition of caTCMR that included (1) a requirement to exclude other diseases, and (2) a requirement for a minimum threshold of concurrent tubulitis and ti lesion scores. i-IFTA has to be considered along with the ti score, to avoid giving too much weight to very small areas of scarred cortex with heavy inflammation. Accordingly, it was acknowledged that the definitions for caTCMR need further clinical validation.
In a dedicated Banff 2022 session, the high prevalence of i-IFTA was highlighted (~50% of biopsies taken >1 year posttransplant), 4 as well as its association with the development of cv and cg lesion scores over time. The following further data supporting the association between caTCMR and poor outcomes were reported and discussed at the meeting. 5,6 First, 8-year survival free from the composite outcome (graft loss and doubling of serum creatinine) following a diagnosis of caTCMR made on a protocol biopsy (1 year posttransplant) is around 65%, and in an indication biopsy around 50%. Second, clinical response to treatment may be observed in a minority of cases of caTCMR (20% reported 7). However, some cases appear to show histological response, with a significant reduction in activity scores t, ti, and i-IFTA in posttreatment biopsies, without reduction in chronicity scores. 8 Gene expression studies in bulk tissue revealed a different molecular profile related to inflammation in cases with caTCMR as opposed to the inflammation in cases of active TCMR, the latter showing predominant interferon-gamma pathway activation with acute kidney injury, and the former showing signals related to injury-repair and mast cells. 7,9 In 1 study, more cases of i-IFTA had AMR-related transcriptomic signatures than TCMR-associated ones, 9 and acute kidney injury-associated transcripts drove the poor outcomes in i-IFTA cases. In keeping with this observation, donor-specific antibodies (DSA) and/or C4d positivity are significantly more frequent in patients with i-IFTA. 6 Finally, previous changes in the definitions of tubular atrophy, t, and t-IFTA have led to confusion and likely inconsistent application of the definition of caTCMR, even among specialists. The definitions of these lesions were clarified and agreed upon (reported in the Banff 2022 Kidney Meeting Report). 1
In summary, current evidence converges to suggest that i-IFTA is nonspecific both in its clinical associations and its molecular profile, but that it carries a poor prognosis, even at a low score of i-IFTA = 1, and a worse one than fibrosis without inflammation. However, further data are needed to provide evidence that the Banff Classification thresholds for caTCMR (including t and ti thresholds in addition to i-IFTA) guarantee an increased diagnostic specificity for a T cell-mediated rejection process. Moreover, the effectiveness of therapeutic approaches for caTCMR has not been examined sufficiently, and it remains unclear whether the disease process and its association with graft failure can be halted. An alternative approach discussed at the Banff 2022 meeting would be to consider the i-IFTA lesion score as a diagnosis-agnostic prognostic feature only (see discussion below on activity and chronicity indices). Further investigations are needed to determine the diagnostic and prognostic value of i-IFTA in the Banff Classification, a focus of the TCMR Working Group (Table 2).

Borderline (suspicious) for acute TCMR (aTCMR)
The definition of borderline for TCMR (BL) arose during conversations at the time of the merger of the Collaborative Clinical Trials in Transplantation (CCTT) classification with the Banff Classification. The threshold of the Banff i lesion score for TCMR at >25%, established at these discussions during the 1990s, is relevant to an era in which less effective immunosuppressive drugs were used, and thus may be too high. This threshold currently leads to a high number of cases being called BL based on active infiltrates that represent <25% of nonscarred interstitium. Recent evidence shows that a significant proportion of BL cases behave as TCMR. 10,11 Patients with more HLA molecular mismatches have an increased risk of BL, which, in the current era of immunosuppressive drugs, likely represents an alloimmune response in many cases. 11 Patients with a first episode of BL or TCMR have a second episode in about 50% of cases, with this persistent or recurrent insult increasing the risk of graft loss. 12,13 Studies have shown that BL is often treated in indication biopsies but inconsistently in protocol/surveillance biopsies, while no evidence-based standards for treatment of borderline and TCMR exist. 14 As indicated at the Banff 2022 meeting, a re-evaluation of the aTCMR and BL definitions and thresholds for the current era of immunosuppression is needed.

AMR
Although the Banff 2022 Meeting Report 1 proposes an approach for dealing with biopsies showing incomplete features of AMR, such as microvascular inflammation (MVI) that is DSA-negative and C4d-negative, or MVI below the threshold for a diagnosis of AMR, further investigation into the impact of these phenotypes is needed. The AMR Working Group will focus on this and related questions: Does DSA modify the risk of graft failure related to MVI? Is this risk modified by population characteristics? What are the key population, assay, outcome definition, and therapy variables that must be considered and reported on when designing studies geared toward reducing antibody-mediated injury? How can we optimize communication across the multidisciplinary team to make the AMR diagnosis? (Supplementary Table S1).

Biopsy-based transcript diagnostics
The Banff 2022 Kidney Meeting Report summarizes the output of discussions around the applicability of transcript analysis for the diagnosis of rejection. 1 The Banff community felt that premature introduction of transcript-based diagnosis, which is not widely available and requires further validation, within the diagnostic classification could create confusion among users of the classification and potentially lead to incorrect clinical decisions. For this reason, the wording in the classification "if thoroughly validated" was replaced by "if thoroughly validated for this context of use and available." In order to generate the evidence needed to justify the introduction of transcript-based diagnosis, the Biopsy-based Transcript Diagnostics (formerly "molecular diagnostics") Working Group has formulated questions that need to be answered. Although both the MMDx® platform and the Banff Human Organ Transplant (B-HOT) gene panel designed for the Nanostring nCounter platform (and indeed other methods of gene expression investigation such as RT-PCR or reverse transcriptase multiplex ligation-dependent probe amplification) 15,16 can produce a multitude of different classifiers, it remains to be determined which classifier(s) perform(s) best for TCMR and for AMR diagnosis. Which thresholds should be used for clinical diagnosis and decision-making? Does addition of gene expression analysis significantly improve patient care and outcomes? How could an MMDx® score convert into a B-HOT score and vice versa, to allow standardization of clinical decisions between centers using different platforms? 17 Do biopsy-based transcript diagnostics offer similar or additional information to "activity and chronicity indices" provided by light microscopy? What is the clinical context of the use of biopsy-based transcript diagnostics? It was suggested to primarily implement molecular assessment in cases fulfilling some, but not all, Banff criteria of AMR or TCMR (eg, BL and suspicious cases), or when there is a discrepancy between clinical/serological findings and histology. What is the cost/benefit health-economic analysis of biopsy-based transcript diagnostics? What minimal number of transcripts needs to be measured? It is likely to be much lower than what current platforms (MMDx®; B-HOT panel) provide. What is the predictive value of biopsy-based transcript diagnostics for effectively guiding therapy?
A proposed validation work plan including assessment of the context of use and clinical utility of biopsy-based transcript diagnostics was developed by the Biopsy-based Transcript Diagnostics (formerly "molecular diagnostics") Working Group (Table 3).

Banff active/chronic lesion scores, activity/chronicity indices
In the current Banff Classification, the temporal disease dynamics are reflected in the active, chronic/active, and chronic subcategories of TCMR and AMR. The evaluation of kidney transplant biopsies, both histologically and using molecular tools, contains information on the disease stage and reversibility of disease processes. A distinction is made between "active Banff Lesion Scores" (i, t, v, g, ptc, C4d), "chronic Banff Lesion Scores" (ci, ct, cv, cg, ptcml), and, as proposed in Banff 2019, also "active & chronic Banff scores" (ti, i-IFTA, t-IFTA, pvl). As outlined in Banff 2019 3 and on the official Banff Classification website (https://banfffoundation.org/central-repository-forbanff-2019-resources-3/), inclusion of individual lesion scores in the biopsy report is advised. Conceptually, temporal links have been proposed between the active lesions and their later chronic counterparts, as the matrix is remodeled as a result of the inflammation. For instance, g associates with later cg, t and t-IFTA with later ct, i with later ci, v with later cv, and ptc with later ptcml.
The distinction between different disease stages (active, chronic/active, and chronic disease) is based on arbitrary thresholds. These thresholds have been the topic of heavy debate at past Banff meetings and were discussed again at Banff 2022 (see TCMR). At the Banff 2019 meeting, it was proposed (and later refined in a viewpoint paper) that activity and chronicity indices could be used for cases of AMR, 3,18 similar to what has been done for lupus nephritis. 19,20 This approach with a calculated "chronicity index" based on Banff Lesion Scores was further explored and recently shown to be associated with graft failure in patients with AMR. 21 Beyond application in the context of AMR, data from studies carried out in Leuven, Belgium, and validated in Paris and Lyon, France, were presented that made a case for extending the assessment of such activity and chronicity indices to all kidney transplant biopsies, independent of the disease entity (https://rejectionclass.eu.pythonanywhere.com). 22,23 These indices, calculated based on their relative association with graft failure, are strong prognostic factors for graft failure and therefore clinically meaningful. Moreover, such indices might provide clinicians with greater insight as to the severity of injury and its reversibility. Injury in an adequate biopsy sample showing high activity with low chronicity may be more reversible than injury in a biopsy with low activity and high chronicity. Activity and chronicity indices may therefore have predictive potential (ie, predicting which therapies are more or less likely to work), although this needs to be tested. The continuous nature of some of these indices would also allow easier longitudinal assessment of the dynamics of disease severity (activity/chronicity) over time. Similar to chronic injury detected histologically, increased injury/damage-related transcripts are also associated with later graft loss in either TCMR or AMR. 24,25
Despite the enthusiasm at the Banff 2019 and Banff 2022 meetings to adopt activity and chronicity indices, concern was also expressed that lesion scores and calculated indices might be erroneously equated with diagnosis. Clinical context (graft function in particular) is also important. In addition, individual lesion scores and activity/chronicity indices do not replace overall assessment of tissue for diagnoses other than rejection that also impact Banff Lesion Scores (eg, pyelonephritis, recurrent glomerulonephritis). Furthermore, it is important to recall that highly sensitized patients undergoing biopsies early posttransplant may show very "active" AMR with features like thrombotic microangiopathy or severe acute tubular injury, not captured in the activity index. 26 In other words, it should be made clear that activity and chronicity indices do not replace diagnosis but add to it. In summary, several published studies and the discussions held at the meeting strongly support use of activity and chronicity indices in reporting kidney transplant biopsies. However, the relation to the current Banff subcategories (eg, active, chronic/active, chronic) remains unclear. Ultimately, activity and chronicity indices might replace Banff subcategories and thus lead to significant changes in the classification, similar to the recently modified NIH classification for lupus nephritis. 20-23 Therefore, it was decided to initiate a new Banff Working Group, with specific questions to answer with the goal of further validating activity and chronicity scoring in a clinical context (Table 4).
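To make the idea of lesion-score-based indices concrete, the following is a minimal sketch that groups the "active" and "chronic" Banff Lesion Scores named above and adds them up. It is an illustration only: the unweighted sums, the 0-3 bound applied to every lesion, the treatment of missing lesions as 0, and the function names `lesion_sum` and `summarize_biopsy` are all assumptions made for this example. The indices discussed at the meeting were derived from each lesion's relative association with graft failure, and those weights are not reproduced here.

```python
from typing import Dict, Mapping, Sequence

# Lesion groupings as listed in the text. Unweighted sums are an
# illustrative assumption, NOT the published outcome-weighted indices.
ACTIVE_LESIONS = ("i", "t", "v", "g", "ptc", "C4d")
CHRONIC_LESIONS = ("ci", "ct", "cv", "cg", "ptcml")


def lesion_sum(scores: Mapping[str, int], lesions: Sequence[str]) -> int:
    """Add up the semiquantitative scores for the named lesions."""
    total = 0
    for name in lesions:
        value = scores.get(name, 0)  # missing lesion treated as 0 (assumption)
        if not 0 <= value <= 3:      # 0-3 bound is a simplifying assumption
            raise ValueError(f"{name} score {value} is outside 0-3")
        total += value
    return total


def summarize_biopsy(scores: Mapping[str, int]) -> Dict[str, int]:
    """Return hypothetical activity and chronicity sums for one biopsy."""
    return {
        "activity_sum": lesion_sum(scores, ACTIVE_LESIONS),
        "chronicity_sum": lesion_sum(scores, CHRONIC_LESIONS),
    }


if __name__ == "__main__":
    biopsy = {"i": 2, "t": 2, "g": 1, "ptc": 1, "ci": 1, "ct": 1}
    print(summarize_biopsy(biopsy))
```

As the text stresses, numbers of this kind would add to a diagnosis rather than replace it, and they do not capture features such as thrombotic microangiopathy or the clinical context.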

Digital transplant pathology and machine learning
Several sessions in the Banff 2022 meeting reviewed advances in the field of machine learning applied to digital pathology. The current work of the Banff Digital Pathology Working Group is the subject of 2 publications 27 (Farris et al, under review), and future plans are outlined in Supplementary Table S10. A growing number of kidney transplant pathologists are using digital pathology for their daily diagnostic practice, research, and clinical trials, facilitating access to and sharing of digitized whole slide images (WSI), without apparent quality loss. 28 Moreover, many are starting to curate datasets of WSI with or without annotations that can be used to train models to perform diagnostic tasks.
Potential deep learning-derived decision-support systems in kidney transplant pathology include applications that deliver segmentation of microanatomical structures (eg, glomeruli 29), high-throughput morphometric quantification (eg, tubular diameters 30), object detection (eg, inflammatory cells 31,32), and weakly supervised slide-level classification (eg, rejection versus nonrejection to prioritize workload), with or without computer-assisted diagnosis overlays to WSI that guide the nephropathologist toward biopsy zones with highly informative features. 33 Also, the evaluation of difficult lesions like transplant glomerulopathy could be improved, with prognostic implications, as recently demonstrated. 34 It has been shown that digital features significantly correlate with Banff Lesion Scores, but are also more sensitive to subtle pathological changes below the thresholds in the Banff grading system. 35 Composite damage scores calculated from those digital features outperformed Banff scores with superior graft loss prediction accuracy, indicating the potential of these models in addition to or beyond the Banff scoring system, eg, also when applied to time zero biopsies. 35 A major bottleneck in developing deep learning models remains access to large datasets of WSI annotated both for their pathology features and for clinical features including outcomes. Methods such as the H-AI-L (Human Artificial Intelligence Loop) have facilitated algorithm training while minimizing the human input needed for annotations. 36 Algorithmic auditing by sensitivity analysis and stress-testing, supported by early adoption of AI-specific reporting guidelines (eg, DECIDE-AI 37), has been highlighted to allow for fair and transparent algorithm development and implementation.

Xenotransplant pathology
A joint session with the Canadian Society of Transplantation (CST) covered progress in genetically modified pig-to-human xenotransplantation. Within this session, kidney xenograft pathology was covered, including both pig-to-primate experimental models 38-40 and early pig-to-human decedent temporary transplants. 41,42 Some similarities in the pathology were noted when comparing experimental models with first-in-human observations: in some (but not all) cases, glomerular fibrin and platelet thrombi and other features of endothelial injury were observed, while typical Banff rejection features were not represented. There is a potential for unique antibody-mediated pathology to appear over time, related to the myriad of antigenic differences between pigs and primates, and as grafts survive longer.
The B-HOT Nanostring panel of transcripts was considered for its potential to serve as a Banff Pig Organ Transplant panel, by analyzing gene homology, with early results documenting an increase in AMR-associated gene expression. 43 However, a custom panel with a mix of pig-selective and human/nonhuman primate-selective genes might ultimately be needed. Recommendations for reporting of pathology in xenotransplantation are currently being considered, and a proposal for a Banff Working Group for Xenotransplantation pathology was put forward to coordinate efforts (Table 5). Grading of lesions of thrombotic microangiopathy and C4d is central to biopsy evaluation, and immunostaining for immunoglobulin and complement, ultrastructural analysis, and RNA analysis (bulk and/or spatial) are recommended.

Histological endpoints
In a session dedicated to the use of the Banff Classification for clinical trials (both as trial endpoint and for definition of inclusion criteria) with input from the "Surrogate Endpoints Working Group," it was concluded that the classification for clinical use should fully align with the definitions accepted for clinical trials. If histological endpoints are used in clinical trials, there is consensus on the need to distinguish between AMR and TCMR and to avoid the use of "biopsy-proven acute rejection," as was outlined recently by a working group of the European Society for Organ Transplantation. 44-46 It was also noted that for clinical trials, discontinuous/semiquantitative histologic scoring limits reproducibility. This is a major issue that lowers study power, necessitates larger study groups, and increases study costs. It is therefore key to use well-defined phenotypes in a clinical trial. This requires primarily a clear diagnostic definition according to the most current Banff Classification, panels of (central) pathologists (3 optimal to avoid a tie), adjudication mechanisms in case of disagreements, WSI for centralized slide review, and auditable assessments (Table 6). In addition, it is hoped that automated digital image analysis could help increase the reproducibility and estimation of the severity of kidney transplant injury. For now, none of these digital pathology algorithms is accepted as the primary endpoint definition for clinical trials.
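The panel-plus-adjudication requirement described above can be sketched in a few lines. This is a hypothetical illustration, not a procedure taken from any trial protocol: the function names `panel_consensus` and `adjudicate`, the strict-majority rule, and the shape of the returned record are assumptions. It shows why an odd panel size helps (2 of 3 reads always form a majority when only 2 diagnoses are in play) and how every individual read can be kept for auditing.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def panel_consensus(reads: Sequence[str]) -> Optional[str]:
    """Return the diagnosis given by a strict majority of readers, else None."""
    if not reads:
        raise ValueError("at least one read is required")
    diagnosis, votes = Counter(reads).most_common(1)[0]
    return diagnosis if votes > len(reads) / 2 else None


def adjudicate(reads: Sequence[str],
               adjudicator_read: Optional[str] = None) -> Dict[str, object]:
    """Resolve a case and keep all individual reads for later audit."""
    record: Dict[str, object] = {"reads": list(reads)}
    consensus = panel_consensus(reads)
    if consensus is not None:
        record.update(final=consensus, source="panel majority")
    elif adjudicator_read is not None:
        # No majority: fall back to a separate adjudication step (assumed).
        record.update(final=adjudicator_read, source="adjudication")
    else:
        record.update(final=None, source="unresolved")
    return record


if __name__ == "__main__":
    print(adjudicate(["AMR", "AMR", "TCMR"]))
    print(adjudicate(["AMR", "TCMR", "BL"], adjudicator_read="TCMR"))
```

With 3 readers and 3 or more possible categories a three-way split is still possible, which is why the sketch keeps an explicit adjudication path rather than relying on panel size alone.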
A minimal set of variables, including clinical information, is needed to come to robust final diagnoses. This was illustrated, for example, by a recent study on eculizumab in the prevention of AMR, in which a discrepancy between local and central pathology results (influencing the statistical analysis) arose from an initial lack of clinical information, such as DSA, available to central pathologists. 47 Although biopsy-based transcript diagnostics could hold promise for reproducible and quantitative assessment of biopsies, their added value, usefulness, and acceptability by health authorities as an endpoint for therapy registration trials need further discussion.
Finally, it is advised that future clinical trials do not solely collect and report the final Banff Diagnostic Categories (which are approved as primary endpoints in novel therapy registration trials 44-46,48), but also the granular Banff Lesion Scores and Additional Diagnostic Parameters. This would allow one to retrospectively reassess trial results 49 and to retrospectively apply multidimensional prognostication systems such as iBox 50,51 to the trial results. Open reporting of these granular data and public access is therefore recommended.

Surrogate endpoints
Discussions also covered the fact that the short-term outcomes traditionally used for trials in kidney transplantation are not necessarily reflected in long-term success, given the complex and multifactorial causes of late graft failure. 52,53 Multidimensional (surrogate) endpoints that capture the multifactorial causes of graft failure at an early stage are therefore desirable. Several have been proposed, 54 but only the iBox algorithm, essentially a multivariable Cox proportional hazards model, is sufficiently validated and calibrated for this purpose at this time. 50 The iBox Scoring System is a composite biomarker built on the following clinically relevant components: eGFR; proteinuria; presence of anti-HLA DSA; Banff Lesion Scores (IFTA grade, g + ptc, cg, and i + t scores). Inclusion of these histological lesion scores is, however, not always required for the prediction accuracy of the composite biomarker. In some contexts, using the Abbreviated iBox Scoring System (inclusion of only graft functional parameters and DSA) could be sufficient. However, in other specific situations, for example, in sensitized patients and in trials on AMR, the use of the Full iBox Scoring System adds further prediction accuracy.
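For readers less familiar with the model class, the generic form of a multivariable Cox proportional hazards model is shown below. This is the textbook formulation only: here x stands for a patient's covariate vector (for iBox, the components listed above), the coefficient vector β and the baseline hazard h_0(t) are placeholders, and none of the fitted iBox values are given or implied.

```latex
\[
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
S(t \mid x) = S_0(t)^{\exp\left(\beta^{\top} x\right)},
\qquad
S_0(t) = \exp\!\left(-\int_0^{t} h_0(u)\,du\right)
\]
```

In this form, dropping the histological components (as in the Abbreviated version) simply corresponds to fitting the model on a shorter covariate vector.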
Recently, the Critical Path Institute's Therapeutics Consortium, a public-private partnership with the FDA and various transplant stakeholders (professional societies, academia, industry, diagnostics, and patients), submitted the iBox Scoring System for consideration as a novel endpoint to facilitate new therapeutic development through EMA's qualification of novel methodologies for drug development. EMA qualified the iBox Scoring System as a secondary endpoint prognostic for death-censored allograft loss in kidney transplant recipients, to be used in clinical trials to support the evaluation of novel immunosuppressive therapy applications. 55 In this specific context of use, the c-statistics for the derivation dataset were 0.809 (0.01 standard error) for the Full (with biopsy) and 0.803 (0.01 standard error) for the Abbreviated (without biopsy) iBox Scoring System. The Full iBox Scoring System (with biopsy) has an area under the receiver operating characteristic curve (ROC AUC) of 0.85 (0.02 standard error), while the Abbreviated iBox Scoring System (without biopsy) reaches a ROC AUC of 0.84 (0.02 standard error). Although EMA did not qualify the iBox Scoring System as a surrogate endpoint, EMA encourages further development of the scoring system targeting potential future qualification as a surrogate endpoint. Parallel to the discussions in Europe, the qualification of the iBox Scoring System as a reasonably likely surrogate endpoint to support FDA's Accelerated Approval program is ongoing.
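As background for the discrimination figures quoted above, the sketch below shows one common way a c-statistic is computed for time-to-event data (Harrell's concordance index). It is a generic illustration on made-up toy inputs, not the method or data used in the EMA qualification; the function name `harrell_c` is hypothetical, and ties in follow-up time are simply skipped for brevity.

```python
from typing import Sequence


def harrell_c(times: Sequence[float],
              events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (a, b) is comparable when a had the event and did so before b's
    follow-up ended. It is concordant when a also has the higher predicted
    risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for a in range(n):
        if not events[a]:
            continue  # a censored subject cannot anchor a comparable pair
        for b in range(n):
            if times[a] < times[b]:
                comparable += 1
                if risk[a] > risk[b]:
                    concordant += 1.0
                elif risk[a] == risk[b]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: follow-up in years, graft-loss indicator, predicted risk.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [True, True, False, True]
    risk = [0.9, 0.7, 0.3, 0.1]
    print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking; for a binary outcome at a fixed time horizon without censoring, the c-statistic coincides with the ROC AUC.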

Conclusions
As indicated in a recent review, over the past 30 years, the Banff Classification for Renal Allograft Pathology 18 has shown agility in adapting to advancing pathophysiological insights, changing clinical and regulatory contexts, and new diagnostic techniques. This agility remains vital. An integrated consensus process is followed to avoid overly frequent and minor changes to the classification and consequent difficulties in following the Banff rules in routine clinical care and in clinical trials. 1 In this paper, we outlined the existing knowledge gaps with a view to stimulating efforts by the international transplant community to generate the respective data needed to find answers to key unanswered questions. Progress will be addressed at the next Banff 2024 meeting, which will be held in Paris, France, September 16-20, 2024.

Table 2
TCMR Working Group: Questions to be answered.
Acute TCMR and borderline (suspicious) for acute TCMR:
• What would be the effect of revised criteria for borderline (suspicious) for acute TCMR, including elements of CCTT classification, on frequencies, associations, and outcomes of this category?
• What is the relevance of the v lesion in the context of tubulointerstitial inflammation and the pathophysiologic and clinical distinction between TCMR grades, and the therapeutic implications?
• Can we define histologic features that distinguish between clinically significant borderline changes and those that need therapeutic intervention (edema, tubular injury other than tubulitis, eosinophils, ongoing tubulitis in areas with IFTA, etc.)?
• What is the clinical course of pure TCMR presenting in conjunction with DSA-negative, C4d-negative, microvascular inflammation?
Chronic active TCMR (caTCMR):
• What is the reproducibility of i-IFTA & t-IFTA lesions, and can the variability in the scoring be reduced, eg, through creation of educational materials, and possibly an External Quality Assessment Program?
• What are the frequencies, associations, and outcomes of caTCMR, alone and in combination with borderline changes and acute TCMR grade 1 and 2?
• Should extent of tubular atrophy and interstitial fibrosis (ie, ct and ci scores) be considered in the definition of caTCMR alongside i-IFTA and ti?
• Can activity and chronicity indices be used in TCMR rather than or in addition to reporting of acute vs caTCMR?
• What is the exact definition of the additional diagnostic parameter "other known causes of i-IFTA should be ruled out"?
• What effect does diagnosis of caTCMR have on patient management, as opposed to just acute TCMR and borderline changes?
• Does caTCMR respond to the treatments used for aTCMR?
• Could lowering the Banff 2019 threshold for caTCMR (eg, the threshold for i-IFTA score) be helpful to identify cases that are currently passing through the cracks and getting the label of caTCMR only when advanced fibrosis has developed?

Table 3
Biopsy-based Transcript Diagnostics Working Group (formerly "molecular diagnostics"): Validation plan.
• Since no transcript has diagnostic specificity (similar to Banff histologic lesions), consensus thresholds for molecular classifiers and gene sets associated with Banff lesions and diagnosis need to be established and validated for defined clinical context(s) of use.
• What is the analytical validity of different molecular assays?
  ○ Reproducibility and normalization/calibration studies are needed to make molecular results comparable between centers and sequential biopsies.
  ○ Head-to-head comparisons of different assays on the same biopsy are needed to allow conversion of results from different platforms.
• What is the clinical validity of different molecular assays?
  ○ Comprehensive multicenter (not only biopsies from different centers, but assays run at different centers on the same biopsies) and multiplatform studies of well-annotated cohorts including the full spectrum of the diseases and controls (including native kidneys/recurrent diseases) need to be conducted.
• What is the clinical utility of different molecular assays?
  ○ Diagnostic vs prognostic vs theragnostic claims need to be validated in prospective or retrospective, randomized trials showing improved outcomes using molecular tests.
  ○ Health-economic parameters need to be assessed to demonstrate value for money of molecular tests in kidney transplant care.
Am J Transplant. Author manuscript; available in PMC 2024 May 29.

Table 5
Xenotransplantation Working Group: Questions to be answered.
• Which pathologic patterns unique to xenograft biopsies require separate criteria for evaluation and scoring?
• Given the overlapping pathologic features of thrombotic microangiopathy and acute antibody-mediated rejection, what are useful clinical, laboratory, and other markers/methodologies that can improve diagnostic accuracy?
• How can we develop a standardized method for reporting antipig antibodies and apply it to diagnostic pathologic evaluation?