A systematic review of in vitro models of drug-induced kidney injury

Drug-induced nephrotoxicity is a major cause of kidney dysfunction with potentially fatal consequences and can hamper the research and development of new pharmaceuticals. This emphasises the need for new methods for earlier and more accurate diagnosis to avoid drug-induced kidney injury. Here, we present a systematic review of the available in vitro approaches to study drug-induced kidney injury, one of the most common reasons for drug withdrawal. The systematic review approach was selected to ensure that our findings are as objective and reproducible as possible. A novel study quality checklist, named the validation score, was developed based on published regulatory guidance and industrial perspectives, and models returned by the search strategy were analysed by overall complexity and by the kidney region studied. Our search strategy returned 1731 articles, supplemented by 337 from secondary sources, of which 57 articles met the inclusion criteria for final analysis. Our results show that the proximal tubule dominates the field (84%), followed by the glomerulus and Bowman's capsule (7%). Of all drugs investigated, cisplatin received the most attention (n = 29, 50.1% of final inclusions). We found that the validation score increased with increasing model complexity, reflecting the value of innovative in vitro models. Furthermore, although the highly diverse usage of cell lines and modelling approaches prevented strong statistical verification through a meta-analysis, our findings show the downstream potential of such approaches in personalised medicine and for rare diseases where traditional trials are not feasible.


Introduction
Despite ever-increasing expenditure on R&D, the drug discovery pipeline (DDP) has seen fewer products making their way into the clinic [1]. Traditionally, the decision to advance to clinical trials, where R&D expenditure is highest, is based on data obtained using well-characterised animal models for the assessment of pharmacokinetics and drug safety. Advancing through the stages of the DDP should be based on models that accurately reflect human physiology, but despite this, animal studies have long been the gold standard of preclinical research. However, there is increasing evidence of low predictivity of animal data for human effects in terms of efficacy and safety [2]. This mismatch can lead to unexpected adverse events in clinical trials or postmarketing, which contribute to the rising attrition rate and various (expensive) product recalls due to drug toxicity.
Toxicity has been estimated to be responsible for adverse events leading to attrition of up to a third [3] of drug candidates and is a major contributor to the high cost of drug development, particularly when not recognised until late in clinical trials or postmarketing surveillance. The kidney is particularly susceptible to drug injury because of its high share of cardiac output and its role in excretion of waste compounds from the body. Drug-induced kidney injury (DIKI) is a relatively common clinical condition, particularly in critical care settings, where it presents as acute kidney injury (AKI). Up to 30% of all critically ill patients develop AKI, with nearly 6% of those diagnosed requiring subsequent kidney replacement therapy [4]. Despite this, only a fraction (2%) of drug candidates are rejected due to nephrotoxicity in early (phase I) clinical studies, and many forms of DIKI are not measurable until very late in the DDP [5]. This can be caused by species-to-species translation issues such as drug-metabolising enzyme (cytochrome P450; CYP) expression and individual differences in clearance performance [6]. The role of nephrotoxicity is reflected in the percentage attrition due to nephrotoxicity increasing to 19% in phase III clinical trials [4]. The difficulty of screening for DIKI in the preclinical setting is down to a number of factors, not least that the kidney is an incredibly complex organ composed of diverse tissue and terminally differentiated, specialised cells. In addition, the mechanism of action of a substance is often unknown, so selection of appropriate in vitro models to support in vivo findings is difficult. Clinical diagnosis is closely associated with overall excretory function using glomerular filtration rate, estimated from serum creatinine and urinary albumin levels. Such outcomes have been clearly demonstrated to be imperfect indications of any form of kidney injury because of the delay from injury to measurable change [7,8]. Furthermore, there is still no universally accepted definition of DIKI as it may involve tubular injuries or glomerulopathies, resulting in AKI or chronic failure, and is often diagnosed late [9]. These limitations, together with inadequate preservation of the organ's microenvironment in the models, have hampered the ability of currently used in vitro assays to adequately mimic native physiology and/or predict effects observed in vivo [10].
The lack of predictive models and the poor predictive aspects of animal studies for clinical trials show the clear need for better approaches to recapitulate kidney function in vitro. In recent years, advanced in vitro models have emerged that integrate complex tissue cultures in microfluidic platforms to more closely mimic the human kidney and DIKI. Here, we used an adapted systematic review methodology to investigate whether currently available in vitro models are capable of successfully predicting safety outcomes in humans before going on to highlight new approaches that improve the preclinical processes and therefore translation rates to the clinic. Eventually, these models should improve (long-term) patient outcomes through more effective and safer novel medications entering the market. Furthermore, these models can also be used for detecting underlying causes of DIKI of existing drugs and therapies.

Review protocol
The review methodology was prespecified as per a standard Cochrane review of medical interventions. The standard SYRCLE [11] protocol was modified to replace its animal model search strategy with one for in vitro models of kidney injury; the adapted search terms are presented in the protocol in appendix 1. The review addresses the research question "can current in vitro models accurately provide safety information for potential extrapolation to human applications?"

Literature search
A systematic search of the PubMed and EMBASE databases was performed, with articles up to July 2019 selected for further screening if they met inclusion criteria defined in the protocol. To ensure a complete overview of the literature, reference lists of included studies or relevant reviews identified through the search were also reviewed. The list of relevant reviews is included in the protocol at appendix 1.
Articles identified in this search were selected independently by two researchers based on title and abstract screening and as per the inclusion and exclusion criteria stated in the protocol. In case of discrepancies between the two independent reviewers, a third investigator was involved in the screening and discussion. No language restrictions were applied. If any articles meeting inclusion criteria were published in a non-native language, scientists with native language skills would have been asked to translate.

Inclusion and exclusion criteria
Articles were included if they were primary research studies presenting unique data using an in vitro methodology to assess damage to kidney cells. Every effort was made to detail novel and alternative approaches such as organ-on-a-chip and predictive in silico modelling that may provide more human-relevant responses at the preclinical stage. Relevant drugs included are presented in Table 1. Screening was carried out using Rayyan QCRI [12], and citations were managed using EndNote X9 and Microsoft Excel. Data analysis was carried out in Microsoft Excel with manuscript figures prepared in GraphPad (GraphPad Prism, version 8.4.3, for Windows, Utrecht University).

Data extraction and analysis
The following outcome measures related to safety were extracted: molecular indicators of cell damage and safety outcomes, that is, cell death percentage, inflammatory markers and kidney-specific damage biomarkers such as clusterin, cystatin-C, KIM-1, N-acetyl-β-D-glucosaminidase, neutrophil gelatinase-associated lipocalin and osteopontin (see supplemental information for details on the markers). Full outcome measures are listed in the protocol at appendix 1.
Because of the high heterogeneity of the studies that met the inclusion criteria, a meta-analysis was not possible. Instead, the methodological quality of the included studies was assessed using a combination of SYRCLE's risk of bias tool [11] and a number of other factors extracted from references presented in the supplemental table S.1. These factors led to the creation of a preliminary validation checklist of ten questions to assess the quality of the study and also to assess each experimental approach's potential use as an alternative to current in vitro models. These questions are presented in supplemental table S.2. Every study meeting the inclusion criteria in the protocol was assessed against these questions and scored on how well it complied. Each question was scored on a scale of 0 (no compliance) to 2 (full compliance), giving a maximum total of 20 and a numeric, transparent assessment of each approach. Further stratification of experimental approaches was carried out as per the functional regions of the kidney nephron, separating the studies into five groups as shown in Figure 1.
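The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `validation_score` and the example answers are hypothetical, and the intermediate value 1 is assumed to stand for partial (medium) compliance. Only the ten-question checklist and the 0 to 2 scale come from the text.

```python
from typing import Sequence

N_QUESTIONS = 10  # checklist length stated in the text


def validation_score(answers: Sequence[int]) -> int:
    """Sum per-question compliance scores into a total out of 20.

    Each answer is 0 (no compliance), 1 (assumed: partial compliance)
    or 2 (full compliance).
    """
    if len(answers) != N_QUESTIONS:
        raise ValueError(f"expected {N_QUESTIONS} answers, got {len(answers)}")
    for answer in answers:
        if answer not in (0, 1, 2):
            raise ValueError(f"score must be 0, 1 or 2, got {answer!r}")
    return sum(answers)


# Hypothetical study: full compliance on six questions,
# partial compliance on two and no compliance on two.
example_answers = [2, 2, 2, 2, 2, 2, 1, 1, 0, 0]
example_total = validation_score(example_answers)
print(f"validation score: {example_total} / {2 * N_QUESTIONS}")
```

Keeping the per-question answers, rather than only the total, makes it possible to report compliance question by question as well as per study.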

Currently used in vitro models systematically reviewed
The search strategy returned a total of 1731 results, and secondary sources contributed another 337 articles. After removal of duplicates (n = 26), a total of 1824 article abstracts were screened, of which 96 met the inclusion criteria mentioned previously and progressed to full-text screening. Full-text screening led to the exclusion of a further 39 articles because of incorrect study design (no in vitro data) (9), wrong publication type (17), no DIKI outcomes (12) and no full article (1), meaning 57 studies were included for full analysis (Figure S.1).
All the studies presented in performed toxicity assessment in platforms capable of keeping cells cultured for >7 days.
Three-dimensional architecture [16,21,32] Features such as the presence of apical to basolateral polarity and in vivo-like flow conditions are all vital to assess drug uptake.

A systematic review is an unbiased, transparent and reproducible method for literature review, but is hampered by low standardisation across studies identified
Table S.3 displays a summary of the studies and articles that fully met the criteria and were subjected to data extraction and analysis. Across all results, the overall quality was considered high (10.82). All the studies indicated the origin and type of cells and relevant control groups to their scientific hypothesis (51/57 (89.5%) having full compliance and 6 (10.5%) having medium compliance). However, some did not clearly indicate whether the data were obtained from independent experiments, and the incidence of intralaboratory or interlaboratory repetitions was very low (only 2 showing full compliance (3.5%)). Similarly, there was limited compliance with regard to drug–drug interactions and media interaction, as well as low compliance with drug stability data. No studies scored perfectly on randomisation and blinding, and only 3 (5.3%) showed partial compliance with these criteria. Interestingly, 35% of studies failed to attempt an in vitro–in vivo extrapolation [13]. Data are presented in
Because of the diversity of techniques and the focus on presenting as wide a range of novel methodologies as possible, a meta-analysis of the selected studies was not feasible. To perform such an analysis, much higher standardisation with regard to dosage, time of exposure, blinding and defined control groups would have been needed. Instead, articles meeting the inclusion criteria were stratified by the type of model used and assessed for their potential further use in the drug development pathway, with a particular focus on their ability to screen for nephrotoxic compounds.
Figure S.3 demonstrates that the reproducibility criteria (8–10) show the lowest compliance. Building on the lack of standardisation across the studies, of particular note is the distribution of results between those generated by systematically examining the literature and those found by manual searching using reference lists of review articles. Our systematic search returned 27 articles that met the inclusion criteria, compared with 30 from the manual search, meaning our results rely on over 50% of studies that could not be located by the search strategy (after discarding of duplicates). Furthermore, when examining the validation score of the articles, our secondary sources show a higher score of 11.96 versus 9.61 for the systematic search.
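The screening arithmetic reported in this section can be checked with a short script. The sketch below uses only counts stated in the text; the variable names are illustrative and not taken from the review.

```python
# Counts reported in the text for the full-text screening stage.
full_text_screened = 96
exclusions = {
    "incorrect study design (no in vitro data)": 9,
    "wrong publication type": 17,
    "no DIKI outcomes": 12,
    "no full article": 1,
}

# Included articles by origin, as reported in the text.
from_systematic_search = 27
from_manual_search = 30

excluded = sum(exclusions.values())
included = full_text_screened - excluded

# Share of the final inclusions that came from manual reference-list searching.
manual_share = from_manual_search / (from_systematic_search + from_manual_search)

print(f"excluded at full text: {excluded}")
print(f"included for analysis: {included}")
print(f"share from manual search: {manual_share:.1%}")
```

The two routes to the final count agree: subtracting the exclusions from the full-text pool and summing the inclusions by origin both give the same number of studies.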

Common features and consensus to lay the foundations for validating in vitro models
The most commonly investigated drug was cisplatin (n = 29, 50.1% of final inclusions), which was also included in every study that used a panel of nephrotoxic compounds (n = 14). Cisplatin (cis-diamminedichloroplatinum II) is a first-line chemotherapeutic drug used to treat diverse types of cancers. Despite being one of the most effective chemotherapeutic agents, its clinical use is limited because of severe side effects, including AKI, which can develop because of accumulation of the drug in tubular cells [14]. Common drugs including acetaminophen, chemotherapeutics and environmental toxins modelled by the studies showing the highest validation scores are presented in Table 1.

Figure 1 Kidney regions and model stratification. Category 1: models using proximal tubule cells; category 2: Bowman's capsule and glomerulus; category 3: distal tubule; category 4: collecting duct; category 5: loop of Henle. The studies were also assigned a model category as described in supplementary Table S.4. The type of the outcome expressed in each methodological approach was also categorised as per supplementary Table S.5. This was created with BioRender.

Advanced kidney models have been developed mainly for the proximal tubule with cytotoxicity as the most common endpoint
Typically, in vitro modelling has focussed on culturing single cell types on plates or slides, with the toxicity of compounds assessed via the percentage of cells undergoing apoptosis under stimulus. Advanced in vitro models identified in this study and their characteristics are presented in Table S.6.
Figure 2 shows that the vast majority of our results modelled the proximal tubule; indeed, section 1 models accounted for 84% of results, with an average validation score of 11. The second most well-modelled kidney region was section 2 (Bowman's capsule and glomerulus), with 7% of results averaging a score of 10. Our results include one model of the distal tubule and two models of the loop of Henle. No specific modelling was found for the collecting duct; however, some organoid models did model the full nephron structure.
With the proximal tubule dominating, the most common endpoints assessed were category 2, which included classical toxicity endpoints such as cellular levels of ATP, extracellular lactate dehydrogenase and MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide), which can be reduced to an insoluble purple formazan and is indicative of the cells' metabolic activity [15]. Further outcomes and the modelling approaches that assess them have been summarised in Table 1.

Higher validation score observed for studies using advanced in vitro models
The inability to perform a meta-analysis on in vitro studies makes the synthesis and extrapolation of data to a patient population much more challenging. The 'validation score' metric developed here aims to identify techniques and models with clear applications for the identification of compounds toxic to humans, with a particular focus on the kidney.
The most common approach met the criteria for category 1, 2D culture using a traditional cell line (n = 30), followed by category 4, which includes organoids and 3D cell culture (n = 13). In Figure 3, we see that category 4 models have the highest mean validation score (12.86, n = 4), closely followed by category 2 models (12.67, n = 5). In Figure 3, we can also observe a trend of a higher validation score with increasing 'complexity' of the model. There was further difficulty in comparing studies, as many articles compared novel methods with established cell lines and toxicity assays and so conformed to multiple model categories.
This trend is reinforced when accounting for the unequal size of the categories via the modal value, with category 4 (advanced in vitro) showing a modal validation score of 14. Increasingly complex models that scored highly showed criteria such as increased physiological relevance (3D structure of tubules, extracellular matrix, multiple compartments, presence of multiple cell types, cellular maturation, expression of relevant transporters) [16,17], inclusion of extracellular matrix components [18], compatibility with high-throughput screening and advanced imaging techniques [19,20] and the ability to inform a personalised medicine approach with regard to safety testing. These common physiological features of the highest scoring models have been summarised and presented in Table 1.
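The comparison of mean and modal validation scores across model categories can be illustrated with a short Python sketch. The records below are hypothetical placeholders, not data from the review, and `summarise` is an illustrative helper name.

```python
from collections import defaultdict
from statistics import mean, multimode

# Hypothetical (model category, validation score) pairs; NOT data from the review.
records = [
    (1, 9), (1, 11), (1, 11),
    (2, 12), (2, 13),
    (4, 14), (4, 14), (4, 10),
]


def summarise(rows):
    """Group scores by model category and report n, mean and mode(s)."""
    by_category = defaultdict(list)
    for category, score in rows:
        by_category[category].append(score)
    return {
        category: {
            "n": len(scores),
            "mean": mean(scores),
            # multimode returns every most-frequent value, so ties stay visible
            "modes": multimode(scores),
        }
        for category, scores in sorted(by_category.items())
    }


summary = summarise(records)
for category, stats in summary.items():
    print(f"category {category}: n={stats['n']}, "
          f"mean={stats['mean']:.2f}, mode(s)={stats['modes']}")
```

Reporting the mode next to the mean is useful when categories differ in size, because a single low-scoring study can pull down the mean of a small group while leaving its most frequent score unchanged.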
The highest validation score was received by the study conducted by L Aschauer et al. [21], a score of 16 out of 20 (Table S.6). Here, the researchers used a transcriptomics approach to compare the toxicity responses of RPTEC/TERT1 cell lines cultured on filter inserts to give apical and basolateral polarisation. In addition to this, fourteen other models demonstrated a high validation score 13 (Table S.6). However, the Maschmeyer et al. [22] study breaks the general trend of more complex models scoring higher on the study quality checklist. The researchers here presented an interconnected four-organ-chip co-culture, the most complex model identified in our literature search. However, the study scored 8 of 20 on the study quality checklist, showing poor compliance with the 'robustness' and context of use criteria on the checklist (scoring 0 on questions 1, 4 and 5). This can be attributed to the study's overall focus on absorption, distribution, metabolism, excretion (ADME) outcomes and on proving physiological relevance and cellular health within the co-culture environment, rather than on clear toxicological endpoints.
Choucha et al. [23] also present a co-culture system and demonstrate the key role of liver metabolism for assessing toxicity by modelling ifosfamide toxicity via its metabolite chloroacetaldehyde. Chang et al. [24] further reinforce the importance of accounting for hepatic metabolism, showing that the biotransformation of aristolochic acid I (AA-I) by liver enzymes increases kidney toxicity and, conversely, that inclusion of an organic anion transporter inhibitor (probenecid) attenuates uptake by human cell cultures in a microphysiological system. This implies that incorporation of a liver model or compartment in, for instance, an organ-on-chip or other system would greatly advance predictivity and in vivo relevance for humans.

Future perspectives
Our systematic review of the literature had an overall inclusion rate of 3.3%, naturally leading to questions over the use of our methodology to identify relevant in vitro literature. One of the key causes of this low inclusion rate was the lack of follow-up articles to abstracts, but it could also be explained by stringent inclusion and exclusion criteria and/or a restrictive search strategy. The differing semantics of 'in vitro modelling' versus 'preclinical modelling' may also have contributed significantly when searching the PubMed and EMBASE databases.
This failure of the preclinical pipeline strongly highlights the need for improved, physiologically relevant in vitro models that can better serve as reliable drug-screening and disease modelling tools. Here, we show the improved ability of more advanced models to predict DIKI during drug development. However, there is still a clear lack of specific indicators of DIKI that can be reliably measured in vitro. Using a novel evaluation metric, we identified a number of techniques, such as the use of microphysiological culture systems and the integration of in silico modelling, that have high potential to fulfil this role. The models that scored the highest all used a combination of human cells that are reflective of the in vivo environment and multiple, diverse outcomes. This was exemplified in the work by Adler et al. [25], using both HO-1 and traditional cell death/viability assays to predict DIKI caused by nephrotoxic compounds from the DrugMatrix and TG-GATES databases. Our results clearly demonstrate the viability of expanding toxicity assays beyond single-endpoint measures of cell death. With special regard to kidney toxicity, analysis of a combination of cell health parameters and the use of a physiologically relevant cell line are key aspects for successfully developing an assay to predict drug-induced nephrotoxicity.
However, advanced in vitro models will have to balance physiological relevance and enhanced predictivity with factors such as ease of use and high-throughput analysis to enter into widespread use. Our results suggest that a lack of standardisation of experimental approaches, coupled with an incomplete understanding of the pathology, makes modelling of DIKI incredibly challenging. Furthermore, methods for and the quality of narrative reviews of in vitro studies are highly variable, and improved reporting standards are needed. Our validation metric attempts to do this; however, the metric itself needs further validation.
Any validation and uptake of in vitro models as animal replacement models takes time. Translation to the clinical setting requires both scientific excellence and quality assurance. Here, our quality assessment plays a role and, alongside the outcomes, compounds and physiological features in Table 1, provides the basis of a validation strategy for those seeking to develop a robust in vitro model of DIKI.
Of these common features, it is the integration of in silico approaches that is of paramount importance. Doing so allows the adoption of endpoints such as those presented in Table 1, other parameters of cell condition and endpoints from the clinical and regulatory setting.
Such models rely on high-quality 'known' data points on which to base predictions; biological complexity and experimental scalability are therefore both essential for producing meaningful and robust data sets that allow the integration of in silico predictive models. With this in mind, there is an argument that early integration of regulatory compliance (using recently qualified biomarkers) can help drive standardisation and translation of in vitro data to humans. The integration of such techniques requires effective input from all stakeholders within the DDP, that is, regulators, investigators, clinicians and patients, as well as the collaboration of diverse academic disciplines to move in vitro innovation forward.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: R.M. reports financial support was provided by Utrecht University, by H2020-MSCA-COFUND-2017-801540 RESCUE COFUND and by ONTOX, grant number 963845 of the European Commission under the Horizon 2020 research and innovation framework programme.
differences in renal drug clearance. Drug Discov Today 2020, 25:706-717. The authors present a systematic review of the available literature of animal models used to study pharmacokinetics in the pre-clinical phase of drug discovery with a focus on the 20 renally excreted drugs. The authors demonstrated that rats, despite being the most commonly used model, are an inadequate species for preclinical drug clearance testing.
One article was excluded at full-text screening because no full article was available, meaning 57 studies have been included for full analysis (Figure S.1). Two conference abstracts, by Silva et al. (2017) and Cappandona et al. (2017), returned by the search strategy subsequently released full texts, and these were included for analysis. The latter study changed its first author and is referred to as Milansi et al. (2018) in this write-up.

Figure 2 Regions

Figure 3 The

Table 1
Relevant endpoints, reference drugs, measurement methods and parameters to assess organ functionality.
A cytosolic enzyme present in many different cell types that is released into the cell culture medium on damage to the plasma membrane.
Live/dead assay [19,20,24] Quick and simple three-colour assay to measure cell viability. Interestingly, this was not found to be a strong predictor for predictive in silico models