Development of Web and Mobile Applications for Chemical Toxicity Prediction

Computational tools are recognized to provide high-quality predictions for the assessment of chemical toxicity. In recent years, mobile devices have become ubiquitous, allowing for the development of innovative and useful models implemented as chemical software applications. Here, we briefly discuss this recent uptick in the development of web-based and mobile applications for chemical problems, focusing on best practices, development, usage, and interpretation. As an example, we also describe two innovative apps (Pred-hERG and Pred-Skin) for chemical toxicity prediction developed in our laboratory. These applications are based on predictive quantitative structure-activity relationship (QSAR) models developed using the largest publicly available datasets of structurally diverse compounds. The developed tools ensure both highly accurate predictions and easy interpretation of the models, allowing users to discriminate potential toxicants and to propose structural modifications to design safer chemicals.


Introduction
Chemical safety assessment is a fundamental step in the development and regulation of drugs, cosmetics, and other chemicals. 1 In the last few decades, society has become less tolerant of animal testing. 2 As a consequence, legislation, guidelines, and the practice of animal experimentation have been incorporating the principles of reducing, refining, and replacing the use of animals in the laboratory. As a prime example, in vivo evaluation of cosmetic products has been forbidden in Europe since 2003, 3 and the sale of cosmetics and ingredients tested on animals has been banned since 2013. 4 In Brazil, since 2015, alternative methods recognized by the National Regulatory Agency (ANVISA) are sufficient for the approval of chemicals, 5 and animal testing will not be allowed for endpoints with approved alternative methods (e.g., eye irritation, skin irritation, acute toxicity, etc.) starting from 2019. 6 Computational methods have become an effective alternative for the evaluation of new or untested compounds, such as drug candidates, cosmetics, and pesticides, providing high-quality predictions at low cost and in reasonable time. 7,8 Toxicity prediction often relies on structural alerts, 9 read-across, 10 and quantitative structure-activity/property relationship (QSAR/QSPR) models. 11,12 Structural alerts are molecular substructures that are associated with a particular adverse biological effect. 13 Read-across is a technique that extrapolates data from previously measured compound(s) to structurally similar compounds (usually identified using structural alerts) that lack experimental data. 14 This method has earned prominence from a regulatory perspective 15,16 due to its simplicity and transparency. 17
Scientific applications implemented in a web interface (web apps) are an interesting alternative to standalone software applications, since they are ready to use, fast, and usually present an intuitive interface. Several web apps for solving chemical and molecular modeling problems have been proposed over the years, such as the Platform for Unified Molecular Analysis (PUMA), 18 SWISS-MODEL, 19,20 Chembench, 21 Pred-hERG, 22,23 and Pred-Skin, 24 among others. To a lesser extent, chemistry mobile apps have also been developed, and they are likely to see rapid uptake in the next few years as smartphones and tablet computers become ubiquitous. 25 Such apps may be useful for education and for improving chemists' productivity. 26,27 The Green Solvents mobile app is one of the first chemistry apps. It is based on a dataset of solvents and their associated information and serves as a solvent guide for chemists. 28 The TB Mobile app provides information on over 700 molecules screened against Mycobacterium tuberculosis from the Collaborative Drug Discovery (CDD) database. The app also performs similarity searches, which can be used to infer potential targets from similar structures. 29 The Lead Designer app 30 provides a tool for drawing compounds and predicting permeability through the blood-brain barrier, CYP3A4 binding potency, and some physicochemical properties.
Recently, our group has developed and implemented predictive QSAR models in two distinct tools, available as a web app (Pred-hERG) 23 and as web and mobile apps (Pred-Skin), 24 for the fast prediction of hERG-related cardiac toxicity and skin sensitization, respectively. Here, we describe the details behind the development of these two innovative web-based and mobile applications for chemical toxicity prediction.

Chemical Toxicity Prediction Using Apps
Chemical toxicity prediction usually relies on chemical similarity, by using structural alerts and read-across, as well as on QSAR models. 31 Structural alerts are molecular fragments that are associated with a positive response for a particular endpoint. 9 Chemical read-across uses chemical similarity to assess the effect of untested chemicals. 10,32 Lastly, QSAR modeling is a major computational approach that uses statistical or machine learning algorithms to correlate molecular representations with activity and can be used to provide statistically significant predictions for chemicals. 33 A reliable mobile app or web server for chemical toxicity prediction should be based on highly predictive models with high external accuracy. As we and collaborators have recently shown, 31 structural alerts are oversensitive, disproportionately flagging chemicals as toxic. Therefore, the use of statistically validated QSAR models is preferred. The best practices for the development and validation of QSAR models have been recommended by the Organisation for Economic Co-operation and Development (OECD) 34 and elaborated by Tropsha. 35 Briefly, any QSAR model should have "(i) a defined endpoint; (ii) an unambiguous algorithm; (iii) a defined applicability domain; (iv) appropriate measures of goodness-of-fit, robustness, and predictivity; (v) and, if possible, a mechanistic interpretation." In addition, the reproducibility crisis has called into question the quality of experimental data published in peer-reviewed journals. 36
The workflow we have adopted for the development of the web apps is shown in Figure 1. First, we compile the largest dataset available for the particular endpoint and then apply the curation protocol proposed by Fourches et al. 37-39 Then, models are developed following the best practices for model development and validation. 35 Web-based and mobile apps are then developed (see details in the next section). Finally, the apps are made available to the scientific community for predicting new compounds using an intuitive tool with highly confident predictions and graphical interpretation.
One of the advantages of building apps based on QSAR models is the wide applicability of these models across different classes of industrial chemicals. As recently reported, 40 QSAR models generated for cosmetics, drugs, and pesticides can be used interchangeably; i.e., a model developed using mainly drugs and drug-like compounds can also be used to evaluate cosmetics and pesticides, since cosmetic compounds usually occupy the same chemical space as most drugs or even pesticides. This is of high value for endpoints such as skin sensitization, as many industrial products (e.g., cosmetics, drugs, pesticides, food additives, etc.) can come into contact with the skin and cause allergic reactions. 41-44 This is especially true because chemical space has grown substantially in the last decade. For instance, the Chemical Abstracts Service registry reached 100 million chemical substances in 2015, 45 and the current version of the ZINC database 46 contains over 35 million purchasable compounds.
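To make the similarity-based reasoning behind read-across concrete, the following is a minimal sketch, not the method used in Pred-hERG or Pred-Skin. It assumes that each compound has already been encoded as a set of "on" fingerprint bits; the compound names, bit sets, labels, the number of neighbours, and the similarity cutoff are all hypothetical values chosen for illustration. The cutoff acts as a crude applicability domain: if no tested compound is similar enough, no prediction is returned.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def read_across(
    query: Fingerprint,
    tested: Dict[str, Tuple[Fingerprint, int]],
    k: int = 3,
    min_similarity: float = 0.3,  # hypothetical applicability-domain cutoff
) -> Optional[int]:
    """Similarity-weighted vote of the k most similar tested compounds.

    Returns 1 (toxic), 0 (non-toxic), or None when no tested compound
    is similar enough to support a prediction.
    """
    ranked = sorted(
        ((tanimoto(query, fp), label) for fp, label in tested.values()),
        reverse=True,
    )[:k]
    neighbours = [(sim, label) for sim, label in ranked if sim >= min_similarity]
    if not neighbours:
        return None
    positive_weight = sum(sim for sim, label in neighbours if label == 1)
    total_weight = sum(sim for sim, _ in neighbours)
    return 1 if positive_weight / total_weight >= 0.5 else 0


# Made-up tested compounds: name -> (fingerprint bits, experimental label)
tested_compounds = {
    "compound_A": (frozenset({1, 2, 3, 4}), 1),
    "compound_B": (frozenset({1, 2, 3, 5}), 1),
    "compound_C": (frozenset({7, 8, 9}), 0),
    "compound_D": (frozenset({7, 8, 10}), 0),
}

similar_query = frozenset({1, 2, 3, 6})   # shares bits with A and B
unrelated_query = frozenset({20, 21})     # shares no bits with any tested compound
```

Here `read_across(similar_query, tested_compounds)` is driven by compounds A and B, while `read_across(unrelated_query, tested_compounds)` returns `None` because the query falls outside the assumed similarity cutoff.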

Development of Web-Based and Mobile Applications
The development of integrative web-based and mobile applications running machine learning routines written in Python is possible using Flask. 47 Flask is a small framework (a microframework) for creating web applications. It was designed from the start as an extensible framework with a solid core of essential services and easy integration with Python extensions. Template support in Flask is provided by Jinja2, while the Web Server Gateway Interface (WSGI) subsystem is based on the Werkzeug toolkit. Common web application features, such as validation of web forms, user authentication, and access to databases, are usually provided through extensions. The main drawback of Flask is that it does not prescribe a structure for large projects, leaving the organization of the app entirely to the developer.
The implementation of complex workflows for cheminformatics applications written in Python with Flask is possible by integrating RDKit 48 and scikit-learn.
RDKit is an open-source cheminformatics toolkit and application programming interface (API), originally developed by Gregory Landrum and colleagues at Rational Discovery, for building predictive models of pharmacokinetic properties, toxicity, and biological activity. RDKit can be used to calculate molecular descriptors, such as Morgan fingerprints (similar to extended-connectivity fingerprints), 49,50 and thus to generate appropriate molecular representations for developing QSAR models using machine learning algorithms. The scikit-learn library, built on the NumPy, SciPy, and matplotlib packages, can be used to run machine learning scripts, split datasets, preprocess data, generate models, and calculate the appropriate statistical measures in conjunction with a 5-fold external cross-validation procedure, allowing an appropriate evaluation of the generated models.
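The 5-fold external validation procedure and the associated statistical measures can be sketched as follows. This is an illustration written with the Python standard library only, not the scikit-learn code used for the published models: the dataset is split into five disjoint folds, each fold is held out once as an external set while a model is built on the remaining four, and the pooled held-out predictions are scored by sensitivity, specificity, and balanced accuracy. The function names, the toy one-descriptor `midpoint_classifier`, and the data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

FitPredict = Callable[[list, List[int], list], List[int]]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and balanced accuracy for binary labels (1 = toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def external_cross_validation(
    X: list, y: List[int], fit_predict: FitPredict, k: int = 5, seed: int = 0
) -> Dict[str, float]:
    """Hold out each fold once; the model never sees the compounds it predicts."""
    y_pred = [0] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predictions = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in held_out]
        )
        for i, p in zip(held_out, predictions):
            y_pred[i] = p
    return classification_metrics(y, y_pred)


def midpoint_classifier(X_train: list, y_train: List[int], X_test: list) -> List[int]:
    """Toy model on one descriptor: threshold halfway between the class means."""
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [1 if x > threshold else 0 for x in X_test]


# Made-up data: a single descriptor value per compound and a binary label.
X = [i / 19 for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
results = external_cross_validation(X, y, midpoint_classifier)
```

Reporting sensitivity and specificity separately, rather than overall accuracy alone, makes it visible when a model, like an oversensitive structural alert, flags most compounds as toxic.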
Python routines integrating the developed machine learning models with RDKit and scikit-learn work in the back-end of the app (Figure 2). 51,52 All functionalities are orchestrated by Flask, which is responsible for calling the individual modules in the back-end and for interacting with the user in the front-end through responsive web templates written in HTML5 and JavaScript. The features described here were used to develop two web apps: Pred-hERG, to identify possible hERG blockers and non-blockers, and Pred-Skin, for the assessment of the skin sensitization potential of chemicals (see the Pred-hERG and Pred-Skin sections).
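The request flow described above (the front-end sends a structure, the back-end computes a prediction, and the result is returned for rendering) can be illustrated with a minimal sketch. To keep it self-contained, it uses only the standard library's WSGI interface, the layer on which Flask builds through Werkzeug, rather than Flask itself; in a Flask app the same endpoint would typically be declared with a route decorator. The `/predict` path, the `smiles` query parameter, and the `predict_toxicity` placeholder are assumptions made for this example and are not taken from the Pred-hERG or Pred-Skin source code.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def predict_toxicity(smiles: str) -> dict:
    """Hypothetical placeholder for the back-end call.

    A real app would compute a fingerprint for the structure and pass it
    to a trained model here; this stub only echoes the input.
    """
    return {"smiles": smiles, "prediction": None, "note": "placeholder model"}


def app(environ, start_response):
    """Minimal WSGI application exposing one JSON endpoint: /predict?smiles=..."""
    # parse_qs decodes URL escapes, so SMILES characters such as '#', '+',
    # or '=' must be percent-encoded by the client.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    smiles = query.get("smiles", [""])[0].strip()

    if environ.get("PATH_INFO") != "/predict":
        status, payload = "404 Not Found", {"error": "unknown route"}
    elif not smiles:
        status, payload = "400 Bad Request", {"error": "missing 'smiles' parameter"}
    else:
        status, payload = "200 OK", predict_toxicity(smiles)

    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(path: str, query: str = ""):
    """Call the app in-process, without a server, and return (status, JSON body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Calling `call_app("/predict", "smiles=CCO")` exercises the endpoint directly; to serve it locally, the same `app` callable could be passed to `wsgiref.simple_server.make_server`.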

Pred-hERG
Pred-hERG is a web app that allows users to predict blockers and non-blockers of the hERG channel, an important drug anti-target associated with lethal cardiac arrhythmia. 53 The app has a fast and intuitive interface. We implemented binary (blocker vs. non-blocker) and multiclass classification models, the latter distinguishing weak/moderate and strong/extreme blockers. We also implemented probability maps of the atomic contributions predicted by the models, allowing users to interpret the results and propose structural modifications for the predicted compound. The current version of the app (v. 4.0) was developed using ChEMBL 54 version 23, containing 8,134 compounds with hERG blockage data after curation. The app is publicly available at the website. 55

Pred-Skin
Pred-Skin 24 is an open-source web-based and mobile application for the evaluation of skin sensitization potential using externally validated QSAR models based on human and animal (local lymph node assay, LLNA) data. The app represents a benchmark for the prediction of skin sensitization, since it is the first tool to provide predictions from models based on human data. Predictions for a single compound are produced within a few seconds. The following outputs are provided: (i) binary predictions of human and murine skin sensitization; (ii) multiclass predictions of murine skin sensitization potency; and (iii) probability maps illustrating the predicted contribution of chemical fragments. The app is freely available to the public at the website 56 and at the App Store.

Usage and Interpretation of Apps for Toxicity Prediction
There is a strong need for user-friendly tools that allow non-experts to easily make predictions for their compounds of interest and to visualize the structural features that increase or decrease the activity/toxicity of these compounds. QSAR models have commonly been regarded as a non-interpretable approach. 57 This perception has led experimentalists and regulatory agencies to prefer simpler and more transparent approaches, such as structural alerts or read-across. 58 However, substructures derived from the interpretation of QSAR models 59-64 can be used to design novel compounds with an improved toxicity profile. 65 Aiming to provide predictive and interpretable tools for toxicity prediction, we have been working to implement highly predictive QSAR models in web and mobile apps with intuitive usage and interpretation. Both Pred-hERG and Pred-Skin allow the user to make predictions by pasting SMILES strings or drawing molecules in the JSME 66 molecule editor. Alternatively, the user can load .sdf or .mol files. After hitting the "Predict" button, the user receives the QSAR predictions on the computer or mobile screen.
In addition, we have implemented the probability maps proposed by Riniker and Landrum 51 in our apps. This feature provides a graphical visualization of the predicted fragment contributions, allowing the user to interpret the prediction and to design safer compounds. The atoms are highlighted according to their predicted contribution to activity. Atoms and molecular fragments highlighted in green represent a positive contribution (increase in toxicity potential), whereas pink fragments represent a negative contribution (decrease in toxicity potential). Gray fragments do not contribute to the toxicity potential, and gray isolines define the boundary between the positive (green) and negative (pink) contributions.
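The idea behind such maps can be sketched in a few lines: the contribution of an atom is estimated as the difference between the probability predicted for the whole molecule and the probability predicted after removing the fingerprint bits that involve that atom. The sketch below is a simplified illustration of that idea and not the implementation used in the apps. The atom-to-bit mapping, the per-bit weights, and the logistic `toy_model` are hypothetical stand-ins for a real fingerprint and a trained model; the colour names follow the convention described above.

```python
import math
from typing import Callable, Dict, FrozenSet, Set


def atomic_contributions(
    atom_bits: Dict[int, Set[int]],
    predict_proba: Callable[[FrozenSet[int]], float],
) -> Dict[int, float]:
    """Contribution of each atom = P(toxic | all bits) - P(toxic | bits without that atom)."""
    full = frozenset().union(*atom_bits.values())
    baseline = predict_proba(full)
    return {
        atom: baseline - predict_proba(full - bits)
        for atom, bits in atom_bits.items()
    }


def colour_for(weight: float, tolerance: float = 1e-6) -> str:
    """Map a contribution to the colour convention used in the text."""
    if weight > tolerance:
        return "green"  # removing the atom lowers predicted toxicity: positive contribution
    if weight < -tolerance:
        return "pink"   # removing the atom raises predicted toxicity: negative contribution
    return "gray"       # no measurable contribution


# Hypothetical per-bit weights standing in for a trained model.
BIT_WEIGHTS = {1: 1.5, 2: -1.0}


def toy_model(bits: FrozenSet[int]) -> float:
    """Logistic model over the fingerprint bits (illustration only)."""
    score = sum(BIT_WEIGHTS.get(bit, 0.0) for bit in bits)
    return 1.0 / (1.0 + math.exp(-score))


# Made-up molecule with three atoms, each mapped to the bits it takes part in.
molecule = {0: {1}, 1: {2}, 2: {3}}
weights = atomic_contributions(molecule, toy_model)
colours = {atom: colour_for(w) for atom, w in weights.items()}
```

In this toy molecule, atom 0 carries the bit with a positive weight, atom 1 the bit with a negative weight, and atom 2 a bit the model ignores, so the three atoms map to the three colours of the convention.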

Final Remarks
Despite the progress made in the last decade in predicting toxicity endpoints, these efforts still cannot fully guarantee that new chemicals will not be toxic or hazardous to human health. Therefore, the development of alternative methods, including in silico or computational methods, for assessing the toxicity potential of chemicals is of utmost importance. Machine learning methods, such as modern QSAR approaches, have become more powerful due to the rapid expansion of bioactivity and toxicity data available in chemical databases such as TOXNET, 67,68 ChEMBL, 69 and PubChem. 70,71 In this paper, we have discussed and reviewed the development of web-based and mobile applications for chemical problems, focusing on best practices, development, usage, and application. We have also described two innovative and freely accessible web-based and mobile applications for the evaluation of potential hERG blockers and skin sensitizers developed in our laboratory. These applications are based on robust and predictive QSAR models developed using the largest publicly available datasets of structurally diverse compounds. The freely available Pred-hERG and Pred-Skin web apps ensure both highly accurate predictions and easy interpretation of the models, allowing users to perform rapid screening of large libraries of virtual compounds to discriminate potential toxicants and to propose structural modifications to design safer chemicals.

Figure 1. General workflow for the development of QSAR models and their implementation in web and mobile applications for toxicity prediction of chemical compounds.

Figure 2. Pred-Skin front-end/back-end technology (full stack). 52 (1) The front-end uses responsive web templates in HTML5 and JavaScript for the user to draw or paste the chemical; (2) the RDKit ECFP4 (Morgan) fingerprint representation is calculated on the back-end; (3) the scikit-learn machine learning model runs on the back-end; (4) the front-end templates, rendered by Flask, are returned with the model predictions and the probability maps. 51