Expansion of the application domain of a macromolecular ocular irritation test (OptiSafe™)

The OptiSafe (OS) test is a shelf-stable, macromolecular eye irritation test that does not include any animal ingredient or component (“vegan”). The purpose of this study was to evaluate the test’s accuracy for an expanded application domain for both the original and recently updated OS method. This study involved the testing of additional ocular corrosives and previously excluded foaming agents (“surfactants”) using both the original and updated OS methods and then combining these data with prior validation data for a total of 147 chemicals. Predictivity was evaluated by a statistical comparison of the OptiSafe predictions with historical in vivo “Draize” rabbit eye data for the same chemicals (from public databases). We report that for the detection of chemicals not requiring classification for eye irritation [Globally Harmonized System of Classification and Labeling of Chemicals (GHS) No Category], the accuracy, specificity, and sensitivity were 92.8%, 79.6%, and 100.0%, respectively, for the updated method; for the detection of chemicals inducing extreme eye damage/corrosion (GHS Category 1), the accuracy, specificity, and sensitivity were 79.4%, 71.8%, and 91.7%, respectively, for the updated method. Results indicate that both the original and updated methods have a high accuracy for the expanded application domain that included ocular corrosives and surfactants.


Introduction
Chemically induced eye damage has traditionally been evaluated using a live rabbit eye test. This test ("Draize test") involves instilling 100 μL of the substance under evaluation within the conjunctival sac of a live New Zealand White rabbit (Draize et al., 1944). Indices of toxicity (redness, swelling, opacity of the cornea, discharge, lesions) are clinically graded and recorded for the cornea, iris, and conjunctiva daily for up to 21 days (Luechtefeld et al., 2016; OECD, 2021a). Based on these toxicity outcomes, chemicals can be classified for eye irritation using the Globally Harmonized System of Classification and Labeling of Chemicals (GHS) (UN, 2021). The GHS classification system includes the category "not classified" (NC; chemicals that do not induce significant levels of eye irritation or damage averaged over the first three days after exposure) (UN, 2021), Categories 2B and 2A "ocular irritants" (chemicals that induce significant irritation over the first three days but then the irritation reverses prior to 7 days (2B) or 21 days (2A)) (UN, 2021), and Category 1 "ocular corrosives" (chemicals that cause an extreme response or corrode and permanently damage the eye) (UN, 2021).
Since the use of live animals for routine product testing is not consistent with efforts to "reduce, refine, and replace" animal studies (Liebsch et al., 2011), there is a strong shift from animal testing toward the use of nonanimal test methods (Humane Society, 2013). A number of nonanimal eye toxicity tests have been recognized by the Organization for Economic Cooperation and Development (OECD) for which test guidelines have been established, including in vitro cell culture-based tests that apply test substances to a cell monolayer and determine if there is a reduction in viability using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay [Short Time Exposure (STE) test] and ex vivo tests that use food-source animal eyes, which include the Isolated Chicken Eye (ICE) and the Bovine Corneal Opacity and Permeability (BCOP) tests (OECD, 2018, 2020a; Verstraelen et al., 2017; Lebrun et al., 2021). For both of these ex vivo tests, materials are applied to the corneal surface, and damage is assessed by measuring corneal opacity, swelling (ICE), and fluorescein staining (ICE) or fluorescein permeability (BCOP). Additionally, OECD test guidelines exist for in vitro reconstituted human corneal epithelial (rhCE) tests, in which materials are applied to 3D epithelial cultures (EpiOcular™, SkinEthic™, and MCTT HCE™ Eye Tests), and viability is measured using the MTT assay (Pfannenbecker et al., 2013; OECD, 2019a; Kandarova et al., 2018; Van Rompay et al., 2018; Lim et al., 2019). Recently, a new protocol (Time-To-Toxicity; TTT) was developed for the reconstituted corneal epithelium equivalent test SkinEthic. This procedure involves dosing tissues for 6, 16, and 120 min for liquids or 30 and 120 min for solids and using MTT-based viability data to predict the level of irritation (Alépée et al., 2021, 2022).
There is also a long list of other alternative tests that have not received OECD acceptance (Bagley et al., 1999; Hafner, 2000; Piehl et al., 2011; Bartok et al., 2015; Spöler et al., 2015; Adriaens et al., 2018; Araujo Lowndes Viera et al., 2022).
Emerging alternatives to animal testing are the "macromolecular" tests. These test tube ("in chemico") methods are highly standardized and shelf-stable options for the assessment of eye irritation potential. OptiSafe™ ("Optimized for Safety") is a shelf-stable in chemico test method that consists of a proprietary macromolecular test matrix that is used to quantify the potential of an unknown test material to cause eye irritation or eye damage. Damage to macromolecules results in the loss of cell viability, and the extent of measured macromolecular damage is used to predict the toxicity of the material being tested (Lebrun, 2018, 2021; Choksi et al., 2020; Lebrun et al., 2019, 2021a, 2021b, 2022a, 2022b). To conduct the test, materials to be tested are added to "ocular discs" to control the delivery of the chemical to be tested as it enters the macromolecular reagent mixture (insoluble materials are floated instead of placed within membrane discs). Results are read using a spectrophotometer, and the optical density (OD; at 400 nm) and pH values are compared with quality controls and a standard curve to calculate the irritation score. The score is then applied to a prediction model to classify the material tested. For a detailed description, see Lebrun et al., 2022a. A validation study of the OS test was coordinated by the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, with members of the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) as the validation management team that assessed the test's accuracy for the detection of nonirritants versus irritants (GHS NC versus the rest). Foaming materials ("surfactants") were excluded from the study, and there were only 11 GHS Category 1 chemicals.
For the three-lab blind transferability phase, when results were combined based on the majority classification for the GHS classification, the accuracy was 89%, the false-negative (FN) rate was 0%, the false-positive (FP) rate was 23%, the sensitivity was 100%, the specificity was 77%, and the balanced accuracy was 88.5% (Choksi et al., 2020). Additional chemicals were selected by ICCVAM for evaluation of the test's application domain; these materials were tested by the lead lab only in the "application domain phase." Based on overall results from the three-lab blind transferability study and the blind application domain study, the test method accuracy for the GHS system was 80%, the FN rate was 0%, the FP rate was 40%, the sensitivity was 100%, the specificity was 60%, and the balanced accuracy was 80% (Lebrun et al., 2021a).
We then studied the overpredicted chemicals (Lebrun et al., 2021a) and found that chemicals associated with reactive oxygen species (ROS) chemistry were commonly overpredicted (Lebrun et al., 2021b, 2022a). Based on the hypothesis that naturally occurring antioxidants in tears can deactivate ROS prior to eye damage, we then tested a range of tear antioxidants and identified that ascorbic acid reduces the OptiSafe FP rate (Lebrun et al., 2021b, 2022a) and then compared results for the same chemicals with and without ascorbic acid (Lebrun et al., 2022b). We then updated the prediction model and the OS physiochemical handling procedures (PCHPs) and also comprehensively determined the impact of adding an antioxidant to the OS test matrix (Lebrun et al., 2021b, 2022a). All prior OS-coded validation study chemicals were then retested (Lebrun et al., 2022a). These chemicals included the prior coded validation chemicals described above (Choksi et al., 2020; Lebrun et al., 2021a). Based on the retesting of chemicals, for the detection of GHS NC, the addition of an antioxidant to levels found in tears lowered the FP rate from 40.0% to 22.2%. The FN rate remained the same at 0.0% (100% sensitivity), and the overall accuracy improved from 80.3% to 89.2% (Lebrun et al., 2022a); however, this study only evaluated the accuracy for GHS NC chemicals versus the rest, and no surfactants were tested.
The purpose of the current study was to expand these results by increasing the number of surfactant and extreme/ocular corrosive (GHS Category 1) chemicals tested with and without ascorbic acid to determine the accuracy of the detection of "nonirritants" (GHS Category NC), irritants (GHS Categories 2B and 2A), and extreme/ocular corrosives (GHS Category 1), bringing the total number of chemicals tested in triplicate for the new updated method to 147 (59 UN GHS Category 1, 37 UN GHS Category 2, and 51 UN GHS NC).

Methods
The OS macromolecular test was conducted as previously described (Choksi et al., 2020; Lebrun et al., 2021a, 2022b) and is briefly described below.

OS protocol
The OS test is packaged as a complete kit (Lebrun Labs LLC, Anaheim, CA), which includes the OS "Active Agent" (AA) macromolecular reagent formulation (see Lebrun et al., 2022a for a description). As described in prior publications (Lebrun et al., 2021b, 2022a, 2022b), samples were initially evaluated using a set of Physiochemical Handling Procedure (PCHP) pretests (see Lebrun et al., 2022a). Specific physiochemical properties (solubility, buffering capacity, and foaming) of the material to be tested are measured during the pretest step. Based on these physiochemical properties, there are specific changes in the protocol that improve sensitivity and accuracy (Lebrun et al., 2021b, 2022a, 2022b). For substances with significant pH buffering, the buffering power was further evaluated, and the protocol was adjusted to match; testing of insoluble materials does not use membrane discs, and surfactants are diluted. For the updated version of the test, L-ascorbic acid at in vivo levels (530 μM, Sigma-Aldrich, Milwaukee, WI, Catalog number A5960) was added directly to the AA formulation (see Lebrun et al., 2022a). The updated version of OS with ascorbic acid has an updated prediction model (Table 1).
Besides the updates for the updated version with ascorbic acid, the standard protocol was followed; five increasing doses of test chemicals were titrated onto an "ocular disc," which controls the delivery of the chemical to be tested as it enters the reagent mixture. After incubation at 31 °C, the OD is measured with a spectrophotometer. The resulting OD and pH values are compared with quality controls and a standard curve to generate a score (see Choksi et al., 2020 for procedure details and flow chart).
Scores are assigned classifications based on the older (Choksi et al., 2020; Lebrun et al., 2022a) or new unified OS prediction model (Lebrun et al., 2022a). For the new prediction model (Table 1), a score of 15 or less predicts GHS NC, and an OS score >45 predicts GHS Category 1. Scores that fall between these cutoffs (greater than 15 and up to 45) are predicted as GHS Category 2A/2B (Lebrun et al., 2022a).
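The cutoffs of the new prediction model can be written as a small lookup. The following is only a minimal sketch of the thresholds stated above; the function name `classify_os_score` and the label strings are hypothetical, and the sketch does not model the Category 2/1 or "criteria not met" outcomes, which depend on the dose-response curve and pretest results rather than on the score alone.

```python
def classify_os_score(score: float) -> str:
    """Map an OS irritation score to a predicted GHS class.

    Thresholds follow the new prediction model as described in the text:
    score <= 15 -> NC, score > 45 -> Category 1, otherwise Category 2A/2B.
    The function name and label strings are illustrative assumptions.
    """
    if score <= 15:
        return "NC"
    if score > 45:
        return "Category 1"
    return "Category 2A/2B"


if __name__ == "__main__":
    # Example scores chosen only to exercise each branch.
    for example_score in (3.0, 15.0, 30.0, 45.0, 60.0):
        print(example_score, classify_os_score(example_score))
```

Note that both boundary values are inclusive on the lower class: a score of exactly 15 is NC, and a score of exactly 45 is still Category 2A/2B.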
In some cases (when the dose-response curve is nonlinear and for some buffering capacity test outcomes), the OS method does not differentiate between UN GHS Categories 1 and 2 (UN, 2021). Test chemicals that are predicted as Category 2/1 should only be used for GHS NC versus the rest analyses and must therefore be subsequently tested by another method to establish a definitive UN GHS Category 2 or 1 classification. In other cases, the OS method cannot provide results, termed "criteria not met" (CNM). CNM occurs when the photometric range of the spectrophotometer has been exceeded or when there is an inverse dose-response curve below the irritant cut-off (Lebrun et al., 2022a, 2022b).

Test chemicals
All test chemicals had existing publicly available in vivo "Draize" rabbit eye data to which OS results are compared (no new animal testing was performed). Specific references for in vivo classifications are provided as part of the tables. Newly tested chemicals are shown in  Choksi et al., 2020). Surfactants were selected by the OS validation management team for the prior study (Choksi et al., 2020) but were not tested as part of that study because surfactants were previously excluded based on the OS PCHP foaming test (Choksi et al., 2020). As shown in Table 2C, forty-eight additional GHS Category 1 chemicals were newly tested, bringing the total number of GHS Category 1 chemicals tested in triplicate to 59, thereby allowing for a more significant evaluation of the ability of the test to detect GHS Category 1.

Statistics
For statistical evaluation, data for the newly tested chemicals were combined with data for the previously tested chemicals. The additional testing increased the n from 78 (Lebrun et al., 2022a) to 147 (total n for this paper). OS results were compared with the in vivo results to determine the predictive capacity. In vivo GHS classifications for each chemical were obtained from historical databases (no new in vivo testing was done) of live animal "Draize" test eye irritation/damage data (Draize et al., 1944); specific references for each chemical are provided as part of the tables (Tables 3A-3D).

GHS NC vs. the rest calculations
The accuracy was calculated as the number of correctly predicted in vivo positives (GHS Category 2 or 1) predicted by OS to be positive (Category 2 or 1) added to the number of in vivo negatives (NC) identified by OS to be negative (NC) divided by the total number of chemicals. The sensitivity was calculated as the total number of in vivo Category 2 or 1 chemicals that were correctly predicted by OS to be Category 2 or 1 divided by the total number of in vivo Category 2 or 1 chemicals. The specificity was calculated as the total number of in vivo NC chemicals that were correctly predicted by OS to be NC divided by the total number of in vivo NC chemicals. The FN rate was calculated as the total number of in vivo Category 2 or 1 chemicals that were predicted by OS to be NC divided by the total number of in vivo positives (Category 2 or 1). The FP rate was calculated as the total number of in vivo NC chemicals predicted by OS to be Category 2 or 1 divided by the total number of in vivo negatives (NC).

GHS Category 1 vs. the rest calculations
The accuracy was calculated as the number of in vivo Category 1 chemicals predicted to be Category 1 by OS plus the number of not Category 1 chemicals predicted to be not Category 1 by OS divided by the total number of chemicals. The sensitivity was calculated as the total number of in vivo Category 1 chemicals that were correctly predicted by OS to be Category 1 divided by the total number of in vivo Category 1 chemicals. The specificity was calculated as the total number of in vivo Category 2 or NC chemicals that were correctly predicted by OS to be Category 2 or NC divided by the total number of in vivo Category 2 or NC chemicals. The FN rate was calculated as the number of in vivo Category 1 chemicals that were predicted by OS as NC or Category 2 divided by the total number of in vivo positives (Category 1). The FP rate was calculated as the total number of in vivo NC or Category 2 chemicals that were predicted by OS to be Category 1 divided by the total number of in vivo negatives (NC or Category 2).
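The two sets of calculations above differ only in where the line between "positive" and "negative" is drawn. The sketch below illustrates this with hypothetical names (`to_binary`, `predictive_capacity`) and hypothetical label strings ("NC", "2", "1", "2/1", "CNM"). It assumes that CNM outcomes are left out of both analyses and that Category 2/1 predictions are left out of the Category 1 analysis, in line with the statement that such predictions should only be used for NC versus the rest; the toy data are invented for illustration and are not study results.

```python
from typing import Dict, Iterable, Optional, Tuple


def to_binary(label: str, split: str) -> Optional[bool]:
    """Return True (positive), False (negative), or None (not evaluable)."""
    if label == "CNM":
        return None  # assumption: no usable result, excluded from both analyses
    if split == "nc_vs_rest":
        return label != "NC"  # Category 2, 1, and 2/1 all count as positive
    if split == "cat1_vs_rest":
        if label == "2/1":
            return None  # assumption: Category 2/1 cannot be assigned to a side
        return label == "1"
    raise ValueError(f"unknown split: {split}")


def predictive_capacity(pairs: Iterable[Tuple[str, str]], split: str) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, FN rate, and FP rate.

    `pairs` holds (in vivo class, OS prediction) tuples. The sketch assumes
    at least one evaluable positive and one evaluable negative.
    """
    tp = fp = tn = fn = 0
    for in_vivo, predicted in pairs:
        truth = to_binary(in_vivo, split)
        call = to_binary(predicted, split)
        if truth is None or call is None:
            continue
        if truth and call:
            tp += 1
        elif truth and not call:
            fn += 1
        elif not truth and call:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fn_rate": fn / (tp + fn),
        "fp_rate": fp / (tn + fp),
    }


# Invented toy data: (in vivo class, OS prediction).
toy_pairs = [
    ("NC", "NC"), ("NC", "2"), ("NC", "CNM"),
    ("2", "2"), ("2", "1"),
    ("1", "1"), ("1", "2/1"),
]

if __name__ == "__main__":
    print(predictive_capacity(toy_pairs, "nc_vs_rest"))
    print(predictive_capacity(toy_pairs, "cat1_vs_rest"))
```

Keeping the class-to-binary mapping in one function makes the exclusion rules explicit, which matters because the denominators change with them.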

Results
Triplicate results for the OS and updated OS "transferability" and "application domain" studies have previously been published (Choksi et al., 2020; Lebrun et al., 2022b), and the consensus predictions for these triplicate results are provided as Supplemental Data Table S1. In this context, "consensus result" means the majority prediction of the three repeats (either 2/3 or 3/3). Consensus results for the past OS retrospective study (Choksi et al., 2020) are compared in Table 3A with the newly tested individual triplicate results for the same chemicals tested with the updated OS. Tables 3B, 3C, and 3D show the new triplicate results for both OS and updated OS for the "expanded corrosive set" (38 additional GHS Category 1 chemicals), the "surfactant set" (12 surfactant chemicals), and additional chemicals, respectively. All of the compiled consensus results were then used for predictivity analysis. Table 4 shows the in vivo results in the column on the left and the OS and updated OS results in the row at the top. This table allows for a comprehensive comparison of OS (Table 4A) and updated OS (Table 4B). As shown in Tables 4A and 4B, NC overpredictions decreased for the updated OS compared with OS. For OS, there were 18 FPs and 3 CNM. Of the 18 FPs, 11 were overpredicted as Category 2B/2A, 6 as Category 1, and 1 as Category 2/1. For updated OS, there were 10 FPs and 2 CNM. Of the 10 FPs, 6 were overpredicted as Category 2B/2A, 2 as Category 1, and 2 as Category 2/1. 1,3-Di-isopropylbenzene (99-62-7) was predicted as Category 1 with OS and as Category 2/1 with updated OS.
The following in vivo NC chemicals that were overpredicted by OS as Category 2 were correctly predicted by updated OS as true negatives (TNs): 1,9-decadiene (1647-16-1), 2,2-dimethyl-3-pentanol (3970-62-5), triethylene glycol (112-27-6), 2-(2-ethoxyethoxy)ethanol (111-90-0), n,n-dimethylguanidine sulfate (598-65-2), Tween 80 (9005-65-6), and Tween 20 (9005-64-5). Styrene (100-42-5) was also corrected, from an OS Category 1 prediction to NC with updated OS. 2-Ethoxyethyl methacrylate (2370-63-0) and sodium lauryl sulfate (3%) (151-21-3) were predicted by OS as Category 1 and by updated OS as Category 2. Also note that 1,5-hexadiene (592-42-7) was excluded by OS due to the lack of consensus between the repeats and was predicted as an NC with updated OS.
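The consensus rule used for these results (the majority prediction of the three repeats, with exclusion when the repeats do not agree) can be sketched as follows. The function name `consensus_prediction` and the label strings are hypothetical, and returning `None` for a chemical without a majority is an assumption made here to represent "lack of consensus".

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_prediction(repeats: Sequence[str]) -> Optional[str]:
    """Return the majority prediction of three repeats (2/3 or 3/3).

    Returns None when no prediction is shared by at least two repeats.
    """
    if len(repeats) != 3:
        raise ValueError("this sketch expects exactly three repeats")
    label, count = Counter(repeats).most_common(1)[0]
    return label if count >= 2 else None


if __name__ == "__main__":
    # Invented example triplicates, not study data.
    print(consensus_prediction(["NC", "NC", "2"]))
    print(consensus_prediction(["NC", "2", "1"]))
```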

For both OS and updated OS, there were no Category 2B/2A underpredictions (see Tables 4A and 4B). However, the GHS in vivo Category 2B/2A overpredictions differed between OS and updated OS. For OS, 14 chemicals were predicted as Category 2B/2A, 20 were overpredicted as Category 1, 2 were predicted as Category 2/1, and 1 was CNM. For updated OS, 11 chemicals were predicted as Category 2B/2A, 20 chemicals were overpredicted as Category 1, 3 were predicted as Category 2/1, and 3 were CNM. n,n-Diethyl-m-toluamide (134-62-3) and chlorhexidine digluconate solution (18472-51-0) were predicted as Category 2 with OS and as Category 1 with updated OS. Maneb (solid) (12427-38-2) was predicted as Category 1 with OS and CNM with updated OS; blanks exceeded the acceptance criteria of OD 0.850. Sodium benzoate (532-32-1) was predicted as Category 1 with OS and Category 2/1 with updated OS due to a negative dose-response curve, indicating assay inhibition. Sodium lauroyl sarcosinate (10%) (137-16-6) was predicted as Category 2 with OS and CNM with updated OS; the resulting score for updated OS was under the irritancy cut-off and had a negative dose-response curve.
Tables 5A and 5B, respectively, show the predictive capacity for OS and updated OS for the detection of GHS NC versus the rest. For NC versus the rest, both OS and updated OS have a very high sensitivity (100%) and low FN rate (0%). The FP rate is cut almost in half for updated OS compared to OS (compare Tables 4A and 4B), which results in an improved accuracy for updated OS compared with OS (92.8% vs. 86.8%). Balanced accuracy corrects for the ratios of positives and negatives. The balanced accuracy for OS is 81.3% and the balanced accuracy for updated OS is 89.8%. The superior balanced accuracy of updated OS is best attributed to the improvement (reduction) in the updated OS FP rate compared to OS, since both have a zero FN rate (100% sensitivity).
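As a rough arithmetic check, the balanced accuracies for NC versus the rest can be reproduced from the counts quoted earlier in the Results (51 in vivo NC chemicals; 18 FPs and 3 CNM for OS; 10 FPs and 2 CNM for updated OS), taking balanced accuracy as the mean of sensitivity and specificity. The sketch assumes that CNM results are excluded from the specificity denominator; the function names are hypothetical. Under that assumption the values come out at 0.8125 for OS and about 0.898 for updated OS, in line with the 81.3% and 89.8% reported above.

```python
def specificity_from_counts(n_negatives: int, n_cnm: int, n_fp: int) -> float:
    """Specificity = TN / (TN + FP), with CNM results excluded (assumption)."""
    evaluable = n_negatives - n_cnm
    true_negatives = evaluable - n_fp
    return true_negatives / evaluable


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Counts for in vivo NC chemicals as quoted in the Results text.
os_specificity = specificity_from_counts(n_negatives=51, n_cnm=3, n_fp=18)
updated_specificity = specificity_from_counts(n_negatives=51, n_cnm=2, n_fp=10)

# Sensitivity is 100% for both versions (no false negatives reported).
os_balanced = balanced_accuracy(1.0, os_specificity)
updated_balanced = balanced_accuracy(1.0, updated_specificity)

if __name__ == "__main__":
    print(f"OS: specificity={os_specificity:.4f}, balanced={os_balanced:.4f}")
    print(f"Updated OS: specificity={updated_specificity:.4f}, balanced={updated_balanced:.4f}")
```

The same counts give FP rates of 18/48 for OS and 10/49 for updated OS, which is the reduction the text describes as almost half.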
Tables 5C and 5D show the predictive capacity for OS and updated OS, respectively, for the detection of GHS Category 1 versus the rest. For Category 1 versus the rest, the sensitivity was improved for updated OS compared with OS (91.7% vs. 82.2%, respectively). Also, the specificity was better for updated OS compared with OS (71.8% vs. 67.9%, respectively). These improvements resulted in a better accuracy of 79.4% for updated OS compared with an accuracy of 73.0% for the older version of OS.

Discussion
In this study, we assessed the ability of the original and updated versions of the OS test to predict the ocular irritation potential of 59 ocular "corrosives" (GHS Category 1), 37 ocular "irritants" (GHS Category 2), and 51 not classified as ocular irritant chemicals (GHS NC). This set of chemicals included surfactants representative of all GHS levels of classification.
Remarkably, for all of the OS and updated OS studies, there has not been a consensus GHS NC versus irritant FN. This finding is in contrast to other nonanimal ocular irritation test predictive capacities for GHS NC, including the BCOP Laser-Light Based Opacitometer, ICE, EpiOcular™ Eye Irritation Test (EIT), Ocular Irritection® (OI), and STE eye irritation tests (Lebrun et al., 2021a). As shown in Table 6A, other OECD-accepted nonanimal EITs have sensitivities that range from 88% to 97%. While the OECD Test Guideline 437 lists the BCOP OP-KIT as having a 100% sensitivity, in more recent studies, the BCOP (OP-KIT) was found to have a sensitivity closer to 88.0%-93.3% (OECD, 2020a; Lebrun et al., 2021a).
A goal of this study was to determine the OS and updated OS predictive capacity for the detection of extreme/corrosive chemicals (GHS Category 1) and thereby expand the application domain to this important class of chemicals. Table 5C shows an accuracy of 73.0% for the original OS test, and Table 5D shows an accuracy of 79.4% for the updated OS. Table 6B compares these values with those of other EITs that have OECD test guidelines. Compared with these other tests, updated OS has the highest sensitivity and balanced accuracy.
As shown in Table 7, the 8.3% GHS Category 1 underprediction rate for the updated OS was for GHS Category 2; no GHS Category 1 chemicals were underpredicted as UN GHS NC. Also shown in Table 7, a high proportion of UN GHS Category 2 chemicals are overpredicted as UN GHS Category 1. Therefore, while "not causing eye damage (Not UN GHS Category 1)" predictions have high sensitivity and can be accepted without further testing, UN GHS Category 1, CNM, and Category 2/1 predictions would be subsequently tested using other adequately validated in vitro test(s). The current "state of the art" is to use multiple tests in a tiered-testing approach using a "bottom-up" or "top-down" series (Scott et al., 2010; Alépée et al., 2019a, 2019b; OECD, 2019). We propose that OS is better suited for a "bottom-up" GHS Category 1 strategy, which may find utility in cases where only a single test to rule out the potential for eye damage (GHS Category 1) can be conducted. Since the "top-down strategy" employs multiple tests with low sensitivity but high specificity to sequentially identify Category 1 (Scott et al., 2010; Alépée et al., 2019a, 2019b; OECD, 2019), in some cases, the requirement to conduct multiple tests may be viewed as complicated and expensive. In addition, there are situations where a quick answer about whether a material causes eye irritation or damage would be useful. Since OS is shelf stable and ready to use, and results can be obtained in <24 h, it fits well for cases when a single, sensitive test with a fast turn-around time can be used to identify chemicals that do not "irritate" (GHS NC) or do not damage (not GHS Category 1) the eye.
In summary, the OS application domain was expanded to include surfactants and ocular corrosives (GHS Category 1), and these studies confirm and extend our previous findings, showing that the antioxidant, ascorbic acid, improves the accuracy of the OS test by reducing the FP rate. The OS test has a high sensitivity and accuracy for the identification of materials that should be classified as GHS NC and chemicals that do not damage the eye (not Category 1). Based on these outcomes, the best use of OS within a tiered testing strategy may be: 1) to identify chemicals that do not induce serious eye damage (not UN GHS Category 1), that is, chemicals not to be classified as UN GHS Category 1 without further testing; and 2) to identify chemicals predicted to be UN GHS NC, that is, predicted not to cause eye irritation/serious eye damage without further testing; however, chemicals predicted to be UN GHS Category 1 would require additional information and/or testing to establish a definitive UN GHS Category classification.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Funding
Stewart Lebrun reports financial support was provided by National Institute of Environmental Health Sciences. Stewart Lebrun reports a relationship with Lebrun Labs that includes: board membership, employment, equity or stocks, and funding grants. Stewart Lebrun has patent # Biochemistry Based Ocular Toxicity Assay, Publication number: 20160290982 issued to Lebrun Labs LLC. Stewart Lebrun, Sara Chavez, Roxanne Chan has patent #Methods and Reagents to Improve the Specificity, Sensitivity and Accuracy of Nonanimal Eye Safety Tests, patent application number 63048112 pending to Lebrun Labs LLC. Supported by a National Institute of Environmental Health Sciences (NIEHS) Small Business Innovative Research Grant. Research reported in this publication was supported by the NIEHS of the National Institutes of Health under Award Number R44ES025501. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Research reported in this publication was supported by the NIEHS of the National Institutes of Health Small Business Innovative Research grant under award numbers R44ES025501 and SB1ES025501. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Data availability
Data will be made available on request.