Visualizing topical drug uptake with conventional fluorescence microscopy and deep learning.

Mapping the uptake of topical drugs and quantifying dermal pharmacokinetics (PK) presents numerous challenges. Though high resolution and high precision methods such as mass spectrometry offer the means to quantify drug concentration in tissue, these tools are complex and often expensive, limiting their use in routine experiments. For the many topical drugs that are naturally fluorescent, tracking fluorescence emission can be a means to gather critical PK parameters. However, skin autofluorescence can often overwhelm drug fluorescence signatures. Here we demonstrate the combination of standard epi-fluorescence imaging with deep learning for the visualization and quantification of fluorescent drugs in human skin. By training a U-Net convolutional neural network on a dataset of annotated images, drug uptake from both high "infinite" dose and daily clinical dose regimens can be measured and quantified. This approach has the potential to simplify routine topical product development in the laboratory.


Introduction
The development of topically applied drug products presents numerous challenges, which include the identification of an optimal active pharmaceutical ingredient (API), the formulation of the API for delivery through the skin barrier, assessment of the percutaneous pharmacokinetics (PK) of the API, its engagement with the target, and assessment of the skin's pharmacodynamic response [1]. The API within a topically applied drug product must first escape the formulation, then permeate through the skin barrier, and finally flow through the layers of the skin to reach its target. There are multiple pathways to enter the skin, including through the tortuous lipid pathway across the outermost layer of skin, the stratum corneum. There are also shunt routes, where a topical API can travel through the follicle or sweat gland to reach deeper skin structures [2].
The current workflow in clinical studies involves either the application of a given topically formulated drug to a patient, followed by biopsy, or more commonly, application of a topical product to ex vivo skin. Assessment of API uptake into the skin is typically carried out using chromatographic methods, where a skin tissue biopsy specimen is homogenized and processed using an analytical technique such as LC-MS/MS. This tool offers the ability to precisely quantify drug levels in tissue, but the method of bulk processing eliminates the ability to determine drug uptake at the cellular level. The derived percutaneous PK profiles are then assessed via algorithms and software models to compute the kinetics of drug absorption, diffusion, metabolism, and elimination (ADME) [3-5]. It is preferable to image drug uptake into skin to directly assess uptake pathways and mechanisms. Mass spectrometry imaging (MSI) tools such as Matrix Assisted Laser Desorption Ionization (MALDI), Secondary Ion Mass Spectrometry (SIMS) and Desorption Electrospray Ionization (DESI) offer the ability to quantify drug uptake within biopsy tissue slices at close to cellular resolution [6]. These methods, however, can be rather costly, which makes them prohibitive for widespread use and restricts them to larger pharmaceutical ventures. Autoradiography, in particular Microautoradiography (MARG), offers the ability to examine cellular and even subcellular drug uptake, but the use of radiolabels largely restricts the use of this method to preclinical studies [7,8].
Advanced fluorescence and vibrational microscopy tools have shown considerable promise in measuring the uptake and localization of topical APIs by their native, endogenous signatures. Both fluorescent and non-fluorescent drugs have been visualized and quantified using multiphoton and coherent Raman imaging methods. Fluorescence lifetime microscopy, for example, has found use in quantifying topically applied drugs within human skin [9,10]. In the case of coherent Raman imaging, the PK of drugs applied to skin can be directly imaged and quantified over time to estimate PK parameters [11,12]. These tools provide a degree of depth resolution, allowing imaging of drugs within skin non-invasively. Despite these advances, these tools will likely remain expensive, somewhat rare, and require trained, experienced individuals to both operate and analyze the resulting data. It would be a significant advantage if native, unaltered drugs could be visualized and quantified by simple and readily available imaging tools.
There has been considerable interest in the use of computational methods that make use of machine learning to glean information from imaging data. Machine learning, and related deep learning approaches, use neural networks of different architectures to carry out tasks such as categorization and quantification. Neural networks can be initially trained via cross validation using an annotated dataset. For example, a neural network can be trained to locate certain features within an image-based dataset. These trained networks are then provided novel data and used to infer similar information content.
When used in medicine, machine learning approaches have demonstrated the capability of providing diagnostic information. For example, recent applications of machine and deep learning to CT and MRI data have performed diagnosis and detection of numerous conditions at rates comparable to those of trained radiologists [13]. IBM's Watson initiative, although not as successful as hoped, is a strong example of the application of machine learning tools to patient data for the purposes of diagnostic medicine. Of particular interest has been the application of machine learning methods to augment what would be considered simple imaging methods. For example, machine learning methods applied to straightforward mammography scans have shown promise in improving diagnostic accuracy [14]. Machine learning approaches have recently been applied for the quantification of topically applied therapeutics, where microscopy data is interpreted to understand the rates and routes of drug uptake in skin [12]. These investigations indicate the potential for machine learning techniques to extract valuable information from widespread imaging methods, to enhance and augment our understanding of disease diagnosis and physiological processes.
Acne vulgaris is a widespread dermatological condition that affects tens of millions of individuals each year. Outside of the immediate pain, discomfort, and effects on one's appearance, chronic acne can result in physical scarring of the face that can last a lifetime, as well as psychological effects that are felt for years. The leading pharmacological treatments for acne are the antibiotic minocycline and retinoids such as isotretinoin. These APIs are currently administered systemically via consumption of multiple pills, even though their targets are within the skin. Systemic administration of minocycline can cause a number of unwanted side effects including dizziness, nausea, photosensitivity, and the appearance of dark lesions on the skin. Systemic administration of retinoids is far more problematic, with side effects ranging from severe dry skin and mucous membranes to birth defects that can occur when women on retinoid therapy become pregnant. There is considerable interest in developing these drugs for topical administration to the site of need, which would avoid systemic side effects and likely require lower total drug doses. Interestingly, both minocycline and the retinoid tazarotene are naturally fluorescent, such that they can be detected via their emission properties. Recent studies exploring this potential have made use of complex fluorescence lifetime imaging approaches, which, while highly effective, require custom hardware and analysis methods that are currently both expensive and not widespread [9,10]. The purpose of this study was to explore whether commonly available and inexpensive equipment could be paired with machine learning for the assessment of topical drug products using currently employed drug discovery and drug development workflows. Ex vivo skin and biopsies are used in the vast majority of today's topical product studies.
Fluorescence microscopes are ubiquitous and relatively inexpensive, and are therefore accessible for drug development in universities, startups, small and medium sized companies, and large pharmaceutical firms alike. There are quite a number of commonly used topical products that are fluorescent, including many retinoids, antibiotics such as minocycline and doxycycline, sunfilters such as octocrylene [15], as well as numerous natural product and cosmetic actives. Therefore, while fluorescence is not a universal source of contrast, assessment of topical product uptake via fluorescence could enable the imaging and quantification of compounds that affect the lives of hundreds of millions of individuals.
Here we show that conventional epi-fluorescence microscopy images can be combined with U-Net convolutional neural networks to identify and quantify the uptake of naturally fluorescent drugs such as minocycline and tazarotene in the skin, even in the presence of normally interfering skin autofluorescence. A key component of this approach was the addition of a gentle photobleaching step, where the different bleaching rates of two different drugs and skin autofluorescence were leveraged to generate a difference image for neural network training and quantification. Importantly, while the U-Net was trained with high product dose images, it was able to identify and quantify the uptake of low concentrations of the same drugs, enabling simple quantification of a topical drug delivery at a single, daily dose.

Preparation of facial skin
Human facial skin was obtained from a patient undergoing elective facelift surgery and stored at -80°C. Tissue samples were thawed to room temperature and cut into small portions (1-2 cm²) for the experiment. APIs were delivered in either the proprietary BPX-05 formulation or as part of a commercially available pharmaceutical product formulation. The skin samples were then topically treated with one of the following treatments: 1) 0.05% tazarotene formulated in the BPX-05 vehicle, 2) commercially available Tazorac, 0.05% tazarotene, 3) 1% minocycline formulated in BPX-05 vehicle, 4) 0.05% tazarotene and 1% minocycline both formulated in BPX-05 vehicle, 5) BPX-05 vehicle alone and 6) no treatment. The 0.05% tazarotene formulation in BPX-05 was selected to match that of the commercial Tazorac product, while the 1% minocycline formulation was selected to match previous publications [9,10]. This study examined APIs delivered at two different doses: a high "infinite" dose, and a low dose that represents daily, clinical application. The high-dose groups were treated with 60 mg/cm² of formulation, while the low-dose (daily dose) groups were treated with 2.5 mg/cm² of formulation. In each case, formulations were rubbed onto the skin surface. For tissues receiving the high dose, a ring barrier was used to contain the formulation on the surface of the skin. The ring was not required for the low product dose application as the volume of topical drug product was considerably smaller. Once applied to the skin, the formulations remained on skin samples for 1, 4, or 24 hours, during which the skin samples were incubated on a damp gauze pad at 32°C.
Following the incubation period, skin samples were gently wiped with isopropyl alcohol to remove any residual formulation and then trimmed. Skin samples were then embedded within Optimal Cutting Temperature compound and frozen within plastic cassettes at -80°C. These frozen tissues were then cross-sectioned perpendicular to the skin surface at a thickness of 30 µm using a cryostat (Leica CM1850 UV, Buffalo Grove, IL). Skin tissue sections were mounted on microscope slides for imaging. Two to three fields of view were acquired for each experimental condition.

Fluorescence microscopy
An epi-fluorescence microscope was used to acquire fluorescence emission images of the skin samples. A microscope base (Zeiss Axiovert 100M, Pleasanton, CA) was outfitted with an RGB CCD camera (Leica DFC7000 T, Buffalo Grove, IL) and a halogen lamp for image acquisition. A Zeiss Plan-Neofluar 10x, NA=0.3 objective (product # 440330) was used to acquire all images. Tissue samples were placed on a microscope slide and were not coverslipped. The slide was then placed on the inverted microscope, with images acquired through the microscope slide. Individual fields of view (FOVs) were selected randomly following these rules: (1) Images were not acquired at the edges of the tissue sections to avoid drug crosstalk/contamination from any surface residue into the dermis. This was done out of an abundance of caution, even though the skin was always cleaned prior to trimming/biopsying; (2) Areas with evident sectioning artifacts such as tears in the section, and tissue folds exhibiting greater fluorescence due to increased thickness were avoided; and (3) Fields of view that featured multiple anatomical features, including the hair follicle, sebaceous glands and/or hypodermis, the dermis, and the epidermis were preferred. This preference arose as these regions have been found in prior studies to provide PK information that included transepidermal and transfollicular delivery.
To acquire tazarotene emission images, the "TZF" cube was used with an exciter filter at 340 nm, 26 nm FWHM bandwidth and an emission filter at 492 nm, 10 nm FWHM bandwidth. To acquire minocycline emission images, an "MNF" cube was used with an exciter filter at 386 nm, 27 nm FWHM bandwidth and a dual bandpass emission filter with passbands at 450-480 nm and 610-680 nm (FWHM). Images acquired with the MNF cube featured tazarotene fluorescence crosstalk in the blue channel, while minocycline fluorescence typically dominated in the red channel over other endogenous fluorescence. For high-dose samples, tazarotene images were collected with an integration time of 1.5 s and minocycline images were collected with an integration time of 50 ms. For low-dose samples, tazarotene images were collected with an integration time of 10 s, while minocycline images were still collected with an integration time of 50 ms.
Following the collection of tazarotene and minocycline images, the MNF cube remained selected, and skin slices were exposed to the UV excitation for a period of 5 minutes. This bleaching step was employed as tazarotene, collagen, and minocycline all bleach at different rates under UV excitation, and as such, this step imposed a chemically-specific emission change encoding all three emitters. Following the 5 minute bleaching step, minocycline MNF images were once again acquired as described above.
Importantly, a strict file naming convention was adopted for these experiments, where the information regarding the formulation, the API, the filter, the treatment duration, the bleaching step, and the image number were explicitly written. This enabled automated parsing of the filenames and easy computational comparison between groups.
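As an illustration of how such a naming convention enables automated parsing, the following minimal sketch extracts the six fields from a filename. The exact naming scheme is not given in the text, so the underscore-separated pattern, the field labels, and the example filename are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: formulation_API_filter_duration_bleachstate_imagenumber.tif
# The naming scheme actually used in the study is not stated; this is an assumption.
FILENAME_PATTERN = re.compile(
    r"^(?P<formulation>[A-Za-z0-9-]+)_"
    r"(?P<api>[A-Za-z]+)_"
    r"(?P<filter>TZF|MNF)_"
    r"(?P<hours>\d+)h_"
    r"(?P<bleach>pre|post)_"
    r"(?P<image>\d+)\.tif$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the experimental fields encoded in a filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields so that they can be sorted and compared.
    fields["hours"] = int(fields["hours"])
    fields["image"] = int(fields["image"])
    return fields
```

Records parsed in this way can be collected into a table and grouped by formulation, API, or incubation time; returning None for non-matching names lets stray files be skipped or flagged.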

Deep learning
Image feature extraction was performed via deep learning using the U-Net convolutional neural network architecture [16]. The U-Net architecture was selected as it was developed specifically for image segmentation while at the same time requiring only moderately sized training datasets. The U-Net structure was originally based on a model developed to process electron microscopy data [17]. In this study, the U-Net itself was developed to input 9 channel image data: 3 RGB channels each for 1) the tazarotene TZF filter image, 2) the minocycline MNF filter image, and 3) the tazarotene-bleached MNF filter image. These three images were combined into a single array and fed into the U-Net as a single input. Image annotation was provided by an imaging expert in this study, who reviewed a training set of high-dose acquired images and provided binary annotated images for tazarotene, collagen, and minocycline features. Annotations were made where the single human expert could unambiguously determine the identity of each emitter. Annotations were made with the aid of ImageJ (FIJI) as follows: For the TZF filter images, the RGB channels were separated to isolate the blue channel, which contained tazarotene and collagen fluorescence. Tazarotene and collagen regions were then manually labeled. For the MNF filter images, the RGB channels were separated to retrieve the blue and red channels. The blue channel contained tazarotene, minocycline, and collagen fluorescence, while the red channel contained mainly minocycline fluorescence. Some red autofluorescence could occasionally be noted in the red channel and was separately marked for later study. Manual labels for tazarotene, minocycline, and collagen were then made using the blue and red channel images.
This image and annotation set was then augmented using a standard image augmentation approach [17]. Briefly, image-annotation data pairs (1920x1440x9 images and 1920 x 1440 x 3 annotations) were randomly cropped into 512 x 512 lateral pixel dimension image-annotation data sets (9 input and 3 output channels, respectively). This random cropping was carried out such that the trained network is robust to image feature translation/position. These randomly cropped datasets were then used to generate a set of training images. This process generated the training dataset used to train the U-Net. It is worth noting that although two image datasets were used to train for each condition, the U-Net only operated on 512x512 image segments at a time due to memory constraints; therefore, each image data set was functionally the equivalent of approximately 10 512x512 cropped data sets.
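The channel stacking and random cropping described above can be sketched in NumPy as follows. This is a minimal illustration rather than the study's code: the function names are hypothetical, and arrays are assumed to be stored as height x width x channels.

```python
import numpy as np


def stack_inputs(tzf, mnf, mnf_bleached):
    """Concatenate three H x W x 3 RGB images into one H x W x 9 input array."""
    return np.concatenate([tzf, mnf, mnf_bleached], axis=-1)


def random_crop_pair(image, annotation, size=512, rng=None):
    """Cut the same randomly placed size x size window from an image and its annotation."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    # The upper bound of rng.integers is exclusive, so the window always fits.
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return image[window], annotation[window]
```

Applying one window to both arrays keeps every annotated pixel aligned with its input pixel, which is what allows the cropped pairs to be used directly for training.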
U-Nets were implemented in TensorFlow 2.1 using the Keras library in Python [18]. U-Net training was carried out on a Linux workstation (System 76) outfitted with NVIDIA Quadro P6000 and NVIDIA TITAN RTX GPUs. During training, 10% of the training images were randomly held back as a per-epoch validation set. The val_loss parameter was tracked during each epoch, and the U-Net weights were stored only when the val_loss parameter improved. To improve the chances that the gradient descent algorithm reached a local minimum, the learning rate was automatically adjusted using "reduce_lr" to smaller values during training. Early stopping was included in the training, with an early stopping patience threshold of 25 repeated epochs without val_loss improvement. The U-Net reached greater than 90% accuracy on the per-epoch validation sets and never required more than 40 epochs to reach convergence.
The U-Net was then used to generate output probability images. Feeding the input training images (1920x1440x9 channel sets) into the trained U-Net generates a probability image (1920x1440x3 channels). The intensity within each channel (tazarotene, minocycline, and collagen) scales from zero to one, with pixels close to one being at the highest probability of belonging to the feature class. The training images were all fed back into the U-Net, with the output compared to the expert annotations to determine channel-specific thresholds. While this could have been done within the neural network with a softmax layer, manual exploration was found helpful in setting the optimal identification threshold levels.
The resulting U-Net and thresholds were then applied to a holdout dataset, with the results described in the results section below.
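One way to picture the threshold selection is a simple search over candidate thresholds that keeps the one giving the fewest pixels in disagreement with the expert annotation. The sketch below assumes a grid search over evenly spaced candidates; the text does not state how the search was actually performed, and the function names are hypothetical.

```python
import numpy as np


def mismatch_count(probability, annotation, threshold):
    """Count pixels where the thresholded probability image disagrees with the annotation."""
    mask = np.asarray(probability) >= threshold
    expert = np.asarray(annotation, dtype=bool)
    return int(np.count_nonzero(np.logical_xor(mask, expert)))


def best_threshold(probability, annotation, candidates=None):
    """Return the candidate threshold with the fewest mismatched pixels."""
    if candidates is None:
        # Assumed search grid; the study's actual search procedure is not given.
        candidates = np.linspace(0.05, 0.95, 19)
    candidates = np.asarray(candidates, dtype=float)
    counts = [mismatch_count(probability, annotation, t) for t in candidates]
    return float(candidates[int(np.argmin(counts))])
```

The function would be called once per output channel (tazarotene, minocycline, collagen), since each channel is assigned its own threshold.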

Analysis pipeline
To streamline image processing, a JupyterLab-based workflow was created. A set of Python libraries was written to facilitate automated file parsing and data loading. The Python pandas library was used to package all data into dataframes that could be readily queried. Numerical processing was carried out using NumPy, while image plotting was achieved using the Matplotlib library. Figures were prepared using the ggplot2 package in R. Statistical calculations were carried out in R using both Wilcoxon Rank Sum tests and linear regression analysis.

U-Net training and validation
Following U-Net training, a script was written that parsed through a data folder to build a matched data frame containing the path of 1) the TZF image, 2) the MNF image, and 3) the tazarotene-bleached MNF image. This dataframe was then used to iteratively run inference on each image set with the trained U-Net. The output 3-channel probability image was then stored in a numpy npz file. A Jupyterlab notebook was then used to compare the probability images to the original expert annotations to determine the optimal binary thresholds for each inferred element (e.g. tazarotene) to generate binary masks for later quantification. For this step, the U-Net was run against the same images used in training so that the threshold level settings were consistent with the training set. Figure 1 shows representative images of the fluorescence images, the expert annotations, and U-Net generated, thresholded probability annotations for tazarotene, minocycline, and collagen. The threshold level was iteratively calculated to minimize the mismatch between the expert annotation and U-Net calculated annotations. The determined threshold was tested by performing a bitwise exclusive OR (XOR) across all imaging pixels. The trained U-Net and determined thresholds were then applied to a new, holdout high-dose dataset unused for training. The U-Net method was found to identify the same expert-annotated image features with high overall accuracy. Here, accuracy was computed by comparing the expert annotated images with the thresholded CNN-generated probability image. Here, image accuracy is defined as the number of correctly determined pixels as a fraction of the total pixels in an image. This accuracy value was computed by calculating the pixel-wise XOR between pairs of U-Net and annotation images which by definition includes both false positives and false negatives. 
The number of pixels in disagreement was then divided by the total number of pixels in the image and subtracted from one to calculate the total correct fraction. The accuracy of the U-Net method can be seen in Fig. 2. The accuracy of minocycline identification was excellent, with a median of 99.7%. Similarly, the identification of skin regions containing tazarotene was extremely good, with a median accuracy of 99.8%. Interestingly, the accuracy in determining collagen features via the intrinsic collagen autofluorescence was somewhat lower, with a median of 95.1% over all image sets. Exploration of the imaging dataset revealed several tissue features that impacted the accuracy of the U-Net method. In the case of collagen, it was observed that images acquired at the 24 hour drug application mark showed overall lower accuracies. The dependence of accuracy on incubation can be seen in Figures S1 and S2 for minocycline and tazarotene, respectively. This may be attributed to a swelling of the tissue that occurred due to long-term exposure to the drug formulation. Indeed, the collagen network in 24 hour incubation images shows swelling, which likely impacts feature recognition. This swelling is thought to be an effect of long-term high dose exposure to the BPX-05 vehicle; such an effect was not observed in the untreated control tissues. In the case of minocycline, while the vast majority of the images displayed high accuracy, several had total accuracies lower than 90%. In these images, red-emitting tissue autofluorescence was observed that was typically associated with the presence of a follicle. This source of autofluorescence is thought to arise from the porphyrins synthesized by bacteria, which include P. acnes [19]. As the overall contribution of this autofluorescence was found to be minor, the U-Net approach was found acceptable for next-step quantification of drug uptake fluorescence.
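The accuracy metric defined in this section, one minus the fraction of pixels that differ between the U-Net mask and the expert mask, can be written compactly. The sketch below is illustrative, and the function name is hypothetical.

```python
import numpy as np


def pixel_accuracy(predicted_mask, expert_mask):
    """Fraction of pixels on which two binary masks agree (1 - XOR fraction)."""
    predicted = np.asarray(predicted_mask, dtype=bool)
    expert = np.asarray(expert_mask, dtype=bool)
    if predicted.shape != expert.shape:
        raise ValueError("masks must have the same shape")
    # XOR is true wherever the masks differ, covering false positives and false negatives.
    disagreements = np.count_nonzero(np.logical_xor(predicted, expert))
    return 1.0 - disagreements / predicted.size
```

Because true negatives count as correct pixels, this measure can be high for features that occupy only a small part of the image, which is worth keeping in mind when reading the median values reported above.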

Quantification of high dose fluorescence emission
Quantification of each drug's fluorescence emission can provide an approximation of drug concentration within tissue. Fluorescence signal quantification is almost never an exact measure of concentration, as there are numerous factors that can influence fluorescence, including pH, solvation environment, the presence of mono-and divalent cations, and energy and electron transfer to other molecules. However, in many circumstances [9], fluorescence can be a sound approximation of differences in relative uptake, especially when fluorescence is quantified in similar tissue samples treated using similar drugs and vehicles. As each drug or drug combination was applied to the same areas of skin in the same vehicle, fluorescence quantification via microscopy here provides a good estimate of relative drug uptake.
To quantify drug concentrations in tissue, the binary masks generated by the U-Net and by the expert annotator were both employed. By carrying out this measurement using both the expert annotation and the U-Net, it is possible to explore potential errors in drug uptake measurements made by the neural network. Binary masks from each image set were used to quantify drug fluorescence in two complementary ways. In the first approach, the individual masks were directly quantified by summing the total number of "true" pixels. This provided an estimate of the drug uptake area independent of fluorescence levels. In the second approach, the binary masks were applied to each image to generate a fluorescence "filtered" image that was then summed to extract a measure of total fluorescence contribution. Figures 3(A) and (B) show a comparison between minocycline and tazarotene uptake areas, respectively, determined by the expert annotator and the trained U-Net. To compare the performance of the U-Net against expert annotation, Wilcoxon Rank Sum tests were run. Here the goal of the test was to determine if there were statistically significant differences between the total drug uptake areas as determined by the U-Net and expert annotator. Comparing the annotated and U-Net outcomes (Table 1) provides two major findings: 1) the performance of the U-Net in matching tazarotene annotations is overall better than that of minocycline and 2) expert annotations and U-Net outcomes are similar (no statistically significant difference found) in cases where the U-Net's API detection (e.g. minocycline) matched the given experimental condition (e.g. Mnc). The former of these findings is thought to stem from the presence of autofluorescence within tissue, which causes the U-Net to falsely identify the presence of minocycline.
In the case of minocycline, red autofluorescence was found within the hair shafts that was incorrectly identified as the drug; this autofluorescence is hypothesized to arise from P. acnes. When minocycline is present, as in the "Mino" and "Combo" cases, no statistically significant difference is observed between annotation and U-Net, as the drug fluorescence contribution is far greater than any background autofluorescence. The cases where the U-Net results differ significantly from those of the annotation (p < 0.05) only occur when the U-Net tested for an API that was never applied to the skin.
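The two mask-based measures described in this section, uptake area and total masked fluorescence, can be sketched as follows. Which image channel was summed for each drug is not stated in the text, so the single-channel input here is an assumption, and the function names are hypothetical.

```python
import numpy as np


def uptake_area(mask):
    """Number of 'true' pixels in a binary mask, an intensity-independent area estimate."""
    return int(np.count_nonzero(mask))


def masked_fluorescence(image_channel, mask):
    """Sum of fluorescence intensity over the pixels selected by the mask."""
    selected = np.asarray(mask, dtype=bool)
    # Pixels outside the mask contribute zero to the total.
    return float(np.sum(np.where(selected, image_channel, 0)))
```

Running both functions with the expert mask and then with the U-Net mask for the same image gives the paired values that are compared between the two annotation sources.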

Comparison of Annotation and U-Net API Areas
The 24 hour incubation conditions were also noted to impart a structural change within the tissue, where skin and collagen network were observed to take on a swollen appearance. This situation could arise from several factors, including changes in tissue hydration as well as any long-term effects of incubation with high-doses of the vehicle. This swelling effect was observed for all treatment conditions except the no-treatment arm. It should be noted that this morphology change is highly unlikely to occur in patients, as no individual would experience this high dose nor constant 24 hour exposure.
Using the identified areas as masks, the total fluorescence intensity of each drug in each image was calculated and is plotted in Fig. 4. Direct pairwise comparisons of the results plotted in Fig. 4 (e.g. Mnc 1 hour incubation vs Mnc 24 hour incubation) were found to be not statistically significant, due primarily to the sparse sampling of the tissue specimens in this proof-of-concept study. To understand the dependence of the U-Net determined drug uptake on experimental condition and incubation time, a linear regression model was used. In the regression equation, uptake was modeled as dependent on both the experimental condition (e.g. Combo) and the incubation time (e.g. 1 hour). Linear regression, and not more complex random effects models, could be used in this case as each incubation time measurement was carried out independently on different pieces of tissue. Minocycline delivered in the BPX-05 and combination formulations showed incubation time-dependent uptake, consistent with prior findings using the similar BPX-01 delivery vehicle [9]. Interestingly, it was observed that minocycline has considerably lower dermal area uptake when delivered as part of the combination formulation. In comparing minocycline treated groups quantified for minocycline (Combo and Mnc groups) via linear regression, both experimental condition (p = 0.015) and incubation time (p = 0.0007) were found to be statistically significant, supporting the observed decreased dermal uptake of minocycline in the combination formulation, as well as the observed dose-dependent uptake for both treatment groups. This is not necessarily a surprise, as both tazarotene and minocycline are lipophilic and enter the skin via the trans-follicular route [10].
Examining the tazarotene uptake plot, it is seen that tazarotene in the BPX-05 vehicle has greater uptake than that of the commercial Tazorac formulation. Linear regression analysis found that there was a strong statistical difference between these two groups (p = 0.0003). Careful inspection of the imaging data explains this result, as the Tazorac formulation appears to deliver the majority of tazarotene only slightly into the epidermis in this study. This can be understood as the Tazorac used in this study was in an aqueous gel formulation; tazarotene is not well solubilized in the aqueous phase. In contrast, tazarotene was fully solubilized in the BPX-05 formulation containing ethanol and an amphiphilic base which is miscible in both lipophilic and hydrophilic environments, and could more efficiently deliver the API into the skin.

Applicability to daily, low-dose application
The major challenge in quantifying fluorescent compounds in the skin occurs not in high-dose treatment scenarios, but when trying to visualize and quantify the uptake of the drugs under realistic, low, daily dose conditions. Quantification of daily-dose API uptake is paramount, as that is the true product dose at which the drug is clinically delivered. There are possibly linear and non-linear PK scenarios that arise as the delivery product dose increases. These can include permeation pathway saturation, protein/tissue binding saturation, or changes in the elimination rate, which is indicative of flip-flop pharmacokinetics [20]. Therefore, while initially studying percutaneous drug uptake at high product dose levels can be informative, quantification of APIs at such concentrations is not necessarily indicative of actual in vivo performance.
As the delivered concentration of the APIs under daily dose conditions is more than 30-fold lower than under high-dose conditions, background autofluorescence is a dominant contribution to the fluorescence images and can readily overwhelm or interfere with detection of the APIs. Though manual processing can provide an estimate of drug uptake at high dose levels, thresholding methods fail at low, daily doses. Therefore, it is of interest to see if the deep learning models trained on high-dose data might be applied for the accurate quantification of low, single daily dose treatment conditions.
Initial imaging experiments of low-dose treated facial skin samples revealed that, while minocycline could still be visualized using the same microscope acquisition settings as the high-dose case, tazarotene fluorescence levels were too low to obtain high-quality images. To increase image signal-to-noise ratio, TZF filter images were acquired using a 10 second integration time, in contrast to the shorter 1.5 second integration times used in the high dose case. This change is obviously a deviation from the conditions under which the U-Net was trained. Nevertheless, U-Net models were designed to be robust to changes in image quality, suggesting that the U-Net here would still be applicable.
Low, daily-dose skin images were acquired and processed via the U-Net model. As the fluorescence levels of both minocycline and tazarotene were low, accurate expert annotation was not possible. Figure 5 shows representative probability images for minocycline, collagen, and tazarotene as determined by the U-Net. It can be readily observed that the collagen fluorescence, which makes the largest overall fluorescence contribution, is likely mis-interpreted as minocycline and tazarotene in regions of the images. For example, in the minocycline-only treated case, while there are clear minocycline contributions at the skin surface and in adnexal structures, the signature of the collagen network can also be readily observed in the minocycline probability image. This was not observed in either visual inspection or U-Net inference of the high-dose minocycline treatment data, making it unlikely to be a true result. The collagen contribution to the tazarotene images is more severe, which makes sense given that collagen autofluorescence strongly overlaps spectrally with tazarotene fluorescence emission. Still, the U-Net was successful at inferring the presence of tazarotene on and within the skin. It is worth noting that attempting to "normalize" the 10 s tazarotene images via simple division to have the equivalent 1.5 s effective exposure did not significantly alter the U-Net output. This likely arises because the neural network is not interpreting the overall intensity of fluorescence in an image, but rather its "texture" across the RGB channels.
Fig. 5. Representative images of TZF and tazarotene-bleached MNF images (first two columns from the left), their corresponding collagen, minocycline, and tazarotene probabilities, and minocycline and tazarotene thresholded difference images for tissues imaged after 1 hour of incubation at daily dose under each formulation condition. Both the minocycline and tazarotene probability images contain contributions from the collagen autofluorescence due to the low levels of fluorescence emission in the daily-dose treatment. This contribution is effectively suppressed by thresholding the difference of the drug and collagen probability images.
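The exposure "normalization" by simple division mentioned above can be illustrated with a minimal sketch. It assumes a linear detector response and dark-subtracted RGB images, and the function names and the tiny example image are hypothetical. The sketch only shows that uniform scaling leaves the per-pixel balance between colour channels unchanged, which is consistent with, but does not prove, the suggestion that the network responds to cross-channel "texture" rather than absolute intensity.

```python
import numpy as np

def rescale_exposure(image, acquired_s=10.0, target_s=1.5):
    """Scale intensities as if the image had been integrated for target_s seconds.

    Assumes a linear detector response and a dark-subtracted image.
    """
    return np.asarray(image, dtype=float) * (target_s / acquired_s)

def channel_fractions(image, eps=1e-12):
    """Per-pixel fraction of the total intensity carried by each colour channel."""
    image = np.asarray(image, dtype=float)
    total = image.sum(axis=-1, keepdims=True)
    return image / (total + eps)

# Hypothetical 2 x 2 RGB image acquired with a 10 s integration time.
long_exposure = np.array([[[200.0, 100.0, 50.0], [80.0, 40.0, 20.0]],
                          [[10.0, 30.0, 60.0], [90.0, 90.0, 90.0]]])
short_equivalent = rescale_exposure(long_exposure)

# Intensities drop by the exposure ratio, but the channel balance is unchanged.
same_balance = np.allclose(channel_fractions(long_exposure),
                           channel_fractions(short_equivalent))
```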
These results point to the U-Net encountering challenges in rejecting collagen autofluorescence, particularly in situations where drug and collagen co-localize. This is not necessarily a surprise, as the U-Net was trained under different conditions. To provide a means to quantify fluorescence emission, and thus give an estimate of drug uptake within the skin, images were processed to remove regions of collagen co-localization. It is important to note, however, that the U-Net does not return collagen contributions as having high "probability" of being minocycline or tazarotene. Rather, the output of the U-Net, which is a probability image, assigns these regions relatively low probability. In examining these outputs, there appeared to be a bimodal probability distribution, such that a threshold could be used to separate "true" drug signatures from those of the collagen background. In applying this approach, a very conservative method was used so as not to falsely assign any collagen signal as an API. To set this conservative threshold, the drug-free experimental arms were used: the vehicle-only condition and the untreated (UnTx) condition. Both of these experimental conditions contain only collagen and tissue autofluorescence, without the presence of drug.
First, image math was carried out using the U-Net generated probability images, and not binary thresholded probability images, with the goal of retaining the probability information generated by the U-Net. Applying a binary threshold to an image sets each pixel to either zero or one; subtraction between such images discards the underlying probability values. Subtraction of probability images, on the other hand, retains information on the probability weights as determined by the U-Net. A probability threshold was manually and iteratively adjusted for these subtracted probability images for the vehicle-only and untreated data such that the collagen contribution to the API channels was effectively suppressed. In this case, suppressed was defined as less than 5% false positive pixels in a given image.
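The probability subtraction and control-based threshold selection described above can be sketched as follows. This is an illustration only: the study adjusted the threshold manually and iteratively, whereas the sketch substitutes a simple search over a grid of candidate values. The function names, the candidate grid, and the synthetic "vehicle" and "untreated" arrays are assumptions and do not come from the study.

```python
import numpy as np

def difference_image(drug_prob, collagen_prob):
    """Subtract probability maps directly so the U-Net probability weights are retained."""
    return np.asarray(drug_prob, dtype=float) - np.asarray(collagen_prob, dtype=float)

def false_positive_fraction(diff, threshold):
    """Fraction of pixels in a drug-free image that would be labelled as drug."""
    return float(np.mean(diff > threshold))

def calibrate_threshold(control_diffs, max_fp=0.05, candidates=None):
    """Return the lowest candidate threshold keeping every control image below max_fp.

    Stand-in for the manual, iterative adjustment described in the text.
    """
    if candidates is None:
        candidates = np.linspace(-1.0, 1.0, 201)
    for t in candidates:
        if all(false_positive_fraction(d, t) < max_fp for d in control_diffs):
            return float(t)
    raise ValueError("no candidate threshold met the false-positive limit")

# Synthetic drug-free controls (hypothetical values, not study data):
# low drug probability, moderate-to-high collagen probability.
rng = np.random.default_rng(0)
vehicle = difference_image(rng.uniform(0.0, 0.4, (64, 64)),
                           rng.uniform(0.3, 1.0, (64, 64)))
untreated = difference_image(rng.uniform(0.0, 0.4, (64, 64)),
                             rng.uniform(0.3, 1.0, (64, 64)))
threshold = calibrate_threshold([vehicle, untreated])
```

Requiring every control image to satisfy the limit individually mirrors the "in a given image" criterion; pooling pixels across controls would be a looser alternative.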
With this threshold set, all the low-dose U-Net output images were evaluated. To assess the sensitivity of API detection to this threshold, the threshold was altered by 5 to 50% of its value, with the outcome re-evaluated each time. Given the nature of the probability distribution and the thresholding method used, the results showed low sensitivity to small (5-10%) threshold changes, with large effects appearing only once the threshold was adjusted by greater amounts. It is worth noting that these large perturbations to the threshold caused substantial mis-assignment of collagen as API in the vehicle and untreated condition images; these threshold levels would never actually be selected. Based on these results, and the fact that the API determination was not highly sensitive to small threshold changes, the thresholding method was determined to be acceptable for image evaluation. Binary masks were then applied to each image, with the total fluorescence in identified regions summed.
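A minimal sketch of the masked summation and the threshold sensitivity check is given below. The function names, the perturbation steps, and the small example arrays are hypothetical; the sketch assumes that the binary mask is formed by comparing the subtracted probability image against the threshold and that fluorescence is summed over the masked pixels.

```python
import numpy as np

def masked_fluorescence(fluorescence, diff, threshold):
    """Sum fluorescence over pixels whose probability difference exceeds the threshold."""
    mask = np.asarray(diff) > threshold
    return float(np.asarray(fluorescence, dtype=float)[mask].sum())

def threshold_sensitivity(fluorescence, diff, threshold,
                          fractions=(0.05, 0.10, 0.25, 0.50)):
    """Relative change in summed fluorescence when the threshold is lowered or raised."""
    baseline = masked_fluorescence(fluorescence, diff, threshold)
    results = {}
    for f in fractions:
        for sign in (-1, 1):
            shifted = threshold * (1 + sign * f)
            value = masked_fluorescence(fluorescence, diff, shifted)
            # Keys are signed fractional perturbations, e.g. -0.25 means 25% lower.
            results[sign * f] = (value - baseline) / baseline if baseline else float("nan")
    return results

# Hypothetical subtracted probability image and matching fluorescence image.
diff = np.array([[0.9, 0.8, 0.35],
                 [-0.5, -0.4, 0.1]])
fluorescence = np.array([[10.0, 20.0, 5.0],
                         [30.0, 40.0, 50.0]])
sensitivity = threshold_sensitivity(fluorescence, diff, threshold=0.4)
```

In this toy example the summed signal is unchanged for 5-10% perturbations and changes only once the threshold is lowered far enough to admit the borderline pixel, which is the qualitative behaviour expected from a bimodal probability distribution.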
There was large uncertainty, however, in the U-Net's interpretation of API levels when co-localized with collagen. Given that there was no means provided in this study to independently assess API uptake into the dermal collagen network, and given the nonlinear nature of the neural network's performance, it was not entirely clear whether thresholding was enough to determine true API levels in co-localized pixels. Out of an abundance of caution, and not wanting to mis-attribute collagen signal as API concentration, the decision was made to exclude collagen co-localized regions from API quantification. In prior studies, both APIs were found within the epidermis, hair follicle, and sebaceous glands, all regions that are collagen poor [9,10]. It is recognized that excluding collagen-rich regions may lead to a lower estimate of API uptake, but to do otherwise would risk mis-attribution. These results therefore do inform API uptake, with the caveat of underestimating potential uptake into dermal collagen.
Figure 6 shows the results for each of the treatment conditions for both minocycline and tazarotene delivered. The uptake of minocycline into the skin is high for both the minocycline-only and combination treatments, with the combination treatment showing only a weak time-dependent trend in drug fluorescence emission (linear regression, p = 0.0977). Unlike in the high-dose case, here the uptake of minocycline in the minocycline-only and combination formulations is similar in magnitude, with no statistically significant difference between them (p = 0.5053). It is interesting to note that low levels of minocycline are reported by the algorithm for treatments that do not contain the drug. Inspection of the image data suggests that red-emitting skin autofluorescence is likely a contributing factor. This is not at all a surprise, especially given the far lower dose of minocycline delivered in the single daily dose treatment.
One interesting observation is that this autofluorescence contribution seems to depend on treatment time (for example in the case of tazarotene-only treatment), which is thought to potentially arise from the vehicle itself. In related pump-probe absorption experiments (data not shown), the absorption cross section of skin was found to increase solely due to exposure to the vehicle. Given that the vehicle is alcohol-based, it is possible that chromophores are being solvated and potentially diffusing during treatment, leading to an increase in local autofluorescence [21]. Future studies will additionally seek to explore potential changes in skin hydration state throughout the incubation timeframe as another potential contributor.
The uptake of tazarotene follows a similar pattern, with the tazarotene-containing treatment arms of tazarotene-only and combination therapy showing treatment time-dependent uptake (p = 0.0006). Linear regression analysis additionally did not find statistically significant differences between the tazarotene and Tazorac groups. As in the case of minocycline, low levels of autofluorescence contributions can be found in tazarotene-free treatments, with the major exception of what would appear to be high uptake in the minocycline 24 hour treatment arm. Inspection of the imaging data shows that this is clearly caused by high collagen autofluorescence contributions, likely arising from a high local concentration of collagen in the field of view of the collected images. This overall performance is considered outstanding, especially in light of the U-Net being used to infer drug uptake from image data taken under substantially different conditions than the training data.
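The time-dependence tests reported above rely on linear regression of the quantified fluorescence against incubation time. The text does not specify the software or the exact model used, so the following is only a minimal ordinary least-squares sketch: the function name is hypothetical, and the signal values are made-up placeholders rather than study data. Only the 1, 4, and 24 hour time points are taken from the study design. A p-value would follow from comparing the t statistic against a Student's t distribution with n - 2 degrees of freedom, as statistics packages such as SciPy's `scipy.stats.linregress` do.

```python
import numpy as np

def slope_t_statistic(hours, signal):
    """Ordinary least-squares slope of signal versus time, with its t statistic."""
    x = np.asarray(hours, dtype=float)
    y = np.asarray(signal, dtype=float)
    n = x.size
    x_centered = x - x.mean()
    sxx = np.sum(x_centered ** 2)
    slope = np.sum(x_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    stderr = np.sqrt(np.sum(residuals ** 2) / (n - 2) / sxx)
    return slope, intercept, slope / stderr

# Incubation times from the study design; signal values are hypothetical placeholders.
hours = [1, 1, 1, 4, 4, 4, 24, 24, 24]
summed_signal = [1.0, 1.2, 0.9, 1.4, 1.1, 1.5, 2.6, 2.2, 2.9]
slope, intercept, t_stat = slope_t_statistic(hours, summed_signal)
```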
It should be noted that this low-dose drug uptake estimation is based on taking the difference of the probability images, and may not entirely capture the full extent of APIs bound to or co-localizing with collagen. There is both the potential for under-estimation of drug uptake (excluding areas where drug might be) as well as over-estimation (where collagen fluorescence is mis-classified as drug fluorescence).

Discussion
Mapping and quantifying the uptake of topical drugs into skin can be a complicated and expensive prospect. Autoradiography and mass spectrometry imaging tools are costly, with many academic labs and smaller companies unable to afford routine studies. Coherent Raman imaging tools are not yet in routine laboratory use [12]. For fluorescent drugs, there are other options including FLIM, but these systems and the data they produce are still rather complex and challenging for routine use.
This study explored the use of simple epi-fluorescence microscopy combined with machine learning methods for the visualization and quantification of drug uptake within skin. When trained via expert-annotated image data, a U-Net convolutional neural network was able to interpret fluorescence images and identify regions of drug uptake within skin. Though the neural network was trained using the readily attainable high-dose treatment images, it was found capable of mapping and extracting quantitative fluorescence emission from low, daily dose treated tissue samples. This capability is thought to arise from the nature of the U-Net, which was essentially trained to recognize image "textures" across the input channels. This study shows the promise in combining relatively inexpensive and accessible microscopy tools with deep learning approaches for PK studies.
One of the interesting findings was that, while the fluorophores could not be readily spectrally distinguished from autofluorescence, the addition of a bleaching step enabled fluorescent drug mapping and quantification. This is because photobleaching was not uniform between minocycline, tazarotene, and collagen. In this way, bleaching rate became an additional, orthogonal source of contrast that could be harnessed to quantify drug fluorescence. While this study made use of a single bleach step, it is likely that the addition of multiple bleaching images would enable more precise quantification with reduced mis-identification in the future. Imaging with multiple bleaching steps could provide a more precise measure of bleaching rate, which can be encoded via deep learning into an improved model.
This study has a few limitations that are worth discussing. First, the data were collected from excess surgical skin samples that were frozen prior to imaging. The samples were thawed and APIs applied for one, four, and twenty-four hours. To understand in vivo ADME (absorption, distribution, metabolism, and elimination), the APIs would ideally have been applied prior to tissue resection, so that drug uptake could have occurred while the tissue was still perfused and under normal physiology. While many of the PK datasets gathered today are acquired from ex vivo samples, it is appreciated that the loss of cellular viability and changes in tissue activity caused by frozen sample preservation may lead to alterations in drug uptake, for example through changes in tissue drug metabolism. It would be of great interest to study drug uptake and pharmacokinetics in both normal and diseased tissue using this approach, either via sourced ex vivo tissue or through biopsies acquired during the course of clinical studies.
Though this study made use of expert training annotations for both drugs as well as collagen, a complete training set including red emission autofluorescence annotations would likely have been advantageous. While some regions within the skin images could be recognized as red autofluorescence, these locations were sparse in the dataset, as they likely arose from P. acnes and other bacteria residing in the pores and follicles [19]. The low number of images containing red autofluorescence contributions made neural network training for these features challenging and led to poor outcomes. In future studies, a larger training set that was specifically collected to be biased toward these autofluorescence features would likely allow for better identification, and thus separation, of autofluorescence from the drug uptake results.
Lastly, while the total image dataset contained many images, images were acquired only in duplicate or triplicate per condition. While this was adequate for training, the high magnification of the images and the low number of total images meant that the skin was only sparsely sampled. Future studies seeking to more precisely quantify drug uptake will require more replicate images that are purposely acquired across numerous types of skin structures. While it is anticipated that the data will still be heterogeneous, denser sampling will enable improved reliability in the quantified outcomes. The throughput of this approach could be readily enhanced with improved microscopy equipment such as a slide scanner for automated image acquisition. Stitched imaging data would help to address the sparsity problem, though recent studies have found that the drug uptake heterogeneity may remain the same [12]. It would additionally be interesting to cross-validate the results of this type of study with proven tools, including FLIM and MSI. MSI tools may be helpful in determining the accuracy of API uptake within the layers of skin, in particular in the low, daily-dose conditions. Such quantification would likely have to be done on alternate slices, as mass spectrometry techniques are inherently destructive. It would also likely be advantageous to use such sources of drug concentration mapping as additional annotations to provide improved neural network training. This could remove the potential for human expert bias, providing a rigorous and quantitative dataset for training.
It is also worth noting that the approach outlined here could be applicable beyond assessment of APIs and instead be used for the quantification of the formulation's permeation within the skin. Many topical formulations, especially cosmetic formulations, make use of fluorescent molecules that could be imaged and measured using the methods explored here. Other exposures, such as to sunscreens and even environmental toxins, could additionally be explored via a combination of epi-fluorescence microscopy and machine learning methods.
Future studies will focus on applying this approach across patient groups with a greater number of images per subject. It is anticipated that new U-Net models will need to be trained to account for inter-subject differences in skin morphology and fluorescence [22]. There are substantial differences in skin structure between a child, an adult, and an elderly individual, just as there are large differences between healthy and diseased skin. Any efforts in this area will need to accommodate these differences while still being able to separate out the individual fluorescence contributions of drugs.
Another interesting avenue of future work would be to enable spatial quantification of the APIs within skin. Alternate slices of the frozen tissue blocks could be used to gather pairs of unprepared and H&E stained slides. In this way, the measured API levels could be matched to skin structural elements and compared across skin structures and depths. While such work could be carried out with hand-annotated histology slides, neural networks developed for segmenting skin tissue sections could enable more rapid and automated quantification.

Conclusion
Epi-fluorescence microscopy combined with deep learning was able to successfully recognize and quantify the uptake of two different drugs within biopsied skin samples. The neural network was trained on high-dose drug application tissue cross-sections, and was found to match expert annotations in both the original training set and on a holdout, novel dataset. The U-Net architecture provided robust image segmentation when applied to high-dose drug application images. For low-dose drug application experiments, the U-Net was found to be less effective due to both the decreased drug fluorescence signal and the lower image signal-to-noise ratio. It was found that when combined with experimentally determined thresholds, the U-Net output could be used for drug uptake quantification in many regions of skin. This approach, which uses inexpensive and nearly ubiquitous microscopes, provides a means for relatively simple, computer-aided drug quantification that can be used for cutaneous drug development and pharmacokinetic assessment.