Cell-vision fusion: A Swin transformer-based approach for predicting kinase inhibitor mechanism of action from Cell Painting data

Summary
Image-based profiling of the cellular response to drug compounds has proven effective at characterizing the morphological changes resulting from perturbation experiments. As data availability increases, however, so does the demand for novel deep-learning methods. We applied the SwinV2 computer vision architecture to predict the mechanism of action of 10 kinase inhibitor compounds directly from Cell Painting images. This method outperforms the standard approach of using image-based profiles (IBPs), the multidimensional feature-set representations generated by bioimaging software. Furthermore, our fusion approach, cell-vision fusion, which combines three data modalities (images, IBPs and chemical structures), achieved 69.79% accuracy and a 70.56% F1 score, 4.20% and 5.49% higher, respectively, than the best-performing IBP method. We provide three techniques, specific to Cell Painting images, that enable deep-learning architectures to train effectively, and we demonstrate approaches to combat the significant batch effects present in large Cell Painting datasets.


S2. Modality ablation study
Table S1 contains the results of an ablation study in which models were evaluated using combinations of two of the three data modalities (images, image-based CellProfiler profiles and compound structures). The results show that the proposed combination of all three modalities (i.e. Cell-Vision Fusion) outperforms the two-modality approaches in accuracy, F1 score and precision. Excluding the compound structure data improved recall, AUPR and AUC; however, that combination has lower accuracy, F1 score and precision, indicating a clear trade-off between classification performance and ranking ability.
Compound structure is the least diverse modality because it is provided at the compound level (i.e. each compound has a single structure), whereas the image-based profile data are provided at the well level and the images at the field (i.e. sub-well) level. This lack of diversity in the compound data likely led to the reduced ranking ability of the model combinations that included it. Additionally, when compound structure is combined with the image data alone, the model overfits and its results are substantially worse. In a larger, more diverse dataset, we would expect the inclusion of compound structure data to be more beneficial, as has been shown in prior research (1). Cell-Vision Fusion, the fusion method introduced in this paper, uses all three data modalities: images, image-based CellProfiler profiles and compound structures converted into Morgan fingerprints. The ablation above tests how the model evaluation metrics change when only two of these modalities are combined.
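To make the granularity argument concrete, the minimal sketch below counts how many distinct inputs each modality contributes when every field-level image is paired with its well's profile and its compound's structure. The compound, well and field identifiers are hypothetical placeholders, and `distinct_inputs_per_modality` is an illustrative helper, not part of the study's code.

```python
# Hypothetical toy records: one tuple per image field (sub-well).
# The identifiers are illustrative placeholders, not JUMP cpg0016 IDs.
fields = [
    # (compound_id, well_id, field_id)
    ("cmpd_A", "plate1_A01", 1),
    ("cmpd_A", "plate1_A01", 2),
    ("cmpd_A", "plate2_B03", 1),
    ("cmpd_A", "plate2_B03", 2),
    ("cmpd_B", "plate1_C05", 1),
    ("cmpd_B", "plate1_C05", 2),
]


def distinct_inputs_per_modality(rows):
    """Count the distinct input vectors each modality contributes.

    Images differ per field, image-based profiles per well, and compound
    structures (e.g. a Morgan fingerprint) only per compound.
    """
    return {
        "images": len(set(rows)),
        "ibp": len({well for _, well, _ in rows}),
        "structure": len({compound for compound, _, _ in rows}),
    }


print(distinct_inputs_per_modality(fields))
```

In this toy example, six image fields map to only three well-level profiles and two compound structures, so the structure branch repeatedly sees the same input vector; this is one plausible contributor to the overfitting described above.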

DepMap gene TPM values
Gene expression TPM values for the U2OS cell line were downloaded from the DepMap Public 23Q2 gene TPM data using the DepMap U2OS ID "ACH-000364". The broad kinase inhibitor classes found in the JUMP cpg0016 data were then mapped to the specific underlying genes in the DepMap data (Figure S2A), and the expression values were summed and sorted at the kinase inhibitor class level (Figure S2B). This gives an indication of the likely expression levels of the targeted protein kinases in U2OS cells, which in turn affects how strongly inhibiting them would alter the cell's phenotypic response to each inhibitor.
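The aggregation step can be sketched as follows. This is a minimal illustration, assuming a class-to-gene mapping like the one in Figure S2A; the mapping entries, the TPM numbers and the helper name `class_expression` are hypothetical placeholders, not DepMap 23Q2 values.

```python
# Hypothetical mapping from kinase inhibitor class to gene symbols.
# The study's actual mapping is shown in Figure S2A; these entries are illustrative.
CLASS_TO_GENES = {
    "MEK": ["MAP2K1", "MAP2K2"],
    "EGFR": ["EGFR"],
    "ALK": ["ALK"],
}

# Placeholder linear-scale TPM values for one cell line (NOT real DepMap data).
TPM = {"MAP2K1": 50.0, "MAP2K2": 80.0, "EGFR": 10.0, "ALK": 0.5}


def class_expression(class_to_genes, tpm):
    """Sum TPM over each class's genes (missing genes count as 0) and sort descending."""
    totals = {
        cls: sum(tpm.get(gene, 0.0) for gene in genes)
        for cls, genes in class_to_genes.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for cls, total in class_expression(CLASS_TO_GENES, TPM):
    print(f"{cls}: {total:.1f}")
```

Summing assumes linear-scale TPM values. DepMap expression files are typically distributed as log2(TPM + 1), so values in that form would need to be back-transformed before summing.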
From these classes, we considered which were appropriate for prediction, given that the cpg0016 experiments were conducted using only the U2OS cell line. Because of the cell type, several classes are unlikely to produce a distinguishable phenotypic response that a model could learn. The brief rationale for each class is summarized below:
• Vascular endothelial growth factor receptor (VEGFR): VEGF action is primarily targeted to endothelial, rather than epithelial, cells (2). As such, we would not expect a significant phenotypic response from this class of inhibitors when administered to U2OS cells, which have an epithelial morphology.
• Platelet-derived growth factor receptor (PDGFR) and fibroblast growth factor receptor (FGFR) inhibitors: PDGFR and FGFR are primarily expressed in mesenchymal cells, including fibroblasts, smooth muscle cells and pericytes, but not in epithelial cells (5).
• Janus kinase (JAK): The JAK-STAT pathway is triggered by the binding of cytokines to their respective receptors. Unlike immune cells, epithelial cells typically do not express high levels of cytokine receptors. Furthermore, JAK inhibitors, especially earlier generations, are known to exhibit low selectivity, leading to wide-ranging off-target effects (6). Including the JAK class could therefore introduce substantial noise into the dataset, making the other, more selective, classes harder to classify. That said, Janus kinases play a central role in inflammation, so their inhibition can still have a significant impact across different cell types.
Although PI3K and EGFR showed relatively low gene expression values (Figure S2B), the literature review highlighted the importance of the pathways they directly participate in. Moreover, because these two classes act upstream in their pathways, their inhibitors should produce phenotypic responses that are suitable to model.

S5. Quality control
Columns from the CellProfiler IBP feature sets were extracted and associated with three quality control measures: blur, saturation and focus. In total, 25 blur features, 10 saturation features and 25 focus features were identified. These features were min-max normalized and plotted to show their distribution across our kinase inhibitor dataset. Figure S3 shows the distribution of the blur scores as a stacked histogram of the composite, normalized blur score for each datapoint, coloured according to the source that produced it. Blur scores were calculated as the mean of all CellProfiler features associated with the blur quality control metric, namely the "ImageQuality_Correlation" and "ImageQuality_PowerLogLogSlope" features described in the CellProfiler manual.
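The normalization and averaging described above can be sketched in plain Python. This is an illustration under assumptions: the feature values are placeholders, each column is assumed to be oriented so that a higher value means more blur, and `min_max` and `composite_score` are hypothetical helper names.

```python
def min_max(values):
    """Rescale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def composite_score(feature_columns):
    """Min-max normalize each feature column, then average across features per sample."""
    normalized = [min_max(col) for col in feature_columns]
    n_samples = len(feature_columns[0])
    return [
        sum(col[i] for col in normalized) / len(normalized)
        for i in range(n_samples)
    ]


# Placeholder blur-related feature columns for four images (not real CellProfiler output).
blur_features = [
    [0.0, 1.0, 2.0, 4.0],      # e.g. one ImageQuality_Correlation column
    [-4.0, -4.0, 0.0, -2.0],   # e.g. one ImageQuality_PowerLogLogSlope column
]
print(composite_score(blur_features))
```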
The saturation and focus scores had to be transformed to be comparable to the blur scores. The saturation scores were highly skewed, and the focus scores were defined such that a lower score meant lower quality (i.e. low focus), whereas for blur a low score meant higher quality (i.e. low blur). Therefore, the saturation scores were transformed to a normal distribution using scikit-learn's QuantileTransformer, and the focus scores were inverted (Figure S4). On inspection of the images, however, both over- and under-saturated images were recognised to be of poor quality.
To address this, a tanh transform was applied to the saturation scores so that any image scoring highly on this metric reflects a low-quality image (Figure S5).
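The two transforms can be sketched as below. Two parts are assumptions rather than the study's exact procedure: `quantile_to_normal` is a simplified, rank-based stand-in for scikit-learn's `QuantileTransformer(output_distribution="normal")`, and the tanh step is assumed to act on the absolute value of the normal score so that both tails score high. All numbers and helper names are hypothetical.

```python
import math
from statistics import NormalDist


def quantile_to_normal(values):
    """Map values to standard-normal scores via their empirical ranks.

    Simplified stand-in for QuantileTransformer(output_distribution="normal");
    ties are broken by position rather than averaged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, idx in enumerate(order):
        z[idx] = NormalDist().inv_cdf((rank + 0.5) / n)
    return z


def saturation_quality(values):
    """ASSUMPTION: tanh of |z|, so over- and under-saturated images both score high."""
    return [math.tanh(abs(z)) for z in quantile_to_normal(values)]


def invert_focus(normalized_focus):
    """Flip min-max normalized focus scores so that, as for blur, higher means worse."""
    return [1.0 - f for f in normalized_focus]


saturation = [0.001, 0.01, 0.02, 0.03, 0.5]  # placeholder, right-skewed
print([round(s, 3) for s in saturation_quality(saturation)])
print(invert_focus([0.0, 0.25, 1.0]))
```

In practice the scikit-learn class would be fit on the full saturation column (it also interpolates between quantiles and handles ties), so the stand-in only shows the shape of the pipeline.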

S6. Mechanism of Action - Class Distribution
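Figure S7 shows the distribution of kinase inhibitor compounds according to both the source where the experiments took place (Figure S7A, B) and the microscope used to take the images (Figure S7C, D). These visualizations were used to check whether compound mechanism of action clustered according to either of these two major confounding factors. Although the dataset contains only one sample from Source 1 (an Aurora kinase inhibitor), and Sources 4 and 5 are over-indexed to the cyclin-dependent kinase class, overall there is very little clear clustering in the distribution plots that could lead to leakage during model training. Generally, class samples are spread across multiple sources and microscopes within the data.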
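One simple way to quantify the over-indexing mentioned above is to compare each class's share within a source with its share across the whole dataset. The sketch below does this with a lift ratio; the sample list, the source and class labels, and the 1.75 cutoff are hypothetical choices for illustration, not values from the study.

```python
from collections import Counter, defaultdict

# Hypothetical (source, MOA class) pairs, one per sample; not the real dataset.
samples = [
    ("source_2", "MEK"), ("source_2", "MEK"), ("source_2", "MEK"),
    ("source_2", "EGFR"), ("source_2", "EGFR"), ("source_2", "EGFR"),
    ("source_4", "CDK"), ("source_4", "CDK"), ("source_4", "MEK"),
    ("source_5", "CDK"), ("source_5", "CDK"), ("source_5", "EGFR"),
]


def over_indexed(pairs, ratio=1.75):
    """Flag (source, class) pairs whose within-source share exceeds `ratio` times
    the class's overall share. `ratio` is an arbitrary illustrative cutoff."""
    overall = Counter(cls for _, cls in pairs)
    total = len(pairs)
    per_source = defaultdict(Counter)
    for src, cls in pairs:
        per_source[src][cls] += 1
    flagged = []
    for src, counts in sorted(per_source.items()):
        n_src = sum(counts.values())
        for cls, k in sorted(counts.items()):
            lift = (k / n_src) / (overall[cls] / total)
            if lift > ratio:
                flagged.append((src, cls, round(lift, 2)))
    return flagged


print(over_indexed(samples))
```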

S8. Fusion performance compared to compound clinical phase
The Drug Repurposing Hub and ChEMBL data contain clinical phase information, reflecting how far along the clinical development pipeline each compound was when it was included in the dataset. Figure S10 compares model performance across clinical phases and shows that, generally, compounds further along in development are more accurately identified by the CVF Fusion model. This potentially reflects the fact that launched drugs are more likely to act effectively on their intended target, with fewer off-target effects, than compounds still at the preclinical stage.
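The per-phase comparison in Figure S10 amounts to grouping predictions by clinical phase and comparing each group's accuracy with the overall accuracy. A minimal sketch, assuming one correct/incorrect flag per compound; the results list and the helper name `accuracy_by_phase` are hypothetical placeholders, and the real counts and accuracies are those in Figure S10.

```python
from collections import defaultdict

# Hypothetical per-compound results: (clinical_phase, predicted_correctly).
results = [
    ("Preclinical", True), ("Preclinical", False), ("Preclinical", False),
    ("Phase 1", True), ("Phase 1", True),
    ("Launched", True), ("Launched", True), ("Launched", False),
]


def accuracy_by_phase(rows):
    """Return the overall accuracy and {phase: (n_compounds, accuracy)}."""
    grouped = defaultdict(list)
    for phase, correct in rows:
        grouped[phase].append(correct)
    overall = sum(correct for _, correct in rows) / len(rows)
    per_phase = {p: (len(v), sum(v) / len(v)) for p, v in grouped.items()}
    return overall, per_phase


overall, per_phase = accuracy_by_phase(results)
for phase, (n, acc) in per_phase.items():
    position = "above" if acc > overall else "at or below"
    print(f"{phase} (n={n}): {acc:.2f}, {position} overall {overall:.2f}")
```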

S9. Predicting microscope and data source
Table S3 shows the results when the SwinV2 model was trained to predict either the source where the data were produced or the microscope used to record the image, rather than the compound mechanism of action. This test was run to observe how the two approaches to normalizing and standardizing the image data affected the model's ability to predict these main confounding factors. Before any efforts were made to counteract batch effects, the model was highly adept at classifying both microscope and source: using five-fold cross-validation, it was 99.69% accurate at predicting the microscope and 98.11% accurate at predicting the source. After applying both plate-wise, channel-wise standardization (PCS) and per-channel normalization (PCN), accuracy dropped to 85.98% for microscope and 84.41% for source prediction. This provides evidence that the measures implemented in this paper reduce, to some extent, the obscuring batch effects present in the data, allowing the model to focus more on the biological signal. These findings are consistent with the results in the main text.
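The exact PCS and PCN procedures are given in STAR Methods; the sketch below only illustrates one plausible reading of each. It assumes PCN rescales every channel of an individual image to [0, 1], and PCS z-scores each channel using the mean and standard deviation pooled over all images from the same plate. The helper names, the order in which the two are applied and the toy pixel values are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev

# Each image is (plate_id, [channel_0_pixels, channel_1_pixels, ...]), pixels flattened.
# ASSUMPTION: these definitions are illustrative, not the study's exact PCS/PCN code.


def per_channel_normalize(channels):
    """PCN (assumed form): rescale each channel of a single image to [0, 1]."""
    out = []
    for pixels in channels:
        lo, hi = min(pixels), max(pixels)
        span = hi - lo
        out.append([0.0 if span == 0 else (p - lo) / span for p in pixels])
    return out


def plate_channel_standardize(images, eps=1e-8):
    """PCS (assumed form): z-score each channel with mean/std pooled over its plate."""
    pooled = defaultdict(list)  # (plate, channel index) -> every pixel on that plate
    for plate, channels in images:
        for c, pixels in enumerate(channels):
            pooled[(plate, c)].extend(pixels)
    stats = {key: (fmean(v), pstdev(v)) for key, v in pooled.items()}
    result = []
    for plate, channels in images:
        standardized = []
        for c, pixels in enumerate(channels):
            mean, std = stats[(plate, c)]
            standardized.append([(p - mean) / (std + eps) for p in pixels])
        result.append((plate, standardized))
    return result


# Toy two-channel images on two plates with very different intensity ranges.
images = [
    ("plate_1", [[0, 100, 200], [10, 20, 30]]),
    ("plate_1", [[50, 150, 250], [40, 50, 60]]),
    ("plate_2", [[1000, 3000, 5000], [5, 5, 5]]),
]
# Apply both steps (PCN first, then PCS; the order is an assumption).
prepared = plate_channel_standardize(
    [(plate, per_channel_normalize(ch)) for plate, ch in images]
)
```

Because PCS pools statistics per plate, plate-level intensity offsets (for example from a different microscope or bit depth) are largely removed, which is the kind of source-specific signal the Table S3 experiment shows the model can otherwise exploit.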

S10. Study workflow
Figure S11 gives an overview of the workflow used in this study. First, the JUMP dataset metadata were combined, and MOA labels were then assigned to matched compounds using data from both the Drug Repurposing Hub and ChEMBL. Data for compounds labelled as kinase inhibitors were downloaded, and a literature review excluded inhibitors that would likely elicit little to no consistent phenotypic effect in U2OS cells. Quality control based on three quality factors (blur, saturation and focus) was then used to exclude samples with low-quality microscope images. The remaining samples formed our kinase inhibitor dataset.
Different approaches to data processing were taken for each data modality, as detailed in Figure S11 under the individual modalities. The image-based profiles are the profiles output by the CellProfiler software, which are provided alongside the raw images in the cpg0016 JUMP dataset S3 bucket. Different models were tested for each modality before the best-performing processing techniques and model architectures were combined to form the CVF Fusion model architecture.

Supplemental Information, Figure S11. Workflow overview, related to Figure 2. An overview of the workflow used in this study, describing the dataset selection process, the subsequent processing and modelling approaches for each modality, and the eventual combination of the optimal approaches into the CVF Fusion architecture.

Supplemental Information, Figure S2. Mapping of kinase inhibitor classes to DepMap data, related to Figure 1. (A) Dictionary used to map the kinase classes identified in the cpg0016 dataset to the genes within the DepMap expression TPM data. (B) Expression values for each gene aggregated at the kinase inhibitor class level. Classes that inhibit proteins associated with highly expressed genes (e.g. MEK and CDK inhibitors) will likely show a more significant phenotypic response in U2OS cells than classes whose associated gene expression is low (e.g. ALK and BTK).

Supplemental Information, Figure S3. Distribution plot showing the levels of blur present in the dataset images, related to STAR Methods - Quality control.

Supplemental Information, Figure S10. CVF model accuracy plotted for compounds grouped by clinical phase of development, related to STAR Methods - Analysis and evaluation. Each point is labelled with the number of compounds in that phase in the dataset (e.g. 44 of the 96 compounds in our data are in preclinical development). The red dotted line denotes the overall CVF performance, showing that the CVF model classifies all compounds at Phase 1 or later with above-average performance.

Table S2. Technical specifications of images produced by the different cpg0016 data sources, related to STAR Methods - Data acquisition.
Table S2 summarizes the image file formats for each source contributing to the cpg0016 dataset, including image resolution, bit depth, compression and the microscope names. The information was extracted from image files downloaded from the cpg0016 AWS S3 bucket.

Table S3. Image normalization and standardization ablation study, related to STAR Methods - Raw images: Normalization and standardization. Ablation study performed by removing per-channel normalization (PCN), plate-wise, channel-wise standardization (PCS), or both during model training when the SwinV2 model was used to predict either the data source or the microscope used.