Abstract
As experiments continue to increase in size and scope, a fundamental challenge of subsequent analyses is to recast the wealth of information into an intuitive and readily interpretable form. Often, each measurement conveys only the relationship between a pair of entries, and it is difficult to integrate these local interactions across a dataset to form a cohesive global picture. The classic localization problem tackles this question, transforming local measurements into a global map that reveals the underlying structure of a system. Here, we examine the more challenging bipartite localization problem, where pairwise distances are available only for bipartite data comprising two classes of entries (such as antibody-virus interactions, drug-cell potency, or user-rating profiles). We modify previous algorithms to solve bipartite localization and examine how each method behaves in the presence of noise, outliers, and partially observed data. As a proof of concept, we apply these algorithms to antibody-virus neutralization measurements to create a basis set of antibody behaviors, formalize how potently inhibiting some viruses necessitates weakly inhibiting other viruses, and quantify how often combinations of antibodies exhibit degenerate behavior.
- Received 26 July 2022
- Revised 31 December 2022
- Accepted 24 February 2023
DOI:https://doi.org/10.1103/PhysRevX.13.021002
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Popular Summary
As datasets grow and become more complex, they necessitate tools that build an intuitive understanding of a system by revealing its underlying structure. We look at the interactions between two classes of entries—a broad definition that includes datasets on how antibodies inhibit viruses, how transcription factors bind to DNA segments, and even how people rate movies. We develop three algorithms to embed these pairwise interactions into a low-dimensional space where the distance between entries corresponds to their interaction strength. Such embeddings not only predict unmeasured interactions, but they also provide a basis set of behaviors for a system. For example, given an antibody’s inhibition against one virus, we can predict its possible behaviors against other variants.
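The embedding idea above can be illustrated with a minimal sketch. It is not one of the three algorithms from the article. It is a generic stress minimization by gradient descent with a backtracking step size, and it assumes the interaction strengths have already been converted to distances. The function names (`stress`, `gradient`, `shifted`, `embed`), the 6-by-8 toy dataset, and the held-out pairs are all hypothetical and synthetic, chosen only for illustration.

```python
import math
import random

def stress(A, V, D):
    """Sum of squared errors between embedded and measured distances."""
    return sum((math.dist(A[i], V[j]) - d) ** 2 for (i, j), d in D.items())

def gradient(A, V, D):
    """Gradient of the stress with respect to every coordinate."""
    gA = [[0.0] * len(p) for p in A]
    gV = [[0.0] * len(p) for p in V]
    for (i, j), d in D.items():
        r = math.dist(A[i], V[j])
        if r == 0.0:
            continue  # direction is undefined for coincident points
        coef = 2.0 * (r - d) / r
        for k in range(len(A[i])):
            g = coef * (A[i][k] - V[j][k])
            gA[i][k] += g
            gV[j][k] -= g
    return gA, gV

def shifted(P, G, step):
    """Return the points P moved by -step along the gradient G."""
    return [[x - step * g for x, g in zip(p, gp)] for p, gp in zip(P, G)]

def embed(D, A, V, iters=300):
    """Fit coordinates to the observed distances D, starting from (A, V)."""
    loss, step = stress(A, V, D), 0.1
    for _ in range(iters):
        gA, gV = gradient(A, V, D)
        while step > 1e-12:
            A_new, V_new = shifted(A, gA, step), shifted(V, gV, step)
            new_loss = stress(A_new, V_new, D)
            if new_loss <= loss:  # accept only steps that do not worsen the fit
                A, V, loss = A_new, V_new, new_loss
                step = min(step * 2.0, 1.0)
                break
            step *= 0.5
        else:
            break  # no step size improved the fit; stop early
    return A, V, loss

# Synthetic toy data (an assumption): 6 "antibodies" and 8 "viruses" in 2D.
rng = random.Random(0)
n_ab, n_vir, dim = 6, 8, 2
true_A = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_ab)]
true_V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_vir)]

# Only antibody-virus (cross-class) distances exist; three are left unmeasured.
held_out = {(0, 0), (2, 5), (5, 7)}
D = {(i, j): math.dist(true_A[i], true_V[j])
     for i in range(n_ab) for j in range(n_vir) if (i, j) not in held_out}

init_A = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_ab)]
init_V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_vir)]
initial_loss = stress(init_A, init_V, D)
A, V, final_loss = embed(D, init_A, init_V)

# Distances in the fitted map serve as predictions for the unmeasured pairs.
predictions = {(i, j): math.dist(A[i], V[j]) for (i, j) in held_out}
print(f"stress: {initial_loss:.4f} -> {final_loss:.4f}")
for (i, j), pred in sorted(predictions.items()):
    print(f"pair {(i, j)}: predicted {pred:.3f}, true {math.dist(true_A[i], true_V[j]):.3f}")
```

Because only cross-class distances are available, nothing directly constrains the distance between two antibodies or between two viruses. Their relative positions are fixed only through shared partners. A simple descent like this one can also stall in a local minimum, so the quality of the held-out predictions depends on the starting point and on how many pairs were measured.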
To create robust embeddings, we leverage tools from data science, geometric computation, and biophysics to explore three embedding schemes and quantify how they (and their combinations) can tolerate noise, missing measurements, and large outliers. We apply these methods to antibody-virus inhibition data and show that they conform with a 2D representation. Using this framework, we find that most two-antibody cocktails can be mimicked by a single antibody, whereas cocktails with three or more antibodies often exhibit novel behavior that no single antibody can replicate.
Such embeddings can be applied to diverse systems to harness a wealth of available data and extrapolate new behaviors. This not only amplifies the amount of data but also provides insight into the underlying trade-offs and constraints of a system. A key open question is how far such extrapolations can be pushed before they break.