Motion Sensing Superpixels (MOSES): A systematic framework to quantify and discover cellular motion phenotypes

Cellular motion is fundamental in tissue development and homeostasis. There is strong interest in identifying factors that affect the interactions of cells in disease but analytical tools for robust and sensitive quantification in varying experimental conditions for large extended timelapse acquisitions is limited. We present Motion Sensing Superpixels (MOSES), a method to systematically capture diverse features of cellular dynamics. We quantify dynamic interactions between epithelial cell sheets using cell lines of the squamous and columnar epithelia in human normal esophagus, Barrett’s esophagus and esophageal adenocarcinoma and find unique boundary formation between squamous and columnar cells. MOSES also measured subtle changes in the boundary formation caused by external stimuli. The same conclusions of the 190 videos were arrived at unbiasedly with little prior knowledge using a visual motion map generated from unique MOSES motion ‘signatures’. MOSES is a versatile framework to measure, characterise and phenotype cellular interactions for high-content screens.


Introduction 43
In the development of multicellular organisms, different cell types expand and 44 migrate to form defined organ structures. Tissue development and homeostasis require 45 coordinated collective cellular motion. For example, in conditions such as wound healing, 46 immune and epithelial cells need to proliferate and migrate (1-3). Deregulation of key 47 signalling pathways in pathological conditions, including cancer, causes alterations in cellular 48 motion properties that are critical for disease development and progression, for example 49 leading to invasion and metastasis. Therefore there is strong biological and clinical need to 50 study precise cellular motion characteristics in a tissue-relevant context. An example of a 51 problem that would benefit from high-throughput computational analysis is the formation of 52 stable boundaries between homo-and heterotypic cell populations. When two cell 53 populations meet in-vivo, they often form a sharp, stable interface termed a 'boundary', with 54 limited intermingling (4). In adult humans, sharp boundaries separate different types of 55 epithelia: for example between the squamous and columnar epithelia in the esophagus, 56 cervix and anus. Disruption of these boundaries can lead to disease. Disruption of the 57 squamo-columnar epithelial boundary in Barrett's Esophagus (BE) confers a 30-50 fold 58 increased risk of esophageal adenocarcinoma (EAC) (5). Understanding how tissue 59 dynamics relates to pathological phenotypes and how it can be affected by intrinsic and 60 extrinsic factors is therefore a key issue.

62
Thus, there is a strong interest in live cell imaging for high throughput screening and 63 analysis to identify targets and drugs that can cause or prevent abnormal cellular motion (6-64 8). Such applications require the extraction of quantitative biologically relevant information 65 from complex time-lapse imaging data in a manner that allows quantification of phenotypic 66 variation under different experimental conditions. However, such tools are currently limited. 67 Existing experimental approaches mainly emphasise individual cell/cell interactions and 68 single cell migration. Studies of cell population interactions and migration have been limited. 69 Most assays use a single cell type (e.g. in a scratch wound healing assay (9)) or may not be 70 suitable for handling high variability among replicates (e.g. due to variable cell transfection, 71 seeding and growth rates (10, 11)) and are thus not suited to systematically analyse 72 interactions between populations of different cell types particularly for high content screens.

74
A suitable computational method for studying cell population dynamics, including in 75 a medium-or high-throughput manner, should be: (i) robust, i.e. able to handle inevitable 76 variations in image acquisition and experimental protocol; (ii) sensitive, i.e. able to detect 77 motion differences resulting from small changes in environment or stimuli with a minimum 78 number of replicates; (iii) automatic, not requiring manual intervention except from the initial 79 setting of parameters; and (iv) unbiased, able to characterise motion (e.g. as a motion 80 'signature') with minimal assumptions about motion behaviour. Existing approaches 81 including vertex models (12, 13), differential equations (14, 15), cellular automata ( To develop a motion analysis framework we established an experimental model to assess 119 boundary formation between squamous and columnar epithelia found at the esophageal 120 SCJ (Fig.1A). We used three epithelial cell lines: EPC2 (an immortalised squamous 121 epithelial cell line from the normal esophagus (27)); CP-A (an immortalised BE cell line with 122 columnar epithelial properties (28)); and OE33 (derived from EAC (29)). To model the 123 interfaces that occur in the esophagus we used the combinations: EPC2:EPC2 124 (squamous:squamous, as a normal control), EPC2:CP-A (squamous:columnar, as in 125 Barrett's esophagus) and EPC2:OE33 (squamous:cancer, as in EAC) (Fig. 1B). In this 126 experimental model (Fig.1C), two epithelial cell populations are co-cultured in the same well 127 of a 24 well plate, separated by a divider with width 500µm. The divider is then removed 128 after 12hr and cells allowed to migrate towards each other. Each cell population is labelled 129 with a lipophilic membrane dye, which provides uniform staining, low phototoxicity and can 130 be used in-vivo and to label primary cells (30). We tested the effects of the dye in two 131 populations of EPC2, filmed over 96 and 144 hrs (Supp. Movie 1). The green and red 132 fluorescent labelled EPC2 cells proliferated similarly, exhibiting the same cell density over 133 the time course (slope=0.976, Pearson correlation coefficient 0.978; automatically counted 134 from DAPI staining) (Fig.1D, Supp. Fig.1) and migrated similarly (assessed by the mean 135 squared displacement (31)) (Supp. Fig 2). Thus, the dye has a minimal impact on cell motion 136 behaviour. The effects of the dye were similarly checked in CP-A and OE33, (Fig. 1D,  137 Supp. Fig.1D, 2 and Supp. Movie 1).

139
We tested different combinations of the three epithelial cell lines, initially in serum-containing 140 media (5% FBS) (Fig. 1E). In all combinations both populations moved as epithelial sheets 141 (Supp. Movie 2). In the squamous EPC2:EPC2 combination, cells met and coalesced into a 142 monolayer. In the squamous-columnar EPC2:CP-A combination a stable boundary formed 143 between the two populations after 72 hrs, following a short period of CP-A pushing EPC2 144 (Supp. Movie 2, Fig. 1E). In the squamous:cancer EPC2:OE33 combination, the cancer cell 145 line OE33 pushed EPC2 out of the field of view.

147
Evidence from systems including Drosophila embryonic parasegment (32) and 148 anteroposterior and dorsoventral wing disc boundaries (33-35) suggest the importance of 149 physical properties of cell/cell interactions for boundary formation and that collective 150 migration is required for stable boundaries between epithelial populations. To test the 151 importance of physical contact between cells in boundary formation, we cultured the same 152 cell combinations in serum free medium to reduce cell-cell contacts (Supp. Fig. 3A). Under 153 these conditions we observed loss of collective sheet migration (Supp. Fig. 3B To address the computational challenges of analysing collective motion and cell interactions 162 in a robust and sensitive manner we developed MOSES. The MOSES framework is modular 163 and comprises three components ( Fig.2): 1) motion extraction; 2) track filtering; and 3) 164 "mesh" formulation. The first component is motion extraction from a video (steps 1-3, Fig.2) 165 using superpixel tracking. As we cannot rely on the tracking of individual cells in tissue, this 166 is done in a similar manner as the particle image velocimetry (PIV) approach for analysing 167 monolayers and migrating epithelial fronts (e.g. (9)). Avoiding shortcomings of cell 168 segmentation methods, the initial frame is divided into a specified number of regular regions 169 to generate superpixels; each superpixel is larger than one image pixel (36) (Step1, Fig.2). 170 The position of each superpixel is tracked frame-to-frame by updating its current (x,y) 171 position according to the mean displacement calculated from optical flow (37) (Step 2, Fig.2).

172
A superpixel track is generated by collecting the positions of a superpixel at all time points 173 (Step 3, Fig.2). The collection of all superpixel tracks summarises all the motion within the 174 video. The number of superpixels used should be chosen to sufficiently subsample the 175 spatial area of the image to capture all spatial variations in motion. In this paper, we used 176 1000 superpixels to cover an image size of 1344 x 1024 (with this setting an average 177 superpixel is 37 pixels x 37 pixels covering ≈ 14000μm 2 (2x lens) or ≈57000μm 2 (4x lens)) to 178 track the sheet dynamics. The video channel for each different coloured cell population (i.e. 179 red and green in our experimental system) is independently tracked in this manner.

181
As the superpixel tracks capture all the spatial motion dynamics in the video, some tracks 182 may not be relevant to the phenomenon of interest. Therefore, the second component is 183 track filtering (Step 4, Fig.2) and is specific to the experimental setup. In this step, 184 superpixels are assigned to cover the entire dynamic motion for each 'object' of interest. For 185 analysing epithelial sheets, the 'object' is the entire sheet; for single cell tracking the 'object' 186 is each individual cell in the frame. To assign superpixels to an object, normally each object 187 is first segmented from the initial video frame to produce a binary image where pixels have 188 value 1 if belonging to the object or 0 otherwise. A superpixel then belongs to an object if its 189 centroid lies in a pixel whose value is 1 in the respective binary image. However this 190 approach is problematic for variable image intensities. To overcome this, we developed an 191 intensity-independent segmentation method using the spatial layout of the superpixels for 192 migrating epithelial sheets (Supp. Fig. 4, SI Materials and Methods).

194
The third component of MOSES is a "mesh formulation" to capture the spatial context of 195 motion in tissue (Step 5, Fig.2 Fig.2). 207 We tested the accuracy of the superpixel motion extraction experimentally by adding a 208 sparse population of blue cells to the in-vitro setup described above (the blue cells being the 209 same cell type as the population to which they were added) (Supp. Fig. 5A). We tested all 210 three combinations of cell types (EPC2:EPC2, EPC2:CP-A and EPC2:OE33) and switched 211 the red/green dye and the spiked-in population, generating 23 videos (each 144hr). To 212 assess the performance of MOSES in tracking at particle level resolution (i.e. tracking the 213 spiked-in cells), we used tracks generated by TrackMate (38), a published single particle 214 tracker implemented in Fiji, as a benchmark. When the field of view was divided into 10000 215 superpixels in MOSES, the tracks were highly similar to TrackMate (Supp. Fig. 5B,C). To 216 assess if the motion of single cells could be inferred from sheet motion, 1000 superpixels 217 were used to track the motion of the sheet and the nearest superpixel tracks to each spiked-218 in cell were averaged to estimate the single cell trajectories and compared to TrackMate.

219
Although the precise individual cell trajectory was lost, the overall motion pattern of the 220 spiked in cells could be inferred (supp. Fig. 5B, C). Therefore, in confluent epithelial sheets 221 where cells are 'crowded', individual cells behave similarly to their neighbours so global 222 motion patterns can be used as a proxy to study single cell behaviour.

224
Quantitative measurement of squamous and columnar epithelial boundary formation 225 using MOSES 226 227 To test the ability of MOSES to generate quantitative measurements that can be used to 228 assess a specific biological feature, we used the example of stable boundary formation to 229 define three indices: i) boundary formation index; ii) motion stability index; and iii) maximum 230 velocity cross-correlation. Boundary formation and motion stability are mesh-derived 231 statistics. Maximum velocity cross-correlation is derived from the individual superpixel tracks.

233
The boundary formation index is a global measure of the likelihood of stable boundary 234 formation, by exploiting the notion that a boundary can be viewed as a spatially constrained 235 phenomenon. By historically aggregating the number of neighbouring superpixels within a 236 specified radius for each superpixel, we can generate a heatmap that spatially detects the 237 interface from which we can quantify the global likelihood of boundary formation in a simple 238 normalised signal to noise ratio index from 0-1, (Fig.3A); the higher the score the greater the 239 likelihood of boundary formation.

241
The motion stability index is a local measure that assesses the additional stability of the 242 interface between the two cell populations not captured by the boundary formation index, 243 where individual cells in the sheet may still be moving stochastically without affecting the 244 equilibrium spatial position and shape of the boundary. It is computed by subtracting from 1 245 the gradient of the normalised mesh strain curve, which measures the relative degree of 246 motion of superpixels with respect to its neighbours within the tissue. The normalised mesh 247 strain curve is a natural generalisation of the root mean squared displacement curve (RMSD) 248 (materials and methods) to incorporate collective motion. The higher the motion stability 249 index (maximum value of 1), the more stable the boundary and the cells at the boundary. 250 The motion stability index for a cell combination is computed from the averaged normal 251 mesh strain curve of the individual cell populations.

253
The maximum velocity cross correlation measures the degree that two epithelial sheets 254 move together as a single cohesive connected sheet by correlating each superpixel track in 255 one population (colour) with every superpixel track of the other population (colour), factoring 256 in time delays. It is an averaged normalised measure (0-1). For two initially diametrically 257 migrating sheets, a significant increase in this measure before and after closure of the gap is 258 indicative of increased interaction across the sheets. This might be due to increased cell-cell 259 contact in the combined monolayer leading to increased coordinated motion across larger 260 spatial scales. These descriptors/measures are just examples of statistics that can be 261 derived using MOSES. For further discussion and calculation of these metrics see SI 262 materials and methods.

264
We used MOSES to compute these indices for different combinations of the three cell lines 265 (EPC2, CP-A and OE33) in our experimental system. In total 125 videos (48 with 0% serum 266 and 77 with 5% serum) were collected from 4 independent experiments and jointly analysed. 267 These videos are highly heterogeneous and acquired under different conditions creating a 268 challenging dataset for analysis (Supp. Table. 1, Supp. Fig. 6). Cells grown in 0% serum 269 were used as negative control to set a cut-off for boundary formation (0.69) defined 270 statistically as one standard deviation higher than the pooled mean of all three combinations 271 (Fig.3B). Above this cut-off, cell combinations are categorised as forming a boundary. The 272 boundary formation index was highest (0.74) for EPC2:CP-A grown in 5% serum (n=30/77) 273 (Fig.3B, Supp. Fig. 7A). For EPC2:EPC2 and EPC2:OE33 in 5% serum the boundary 274 formation index was below the cut-off (Fig.3B). We also ranked the serum videos on a 275 continuous scale using the boundary formation index, which shows unbiasedly that the 276 majority of EPC2:CP-A videos are at the top of this ranking (Supp. Fig. 8). Similarly, 277 experiments in 0% serum were used to set the global motion stability threshold (0.87), one 278 standard deviation below the pooled mean of EPC2:EPC2 and EPC2:CP-A. Cells at the 279 interface of EPC2:OE33 are not stable and therefore not included in the pooled statistics. 280 Below this cut-off, cell combinations are categorised as forming unstable interfaces. In 5% 281 serum, EPC2:CP-A had the highest motion stability index with a median value of 0.90, 282 compared to EPC2:EPC2 (0.76) and EPC2:OE33 (0.25) (Fig.3C, supp. Fig. 7B). These 283 results illustrate that the squamous-columnar combination EPC2:CP-A forms a stable 284 boundary.

286
We measured sheet-sheet interactions using the maximum velocity cross-correlation before 287 and after gap closure for the three cell type combinations. In 0% serum there is no difference 288 in velocity cross-correlation across all combinations before and after gap closure (Fig.3D). 289 The two sheets do not move cohesively as a unit, which is to be expected with minimal cell-290 cell contact. In serum, there is greater cohesion due to cell-cell contact. We observe that the 291 difference for EPC2:CP-A (median value of 0.03 before and 0.20 after gap closure) was ~3-292 6 times larger than for EPC2:EPC2 (0.01 to 0.08) and EPC2:OE33 (0.02 to 0.05), (c.f. left 293 and right violins in Fig.3D). We note also that CP-A:OE33 (Barrett's:cancer, n=6) also 294 exhibited a substantial increase in velocity cross-correlation following gap closure (0.03 to 295 0.17) (Supp. Fig.7C,D). This is unlikely to be a feature of the CP-A cell line, as no substantial 296 increase was observed for CP-A:CP-A (0.01 to 0.06) (Supp. Fig.7C,D) For MOSES to be applicable for high-content imaging analyses, it needs to be able to 307 quantitatively monitor subtle changes in cellular motion dynamics with a minimal number of 308 replicates. As a test of the sensitivity of MOSES, we assessed whether it could detect subtle 309 changes in the EPC2:CP-A boundary caused by an external stimulus.

311
The main cause of BE is bile acid reflux (39, 40). Bile acid activates epidermal growth factor 312 receptor (EGFR) (41, 42), a receptor tyrosine kinase that is frequently mutated in EAC (43) 313 and sometimes overexpressed in BE (44, 45). We therefore used EGF, the ligand of EGFR, 314 as a stimulus to activate the EGFR signalling pathway. Increasing amounts of EGF (0ng/ml 315 to 20ng/ml) were added to the culture medium to assess incremental effects on cellular 316 motion and boundary formation in the EPC2:CP-A combination. A total of 40 videos (each 317 144hr) were collected from three independent experiments, in a 24-well plate medium 318 throughput screen. Viewing the mesh (Fig. 4A), shows the boundary position further away 319 from the initial point of contact between the two cell populations and decreased coherence of 320 the boundary with increasing EGF. This is also shown by the shape of the mean normalised 321 strain curve (Fig. 4B): at 0ng/ml EGF this curve linearly increases before plateauing around 322 72hrs; as EGF concentration increases, the curve becomes more linear and the plateau is 323 lost above 5ng/ml. The boundary formation index decreases with increasing EGF (0.74 at 324 0ng/ml to 0.46 at 20ng/ml), indicating that the boundary is lost (i.e. index below the 0.69 cut-325 off) (Fig. 4C). The index at 20ng/ml EGF is similar to that for EPC2:OE33 without EGF (0.46), 326 (Fig.4C). The interface becomes increasingly unstable and cells move more as the motion 327 stability index decreases from 0ng/ml (0.94, stable) to 20ng/ml (0.72, unstable) (Fig.4D). 328 Also, the interaction between the two cell populations is lost as the maximum velocity cross-329 correlation difference before and after gap closure decreased from 0ng/ml (0.16) to 20ng/ml 330 (0.04) EGF (Fig. 4E). The maximum velocity cross-correlation after gap closure is similar to 331 that for EPC2:OE33 (0.02) (Fig.4C), but the motion stability index remains higher (Fig.4D). 332 Altogether these measures show that above 5 ng/ml EGF the phenotype of EPC2:CP-A 333 becomes similar to that of the interaction between EPC2 and the EAC cell line OE33 (supp. 334 Movie 4).

336
We also developed an index of the level of disorder in the MOSES mesh (SI materials and 337 methods) (Supp. Fig. 9A), which captures the extent local cell populations move in opposing 338 directions with respect to their neighbours. The mesh disorder index showed statistically 339 significant increases with EGF concentration (0ng/ml: 0.708, 2.5ng/ml: 0.709, 5ng/ml: 0.709, 340 10ng/ml: 0.713, 20ng/ml: 0.715) (Fig.4F, Supp. Fig. 9B,C), suggesting collective sheet 341 motion is lost. This loss of collective motion is supported by a decrease in the standard 342 spatial correlation measure (SI materials and methods). However, using this standard 343 approach gives high statistical variance, (Supp. Fig. 9D). The effect of increased disorder is 344 visually subtle (supp. Movie. 4), but is clearly detected using the MOSES mesh and the 345 proposed mesh disorder index.

347
Titrating EGF in the absence of serum gave non-significant changes in the boundary 348 formation index (0ng/ml: 0.60±0.07, 20ng/ml: 0.65±0.03) and maximum velocity cross-349 correlation (supp. Fig.10 (n=25)). Decreasing motion stability index indicates increased cell 350 movement and increasing mesh disorder index suggests the absence of collective dynamics 351 required for boundary formation. In summary, this example with EGF in the context of our 352 experimental set-up shows that MOSES enables continuous-scale quantification of motion 353 after systematic perturbation in a medium-throughput 24-well format. The general process for motion map generation is illustrated in Fig. 5A. For each video to be 370 positioned on a 2D map, we applied principal components analysis (PCA) to the normalised 371 mesh strain curves for the 77 videos of cell combinations cultured in 5% serum conditions. 372 The mesh strain curve for each video (used above to define the motion stability index) is 373 used here as a 1D motion signature for summarising the entire video motion, (see SI 374 material and methods for generating other descriptive feature vectors). The map generated 375 for these 77 videos (Fig. 5B) shows that this unbiased approach has automatically clustered 376 together the videos for each cell type combination. Furthermore, the videos are ordered in a 377 continuous manner, as shown by the progressive transformation in the shape of the mean 378 normalised strain curve when looking across the plot in Fig. 5B from left to right, CP-A:CP-A 379 to EPC2:OE33 (i.e. the shape is increasingly linear). This result could not be achieved with 380 RMSD (Supp. Fig. 11), and is independent of the particular dimensionality reduction 381 technique used, (Supp. Fig. 12). Further, the 1D motion signatures derived from MOSES 382 could be used to train a machine learning classifier with no further processing to predict cell 383 combination identity better than RMSD, (Supp. Fig. 11).

385
To demonstrate how 2D motion phenotype maps generated by MOSES can be used to 386 compare videos, we next mapped the 48 videos from 0% serum cultures on the same axes 387 as the videos from 5% serum (Fig. 5C,D). The videos from 0% serum mapped to a different 388 area of the 2D plot, whilst preserving the continuous ordering of the previous videos. 389 Therefore, without having watched the videos it is easy to predict that the cells have 390 markedly different motion dynamics in 0% serum compared to 5% serum. Furthermore, 391 since the points for the 5% serum videos cover a larger area of the 2D plot than the 0% 392 serum videos, one would expect less diversity of motion in 0% serum, (Fig.5D).

394
The motion map can also capture subtle motion changes. This is demonstrated by mapping 395 the mean video motion for each concentration of EGF from 0-20ng/ml (represented by the 396 respective mean normalised strain curves for each concentration (1 per concentration from a 397 total n=40 videos, see Fig.4B)) onto the same axis as the 5% serum videos in the absence 398 of EGF (square points in Fig 5E). With increasing EGF, the EPC2:CP-A motion dynamics 399 become more similar to EPC2:OE33, as evidenced above 5ng/ml by the square points 400 moving from the area of blue circular EPC2:CP-A points into the area of orange circular 401 EPC2:OE33 points. Therefore, the motion map is consistent with the results obtained using 402 the specific derived indices (above

440
Cell lines and Tissue Culture. EPC2 and CP-A (ATCC) cells were grown in full KSFM 441 (Thermo Fisher), AGS (ATCC) and OE33 (ATCC) in full RPMI with 10% FBS. Both were 442 supplemented with glutamine and Penicillin streptomycin at 37 0 C and 5% CO 2 until 80% 443 confluent. To passage EPC2 and CP-A, cells were resuspended after trypsinization for 5 min 444 with PBS supplemented with soybean trypsin inhibitor (0.25 g/L, Sigma) to prevent cell death 445 prior to resuspension in KSFM. To store, cells were resuspended at a concentration of 10 6 446 cells/mL with 90% FBS + 10% DMSO freezing media following centrifugation and stored at -447 80 0 C before passing to liquid nitrogen storage. 448 449 Fluorescent Labelling. Cells were labelled using Celltracker Green CMFDA and Celltracker 450 Orange CMRA dyes (Life Technologies) according to protocol. Two different concentrations 451 2.5µM and 10µM were used. The lower concentration still permits tracking with less toxicity 452 concerns.

454
Temporary Divider Co-culture Assay. 70,000 labelled cells in 70µL of culture media were 455 seeded into each side of a cell culture insert (Ibidi). After 12h inserts were removed and the 456 well washed three times with PBS to remove non-attached cells before adding the desired 457 media (KSFM (0% serum in the text) or 1:1 mixture of KSFM:RPMI + 5%FBS (5% serum in 458 the text)) for filming. For the perturbations the effector, e.g. EGF, was also added to the 459 media at the stated concentrations.