Abstract
Cytometry is frequently used in immunological research, pre-clinical trials, clinical diagnosis, and the monitoring of lymphomas, leukemia, and AIDS. However, the analysis of modern high-throughput cytometric data poses great challenges for current computational tools. These challenges stem from the high dimensionality, the large number of observations, and complex distributional features such as multimodality, asymmetry, and other non-normal characteristics. This chapter proposes a novel statistical approach that can automatically cluster high-dimensional cytometry data while performing implicit dimension reduction. Our approach is also robust to non-normal distributional features, such as heterogeneity and skewness, that are typical of flow and mass cytometry data. By adopting a factor-analytic, model-based approach, the proposed framework learns latent nonlinear low-dimensional representations of the data. It thus allows automatic segmentation of cell populations and quantification of the relative importance of each marker, while also facilitating visualization in a low-dimensional space. The effectiveness of our approach is demonstrated on a large mass cytometry dataset, where it outperforms existing benchmark algorithms.
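As a rough illustration of how a factor-analytic mixture model can cluster cells and give a low-dimensional view at the same time, the sketch below uses an ordinary mixture of factor analyzers. Each component covariance is constrained to Sigma_g = B_g B_g^T + D_g, where B_g is a p x q loading matrix (q much smaller than p) and D_g is diagonal. Cells are assigned to the component with the highest posterior probability, and the posterior mean of the q latent factors serves as a low-dimensional representation. This is a simplified sketch under stated assumptions. The components here are Gaussian, whereas the chapter's framework is designed to handle skewed, non-normal components, which this sketch does not model. The parameters are assumed to be already known; the fitting step (e.g. EM) is not shown. All function names and the simulated parameter values are hypothetical and are not taken from the chapter.

```python
import numpy as np


def component_cov(B, d):
    """Factor-analytic covariance Sigma = B B^T + diag(d); B has shape (p, q)."""
    return B @ B.T + np.diag(d)


def gaussian_logpdf(X, mu, Sigma):
    """Row-wise multivariate normal log-density via a Cholesky factor of Sigma."""
    p = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    # Solving L z = (x - mu) gives ||z||^2 = Mahalanobis distance of x from mu.
    Z = np.linalg.solve(L, (X - mu).T)
    maha = np.sum(Z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + maha)


def mfa_posteriors(X, weights, mus, Bs, ds):
    """Posterior component probabilities tau[i, g] under a Gaussian MFA (assumption)."""
    log_terms = np.column_stack([
        np.log(w) + gaussian_logpdf(X, mu, component_cov(B, d))
        for w, mu, B, d in zip(weights, mus, Bs, ds)
    ])
    # Log-sum-exp style normalisation for numerical stability.
    log_terms -= np.max(log_terms, axis=1, keepdims=True)
    tau = np.exp(log_terms)
    return tau / tau.sum(axis=1, keepdims=True)


def factor_scores(X, mu, B, d):
    """Posterior mean of the q latent factors: E[U | x] = B^T Sigma^{-1} (x - mu)."""
    Sigma = component_cov(B, d)
    return np.linalg.solve(Sigma, (X - mu).T).T @ B


# Hypothetical demo: 5 "markers", 2 latent factors, 2 simulated cell populations.
rng = np.random.default_rng(0)
p, q = 5, 2
mus = [np.zeros(p), np.full(p, 4.0)]
Bs = [rng.normal(size=(p, q)), rng.normal(size=(p, q))]
ds = [np.full(p, 0.5), np.full(p, 0.5)]
weights = [0.6, 0.4]
X = np.vstack([
    rng.multivariate_normal(mus[0], component_cov(Bs[0], ds[0]), size=200),
    rng.multivariate_normal(mus[1], component_cov(Bs[1], ds[1]), size=100),
])

tau = mfa_posteriors(X, weights, mus, Bs, ds)
labels = tau.argmax(axis=1)  # hard assignment ("gating") of each cell

# Low-dimensional coordinates of each cell under its assigned component.
scores = np.empty((X.shape[0], q))
for g in range(len(weights)):
    idx = labels == g
    scores[idx] = factor_scores(X[idx], mus[g], Bs[g], ds[g])
```

The factor-analytic constraint needs only p*q + p covariance parameters per component instead of p(p+1)/2. This saving is what makes this family of models practical as the number of markers grows.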
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Lee, S.X., McLachlan, G.J., Pyne, S. (2021). Automated Gating and Dimension Reduction of High-Dimensional Cytometry Data. In: Molina-París, C., Lythe, G. (eds) Mathematical, Computational and Experimental T Cell Immunology. Springer, Cham. https://doi.org/10.1007/978-3-030-57204-4_16
DOI: https://doi.org/10.1007/978-3-030-57204-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57203-7
Online ISBN: 978-3-030-57204-4
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)