Automated Gating and Dimension Reduction of High-Dimensional Cytometry Data

Mathematical, Computational and Experimental T Cell Immunology

Abstract

Cytometry is frequently used in immunological research, pre-clinical trials, clinical diagnosis, and monitoring of lymphomas, leukemia, and AIDS. However, analysis of modern high-throughput cytometric data presents great challenges for current computational tools because of its high dimensionality, large number of observations, and complex distributional features such as multimodality, asymmetry, and other non-normal characteristics. This chapter proposes a novel statistical approach that automatically clusters high-dimensional cytometry data and performs implicit dimension reduction. Our approach is also robust to non-normal distributional features, such as heterogeneity and skewness, that are typical of flow and mass cytometry data. By adopting a factor analytic model-based approach, the proposed framework is able to learn latent nonlinear low-dimensional representations of the data. It thus allows automatic segmentation of cell populations and quantification of the relative importance of each marker, while also facilitating visualization in low-dimensional space. The effectiveness of our approach is demonstrated on a large mass cytometry dataset, where it outperforms existing benchmark algorithms.


Author information

Corresponding author

Correspondence to Sharon X. Lee.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lee, S.X., McLachlan, G.J., Pyne, S. (2021). Automated Gating and Dimension Reduction of High-Dimensional Cytometry Data. In: Molina-París, C., Lythe, G. (eds) Mathematical, Computational and Experimental T Cell Immunology. Springer, Cham. https://doi.org/10.1007/978-3-030-57204-4_16
