Abstract
Mass spectrometric immunoassays (MSIA) can now measure multiple modified forms of a protein in large cohorts of patients. These measurements consist of the relative abundances of proteoforms and are therefore well suited to the statistical framework of compositional data analysis. In this article, we describe an approach to analyzing the relative abundances of proteoforms from MSIA data within the compositional framework. We demonstrate the application of these concepts by exploring the association between posttranslational modifications of human serum albumin and kidney function in patients with Type 2 diabetes mellitus. Finally, we discuss the pitfalls of ignoring the compositional nature of such data and highlight emerging applications that demonstrate the generality of the framework.
References
Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 139–177. http://www.jstor.org/stable/10.2307/2345821.
Aitchison, J. (1983). Principal component analysis of compositional data. Biometrika, 70(1), 57–65. http://biomet.oxfordjournals.org/content/70/1/57.short.
Aitchison, J. (2001). Simplicial inference. In M. A. G. Viana & D. S. P. Richards (Eds.), Algebraic methods in statistics and probability. Contemporary mathematics (Vol. 287). Providence, RI: American Mathematical Society. doi:10.1090/conm/287. http://www.ams.org/conm/287/.
Aitchison, J., & Greenacre, M. (2002). Biplots of compositional data. Applied Statistics, 51(4), 375–392. doi:10.1111/1467-9876.00275. http://dx.doi.org/10.1111/1467-9876.00275.
Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261–272. http://biomet.oxfordjournals.org/content/67/2/261.short.
Billheimer, D., Guttorp, P., & Fagan, W. F. (2001). Statistical interpretation of species composition. Journal of the American Statistical Association, 96(456), 1205–1214. doi:10.1198/016214501753381850. http://www.tandfonline.com/doi/abs/10.1198/016214501753381850.
Borges, C. R., Rehder, D. S., Jensen, S., Schaab, M. R., Sherma, N. D., Yassine, H., et al. (2014). Elevated plasma albumin and apolipoprotein A-I oxidation under suboptimal specimen storage conditions. Molecular & Cellular Proteomics: MCP, 13(7), 1890–1899. doi:10.1074/mcp.M114.038455. http://www.mcponline.org/cgi/doi/10.1074/mcp.M114.038455.
Chammas, R., Sonnenburg, J. L., Watson, N. E., Tai, T., Farquhar, M. G., Varki, N. M., et al. (1999). De-N-acetyl-gangliosides in humans: Unusual subcellular distribution of a novel tumor antigen. Cancer Research, 59(6), 1337–1346. http://cancerres.aacrjournals.org/content/59/6/1337.full.
Chanturia, G., Birdsell, D. N., Kekelidze, M., Zhgenti, E., Babuadze, G., Tsertsvadze, N., et al. (2011). Phylogeography of Francisella tularensis subspecies holarctica from the country of Georgia. BMC Microbiology, 11(1), 139. doi:10.1186/1471-2180-11-139. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=21682874&retmode=ref&cmd=prlinks.
The Human Microbiome Project Consortium. (2012). A framework for human microbiome research. Nature, 486(7402), 215–221. doi:10.1038/nature11209. http://dx.doi.org/10.1038/nature11209.
Egozcue, J. J., & Barcelo-Vidal, C. (2011). Elements of simplicial linear algebra and geometry. In V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 141–156). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.
Egozcue, J. J., & Pawlowsky-Glahn, V. (2006). Simplicial geometry for compositional data. Geological Society, London, Special Publications, 264(1), 145–159. http://sp.lyellcollection.org/content/264/1/145.short.
Karve, T. M., & Cheema, A. K. (2011). Small changes huge impact: The role of protein posttranslational modifications in cellular homeostasis and disease. Journal of Amino Acids, 2011(2), 1–13. doi:10.4061/2011/207691. http://www.hindawi.com/journals/jaa/2011/207691/.
Li, H. (2015). Microbiome, metagenomics, and high-dimensional compositional data analysis. Annual Review of Statistics and Its Application, 2(1), 73–94. doi:10.1146/annurev-statistics-010814-020351. http://dx.doi.org/10.1146/annurev-statistics-010814-020351.
Lovell, D., Müller, W., Taylor, J., & Zwart, A. (2011). Proportions, percentages, ppm: Do the molecular biosciences treat compositional data right? In V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis. New York: Wiley. http://books.google.com/books?id=Ggpj3QeDoKQC.
Martín-Fernández, J. A., Palarea-Albaladejo, J., & Olea, R. A. (2011). Dealing with zeros. In V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 43–58). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.
Nagumo, K., Tanaka, M., Chuang, V. T. G., Setoyama, H., Watanabe, H., Yamada, N., et al. (2014). Cys34-cysteinylated human serum albumin is a sensitive plasma marker in oxidative stress-related chronic diseases. PLoS One, 9(1), e85216. doi:10.1371/journal.pone.0085216. http://dx.plos.org/10.1371/journal.pone.0085216.
Nelson, R. W., Krone, J. R., Bieber, A. L., & Williams, P. (1995). Mass-spectrometric immunoassay. Analytical Chemistry, 67(7), 1153–1158. doi:10.1021/ac00103a003. http://pubs.acs.org/doi/abs/10.1021/ac00103a003.
Pallen, M. J. (2014). Diagnostic metagenomics: Potential applications to bacterial, viral and parasitic infections. Parasitology, 141(14), 1856–1862. doi:10.1017/S0031182014000134. http://www.journals.cambridge.org/abstract_S0031182014000134.
Pawlowsky-Glahn, V., & Buccianti, A. (Eds.). (2011). Compositional data analysis: Theory and applications. New York: Wiley. http://books.google.com/books?id=Ggpj3QeDoKQC.
Pawlowsky-Glahn, V., & Egozcue, J. J. (2001). Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research and Risk Assessment, 15(5), 384–398. doi:10.1007/s004770100077. http://link.springer.com/10.1007/s004770100077.
Peleg, S., Sananbenesi, F., Zovoilis, A., Burkhardt, S., Bahari-Javan, S., Agis-Balboa, R. C., et al. (2010). Altered histone acetylation is associated with age-dependent memory impairment in mice. Science, 328(5979), 753–756. doi:10.1126/science.1186088. http://www.sciencemag.org/content/328/5979/753.full.
Teeling, H., & Glöckner, F. O. (2012). Current opportunities and challenges in microbial metagenome analysis - a bioinformatic perspective. Briefings in Bioinformatics, 13(6), 728–742. doi:10.1093/bib/bbs039. http://bib.oxfordjournals.org/cgi/doi/10.1093/bib/bbs039.
Thomas, T., Gilbert, J., & Meyer, F. (2012). Metagenomics - a guide from sampling to data analysis. Microbial Informatics and Experimentation, 2(1), 3. doi:10.1186/2042-5783-2-3. http://www.microbialinformaticsj.com/content/2/1/3.
Trenchevska, O., Schaab, M. R., Nelson, R. W., & Nedelkov, D. (2015). Development of multiplex mass spectrometric immunoassay for detection and quantification of apolipoproteins C-I, C-II, C-III and their proteoforms. Methods, 1–7. doi:10.1016/j.ymeth.2015.02.020. http://dx.doi.org/10.1016/j.ymeth.2015.02.020.
Vos, F. E., Schollum, J. B., & Walker, R. J. (2011). Glycated albumin is the preferred marker for assessing glycaemic control in advanced chronic kidney disease. Clinical Kidney Journal, 4(6), 368–375. doi:10.1093/ndtplus/sfr140. http://ckj.oxfordjournals.org/cgi/doi/10.1093/ndtplus/sfr140.
Walsh, C. T., & Tsodikova, S. G. (2005). Protein posttranslational modifications: The chemistry of proteome diversifications. Angewandte Chemie International Edition in English. http://onlinelibrary.wiley.com/doi/10.1002/anie.200501023/full.
Acknowledgements
The authors thank Dr. Borges for guidance on the various mechanisms that can lead to higher proportions of cysteinylated albumin proteoforms in MSIA data. This work was supported by the National Institutes of Health (NIH) under award numbers R24DK090958-01A1 and P30ES006694.
Appendix
1.1 A Synopsis of the Compositional Framework
Compositional data describe the proportion that each of D components contributes to the whole. Coherence of inference between a subset of components and the whole composition, known as subcompositional coherence, is an important feature of the analysis. Scale invariance, meaning that conclusions do not change when all components of a sample are multiplied by the same positive constant, is also essential.
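A minimal Python sketch (standard library only) of these two properties follows. The raw intensities are hypothetical numbers chosen for illustration, not MSIA measurements from this chapter, and the helper names are our own. The sketch checks that a log-ratio between two parts is unchanged when every raw value is rescaled, and when the composition is reduced to a subcomposition and re-closed.

```python
import math


def close(z):
    """Closure: rescale a vector of positive parts so that it sums to one."""
    total = sum(z)
    return [zi / total for zi in z]


def log_ratio(x, i, j):
    """Natural log of the ratio between parts i and j."""
    return math.log(x[i] / x[j])


# Hypothetical raw intensities for three proteoforms (illustrative only).
raw = [120.0, 45.0, 15.0]
scaled = [10.0 * v for v in raw]   # same sample measured on a different overall scale

full = close(raw)
full_scaled = close(scaled)
sub = close(raw[:2])               # subcomposition of the first two parts, re-closed

# All three lines print the same log-ratio between parts 0 and 1.
print(round(log_ratio(full, 0, 1), 6))
print(round(log_ratio(full_scaled, 0, 1), 6))
print(round(log_ratio(sub, 0, 1), 6))
```

Because every comparison is made through ratios of parts, neither the overall scale nor the presence of the third part affects the result.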
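A D-part composition is defined as an element of the d-dimensional positive simplex
\[ \mathcal{S}^{\mathrm{d}} = \left\{ x = (x_{1},x_{2},\ldots,x_{\mathrm{D}}) : x_{i} > 0,\ i = 1,\ldots,\mathrm{D};\ \sum_{i=1}^{\mathrm{D}} x_{i} = 1 \right\}, \]
where d = D − 1. We use the convention d = D − 1 throughout the rest of the Appendix. Thus the sample space of a D-part composition is \(\mathcal{S}^{\mathrm{d}}\).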
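Let \(\mathcal{C}\) denote the closure operator on \(\mathbb{R}_{+}^{\mathrm{D}}\) (vectors with strictly positive components), which rescales a vector to unit sum. That is, for \(z = (z_{1},z_{2},\ldots,z_{\mathrm{D}}) \in \mathbb{R}_{+}^{\mathrm{D}}\),
\[ \mathcal{C}(z) = \left( \frac{z_{1}}{\sum_{i=1}^{\mathrm{D}} z_{i}}, \frac{z_{2}}{\sum_{i=1}^{\mathrm{D}} z_{i}}, \ldots, \frac{z_{\mathrm{D}}}{\sum_{i=1}^{\mathrm{D}} z_{i}} \right). \]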
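The simplex and the closure operator can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: the names `close` and `in_simplex` are hypothetical, and the sketch assumes every part is strictly positive (zero abundances would need separate handling).

```python
def close(z):
    """Closure operator C: divide each strictly positive part by the total."""
    if any(zi <= 0 for zi in z):
        raise ValueError("this sketch only handles strictly positive parts")
    total = sum(z)
    return [zi / total for zi in z]


def in_simplex(x, tol=1e-12):
    """True if every part is strictly positive and the parts sum to one (within tol)."""
    return all(xi > 0 for xi in x) and abs(sum(x) - 1.0) <= tol


z = [2.0, 3.0, 5.0]        # an arbitrary positive vector with D = 3 parts
x = close(z)
print(x)                   # [0.2, 0.3, 0.5]
print(in_simplex(z))       # False: the parts sum to 10
print(in_simplex(x))       # True
```

Any positive multiple of `z` closes to the same point of the simplex, which is the scale invariance described at the start of this Appendix.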
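We also define two additional operations. For any two elements \(x = (x_{1},x_{2},\ldots,x_{\mathrm{D}})\) and \(y = (y_{1},y_{2},\ldots,y_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}\) and for \(\alpha \in \mathbb{R}\), we define
\[ x \oplus y = \mathcal{C}(x_{1}y_{1}, x_{2}y_{2}, \ldots, x_{\mathrm{D}}y_{\mathrm{D}}), \tag{7} \]
\[ \alpha \odot x = \mathcal{C}(x_{1}^{\alpha}, x_{2}^{\alpha}, \ldots, x_{\mathrm{D}}^{\alpha}). \tag{8} \]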
The operations in Eqs. (7) and (8) are called the perturbation and power operators, respectively. With perturbation as vector addition and the power operator as scalar multiplication, \(\mathcal{S}^{\mathrm{d}}\) acquires the structure of a d-dimensional Hilbert space [6, 21], with the metric given by Eq. (2).
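As a hedged illustration of this vector-space structure, the Python sketch below implements perturbation and powering as closures of component-wise products and powers. The function names and example compositions are hypothetical. It checks three properties numerically: the uniform composition acts as the neutral element, powering by −1 gives the perturbation inverse, and powering distributes over perturbation.

```python
def close(z):
    """Closure: rescale positive parts to unit sum."""
    total = sum(z)
    return [zi / total for zi in z]


def perturb(x, y):
    """Perturbation x (+) y: closure of the component-wise product (simplex 'addition')."""
    return close([xi * yi for xi, yi in zip(x, y)])


def power(alpha, x):
    """Power transformation alpha (.) x: closure of component-wise powers (scalar multiplication)."""
    return close([xi ** alpha for xi in x])


x = close([1.0, 2.0, 7.0])
y = close([3.0, 3.0, 4.0])
neutral = close([1.0, 1.0, 1.0])     # neutral element of perturbation: (1/3, 1/3, 1/3)

print(perturb(x, y))
print(perturb(x, neutral))            # equals x up to floating-point rounding
print(perturb(x, power(-1.0, x)))     # x perturbed by its inverse gives the neutral element
```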
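The additive log-ratio transform, which maps \(\mathcal{S}^{\mathrm{d}}\) onto \(\mathbb{R}^{\mathrm{d}}\) and is defined as
\[ \phi(x) = \left( \log\frac{x_{1}}{x_{\mathrm{D}}}, \log\frac{x_{2}}{x_{\mathrm{D}}}, \ldots, \log\frac{x_{\mathrm{d}}}{x_{\mathrm{D}}} \right), \tag{9} \]
and the centered log-ratio transform (3) are alternative coordinate systems on this space.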
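The two coordinate systems can be sketched in Python as follows. The alr follows Eq. (9), with the last part as the reference. Because Eq. (3) is not reproduced in this Appendix, the clr here assumes the standard form, the log of each part over the geometric mean of all D parts. Function names and the example composition are hypothetical.

```python
import math


def close(z):
    """Closure: rescale positive parts to unit sum."""
    total = sum(z)
    return [zi / total for zi in z]


def alr(x):
    """Additive log-ratio phi(x): log of each of the first d parts over the last part."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def alr_inv(u):
    """Inverse alr: exponentiate, append 1 for the reference part, then close."""
    return close([math.exp(ui) for ui in u] + [1.0])


def clr(x):
    """Centered log-ratio (assumed standard form): log of each part over the geometric mean."""
    log_gmean = sum(math.log(xi) for xi in x) / len(x)
    return [math.log(xi) - log_gmean for xi in x]


x = close([1.0, 2.0, 7.0])
u = alr(x)              # d = 2 unconstrained real coordinates
print(u)
print(alr_inv(u))       # recovers x up to floating-point rounding
print(clr(x))           # D = 3 coordinates that sum to zero
```

The alr gives d unconstrained coordinates but depends on the choice of reference part. The clr is symmetric in the parts but yields D coordinates constrained to sum to zero.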
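The dependence structure induced by the unit-sum (compositional) constraint is often addressed by modeling the data with the class of logistic normal distributions. For illustration, consider an element \(x = (x_{1},x_{2},\ldots,x_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}\). Following Aitchison [1], the pullback of the multivariate normal distribution under ϕ from Eq. (9) is the logistic normal density function
\[ f(x \mid \mu, \varSigma) = \left|2\pi\varSigma\right|^{-1/2} \left( \prod_{i=1}^{\mathrm{D}} x_{i} \right)^{-1} \exp\left\{ -\frac{1}{2} \left(\phi(x) - \mu\right)' \varSigma^{-1} \left(\phi(x) - \mu\right) \right\}, \quad x \in \mathcal{S}^{\mathrm{d}}, \]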
where μ ∈ \(\mathbb{R}^{\mathrm{d}}\) is the location parameter, Σ is the d × d variance–covariance matrix, and \(\left(\prod_{i=1}^{\mathrm{D}} x_{i}\right)^{-1}\) is the Jacobian of the transformation. In the following, we denote this d-dimensional logistic normal distribution by \(\mathcal{L}\mathcal{N}_{\mathrm{d}}(\mu,\varSigma)\), and A′ denotes the transpose of a matrix A.
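The density can be evaluated directly from this definition. The plain-Python sketch below computes the multivariate normal log-density of ϕ(x) and adds the log of the Jacobian \(\left(\prod_{i=1}^{\mathrm{D}} x_{i}\right)^{-1}\). The small Gaussian-elimination helper `solve_and_logdet` is a hypothetical stand-in for a numerical linear algebra library, and the parameter values are made up for illustration.

```python
import math


def alr(x):
    """Additive log-ratio phi(x), using the last part as the common denominator."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def solve_and_logdet(a, b):
    """Solve a s = b by Gaussian elimination with partial pivoting; return (s, log|det a|).

    Intended for a small, symmetric positive-definite covariance matrix.
    """
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented copy; inputs untouched
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    s = [0.0] * n
    for r in reversed(range(n)):
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s, logdet


def logistic_normal_pdf(x, mu, sigma):
    """Density of LN_d(mu, sigma) at a composition x with D = d + 1 strictly positive parts.

    Multivariate normal density of phi(x) times the Jacobian (prod_i x_i)^(-1).
    """
    d = len(mu)
    r = [ui - mi for ui, mi in zip(alr(x), mu)]          # phi(x) - mu
    s, logdet = solve_and_logdet(sigma, r)                # s = sigma^{-1} (phi(x) - mu)
    quad = sum(ri * si for ri, si in zip(r, s))           # (phi(x) - mu)' sigma^{-1} (phi(x) - mu)
    log_normal = -0.5 * (d * math.log(2.0 * math.pi) + logdet + quad)
    log_jacobian = -sum(math.log(xi) for xi in x)
    return math.exp(log_normal + log_jacobian)


# Hypothetical parameters for D = 3 parts (d = 2); not estimates from this chapter.
mu = [0.5, -0.2]
sigma = [[0.30, 0.05],
         [0.05, 0.20]]
print(logistic_normal_pdf([0.5, 0.2, 0.3], mu, sigma))
```

Working on the log scale and exponentiating once at the end avoids underflow when the product of the parts is very small.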
An S-part subcomposition is an orthogonal projection of the full composition with respect to the inner product on \(\mathcal{S}^{\mathrm{d}}\) [12]. The class-preserving property of logistic normal distributions [5] ensures that such orthogonal projections satisfy subcompositional coherence: if \(x \sim \mathcal{L}\mathcal{N}_{\mathrm{d}}(\mu,\varSigma)\) and A is an n × d matrix, then the composition y defined by \(\phi(y) = A\phi(x)\) satisfies \(y \sim \mathcal{L}\mathcal{N}_{n}(A\mu,A\varSigma A')\).
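One way to see the class-preserving property at work is in alr coordinates. In the Python sketch below, the composition and parameters are hypothetical, and the matrix `A` is specific to taking parts 1 to 3 of a four-part composition. The alr coordinates of the re-closed subcomposition are an exact linear function of the alr coordinates of the full composition. A normal model \(N_3(\mu, \varSigma)\) for the full coordinates therefore implies the normal model \(N_2(A\mu, A\varSigma A')\) for the subcomposition. This is an illustrative reading, not the construction used in [5] or [12].

```python
import math


def close(z):
    """Closure: rescale positive parts to unit sum."""
    total = sum(z)
    return [zi / total for zi in z]


def alr(x):
    """Additive log-ratio with the last part as the common denominator."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


x = close([4.0, 1.0, 3.0, 2.0])   # D = 4 parts, d = 3
sub = close(x[:3])                # subcomposition of parts 1-3, re-closed

# alr(sub) uses part 3 as denominator, alr(x) uses part 4:
# log(x1/x3) = phi_1 - phi_3 and log(x2/x3) = phi_2 - phi_3.
A = [[1.0, 0.0, -1.0],
     [0.0, 1.0, -1.0]]
print(alr(sub))
print(matvec(A, alr(x)))          # same coordinates up to floating-point rounding

# Hypothetical normal parameters for phi(x); the subcomposition inherits A mu and A Sigma A'.
mu = [0.2, -0.1, 0.4]
Sigma = [[0.3, 0.1, 0.0],
         [0.1, 0.2, 0.05],
         [0.0, 0.05, 0.4]]
print(matvec(A, mu))
print(matmul(matmul(A, Sigma), transpose(A)))
```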
A comprehensive review of analytical techniques and applications of the compositional framework is available in the book by Pawlowsky-Glahn and Buccianti [20].
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sinari, S., Nedelkov, D., Reaven, P., Billheimer, D. (2017). The Analysis of Human Serum Albumin Proteoforms Using Compositional Framework. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_8
DOI: https://doi.org/10.1007/978-3-319-45809-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)