The Analysis of Human Serum Albumin Proteoforms Using Compositional Framework

Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry

Part of the book series: Frontiers in Probability and the Statistical Sciences ((FROPROSTAS))

Abstract

Mass spectrometric immunoassays (MSIA) can now measure multiple modified forms of a protein in large cohorts of patients. These measurements consist of the relative abundances of proteoforms and are well suited to the compositional data analysis framework. In this article, we describe an approach to analyzing the relative abundance of proteoforms from MSIA data using the compositional framework. We demonstrate the application of these concepts by exploring the association between human serum albumin’s posttranslational modifications and kidney function in patients with Type 2 diabetes mellitus. Finally, we discuss the pitfalls of ignoring the compositional nature of such data and highlight emerging applications that demonstrate the generality of the framework.

References

  1. Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 139–177. http://www.jstor.org/stable/10.2307/2345821.

  2. Aitchison, J. (1983). Principal component analysis of compositional data. Biometrika, 70(1), 57–65. http://biomet.oxfordjournals.org/content/70/1/57.short.

  3. Aitchison, J. (2001). Simplicial inference. In M. A. G. Viana & D. S. P. Richards (Eds.), Algebraic methods in statistics and probability. Contemporary mathematics (Vol. 287). Providence, RI: American Mathematical Society. doi:10.1090/conm/287. http://www.ams.org/conm/287/.

  4. Aitchison, J., & Greenacre, M. (2002). Biplots of compositional data. Applied Statistics, 51(4), 375–392. doi:10.1111/1467-9876.00275. http://dx.doi.org/10.1111/1467-9876.00275.

  5. Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261–272. http://biomet.oxfordjournals.org/content/67/2/261.short.

  6. Billheimer, D., Guttorp, P., & Fagan, W. F. (2001). Statistical interpretation of species composition. Journal of the American Statistical Association, 96(456), 1205–1214. doi:10.1198/016214501753381850. http://www.tandfonline.com/doi/abs/10.1198/016214501753381850.

  7. Borges, C. R., Rehder, D. S., Jensen, S., Schaab, M. R., Sherma, N. D., Yassine, H., et al. (2014). Elevated plasma albumin and apolipoprotein A-I oxidation under suboptimal specimen storage conditions. Molecular & Cellular Proteomics: MCP, 13(7), 1890–1899. doi:10.1074/mcp.M114.038455. http://www.mcponline.org/cgi/doi/10.1074/mcp.M114.038455.

  8. Chammas, R., Sonnenburg, J. L., Watson, N. E., Tai, T., Farquhar, M. G., Varki, N. M., et al. (1999). De-N-acetyl-gangliosides in humans: Unusual subcellular distribution of a novel tumor antigen. Cancer Research, 59(6), 1337–1346. http://cancerres.aacrjournals.org/content/59/6/1337.full.

  9. Chanturia, G., Birdsell, D. N., Kekelidze, M., Zhgenti, E., Babuadze, G., Tsertsvadze, N., et al. (2011). Phylogeography of Francisella tularensis subspecies holarctica from the country of Georgia. BMC Microbiology, 11(1), 139. doi:10.1186/1471-2180-11-139. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=21682874&retmode=ref&cmd=prlinks.

  10. Consortium, T. H. M. P. (2012). A framework for human microbiome research. Nature, 486(7402), 215–221. doi:10.1038/nature11209. http://dx.doi.org/10.1038/nature11209.

  11. Egozcue, J. J., & Barcelo-Vidal, C. (2011). Elements of simplicial linear algebra and geometry. In V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 141–156). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.

  12. Egozcue, J. J., & Pawlowsky-Glahn, V. (2006). Simplicial geometry for compositional data. Geological Society, London, Special Publications, 264(1), 145–159. http://sp.lyellcollection.org/content/264/1/145.short.

  13. Karve, T. M., & Cheema, A. K. (2011). Small changes huge impact: The role of protein posttranslational modifications in cellular homeostasis and disease. Journal of Amino Acids, 2011(2), 1–13. doi:10.4061/2011/207691. http://www.hindawi.com/journals/jaa/2011/207691/.

  14. Li, H. (2015). Microbiome, metagenomics, and high-dimensional compositional data analysis. Annual Review of Statistics and Its Application, 2(1), 73–94. doi:10.1146/annurev-statistics-010814-020351. http://dx.doi.org/10.1146/annurev-statistics-010814-020351.

  15. Lovell, D., Müller, W., Taylor, J., & Zwart, A. (2011). Proportions, percentages, ppm: Do the molecular biosciences treat compositional data right. In: V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis. New York: Wiley. http://books.google.com/books?hl=en&lr=&id=Ggpj3QeDoKQC&oi=fnd&pg=PT215&dq=Proportions+Percentages+PPM+Do+the+Molecular+BiosciencesTreat+Compositional+Data+Right&ots=cII3kxnfSb&sig=icwOFojg2zPXj2WPUj9IQ2K4MCk.

  16. Martín-Fernández, J. A., Palarea-Albaladejo, J., & Olea, R. A. (2011). Dealing with zeros. In: V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 43–58). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.

  17. Nagumo, K., Tanaka, M., Chuang, V. T. G., Setoyama, H., Watanabe, H., Yamada, N., et al. (2014). Cys34-cysteinylated human serum albumin is a sensitive plasma marker in oxidative stress-related chronic diseases. PloS One, 9(1), e85216. doi:10.1371/journal.pone.0085216. http://dx.plos.org/10.1371/journal.pone.0085216.

  18. Nelson, R. W., Krone, J. R., Bieber, A. L., & Williams, P. (1995). Mass-spectrometric immunoassay. Analytical Chemistry, 67(7), 1153–1158. doi:10.1021/ac00103a003. http://pubs.acs.org/doi/abs/10.1021/ac00103a003.

  19. Pallen, M. J. (2014). Diagnostic metagenomics: Potential applications to bacterial, viral and parasitic infections. Parasitology, 141(14), 1856–1862. doi:10.1017/S0031182014000134. http://www.journals.cambridge.org/abstract_S0031182014000134.

  20. Pawlowsky-Glahn, V., & Buccianti, A. (2011). Compositional data analysis. Theory and Applications. New York: Wiley. http://books.google.com/books?id=Ggpj3QeDoKQC&printsec=frontcover&dq=intitle:Compositional+Data+Analysis+Theory+and+Applications&hl=&cd=1&source=gbs_api.

  21. Pawlowsky-Glahn, V., & Egozcue, J. J. (2001). Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research and Risk Assessment, 15(5), 384–398. doi:10.1007/s004770100077. http://link.springer.com/10.1007/s004770100077.

  22. Peleg, S., Sananbenesi, F., Zovoilis, A., Burkhardt, S., Bahari-Javan, S., Agis-Balboa, R. C., et al. (2010). Altered histone acetylation is associated with age-dependent memory impairment in mice. Science, 328(5979), 753–756. doi:10.1126/science.1186088. http://www.sciencemag.org/content/328/5979/753.full.

  23. Teeling, H., & Glockner, F. O. (2012). Current opportunities and challenges in microbial metagenome analysis–a bioinformatic perspective. Briefings in Bioinformatics, 13(6), 728–742. doi:10.1093/bib/bbs039. http://bib.oxfordjournals.org/cgi/doi/10.1093/bib/bbs039.

  24. Thomas, T., Gilbert, J., & Meyer, F. (2012). Metagenomics - a guide from sampling to data analysis. Microbial Informatics and Experimentation, 2(1), 3. doi:10.1186/2042-5783-2-3. http://www.microbialinformaticsj.com/content/2/1/3.

  25. Trenchevska, O., Schaab, M. R., Nelson, R. W., & Nedelkov, D. (2015). Development of multiplex mass spectrometric immunoassay for detection and quantification of apolipoproteins C-I, C-II, C-III and their proteoforms. Methods, 1–7. doi:10.1016/j.ymeth.2015.02.020. http://dx.doi.org/10.1016/j.ymeth.2015.02.020.

  26. Vos, F. E., Schollum, J. B., & Walker, R. J. (2011). Glycated albumin is the preferred marker for assessing glycaemic control in advanced chronic kidney disease. Clinical Kidney Journal, 4(6), 368–375. doi:10.1093/ndtplus/sfr140. http://ckj.oxfordjournals.org/cgi/doi/10.1093/ndtplus/sfr140.

  27. Walsh, C. T., & Tsodikova, S. G. (2005). Protein posttranslational modifications: The chemistry of proteome diversifications. Angewandte Chemie International Edition in English. http://onlinelibrary.wiley.com/doi/10.1002/anie.200501023/full.

Acknowledgements

The authors thank Dr. Borges for guidance on the various mechanisms that can lead to higher proportions of cysteinylated albumin proteoforms in MSIA data. The authors also thank the National Institutes of Health (NIH) for supporting the work presented in this chapter under awards R24DK090958-01A1 and P30ES006694.

Author information

Corresponding author

Correspondence to Dean Billheimer.

Appendix

1.1 A Synopsis on Compositional Framework

Compositional data describe the proportion that each of D components contributes to the whole. Coherence of inference between a subset of the components and the whole composition, called subcompositional coherence, is an important feature of the analysis. Scale invariance is essential for such analysis.

A D-part composition is defined as an element of the d-dimensional positive simplex

$$\displaystyle{ \mathcal{S}^{\mathrm{d}} = \left \{\mathbf{x} = (x_{ 1},x_{2},\ldots,x_{D}): x_{i}> 0(i = 1,2,\ldots,D),\sum \limits _{i=1}^{D}x_{ i} = 1\right \} }$$
(5)

where d = D − 1; we use this convention throughout the Appendix. Thus the sample space is a subset of \(\mathcal{S}^{\mathrm{d}}\).

Let C denote the closure operator on \(\mathbb{R}^{\mathrm{D}}\), which normalizes a vector to unit sum. That is, for \(z \in \mathbb{R}^{\mathrm{D}}\) with positive components,

$$\displaystyle{ C(z) = \left ( \frac{z_{1}} {\sum _{i=1}^{D}z_{i}}, \frac{z_{2}} {\sum _{i=1}^{D}z_{i}},\ldots, \frac{z_{D}} {\sum _{i=1}^{D}z_{i}}\right ) \in \mathcal{S}^{\mathrm{d}} }$$
(6)
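As an illustration of Eq. (6), the closure operator can be sketched in a few lines of Python. The function name `closure` and the intensity values are hypothetical and chosen only for illustration; we assume strictly positive inputs so that the result lies in the positive simplex.

```python
from typing import List, Sequence


def closure(z: Sequence[float]) -> List[float]:
    """Normalize a vector of strictly positive values to unit sum (Eq. 6)."""
    if any(v <= 0 for v in z):
        raise ValueError("all components must be strictly positive")
    total = sum(z)
    return [v / total for v in z]


# Hypothetical raw intensities for three proteoforms (illustrative only).
raw = [2.0, 6.0, 12.0]
composition = closure(raw)

# Rescaling the raw vector leaves the composition unchanged (scale invariance).
rescaled = closure([10.0 * v for v in raw])
```

Because the closure divides by the total, any positive rescaling of the raw measurements maps to the same point of the simplex.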

We also define two additional operations. For any two elements \(x = (x_{1},x_{2},\ldots,x_{\mathrm{D}})\) and \(y = (y_{1},y_{2},\ldots,y_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}\) and for \(\alpha \in \mathbb{R}\), we define

$$\displaystyle{ x \oplus y = C((x_{1} \cdot y_{1},x_{2} \cdot y_{2},\ldots,x_{D} \cdot y_{D})) }$$
(7)
$$\displaystyle{ \alpha \odot x = C((x_{1}^{\alpha },x_{ 2}^{\alpha },\ldots,x_{ D}^{\alpha })). }$$
(8)

The operations in Eqs. (7) and (8) are called the perturbation and power operators, respectively. With the perturbation operator as addition and the power operator as scalar multiplication, \(\mathcal{S}^{\mathrm{d}}\) acquires the structure of a d-dimensional Hilbert space [6, 21] with the metric given by (2).
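The vector-space behaviour of Eqs. (7) and (8) can be checked numerically. The sketch below is illustrative only: the function names `perturb` and `power` and the example compositions are our own assumptions, not part of the chapter.

```python
from typing import List, Sequence


def closure(z: Sequence[float]) -> List[float]:
    """Normalize strictly positive values to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]


def perturb(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Perturbation (Eq. 7): component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(alpha: float, x: Sequence[float]) -> List[float]:
    """Power operation (Eq. 8): component-wise exponentiation, then closure."""
    return closure([a ** alpha for a in x])


# Hypothetical 3-part compositions (illustrative values only).
x = [0.1, 0.3, 0.6]
y = [0.2, 0.5, 0.3]

# The uniform composition plays the role of the zero vector.
neutral = closure([1.0, 1.0, 1.0])

x_plus_y = perturb(x, y)
# Perturbing x by (-1) "times" x returns the neutral element.
x_minus_x = perturb(x, power(-1.0, x))
```

In this sketch the uniform composition acts as the additive identity, and powering by −1 gives the additive inverse, as expected for a vector space.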

The additive log ratio transform defined as

$$\displaystyle{ \begin{array}{rl} \phi: \mathcal{S}^{\mathrm{d}} & \rightarrow \mathbb{R}^{\mathrm{d}} \\ x&\longmapsto \left (\log \left ( \dfrac{x_{1}} {x_{D}}\right ),\log \left ( \dfrac{x_{2}} {x_{D}}\right ),\ldots,\log \left ( \dfrac{x_{d}} {x_{D}}\right )\right ) \end{array} }$$
(9)

and the centered log ratio (3) are alternative co-ordinate systems on this space.
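A minimal Python sketch of these coordinate systems follows. The additive log ratio function mirrors Eq. (9). For the centered log ratio we assume the standard definition, the log of each part divided by the geometric mean of all parts, since Eq. (3) is not reproduced in this Appendix. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def closure(z: Sequence[float]) -> List[float]:
    """Normalize strictly positive values to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]


def alr(x: Sequence[float]) -> List[float]:
    """Additive log ratio (Eq. 9): log of each part relative to the last part."""
    ref = x[-1]
    return [math.log(v / ref) for v in x[:-1]]


def alr_inverse(y: Sequence[float]) -> List[float]:
    """Map alr coordinates in R^d back to a composition in the simplex."""
    return closure([math.exp(v) for v in y] + [1.0])


def clr(x: Sequence[float]) -> List[float]:
    """Centered log ratio, assuming the standard geometric-mean definition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical 3-part composition (illustrative values only).
x = [0.1, 0.3, 0.6]
coords = alr(x)                  # d = D - 1 = 2 coordinates
recovered = alr_inverse(coords)  # round trip back to the simplex
centered = clr(x)                # D coordinates that sum to zero
```

The round trip through `alr` and `alr_inverse` recovers the original composition, which is what makes it possible to model the data in \(\mathbb{R}^{\mathrm{d}}\) and map results back to the simplex.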

The dependence structure induced by the unit sum (compositional) constraint is often addressed by modeling the data with the class of logistic normal distributions. For illustration, consider an element \(x = (x_{1},x_{2},\ldots,x_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}\). Following Aitchison [1], the pullback of the multivariate normal distribution using ϕ from Eq. (9) is the logistic normal density function given by:

$$\displaystyle{ f(x\vert \mu,\varSigma ) = (2\pi )^{-d/2}\vert \varSigma \vert ^{-1/2}\left ( \dfrac{1} {\prod _{i=1}^{D}x_{i}}\right )\mathrm{exp}{\biggl [ -\dfrac{1} {2}(\phi (x)-\mu )'\varSigma ^{-1}(\phi (x)-\mu )\biggr ]} }$$
(10)

where μ is the location parameter in \(\mathbb{R}^{\mathrm{d}}\), Σ is the d × d variance–covariance matrix, and \(\left (\prod _{i=1}^{D}x_{i}\right )^{-1}\) is the Jacobian of the transformation. In the following, we denote this d-dimensional logistic normal distribution by \(\mathcal{L}\mathcal{N}_{\mathrm{d}}\), and A′ denotes the transpose of a matrix A.
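The logistic normal density can be evaluated as sketched below. This is a minimal illustration, not the authors' implementation. It uses the multivariate normal density of the additive log ratio coordinates, with normalizing constant \((2\pi)^{-d/2}\vert\varSigma\vert^{-1/2}\), multiplied by the Jacobian taken over all D parts. The function names and parameter values are hypothetical, and the covariance matrix is assumed to be symmetric positive definite.

```python
import math
from typing import List, Sequence


def alr(x: Sequence[float]) -> List[float]:
    """Additive log ratio (Eq. 9) with the last part as reference."""
    return [math.log(v / x[-1]) for v in x[:-1]]


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)  # fails if not positive definite
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def logistic_normal_logpdf(x, mu, sigma) -> float:
    """Log density of the logistic normal distribution at composition x."""
    d = len(mu)
    resid = [a - b for a, b in zip(alr(x), mu)]
    low = cholesky(sigma)
    # Forward substitution solves low * w = resid; the quadratic form is w'w.
    w: List[float] = []
    for i in range(d):
        s = sum(low[i][k] * w[k] for k in range(i))
        w.append((resid[i] - s) / low[i][i])
    quad = sum(v * v for v in w)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    log_jacobian = -sum(math.log(v) for v in x)  # Jacobian over all D parts
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * log_det + log_jacobian - 0.5 * quad


# Hypothetical parameters for a 3-part composition (d = 2), illustrative only.
center = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
mu = [0.0, 0.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]
density_at_center = math.exp(logistic_normal_logpdf(center, mu, sigma))
```

Working on the log scale and using a Cholesky factor avoids forming \(\varSigma^{-1}\) explicitly; in practice a numerical library would normally be used for these linear algebra steps.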

A subcomposition of S parts is an orthogonal projection of the full composition with respect to the inner product on \(\mathcal{S}^{\mathrm{d}}\) [12]. The class-preserving property of logistic normal distributions, i.e., if \(x \in \mathcal{L}\mathcal{N}_{D}(\mu,\varSigma )\) and A is an n × D matrix, then \(Ax \in \mathcal{L}\mathcal{N}_{n}(A\mu,A\varSigma A')\) [5], ensures that the orthogonal projections satisfy the property of subcompositional coherence.
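One practical consequence of subcompositional coherence can be seen in a small numerical sketch: log ratios among the retained parts do not change when the other parts are dropped and the remainder is re-closed. The helper `subcomposition` and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def closure(z: Sequence[float]) -> List[float]:
    """Normalize strictly positive values to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]


def subcomposition(x: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Retain the parts indexed by `keep` and re-close them to unit sum."""
    return closure([x[i] for i in keep])


# Hypothetical 3-part composition; keep only the first two parts.
full = [0.1, 0.3, 0.6]
sub = subcomposition(full, [0, 1])

# The log ratio between parts 1 and 2 is identical in both compositions.
log_ratio_full = math.log(full[0] / full[1])
log_ratio_sub = math.log(sub[0] / sub[1])
```

The proportions themselves change (0.1 becomes 0.25 here), which is why analyses based on raw proportions can give conflicting conclusions across subsets of parts, whereas analyses based on log ratios do not.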

A comprehensive review of the analytical techniques and applications of the compositional framework is available in Pawlowsky-Glahn and Buccianti [20].

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sinari, S., Nedelkov, D., Reaven, P., Billheimer, D. (2017). The Analysis of Human Serum Albumin Proteoforms Using Compositional Framework. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_8
