The Analysis of Human Serum Albumin Proteoforms Using Compositional Framework

Sinari, Shripad; Nedelkov, Dobrin; Reaven, Peter; Billheimer, Dean

doi:10.1007/978-3-319-45809-0_8

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Mass spectrometric immunoassays (MSIA) can now measure multiple modified forms of a protein in large cohorts of patients. These measurements consist of the relative abundances of proteoforms and are well suited to the statistical framework of compositional data analysis. In this article, we describe an approach to analyzing the relative abundances of proteoforms from MSIA data using the compositional framework. We demonstrate the application of these concepts by exploring the association between posttranslational modifications of human serum albumin and kidney function in patients with Type 2 diabetes mellitus. Finally, we discuss the pitfalls of ignoring the compositional nature of such data and highlight emerging applications that demonstrate the generality of the framework.

References

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 139–177. http://www.jstor.org/stable/10.2307/2345821.
Aitchison, J. (1983). Principal component analysis of compositional data. Biometrika, 70(1), 57–65. http://biomet.oxfordjournals.org/content/70/1/57.short.
Aitchison, J. (2001). Simplicial inference. In M. A. G. Viana & D. S. P. Richards (Eds.), Algebraic methods in statistics and probability. Contemporary mathematics (Vol. 287). Providence, RI: American Mathematical Society. doi:10.1090/conm/287. http://www.ams.org/conm/287/.
Aitchison, J., & Greenacre, M. (2002). Biplots of compositional data. Applied Statistics, 51(4), 375–392. doi:10.1111/1467-9876.00275. http://dx.doi.org/10.1111/1467-9876.00275.
Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261–272. http://biomet.oxfordjournals.org/content/67/2/261.short.
Billheimer, D., Guttorp, P., & Fagan, W. F. (2001). Statistical interpretation of species composition. Journal of the American Statistical Association, 96(456), 1205–1214. doi:10.1198/016214501753381850. http://www.tandfonline.com/doi/abs/10.1198/016214501753381850.
Borges, C. R., Rehder, D. S., Jensen, S., Schaab, M. R., Sherma, N. D., Yassine, H., et al. (2014). Elevated plasma albumin and apolipoprotein A-I oxidation under suboptimal specimen storage conditions. Molecular & Cellular Proteomics: MCP, 13(7), 1890–1899. doi:10.1074/mcp.M114.038455. http://www.mcponline.org/cgi/doi/10.1074/mcp.M114.038455.
Chammas, R., Sonnenburg, J. L., Watson, N. E., Tai, T., Farquhar, M. G., Varki, N. M., et al. (1999). De-N-acetyl-gangliosides in humans: Unusual subcellular distribution of a novel tumor antigen. Cancer Research, 59(6), 1337–1346. http://cancerres.aacrjournals.org/content/59/6/1337.full.
Chanturia, G., Birdsell, D. N., Kekelidze, M., Zhgenti, E., Babuadze, G., Tsertsvadze, N., et al. (2011). Phylogeography of Francisella tularensis subspecies holarctica from the country of Georgia. BMC Microbiology, 11(1), 139. doi:10.1186/1471-2180-11-139. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=21682874&retmode=ref&cmd=prlinks.
Consortium, T. H. M. P. (2012). A framework for human microbiome research. Nature, 486(7402), 215–221. doi:10.1038/nature11209. http://dx.doi.org/10.1038/nature11209.
Egozcue, J. J., & Barcelo-Vidal, C. (2011). Elements of simplicial linear algebra and geometry. In V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 141–156). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.
Egozcue, J. J., & Pawlowsky-Glahn, V. (2006). Simplicial geometry for compositional data. Geological Society, London, Special Publications, 264(1), 145–159. http://sp.lyellcollection.org/content/264/1/145.short.
Karve, T. M., & Cheema, A. K. (2011). Small changes huge impact: The role of protein posttranslational modifications in cellular homeostasis and disease. Journal of Amino Acids, 2011(2), 1–13. doi:10.4061/2011/207691. http://www.hindawi.com/journals/jaa/2011/207691/.
Li, H. (2015). Microbiome, metagenomics, and high-dimensional compositional data analysis. Annual Review of Statistics and Its Application, 2(1), 73–94. doi:10.1146/annurev-statistics-010814-020351. http://dx.doi.org/10.1146/annurev-statistics-010814-020351.
Lovell, D., Müller, W., Taylor, J., & Zwart, A. (2011). Proportions, percentages, ppm: Do the molecular biosciences treat compositional data right. In: V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis. New York: Wiley. http://books.google.com/books?hl=en&lr=&id=Ggpj3QeDoKQC&oi=fnd&pg=PT215&dq=Proportions+Percentages+PPM+Do+the+Molecular+BiosciencesTreat+Compositional+Data+Right&ots=cII3kxnfSb&sig=icwOFojg2zPXj2WPUj9IQ2K4MCk.
Martín-Fernández, J. A., Palarea-Albaladejo, J., & Olea, R. A. (2011). Dealing with zeros. In: V. Pawlowsky-Glahn & A. Buccianti (Eds.), Compositional data analysis (pp. 43–58). New York: Wiley. doi:10.1002/9781119976462.ch4. http://dx.doi.org/10.1002/9781119976462.ch4.
Nagumo, K., Tanaka, M., Chuang, V. T. G., Setoyama, H., Watanabe, H., Yamada, N., et al. (2014). Cys34-cysteinylated human serum albumin is a sensitive plasma marker in oxidative stress-related chronic diseases. PLoS ONE, 9(1), e85216. doi:10.1371/journal.pone.0085216. http://dx.plos.org/10.1371/journal.pone.0085216.
Nelson, R. W., Krone, J. R., Bieber, A. L., & Williams, P. (1995). Mass-spectrometric immunoassay. Analytical Chemistry, 67(7), 1153–1158. doi:10.1021/ac00103a003. http://pubs.acs.org/doi/abs/10.1021/ac00103a003.
Pallen, M. J. (2014). Diagnostic metagenomics: Potential applications to bacterial, viral and parasitic infections. Parasitology, 141(14), 1856–1862. doi:10.1017/S0031182014000134. http://www.journals.cambridge.org/abstract_S0031182014000134.
Pawlowsky-Glahn, V., & Buccianti, A. (2011). Compositional data analysis. Theory and Applications. New York: Wiley. http://books.google.com/books?id=Ggpj3QeDoKQC&printsec=frontcover&dq=intitle:Compositional+Data+Analysis+Theory+and+Applications&hl=&cd=1&source=gbs_api.
Pawlowsky-Glahn, V., & Egozcue, J. J. (2001). Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research and Risk Assessment, 15(5), 384–398. doi:10.1007/s004770100077. http://link.springer.com/10.1007/s004770100077.
Peleg, S., Sananbenesi, F., Zovoilis, A., Burkhardt, S., Bahari-Javan, S., Agis-Balboa, R. C., et al. (2010). Altered histone acetylation is associated with age-dependent memory impairment in mice. Science, 328(5979), 753–756. doi:10.1126/science.1186088. http://www.sciencemag.org/content/328/5979/753.full.
Teeling, H., & Glockner, F. O. (2012). Current opportunities and challenges in microbial metagenome analysis–a bioinformatic perspective. Briefings in Bioinformatics, 13(6), 728–742. doi:10.1093/bib/bbs039. http://bib.oxfordjournals.org/cgi/doi/10.1093/bib/bbs039.
Thomas, T., Gilbert, J., & Meyer, F. (2012). Metagenomics - a guide from sampling to data analysis. Microbial Informatics and Experimentation, 2(1), 3. doi:10.1186/2042-5783-2-3. http://www.microbialinformaticsj.com/content/2/1/3.
Trenchevska, O., Schaab, M. R., Nelson, R. W., & Nedelkov, D. (2015). Development of multiplex mass spectrometric immunoassay for detection and quantification of apolipoproteins C-I, C-II, C-III and their proteoforms. Methods, 1–7. doi:10.1016/j.ymeth.2015.02.020. http://dx.doi.org/10.1016/j.ymeth.2015.02.020.
Vos, F. E., Schollum, J. B., & Walker, R. J. (2011). Glycated albumin is the preferred marker for assessing glycaemic control in advanced chronic kidney disease. Clinical Kidney Journal, 4(6), 368–375. doi:10.1093/ndtplus/sfr140. http://ckj.oxfordjournals.org/cgi/doi/10.1093/ndtplus/sfr140.
Walsh, C. T., & Tsodikova, S. G. (2005). Protein posttranslational modifications: The chemistry of proteome diversifications. Angewandte Chemie International Edition in English. http://onlinelibrary.wiley.com/doi/10.1002/anie.200501023/full.

Acknowledgements

The authors would like to thank Dr. Borges for guidance on the various mechanisms that can lead to higher proportions of cysteinylated albumin proteoforms in MSIA data. Acknowledgment is also due to the National Institutes of Health (NIH) for supporting the work presented in this chapter under awards R24DK090958-01A1 and P30ES006694.

Author information

Authors and Affiliations

BIO5 Institute, The University of Arizona, Tucson, AZ, USA
Shripad Sinari
Molecular Biomarkers Laboratory, Biodesign Institute, Arizona State University, Tucson, AZ, USA
Dobrin Nedelkov
Phoenix VA Health Care System, Phoenix, AZ, USA
Peter Reaven
Epidemiology and Biostatistics, BIO5 Institute, The University of Arizona, Tucson, AZ, USA
Dean Billheimer

Corresponding author

Correspondence to Dean Billheimer.

Editor information

Editors and Affiliations

Department of Biostatistics, University of Florida, Gainesville, Florida, USA
Susmita Datta
Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, RC Leiden, The Netherlands
Bart J. A. Mertens

Appendix

1.1 A Synopsis on Compositional Framework

Compositional data describe the proportion that each of D components contributes to the whole. An important feature of the analysis is that inference about a subset of components is coherent with inference about the whole composition, a property called subcompositional coherence. Scale invariance is essential for such analysis.

A D-part composition is defined as an element of the d-dimensional positive simplex

$$\displaystyle{ \mathcal{S}^{\mathrm{d}} = \left \{\mathbf{x} = (x_{ 1},x_{2},\ldots,x_{D}): x_{i}> 0(i = 1,2,\ldots,D),\sum \limits _{i=1}^{D}x_{ i} = 1\right \} }$$

(5)

where d = D − 1. We will use the convention d = D − 1 in the rest of the Appendix. Thus the sample space is a subset of the space $\mathcal{S}^{\mathrm{d}}$.

Let C denote the closure operator on $\mathbb{R}^{\mathrm{D}}$, which normalizes a vector to unit sum. That is, for $z \in \mathbb{R}^{\mathrm{D}}$ with strictly positive components (so that the result lies in the positive simplex),

$$\displaystyle{ C(z) = \left ( \frac{z_{1}} {\sum _{i=1}^{D}z_{i}}, \frac{z_{2}} {\sum _{i=1}^{D}z_{i}},\ldots, \frac{z_{D}} {\sum _{i=1}^{D}z_{i}}\right ) \in \mathcal{S}^{\mathrm{d}} }$$

(6)
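As a minimal sketch of the closure operator in Eq. (6), the following Python function rescales a vector of positive parts to unit sum. The function name `closure` and the input values are illustrative assumptions, not taken from the chapter.

```python
def closure(z):
    """Normalize a vector of strictly positive parts to unit sum (Eq. 6)."""
    if any(v <= 0 for v in z):
        raise ValueError("closure expects strictly positive components")
    total = sum(z)
    return [v / total for v in z]

# Hypothetical raw abundances, for illustration only.
raw = [2.0, 6.0, 12.0]
x = closure(raw)

# Scale invariance: multiplying every part by the same constant
# leaves the closed composition unchanged.
scaled = closure([10.0 * v for v in raw])
```

Because only the relative sizes of the parts survive the closure, `x` and `scaled` describe the same composition.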

We also define two additional operations. For any two elements $x = (x_{1},x_{2},\ldots,x_{\mathrm{D}})$ and $y = (y_{1},y_{2},\ldots,y_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}$ and for $\alpha \in \mathbb{R}$, we define

$$\displaystyle{ x \oplus y = C((x_{1} \cdot y_{1},x_{2} \cdot y_{2},\ldots,x_{D} \cdot y_{D})) }$$

(7)

$$\displaystyle{ \alpha \odot x = C((x_{1}^{\alpha },x_{ 2}^{\alpha },\ldots,x_{ D}^{\alpha })). }$$

(8)

The operations in Eqs. (7) and (8) are called the perturbation and power operators, respectively. With the perturbation operator as addition and the power operator as scalar multiplication, $\mathcal{S}^{\mathrm{d}}$ acquires the structure of a d-dimensional Hilbert space [6, 21] with a metric given by (2).
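The vector-space behaviour of these two operators can be checked numerically. The sketch below implements Eqs. (7) and (8) in plain Python; the function names and the example compositions are assumptions made for illustration.

```python
def closure(z):
    """Normalize a vector of positive parts to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]

def perturb(x, y):
    """Perturbation (Eq. 7): componentwise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def power(alpha, x):
    """Power operation (Eq. 8): componentwise power followed by closure."""
    return closure([a ** alpha for a in x])

# Illustrative 3-part compositions.
x = closure([1.0, 2.0, 3.0])
y = closure([3.0, 1.0, 1.0])

# The barycenter (1/D, ..., 1/D) acts as the zero vector under perturbation.
neutral = closure([1.0, 1.0, 1.0])
same_as_x = perturb(x, neutral)

# power(-1, x) is the additive inverse of x.
back_to_neutral = perturb(x, power(-1.0, x))

# Scalar multiplication distributes over addition.
lhs = power(2.0, perturb(x, y))
rhs = perturb(power(2.0, x), power(2.0, y))
```

In this sketch `same_as_x` agrees with `x`, `back_to_neutral` agrees with `neutral`, and `lhs` agrees with `rhs`, up to floating-point error.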

The additive log ratio transform, defined as

$$\displaystyle{ \begin{array}{rl} \phi: \mathcal{S}^{\mathrm{d}} & \rightarrow \mathbb{R}^{\mathrm{d}} \\ x&\longmapsto \left (\log \left ( \dfrac{x_{1}} {x_{D}}\right ),\log \left ( \dfrac{x_{2}} {x_{D}}\right ),\ldots,\log \left ( \dfrac{x_{d}} {x_{D}}\right )\right ) \end{array} }$$

(9)

and the centered log ratio transform (3), are alternative coordinate systems on this space.
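A minimal sketch of the additive log ratio transform of Eq. (9) and its inverse is given below. It also checks that perturbation on the simplex corresponds to ordinary vector addition of the coordinates, which is what makes the transform a coordinate system. Function names and example values are illustrative assumptions.

```python
import math

def closure(z):
    """Normalize a vector of positive parts to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]

def perturb(x, y):
    """Perturbation (Eq. 7)."""
    return closure([a * b for a, b in zip(x, y)])

def alr(x):
    """Additive log ratio transform (Eq. 9): log of each part over the last part."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inverse(v):
    """Map coordinates in R^d back to the simplex: exponentiate, append 1, close."""
    return closure([math.exp(c) for c in v] + [1.0])

# Illustrative 3-part compositions (D = 3, so d = 2 coordinates).
x = closure([1.0, 2.0, 5.0])
y = closure([4.0, 1.0, 1.0])

coords = alr(x)
round_trip = alr_inverse(coords)

# Perturbation on the simplex is addition in alr coordinates.
lhs = alr(perturb(x, y))
rhs = [a + b for a, b in zip(alr(x), alr(y))]
```

The last part serves as the reference component here, so the coordinates depend on which part is placed last.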

The dependence structure induced by the unit sum (compositional) constraint is often addressed by modeling the data with the class of logistic normal distributions. For illustration, consider an element $x = (x_{1},x_{2},\ldots,x_{\mathrm{D}}) \in \mathcal{S}^{\mathrm{d}}$. Following Aitchison [1], the pullback of the multivariate normal distribution under ϕ from Eq. (9) is the logistic normal density function given by:

$$\displaystyle{ f(x\vert \mu,\varSigma ) = (2\pi )^{-d/2}\vert \varSigma \vert ^{-1/2}\left ( \dfrac{1} {\prod _{i=1}^{D}x_{i}}\right )\mathrm{exp}{\biggl [ -\dfrac{1} {2}(\phi (x)-\mu )'\varSigma ^{-1}(\phi (x)-\mu )\biggr ]} }$$

(10)

where μ is the location parameter in $\mathbb{R}^{\mathrm{d}}$, Σ is the d × d variance–covariance matrix, and $\left (\prod _{i=1}^{D}x_{i}\right )^{-1}$ is the Jacobian of the transformation. In the following, we denote this d-dimensional logistic normal distribution by $\mathcal{L}\mathcal{N}_{\mathrm{d}}$, and A′ denotes the transpose of a matrix A.
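To make the density concrete, the sketch below evaluates it for a 3-part composition (D = 3, d = 2), where the 2 × 2 covariance matrix can be inverted in closed form. It assumes the usual multivariate normal normalizing constant $(2\pi)^{-d/2}\vert\varSigma\vert^{-1/2}$ and a Jacobian taken over all D parts; the function names and parameter values are hypothetical.

```python
import math

def alr(x):
    """Additive log ratio transform (Eq. 9)."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def logistic_normal_pdf_3part(x, mu, sigma):
    """Logistic normal density for a 3-part composition (d = 2).

    x     : composition (x1, x2, x3), positive parts summing to 1
    mu    : location parameter in R^2
    sigma : 2 x 2 covariance matrix as nested lists
    """
    (a, b), (c, e) = sigma
    det = a * e - b * c
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    # Closed-form inverse of a 2 x 2 matrix.
    inv = [[e / det, -b / det], [-c / det, a / det]]
    r = [u - m for u, m in zip(alr(x), mu)]
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    norm_const = (2.0 * math.pi) ** (-1.0) * det ** (-0.5)  # (2*pi)^(-d/2) with d = 2
    jacobian = 1.0 / (x[0] * x[1] * x[2])                   # product over all D parts
    return norm_const * jacobian * math.exp(-0.5 * quad)

# Illustrative evaluation at the barycenter with a standard normal in alr space.
center = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
density_at_center = logistic_normal_pdf_3part(center, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

At the barycenter the alr coordinates are zero, so the exponential term equals one and the value reduces to the normalizing constant times the Jacobian.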

An S-part subcomposition is an orthogonal projection of the full composition with respect to the inner product on $\mathcal{S}^{\mathrm{d}}$ [12]. The class-preserving property of logistic normal distributions, i.e., if $x \in \mathcal{L}\mathcal{N}_{D}(\mu,\varSigma )$ and A is an n × D matrix, then $Ax \in \mathcal{L}\mathcal{N}_{n}(A\mu,A\varSigma A')$ [5], ensures that the orthogonal projections satisfy the property of subcompositional coherence.
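One elementary consequence of working with ratios can be illustrated directly: forming a subcomposition by selecting parts and re-closing leaves the ratios between the retained parts unchanged. The sketch below demonstrates only this ratio-preservation property, not the distributional result; the helper `subcomposition` and the example values are assumptions for illustration.

```python
def closure(z):
    """Normalize a vector of positive parts to unit sum (Eq. 6)."""
    total = sum(z)
    return [v / total for v in z]

def subcomposition(x, keep):
    """Select the parts indexed by `keep` and re-close them to unit sum."""
    return closure([x[i] for i in keep])

# Illustrative 4-part composition.
full = closure([4.0, 1.0, 2.0, 3.0])

# Keep only the first and third parts.
sub = subcomposition(full, [0, 2])

# The ratio between the two retained parts is the same in both.
ratio_full = full[0] / full[2]
ratio_sub = sub[0] / sub[1]
```

The individual proportions change after re-closing (each retained part now accounts for a larger share), but `ratio_full` and `ratio_sub` agree, which is why analyses built on log ratios give consistent answers for the subcomposition and the full composition.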

A comprehensive review of the analytical techniques and applications of the compositional framework is available in the book by Pawlowsky-Glahn and Buccianti [20].

About this chapter

Cite this chapter

Sinari, S., Nedelkov, D., Reaven, P., Billheimer, D. (2017). The Analysis of Human Serum Albumin Proteoforms Using Compositional Framework. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_8

DOI: https://doi.org/10.1007/978-3-319-45809-0_8
Published: 16 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics (R0)
