The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures

Trends in Biomathematics: Modeling Cells, Flows, Epidemics, and the Environment (BIOMAT 2019)

Abstract

The present work is part of a research programme that assesses protein databases through the statistical analysis of protein domain families and their association into clans. An extensive discussion of the construction of an adequate sample space supports the classification of protein domains into the families and clans reported in the literature. The most important result reported here is a derivation, via the Jaccard entropy measure, of a specific value of the non-dimensional parameter of a Havrda–Charvat entropy measure, namely the value that corresponds to the best approximation to a normal distribution.
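As a rough illustration of the quantities named in the abstract, the following Python sketch computes a Havrda–Charvat entropy with parameter s and a Jaccard-type distance built from it. The abstract does not state the chapter's exact definitions, so everything here is an assumption: the normalization (sum of p^s minus 1, divided by 1 - s), the use of marginal plus marginal minus joint entropy as a mutual-information analogue, and the function names `havrda_charvat_entropy` and `jaccard_entropy_distance`, which are hypothetical.

```python
import math


def havrda_charvat_entropy(probs, s):
    """Havrda-Charvat entropy of a discrete distribution (assumed normalization).

    H_s = (sum_i p_i**s - 1) / (1 - s); at s = 1 the Shannon entropy
    (natural logarithm) is returned, which is the limit of H_s as s -> 1.
    """
    positive = [p for p in probs if p > 0]
    if abs(s - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in positive)
    return (sum(p ** s for p in positive) - 1.0) / (1.0 - s)


def jaccard_entropy_distance(joint, s):
    """Jaccard-type distance 1 - M / H_joint for a joint probability table.

    `joint` is a list of rows with joint[a][b] = p(a, b).
    M = H(rows) + H(cols) - H(joint) is used as a mutual-information
    analogue; this choice is an assumption, not the chapter's stated formula.
    """
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    flat = [p for row in joint for p in row]
    h_joint = havrda_charvat_entropy(flat, s)
    if h_joint == 0:
        return 0.0  # degenerate case: all mass in a single cell
    mutual = (havrda_charvat_entropy(row_marginal, s)
              + havrda_charvat_entropy(col_marginal, s)
              - h_joint)
    return 1.0 - mutual / h_joint


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    print(havrda_charvat_entropy(uniform, 2.0))
    print(jaccard_entropy_distance([[0.5, 0.0], [0.0, 0.5]], 1.0))
    print(jaccard_entropy_distance([[0.25, 0.25], [0.25, 0.25]], 1.0))
```

In the Shannon case (s = 1) this distance is 0 for two perfectly dependent columns and 1 for two independent ones. For other values of s the entropy is non-additive, so the same formula need not stay within that range.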

Acknowledgements

Simão C. de Albuquerque Neto thanks the International Union of Biological Sciences (IUBS) for partial support of living expenses in Szeged, during the 19th BIOMAT International Symposium, October 20–26, 2019.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Mondaini, R.P., de Albuquerque Neto, S.C. (2020). The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In: Mondaini, R.P. (eds) Trends in Biomathematics: Modeling Cells, Flows, Epidemics, and the Environment. BIOMAT 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-46306-9_13
