Abstract
This work is part of a research programme on the assessment of protein databases through the statistical analysis of protein domain families and of their association into clans. An extensive discussion of the construction of an adequate sample space supports the classification of protein domains into the families and clans reported in the literature. The most important result reported here is the derivation, via the Jaccard entropy measure, of a specific value of the non-dimensional parameter of the Havrda–Charvat entropy measure: the value that yields the best approximation to a normal distribution.
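To make the two entropy measures named above concrete, here is a minimal Python sketch. It is not the authors' code and rests on stated assumptions. For the Havrda–Charvat measure it assumes the common normalisation H_s = (1 - Σ p_i^s)/(s - 1), which tends to the Shannon entropy as s → 1. The 1967 paper divides by 2^(1-s) - 1 instead. For the Jaccard side it assumes the information-theoretic analogue of the Jaccard distance, 1 - I(X;Y)/H(X,Y), in the Shannon case. Here mutual information plays the role of the set intersection and joint entropy the role of the union. The chapter's own Jaccard entropy measure may be defined differently. The function names `havrda_charvat`, `shannon` and `jaccard_type_distance` are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def havrda_charvat(p: Sequence[float], s: float) -> float:
    """Havrda-Charvat entropy, ASSUMING the normalisation (1 - sum p_i**s) / (s - 1).

    The 1967 paper divides by (2**(1 - s) - 1) instead; this choice is an assumption.
    At s == 1 the Shannon entropy (natural log) is returned, i.e. the s -> 1 limit.
    """
    probs = [x for x in p if x > 0.0]  # zero-probability outcomes are skipped
    if math.isclose(s, 1.0, abs_tol=1e-12):
        return -sum(x * math.log(x) for x in probs)
    return (1.0 - sum(x ** s for x in probs)) / (s - 1.0)


def shannon(p: Iterable[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jaccard_type_distance(joint: Dict[Tuple[str, str], float]) -> float:
    """Hypothetical helper: 1 - I(X;Y) / H(X,Y) for a joint distribution (Shannon case).

    Mutual information stands in for the intersection and the joint entropy for
    the union, mirroring the set-based Jaccard distance 1 - |A & B| / |A | B|.
    """
    px: Dict[str, float] = {}
    py: Dict[str, float] = {}
    for (a, b), v in joint.items():  # marginalise the joint distribution
        px[a] = px.get(a, 0.0) + v
        py[b] = py.get(b, 0.0) + v
    h_xy = shannon(joint.values())
    if h_xy == 0.0:
        return 0.0  # degenerate case: both variables are deterministic
    mutual = shannon(px.values()) + shannon(py.values()) - h_xy
    return 1.0 - mutual / h_xy


if __name__ == "__main__":
    column = [0.5, 0.5]  # illustrative: two residues with equal frequency
    for s in (0.5, 1.0, 2.0):
        print(s, havrda_charvat(column, s))
    independent = {("A", "C"): 0.25, ("A", "D"): 0.25, ("B", "C"): 0.25, ("B", "D"): 0.25}
    print(jaccard_type_distance(independent))  # near 1: no shared information
```

For the equal-frequency column, H_2 = 0.5 and H_1 = ln 2. One could sweep s over a grid and tabulate H_s across families to probe which s brings the resulting distribution closest to normal. The chapter's actual selection procedure is not reproduced here.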
Acknowledgements
Simão C. de Albuquerque Neto thanks the International Union of Biological Sciences (IUBS) for partial support of living expenses in Szeged during the 19th BIOMAT International Symposium, October 20–26, 2019.
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mondaini, R.P., de Albuquerque Neto, S.C. (2020). The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In: Mondaini, R.P. (ed.) Trends in Biomathematics: Modeling Cells, Flows, Epidemics, and the Environment. BIOMAT 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-46306-9_13
DOI: https://doi.org/10.1007/978-3-030-46306-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46305-2
Online ISBN: 978-3-030-46306-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)