Distinct flavors of Zipf's law and its maximum likelihood fitting: Rank-size and size-distribution representations

Álvaro Corral, Isabel Serra, and Ramon Ferrer-i-Cancho
Phys. Rev. E 102, 052113 – Published 10 November 2020

Abstract

In recent years, researchers have recognized the difficulty of fitting power-law distributions properly. These difficulties are greater in Zipfian systems, owing to the discreteness of the variables and to the existence of two representations of these systems, i.e., two versions depending on which random variable is fitted: rank or size. Discreteness implies that a power law in one representation is not a power law in the other, and vice versa. We generate synthetic power laws in both representations and apply a state-of-the-art fitting method to each of the two random variables. The method (based on maximum likelihood plus a goodness-of-fit test) fits not the whole distribution but only its tail, understood as the part of a distribution above a cutoff that separates non-power-law behavior from power-law behavior. We find that, no matter which random variable is power-law distributed, using the rank as the random variable is in general problematic for fitting (although it may work in some limit cases). One of the difficulties comes from recovering the “hidden” true ranks from the empirical ranks. In contrast, the representation in terms of the distribution of sizes allows one to recover the true exponent (with some small bias when the underlying size distribution is a power law only asymptotically).
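To make the size representation concrete, here is a minimal sketch of a maximum-likelihood fit of a discrete power-law tail, assuming the tail has the form f(n) = n^(-γ) / ζ(γ, a) for integer sizes n ≥ a, where ζ(γ, a) is the Hurwitz zeta function and a is the cutoff. This is not the authors' code: the function names (`sample_zeta`, `hurwitz_zeta`, `fit_discrete_power_law`) are hypothetical, the cutoff is fixed by hand rather than selected, and the goodness-of-fit test mentioned in the abstract is omitted. Synthetic sizes are drawn with Devroye's rejection sampler for the Zipf (zeta) distribution.

```python
import math
import random


def sample_zeta(gamma, rng):
    """Draw one integer n >= 1 with P(n) proportional to n**(-gamma), gamma > 1.

    Devroye's rejection algorithm for the Zipf (zeta) distribution.
    For gamma very close to 1 the proposal can overflow; this sketch ignores that case.
    """
    b = 2.0 ** (gamma - 1.0)
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
        v = rng.random()
        x = math.floor(u ** (-1.0 / (gamma - 1.0)))
        t = (1.0 + 1.0 / x) ** (gamma - 1.0)
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return int(x)


def hurwitz_zeta(gamma, a, terms=50):
    """Approximate zeta(gamma, a) = sum_{k>=0} (a + k)**(-gamma) for gamma > 1.

    Direct sum of the first `terms` terms plus an Euler-Maclaurin estimate of the rest.
    """
    head = sum((a + k) ** (-gamma) for k in range(terms))
    x = a + terms
    tail = (x ** (1.0 - gamma) / (gamma - 1.0)
            + 0.5 * x ** (-gamma)
            + gamma * x ** (-gamma - 1.0) / 12.0)
    return head + tail


def fit_discrete_power_law(sizes, a=1, lo=1.01, hi=6.0, tol=1e-6):
    """Maximum-likelihood exponent for f(n) = n**(-gamma) / zeta(gamma, a), n >= a.

    Only sizes >= a (the tail above the cutoff) enter the likelihood.
    The log-likelihood is concave in gamma, so golden-section search finds its maximum.
    Returns (gamma_hat, number_of_tail_points).
    """
    tail = [n for n in sizes if n >= a]
    if not tail:
        raise ValueError("no data at or above the cutoff a")
    count = len(tail)
    sum_log = sum(math.log(n) for n in tail)

    def loglik(gamma):
        return -gamma * sum_log - count * math.log(hurwitz_zeta(gamma, a))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = loglik(x1), loglik(x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = loglik(x2)
        else:  # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = loglik(x1)
    return 0.5 * (lo + hi), count


if __name__ == "__main__":
    rng = random.Random(12345)
    data = [sample_zeta(2.0, rng) for _ in range(20000)]
    gamma_hat, n_tail = fit_discrete_power_law(data, a=1)
    print(f"{n_tail} sizes in the tail, estimated exponent {gamma_hat:.3f}")
```

The likelihood is written for the distribution of sizes, the representation the abstract reports as recovering the true exponent. Fitting ranks instead would require modeling how empirical ranks relate to the hidden true ranks, which this sketch does not attempt.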

  • Received 11 November 2019
  • Accepted 18 October 2020

DOI: https://doi.org/10.1103/PhysRevE.102.052113

©2020 American Physical Society

Physics Subject Headings (PhySH)

  • Interdisciplinary Physics
  • Statistical Physics & Thermodynamics

Authors & Affiliations

Álvaro Corral (1, 2, 3, 4), Isabel Serra (1, 5), and Ramon Ferrer-i-Cancho (6)

  • (1) Centre de Recerca Matemàtica, Edifici C, Campus Bellaterra, E-08193 Barcelona, Spain
  • (2) Departament de Matemàtiques, Facultat de Ciències, Universitat Autònoma de Barcelona, E-08193 Barcelona, Spain
  • (3) Barcelona Graduate School of Mathematics, Edifici C, Campus Bellaterra, E-08193 Barcelona, Spain
  • (4) Complexity Science Hub Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria
  • (5) Computer Architecture and Operating Systems Group, Barcelona Supercomputing Center (BSC-CNS), E-08034 Barcelona, Spain
  • (6) Complexity and Quantitative Linguistics Lab, Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, E-08034 Barcelona, Catalonia, Spain

Issue

Vol. 102, Iss. 5 — November 2020
