• Rapid Communication

Level statistics of words: Finding keywords in literary texts and symbolic sequences

P. Carpena, P. Bernaola-Galván, M. Hackenberg, A. V. Coronado, and J. L. Oliver
Phys. Rev. E 79, 035102(R) – Published 10 March 2009

Abstract

Using a generalization of the level-statistics analysis of quantum disordered systems, we present an approach that automatically extracts keywords from literary texts. The approach takes into account not only the frequencies of the words in the text but also their spatial distribution along it. It is based on the fact that relevant words are significantly clustered (i.e., they self-attract), whereas irrelevant words are distributed randomly through the text. Because no reference corpus is needed, the approach is especially suitable for single documents for which no a priori information is available. We also show that the method works for generic symbolic sequences (continuous texts without spaces), which suggests that it is generally applicable.
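The abstract does not give the clustering statistic itself. The sketch below assumes one common level-statistics measure: the standard deviation of the spacings between successive occurrences of a word, divided by the mean spacing. If a word with n occurrences in a text of N tokens is placed at random, its spacings are approximately geometric with p = n/N, and that ratio is close to sqrt(1 - p). Dividing by sqrt(1 - p) therefore puts the random baseline near 1, and clustered words score well above 1. This is a simplified illustration under that assumption, not the authors' exact estimator, which may use a different normalization or finite-size corrections. All function names (`positions_by_symbol`, `normalized_sigma`, `rank_keywords`, `kmers`) and the synthetic demo data are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def positions_by_symbol(tokens: Sequence[str]) -> Dict[str, List[int]]:
    """Map each distinct token to the increasing list of indices where it occurs."""
    pos: Dict[str, List[int]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        pos[tok].append(i)
    return dict(pos)


def normalized_sigma(positions: List[int], total: int) -> float:
    """Std of spacings over mean spacing, rescaled by 1 / sqrt(1 - p), p = n / total.

    Assumed baseline: a randomly placed word has roughly geometric spacings,
    whose std/mean ratio is sqrt(1 - p), so the rescaled value sits near 1.
    Clustered (self-attracting) words give values well above 1.
    """
    n = len(positions)
    if n < 3 or n >= total:
        return float("nan")  # too few spacings, or p = 1 (no room for spacing)
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(spacings) / len(spacings)
    var = sum((d - mean) ** 2 for d in spacings) / len(spacings)  # population variance
    p = n / total
    return math.sqrt(var) / mean / math.sqrt(1.0 - p)


def rank_keywords(tokens: Sequence[str], min_count: int = 10) -> List[Tuple[str, float]]:
    """Score every token seen at least min_count times; most clustered first."""
    total = len(tokens)
    scores: List[Tuple[str, float]] = []
    for word, pos in positions_by_symbol(tokens).items():
        if len(pos) >= min_count:
            s = normalized_sigma(pos, total)
            if not math.isnan(s):  # skip words with an undefined score
                scores.append((word, s))
    scores.sort(key=lambda ws: ws[1], reverse=True)
    return scores


def kmers(sequence: str, k: int) -> List[str]:
    """Overlapping length-k substrings; index i of the result is string position i."""
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "text": 50 filler words drawn uniformly at random...
    tokens = [f"w{rng.randrange(50)}" for _ in range(5000)]
    # ...plus one hypothetical topical word placed in three dense bursts.
    for start in (100, 2500, 4200):
        for j in range(0, 60, 3):
            tokens[start + j] = "whale"
    for word, score in rank_keywords(tokens)[:5]:
        print(f"{word:>6}  {score:.2f}")
```

For a continuous text without spaces, one could score every length-k substring with `rank_keywords(kmers(text, k))`. Self-overlapping substrings (for example "aaa" inside "aaaa") produce very short spacings and may look clustered, so a fuller treatment would need to handle overlaps.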

    • Received 21 May 2008

    DOI: https://doi.org/10.1103/PhysRevE.79.035102

    ©2009 American Physical Society

    Authors & Affiliations

    P. Carpena¹, P. Bernaola-Galván¹, M. Hackenberg², A. V. Coronado¹, and J. L. Oliver³

    • ¹Departamento de Física Aplicada II, Universidad de Málaga, 29071 Málaga, Spain
    • ²Bioinformatics Group, CIC bioGUNE, Technology Park of Bizkaia, 48160 Derio, Bizkaia, Spain
    • ³Departamento de Genética, Universidad de Granada, 18071 Granada, Spain

    Issue: Vol. 79, Iss. 3, March 2009