On the Distribution of the Number of Missing Words in Random Texts

Published online by Cambridge University Press: 28 January 2003

SVEN RAHMANN
Affiliation:
Department of Computational Molecular Biology, Max-Planck-Institut für Molekulare Genetik, Ihnestraße 63-73, D-14195 Berlin, Germany (Sven.Rahmann@molgen.mpg.de)
ERIC RIVALS
Affiliation:
L.I.R.M.M., CNRS U.M.R. 5506, 161 rue Ada, F-34392 Montpellier Cedex 5, France (rivals@lirmm.fr)

Abstract

Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well understood problem. We study a generalization: Given a finite alphabet of size σ and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n+q−1 over the given alphabet? For q=1, X is the number Y of empty urns with σ urns and n balls. For q≥2, X is related to the number Y of empty urns with σ^q urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, but some problems remain currently open.
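
The two quantities compared in the abstract can be explored with a small Monte Carlo sketch. This is only an illustration and not the authors' method: it assumes that the "random text" is drawn uniformly and independently letter by letter, and the function names (missing_words, empty_urns, simulate) are hypothetical. It counts X, the words of length q missing from a text of length n+q−1, and Y, the empty urns after n balls are thrown into σ^q urns, and returns the sample mean of each.

```python
import random

def missing_words(text, q, sigma):
    """Count the words of length q over {0, ..., sigma-1} that do not occur in text."""
    # Successive windows overlap, which is what distinguishes X from Y.
    seen = {tuple(text[i:i + q]) for i in range(len(text) - q + 1)}
    return sigma ** q - len(seen)

def empty_urns(throws, urns):
    """Count the urns that received no ball."""
    return urns - len(set(throws))

def simulate(sigma, q, n, trials, seed=0):
    """Return the sample means of X and Y (assumption: uniform i.i.d. letters and throws)."""
    rng = random.Random(seed)
    urns = sigma ** q
    total_x = 0
    total_y = 0
    for _ in range(trials):
        # A text of length n+q-1 contains exactly n windows of length q.
        text = [rng.randrange(sigma) for _ in range(n + q - 1)]
        total_x += missing_words(text, q, sigma)
        # n independent balls thrown into sigma^q urns.
        throws = [rng.randrange(urns) for _ in range(n)]
        total_y += empty_urns(throws, urns)
    return total_x / trials, total_y / trials

if __name__ == "__main__":
    mean_x, mean_y = simulate(sigma=4, q=3, n=100, trials=2000)
    print("mean X:", mean_x, "mean Y:", mean_y)
```

Such a simulation can only suggest how close the two laws are for particular parameters; it does not replace the analysis in the paper.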

Type
Research Article
Copyright
2003 Cambridge University Press
