Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings

Fig 3

Similarity of translation between words and characters versus words of different languages.

(a) The Russian short story The Man in a Case by A. Chekhov, and its translations into English words and Chinese characters (triangles, squares, and filled dots, respectively). The RGF-predictions are given by the curves (dash-dotted, dashed, and full, respectively). The RGF-prediction completely characterizes a frequency distribution in terms of three quantities: the total number of words/characters (M), the number of distinct words/characters (N), and the fraction of the text accounted for by the most common word/character (kmax/M). Each such triple (M, N, kmax) gives a unique prediction curve [(M, N, kmax) = (4061, 1721, 231), (5375, 1317, 256), and (8212, 1150, 312), respectively]. The agreement shows that words and characters are entirely analogous with respect to frequency distributions. (b) The same comparison, starting from the English novel The Old Man and the Sea and translating into Russian words and Chinese characters. The triples are this time (M, N, kmax) = (22414, 5378, 988), (23894, 2388, 2091), and (34220, 1685, 1289), for Russian words, English words, and Chinese characters, respectively. Data points and RGF-curves are marked as in (a).
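
As an illustration of how a triple (M, N, kmax) could be obtained from a text, the following is a minimal sketch in Python. It assumes the text has already been split into tokens (words, or individual Chinese characters); the function name frequency_triple and the toy input are hypothetical and not taken from the article, and the sketch does not compute the RGF-prediction itself.

```python
from collections import Counter


def frequency_triple(tokens):
    """Return (M, N, k_max) for a sequence of tokens.

    M     : total number of tokens (words or characters)
    N     : number of distinct tokens
    k_max : number of occurrences of the most common token
    """
    counts = Counter(tokens)
    M = sum(counts.values())
    N = len(counts)
    k_max = max(counts.values()) if counts else 0
    return M, N, k_max


# Toy input, whitespace-tokenized; a real text would need proper tokenization.
tokens = "the old man and the sea and the boy".split()
M, N, k_max = frequency_triple(tokens)
print(M, N, k_max)  # 9 6 3
```

For Chinese text counted by characters, the same function could be applied to list(text) after removing punctuation and whitespace, which is again an assumption about preprocessing rather than the authors' procedure.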

doi: https://doi.org/10.1371/journal.pone.0125592.g003