Multi-Dimensional Data Analysis of Deep Language in J.R.R. Tolkien and C.S. Lewis Reveals Tight Mathematical Connections

Abstract: Scholars of English Literature unanimously say that J.R.R. Tolkien influenced C.S. Lewis's writings. For the first time, we have investigated this issue mathematically, by using an original multi-dimensional analysis of linguistic parameters based on surface deep language variables and linguistic channels. To set our investigation in the framework of English Literature, we have considered some novels written by earlier authors, such as C. Dickens, G. MacDonald and others. The deep language variables and the linguistic channels, discussed in the paper, are likely due to writers' unconscious design and reveal connections between texts far beyond the writers' awareness. In summary, the capacity of the extended short-term memory required of readers, the universal readability index of texts, the geometrical representation of texts and the fine tuning of linguistic channels within texts, all tools largely discussed in the paper, revealed strong connections between The Lord of the Rings (Tolkien), The Chronicles of Narnia and The Space Trilogy (Lewis) and novels by MacDonald, therefore agreeing with what the scholars of English Literature say.


Introduction
Unanimously, in a large number of papers, some of which are recalled here [1][2][3][4][5][6][7][8] from the vast literature on the topic, scholars of English Literature state that J.R.R. Tolkien influenced C.S. Lewis's writings. The purpose of the present paper is not to review the large wealth of literature based on the typical approach used by scholars of literature, which is not our specialty, but to investigate this issue mathematically and statistically, a study that has never been conducted before, by using recent methods devised by researching the impact of the surface deep language variables [9,10] and linguistic channels [11] in literary texts. Since scholars mention the influence of George MacDonald on both, we consider some novels written by this earlier author. To set all these novels in the framework of English Literature, we also consider some novels written by other earlier authors, such as C. Dickens.
After this introduction, in Section 2, we introduce the literary texts (novels) considered. In Section 3, we report the series of words, sentences and interpunctions versus chapters for some novels, and define an index useful to synthetically describe the regularity due to what we think is a conscious design by the authors. In Section 4, we start exploring the four deep language variables; to avoid misunderstanding, these variables, and the linguistic channels derived from them, refer to the "surface" structure of texts, not to the "deep" structure mentioned in cognitive theory. In Section 5, we report results concerning the extended short-term memory and a universal readability index; both topics address human short-term memory buffers. In Section 6, we represent literary texts geometrically in the Cartesian plane by defining linear combinations of deep language variables and calculate the probability that a text can be confused with another. In Section 7, we show the linear relationships existing between linguistic variables in the novels considered. In Section 8, we report the theory of linguistic channels. In Section 9, we apply it to the novels presently studied. Finally, in Section 10, we summarize the main findings and conclude. Several Appendices report numerical data.

Database of Literary Texts (Novels)
Let us first introduce the database of literary texts used in the present paper. Table 1 lists some basic statistics of the novels by Tolkien, Lewis and MacDonald. To set these texts in the framework of earlier English Literature, we consider novels by Charles Dickens (Table 2) and other authors (Table 3).
Table 1. Novels written by Tolkien, Lewis and MacDonald, with year of publication. Number of chapters (M, i.e., the number of samples considered in calculating the regression lines reported below), total number of characters contained in the words (C), total number of words (W) and sentences (S).

Table 3. Novels by authors of English Literature, with year of publication. Number of chapters (M, i.e., the number of samples considered in calculating the regression lines reported below), total number of characters contained in the words (C), total number of words (W) and sentences (S).

Novel (Author, Year) Chapters (M) Characters (C) Words (W) Sentences (S)
Pride and Prejudice (J.

Some homogeneity can be noted in novels of the same author. The stories in The Space Trilogy and The Chronicles of Narnia, by Lewis, are told with about the same number of chapters, words and sentences, as is also the case for a couple of MacDonald's novels, such as At the Back of the North Wind and Lilith: A Romance. Some homogeneity can be found in David Copperfield, Bleak House and Our Mutual Friend (by Dickens) and in The Adventures of Oliver Twist and A Tale of Two Cities. These numerical values, we think, are not due to chance but consciously managed by the authors, a topic we pursue further in the next section.

Conscious Design of Texts: Words, Sentences and Interpunctions versus Chapters
First, we study the linguistic variables which we think the authors deliberately designed. Specifically, we show the series of words, sentences and interpunctions versus chapter.
Let us consider a literary work (a novel) and its subdivision into disjointed blocks of text long enough to give reliable average values. Let n_S be the number of sentences contained in a text block, n_W the number of words contained in the n_S sentences, n_C the number of characters contained in the n_W words and n_I the number of punctuation marks (interpunctions) contained in the n_S sentences.
Figure 1 shows the series n_W versus the normalized chapter number for The Lord of the Rings, The Chronicles of Narnia and The Space Trilogy.
For example, the normalized value of chapter 10 in The Chronicles of Narnia is 10/110 = 0.09 on the x-scale of Figure 1. This normalization allows the synoptic display of novels with a different number of chapters.
In The Chronicles of Narnia (in the following, Narnia, for brevity), we can notice a practically constant value of n_W compared to The Lord of the Rings (Lord) and The Space Trilogy (Trilogy).
Let us define a synthetic index to describe the series drawn in Figure 1, namely the coefficient of variation δ, given by the standard deviation divided by the mean value, δ = σ_{n_W}/<n_W>. Tables 4 and 5 report δ for n_W, n_S and n_I. Since n_S and n_I are very well correlated with n_W, the three coefficients of dispersion are about the same. In Narnia δ = 0.16, in Lord δ = 0.34 and in Trilogy δ = 0.60. Let us also notice the minimum value δ = 0.07 in The Screwtape Letters (Screwtape).
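As a minimal sketch of this index, the coefficient of variation of a per-chapter word-count series can be computed as follows. The chapter counts are invented for illustration, and the use of the population standard deviation is an assumption (the paper does not state which estimator it uses).

```python
# Coefficient of variation (delta) of a per-chapter word-count series.
# The chapter counts are invented; the population standard deviation
# is an assumption here.
import statistics

def coefficient_of_variation(counts):
    """delta = standard deviation of the series divided by its mean."""
    return statistics.pstdev(counts) / statistics.fmean(counts)

# A nearly uniform series (Narnia- or Screwtape-like) versus a widely
# varying one (Trilogy-like).
uniform_chapters = [4000, 4200, 3900, 4100, 4050]
varying_chapters = [2000, 9000, 3500, 12000, 5000]

print(coefficient_of_variation(uniform_chapters))   # small delta
print(coefficient_of_variation(varying_chapters))   # large delta
```

A nearly flat series yields a small δ, while a strongly varying one yields a δ several times larger, mirroring the contrast between Narnia and Trilogy reported above.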
The overall (words, sentences and interpunctions mixed together) mean value is <δ> = 0.44 and the standard deviation is σ_δ = 0.18. Therefore, Screwtape is more than 2σ_δ from the mean, as is Silmarillion on the other side, and Narnia is at about 1.5σ_δ. In contrast, Trilogy, Lord and The Hobbit (Hobbit) are within 1σ_δ.
From these results, it seems that Lewis designed the chapters of Narnia and Screwtape with an almost uniform distribution of words, sentences and interpunctions, very likely because of the intended audience of Narnia (i.e., kids) and the "letters" fiction tool used in Screwtape. In Trilogy, the design seems very different (δ = 0.60, though still within 1σ_δ), likely due to the development of the science fiction story narrated.
Tolkien acted differently from Lewis: he seems to have designed chapters more randomly, within 1σ_δ, as Hobbit and Lord show. An exception is The Silmarillion, published posthumously, which is a text far from being a "novel".
Finally, notice that the novels by MacDonald show more homogeneous values, very similar to Hobbit and Trilogy and to the other novels listed in Table 5.
In conclusion, the analysis of the series of words, sentences and interpunctions per chapter does not indicate likely connections between Tolkien, Lewis and MacDonald. Each author structured their use of words, sentences and punctuation according to distinct plans, which varied not only between authors but also between different novels by the same author.
There are, however, linguistic variables that, as we have reported for modern and ancient literary texts, are not consciously designed/managed by authors; therefore, these variables are the best candidates to reveal hidden mathematical/statistical connections between texts. In the next section, we start dealing with these variables, with the specific purpose of comparing Tolkien and Lewis, although this comparison is set in the more general framework of the authors mentioned in Section 2.

Surface Deep Language Variables
We start exploring the four stochastic variables we called deep language variables, following our general statistical theory on alphabetical languages [9][10][11]. To avoid possible misunderstandings, these variables, and the linguistic channels derived from them, refer to the "surface" structure of texts, not to the "deep" structure mentioned in cognitive theory.
Contrary to the variables studied in Section 3, the deep language variables are likely due to unconscious design. As shown in [9][10][11], they reveal connections between texts far beyond writers' awareness; therefore, the geometrical representation of texts [10] and the fine tuning of linguistic channels [11] are tools better suited to revealing connections. They can also likely indicate the influence of one author on another.
We defined the number of characters per chapter n_C and the number of interpunctions per chapter n_I; the four deep language variables are [9] the number of characters per word C_P = n_C/n_W, the number of words per sentence P_F = n_W/n_S, the number of words per interpunction, referred to as the word interval, I_P = n_W/n_I, and the number of word intervals per sentence M_F = n_I/n_S. Equation (5) can also be written as M_F = P_F/I_P. Tables 6-9 report the mean and standard deviation of these variables. Notice that these values have been calculated by weighing each chapter with its number of words, to avoid short chapters weighing as much as long ones. For example, chapter 1 of Lord has 10097 words; therefore, its statistical weight is 10097/472173 ≈ 0.021, not 1/62 ≈ 0.016. Notice, also, that the coefficient of dispersion used in Section 3 was calculated by weighing each chapter 1/62, not 10097/472173, to visually agree with the series drawn in Figure 1. Specifically, let M be the number of samples (i.e., chapters); then the mean value <P_F> is given by the weighted sum over the chapters, <P_F> = Σ_{k=1}^{M} (n_{W,k}/W) P_{F,k}, where n_{W,k} and P_{F,k} refer to chapter k. Therefore, notice, so as not to be misled, that <P_F> is not given by the total number of words W divided by the total number of sentences S, nor by assigning the weight 1/M to every chapter. The three values coincide only if all the text blocks contain the same number of words and the same number of sentences, which does not occur. The same observations apply to all the other variables. The following characteristics can be observed from Tables 6-9. Lord and Narnia share the same <P_F>. Silmarillion is distinctly different from Lord and Hobbit, in agreement with the different coefficient of dispersion. Screwtape is distinctly different from Narnia and Trilogy. There is a great homogeneity in Dickens's novels and a large homogeneity in <C_P> in all novels.
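The distinction between the three candidate means can be sketched as follows. The two-chapter dataset is invented (only the 10097-word count of Lord's chapter 1 comes from the text) and serves only to show that the word-weighted mean differs from both the unweighted mean and the ratio W/S.

```python
# Three candidate "means" of words per sentence P_F over chapters.
# The dataset is invented, except the 10097-word count quoted in the text.
chapters = [
    {"words": 10097, "sentences": 700},   # a long chapter
    {"words": 2000,  "sentences": 250},   # a short chapter
]

total_words = sum(c["words"] for c in chapters)
total_sentences = sum(c["sentences"] for c in chapters)

# Word-weighted mean, as used in the paper: each chapter's P_F is
# weighted by its share of the total words.
pf_weighted = sum((c["words"] / c["sentences"]) * (c["words"] / total_words)
                  for c in chapters)

# The two tempting but different quantities the text warns about:
pf_unweighted = sum(c["words"] / c["sentences"] for c in chapters) / len(chapters)
pf_global = total_words / total_sentences   # W / S

print(pf_weighted, pf_unweighted, pf_global)   # three distinct values
```

With unequal chapters, the three quantities disagree; they coincide only when every block has the same number of words and sentences, as the text notes.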
In the next sections, we use < P F >, < I P > and < M F > to calculate interesting indices connected to the short-term memory of readers.

Extended Short-Term Memory of Writers/Readers and Universal Readability Index
In this section, we deal with the linguistic variables that, very likely, are not consciously managed by writers who, of course, also act as readers of their own text. We first report findings concerning the extended short-term memory and then those concerning a universal readability index. Both topics address human short-term memory buffers.

Extended Short-Term Memory and Multiplicity Factor
In [12,13], we have conjectured that the human short-term memory is sensitive to two independent variables, which apparently engage two short-term memory buffers in series, constituents of what we have called the extended short-term memory (E-STM). The first buffer is modeled according to the number of words between two consecutive interpunctions, i.e., the variable I_P, the word interval, which follows Miller's 7 ± 2 law [14]; the second buffer is modeled according to the number of word intervals, I_P's, contained in a sentence, i.e., the variable M_F, ranging approximately from 1 to 7.
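A rough sketch of how the two buffer variables could be measured on a single sentence follows; the punctuation set and the whitespace tokenization are simplifying assumptions, not the authors' exact procedure.

```python
# Sketch: measuring the word interval I_P and the number of word
# intervals M_F on one sentence. The punctuation set and the simple
# whitespace tokenization are assumptions of this sketch.
import re

def sentence_buffers(sentence):
    """Return (I_P, M_F) for a single sentence: the mean number of
    words per punctuation-delimited interval, and the number of
    intervals the sentence contains."""
    intervals = [seg for seg in re.split(r"[,;:.!?]", sentence) if seg.strip()]
    words_per_interval = [len(seg.split()) for seg in intervals]
    m_f = len(words_per_interval)          # word intervals per sentence
    i_p = sum(words_per_interval) / m_f    # mean word interval
    return i_p, m_f

# Two intervals of 6 and 4 words: I_P = 5.0, M_F = 2.
print(sentence_buffers("In a hole in the ground, there lived a hobbit."))
```

The example sentence has two punctuation-delimited intervals, so M_F = 2, and its mean word interval I_P = 5 falls inside Miller's 7 ± 2 range.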
In [13], we studied the patterns (which depend on the size of the two buffers) that determine the number of sentences that can theoretically be recorded in the E-STM of a given capacity. These patterns were then compared with the number of sentences actually found in novels of Italian and English literature. We have found that most authors write for readers with short memory buffers and, consequently, are forced to reuse sentence patterns to convey multiple meanings. This behavior is quantified by the multiplicity factor α, defined as the ratio between the number of sentences in a novel and the number of sentences theoretically allowed by the two buffers, a function of I_P and M_F.
We found that α > 1 is more likely than α < 1, and often α ≫ 1. In the latter case, writers reuse the same patterns of word counts many times. Few novels show α < 1; in this case, writers do not use some or most of the available patterns. The values of α found in the novels presently studied are reported in Tables 10 and 11.

Universal Readability Index
In Reference [14], we have proposed a universal readability index G_U, given by Equation (7). In Equation (8), <C_p,ITA> = 4.48 and <C_p,ENG> = 4.24. By using Equations (7) and (8), the average value <k C_P> of any language is forced to be equal to that found in Italian, namely 4.48. The rationale for this choice is that C_P is a parameter typical of a language which, if not scaled, would bias G_U without really quantifying the reading difficulty for readers, who in their language are used, on average, to reading shorter or longer words than in Italian. This scaling, therefore, avoids changing G_U for the sole reason that a language has, on average, words shorter (as in English) or longer than Italian. In any case, C_P affects Equation (7) much less than P_F or I_P.
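Since the body of Equation (8) is not reproduced above, the scaling constant can be reconstructed from the stated requirement that the scaled average <k C_P> equal the Italian value; the full form of G_U in Equation (7), which the text says involves C_P, P_F and I_P, should be taken from [14]:

$$k \;=\; \frac{\langle C_{p,\mathrm{ITA}} \rangle}{\langle C_{p,\mathrm{ENG}} \rangle} \;=\; \frac{4.48}{4.24} \;\approx\; 1.057, \qquad \text{so that} \quad \langle k\, C_P \rangle_{\mathrm{ENG}} \;=\; 4.48 .$$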
The values of <G_U>, calculated like the other linguistic variables, i.e., by weighing chapters (samples) according to their number of words, are reported in Tables 10 and 11.
The reader may be tempted to calculate Equation (7) by introducing the mean values reported in Tables 6-9. This can, of course, be done, but it should be noted that the values so obtained are always less than or equal to the means calculated from the samples (hence they are lower bounds; see Appendix A). For example, for Lord, instead of 64.9, we would obtain 61.9.
It is interesting to "decode" these mean values into the minimum number of school years Y necessary to make a novel "easy" to read, according to the Italian school system, which is assumed as the reference (see Figure 1 of [15]). The results are also listed in Tables 10 and 11.

Discussion
Several intriguing observations can be drawn from the results presented in the preceding subsections.
(a) Silmarillion, with α = 0.2, is quite different from Tolkien's other writings. Mathematically, this is due to its large <M_F>. In conclusion, Lord and Narnia are the novels that address readers with very similar E-STM buffers, reuse sentence patterns in similar ways, contain the same number of words per sentence, and require the same reading ability and school years, compared to the other novels by Tolkien and Lewis. The mathematical connections between Lord and Narnia will be further pursued in the next section, where the four deep language parameters are used to represent texts geometrically.

Geometrical Representation of Texts
The mean values of Tables 6-9 can be used to assess how "close", or mathematically similar, texts are in the Cartesian coordinate plane, by defining linear combinations of deep language variables. Texts are then modeled as vectors; the representation is discussed in detail in [9,10] and briefly recalled here. An extension of this geometrical representation of texts allows the calculation of the probability that a text may be confused with another one, an extension in two dimensions of the problem discussed in [16]. The values of the conditional probability between two texts (authors) can be considered an index indicating who influenced who.

Vector Representation of Texts
Let us consider six vectors whose components are the indicated deep language variables, e.g., →R_6 = (<I_P>, <C_P>), and their resulting vector sum →R. The choice of which parameter represents the component on the abscissa and which on the ordinate axis is not important: once the choice is made, the numerical results will depend on it, but the relative comparisons and general conclusions will not.
In the first quadrant of the Cartesian coordinate plane, two texts are likely mathematically connected (i.e., they show close ending points of the vector of Equation (9)) if their relative Pythagorean distance is small. A small distance means that the texts share a similar mathematical structure, according to the four deep language variables.
By considering the vector components x and y of Equation (9), we obtain the scatterplot shown in Figure 2, where X and Y are normalized coordinates calculated by setting Lord at the origin (X = 0, Y = 0) and Silmarillion at (X = 1, Y = 1), according to linear transformations (Equations (10) and (11)). From Figure 2, we can notice that Silmarillion and Screwtape are distinctly very far from all the other texts examined, marking their striking diversity, as already remarked; therefore, in the following analyses, we neglect them. Moreover, Pride, Vanity, Moby and Floss are grouped together and far from Trilogy, Narnia and Lord; therefore, in the following analyses, we will not consider them further.
The complete set of the Pythagorean distances d between pairs of texts is reported in Appendix B. These data synthetically describe the proximity of texts and may indicate to scholars of literature connections between texts not considered before.
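A minimal sketch of the normalization behind Figure 2 (setting Lord at (0, 0) and Silmarillion at (1, 1)) follows; the raw endpoint coordinates used here are hypothetical, since the paper's values come from Equation (9).

```python
# Sketch of the normalization of Figure 2: a linear map sending Lord
# to (0, 0) and Silmarillion to (1, 1). The raw endpoint coordinates
# below are hypothetical, not the paper's values.
def normalize(point, lord, silmarillion):
    """Map raw coordinates (x, y) to normalized coordinates (X, Y)."""
    x, y = point
    x0, y0 = lord
    x1, y1 = silmarillion
    return (x - x0) / (x1 - x0), (y - y0) / (y1 - y0)

lord = (20.0, 12.0)           # hypothetical raw endpoint for Lord
silmarillion = (35.0, 30.0)   # hypothetical raw endpoint for Silmarillion

print(normalize(lord, lord, silmarillion))          # (0.0, 0.0)
print(normalize(silmarillion, lord, silmarillion))  # (1.0, 1.0)
print(normalize((27.5, 21.0), lord, silmarillion))  # (0.5, 0.5)
```

Being affine, the map preserves relative positions, so the Pythagorean distances of Appendix B compare texts in the same way before and after normalization, up to per-axis scaling.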
Figure 3 shows the scatterplot for the remaining texts. Besides the proximity with earlier novels, Lord and Narnia show close proximity with each other and with two novels by MacDonald.

These remarks, however, refer to the "average" display of vectors, whose ending point depends only on mean values. The standard deviations of the four deep language variables, reported in Tables 6-9, do introduce data scattering; therefore, in the next subsection, we study and discuss this issue by calculating the probability (called "error" probability) that a text may be mathematically confused with another one.

Error Probability: An Index to Assess Who Influenced Who
Besides the vector →R of Equation (9), due to mean values, we can consider another vector →ρ, due to the standard deviations of the four deep language variables, which adds to →R. In this case, the final random vector describing a text is given by the sum →R + →ρ. Now, to obtain some insight into this new description, we consider the area of a circle centered at the ending point of →R.
We fix the magnitude (radius) ρ as follows. First, we add the variances of the deep language variables that determine the components x and y of →R; let them be σ²_x and σ²_y. Then, we calculate the average value σ²_ρ = 0.5 × (σ²_x + σ²_y) and, finally, we set ρ = σ_ρ (Equation (13)). Now, since in calculating the coordinates x and y of →R a deep language variable can be summed twice or more, we add its standard deviation (referred to as sigma) twice or more times before squaring. For example, in the x-component, I_P appears three times; therefore, its contribution to the total variance on the x-axis is 9 times the variance calculated from the standard deviation reported in Tables 6-9. For Lord, for example, it is 9 × 0.51². After these calculations, the values of the 1-sigma circle are transformed into the normalized coordinates X, Y according to Equations (10) and (11).
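The radius calculation can be sketched as follows; all standard deviations and multiplicities are illustrative, except σ(I_P) = 0.51 for Lord and its multiplicity of three in the x-component, both quoted above.

```python
# Sketch of the 1-sigma radius rho: a variable entering a coordinate
# n times contributes (n * sigma)^2 to that coordinate's variance.
# All values are illustrative except sigma(I_P) = 0.51 for Lord and
# its multiplicity of three in the x-component, quoted in the text.
import math

def component_variance(sigma_multiplicity_pairs):
    """Sum of (multiplicity * sigma)^2 over the variables of one axis."""
    return sum((n * s) ** 2 for s, n in sigma_multiplicity_pairs)

sigma_x2 = component_variance([(0.51, 3), (0.30, 1)])  # I_P enters x three times
sigma_y2 = component_variance([(1.10, 1), (0.25, 2)])

rho = math.sqrt(0.5 * (sigma_x2 + sigma_y2))  # rho = sigma_rho, Eq. (13)
print(rho)
```

Note how the multiplicity dominates: the I_P term alone contributes 9 × 0.51² to σ²_x, exactly the effect described in the text for Lord.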
Figure 4 shows a significant example involving Lord, Narnia, Trilogy, Back and Peter. We see that Lord can be almost fully confused with Narnia, and partially with Trilogy, but not vice versa. Lord can also be confused with Peter and Back, therefore indicating strong connections with these earlier novels.

Figure 4. Normalized coordinates X and Y of the ending point of vector (5) and 1-sigma circles, such that Lord, blue square, is at (0,0) and Silmarillion, blue triangle pointing left, is at (1,1). Lord: blue square (blue 1-sigma circle); Narnia: red square (red 1-sigma circle); Trilogy: red circle (dashed red 1-sigma circle); Back: cyan triangle pointing left (cyan 1-sigma circle); Peter: green triangle pointing upward (green 1-sigma circle).

Now, we can estimate the (conditional) probability that a text is confused with another by calculating the ratio of areas. This procedure is correct if we assume that the bivariate density of the normalized coordinates ρ_X, ρ_Y, centered at →R, is uniform. By assuming this hypothesis, we can calculate probabilities as the ratio of areas [17,18].
The hypothesis of substantial uniformity around →R should be justified by noting that the coordinates X, Y are likely distributed according to a log-normal bivariate density, because the logarithms of the four deep language variables, which combine linearly in Equation (9), can be modeled as Gaussian. By the central limit theorem, we should expect an approximately Gaussian model on the linear values, but with a significantly larger standard deviation than that of the single variables. Therefore, in the area close to →R, the bivariate density function should not be peaked, hence the uniform density modeling. Now, we can calculate the following probabilities. Let A be the common area of two 1-sigma circles (i.e., the area proportional to the joint probability of two texts), let A_1 be the area of the 1-sigma circle of text 1 and A_2 the area of the 1-sigma circle of text 2. Since probabilities are proportional to areas, we obtain the following relationships: A/A_1 gives the conditional probability P(A_2/A_1) that part of text 2 can be confused (or "contained") with text 1; A/A_2 gives the conditional probability P(A_1/A_2) that part of text 1 can be confused with text 2. Notice that these conditional probabilities depend on the distance between two texts and on the 1-sigma radii (Appendix C).
Of course, these joint probabilities can be extended to three or more texts; e.g., in Figure 4 we could calculate the area shared by Lord, Narnia and Trilogy and the corresponding joint probability, which is not done in the present paper.
We think that the conditional probabilities and the visual display of 1-sigma circles give useful clues to establish possible hidden connections between texts and, maybe, even between authors, because the variables involved are not consciously managed by them.
In Table 12, the conditional probability P(A_2/A_1) is reported in the columns; therefore, A_1 refers to the text indicated in the upper row. P(A_1/A_2) is reported in the rows; therefore, A_2 refers to the text indicated in the left column.
Table 12. Conditional probability between the indicated novels. P(A_2/A_1) is reported in the columns; therefore, A_1 refers to the text indicated in the upper row. P(A_1/A_2) is reported in the rows; therefore, A_2 refers to the text indicated in the left column. For example, assuming Lord as text 1 (column 1 of Table 12) and Narnia as text 2 (row 3), we find P(A_2/A_1) = 0.974. Vice versa, if we assume Narnia as text 1 (column 3) and Lord as text 2 (row 1), we find P(A_2/A_1) = 0.356.
On the contrary, if the text is extracted from Narnia, then it is more likely attributed to Peter or Trilogy than to Lord or other texts.
We think that these conditional probabilities indicate who influenced who more. In other words, Tolkien influenced Lewis more than the opposite. Now, we can define a synthetic parameter which highlights how much, on average, two texts can be erroneously confused with each other. The parameter is the average conditional probability (see [16] for a similar problem). Since in comparing two texts we can assume P(A_1) = P(A_2) = 0.5, we obtain p_e = 0.5 × P(A_2/A_1) + 0.5 × P(A_1/A_2). If p_e = 0, there is no intersection between the two 1-sigma circles. The two texts cannot be confused with each other; therefore, there is no mathematical connection involving the deep language parameters (this happens for Screwtape and Silmarillion, which can be confused with each other, but not with the other texts). If p_e = 1, the two texts can be totally confused, and the two 1-sigma circles coincide. Appendix D reports the values of p_e for all the pairs of novels. Now, just to allow some rough analysis, it is reasonable to assume p_e = 0.5 as a reference threshold, i.e., the probability of obtaining heads or tails in flipping a fair coin. If p_e > 0.5, then two texts can be confused not by chance; if p_e ≤ 0.5, then two texts cannot likely be confused.
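Under the uniform-density assumption, the conditional probabilities and the average p_e reduce to circle-intersection geometry, sketched below. The radii and center distance are invented, and the standard circular-lens area formula is used in place of whatever numerical method the authors employed.

```python
# Sketch of the error probability: conditional probabilities as ratios
# of 1-sigma circle areas, and p_e as their average with
# P(A_1) = P(A_2) = 0.5. Radii and distance are invented; the lens
# area is the standard circle-intersection formula.
import math

def lens_area(r1, r2, d):
    """Area of the intersection of two circles with radii r1, r2 whose
    centers are d apart."""
    if d >= r1 + r2:
        return 0.0                            # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2     # one circle inside the other
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    corr = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                           * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - corr

r1, r2, d = 1.0, 0.6, 0.5        # hypothetical 1-sigma radii and distance
A = lens_area(r1, r2, d)
A1, A2 = math.pi * r1 ** 2, math.pi * r2 ** 2

p_2_given_1 = A / A1             # P(A_2/A_1): part of text 2 confused with text 1
p_1_given_2 = A / A2             # P(A_1/A_2): part of text 1 confused with text 2
p_e = 0.5 * p_2_given_1 + 0.5 * p_1_given_2
print(p_2_given_1, p_1_given_2, p_e)
```

With these invented values, the smaller circle overlaps the larger one almost entirely while the converse probability stays low, reproducing the asymmetry reported for Lord and Narnia.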
To visualize p_e, Figure 5 draws p_e when text 1 is Lord (column 1 of Table 12), Narnia (column 3) or Trilogy (column 4). We notice several cases with p_e > 0.5. We can reiterate that Tolkien (Lord) appears significantly connected to Lewis (Narnia), to MacDonald (Back, Lilith) and to Barrie (Peter), but not to Dickens's novels, to which, on the contrary, Lewis appears connected.
In the next section, the four deep language variables are singled out to consider linguistic channels existing in texts.This is the analysis we have called the "fine tuning" of texts [11].

Linear Relationships in Literary Texts
The theory of linguistic channels, which will be revisited in the next section, is based on the regression line between linguistic variables, Equation (18). Therefore, we show examples of these linear relationships found in Lord and Narnia.
Figure 6a shows the scatterplot of n S versus n W for Lord and Narnia. In Narnia, the slope of the regression line is m = 0.0729 and the correlation coefficient is r = 0.7610. In Lord, m = 0.0731 and r = 0.9199. Since the average relationships, i.e., Equation (18), are practically identical (see also the values of < P F > in Tables 6 and 7) while the correlation coefficients, i.e., the scattering of the data, are not, this fact will impact the sentence channel discussed in Section 9.
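The slope m and correlation coefficient r quoted above can be computed from per-chapter counts as follows. The counts below are hypothetical placeholders, and the zero-intercept model y = m x is our reading of the single-slope regression line of Equation (18).

```python
import numpy as np

# Hypothetical per-chapter counts: words (n_W) and sentences (n_S).
n_W = np.array([2100, 3400, 2800, 4100, 3000, 3700], dtype=float)
n_S = np.array([150, 251, 199, 302, 221, 266], dtype=float)

# Least-squares slope of the line through the origin (y = m x),
# and the Pearson correlation coefficient r of the scatterplot.
m = np.sum(n_W * n_S) / np.sum(n_W ** 2)
r = np.corrcoef(n_W, n_S)[0, 1]

print(round(m, 4), round(r, 4))
```

Two novels can share an almost identical m while differing markedly in r, which is exactly the Lord/Narnia situation described above.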

Similar observations can be carried out for Figure 6b, which shows n I versus n S in Lord and Narnia. We find m = 2.0372, r = 0.9609 in Lord, and m = 1.9520, r = 0.9384 in Narnia. Appendix E reports the complete set of these parameters.
Figure 7 shows the scatterplots of Lord and Trilogy. In Trilogy, for n S versus n W, m = 0.0672 and r = 0.9325; for n I versus n S, m = 1.9664 and r = 0.9830.
Figure 8 shows the scatterplots for Lord and Back or Lilith. We see similar regression lines and data scattering. In Back (left panel), the regression line between n S and n W gives m = 0.0681 and r = 0.9416; in Lilith (right panel), m = 0.0676 and r = 0.8890. These results likely indicate the influence of MacDonald on Tolkien's writings because they differ from those of most other novels.
In conclusion, the regression lines of Lord, Narnia and Trilogy are very similar, but they can differ in the scattering of the data. Regression lines, however, describe only one aspect of the relationship, namely the relationship between conditional average values in Equation (18); they do not consider the other aspect, namely the scattering of the data, which may not be the same even when two regression lines almost coincide, as shown above. The theory of linguistic channels, discussed in the next section, on the contrary, considers both slopes and correlation coefficients and provides a "fine tuning" tool to compare two sets of data by singling out each of the four deep language parameters.


Theory of Linguistic Channels
In this section, we recall the general theory of linguistic channels [11]. In a literary work, an independent (reference) variable x (e.g., n W ) and a dependent variable y (e.g., n S ) can be related by the regression line given by Equation (18).
Let us consider two different text blocks Y k and Y j , e.g., the chapters of work k and work j. Equation (18) does not give the full relationship between two variables because it links only conditional average values. We can write more general linear relationships, which take care of the scattering of the data (measured by the correlation coefficients r k and r j , respectively) around the average values (measured by the slopes m k and m j ):

y k = m k x + n k (19)

y j = m j x + n j (20)

The linear models in Equations (19) and (20) introduce additive "noise" through the stochastic variables n k and n j , with zero mean value [9,11,15]. The noise is due to the correlation coefficient |r| ≠ 1.
We can compare two literary works by eliminating x; therefore, we compare the output variable y for the same number of the input variable x. For example, we can compare the number of sentences in two novels, for an equal number of words, by considering not only the average relationship, Equation (18), but also the scattering of the data, measured by the correlation coefficient, Equations (19) and (20). We refer to this communication channel as the "sentences channel", or S-channel, and to this processing as "fine tuning" because it deepens the analysis of the data and can provide more insight into the relationship between two literary works or any other texts.
By eliminating x from Equations (19) and (20), we obtain the linear relationship between the number of sentences in work Y k (now the reference, input text) and the number of sentences in text Y j (now the output text):

y j = m jk y k + n jk (21)

Compared to the new reference work Y k , the slope m jk is given by

m jk = m j /m k (22)

The noise source that produces the correlation coefficient between Y k and Y j is given by

n jk = n j − m jk n k (23)

The "regression noise-to-signal ratio", R m , due to m jk ≠ 1, of the new channel is given by

R m = (1 − m jk ) 2 (24)

The unknown correlation coefficient r jk between y j and y k is given by

r jk = cos(arccos(r j ) − arccos(r k )) (25)

The "correlation noise-to-signal ratio", R r , due to r jk < 1, of the new channel from text Y k to text Y j is given by

R r = (1 − r jk 2 )/r jk 2 (26)

Because the two noise sources are disjoint and additive, the total noise-to-signal ratio of the channel connecting text Y k to text Y j is given by

R = R m + R r (27)

Notice that Equation (27) can be represented graphically [10]. Finally, the total and the partial signal-to-noise ratios are given by Γ dB = 10 log 10 (1/R), Γ m,dB = 10 log 10 (1/R m ) and Γ r,dB = 10 log 10 (1/R r ). Of course, no channel can yield r jk = 1 and m jk = 1 (which would give Γ dB = ∞, a case referred to as the ideal channel) unless a text is compared with itself. In practice, we always find r jk < 1 and m jk ≠ 1. The slope m jk measures the multiplicative "bias" of the dependent variable compared to the independent variable; the correlation coefficient r jk measures how "precise" the linear best fit is.
In conclusion, the slope m jk is the source of the regression noise R m , and the correlation coefficient r jk is mostly the source of the correlation noise R r of the channel.
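The chain from the slopes and correlation coefficients to the channel signal-to-noise ratio can be sketched as a short function. The closed forms used below for R m and R r are our reconstruction of the elided equations, not quotations; with the Figure 6a values, the function approximately reproduces the S-channel figure reported for Narnia in Table 13.

```python
import math

def channel_snr(m_k, r_k, m_j, r_j):
    """Total signal-to-noise ratio (dB) of the linguistic channel from
    reference text k to text j. R_m and R_r use assumed (reconstructed)
    closed forms. Identical texts give R = 0, i.e., the ideal channel."""
    m_jk = m_j / m_k                                  # slope of the new channel
    r_jk = math.cos(math.acos(r_j) - math.acos(r_k))  # correlation coefficient r_jk
    R_m = (1.0 - m_jk) ** 2                           # regression noise-to-signal (assumed form)
    R_r = (1.0 - r_jk ** 2) / r_jk ** 2               # correlation noise-to-signal (assumed form)
    R = R_m + R_r                                     # disjoint, additive noise sources
    return 10.0 * math.log10(1.0 / R)

# S-channel from Lord (reference) to Narnia, using the Figure 6a values:
print(round(channel_snr(0.0731, 0.9199, 0.0729, 0.7610), 2))
```

Note how nearly identical slopes (small R m) can still leave a modest total Γ dB when the correlation coefficients differ, which is the Lord/Narnia case discussed below.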
In S-channels, the number of sentences of two texts is compared for the same number of words. These channels describe how many sentences the author of text j writes, compared to the author of text k (reference text), by using the same number of words. Therefore, these channels are more linked to P F than to other parameters. It is very likely that they reflect the style of the writer.
In I-channels, the number of word intervals of two texts is compared for the same number of sentences. These channels describe how many short texts between two contiguous punctuation marks (of length I P ) two authors use; therefore, these channels are more linked to M F than to other parameters. Since M F is very likely connected with the E-STM, I-channels are more related to the second buffer of readers' E-STM than to the style of the writer.
In WI-channels, the number of words contained in a word interval (i.e., I P ) is compared for the same number of interpunctions. These channels are more linked to I P than to other parameters. Since I P is very likely connected with the E-STM, WI-channels are more related to the first buffer of readers' E-STM than to the style of the writer.
In C-channels, the number of characters of two texts is compared for the same number of words. They are more related to the language used, e.g., English, than to the other parameters, unless essays or scientific/academic texts are considered, because these latter texts use, on average, longer words [9].
As an example, Table 13 reports the total and the partial signal-to-noise ratios Γ dB , Γ m,dB and Γ r,dB in the four channels by considering Lord as the reference (input) text. In other words, text j is compared to text k (the reference text, i.e., Lord).
Table 13. Total and partial signal-to-noise ratios Γ dB , Γ m,dB , Γ r,dB in the four channels by considering Lord as the reference (input) text.

These channels are not apt to distinguish or assess large differences between texts or authors.
In the three other channels, we can notice that Trilogy, Back and Lilith have the largest signal-to-noise ratios, about 19 to 22 dB; therefore, these novels are very similar to Lord. In other words, these channels seem to confirm the likely influence of MacDonald on both Lord and Trilogy and the connection between Lord and Trilogy.
On the contrary, Narnia shows poor values in the S-Channel (10.12 dB) and WI-Channel (7.94 dB). These low values are determined by the correlation noise because R = R m + R r ≈ R r . If we consider only Γ m,dB , i.e., only the regression line, then we notice a strong connection with Lord, since Γ m,dB = 51.26 dB. As we have already observed regarding Figure 6, the regression lines are practically identical but the spreading of the data is not. In shaping (unconsciously) these two linguistic channels, Lewis is less "regular" in Narnia than in Trilogy, or than Tolkien in Lord.

Summary and Conclusions
Scholars of English Literature unanimously say that J.R.R. Tolkien influenced C.S. Lewis's writings. For the first time, we have investigated this issue mathematically by using an original multi-dimensional analysis of linguistic parameters, based on the surface deep language variables and linguistic channels.
To set our investigation in the framework of English Literature, we have also considered some novels written by earlier authors, such as Charles Dickens and others, including George MacDonald, because scholars mention his likely influence on Tolkien and Lewis.
In our multi-dimensional analysis, only the series of words, sentences and interpunctions per chapter were, in our opinion, consciously planned by the authors, and, specifically, they do not indicate strong connections between Tolkien, Lewis and MacDonald. The distribution of words, sentences and interpunctions differs from author to author and, sometimes, even from novel to novel by the same author.
On the contrary, the deep language variables and the linguistic channels discussed in the paper are likely due to unconscious design and can reveal connections between texts far beyond the writers' awareness.
In summary, the buffers of the extended short-term memory required of readers, the universal readability index of texts, the geometrical representation of texts and the fine tuning of linguistic channels (all tools largely discussed in the paper) have revealed strong connections between The Lord of the Rings (Tolkien), The Chronicles of Narnia and The Space Trilogy (Lewis) on one side, and with some novels by MacDonald on the other, therefore substantially agreeing with what scholars of English Literature say.

Figure 1 .
Figure 1. Series of words versus the normalized chapter number. Blue line: The Lord of the Rings (Lord); red line: The Chronicles of Narnia (Narnia); green line: The Space Trilogy (Trilogy).


Figure 2 .
Figure 2. Normalized coordinates X and Y of the ending point of vector (5) such that Lord, blue square, is at (0,0) and Silmarillion, blue triangle pointing left, is at (1,1). Narnia: red square; Trilogy: red circle; Hobbit: blue triangle pointing right; Screwtape: red triangle pointing upward; Back: cyan triangle pointing left; Lilith: cyan triangle pointing downward; Phantastes: cyan triangle pointing right; Princess: cyan triangle pointing upward; Oliver: blue circle; David: green circle; Tale: cyan circle; Bleak: magenta circle; Mutual: black circle; Pride: magenta triangle pointing right; Vanity: magenta triangle pointing left; Moby: magenta triangle pointing downward; Mill: magenta triangle pointing upward; Alice: yellow triangle pointing right; Jungle: yellow triangle pointing downward; War: yellow triangle pointing right; Oz: green triangle pointing left; Bask: green triangle pointing right; Peter: green triangle pointing upward; Martin: green square; Finn: black triangle pointing right.
Figure 3 shows examples of these distances concerning Lord, Narnia and Trilogy. By referring to the cases in which d < 0.2, we can observe the following: (a) The closest texts to Lord are Narnia, Back, Lilith, Mutual and Peter. (b) The closest texts to Narnia are Lord, Lilith, Bleak, Martin and Peter. (c) The closest texts to Trilogy are Hobbit, Martin and Peter. Besides the proximity with earlier novels, Lord and Narnia show close proximity with each other and with two novels by MacDonald.
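The d < 0.2 proximity criterion amounts to a Euclidean distance in the normalized plane of Figure 2. The coordinates below are hypothetical stand-ins for the plotted values; only Lord at (0,0) and Silmarillion at (1,1) are fixed by construction.

```python
import math

# Hypothetical normalized coordinates (X, Y); Lord and Silmarillion are
# pinned at (0,0) and (1,1) by the normalization, the rest is illustrative.
coords = {
    "Lord": (0.00, 0.00),
    "Narnia": (0.10, 0.12),
    "Silmarillion": (1.00, 1.00),
}

def distance(a, b):
    """Euclidean distance d between two texts in the normalized plane."""
    (xa, ya), (xb, yb) = coords[a], coords[b]
    return math.hypot(xb - xa, yb - ya)

# Texts closer than d = 0.2 are considered connected, as in Figure 3.
print(distance("Lord", "Narnia") < 0.2)
```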


Figure 8 .
Figure 8. Scatterplot of the number of sentences n S versus the number of words n W : (a) Lord (blue) and Back (cyan); (b) Lord (blue) and Lilith (cyan).

Table 4 .
The coefficient of dispersion in the series of words, sentences and interpunctions in the indicated novels by Tolkien, Lewis and MacDonald.

Table 5 .
The coefficient of dispersion in the series of words, sentences and interpunctions in the indicated novels.

Table 6 .
John R.R. Tolkien. Mean value and standard deviation (in parentheses) of < C P >, < P F >, < I P >, < M F > in the indicated novels. Mean and standard deviation have been calculated by weighting each chapter with its number of words.

Table 7 .
Clive S. Lewis. Mean value and standard deviation (in parentheses) of < C P >, < P F >, < I P >, < M F > in the indicated novels. Mean and standard deviation have been calculated by weighting each chapter with its number of words.

Table 8 .
George MacDonald. Mean value and standard deviation (in parentheses) of < C P >, < P F >, < I P >, < M F > in the indicated novels. Mean and standard deviation have been calculated by weighting each chapter with its number of words.

Table 9 .
Other authors. Mean value and standard deviation (in parentheses) of < C P >, < P F >, < I P >, < M F > in the indicated novels. Mean and standard deviation have been calculated by weighting each chapter with its number of words.

Table 10 .
Multiplicity factor α, universal readability index < G U > and number of school years Y in the indicated novels by Tolkien, Lewis, MacDonald.

Table 11 .
Multiplicity factor α, universal readability index < G U > and number of school years in the indicated novels of English Literature.
3.62 and < I P > = 8.58. In practice, the number of theoretical sentences allowed by the E-STM to read this text is only 1/α = 5 times the number of sentence patterns actually used in the text. The reader needs a powerful E-STM and reading ability, since G U = 38.7 and Y > 13. This does not occur for Hobbit (α = 39.4, G U = 52.4, Y = 9.9) and Lord (α = 368.1, G U = 64.2, Y = 7.4), in which Tolkien reuses patterns many times, especially in Lord.
(b). Lord and Narnia show very large values, α = 368.1 and α = 297.7, and very similar G U 's and school years: G U = 64.2, Y = 7.4 and G U = 61.1, Y = 7.9, respectively. Sentence patterns are reused many times by Lewis in this novel, but not in Screwtape (α = 1.4), which is more difficult to read (G U = 33.5) and requires more years of schooling, Y > 13. Moreover, Lord and Narnia have practically the same < P F > ≈ 14.
(c). In general, Narnia is closer to Lord than to Trilogy, although the numbers of words and sentences in Trilogy and Narnia are quite similar (Table 1). This difference between Trilogy (G U = 56.2, Y = 9) and Narnia (G U = 61.1, Y = 7.9) might depend on the different readers addressed, kids for Narnia and adults for Trilogy, with different reading ability, as G U indicates.
(d). The novels by MacDonald show values of α and G U very similar to those of the other English novels.
(e). Notice the homogeneity in Dickens's novels, which require about Y = 7 ∼ 8 years of school and readability index < G U > = 59 ∼ 65.