Distant Reading of the Gospel of Thomas and the Gospel of John: Reflection of Methodological Aspects of the Use of Digital Technologies in the Research of Biblical Texts

Abstract The aim of this study is to demonstrate the applicability of selected methods of the so-called distant reading from the area of digital humanities for the interpretation of early Christian texts, specifically for approaching similarities and differences between the Gospel of Thomas and the Gospel of John. We use the term “distant reading” for the methods that allow us to explore, analyze, and visualize digitized textual data while using the tools from the area of data mining, natural language processing, or corpus linguistics. We want to explore whether methods from the field of digital humanities can allow for a sophisticated, quantifiable, and replicable comparison of the corpora of early Christian movements and thereby help to uncover the basic features of their theology and thus be a suitable complement to traditional exegesis and interpretation achieved by close reading.


Introduction
This study¹ attempts to contribute to the ongoing scholarly debate concerning the relationship between the Gospel of Thomas and the Gospel of John: while one group of scholars emphasizes the similarities between the two, others highlight the differences. Therefore, the first aim of this study is to use selected methods of quantitative text analysis (QTA), which are based on standardized tools and algorithms from the fields of computational linguistics, information retrieval, and digital humanities, to explore whether these methods can help us to decide which of these two views, commonly based on more traditional exegetical interpretations, is better grounded in evidence.
On a more general level, we also attempt to demonstrate the applicability of these types of methods for the study of early Christian history in general. Over the last several years, these methods have become widely popular in the digital humanities and are referred to under the umbrella term "distant reading," especially with respect to the macroanalysis of English literature.² However, it has been further suggested that distant reading might represent a fruitful methodology for the history of religions as well.³ Following this research pathway, we want to explore to what extent these methods allow for a sophisticated, quantifiable, and replicable comparison of the two texts and their respective theologies. Going further, we believe that differences between the texts might reflect differences in the theologies of the early Christian movements associated with these texts.⁴ In general, we propose that these methods represent a suitable complement to traditional exegesis and interpretation achieved by close reading of the texts. While our case study is based on a very small number of documents, we envision that the very same methodology could be employed to analyze much larger corpora.
After the introductory section, which offers an overview of the recent debate concerning the relationship between the Gospel of John and the Gospel of Thomas, we proceed with a section on QTA consisting of three subsections. In Analysis 1, we first apply a rather general method to gain a rough comparison of the vocabulary of Jesus' sayings in the canonical⁵ Gospels and the Gospel of Thomas. As an outcome of this analysis, we produce a series of plots visualizing distances between the documents. Subsequently, in Analysis 2, we focus more deeply on the most frequent terms in these sets of Jesus' sayings, which enables us to obtain more detailed insights concerning what is responsible for the differences between the documents. Finally, in Analysis 3, we utilize methods from distributional semantics to capture the differences in the meaning of a set of target words from these documents.

Historical analysis

The relationship between the Gospel of Thomas and the Gospel of John
In the following, we focus on the similarities and differences between the Gospel of Thomas and the Gospel of John that have been noted by a number of scholars. After an initial stage in which the Gospel of Thomas was compared mainly with the Synoptics, most researchers turned to the relationship between the Gospel of Thomas and the Gospel of John. They found the comparison useful, as the two Gospels share similar motifs and parallels. The question remains whether these common motifs are a sign of similarity or of conflict between the two texts.
We can distinguish several attitudes⁶ adopted by researchers regarding the relationship between the Gospel of John and the Gospel of Thomas.⁷ The first group of researchers assumes that the Gospels of Thomas and John are based on common sources, regardless of whether this shared material was written or based upon an oral tradition.⁸ This position was held by Raymond Brown,⁹ who suggests that these two Gospels are based on Gnostic or Gnostic-like sources which functioned as an intermediary, and also by Quispel,¹⁰ who attributes the parallels between the Gospels to their common source, collections of Jesus' sayings. The second group of researchers assumes that one Gospel depends on the other and differs in which one they consider to be earlier and which later. For example, Sell¹¹ and Marcovich¹² assume that the Gospel of Thomas is directly dependent on the Gospel of John. Koester,¹³ on the other hand, suggests that some of the Nag Hammadi writings, including the Gospel of Thomas, Dialogue of the Savior and the Apocryphon of James, represent earlier stages of the development of Jesus' sayings, from a collection of sayings to fully developed dialogues and discourses in the Gospel of John. In addition, Koester assumes that the author of the Gospel of John reinterprets these sayings to reject the soteriology proposed by the Gospel of Thomas.¹⁴ The third group of scholars perceives the relationship between the two Gospels to be even closer. They assume that these two Gospels have the same theological and social basis and even suggest that they were produced by the same community. This position is held very strongly by Mirkovic,¹⁵ who concentrates on similarities between John and Thomas, suggesting that the similarities originate from the same Sitz im Leben (the Syrian syncretistic-wisdom tradition of wandering ascetics in the first century¹⁶), but also by Davies,¹⁷ who assumes that the Gospel of Thomas is a text created by the same Christian community, which then drew up the Gospel of John. The thesis about the common origin of these two texts, which is considered to be responsible for similarities in theological terminology emphasized by these scholars, is also supported by Fieger.¹⁸ The fourth and fifth groups of researchers emphasize differences between the two Gospels and differ primarily in which of the two Gospels they consider to have originated earlier. Scholars who emphasize the dependence of the Gospel of Thomas on the Gospel of John especially include Popkes,¹⁹ who observes that the use of the same metaphors in both Gospels is very different and considers the Gospel of Thomas to be a relecture of the Johannine text. The view emphasizing Thomasine priority and considering the Gospel of John as a polemical answer to Thomas' theology is maintained by Gregory J. Riley,²⁰ who focuses on the divergence in the concept of resurrection, or April DeConick,²¹ who emphasizes the mystical character of Thomas' Gospel, against which the author of the Gospel of John takes his stance. This position is held also by Elaine Pagels,²² who sees the most crucial moment in the different interpretations of primordial divine light from Genesis 1.
4 In this context, it is worth mentioning that there are scholars who question whether it is possible to reconstruct identifiable communities behind individual Gospels. They suggest that the consensus about the original intended audiences of the Gospels (the so-called Markan, Matthean, Lukan, and Johannine communities) has come about in New Testament scholarship without any substantial argument. This issue is mainly discussed in the publication The Gospels for All Christians: Rethinking the Gospel Audiences, edited by Richard Bauckham, the main proponent of this view. In his perspective, it is first of all necessary to realize that the Gospels are not Paul's epistles, i.e., it is not unproblematic to assume the specific addressees behind them. He emphasizes that mobility and communication in the first-century Roman world were exceptionally developed and many members of the early Christian churches regularly traveled. Moreover, we have concrete evidence for close contacts between churches in the period around which the Gospels were written. According to Bauckham, we cannot assume that it is possible to reconstruct the community from the text, because we certainly cannot take for granted that a Gospel was written in only one community. Bauckham even suggests that it is entirely possible that a Gospel was written over a period during which its author was a resident for a time in each of two or more very different communities. Therefore, the historical context of the Gospels is not the community of the author, but the early Christian movement in the late first century as a whole. For more see Bauckham, "For Whom Were Gospels Written?" 9-48; and Klink, The Sheep of the Fold: The Audience and Origin of the Gospel of John.
5 We use the term "canonical" only as a quick reference term and have no ambition to suggest an approach to these texts that differs from the approach to other texts originating in the same historical and cultural context (e.g., the Gospel of Thomas).
6 According to Perrin and Skinner, we can distinguish four basic attitudes among researchers. See Perrin and Skinner, "Recent Trends in Gospel of Thomas Research," 77-81.

Similarities
The view emphasizing similarities and parallels between the two Gospels is primarily based on assumptions that the parallels and affinities between the Gospel of John and the Gospel of Thomas are the result of some common traditions. Proponents of this view tend to assume that the Gospel of Thomas and the Gospel of John originated around the same time²³ and had the same cultural background.²⁴ Gilles Quispel assumes that the similarities that we can recognize in these two Gospels are based on a shared source of Jesus' sayings and, moreover, that the Gospel of John knows about the distinctive Palestinian traditions represented by Thomas.²⁵ Stevan L. Davies goes further in his argument, claiming that the same community stands behind these two Gospels: in his view, the Johannine community first created the Gospel of Thomas and then the Gospel of John, because the Gospel of Thomas is a collection of sayings, an early stage of the Johannine community tradition.²⁶ According to Davies, both Gospels are derived from the tendency of early Christianity to apply to Jesus the terms and concepts of the Wisdom tradition.²⁷ Davies writes: […] I shall presume that when John and Thomas share christological conceptions and vocabulary derived from Sophiology, the similarities in derivation, conception, and vocabulary indicate a similarity of theological orientation, which is modified but not eliminated by the divergence of the literary forms and particular emphasis of the two documents. 
Thomas' theology is not identical to that of John, but neither is it "entirely" different.²⁸ Davies especially emphasizes similarity in the usage of sapiential motifs²⁹ such as light,³⁰ a dualism of light and darkness,³¹ Jesus as a teacher coming from above, i.e., from the Father,³² a symbolism of drinking and eating,³³ a motif of failing in seeking and finding,³⁴ an ambivalent attitude toward the world,³⁵ a return to the beginning, to the original state of creation,³⁶ and a present eschatology.³⁷ The similarities between the two Gospels are also emphasized by Mirkovic, who does not agree that these two Gospels are literally dependent on one another.³⁸ Mirkovic assumes that these parallels originate from the same Sitz im Leben, which is in his view the popular wisdom of wandering ascetics, holy men and women in first-century Syria. Moreover, Mirkovic assumes the theological proximity of the two Gospels, which, according to him, lies primarily in the fact that both communities believed in Jesus and considered him to be a self-conscious sage whose salvific words for us are true, flawless, and decisive. This concept comes from the aforementioned oral tradition of wandering holy men and women. The main task of their followers was also to keep, preserve and interpret the sayings of the master. Along these lines, this remarkable self-consciousness of Jesus in John and Thomas has its origin in the Stoic notion of "cataleptic impressions," the concept in which a sage is able to discern truth from falsehood because he has a peculiar power of revealing his object and can teach, so to speak, without any need for external confirmation or logical proof.³⁹ This Stoic concept, according to Mirkovic, is shared by both Gospels, which we see in the way they portray Jesus.
23 They tend to date both Gospels around the end of the first century. Robinson, "The Johannine Trajectory," 232-68; Mirkovic, "Johannine Sayings," 1-4; Dunderberg, The Beloved Disciple in Conflict? 5, 114.
24 Regardless of whether researchers emphasize parallels or differences between the two Gospels, there is no consensus on where to seek out their origins. Some scholars tend to place both Gospels in Syria, especially Riley's Resurrection Reconsidered, 177 or Mirkovic's "Johannine Sayings," 1-4; however, others emphasize the roots of the Gospel of John in the Palestinian context, see Quispel, "Qumran, John and Jewish Christianity," 144-6.

Differences
The second widespread position emphasizes the fact that, despite the abovementioned similarities, there are profound differences in theology and in the understanding of the role of Jesus. Without doubt, these scholars also recognize the points of contact between John and Thomas, especially in contrast to the Synoptics: Matthew, Mark, and Luke place emphasis on Jesus' warning of the coming "end of time" and identify Jesus as God's human agent or servant. By contrast, John and Thomas emphasize that Jesus actually directs his disciples toward the beginning. They link Jesus to the divine light, which came into being in the beginning according to the account of creation, and understand him as God in human form. Thomas and John both say that this primordial light connects Jesus with the entire universe and with divinity, because all things were made through him, in John's terminology through the "Logos."⁴⁰ However, according to these scholars, while the vocabulary and metaphors are identical, they are used to express a completely different understanding of the role of Jesus.
39 Ibid., 17-18.
40 Pagels, Beyond Belief, 36-40.
Gregory J. Riley claims that both communities were based in Syria and interacted closely.⁴¹ However, Riley stresses the controversy between Johannine and Thomasine traditions, which is primarily in their view of the resurrection. In Riley's opinion, John promotes the fleshly resurrection of Jesus, while Thomas' view tends to deny the concept of physical resurrection.⁴² He emphasizes the controversial character of John's depiction of the encounter with the risen Jesus. For the author of John's Gospel, the Doubting Thomas serves to depict the antihero, and his humiliation is also a literary construct through which John rejects the theology of the Thomasine movement.⁴³ Another important figure in the research of the Gospel of Thomas, April DeConick, based her analysis on the traditio-rhetorical model. She also perceives the character of Thomas as a dramatic construct of the author of the Gospel of John; however, in her view, what divides them most is a Thomasine inclination to mysticism. According to the Gospel of John, the misunderstanding of the Gospel of Thomas lies in Thomas' claim that one must seek the path to Jesus, the path to ascend into heaven, and a visionary experience (visio Dei).⁴⁴ Thomas' conception, on the other hand, is very close to what we know from Jewish and later Christian mysticism, i.e., the image of God is hidden in each of us, regardless of whether the person is aware of it or not. Everyone has a divine spark within themselves, because they were created by God. This is precisely the kind of mysticism that the author of John's Gospel blames on the Thomasine community, arguing that salvation, and thus the coming of the Kingdom of God, can only be achieved through faith in Jesus. 
According to DeConick, the Johannine author created a "faith mysticism" in order to oppose the "visionary mysticism" and "mystical ascent soteriology" that can be found in the Gospel of Thomas.⁴⁵ Elaine Pagels also holds a strong position that emphasizes the conflict between the two Gospels. She assumes that the Gospel of John was written to oppose a certain Christology and, more precisely, to oppose a certain view of Jesus held by the Gospel of Thomas. Although the Gospel of John also speaks of Jesus as the light, the main message of the Gospel of Thomas is that this divine light, which comes from God and is most clearly manifested in Jesus, shines not only in him but potentially in anyone who is willing to seek this light. According to Pagels, "Thomas' Gospel encourages the hearer not so much to believe in Jesus, as John requires, as to seek for God through one's own divinely given capacity, since all are created in the image of God."⁴⁶ For Thomas, everyone in creation receives an innate capacity to know God, because we are all created in the image of the primordial light. Thus, one may come to recognize oneself and Jesus as identical twins.⁴⁷ The Gospel of John strictly opposes this view and claims that the image of God resides only in Jesus, because he is the incarnate Christ, Redeemer, true Logos, and God revealed in a human form.⁴⁸ According to Pagels, the author of the Gospel of John most likely knew the Gospel of Thomas and wrote his Gospel as an open controversy against this direction of thought and against all other Jewish and pagan mystical tendencies toward this optimistic vision of divine capacity or divine spark in each of us. The author of the Gospel of John decided to write his own Gospel to underscore that it is Jesus, and only Jesus, who is this primordial light and embodies God's word, and therefore speaks with divine authority. For this author, Jesus is unique, and only by believing in Jesus can we find the divine truth. 
John agrees with the Synoptic view that Jesus is called upon by God, that Jesus is the Messiah, rabbi, and prophet, but his teaching goes much further. The author of the Gospel of John appears to imply that the Synoptics did not fully understand who Jesus really was. He is more radical and dares to say that not only is Jesus chosen by God, he is even God himself, God who revealed himself to us in the form of Jesus. John declares that Jesus existed even before Abraham was born.⁴⁹ This is the crucial difference between John and Thomas as also expressed in the episode about Doubting Thomas.⁵⁰ From this perspective, this episode is an open polemic against Thomasine Christianity. Thomas is portrayed here as a disciple who does not believe and does not understand. However, after Jesus' revelation, he is surprisingly and perhaps sarcastically the only one who proclaims Jesus as Lord and even as God.⁵¹ Nevertheless, the view that we are actually dealing with a conflict between two texts here has been criticized by some researchers. For example, Skinner,⁵² drawing on his analysis of characters appearing in the Gospel of John, claims that "the instances of Thomas's misunderstanding identified by Riley, DeConick, and Pagels are actually part of a much larger presentation of characters in the Fourth Gospel. Thus, the supposition of John's anti-Thomas polemic runs into difficulties unless it can be demonstrated that John's similar treatment of Peter, Nicodemus, the Samaritan woman, etc. are also accounted for by positing numerous polemics aimed at individuals and/or communities within early Christianity."⁵³ Dunderberg also holds a restrained position, suggesting that there should be more indicators and convincing signs of literary dependence to postulate either a closer relationship or conflict between the two Gospels. In his view, these two Gospels were probably part of more general discussions about certain similar topics and took different stances on them. 
Dunderberg also assumes that they did not know each other's opinions.⁵⁴ Moreover, like Skinner, he also notes that Thomas is depicted negatively, but so are most other figures in the Gospel of John.⁵⁵ With this in mind, it seems that despite the similarities that appear in both Gospels and the fact that we cannot say with certainty that the Gospels were formulated in conflict, we are dealing with a different theological message here. The author of John's Gospel probably knew and complemented the Synoptic Gospels; at the same time, he pushed the view of Jesus toward the radical belief that Jesus is a preexistent Logos, divine light, and incarnate God, and he presents this concept through Jesus' self-revealing monologues. Jesus explicitly designates himself several times as the light: Ἐγώ εἰμι τὸ φῶς τοῦ κόσμου (I am the light of the world; GJn 8:12); ὅταν ἐν τῷ κόσμῳ ὦ, φῶς εἰμι τοῦ κόσμου (While I shall be in the world, I am the light of the world; GJn 9:5); ἐγὼ φῶς εἰς τὸν κόσμον ἐλήλυθα, ἵνα πᾶς ὁ πιστεύων εἰς ἐμὲ ἐν τῇ σκοτίᾳ μὴ μείνῃ (I have come into the world as a light, so that everyone believing in me in darkness not should abide; GJn 12:46). For John, the motif of Jesus' equality with God is very important, as it is already strongly expressed in the Johannine prologue: Ἐν ἀρχῇ ἦν ὁ Λόγος, καὶ ὁ Λόγος ἦν πρὸς τὸν Θεόν, καὶ Θεὸς ἦν ὁ Λόγος (In the beginning was the Word/Logos, and the Word/Logos was with God, and God was the Word/Logos; GJn 1:1) but also further in the gospel: ἐν ἐμοὶ ὁ Πατὴρ κἀγὼ ἐν τῷ Πατρί (In me (is) the Father and I (am) in the Father; GJn 10:38). 
The unity of the Father and the Son is also extensively presented in chapter 17, in Jesus' prayer for the disciples: καὶ νῦν δόξασόν με σύ, Πάτερ, παρὰ σεαυτῷ τῇ δόξῃ ᾗ εἶχον πρὸ τοῦ τὸν κόσμον εἶναι παρὰ σοί (And now glorify me, you, Father, in your presence with the glory I had before the world began with you; GJn 17:5); ἵνα πάντες ἓν ὦσιν, καθὼς σύ, Πατήρ, ἐν ἐμοὶ κἀγὼ ἐν σοί (That all one may be one as you, Father, (are) in me and I (am) in you; GJn 17:21); ἵνα ὦσιν ἓν καθὼς ἡμεῖς ἕν (So that they may be one as we (are) one; GJn 17:22). This unity can never be achieved by disciples and believers, so the role of Jesus is emphasized as unique and irreplaceable. Thomas, who on the other hand perceives Jesus as a great sage, also uses the symbol of light and emphasizes the importance of his words, but assumes that the inner capacity to find divine light is in each of us and this is not an exclusive feature of Jesus. Jesus is the light: ⲁⲛⲟⲕ ⲡⲉ ⲡ. ⲟⲩⲟⲉⲓⲛ ⲡⲁⲉⲓ ⲉⲧ. ϩⲓ ϫⲱ. ⲟⲩ ⲧⲏⲣ. ⲟⲩ (I am the Light that is over all; GTh 77), but this light is potentially in each of us: ⲟⲩⲛ ⲟⲩⲟⲉⲓⲛ ϣⲟⲟⲡ ⲙ. ⲫⲟⲩⲛ ⲛⲛ. ⲟⲩ. ⲣⲙ. ⲟⲩⲟⲉⲓⲛ ⲁⲩⲱ ϥ. ⲣ. ⲟⲩⲟⲉⲓⲛ ⲉ. ⲡ. ⲕⲟⲥⲙⲟⲥ (There is light existing inside a man of light, and he makes light/shines to the whole world; GTh 24), for we have come out of this light: ⲉ. ⲩ. ϣⲁⲛ. ϫⲟⲟ. ⲥ ⲛⲏ. ⲧⲛ ϫⲉ ⲛⲧⲁ. ⲧⲉⲧⲛ. ϣⲱⲡⲉ ⲉⲃⲟⲗ ⲧⲱⲛ ϫⲟⲟ. ⲥ ⲛⲁ. ⲩ ϫⲉ ⲛⲧⲁ. ⲛ. ⲉⲓ ⲉⲃⲟⲗ ϩⲙ ⲡ. ⲟⲩⲟⲉⲓⲛ ⲡ. ⲙⲁ ⲉⲛⲧ. ⲁ. ⲡ. ⲟⲩⲟⲉⲓⲛ. ϣⲱⲡⲉ ⲙⲙⲁⲩ ⲉⲃⲟⲗ ϩⲓ ⲧⲟⲟⲧ. ϥ ⲟⲩⲁⲁⲧ. ϥ (If they say to you: "Where do you come from?" (then) say to them: We have come from the light, the place where the light has come into being by itself; GTh 50) and can become light again, be reunited and refilled with it: ϩⲟⲧⲁⲛ ⲉ. ϥ. ϣⲁ. ϣⲱⲡⲉ ⲉ. ϥ. ϣⲏϥ ϥ. ⲛⲁ. ⲙⲟⲩϩ ⲟⲩⲟⲉⲓⲛ (If someone becomes [like] (God/the one who is equal), he will become full of light; GTh 61). Jesus' role is not unique; anyone who follows Jesus and his words can be like him: ⲡⲉϫⲉ ⲓⲥ ϫⲉ ⲡⲉⲧⲁ. ⲥⲱ ⲉⲃⲟⲗ ϩⲛ ⲧⲁ. ⲧⲁⲡⲣⲟ ϥ. ⲛⲁ. ϣⲱⲡⲉ ⲛ. ⲧⲁ. ϩⲉ ⲁⲛⲟⲕ ϩⲱ ϯ. ⲛⲁ. ϣⲱⲡⲉ ⲉ. 
ⲛⲧⲟϥ ⲡⲉ ⲁⲩⲱ ⲛⲉⲑ. ⲏⲡ ⲛⲁ. ⲟⲩⲱⲛϩ ⲉⲣⲟ. ϥ (Jesus says: "Whoever will drink from my mouth will become like me. I myself will become he, and what is hidden will be revealed to him"; GTh 108).

Quantitative text analysis
In the following text, we approach the historical research problem outlined above by means of QTA, i.e., utilizing computer algorithms and tools used in computational linguistics and computer science to quantitatively analyze human natural language data.⁵⁶ In the digital humanities, these methods are sometimes referred to under the umbrella term "distant reading," drawing on the idea that the "close reading" of individual texts and passages that characterize traditional humanities scholarship might be fruitfully enriched by the employment of computational tools to "zoom out" from individual texts and passages in an attempt to discover more general patterns perceivable only from a larger distance.⁵⁷ The patterns of special interest are those concerning usage of certain words, either alone or in context, approached either within one or across many documents under scrutiny.
To proceed with our QTA, we first describe the textual data we have used in all our analyses and how we obtained them. Subsequently, we introduce three analyses employing these data to resolve particular research questions. Following good practice in quantitative scholarship, each analysis is divided into subsections of methods and results. We begin with the most general method, i.e., quantitatively comparing vocabulary employed by individual documents to calculate distances between these documents. Subsequently, we look more closely at these differences via an analysis of word frequencies. Finally, we employ a variant of word embeddings to trace differences in meaning of certain words across the documents.

Data
At the core of our interest lie the similarities and differences in theology of the Gospel of John and the Gospel of Thomas. The first obstacle in approaching this problem while using QTA is associated with the fact that these two texts are fully extant only in different languages, namely ancient Greek and Coptic. Therefore, the texts cannot be compared directly in their original languages. To deal with this problem, a part of our analyses is based upon English translations of the Gospels, forming a bridge between the two languages. However, wherever it is meaningful and possible, we use both Coptic and Greek. Furthermore, we also validate the results obtained for the Gospel of John by comparing them with results acquired by applying the same methods to other canonical Gospels, both in Greek and English.
For the English part of our analysis, we used the Revised Standard Version translation of the canonical Gospels⁵⁸ and Thomas O. Lambdin's translation of the Gospel of Thomas.⁵⁹ We chose these versions also because they have been used in an online synopsis of the five Gospels produced in an HTML markup by John W. Marshall.⁶⁰ This way we were able to rely upon a single online resource. We extracted these texts from this resource using the Python programming language, which we also used for all the subsequent analytical steps described in this article.⁶¹ In the case of the Greek text of the canonical Gospels, we used a morphologically tagged and lemmatized version of Eberhard Nestle's 1904 edition of the Greek New Testament, which has been made available online in a tabular format by biblicalhumanities.org via GitHub.⁶² For the Coptic version of the Gospel of Thomas, we turned to a machine-readable interlinear translation produced by Milan Konvička and made available online as a part of the Marcion 1.6.2 software.⁶³
While approaching our task, we faced another obstacle: the fact that the Gospel of John and the Gospel of Thomas each represent a different genre, i.e., the first is framed as a narrative and the second as a set of Jesus' sayings. To overcome this obstacle while making our analyses of similarities and differences in the theology of these two Gospels more meaningful, we extracted from all texts only the words attributed to Jesus, i.e., the so-called Jesus' sayings. All analyses introduced below in Greek, Coptic, and English are based only upon this subset of the texts.
As a final step in preprocessing the data, we proceeded with lemmatization and word type filtering. Lemmatization is a process producing a dictionary form of a word (or: lemma), which can be conducted either manually or by means of automatic computational tools. It is commonly done simultaneously with morphological analysis of the word in the text, classifying whether it is a noun, adjective, etc. In the case of the Greek New Testament, this task was rather straightforward, since we could rely upon lemmatization and morphological analysis being a part of the edition we worked with. Thus, for each of Jesus' sayings in the Greek versions of the canonical Gospels, we could easily filter the data only for certain word types, namely nouns, adjectives, and verbs, and extract only the lemmata for these types.
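The word type filtering described for the Greek data can be sketched as follows. This is only a minimal illustration, assuming that each token is available as a row of surface form, lemma, and part of speech; the `Token` fields, the part-of-speech labels, and the helper `content_lemmata` are hypothetical names and do not reflect the actual column layout of the tagged edition. The sample rows are the words of GJn 8:12.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str   # word as it appears in the text
    lemma: str  # dictionary form of the word
    pos: str    # part-of-speech label (label set assumed for illustration)


# Word types kept for the analysis: nouns, adjectives, and verbs.
KEPT_POS = {"noun", "adjective", "verb"}


def content_lemmata(tokens):
    """Return the lemmata of all tokens whose word type is kept."""
    return [t.lemma for t in tokens if t.pos in KEPT_POS]


# Ἐγώ εἰμι τὸ φῶς τοῦ κόσμου (GJn 8:12)
saying = [
    Token("Ἐγώ", "ἐγώ", "pronoun"),
    Token("εἰμι", "εἰμί", "verb"),
    Token("τὸ", "ὁ", "article"),
    Token("φῶς", "φῶς", "noun"),
    Token("τοῦ", "ὁ", "article"),
    Token("κόσμου", "κόσμος", "noun"),
]

lemmata = content_lemmata(saying)
print(lemmata)
```

In this sketch the pronoun and the two articles are dropped, so only the verb and the two nouns remain as lemmata.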
In the case of the Coptic version of the Gospel of Thomas, instead of lemmata, we were able to use ID codes associated with each lexeme,⁶⁴ which refer to an online grammar and dictionary.
In the case of English, we had to perform the lemmatization ourselves, and for this purpose we used an automatic lemmatization tool in the Python NLTK library.⁶⁵ We further filtered words within this lemmatized dataset by automatically removing all stopwords using the same Python library.
As a result of these cleaning and preprocessing procedures, we obtained three highly comparable datasets. Since the three languages employ the word types differently and since we used different methods for filtering them, the language versions differ slightly in size. In the case of Jesus' sayings within the canonical Gospels, there are 13,780 words in total in their English versions compared to 14,508 words in total in their Greek versions, a ratio of 0.94982. For the 415 sayings identified within the Gospel of John, the ratio is 2,644:2,817 words, or 0.93859. The cleaned and preprocessed version of Jesus' sayings in the English translation of the Gospel of Thomas consists of 1,654 words, while the Coptic version of the same text consists of 1,751 words, a ratio of 0.9446. These words form 241 and 273 sentences, respectively,⁶⁶ with an average length of 6.86307 and 6.41392 words, respectively. Again, this is highly comparable to Jesus' sayings in the Gospel of John, with an average length of 6.37108 words.
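The ratios and average lengths reported above can be recomputed from the stated counts with a few lines of Python. The variable names are illustrative only. Note one assumption: the average for the Gospel of John appears to correspond to the English word count divided by the 415 sayings, which is an inference from the numbers and is not stated explicitly in the text.

```python
# Word counts as reported in the text: (English, original language).
counts = {
    "canonical Gospels (English/Greek)": (13_780, 14_508),
    "Gospel of John (English/Greek)": (2_644, 2_817),
    "Gospel of Thomas (English/Coptic)": (1_654, 1_751),
}

# Size ratio of the English dataset to the original-language dataset.
ratios = {name: english / original for name, (english, original) in counts.items()}

# Average length = number of words / number of sentences or sayings.
avg_thomas_english = 1_654 / 241
avg_thomas_coptic = 1_751 / 273
avg_john_english = 2_644 / 415  # assumed pairing of counts, see the note above

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.5f}")
print(f"{avg_thomas_english:.5f} {avg_thomas_coptic:.5f} {avg_john_english:.5f}")
```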

Analysis 1: document distances

Methods
In the first analysis, we used the dataset of Jesus' sayings to formally evaluate the similarity between the Gospels (five or four, depending on the document set) by drawing on the vocabulary they employ. This method is based on the idea that documents sharing a larger number of words are more similar to each other than documents that have fewer words in common. This similarity is calculated by comparing multidimensional vectors based on document-term co-occurrence matrices, so that for each pair of documents within a corpus we obtain a single numerical value expressing their similarity. This value can then be used to produce a visualization of distances between documents in a two-dimensional space, where more similar documents are plotted closer to each other. Here the similarity between two documents is inverted to express their distance on a scale between 0 and 1, with 1 meaning that the documents are completely unrelated. However, the projection into a two-dimensional space has to be interpreted with caution, since it represents only a rough approximation of the numerical values.
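The general idea of a vocabulary-based document distance can be sketched in a few lines. The following is a minimal illustration and not necessarily the exact procedure used in this study: it assumes raw term counts and cosine similarity, inverted as 1 minus similarity, which yields values between 0 and 1 for non-negative counts. The function names and the toy documents are invented for the example, and the projection into two dimensions is not covered.

```python
import math
from collections import Counter


def term_vector(lemmata):
    """Count how often each lemma occurs in one document."""
    return Counter(lemmata)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document shares nothing with any other
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(docs):
    """Pairwise distances for a {name: list of lemmata} mapping."""
    vectors = {name: term_vector(words) for name, words in docs.items()}
    return {
        (x, y): cosine_distance(vectors[x], vectors[y])
        for x in vectors
        for y in vectors
    }


# Toy documents (invented word lists, not the Gospel data).
docs = {
    "A": ["light", "world", "father", "light"],
    "B": ["light", "world", "kingdom"],
    "C": ["seed", "field", "harvest"],
}
distances = distance_matrix(docs)
print(round(distances[("A", "B")], 5), distances[("A", "C")])
```

Documents A and B share part of their vocabulary and therefore end up closer to each other than either is to C, which shares no words with them and receives the maximum distance of 1.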

Results
We calculated the distances between three sets of documents containing the sayings of Jesus. The first set is based on the five Gospels in the English translation (RSV + GoT), the second is based on the four canonical Gospels in the English translation (RSV), and the third is based on the four canonical Gospels in Greek (GNT). The results can be seen in Figure 1.
In the left plot, which visualizes the distances between the five Gospels in the English translation, we clearly see that the Gospel of Thomas and the Gospel of John are quite far from each other. Expressed numerically, their distance is 0.40978. This is greater than the distance separating either of them from the Synoptics. For instance, the distance between Thomas and Mark is 0.28595, while the distance between John and Mark is 0.35930. These are still much greater distances than the ones dividing the Synoptics from one another: e.g., the distance between Matthew and Luke is only 0.08327.
The middle and the right plots are here to demonstrate that we obtain the very same results when we compare the canonical Gospels in English and Greek. This has also been evaluated statistically by means of the Pearson correlation (ρ = 0.99564, p < 0.001). This is an important finding, as it demonstrates that the similarities and differences in Jesus' vocabulary across the Gospels we observe in the English text are actually not an artificial product of the translation but rather closely mirror the original.
The results of this method, which identify a large distance between John and Thomas in comparison to the rest of the Gospels, appear to support the second perspective within the scholarly debate, i.e., the one that highlights differences between these two Gospels. With this in mind, it seems somewhat less likely that these two texts shared the same sources or that they could have been created by the same community wanting to proclaim the same theological concept. Hypothetically, such a large distance between the two texts, even greater than their distance from the Synoptic Gospels, might be consistent with the thesis that the Gospel of John proclaims a theology that is not in accordance with the theology of the Gospel of Thomas. However, all that can be said with some degree of certainty is that the distance between these texts indicates their differences. The conclusion that the contradiction or direct controversy highlighted by some of these researchers is the reason why these texts are so far removed from each other cannot be directly deduced from this analysis.
It is also possible that this analysis could support Dunderberg's thesis that there is no literary relationship between these two Gospels. Dunderberg argues that it is necessary to consider whether the similarities between the two Gospels imply some particular connection or conceptual relationship. It should be considered whether these parallels are also commonly attested outside the two Gospels in early Christian literature. Based on the analysis of I-sayings in the Gospel of Thomas and their comparison with similar statements in John, Dunderberg concludes that no intimate relationship between the Gospel of Thomas and the Johannine writings can be assumed. In his view, even close contact between the communities behind them cannot be postulated: they shared no similar theology, nor were they in conflict. Dunderberg suggests that each of them employed similar Jewish and Christian traditions in quite different ways and did not know each other's positions. Moreover, the emphasis that is common to both Gospels can also be found in other early Christian texts.⁶⁷ The results of our first analysis might also be interpreted in agreement with these points, but since we are interested in whether distant reading can support the view of either the researchers who emphasize similarities or the researchers who reveal differences, our further analysis will focus on what terms appear most often in the Gospels.

Methods
Our second analysis focused in more detail on the findings obtained by means of the first analysis. It consisted of measuring word frequencies in both the English and Greek versions of the canonical Gospels and in both the English and Coptic versions of the Gospel of Thomas. In the case of texts available in Greek and English, we made two measurements for each unique word within each Gospel. First, we calculated its term frequency within the Gospel (gosTF) by counting all its occurrences and dividing this number by the total number of words within the Gospel. Second, to reveal words of special importance for each individual Gospel, we generated a second metric by dividing the gosTF value by the term frequency of the same word across all Gospels together (i.e., five Gospels in the case of the English text and four canonical Gospels in the case of the Greek text). Where the value of this second metric is significantly higher than 1, we can conclude that we are dealing with a term specific to the given Gospel. In the case of the Coptic version of the Gospel of Thomas, we conducted only the first measurement, as we do not possess other Gospels in a suitable format for comparison.
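The two measurements can be sketched as follows. This is a minimal illustration under the definitions given above; the function names `term_frequencies` and `gospel_metrics`, and the toy token lists, are hypothetical and not taken from the study's datasets.

```python
from collections import Counter

def term_frequencies(tokens):
    """Relative frequency of each word: occurrences divided by total word count."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}

def gospel_metrics(gospels):
    """For each Gospel and word, return (gosTF, gosTF / TF across all Gospels together)."""
    all_tokens = [token for tokens in gospels.values() for token in tokens]
    corpus_tf = term_frequencies(all_tokens)
    metrics = {}
    for name, tokens in gospels.items():
        gos_tf = term_frequencies(tokens)
        metrics[name] = {
            word: (tf, tf / corpus_tf[word]) for word, tf in gos_tf.items()
        }
    return metrics

# Toy input (invented for illustration only).
gospels = {
    "X": ["father", "father", "world", "man"],
    "Y": ["man", "man", "find", "father"],
}
metrics = gospel_metrics(gospels)

# Words of document "X" ordered by gosTF, most frequent first.
top_terms = sorted(metrics["X"], key=lambda w: metrics["X"][w][0], reverse=True)
```

In this toy input, "world" occurs only in document "X", so its ratio there is 2.0, whereas "man" is underrepresented in "X" and receives a ratio below 1.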

Results
To report and interpret the results of this analysis, we produced Figure 2, which shows the 20 terms with the highest gosTF contained in Jesus' sayings within each Gospel. Although there is evidently an overlap between the Greek and English versions of the canonical Gospels, we should not ignore the fact that there are also some differences, especially in verb usage. Therefore, in the subsequent analysis, we will pay special attention to nouns.
Quite unsurprisingly, we see that the most frequent term in Jesus' sayings in the Gospel of John is the word "father" (πατήρ), while the word "man" (ἄνθρωπος), which is so predominant in the Synoptics, is almost missing. Overall, the figure appears to capture the same general pattern as in the case of distances: Looking at the 20 most frequent words, Jesus' sayings in the Gospel of John and the Gospel of Thomas do not seem to be very much related. Moreover, the table of normalized word frequencies seems to reflect the differences in theology, especially soteriology, between these two Gospels. It has been suggested that, for Thomas, the most important message is that everyone who seeks can find God, that every man has a divine capacity to know God and thus to become like Jesus. It appears that the key terms for this message, "man" (ⲣⲱⲙⲉ), "become" (ϣⲱⲡⲉ), "find" (ϩⲉ), and "know" (ⲥⲟⲟⲩⲛ), are also among the most common in the Gospel of Thomas. On the other hand, the main message of the Gospel of John is that God, the divine Father, because he loved us, came to this corrupt world and gave his Son to save us. In John, you cannot simply seek and find; you do not have this kind of capacity within yourself, nor do you recognize the Father from the world. You must believe in Jesus, whom the Father sent into this world, know his word, and obey it. As in the previous case, the keywords of John's theology seem to be reflected in our analysis: we see that the frequency distribution of words (gosTF) has shown the importance of terms like "father" (πατήρ), "come" (ἔρχομαι), "world" (κόσμος), "believe" (πιστεύω), "give" (δίδωμι), "son" (υἱὸς), "know" (γινώσκω), "love" (ἀγαπῶ), and "send" (πέμπω). The same analysis was also made for Jesus' sayings in the Synoptic Gospels to test whether our results were accidental or not.
Although these three Gospels do not translate their theological concepts into Jesus' words as strongly as John and Thomas, we can also note here that some of the characteristic features and the typical theological vocabulary of the Gospels' authors are manifested here. An eloquent example is the importance of the term "heaven" (οὐρανὸς) in the Gospel of Matthew. At the same time, the similarities between the 20 most frequent terms of these three Gospels make it very clear that they are indeed Synoptic Gospels.

Figure 2: Twenty terms with the highest gosTF in the five Gospels. Where differentiated, the color scale depicts the term frequency within the Gospel in ratio to its frequency in all Gospels together; darker color signifies a higher ratio, and thus a higher specificity of the given term for the given Gospel.

Methods
The two analyses introduced above managed to successfully capture similarities and differences in adopted vocabulary. However, these methods are completely blind to the semantic content of the words. Nevertheless, there is a whole family of QTA methods attempting to surpass this limitation. These methods form the core of distributional semantics models of meaning.⁶⁸ Distributional semantics draws upon a general assumption that two words that tend to co-occur in the same contexts within a corpus also tend to have a similar meaning (the so-called "distributional hypothesis").⁶⁹ When a context is defined as a sentence, two words occurring repeatedly together in the same sentences are considered to be closer to each other in terms of semantic relatedness than two words that tend to occur in different sentences.⁷⁰ Adopting this approach in our analysis, we first construct a word-by-word co-occurrence matrix with cell values corresponding to the number of instances that any two words co-occur within a sentence.⁷¹ Each row or column within this square matrix might be considered a vector, informing us about a location of a given word within a multidimensional space (the number of dimensions equals the number of words). However, since many elements within such vectors are typically zeros (i.e., no sentence co-occurrence between two words within the text), two vectors cannot be directly compared (as more zeros imply higher similarity, which is misleading). Therefore, we have to apply standardized methods for reducing vector dimensionality, such as singular value decomposition.⁷² As a result of this step, we obtain a matrix with rows corresponding to words and with columns corresponding to a vector with our preselected number of dimensions. This reduced word-vector matrix can finally be used to calculate similarities or distances between any word pair.
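The pipeline described above (sentence-level co-occurrence counts, dimensionality reduction by singular value decomposition, and word-to-word similarity) can be sketched with NumPy. This is a simplified illustration, not the study's implementation: it uses raw counts rather than the Tf-Idf weighting mentioned in the notes, assumes cosine similarity as the comparison measure, and the toy sentences and the choice of `k=2` dimensions are invented for the example.

```python
import numpy as np

def cooccurrence_matrix(sentences):
    """Count, for every word pair, the sentences in which both words occur."""
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)))
    for sentence in sentences:
        present = [index[word] for word in set(sentence)]
        for i in present:
            for j in present:
                matrix[i, j] += 1  # the diagonal holds each word's sentence count
    return vocab, matrix

def reduce_dimensions(matrix, k):
    """Represent each word by its coordinates on the k strongest SVD dimensions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T

# Toy sentences (invented for illustration only).
sentences = [
    ["light", "darkness", "world"],
    ["light", "darkness"],
    ["father", "son", "send"],
    ["father", "son", "world"],
]
vocab, counts = cooccurrence_matrix(sentences)
similarity = cosine_similarities(reduce_dimensions(counts, k=2))
light, darkness, son = (vocab.index(w) for w in ("light", "darkness", "son"))
```

The nearest neighbors of a target word can then be read off by sorting the corresponding row of `similarity` in descending order. In the toy data, "light" and "darkness" occur in exactly the same sentences, so they end up with identical reduced vectors.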

Results
We have used this methodology to generate a word similarity matrix for four documents: Jesus' sayings within the Coptic version of the Gospel of Thomas, Jesus' sayings within the English version of the Gospel of Thomas, Jesus' sayings within the Greek version of the Gospel of John, and Jesus' sayings within the English version of the Gospel of John.

68 Lenci, "Distributional Models of Word Meaning," 151-71. For a more elaborated application of these methods in New Testament Studies, see Munson, Biblical Semantics.
69 Harris, "Distributional structure,"
70 The computational methods to calculate this kind of relatedness between two words are actually very similar to the ones we used in the case of calculating distances between two documents. The difference is that instead of analyzing and comparing documents as a whole, we are now calculating similarities between words within one document by comparing vectors based on the distribution of these words across sentences in this document.
71 Instead of simple word counts, we employ Tf-Idf weighting here. Generally speaking, our method differs from Latent Semantic Analysis, which is actually constructed on the basis of a word-by-document co-occurrence matrix. Thus, while LSA attempts to capture similarities on the basis of word-by-sentence co-occurrence vectors, we are interested here in more basic patterns on the level of word-by-word co-occurrence vectors. Furthermore, LSA is designed to work with much larger corpora than the ones we are dealing with here. See Landauer, McNamara, Dennis, and Kintsch, Handbook of Latent Semantic Analysis; Altszyler, Sigman, Ribeiro, and Slezak, "Comparative Study of LSA vs Word2vec," https://doi.org/10.1016/j.concog.2017.09.004.
72 For further details, see Martin and Berry, "Mathematical Foundations,"
In the historical section of this article, we introduced a set of concepts shared by the Gospel of John and the Gospel of Thomas, which have been discussed by scholars to demonstrate either similarities or differences in the theology of the two Gospels. Here we focus on the context of usage of terms which were directly identified in both Gospels using the same wordings and which could be more or less straightforwardly identified in either the Greek or Coptic version of the two Gospels.⁷³ Thus, we ended up with six terms: "father" (πατήρ, ⲉⲓⲱⲧ), "begin/beginning" (ἀρχή, ϩⲏ/ ⲁⲣⲭⲏ/ ⲁⲣⲭⲉⲓ), "find" (εὑρίσκω, ϭⲓⲛⲉ/ ϩⲉ), "light" (φῶς, ⲟⲩⲟⲉⲓⲛ), "seek" (ζητέω, ϣⲓⲛⲉ), "world" (κόσμος, ⲕⲟⲥⲙⲟⲥ).
For each language variant of the document, we can now extract a chosen number of words with which these six target terms most often co-occur. These words might be considered to be their nearest neighbors. Since we are working with two documents in two languages, for each of the six target terms we now have two versions of their nearest neighbors (see Table 1).
These results may be easily subjected to a qualitative inspection. We can identify some obvious similarities here. For instance, in both cases, the nearest neighbor of "light" (φῶς, ⲟⲩⲟⲉⲓⲛ) is "darkness" (σκοτία, ⲕⲁⲕⲉ). At the same time, we clearly observe many differences. However, we cannot evaluate the extent of these similarities vs differences in a controlled and quantitative manner. To overcome this limitation, we have turned again to the English translations of these texts.

73 Thus, for instance, we omitted words like "understanding/to understand" and "knowledge/to know," since in the Greek original they sometimes correspond to γινώσκω, sometimes to οἶδα, etc.
When we focus only on words that appear at least twice within both documents, we find that there are 123 words which the two English-translation Gospels have in common. To make our models more comparable, we will focus especially on this shared vocabulary. For each of the target words in this shared vocabulary, we can now easily extract its relatedness to all these 123 words (including the target word itself with a value of 1, which means absolute similarity, or identity). In this manner, we obtain pairs of numerical lists of identical length, which can be inspected for correlation. Following the logic of the distributional hypothesis, the extent of the correlation between these two lists should mirror semantic similarities in usage of the target word across the two documents.
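The comparison step can be sketched as follows. The sketch assumes that a word-by-word relatedness score is already available for each document (here as nested dictionaries) and uses the Pearson coefficient as the correlation measure, which the paragraph above does not specify; all names and toy values are hypothetical.

```python
from collections import Counter
from math import sqrt

def shared_vocabulary(tokens_a, tokens_b, min_count=2):
    """Words occurring at least min_count times in both documents."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    return sorted(
        w for w in counts_a
        if counts_a[w] >= min_count and counts_b[w] >= min_count
    )

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant list
    return cov / sqrt(var_x * var_y)

def usage_correlation(target, vocab, relatedness_a, relatedness_b):
    """Correlate the target word's relatedness profile across two documents."""
    xs = [relatedness_a[target][w] for w in vocab]
    ys = [relatedness_b[target][w] for w in vocab]
    return pearson(xs, ys)

# Toy relatedness scores (invented for illustration only).
vocab = ["darkness", "light", "world"]
rel_a = {"light": {"light": 1.0, "darkness": 0.9, "world": 0.4}}
rel_b = {"light": {"light": 1.0, "darkness": 0.8, "world": 0.3}}
r_light = usage_correlation("light", vocab, rel_a, rel_b)
```

A value of `r_light` close to 1 would indicate that the target word keeps similar company in both documents, while a value near 0 would indicate unrelated usage.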
The results of this analysis are listed in Table 2. First, we see that there are a number of differences between the nearest neighbors here and in the original languages. This is not surprising considering the different structure of the languages and the very limited size of the corpus. However, we can also see that some overall patterns persist. In general, we might claim that the target terms are used quite differently within the two documents, as is evident from their nearest neighbors.
Furthermore, we also see that in the case of some words, such as "light" or "world," the correlation is quite strong, while in the case of other words, like "begin" or "seek," it is much weaker. A stronger correlation suggests that the usage of the term is more similar across the two documents. However, we should use discretion with these findings, as the reliability of these models is highly dependent on the size of the input language data. In our case, this size is very small.
With this in mind, we can still see that these tools allow us to discern that the same vocabulary and the same metaphors are used to convey semantically different messages. Despite the similarities and parallels shared by the two Gospels, they seem to be quite far from each other according to this analysis. By focusing especially on terms that scholars consider proof of the similarity of the two documents, we were able to capture quantitatively that these terms are used very differently here. This analysis therefore seems rather to support the view of scholars who focus on the differences and assume that the same vocabulary is used to present different attitudes in these Gospels: the key common terms in both Gospels are associated with very different sets of expressions, as shown in the two tables above.

Conclusion
In summary, all of the quantitative text analyses we employed in the previous section seem to support the second view in the scholarly debate, i.e., the view that emphasizes differences between the two Gospels and considers these differences to be more fundamental than the similarities visible at first sight. However, it is not possible to decide based on these analyses whether these Gospels are in direct conflict with each other or whether they are different because, despite the use of similar terminology, they represent different theological concepts. With these results in mind, it appears that at least in our particular case it is meaningful and useful to complement traditional close reading, i.e., exegetical interpretation, with methods of "distant reading," and that even very basic methods of this type can provide relevant insights. It was also very important to find that these tools have the ability to capture semantic differences when the same terminology is used to transmit different meanings. The essential benefits therefore include the acquisition of a global view of the content and structure of the examined texts or corpora; furthermore, we can also compare textual corpora with one another. In a controllable and replicable form, it is possible to support and complement the researcher's view obtained via a detailed close reading of individual texts. A very important benefit we see in these methods is the possibility to correct pre-understandings and subjective views, which are difficult to avoid in close reading and classical exegesis. Researchers who have been dealing with certain texts and topics for a long time and know them very well, in particular, already have a wealth of knowledge and information that naturally shapes and influences their understanding of related problems and topics. Computer analysis can thus enable them to look at the problem without all of these burdens, to evaluate their hypotheses and interpretations in a replicable form, or to warn them in cases in which they have overlooked some essential aspects or overemphasized one element at the expense of another.
Naturally, there are also indisputable limitations to this type of research, some of which have been revealed by this study. Linguistic questions have arisen, such as how texts preserved in different languages can be compared and whether the use of their translations into English is acceptable and meaningful or not. By comparing the analyses performed for the canonical Gospels in the English, Greek, and Coptic versions, we have attempted to show that the use of translations did not impair the comparability of these texts to such an extent that the results would not be relevant. Similarly, the question arises as to whether it is possible to compare texts differing in genre (narrative vs collections of sayings). In this case, we attempted to balance this difference in the genre of the texts by selecting only the statements of Jesus preserved in direct speech from all the Gospels. Since the author of the Gospel of John lets his theology speak to the reader through the words of Jesus, we believe that in this case the analysis was meaningful and justified. However, if we wanted to compare the theology of the Gospel of Thomas and the Synoptic Gospels solely on the basis of an analysis of Jesus' sayings contained therein, it would not be possible and would not lead to relevant results, as Jesus' statements in the Synoptics are not significant bearers of their theological concepts.
Finally, an important issue that we also pointed out concerns the informative value of comparing smaller text corpora, as, from a statistical point of view, the comparison of a larger volume of data naturally has much greater evidential value. On the other hand, the advantage of analyzing smaller text corpora lies in the controllability and comparability of the results of these quantitative analyses with the results obtained by close reading. This is the procedure we have chosen for this study, and it cannot be applied to cases in which the corpus analyzed is too large to be covered by close reading; in such a case, it is necessary to rely solely on the results of these quantitative analyses without comparing them with the results of qualitative methods. To summarize, despite the aforementioned limitations, these quantitative text analyses can provide a relevant and useful complement to traditional close reading and can correct or control it in a replicable fashion.