Theme Enrichment Analysis: A Statistical Test for Identifying Significantly Enriched Themes in a List of Stories with an Application to the Star Trek Television Franchise

In this paper, we describe how the hypergeometric test can be used to determine whether a given theme of interest occurs in a storyset at a frequency greater than would be expected by chance. By a storyset we mean simply a list of stories defined according to a common attribute (e.g., author, movement, period). The test works roughly as follows: Given a background storyset and a sub-storyset of interest, the test determines whether a given theme is over-represented in the sub-storyset, based on comparing the proportions of stories in the sub-storyset and background storyset featuring the theme. A storyset is said to be "enriched" for a theme with respect to a particular background storyset when the theme is identified as being significantly over-represented by the test. Furthermore, we introduce here a toy dataset consisting of 280 manually themed Star Trek television franchise episodes. As a proof of concept, we use the hypergeometric test to analyze the Star Trek stories for enriched themes. The hypergeometric testing approach to theme enrichment analysis is implemented for the Star Trek thematic dataset in the R package stoRy. A related R Shiny web application can be found at https://github.com/theme-ontology/shiny-apps.


INTRODUCTION
A literary theme, or theme for short, is loosely defined as "An idea that recurs in or pervades a work of art or literature" (e.g., a story) 1 . Themes are often expressible in a single word or short phrase, as is illustrated by such garden-variety themes as "love", "loyalty", and "the lust for gold". These examples all happen to be value-neutral abstractions. But themes can just as well take the form of morally charged messages, such as "be wary of strangers" and "do not judge prematurely". The consummate story-maker usually takes pains to imply a theme indirectly, rather than state it explicitly. Sometimes the story-maker is even unconscious of important themes found in their stories. A typical story will enjoy multiple themes. In the present work, we distinguish between central themes (i.e. themes found to recur throughout a major part of a story or are otherwise important to its conclusion) and peripheral themes (i.e. briefly featured themes that
The hypergeometric test is routinely used in bioinformatical analyses to identify over-represented biological terms in lists of genes (Boyle et al., 2004; Zheng and Wang, 2008; Huang et al., 2009). This differs from the corresponding state of affairs in literary studies. To our knowledge, the hypergeometric test is yet to be implemented in any of the textual analysis tools in common usage among humanities scholars, including the Stanford Topic Modeling Toolbox (Ramage et al., 2009), TAPoR 2 , TOME (Klein et al., 2015), Word Seer (Muralidharan et al., 2013), and Voyant Tools 3 . The same holds true of computer-assisted qualitative data analysis software that is sometimes used for document analysis in the social sciences 4 . ATLAS.ti 5 , NVivo 6 , and MAXQDA 7 are three such programs that allow users to manage, analyze, and visualize data related to text, audio, and video documents. However, none of these programs implements the said test at the time of writing.
In this paper, we champion the hypergeometric test as a statistically principled approach to theme enrichment analysis. To this end, we introduce a toy dataset consisting of a total of 280 themed Star Trek television series episodes. We recorded themes for each of the 80 episodes of Star Trek: The Original Series (TOS), 22 episodes of Star Trek: The Animated Series (TAS), and 178 episodes of Star Trek: The Next Generation (TNG); see Supplementary Note 1 for an overview of the Star Trek franchise. What is more, we hierarchically arranged the collected themes into a draft theme ontology. The Star Trek thematic dataset paves the way for a demonstration of how the hypergeometric testing approach to enriched theme identification is helpful for summing up what makes a storyset unique and for generating speculative hypotheses. We present the results of two case studies as a proof of concept. The first examines how the portrayal of the Klingons changed from that of a tyrannical and expansionist empire in TOS to an inward looking warrior culture in TNG. The second explores the most significantly enriched themes for each of TOS, TAS, and TNG. In short, we find that TOS stands out for its focus on the social issues of the day and by extension on what constitutes a just and flourishing society, TAS for novel sci-fi and fantasy concepts especially suited to an animated series, and TNG for its comparatively refined treatment of the human condition. Moreover, we show that our findings compare favorably with those obtained using the standard term frequency-inverse document frequency (TF-IDF) approach to enrichment analysis (Salton et al., 1975), which is commonly implemented in textual analysis software packages. More specifically, we find that the hypergeometric testing and TF-IDF approaches to enrichment analysis, while agreeing in broad outlines, exhibit significant differences in terms of the top themes they identify as being enriched. 
This is consequential because the hypergeometric test is, by its very definition, the standard according to which heuristics such as TF-IDF ought to be evaluated when it comes to answering the question of whether a theme is enriched in a storyset of interest.
The rest of the paper is organized as follows: In Section 2 we introduce our aforementioned draft theme ontology. It is a hierarchically organized theme vocabulary, partitioned into the following four domains: the human condition, society, the pursuit of knowledge, and alternate reality. There are 1535 unique themes in total. Criteria and guidelines motivating our hierarchical arrangement are discussed. In Section 3 we explain the hypergeometric testing approach to theme enrichment analysis in full technical detail. In Section 4 we use the hypergeometric test to identify enriched themes in the two Star Trek case studies outlined above. We compare these results with those obtained using the standard TF-IDF approach to enrichment analysis. We conclude the paper in Section 5 with a summary of our main contributions, a discussion of some limitations of our methodology, and a description of a handful of possible directions for future work. Most notably, in terms of limitations, we emphasize that we manually tagged stories with themes, and as a consequence the findings we report inevitably reflect our point of view, and are not fully replicable. An overview of the Star Trek film and television series franchise is found in Supplementary Note 1. A list of episodes used in the Klingon theme enrichment case study is found in Supplementary Note 2. The results of the TF-IDF approach to theme enrichment analysis are presented in Supplementary Note 3. The theme enrichment analysis procedure based on the hypergeometric test is implemented in the R package stoRy (version 0.1.1) (Sheridan and Onsjö, 2017), released through CRAN 8 . The Star Trek thematic data is included in the package. A related R Shiny web application is available for download at the Theme Ontology 9 GitHub repository at https://github.com/theme-ontology/shiny-apps.

A THEME ONTOLOGY
A theme ontology is a controlled vocabulary of defined terms representing literary themes in fiction. In this section, we introduce a draft theme ontology covering all TOS, TAS, and TNG Star Trek television series episodes, of which there are 280. The ontology consists of 1535 unique themes arranged into the following four domains:
The Human Condition: Themes pertaining to "characteristics, key events, and situations which compose the essentials of human existence, such as birth, growth, emotionality, aspiration, conflict, and mortality" 10 .
Society: Themes pertaining to a "community of people living in a particular country or region and having shared customs, laws, and organizations" 11 .
The Pursuit of Knowledge: Themes pertaining to "facts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject" 12 .
Alternate Reality: Themes related to subject matter falling outside of reality as it is presently understood. These are classical science fiction 13 and fantasy 14 themes.
Figure 1 shows a bird's eye view of the ontology. The abstract theme "literary thematic entity" is taken as root theme. Each domain is structured as a tree descended from the root with "the human condition", "society", "the pursuit of knowledge", and "alternate reality" serving as the top themes of their respective domains. Each child theme is made to bear a subtype relationship with its parent. In the figure, the ontology tree structure is depicted to a height of three levels, although in reality it branches out a number of levels further still, as summarized along with other information in Table 1.
In designing the ontology, we strive to define sibling themes so as to be mutually exclusive, but not necessarily jointly exhaustive. All non-root themes are accompanied by short definitions in an effort to make their range of applicability plain. We appeal to the principle of refutability as an antidote to vagueness in definition writing. In other words, an acceptably defined theme will be such that there is the possibility of appealing to the definition to show that the associated theme is not featured in a story. Take "to tell the truth vs. offering a comforting lie" as an example, which is defined as "A character must choose between telling a comforting white lie on one hand, and being honest on the other.". We contend that definition writing of this sort helps to bring the conversation of whether a theme is featured in a given story into the realm of rational argumentation. The theme ontology data has been made accessible in a structured manner through the R package stoRy (Sheridan and Onsjö, 2017). This paper uses version 0.1.1 of the ontology, which can be accessed through the like-versioned stoRy package 0.1.1. Functions for exploring the ontology are described in the package reference manual. For example, the command theme$print() prints summary information for the theme object theme, and the function print_tree takes a theme object as input and prints the corresponding theme together with its descendants in tree format to the console. We encourage non-R-users to explore the latest version of the ontology on the Theme Ontology website 15 . Previous versions of the ontology have been made available for download there.
It is crucial to bear in mind that the theme ontology is wholly separable from the Star Trek television series franchise. Many of the themes populating the ontology are sufficiently universal in scope to be helpful in the recording of themes for almost any work of fiction. That said, we availed ourselves of these themes in the recording of themes for 280 Star Trek television series episodes. Table 2 shows a basic statistical summary of the thematic data. We assigned central and peripheral themes for each episode. Note that centrality and peripherality are to be understood not as properties of a theme, but rather as relations between a theme and a story. Thus, it is perfectly fine for a given theme to be central to one story and peripheral to another. Note also that the ancestors of the themes featured in episodes are not counted in Table 2. This means, for example, that if "human aspiration" is featured in an episode, then "purpose in life" and "the human condition" are not counted in the table, because they are both ancestors of "human aspiration", even though both are technically themes of the episode. Ancestor themes are omitted from the counts merely in the name of parsimony.
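The ancestor convention described above can be illustrated with a minimal Python sketch. It is not the stoRy package API: the dictionary `PARENT` and the helper names `ancestors` and `with_ancestors` are hypothetical, and the exact parent-child chain among the three example themes is an assumption (the text states only that "purpose in life" and "the human condition" are both ancestors of "human aspiration").

```python
# Hypothetical fragment of the theme tree, stored as child -> parent.
# The chain below is an assumed ordering used purely for illustration.
PARENT = {
    "the human condition": "literary thematic entity",  # root theme per the text
    "purpose in life": "the human condition",
    "human aspiration": "purpose in life",
}


def ancestors(theme, parent=PARENT):
    """Return the ancestors of a theme, nearest first, ending at the root."""
    chain = []
    while theme in parent:
        theme = parent[theme]
        chain.append(theme)
    return chain


def with_ancestors(tagged, parent=PARENT):
    """Expand a set of explicitly tagged themes with all implied ancestors."""
    expanded = set(tagged)
    for theme in tagged:
        expanded.update(ancestors(theme, parent))
    return expanded


# An episode tagged only with "human aspiration" implicitly features its ancestors.
print(ancestors("human aspiration"))
print(sorted(with_ancestors({"human aspiration"})))
```

Counting only the explicitly tagged themes, as in Table 2, corresponds to using the input set of `with_ancestors` rather than its output.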
Take the TOS episode The Devil in the Dark (1967) as an example. In the story, the starship USS Enterprise is dispatched to investigate rumors of a subterranean creature that is thought to be responsible for the destruction of equipment and the deaths of fifty men on the Janus VI mining colony. Captain James T. Kirk and First Officer Spock discover a hideous "silicon-based life form" inhabiting the surrounding bedrock. The mother Horta, as the creature comes to be known, is an "endangered species", the last of its kind. Kirk is faced with a "tough decision" as the creature seemingly blocks the miners' path to wealth: either commit genocide or forego plundering the mother Horta's natural resources. But Spock fortuitously manages to achieve a "cross cultural understanding" with the creature by means of a Vulcan mind-meld. An unsettling compromise is reached when the mother Horta agrees to help the miners locate ore deposits in the rock in exchange for a cessation of hostilities. The attentive viewer will note, from the creature's perspective, an "attack from outer space by a powerful conquering alien race" (a.k.a. "alien invasion") with overtones of "the morality of colonization" by the episode's conclusion. This summarizes some of the more salient central story themes, as recorded by the authors. The proverb "beauty is in the eye of the beholder" is a noteworthy peripheral theme. Indeed, Spock learned, in the course of the mind-meld, that the human form is just as repellent to the mother Horta as her appearance is to humans.
We themed each of the 280 episodes of TOS, TAS, and TNG in a similar manner. The process according to which we assigned themes can be summed up as follows. We independently tagged episodes with themes and then compared notes with a view toward building a consensus set of themes for each episode. We aimed to abide by the principle of low-hanging fruit in the compilation of consensus themes. In the present context, this means we tried to ensure that at least the most salient topics featured in the episodes are covered by appropriate themes. Another principle guiding our thought process is the minimization of false positives (i.e. the tagging of episodes with themes that are not featured) at the expense of tolerating false negatives (i.e. neglecting to tag episodes with themes that they feature). This strategy amounts to erring on the side of caution. We fully acknowledge that this process lacks safeguards against the tagging of stories with themes that are idiosyncratic and unique to our point of view. We will return to this subject in the discussion section.

THEME ENRICHMENT ANALYSIS
This section is devoted to an exposition on the hypergeometric testing approach to theme enrichment analysis. The test uses the p-value obtained from the hypergeometric cumulative distribution to answer the question of whether a given theme occurs in a test storyset at a frequency significantly greater than would be expected by chance alone:
$$P(k, n, K, N) = \sum_{i=k}^{\min(n, K)} \frac{\binom{K}{i} \binom{N-K}{n-i}}{\binom{N}{n}}$$
In the equation, n is the size of the test storyset, k is the number of stories in the test storyset featuring the theme, N is the size of the background storyset, and K is the number of stories in the background storyset featuring the theme. The K background stories featuring the theme determine what we will call the theme storyset. Figure 2 depicts the testing framework in Venn diagrammatic form. The value of P(k, n, K, N) is the probability of observing at least k stories featuring a given theme in a test storyset of size n that is composed of stories drawn at random, without replacement, from the background storyset. As we explained in the introduction, a theme is deemed to be enriched in a test storyset with respect to a background storyset if its p-value is less than a preselected significance level, alpha. Whether the test storyset is found to be enriched for a given theme or not will necessarily depend on the choice of background storyset.
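As an illustration of the tail probability P(k, n, K, N) just described, the following Python sketch computes it directly from binomial coefficients. It is a sketch only and is not the stoRy implementation; the function name `enrichment_p_value` and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Probability of at least k themed stories among n stories drawn
    without replacement from N background stories, K of which are themed."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and 0 <= k <= n")
    upper = min(n, K)  # cannot draw more themed stories than exist
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 12 of 80 test stories feature a theme that appears
# in 20 of the 280 background stories.
p = enrichment_p_value(k=12, n=80, K=20, N=280)
print(f"p-value = {p:.4g}; enriched at alpha = 0.01: {p < 0.01}")
```

Exact integer arithmetic is used until the final division, which avoids rounding error in the individual terms at the background sizes considered here.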
Figure 2. Hypergeometric testing framework overview for testing whether a given theme x is enriched in a test storyset relative to a background storyset. The various storysets and variables n, k, K, and N are defined in the main text.
The hypergeometric testing procedure is silent on the matter of theme, test storyset, and background storyset selection. It falls on the investigator to first pose an interesting question, and then choose the hypergeometric test input storysets accordingly. In the previous section, we proposed that the TOS episode The Devil in the Dark (1967) featured an "alien invasion". Indeed, the theme has long been a favorite among sci-fi writers. More generally, the sci-fi genre has proven to be particularly well-suited to the exploration of "existential risks" (i.e. manners by which civilization on a planetary scale and beyond could come into jeopardy), of which "alien invasion" is but one, albeit far-fetched, example. It is easy to imagine a fictitious investigator who wishes to assess whether "existential risk" (theme) is enriched in TOS (test storyset) with respect to all of TOS, TAS, and TNG (background storyset). This casts the question "Does TOS stand out among Gene Roddenberry produced Star Trek television series for its featuring of existential risks?" in precise enough terms to be made amenable to theme enrichment analysis. As we will see in Section 4.2, the test results in a p-value of 0.0002, so that we may conclude the theme "existential risk" is enriched in TOS at significance level alpha equal to 0.01. In practice, the curious investigator will often wish to check whether each theme in an ontology is significantly enriched in a given test storyset relative to some background storyset. This raises the specter of multiple comparisons (Noble, 2009; Meijer and Goeman, 2016; McDonald, 2014). To illustrate the matter, recall that the p-value in a hypothesis test is the probability of getting a result at least as extreme as the observed one, assuming the null hypothesis is true. A p-value calculated to be less than the investigator's desired significance level, alpha, is interpreted as evidence in favor of the alternative hypothesis. The value of alpha sets an upper limit on the chance of making a false positive discovery that the investigator is willing to tolerate. Consider, for example, the classroom problem of testing whether a given coin is fair or biased. 
The conventional null and alternative hypotheses are that the coin is fair and biased, respectively. Choosing an alpha of 0.05 translates into a 1 in 20 chance of concluding that the hypothetical coin in question is biased when it is actually fair (i.e. the null hypothesis is rejected when it is true). In the case when a large number of hypothesis tests are conducted, chance dictates that some definite proportion of the p-values obtained will be less than alpha, even when the truth of all the null hypotheses is assured. For example, suppose 1535 fair coins are individually tested in the above manner. This is as many coins as there are themes in our ontology. If all 1535 coins are tested at an alpha of 0.05, then the expected number of fair coins mistakenly identified as biased ones works out to be 0.05 × 1535 ≈ 77. The aim of multiple comparisons correction procedures is to limit the number of false positives. The Bonferroni correction and the false discovery rate (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001) are the two most common procedures. However, it is inappropriate to correct for multiple comparisons in the context of testing for multiple enriched themes. This is because a given theme is either over-represented in a test storyset relative to a background or it is not. If a theme is significantly enriched at a given significance level, then it is a true positive by definition. Therefore it would be misguided to apply the sorts of standard correction procedures that are used in biology in the context of testing lists of genes for enriched terms. For this reason, we refrain from correcting the p-values calculated in this paper for multiple comparisons.
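The coin arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the Bonferroni threshold is shown only to make concrete what such a correction would entail; it is not applied anywhere in this paper.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test significance level that a Bonferroni correction would impose."""
    return alpha / n_tests


n_themes = 1535  # number of themes in the ontology, one test per theme
alpha = 0.05

print(round(expected_false_positives(n_themes, alpha)))  # about 77 coins
print(bonferroni_threshold(n_themes, alpha))             # a much stricter cutoff
```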

A STUDY OF ENRICHED THEMES IN STAR TREK
In this section, we present two case studies of theme enrichment analysis applied to the Star Trek television franchise. As we explained in the introduction, the first examines how the Klingons changed from a tyrannical and expansionist empire in TOS to an inward looking warrior culture in TNG. The second, also described in the introduction, takes an in-depth look into significantly enriched themes by series. In short, we find that TOS stands out for its treatment of what constitutes a good society and how one is to lead a good life within it, TAS for novel alien and sci-fi concepts appropriate to an animated series, and TNG for its comparatively refined treatment of the human condition. The message we wish to convey based on these case studies is that theme enrichment analysis stands to be helpful not only for summarizing what makes a storyset unique, but also for formulating speculative hypotheses. Note that we include both central and peripheral themes in the analyses. In addition, we present the reassuring results of some negative control experiments, and compare the TF-IDF approach to enrichment with the results of our case studies. The theme enrichment analyses are easily replicated using version 0.1.1 of the stoRy package.

A Tale of Two Klingons
The Klingons are an über-belligerent humanoid species in the Star Trek alien pantheon. In TOS, the Klingon Empire is made to pursue a harsh imperialist foreign policy, characterized by the unapologetic use of military force in the subjugation of their weaker neighbors. The Federation, by contrast, is portrayed as a group of confederated alien races, united under the common principles of human rights, equality, and interstellar cooperation. In the series, the Federation fights to check Klingon expansion in the galaxy. The conflict between the Federation and the Klingon Empire in TOS is commonly understood as an allegory for the Cold War (Cantor, 2000). According to this interpretation, the Federation represents the Western powers, and the Klingon Empire represents the Soviet Union; both as seen from a more or less contemporaneous American point of view. But by TNG, Klingon society had undergone a radical transformation. The Klingons changed from being an avowed Federation enemy hell-bent on galactic domination, to a loose Federation ally with a warrior culture preoccupied with internal struggle and the maintaining of cherished traditions in a changing world (Knight and Smith, 1998). The changing face of Klingon society has been examined in much detail in books (Taylor, 2002; Gonzalez, 2015; Telotte, 2008), academic papers (Knight and Smith, 1998; Cantor, 2000), and scattered online sources 16 17 18 .
In this case study, we use the Klingons as a positive control to demonstrate that our approach to theme enrichment analysis is able to distinguish the imperialist Klingons of TOS from the warrior culture ones of TNG. To this end, we curated a storyset consisting of all 26 Klingon-centric TOS, TAS, and TNG episodes; see Supplementary Note 2 for a complete episode listing. The criterion for episode inclusion is that the Klingons were deemed by the authors to have been featured throughout the episode in a way that is central to the story. We performed two theme enrichment analyses: 1) the test storyset of TOS/TAS Klingon-centric episodes against a background of all TOS/TAS episodes, 2) the test storyset of TNG Klingon-centric episodes against a background of all TNG episodes. We pooled the TOS and TAS episodes because TAS is conventionally considered to be a continuation of TOS. The TOS/TAS and TNG test storysets consist of n = 8 and n = 18 episodes, respectively. In each experiment, we calculated an enrichment score, that is, a hypergeometric test p-value, for each of the 1535 themes present in the ontology. Table 3 contains the top 20 most enriched themes for each analysis. An inspection of the table shows that our positive control test results are interpretable in a manner that is in keeping with expectation. Consider first those themes from the society domain. The TOS/TAS Klingon "imperialistic society" posed a serious military threat to the Federation. When an inevitable "transnational social issue" flared up between the Federation and Klingon Empire, such as a "conflict over a shared resource", the resolution usually came about by either "diplomatic negotiating" or outright "war". In rare instances, however, Klingons and Federation members came to a "cross cultural understanding" when united by a common enemy. But TOS/TAS Klingon society had at its heart a "conflict of moral codes" with Federation ideology that proved insuperable. 
Where TOS/TAS Klingon society is enterprising and enthusiastic in its convictions, TNG Klingon society is inward looking and gloomy. The society themes enriched in TNG Klingon episodes pertain to internal conflicts, as evidenced by the themes "racism in society", "religious fanaticism", and "war of succession". No longer is the Klingon Empire striving to impose Klingon values on the galaxy by means of military force, but rather it is focused on its own internal affairs. This brings us to the human condition. In TOS/TAS, the human condition domain themes are almost all virtues possessed by the aliens that the Klingons conquered (i.e. "pacifism", "humility", "patience", and "temperance"). On the other hand, the human condition domain themes enriched in TNG (i.e. "honor", "rage", and "loyalty") are all signature TNG Klingon characteristics. Notice, by contrast, that "honor" is nowhere to be found among the top 20 significantly enriched TOS/TAS Klingon themes. In fact, it occurs as the 101st ranked theme with a p-value of 0.219. What is more, in the one episode in which it was featured, Friday's Child (1967), it pertained not to the Klingons, but to their enemy the Capellans. A number of the human condition themes pertain to Worf trying to maintain his Klingon culture in a Human world, most importantly "the need for cultural heritage" and "belonging". Other human condition themes surround aspects of life in a cut-throat warrior culture, like "the lust for power" and "the desire for redemption". Finally, it is pleasant to note that Klingon "über-belligerence" and a passion for "the art of war" shine through in both cases.

A Tale of Three Series
We identified enriched themes in each of TOS, TAS, and TNG. The background storyset in each case consists of the episodes from all the series combined. Here we report the outcomes of the analyses and show how they can aid in the generation of speculative hypotheses. The main results are shown in Table 4. Star Trek enthusiasts will find few surprises in the kinds of themes that are shown to distinguish the respective series. To the layperson, however, the results of Table 4 may come as unexpected and serve as a useful point of departure for exploring the series. The stacked percentage bar plots of Figure 3 show a broad pattern of human condition domain themes being enriched in TNG, alternate reality domain themes in TAS, and society domain themes in TOS to some degree. The associated matrix scatterplot hints at some interesting enriched theme domain correlations between series. But let us proceed to inspect and compare more specific themes in order to gain a more nuanced understanding of the series.

TOS: The two society domain themes "female stereotype" and "gender issues" stand out as they relate to the role of women in 1960s society. The former is indicative of what Karen Blair and R. P. M. have described as a tendency in TOS to portray females in such a manner as to "affirm traditional male fantasies in a most direct and unenlightened way" (Blair and R. M. P., 1983). The latter, however, is in keeping with another line of scholarly thought that contends TOS made some positive contributions to the advancement of women in society (Ferguson et al., 1997; Vettel-Becker, 2014). Three of the seven enriched alternate reality domain themes (i.e. "alternate society", "existential risk", and "man-made existential risk") confront viewers with ideas about how society could be changed for better or worse. In particular, the emphasis on existential risks is likely a reflection of the Cold War and relatively fresh memories of the Second World War (Cantor, 2000). In light of this, it is interesting to note that five of the most enriched human condition domain themes (i.e. "wrath", "facing a fight to the death", "rage", "unpleasant emotion", and "disagreeable characteristic") are also closely tied to conflict. The remaining human condition domain themes (i.e. "way of life", "purpose in life", "personal ethical dilemma", "tough decision", and "the need for a challenge in life") pertain to life choices and decision making. Speculating about why this might be a feature of TOS relative to the later series is left to the reader. Suffice it to say that they, like all the most enriched human condition domain themes in TOS, are notably different from the most enriched themes in TNG.
TAS: Seven of the ten most enriched themes can be labeled simply as fanciful notions. Four are typical sci-fi themes: "earth-life inspired life form", "life-support belt", "miscellaneous life form", and "what if my life were different". The remaining two themes "Chariots of the Gods" and "Atlantis" refer to the "crackpot theories" that aliens made contact with humans in ancient times and Atlantis was a civilization with advanced technology, respectively. That such themes are enriched in TAS can be explained by the fact that it is the only animated series of the trio. This would have released the authors from constraints otherwise imposed by the need for costly props and special effects (see Table 4 for some examples) and allowed them to further unleash their imaginations. We hypothesize the lack of emotion-related themes may be partially explained by early animation technology's inability to approach the nuances of facial expression and body language that the consummate actors of TOS and TNG, such as William Shatner and Patrick Stewart, would routinely employ.
TNG: Nearly all of the 20 most enriched TNG themes relate to individual human experience. About half of them are descendants of the theme "family affairs": "familial love", "growing up", "mother and son", "maternal love", "adolescence", "familial relations", "father and son", "paternal love", and "child rearing". The others are not dissimilar: "human emotion", "heavenly virtue", "human personality", "social interaction", "pride", "belonging", and "introspection". Anyone familiar with the android Data will recognize that the themes "android" and "AI point of view" relate to stories about individual human experience as well. One theme that stands out for not falling under the human condition is "virtual reality room", a particular sci-fi concept that refers to the holodeck in TNG, which has become something of a meme in its own right. Why it is that TNG so distinctly features these family affair, relationship, and emotional themes is of course open to interpretation. We speculatively hypothesize that an inexorable trend in modern television has been towards vapid character development designed to evoke safe familiarity rather than intellectual stimulus or moral controversy. Be that as it may, it is safe to say that the main characters in TNG have more elaborate background stories, subtle personality traits, and complicated interpersonal relationships than the main characters of the two earlier series.

Negative Control Experiments
We performed a series of negative control experiments. In one such experiment, we performed enrichment analyses for 1000 test storysets consisting of n = 8 randomly selected TOS/TAS episodes against the same background. This mirrors the TOS/TAS Klingon positive control settings. The average number of significantly enriched themes was 10 ± 5 at significance level α = 0.05. The corresponding Klingon storyset has 28 significantly enriched themes at the same significance level. This is noticeably more enriched themes than would be expected by chance. In a similar negative control with n = 18 for the test storyset, we found the average number of significantly enriched themes to be 13 ± 5 relative to a background of all TNG episodes. The corresponding Klingon storyset has 22 significantly enriched themes. Again this is more than would be expected to be enriched by chance.
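A negative control of the kind described above might be sketched in Python as follows. This is a sketch under stated assumptions and does not reproduce the figures reported here: the theme annotations are synthetic (randomly generated), all function names are hypothetical, and the spread is summarized as a sample standard deviation, which may differ from the summary used for the reported ± values.

```python
import random
from math import comb


def p_value(k, n, K, N):
    """Upper-tail hypergeometric probability of at least k themed stories."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def count_enriched(test_ids, theme_to_stories, N, alpha=0.05):
    """Count themes whose p-value in the test storyset falls below alpha."""
    n = len(test_ids)
    count = 0
    for stories in theme_to_stories.values():
        k = len(stories & test_ids)
        # With k = 0 the p-value equals 1, so such themes can be skipped.
        if k and p_value(k, n, len(stories), N) < alpha:
            count += 1
    return count


def negative_control(theme_to_stories, N, n, trials, alpha=0.05, seed=0):
    """Mean and sample standard deviation of enriched-theme counts over
    random test storysets of size n drawn from the N background stories."""
    rng = random.Random(seed)
    counts = [
        count_enriched(set(rng.sample(range(N), n)), theme_to_stories, N, alpha)
        for _ in range(trials)
    ]
    mean = sum(counts) / trials
    sd = (sum((c - mean) ** 2 for c in counts) / (trials - 1)) ** 0.5
    return mean, sd


# Synthetic annotations: 200 made-up themes over a background of 102 stories
# (the combined size of TOS and TAS), each theme tagging 1 to 10 stories.
rng = random.Random(42)
N = 102
themes = {f"theme_{j}": set(rng.sample(range(N), rng.randint(1, 10))) for j in range(200)}
mean, sd = negative_control(themes, N, n=8, trials=200)
print(f"{mean:.1f} +/- {sd:.1f} enriched themes per random storyset")
```

Comparing the count for a curated storyset against this random baseline is what allows the Klingon results to be read as more than chance.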

Hypergeometric Test and TF-IDF Comparison
TF-IDF is a statistic that is used in data mining to measure the importance of a word in a document in a collection of documents (Rajaraman and Ullman, 2011). It is implemented in such textual analysis tools as the Stanford Topic Modeling Toolbox (Ramage et al., 2009), TAPoR 19 , TOME (Klein et al., 2015), Word Seer (Muralidharan et al., 2013), and Voyant Tools 20 . In this subsection, we take a TF-IDF approach to the identification of enriched themes in the Klingon and series case studies, and compare the results with those obtained using the hypergeometric test. We implemented the TF-IDF formula −(k/n) × log(K/N), where k/n is the term frequency, and −log(K/N) is the logarithmically scaled inverse document frequency. Table S2 contains the top 20 TF-IDF scoring themes in TOS/TAS Klingon and TNG Klingon episodes. Table S3 contains the corresponding results for the individual series TOS, TAS, and TNG. The results overlap substantially with those obtained using the hypergeometric test. For the Klingons of TOS/TAS a remarkable 20 out of the top 20 themes are held in common (it turns out to be 38 out of the top 50), and for the Klingons of TNG the figure amounts to 14 out of 20. The numbers for the individual series TOS, TAS, and TNG are 12/20, 6/10, and 8/20, respectively. What is more, the scatterplots of Figure S2 show broad correlation between TF-IDF scores and logarithmically scaled hypergeometric test p-values. What we take from this is that the TF-IDF and hypergeometric test approaches agree in their broad outlines, but differ in the details. We contend that the hypergeometric test is to be preferred over TF-IDF when it comes to the identification of enriched themes on two grounds: 1) the hypergeometric test is statistically rigorous whereas TF-IDF is a heuristic, and 2) the p-values obtained from the hypergeometric test are more amenable to interpretation than TF-IDF scores.

Figure 3. Star Trek television series theme enrichment scatterplot matrix. Theme enrichment p-value stacked percentage bar plots for TOS, TAS, and TNG are plotted along the diagonal from top left to bottom right. For each series, the p-values are divided into quartiles from left to right, and each stacked bar indicates the percentage of themes from each domain. Theme enrichment p-value scores are plotted against one another for each pair of series in the off-diagonal panels. Circle size is proportional to the square root of the negative of the logarithm of the product of p-value pairs. Color corresponds to theme domain: the human condition (red), society (green), the pursuit of knowledge (blue), alternate reality (yellow).
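To make the comparison in this subsection concrete, the following Python sketch computes both quantities for a single theme. It assumes that k is the number of stories in the test storyset featuring the theme, n the test storyset size, K the number of background stories featuring the theme, and N the background size; this reading is inferred from the formula in the text, and the function names and example counts are hypothetical. It is an illustration only, not the code used to produce Tables S2 and S3.

```python
import math


def tf_idf(k, n, K, N):
    """TF-IDF score -(k/n) * log(K/N): term frequency k/n times the
    logarithmically scaled inverse document frequency -log(K/N).
    Assumes K > 0, i.e. the theme occurs somewhere in the background."""
    return -(k / n) * math.log(K / N)


def hypergeom_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for drawing n stories
    without replacement from N stories, K of which feature the theme."""
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: a theme found in 4 of 8 test stories and 10 of 100
    # background stories, versus one found in 4 of 8 and 50 of 100.
    for k, n, K, N in [(4, 8, 10, 100), (4, 8, 50, 100)]:
        score = tf_idf(k, n, K, N)
        p = hypergeom_pvalue(k, n, K, N)
        print(f"k={k} n={n} K={K} N={N}  tf-idf={score:.3f}  p={p:.3g}")
```

Both measures reward themes that are frequent in the test storyset and rare in the background, which is consistent with the broad agreement reported above. Only the p-value, however, accounts for the storyset sizes n and N in a way that supports a significance threshold.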

DISCUSSION
The primary aim of this paper has been to introduce the hypergeometric test for theme enrichment analysis to the digital humanities community. We consider our proposed draft theme ontology and toy Star Trek thematic dataset to be contributions of a secondary nature. The hypergeometric testing approach to the identification of enriched themes in a list of stories equips the digital humanist with a new weapon to wield in their document analyses. We have demonstrated the potential of the hypergeometric test by applying it to Star Trek television franchise thematic data. In the first place, we found the lists of enriched themes produced by the test to be helpful for identifying what makes the corresponding storysets special and for generating speculative hypotheses. In the second place, we argued that the hypergeometric test is to be preferred over the commonly used TF-IDF heuristic when it comes to answering the question of whether such-and-such a theme (or any term for that matter) occurs in a storyset at a frequency significantly greater than would be expected by chance. But we would be remiss not to touch on the potential for "garbage in, garbage out" to bias and confound theme enrichment analyses. In other words, the result of a theme enrichment analysis using the hypergeometric test is at best as good as the themes on which it is based. We, therefore, stress the preliminary nature of our work in applying the hypergeometric test to thematic data. All that said, it is our hope that this work will contribute to the judicious use of the hypergeometric test in the digital humanities in the fullness of time.
Two main obstacles stand in the way of making theme enrichment analysis practical on a large scale. First, a protocol for theming stories in such a manner that they can be meaningfully compared in terms of their shared themes must be developed. We have taken a first step toward addressing this need by proposing a draft theme ontology. Moving forward, we aim to make the ontology compliant with Basic Formal Ontology design best practices (Arp et al., 2015). This includes incorporating related ontologies such as the Emotion Ontology (Hastings et al., 2011), to name but one. Ontology design is an open-ended process, subject to setbacks and changes of direction. It is plain that our draft theme ontology will be no exception. However, we point out that even if the structure of the ontology changes markedly, many of the themes will remain intact as presently defined. Second, a database of compatibly themed stories numbering in the thousands to millions is required. To this end, we have launched the Theme Ontology (beta version) online community platform 21 . The website features an ever-expanding controlled vocabulary of defined themes, hierarchically arranged into our draft theme ontology. Community members are encouraged to tag whatever stories (e.g., short stories, novels, films, TV shows, etc.) they please with themes drawn from the ontology, and adorn the ontology with newly coined themes as necessary. Stories are manually tagged with themes at present. Topic modeling techniques (Blei, 2012) as implemented in such software packages as MALLET 22 , the Python module gensim (Řehůřek and Sojka, 2010), and the R packages topicmodels (Grün and Hornik, 2011) and tm (Feinerer et al., 2008) have been used successfully to identify literary themes in text corpora (Jockers, 2013;Jockers and Mimno, 2013;Goldstone and Underwood, 2014;Boyd-Graber et al., 2017). 
In the future, we plan to use topic modeling to automatically collect themes for large numbers of stories in order to grow the Theme Ontology database. An interesting challenge awaits in figuring out how to adapt the current methods for automatic topic labeling (Lau et al., 2011;Basave et al., 2014;Bhatia et al., 2017) to the problem of mapping identified topics to themes in the ontology. Lastly, a theme enrichment analyzer web application is available for download at the Theme Ontology GitHub repository at https://github.com/theme-ontology/shiny-apps. Tools from the stoRy package, including our theme enrichment test, will be made accessible as web applications on the Theme Ontology website in order to help users analyze curated thematic datasets. It is our aim to build up a large-scale database of freely available themed stories that can be analyzed using web applications via this story theming system.
On the identification of themes in stories, something must be said. We emphasize that our tagging of Star Trek television series episodes with themes in this paper was manual, subjective, and not fully replicable. In general, theme identification is admittedly subjective, but this is not to say the endeavor is altogether arbitrary. The Oxford Dictionaries definition of theme quoted in the introduction, which proves adequate for most literary critical purposes, is problematic from the present point of view insofar as it designates a theme to be a property of a story. Instead, we propose to consider a theme as a relation between a story and a partaker thereof. According to this subjective view, it is entirely possible for partaker A, but not partaker B, to contend that theme X is featured in story Y. Objectivity is approached to the extent that universal agreement among story partakers is attained. The potential for a system of this kind to degenerate into a wasteland of subjectivity depends on the extent to which themes can be made precise. At the Theme Ontology community platform, we are presently drafting a policies and guidelines document that will emphasize the need for clarity in and verifiability of theme definitions. The theme "the desire for vengeance", which is defined as "A character seeks vengeance over a perceived injury or wrong.", constitutes a model definition. Growing pains are inevitable. But by concentrating on cataloging precisely defined and verifiable themes, i.e., the low-hanging fruit, we hope to ensure that Theme Ontology becomes a useful literary studies resource in the future.
There are a few points to mention in closing. First, the toy thematic dataset we have introduced in the present work consists of the combined 280 episodes of TOS, TAS, and TNG. We contend this sufficed for our purpose of demonstrating the value of theme enrichment analysis. But the entire Star Trek franchise is made up of 740 episodes and films. A natural direction for future work is to round out the toy dataset with themed episodes from all the other series along with the films. Second, the hypergeometric test is designed to answer the question: what themes in a test storyset of interest stand out against a background storyset? But other interesting literary questions are also amenable to statistical investigation. For example, the investigator may wish to discover a subset of stories in a storyset that have similar themes by performing a clustering analysis. Another example is time-series analysis for the study of how theme usage changes over time in a storyset with timestamped stories. In the future, the stoRy package should be extended to include statistical methods to address questions of these sorts. Lastly, computer-assisted qualitative data analysis software is sometimes used for document analysis in the social sciences. ATLAS.ti 23 , NVivo 24 , and MAXQDA 25 are three such programs that allow users to manage, analyze, and visualize data related to text, audio, and video documents. It is possible to use these sorts of programs to annotate stories with themes and subsequently explore the correlations among them. However, to our knowledge, none of these programs implement the hypergeometric test. The same holds for the popular textual analysis tools the Stanford Topic Modeling Toolbox (Ramage et al., 2009), TAPoR 26 , TOME (Klein et al., 2015), Word Seer (Muralidharan et al., 2013), and Voyant Tools 27 . In the future, it could be profitable to augment these programs with a hypergeometric test function for term enrichment.