Measuring Modernist Novelty

That these two statements about change and innovation have been so frequently cited reveals a contradiction in modernist studies: novelty is commonplace, both for modernist writers and for literary critics. Novelty has become so firmly fixed as a central framing narrative for literary modernism that the two terms, "novelty" and "modernism," are often treated as synonymous. The period (roughly 1890-1945) has been defined by its capacities for rupture and the rejection of tradition: a "paradigmatic shift, a major revolt"; "a ruthless break with any or all preceding historical conditions [...] characterized by a never-ending process of internal ruptures and fragmentations within itself"; "a powerful vortex of historical conditions that coalesce to produce sharp ruptures from the past [...] shattering change." 2 These revolts, breaks, and ruptures coalesced into the definitive sense of newness that would come to define the age for modernists and modernist literary critics alike, producing an exhaustive, exhausting commitment to newness in all its forms:

The new spirit, the new form, the new reality, the new object, new facts, new organisms, new life, new values, a new world, new cosmos, new society, new culture or civilization, new era or epoch, new time and new age, to which the new consciousness and enhanced susceptibility would respond with a new sensitivity, a new corporeality, a new psychology aroused by new sensations; and these would give rise, naturally, to a new aesthetics, a new beauty – a new realism and a new archaism, enforced by a new rhythm – arising from new means and new methods: a new art, architecture, plastic expression, poetry, literature, new words and new language, music, drama, new optic realms, a new vision, a new. . . . The alacrity required to dispel the old order was audacious and impulsive, but it could always be called New. 3

Then as now, the "new" means everything and nothing, as Jed Rasula's catalog aptly demonstrates. Novelty is so capacious a concept that it is a challenge to operationalize and measure, the very processes that would provide a deeper understanding of both the concept and its significance in modernism, literary history, and beyond. In this essay, we propose a method to measure one type of literary novelty, not distinct to but emblematic of high modernism: intratextuality. Situating modernism's literary innovations in the context of contemporaneous linguistic and scientific theory, Michael Levenson coined the term "intratextuality" to describe modernism's "aesthetic of composites" in his essay "Novelty, Modernity, Adjacency." 4 In keeping with Levenson's scientifically inflected definition of modernist novelty, we derive our methods from computational biology and metagenomics, applying a Bloom Filter to the study of literary texts.
"Novelty" is a concept that has fascinated scientists, philosophers, and mathematicians as much as literary critics, as Michael North has shown; it is a field-specific term, taking on different meanings and tonal registers in different disciplines. 5 While literary critics such as Rasula may treat novelty as "new words and new language," metagenomicists, for instance, understand novelty as a genetic [...] considered at length in his seminal essay, "Tradition and the Individual Talent": "The existing monuments form an ideal order among themselves, which is modified by the introduction of the new (the really new) work of art among them… for order to persist after the supervention of novelty, the whole existing order must be, if ever so slightly, altered; and so the relations, proportions, values of each work of art toward the whole are readjusted." 7 For Eliot, novelty's comparative nature is also a problem of scale, in which the context of the "whole existing order" is both ever-shifting and ever-expanding. Eliot's "new (the really new)" would be methodologically impossible to measure.
In his essay "Novelty, Modernity, Adjacency," Michael Levenson advances a heuristic for understanding novelty-in-context: intratextuality. Instead of the revolutionary novel, Levenson proposes intratextuality as a model of the relationally novel, developed in response to both Bertrand Russell's logical atomism and Saussurean linguistics. Levenson writes, "Intratextuality designates all those relations threaded within the boundaries of the artifact itself, all the mirroring and summoning of one piece of text by another… the tonal play that aligns distant elements without building narrative connections… The persistent impulse to say it again as said before, to repeat and resume, to gather up the foregoing words of the text, to have them melt into one another… assemble[d] into an echo chamber of mutual relations." 8 The result is an "aesthetic of composites, multivoiced, polyphonic compounds," representative of the usual high modernist suspects: "The Waste Land," Ulysses, montage, Cubism. Modernism's inheritance, Levenson argues, is the "inveterate relation of novelty and context," made particularly apparent by moments of intratextuality. Intratextual novelty is not an absolute novelty, in which an individual work readjusts an Eliotic "whole existing order," but rather novelty contained within a text: the means by which an individual text might evince novelty within itself. Delimiting novelty's context to the boundaries of a text provides a much more stable basis for sorting the "new" from the "not new," and an eminently more measurable scale. It is here that we begin, adopting Levenson's concept of intratextuality as a means of operationalizing novelty as a high modernist form.
To identify moments of novelty within the context of the familiar, we turn to bioinformatics, a field that poses similar questions about novelty versus familiarity (not about literature, but about genetics), studying the progression of genetic sequences for previously unseen material. 9 Novelty, in this context, means that a particular DNA sequence either deviates subtly from an earlier sample or has never been seen before. To identify novelty within samples, metagenomicists employ a probabilistic, memory-efficient data structure called the Bloom Filter, which accepts a small, tunable rate of error in exchange for the memory efficiency needed to work at scale. 10 Simply put, the Bloom Filter determines whether an element is possibly part of a set or definitely not part of it, allowing scientists to quickly identify moments of genetic variance for further study. 11 When applied to literature, the Bloom Filter makes it possible to computationally identify moments of heightened or diminished newness within a text ("has this string of text been seen before?"), as well as the degree of internal variation or novelty that occurs across the text as a whole ("how many times?"). Instead of an externally defined comparison ("the whole existing order"), the Bloom Filter analyzes character-sequence novelty within a text itself, providing us a measurement of intratextual novelty.
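The "possibly in the set / definitely not in the set" test can be sketched in a few lines of Python. This is a minimal illustration, not the filter used in this study: the class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests to derive bit positions are all our assumptions. The `false_positive_rate` method uses the standard approximation (1 - e^(-kn/m))^k for k hash functions, n added items, and m bits.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: answers "possibly present" or "definitely absent"."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0
        self.added = 0  # number of add() calls, used for the error estimate

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index
        # (simple and deterministic, though slower than purpose-built hashes).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Set every bit the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.added += 1

    def __contains__(self, item: str) -> bool:
        """True if all of the item's bits are set (which may be a false positive)."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def false_positive_rate(self) -> float:
        """Standard approximation (1 - e^(-k*n/m))^k for k hashes, n items, m bits."""
        k, n, m = self.num_hashes, self.added, self.num_bits
        return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(num_bits=10_000, num_hashes=4)
bf.add("a rose is a ")
print("a rose is a " in bf)      # True: added items are never missed
print(bf.false_positive_rate())  # very small after a single insertion
```

Because `add` only ever sets bits, anything that has been added always tests as present; a "present" answer for something never added is the false positive discussed below.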
The Filter scans the entirety of a text, tracking the repetition of language in very short, fixed-length character sequences called k-mers; in our case, a 12-character window called a 12-mer. The Bloom Filter evaluates the text one k-mer at a time, starting at the beginning and advancing one character at a time, such that the first 12-mer of the English alphabet would be ABCDEFGHIJKL, the second would be BCDEFGHIJKLM, the third CDEFGHIJKLMN, and so on. 12 The Filter builds a growing database of k-mers, and each new k-mer is assessed for its presence (novelty score of 0: already in the database, therefore not new) or absence (novelty score of 1: not in the database, therefore new) in the database. The Bloom Filter is a data structure that achieves a workable efficiency at the cost of the occasional "hash collision," when the bits that would record one k-mer have already been set by others, so that the two become indistinguishable. As a result, the Bloom Filter will return false positives (identifying an unknown sequence as known), but never false negatives (identifying a known sequence as unknown). That is to say, while the Bloom Filter might erroneously label a segment of text as "not novel," it never erroneously labels a segment as "novel." Our novelty scores, therefore, are quite trustworthy, if erring on the conservative side.

9 Indeed, as Alberto Piazza reflects in his Afterword to Franco Moretti's Graphs, Maps, Trees, literary theorists and biologists share a great many metaphors: scientists and philosophers alike rely on the alphabetic sequence and even the metaphor of poetic inspiration to describe complex, evolutionary problems; DNA is translated and transcribed, not to mention written, copied, and, increasingly, edited.

12 Punctuation and spaces are included in each k-mer. Each character is encoded according to its ASCII value using a modulo-32 operation, reducing the number of bits required to encode each character from 8 to 5.
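The scan described above can be sketched as follows, under stated assumptions. The function names are hypothetical, and for clarity the sketch swaps the Bloom filter for an exact Python `set`. Because a set never reports false positives, substituting a Bloom filter could only turn some 1s into 0s, never the reverse. The encoding follows footnote 12 (ASCII value modulo 32).

```python
def encode(text: str) -> list[int]:
    """Encode each character as its ASCII value modulo 32 (fits in 5 bits)."""
    # Note: letters that differ only in case share a code ('A' = 65 and 'a' = 97 both -> 1).
    return [ord(ch) % 32 for ch in text]


def kmers(codes: list[int], k: int = 12):
    """Yield every window of k consecutive codes, advancing one character at a time."""
    for i in range(len(codes) - k + 1):
        yield tuple(codes[i:i + k])


def novelty_bits(text: str, k: int = 12) -> list[int]:
    """Score each k-mer 1 (not seen before in this text) or 0 (already seen)."""
    seen = set()  # exact stand-in for the Bloom filter's membership test
    bits = []
    for kmer in kmers(encode(text), k):
        if kmer in seen:
            bits.append(0)  # present in the database: not new
        else:
            bits.append(1)  # absent from the database: new
            seen.add(kmer)
    return bits


alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
windows = list(kmers(encode(alphabet)))
print(len(windows))            # 26 - 12 + 1 = 15 windows
print(novelty_bits(alphabet))  # every 12-mer of the alphabet is new, so all 15 scores are 1
```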
A text remains 100% novel until the point at which a k-mer is first repeated. 13 Consider this scan of Gertrude Stein's "A rose is a rose is a rose." (Table 1). For ease of illustration, spaces are represented as underscores. While segments of Stein's sentence repeat ("rose"), only a perfect match of an entire k-mer is scored a 0. This example illustrates the importance of character sequence and order to the Bloom Filter. Though Stein repeats only three words, the word sequence and punctuation are vital to the Filter's understanding of the text's relative novelty. If more variation were to enter this sequence, the Filter would register novelty at the point at which the variable character was introduced.
As this example demonstrates, approximately 59% of Stein's sentence is novel at the level of character sequence. But imagine this scan occurring at the scale of the chapter, or the text. While variations of "a rose is" may not appear often, character names ("Alice_B_To"), dialogue tags ("_she_said."), or Stein's characteristic repetition would certainly cause her writing to register as less novel as the text progresses. To a Stein scholar, this finding may seem counterintuitive; Stein's repetition is precisely what makes her work so innovative. However, because the Bloom Filter's assessment of character-sequence novelty ultimately registers patterns of relative newness and not-newness over the course of a text, atypical repetition like Stein's can be reclaimed as innovation in its own right, bringing a level of nuance and sophistication to our understanding of novelty beyond a simple proportion of unique words. By registering novelty at the level of the k-mer and the character, the Bloom Filter attends more closely to stylistic components of text that are often elided or pragmatically dismissed as noise, reading (as in DNA) the most elemental components of language as raw data, from which we might make claims at the level of the sentence, the paragraph, the chapter, and the text.
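A toy version of the Stein scan makes the scoring rule concrete. The helper `scan` is hypothetical, and it counts novelty simply as the share of 12-character windows scored 1, a convention we assume here. Table 1's own counting conventions are not given, so this sketch need not reproduce the roughly 59% figure reported above. Under the modulo-32 encoding, "A rose is a " and "a rose is a " become the same 12-mer, so the sketch scores 11 of 16 windows as novel (12 of 16 if case is preserved with `mod32=False`).

```python
def scan(text: str, k: int = 12, mod32: bool = True) -> list[tuple[str, int]]:
    """Return (window, score) pairs: 1 for a first occurrence, 0 for an exact repeat."""
    seen = set()
    rows = []
    for i in range(len(text) - k + 1):
        window = text[i:i + k]
        # With mod32=True, windows are compared by their modulo-32 codes,
        # which makes the comparison case-insensitive for letters.
        key = tuple(ord(c) % 32 for c in window) if mod32 else window
        rows.append((window.replace(" ", "_"), 0 if key in seen else 1))
        seen.add(key)
    return rows


sentence = "A rose is a rose is a rose."
rows = scan(sentence)
for window, score in rows:
    print(window, score)
print(sum(score for _, score in rows), "of", len(rows), "windows novel")  # 11 of 16
```

Changing a single character inside the sentence would flip to 1 every window that contains it, which is the sense in which the Filter "registers novelty at the point at which the variable character was introduced."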
As a heuristic, we believe that the Bloom Filter's analysis of the smallest, most basic units of measurement in language helps us to peer in at something decidedly less basic, and potentially impossible ever to capture fully in an algorithmic method: the sense of newness.

Results: Visualizing Intratextual Novelty
What does the Bloom Filter reveal at the scale of the book? What formal patterns emerge in innovation and repetition? And how do those patterns help us understand intratextual novelty? To handle book-length texts, the sequence of 0s and 1s is binned into intervals of 10,000 k-mers, and the fraction of 1s in each interval is taken as its novelty score. These scores can then be graphed, providing us with a general shape of how intratextual novelty unfolds in narrative time, from cover to cover. Each text begins fully novel, since none of it has been seen before, and novelty gradually decays as character sequences are repeated. 14 The Bloom Filter reveals how character-sequence novelty unfolds throughout a text: how quickly and how often k-mers are reused. It thus registers relative novelty, in which moments of newness are measured against the aggregated text that came before: novelty in context. Along the x-axis are intervals of binned k-mers; the y-axis shows the degree of novelty. We see a fairly steady rate of decay, without much internal variation.
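The binning step can be sketched minimally as follows, assuming the 0/1 sequence for a whole text is already held in memory as a Python list. The function name is hypothetical.

```python
def bin_novelty(bits: list[int], bin_size: int = 10_000) -> list[float]:
    """Fraction of novel (1) k-mers in each consecutive interval of bin_size k-mers."""
    scores = []
    for start in range(0, len(bits), bin_size):
        chunk = bits[start:start + bin_size]  # the final interval may be shorter
        scores.append(sum(chunk) / len(chunk))
    return scores


# Three intervals of 3 k-mers; the last one holds a single k-mer.
print(bin_novelty([1, 1, 1, 0, 1, 0, 0], bin_size=3))
```

One consequence worth keeping in mind: unless a text's k-mer count is an exact multiple of the bin size, the final interval holds fewer k-mers, so its score rests on less data and may swing more than its neighbors.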
In Middlemarch, for instance, novelty appears to decay at a steady, gradual rate from the novel's early intervals (see fig. 1). The peaks registering "novel" and valleys registering "not-novel" do not vary substantially from the best-fit line (r²=0.8903), and occur at what appear to be fairly well-paced intervals. This suggests that the language of Middlemarch is fairly consistent and exhibits very little internal variation. Middlemarch usefully demonstrates the limitations of the Bloom Filter. It would be a mistake to read this graph as a visual description of the shape of the novel's plot, sentiments, or narrative structure. Though novelty seems to increase in the novel's last, small interval, we cannot rightly say (either by interpreting this graph or by reading Middlemarch) that the novel has a surprise ending. We can say, however, that the language in Middlemarch appears to vary very little, and novelty appears to decay at a steady rate.
By contrast, and in keeping with our filter's name, we turn to James Joyce's Ulysses. How would the filter register Ulysses's novelty, if at all? Bloomian resonances aside, Ulysses is a useful counterpoint to Middlemarch: it is a common touchstone in discussions of novelty and the signal text in literary modernism. Levenson uses Ulysses's "telegraphic repetition" to make his case for modernist intratextuality. How would the novelty filter respond to a text that we presume, from the outset and without question, to be incredibly novel in distinctly modernist terms? Joyce's language is intentionally playful and experimental, providing ample opportunity to test the Bloom Filter's precision and flexibility for measuring intratextual novelty: Ulysses relies on a new language system for (almost) every episode, from the advert-speak of "Aeolus" to the catechism of "Ithaca." How does the Bloom Filter respond to these changes? Unlike Middlemarch, the dispersal of character-sequence novelty throughout Ulysses appears uneven (see fig. 2). While the novelty of Middlemarch decays at a steady rate, Ulysses appears to exhibit a good deal of variation, suggesting both frequent innovation and repetition. While the graph of Ulysses is striking in its inconsistency, we need to return to the text in order to study these high and low points in detail. We wrote a Python script that marks up texts with intervals and running novelty scores to ease the process of cross-referencing and determining what textual features may have produced a given score (available for public access in our GitHub Repository). 15 An examination of Ulysses helps us to clarify the nature of the novelty laid out by the Bloom Filter: the percentage of the text that registers as "novel" is far less significant to our purposes than the ways that the text varies (how it repeats, revises, and recirculates textual material).
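The repository script is not reproduced here. As a hypothetical illustration of the kind of markup described above, the sketch below scores a text (using an exact set in place of the Bloom filter, with the modulo-32 encoding of footnote 12) and inserts a marker carrying each interval's novelty score where that interval begins, so that high and low points can be located in the text. The function name and marker format are our own assumptions.

```python
def mark_up(text: str, k: int = 12, bin_size: int = 10_000) -> str:
    """Return the text with an '[interval N: novelty S]' marker at each interval boundary."""
    seen, bits = set(), []
    for i in range(len(text) - k + 1):
        key = tuple(ord(c) % 32 for c in text[i:i + k])
        bits.append(0 if key in seen else 1)
        seen.add(key)

    pieces = []
    for n, start in enumerate(range(0, len(bits), bin_size), start=1):
        chunk = bits[start:start + bin_size]
        score = sum(chunk) / len(chunk)
        # k-mer i begins at character i, so interval boundaries map onto text offsets;
        # the last interval also takes the final k - 1 characters of the text.
        end = start + bin_size if start + bin_size < len(bits) else len(text)
        pieces.append(f"\n[interval {n}: novelty {score:.4f}]\n{text[start:end]}")
    return "".join(pieces)


print(mark_up("abcabcabcabc", k=3, bin_size=5))
```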
After the early chapters (declining as Stephen Dedalus and Buck Mulligan speak, then jumping up again as the narrative shifts to Eccles Street), the major spike in novelty occurs around interval 90, approximately 55% of the way through the text. Even given Joyce's experimentation, this increase in novelty is dramatic, particularly at the novel's midpoint. Interval 90 corresponds to episode 14, "Oxen of the Sun." An episode known for its difficulty, "Oxen of the Sun" rehearses the history of the English language as Bloom waits at the Holles Street maternity hospital during Mrs. Purefoy's labor and delivery. Yet at some point in episode 14, the English language catches up to Joyce, and novelty begins to decline once more.
Around interval 45, Ulysses's character-sequence novelty dips to a surprisingly low level. This decline corresponds to the beginning of episode 10, "Wandering Rocks," which presents a series of short vignettes about Dublin residents loosely connected to the Bloom and Dedalus families. The language is highly repetitive, both in word choice and sentence structure, with each character's full name in the subject position ("Father John Conmee" or "Blazes Boylan," without abbreviation) followed by a simple past-tense verb (Father John Conmee walked, Father John Conmee thought, Father John Conmee smiled). The repetition in names and sentence structure would seem to account for this apparent drop in novelty. While "Wandering Rocks" demonstrates very little innovation in the form of (say) portmanteau or onomatopoeia, its character-sequence repetition in simple sentences represents a marked difference within Ulysses, particularly on the heels of the absurd rhetorical battle of episode 9, "Scylla and Charybdis." These low points are significant and should not be read as "not novel," but, as in our example of Gertrude Stein, as moments of unusual character-sequence repetition that serve as a form of experimentation in themselves.
Just as peaks in novelty scores indicate moments of high innovation, we suggest reading these valleys as moments of experimental repetition, a form of textual play and reinvention. The Bloom Filter most effectively highlights the interplay of sameness and difference that is a hallmark of intratextuality. In the context of fluctuating difference, sameness or repetition rightly appears as a formal choice. Because the Bloom Filter produces a measurement of relative novelty, moments of intense repetition (such as "Wandering Rocks") only make the bursts of new language (such as "Oxen of the Sun") all the more startling, and the spikes in our readouts steeper. The Bloom Filter thus helps us to value unusual repetition positively, as a form of novelty in itself.
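One way to make "peaks" and "valleys" precise, which we assume here rather than take from the study, is to measure how far each interval lies above or below the least-squares line through a text's interval scores. The hypothetical helper below returns the intervals with the largest positive residuals (candidate bursts of new language) and the largest negative ones (candidate stretches of unusual repetition), which can then be checked against the text itself.

```python
def extreme_intervals(scores: list[float], n: int = 3) -> tuple[list[int], list[int]]:
    """Return (peaks, valleys): the n intervals farthest above and below the best-fit line.

    Intervals are numbered from 1; at least two scores are required.
    """
    xs = list(range(1, len(scores) + 1))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Residual = observed score minus the score the best-fit line predicts.
    residuals = [(x, y - (intercept + slope * x)) for x, y in zip(xs, scores)]
    ranked = sorted(residuals, key=lambda pair: pair[1])  # most negative first
    valleys = [x for x, _ in ranked[:n]]
    peaks = [x for x, _ in reversed(ranked[-n:])]
    return peaks, valleys


scores = [1.0 - 0.05 * i for i in range(10)]  # a perfectly steady decay...
scores[3] += 0.2  # ...with a burst of novelty at interval 4
scores[7] -= 0.2  # ...and a stretch of repetition at interval 8
print(extreme_intervals(scores, n=1))  # ([4], [8])
```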
It is tempting to read the novel's plot into the graph, equating each peak with a new episode in Leopold Bloom's walk around Dublin. This simply is not the case; there is no direct correlation between novelty score and plot. While there are moments where the novelty scores correspond to important episodes in the novel, this is due to the language that Joyce employs in a particular episode; this clear correspondence would likely not appear in texts that are more singularly voiced. (Later examples will further illuminate this principle.) Even so, the case of Ulysses helps us to clarify the type of novelty that the Bloom Filter can register at the scale of the single text. It would seem that the Bloom Filter is especially well suited to evaluating intratextual novelty in highly fragmentary, multivocal texts like Ulysses: evaluating the ways that a text deploys novelty internally through reinvention, regeneration, or recycling of textual material.

Results: Comparing Intratextual Novelty
Our first-pass, visual comparison between Middlemarch and Ulysses appears to indicate that Middlemarchian novelty decays relatively consistently throughout and Ulysses displays wild intratextual variation. But how consistent or wild are each of these texts when compared to others? To understand the relative intratextual novelty of both texts, we compared intratextual novelty scores on a midsized corpus. We rely on two primary metrics (depicted in the linear regressions above) to compare the intratextual novelty of multiple texts: r² and slope. Taken together, these two measures are surprisingly descriptive.

Degree of Formal Variation:
We use r² to describe the degree of formal variation. The r² value (the coefficient of determination) measures how well the best-fit line accounts for the running novelty scores, that is, how closely the scores cluster around that line. A high r² value (close to 1) indicates very little internal variation, since the best-fit line adequately describes the general shape of the novelty's decay (as in Middlemarch). A low r² value indicates a high degree of internal variation, for the best-fit line does not accurately describe the running novelty scores (as in Ulysses). The r² can thus handily describe the degree of variation in the language, accounting for both moments of striking newness and moments of unusual repetition.
Rate of Novelty Decay:
By contrast, the slope tells us about the rate or pace of novelty decay, or how quickly a text becomes ingrained in its own linguistic milieu. A steep slope suggests that novelty depletes itself quickly and steadily; the novel's language is likely conventional (unto itself) and consistent throughout (again, Middlemarch). A shallow slope, by contrast, indicates that novelty decays slowly; said another way, variation occurs throughout the entirety of the text. The slope thus tells us something about the structure of intratextual novelty in the most general terms, without accounting for the shape of its unfolding, as the r² value does.
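Both measures come from an ordinary least-squares line fitted to a text's interval scores. The sketch below computes them under one stated assumption: intervals are numbered 1, 2, 3, ... along the x-axis. The study's exact x-axis units and any normalization are not specified here, so slopes from this sketch may not be directly comparable to the values in figure 3. The function name is hypothetical.

```python
def slope_and_r2(scores: list[float]) -> tuple[float, float]:
    """Least-squares slope of interval novelty scores against interval number, plus r².

    Requires at least two intervals whose scores are not all identical.
    """
    n = len(scores)
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    syy = sum((y - mean_y) ** 2 for y in scores)
    slope = sxy / sxx             # negative values mean novelty decays
    r2 = sxy * sxy / (sxx * syy)  # share of score variance explained by the line
    return slope, r2


steady = [1.0, 0.9, 0.8, 0.7]  # a perfectly linear decay
uneven = [1.0, 0.6, 0.9, 0.5]  # a similar downward drift, with more internal variation
print(slope_and_r2(steady))    # slope ≈ -0.1, r² ≈ 1.0
print(slope_and_r2(uneven))    # slope ≈ -0.12, r² ≈ 0.42
```

Because the slope here is measured per interval, it is sensitive to how many intervals a text has; this sketch does nothing to normalize for length.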
By graphing the r² (degree of variation) and slope (rate of decay) together, we can see how much and how often variation occurs over the course of a text, allowing us to better clarify the contours of intratextual novelty. These two parameters provide a basis for comparing texts to one another (i.e., the intratextual novelty of Text A against the intratextual novelty of Text B) and have the potential to enable macro-level questions. By graphing the degree of variation against the rate of decay, we hypothesized, we could establish a set of parameters for typical intratextual novelty (that is, the way that most texts exhibit novelty internally) while accounting for variation in textual patterns.
We ran the Bloom Filter against a corpus of texts, graphing each according to its degree of variation (x-axis) and rate of decay (y-axis). For this exploratory analysis, our corpus remained relatively small: only 410 Anglophone texts, spanning 1700-2016. We used English-language texts provided by the txtLAB 450 corpus. 16 Additionally, we developed a working list of 20th-century titles by compiling publicly available doctoral field exam lists from leading universities, arriving at a relatively stable list of widely held canonical titles, evenly dispersed over time. (See Appendix for a full list.)

Figure 3. This graph shows the novelty of 410 Anglophone novels. r² is measured along the x-axis, and slope is measured along the y-axis. Confidence intervals have been inserted, with p<0.05. Ulysses is marked with a green arrow, and Middlemarch is marked with a red arrow.
On the x-axis is the degree of variation (r²), and on the y-axis, the rate of decay (slope). Texts that are closer to 1.0 on the x-axis exhibit a smaller degree of internal variation. The closer a text is to 0.00 on the y-axis, the slower the rate of decay. Theoretically, the text with the highest level of intratextual novelty would be one with a completely flat slope and an r² value of 1.0: high degrees of linguistic variation, sustained consistently throughout the text. A more realistic version of high intratextual novelty, however, seems to couple very flat, slightly negative slopes with correspondingly high degrees of internal variation. In figure 3, dotted lines represent confidence intervals of p < 0.05; texts that fall outside of those dotted lines exhibit statistically significant levels of intratextual novelty. The centermost segment of the graph, where most texts fall, indicates a fairly standard degree of intratextual novelty: a moderate degree of internal variation and a moderate rate of decay. Middlemarch (marked with the red arrow) is positioned within this central cluster and is typical of novels in our corpus.
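The procedure behind figure 3's dotted lines is not spelled out here, so the following is only one plausible stand-in, not the study's method. It flags a text as significant on a metric when its value lies more than 1.96 corpus standard deviations from the corpus mean, which corresponds to a two-sided 5% threshold if the values are roughly normally distributed. Applied separately to slope and to r², such a rule would sort texts into the regions discussed in the following sections. The function name, the normality assumption, and the toy values are ours.

```python
import statistics


def flag_outliers(values: dict[str, float], z: float = 1.96) -> dict[str, str]:
    """Label each text 'high', 'low', or 'typical' relative to the corpus distribution.

    Assumes roughly normal values; |z| > 1.96 approximates a two-sided p < 0.05.
    Needs at least two texts.
    """
    data = list(values.values())
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    labels = {}
    for title, value in values.items():
        if value > mean + z * sd:
            labels[title] = "high"  # for slopes: flatter (slower decay) than the corpus
        elif value < mean - z * sd:
            labels[title] = "low"
        else:
            labels[title] = "typical"
    return labels


# Hypothetical slopes for a toy corpus (not data from this study).
slopes = {f"text{i}": -0.004 for i in range(20)}
slopes["flat"] = 0.05
slopes["steep"] = -0.06
labels = flag_outliers(slopes)
print(labels["flat"], labels["steep"], labels["text0"])  # high low typical
```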
Many of the texts that exhibit an unusual degree of intratextual novelty have long been held up as exemplars of modernist novelty, giving us confidence in the validity of our measure. Yet many of these results surprised us. Ulysses, marked by a green arrow, is not quite as overwhelmingly novel as we presumed; while it exhibits a slower rate of decay, its degree of internal variation is not statistically significant at the threshold we use here. This is not to say that Ulysses is not a significantly novel text according to the Bloom Filter, but that different types of intratextual novelty emerged through our analysis.

Intratextual Fragmentation: High, Sustained Variation
Novels that have both a significantly slow rate of decay and a significantly high degree of internal variation are our most novel texts, and are located in the top left corner of our graph; tellingly, only one novel falls in this region: The Sound and the Fury. No other text in our corpus exhibits anything close to this degree of intratextual novelty, in terms of both variation and continuity. (In a larger corpus, we believe it is entirely plausible that other texts would exhibit novelty of this degree.) No other text in our corpus contains an interval score that exceeds that of the first interval; novelty consistently decays from the beginning of most texts. Here, however, the novel begins with a novelty score of .8998 in interval 1, then drops to a low point of .748 at interval 9, before jumping up to .932 at interval 13. An examination of the binned text confirmed our suspicions: The Sound and the Fury famously opens with narrative written from the perspective of Benjy Compson, which consists of a great deal of word-level repetition, a limited vocabulary, and simple syntax. The novelty score begins shifting upward at interval 10, corresponding with the shift to Quentin Compson's point of view; novelty increases to 1.0, entirely new in comparison to Benjy's narration, reflecting not only the point-of-view shift but also the difference in language and syntactic complexity. While this result is consistent with a reading of the novel, and is a (more extreme) version of what we expected to see reflected in a graph of The Sound and the Fury, it is less the individual high and low points that are significant than the degree of difference between the two. To say that Benjy's point of view is somehow not novel because the scores are so low is simply false; indeed, Benjy's narration is perhaps the most unusual aspect of The Sound and the Fury. Similarly, to say that Quentin's point of view is somehow more novel simply because of its high score would be misleading. The difference between these two sections, and the rather jarring shift, is what distinguishes The Sound and the Fury from other novels in our corpus.
The Sound and the Fury is also continuously novel, spiking again at interval 47; that is to say, the novel repeatedly reinvents itself, introducing new types of textual variation relatively late in the text. While the middle sections of the novel, narrated by Quentin and Jason Compson, do not vary substantially (all scores falling within the standard error range), the end of the novel peaks toward the beginning of "April Eighth, 1928." This may be due to the narrative emphasis on Dilsey and the prevalence of Faulkner's rendering of Black English; it may also be due to the shift from first person to third. Because the language shifts with each narrative perspective, and because the novel changes perspective continuously, The Sound and the Fury exhibits an especially flat slope. More than any other text we examined, The Sound and the Fury exhibits high and consistent textual variation, whereas other novels with multiple narrators (as in our next case, The Color Purple) taper off once the diverging voices have been introduced.
That The Sound and the Fury should emerge as our most novel text reinforced our confidence in our measure's ability to capture the types of intratextual novelty most closely associated with modernism: novelty akin to fragmentation, montage, or a composite form. The Sound and the Fury is a fascinating example; no other text in our corpus matches its internal variation or its consistent renovation. But it is also important to note that The Sound and the Fury is not the only Faulkner text in our corpus. We also included As I Lay Dying, for comparison's sake, and found that it falls within the typical range. Our intratextual novelty scores, then, seem to be less a matter of oeuvre or aesthetic than something that (appropriately) varies by text. For now, The Sound and the Fury is in a class of its own, a text that varies wildly in its Bloomian novelty but, when all is said and done and the final page turned, has also left a record of sustained textual invention.

Dialogic Novelty: High Variation, Average Decay
Novels in Category 2 exhibit many of the same features as The Sound and the Fury by virtue of their equally high degree of internal variation: the language varies such that the best-fit line does not accurately describe the majority of the text. While more texts exhibit this type of novelty than Category 1 novelty, it is still rather rare in our corpus. The texts that fall into this category, including The Color Purple and A Portrait of the Artist as a Young Man, suggest that our results fall less neatly into categories than along a continuum. We turn to The Color Purple for illustration. While the significance of A Portrait of the Artist is unsurprising, given our research team's investment in literary modernism and our measure's apparent sensitivity to intratextual fragmentation, The Color Purple is neither associated with the modernist period nor held up as an example of modernism's continuation beyond midcentury into the later 20th century. A closer investigation of Walker's novel is particularly illustrative of the Bloom Filter's limitations.
The Color Purple has a typical rate of decay, but its degree of internal variation is significantly high. The Color Purple is an epistolary novel and, like The Sound and the Fury, has more than one first-person narrator: Celie composes the majority of letters, first to God and then to her sister Nettie, who in return writes a number of letters to Celie. There is a fairly steep rate of decay in the first quarter of the novel, consistent with Celie's vocabulary and short letters. The novelty increases, however, once Celie discovers and begins reading Nettie's letters, hidden over a number of years (the biggest spike in novelty corresponds with Celie's initial discovery). Nettie's letters employ a more sophisticated vocabulary and sentence structure, in keeping with her education. Furthermore, Nettie's vocabulary is contextually different from Celie's. Nettie is a missionary in Africa, and her vocabulary reflects her immediate cultural context; Nettie's letters introduce nouns that would never occur in Celie's letters. Even so, while the running novelty score peaks slightly each time the narration shifts between the sisters, it recalibrates to their respective vocabularies and contexts, decaying at a standard rate.
Figure 5. This graph shows the novelty of Alice Walker's The Color Purple, which exhibits a significantly low r² but a typical slope.
The instances of unusual repetition in The Color Purple are especially instructive; the name "dialogic novelty" is meant to gesture toward these low points as much as toward the novel's structural dialogue between Celie and Nettie. Unlike Nettie's letters, Celie's, as the main narrative, consist of a great deal of dialogue. Celie writes without quotation marks and with a great deal of repetition to indicate who is speaking and to whom. This passage, for example, comes from the interval with the lowest novelty score: [...] This back and forth between Sofia and Eleanor Jane is fairly typical of the dialogue that Celie writes. With a sort of call-and-response structure, Sofia repeats much of what Eleanor Jane tells her, with slight differences, often implied tonally ("You just don't like him cause he look like daddy," and "You don't like him cause he look like Daddy"). Without quotation marks, Celie also insists on repetitive dialogue tags: "say Sofia" and "say Miss Eleanor Jane" repeat a number of times in this passage, consistent with all of the dialogue in the novel. Here, the measurement is somewhat deceptive. Dialogue is not the same thing as experimental repetition in a Steinian sense, but the Bloom Filter, agnostic to the fictional content it scans, encodes these segments of text similarly. This is not to say that The Color Purple is a false positive, but it does reveal the limitations of the measurement. The work of close reading remains necessary in order to understand the dynamics being measured, scored, and graphed.

Maximalist Novelty: Average Variation, Slow Decay
Category 3 texts demonstrate a statistically significant level of novelty on one variable: slope. They sustain novelty continuously over their length, but they do not exhibit as much internal variation as the unusual exemplar in Category 1; interval novelty scores rarely exceed or fall below a standard margin of error, resulting in a non-significant r² score. Their slopes, however, are significantly flatter than those of most texts in our corpus. Novels in this category include Ulysses, Infinite Jest, and Gravity's Rainbow. We'll take Infinite Jest as our example.
Infinite Jest (see Fig. 6) exhibits very little internal variation; its r² value falls within the typical range. This typical r² initially surprised us: we had expected Infinite Jest to exhibit a higher degree of internal variation. In contrast to The Sound and the Fury and Ulysses (which exhibits a low, though still statistically insignificant, r²), Infinite Jest is more univocal on a narrative scale; while The Sound and the Fury and The Color Purple are focalized in the first person and through a number of characters, Infinite Jest retains a consistently detached third-person narration. Infinite Jest owes its flat slope and significant score to one distinctive portion of the novel: the end. Those familiar with the novel can likely guess why: the novel concludes with a dense and meticulously detailed appendix, beginning around interval 284. The novelty scores of the "Notes and Errata" are, true to their name, much more erratic.
Figure 6. This graph shows novelty over the course of David Foster Wallace's Infinite Jest. While the r² is typical, the slope is significantly shallower than those of most texts in our corpus.
The endnotes have dramatically influenced the shape of the novel, pulling the slope upward (flattening the decline) with their strange variations in form. Other texts in this category reflect the same pattern, with novelty shifting in the later portions of the text and rendering the slope shallower. Thomas Pynchon's Gravity's Rainbow also sits comfortably in this category. Gravity's Rainbow (see Fig. 7) begins with a rather steep decline. Around interval 125, however, the novelty levels off; a best-fit line drawn from interval 125 to the end of the novel would look relatively flat and would characterize the majority of the novel well. While Pynchon and Wallace may not exhibit a great deal of innovation on the micro-level (or, at least, not much more than is typical), what novelty they do exhibit comes later in the text. Infinite Jest's appendix is a limit case; we hypothesize that additional texts in this category would more closely resemble Gravity's Rainbow than Infinite Jest. Our examples in this category raise the question: is this postmodern novelty? Does it speak to characteristics of the postmodern novel, typified by Wallace and Pynchon? While the presence of Ulysses seems to confound a neat period-specific reading, it perhaps suggests a continuity of literary experimentation in the twentieth century. As these texts are exemplars of their respective periods, their similarity may gesture toward a more singular modernity than a postmodern rupture. Rather than asking a question of periodization, we might also approach these texts with questions of form. These three texts are all quite long; perhaps this type of novelty has more to do with length than with any other stylistic category. 17 Perhaps we might say, instead, that the novels in this category, Ulysses included, represent a sort of maximalist tendency that reached its apex in postmodernism. 18
17 One would be forgiven for wondering at this point whether the Bloom Filter is simply biased toward longer texts: a line drawn from the novelty reading at point A (the beginning of the text) to point B (the ending) has a shallower slope the farther apart the two points are. Yet our data reveal that the longer texts we examine here really are generating as much new textual novelty over their length as their shorter yet otherwise identical counterparts. A bias in favor of "more novel" longer texts would be operative only if the novelty decay plateaued at some point, in which case extending the plateau by a few hundred pages would certainly seem to "artificially" elevate the calculated slope. Our analysis shows, however, that fitting non-linear patterns of decay to the data does not improve accuracy, and we see no evidence of this plateauing effect. In fact, the high r² values our maximalist texts tend to receive indicate that their shallowly sloping novelty does not subside in ways a non-linear measure would capture.
18 Stefano Ercolino has remarked on the relationship between length and maximalism, arguing that "Length is not simply a neutral material aspect as regards the maximalist novel, but something more… It is a possibility that turns out to be related both to the strongly innovative and experimental nature of maximalist novels and to their ambition to realize synthetic-totalizing representations of the world." So, just as the Bloom Filter is particularly well-suited to uncovering patterns in fragmentary texts, it may also be responsive to the long novels that characterize postmodernism. If not "postmodern" and if not simply "long," then perhaps we can see in this type a definition of Maximalist Novelty.
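The slope and r² arguments in this section (the appendix of Infinite Jest flattening the slope, and the length-bias check in note 17) can be illustrated with a short sketch on synthetic scores. Everything here is hypothetical: the score series are invented shapes, not readings from our corpus, and the log-linear model a + b·ln(x + 1) is only one stand-in for the "non-linear patterns of decay" mentioned in note 17, not necessarily a model we tested.

```python
import math


def fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1 - ss_res / ss_tot


# 1. A steady linear decay, then the same series with an erratic "appendix"
#    replacing its last 16 intervals (synthetic values).
steady = [0.9 - 0.002 * i for i in range(300)]
with_notes = steady[:284] + [0.8 if i % 2 else 0.4 for i in range(16)]
base_slope, base_r2 = fit(range(300), steady)
notes_slope, notes_r2 = fit(range(300), with_notes)
print(base_slope, notes_slope)  # the erratic tail flattens the negative slope


# 2. A decay that plateaus: stretching the plateau flattens a linear slope,
#    and a log-linear fit then outperforms the straight line.
def plateau(n):
    return [0.2 + 0.7 * math.exp(-i / 20) for i in range(n)]


short_slope, _ = fit(range(100), plateau(100))
long_slope, lin_r2 = fit(range(400), plateau(400))
_, log_r2 = fit([math.log(i + 1) for i in range(400)], plateau(400))
print(short_slope, long_slope, lin_r2, log_r2)
```

In this toy setting the diagnostic in note 17 becomes visible: if stretching a plateau were what flattened a text's slope, a non-linear fit such as the log-linear one should clearly beat the straight line.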

Not Novelty: Very Little Variation, Fast Decay
Curiouser and curiouser. Alice's Adventures in Wonderland (see Fig. 8) provides us with another fascinating example. Like The Sound and the Fury, Alice's Adventures in Wonderland is the only text of its type, though others (including Carroll's Through the Looking Glass) do tend toward this type of novelty. While The Sound and the Fury has a uniquely low r² and shallow slope, Alice's Adventures is nearly its inverse: it has the steepest slope of any text in our corpus, and its r² is quite high; at times, there is very little variation between the best-fit line and the interval novelty scores. This suggests that there is very little linguistic variation at the structural or vocabulary level.
Alice's Adventures in Wonderland is significant not because of a high degree of internal variation or a slow decay; it is unique in how little variation it exhibits. This may be related to any number of factors: its genre, its target audience (it is the only children's book in our corpus), its precocious narrational style. While these hypotheses are provocative, and may bear out with more testing, none of them quite explains why Alice scores so distinctly according to the Bloom Filter. Alice's Adventures in Wonderland offers an important corrective: all texts exhibit some novelty, according to our measure. To argue that the texts that fall within the typical novelty range are not novel would be incorrect; as a measure of intratextual novelty, the Bloom Filter gives each text an unfolding series of scores for how novel it is according to its own parameters. Alice's Adventures in Wonderland registers as significant for how little novelty it exhibits in comparison to our range of "normal" texts: its slope is quite steep, and its r² is quite high.
While only those texts that fall outside the dotted lines are significantly novel, it is worth considering this typology as a set of poles toward which novels tend on a continuum: novels with a lower r² tend to vary more syntactically; novels with a steeper slope tend to be more conventional. Though tendencies in novelty may not be statistically significant, they are nevertheless descriptive as we seek to understand the shapes that intratextual novelty takes, and they provide a useful basis for comparing novelty across texts. By no means is this analysis exhaustive; rather, it offers an explanatory heuristic by which we might consider some basic shapes, patterns, and behaviors of intratextual novelty. Many more questions remain to be asked regarding languages, genres, and any number of other analytical categories.
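The typology can be read as a simple decision rule over each text's slope and r². The sketch below is hypothetical: it assumes the "dotted lines" correspond to two standard deviations from the corpus mean of each statistic, which this section does not specify, and the corpus values in the example are invented. Slopes are negative (novelty decays), so a flatter slope is a larger value.

```python
from statistics import mean, stdev


def classify(slope, r2, corpus, z_cut=2.0):
    """Place a text in the typology relative to corpus (slope, r2) pairs.

    The z-score threshold standing in for the figures' dotted lines is an assumption.
    """
    slopes = [s for s, _ in corpus]
    r2s = [r for _, r in corpus]
    z_slope = (slope - mean(slopes)) / stdev(slopes)
    z_r2 = (r2 - mean(r2s)) / stdev(r2s)
    flat, steep = z_slope > z_cut, z_slope < -z_cut
    low_r2, high_r2 = z_r2 < -z_cut, z_r2 > z_cut
    if low_r2 and flat:
        return "low r² and shallow slope (cf. The Sound and the Fury)"
    if low_r2:
        return "dialogic novelty (low r², typical slope)"
    if flat:
        return "maximalist novelty (typical r², flat slope)"
    if steep or high_r2:
        return "not novelty (steeper or more regular than typical)"
    return "typical"


# Invented corpus statistics, for illustration only.
corpus = [(-0.0025, 0.40), (-0.0023, 0.45), (-0.0021, 0.50), (-0.0020, 0.52),
          (-0.0019, 0.55), (-0.0018, 0.58), (-0.0017, 0.60), (-0.0015, 0.65)]

print(classify(-0.0019, 0.10, corpus))  # dialogic novelty (low r², typical slope)
print(classify(-0.0005, 0.55, corpus))  # maximalist novelty (typical r², flat slope)
```

A hard cutoff like this only marks the poles; the continuum described above is better read from the z-scores themselves than from the labels.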

Conclusion
This proof-of-concept has demonstrated the potential of the Bloom Filter to measure intratextual novelty and suggested ways that the results of this measurement might be brought to bear on larger questions of literary history, helping us to better understand the paradoxical concept of novelty and the literature that lays claim to it. Yet our results raise far more questions than they answer, first among them: does novelty really exist at the level of the alphabetic character? William James insisted that not even meaning could be drawn from the character level: "It is not as if men had first invented letters and made syllables out of them, then made words of the syllables and sentences of the words; -they actually followed the reverse order." 19 Yet words are not made out of thin air either: they take form in specific and concrete languages, obeying physical rules of morphology and phonology that leave their traces in the accidents of spelling. Still, can we argue that novelty can be measured at the level of the k-mer, an admittedly constructed unit of measurement? Certainly, the k-mer is not the "it" that Ezra Pound had in mind when he insisted upon making it new, nor is it Eliot's "really new" that should disrupt and rearrange the nature of tradition. And, certainly, novelty cannot be disentangled from its cultural context. This measurement is, without question, imperfect; even so, we believe that the Bloom Filter provides a suitable proxy for the measurement of intratextual novelty, at both the micro and macro scales.
Beyond demonstrating the effectiveness of the Bloom Filter, our preliminary analysis raises a number of questions, inviting further applications of the measurement that could intervene in the literary history and periodicity of the 20th century. Consider two of our significantly novel texts, Infinite Jest and Gravity's Rainbow: intratextual novelty may, in fact, be more characteristic of postwar novels, prompting any number of questions regarding postmodernism's extension, negation, or response to earlier modernist experimentation. While, on the one hand, these results may affirm a lineage of 20th-century maximalist experimentation (from Joyce to Pynchon to Wallace), on the other they seem to disrupt the periodizing narrative of modernism's conclusion on or about 1945.
Alternatively, it seems equally plausible that, taken as a whole, modernism is far less novel than we thought: that these statistically significant texts are, in fact, outliers. Recently, scholars of literary modernism have moved away from novelty as an explanatory feature of periodicity, instead considering modernism's relationship to the mundane, the quotidian, the obsolete, the ordinary, and the everyday. 20 A finding of no connection between modernism and intratextual novelty would, in itself, be fascinating. While modernism is often defined by such figures as Joyce and Faulkner, a larger corpus of modernist fiction could reveal the exemplarity or exceptionality of their writing in the context of their contemporaries, widening or narrowing the Great Divide accordingly.
Finally, what of The Color Purple? Among our statistically significant results, it is the only novel written by a woman and the only novel written by a person of color: authors who tend to be written out of the history of High Modernism (capital-H, capital-M) and, for the most part, of postmodernism. On the one hand, The Color Purple's presence may suggest limitations of the measurement: dialogue mistaken for formal innovation. But perhaps the significance of The Color Purple prompts a reconsideration of the whitened and masculinized genealogy of 20th-century experimentation. Perhaps Walker's formal innovations, as revealed by the Bloom Filter, challenge prevailing notions of literary experimentation and novelty.
While these questions are provocative, they cannot be answered without an expansion of this study and more creative applications of the Bloom Filter. As with any new form of measurement or hypothesis, only replication will validate the hypotheses we propose here. But we believe that this method is promising, particularly for scholars of literary modernism, and it is our hope that even more novel uses for the Bloom Filter might emerge in time.