An essay is not merely a concatenation of paragraphs. Each paragraph in an essay serves a purpose, or a rhetorical function. Thus, the purpose of the essay is likely to be conveyed to the reader only when the appropriate kinds of paragraphs are used in a meaningful order (Meyer & Freedle, 1984). The function of paragraphs in essays is analogous to the function of sentences in paragraphs. The topic sentence of a paragraph, typically the first sentence, establishes the theme of the paragraph. The sentences immediately following the topic sentence support, expand, and elaborate the theme. A final warrant sentence offers the relevance, importance, or significance of the issues discussed within the paragraph (McCarthy et al., 2008; Toulmin, 1969). This three-part division is similar to the function of paragraphs in a typical student essay, particularly the five-paragraph essay: the Introduction, the Body Paragraphs, and the Conclusion (Smith, 2006; Wesley, 2000). The first paragraph, the introduction, has the rhetorical function of providing readers with (usually) three ideas to be discussed in the essay. The following three paragraphs, comprising the body of the essay, each serve the rhetorical function of supporting, explaining, and elaborating on one of the ideas presented in the introduction. The final paragraph, the conclusion, has the rhetorical function of restating and emphasizing the importance of the ideas presented in the preceding paragraphs (College Board, 2009; Nunnally, 1991; Smith, 2006; Wesley, 2000).

The rhetorical functions of introductions, body paragraphs, and conclusions, as described by the five-paragraph theme, have been adopted by other conceptions of essay structure. That is, students are taught the rhetorical functions of paragraphs that operate within the five-paragraph theme, even when they are not taught to use the five-paragraph essay theme itself (see Wesley, 2000). The act of teaching students these rhetorical functions rests on the assumption that each body paragraph serves the same rhetorical function (College Board, 2009; Essay Info, 2009), and it disregards potential variance in the total number of paragraphs of the essays.

According to College Board, only 8% of the student essays of the SAT writing assessment test actually conformed to the five-paragraph theme. This frequent divergence from the five-paragraph theme suggests variance in the total number of paragraphs that students produce in response to essay prompts (College Board, 2009). Such variation in the total number of paragraphs of student essays may be related to variation in the rhetorical function (and therefore the linguistic characteristics) of the body paragraphs. For example, one student may write only three paragraphs in response to a prompt, while another may write five or six paragraphs. If both students in this hypothetical situation were responding to the same prompt, then their essays would be representative of their individual approaches to addressing the prompt, and the respective body paragraphs that they would produce may feature differing linguistic characteristics that are dependent upon those approaches.

This study examines the relationship between the total number of paragraphs in an essay and the linguistic characteristics of its body paragraphs. In approaching this question, we formed two contrasting hypotheses. The first hypothesis is that a greater total number of paragraphs is indicative of an essay in which ideas are more rhetorically explicit within each paragraph, because the use of more paragraphs implies more sophisticated explication and knowledge. These body paragraphs are likely written by students who are better equipped to address essay prompts thoroughly, and the paragraphs are more likely to demonstrate characteristics such as cohesion and elaboration (qualities that have been shown to significantly improve essay scores; Crossley, White, McCarthy, & McNamara, 2009). The contrasting hypothesis is that a greater total number of paragraphs is indicative of writers who are conceptually diverse and tend to express multiple ideas without elaborating (i.e., covering more ideas, but in less depth), resulting in paragraphs that are lower in cohesion and higher in characteristics of text difficulty. McNamara, Crossley, and McCarthy (2009) found that such essays tend to be graded significantly higher on scales of writing proficiency. In both cases, the presence of more paragraphs in student essays tends to increase the likelihood that they receive higher grades. This study, however, does not focus on essay quality, but rather on providing a detailed observation of the linguistic features that characterize the body (middle) paragraphs as a function of the total number of paragraphs in the essays.

Linguistic characteristics

When examining the linguistic characteristics of text, cohesion often plays a very important role in distinguishing texts from one another. Cohesion refers to the extent to which ideas in a text are explicitly connected. Cohesion can be derived from a variety of sources, including word overlap (e.g., noun overlap, content word overlap), causal relations within text (e.g., ratio of causal particles to causal verbs), connective usage (e.g., because, eventually), and lexical diversity measures (low lexical diversity indicating higher lexical repetition; see McCarthy & Jarvis, 2010). Texts with higher cohesion tend to be more easily comprehended than texts with lower cohesion, as the relationships between ideas in cohesive texts are more explicit. These explicit connections between ideas allow readers to make fewer inferences in their efforts to understand a text, emphasizing the importance of expressing ideas coherently (e.g., McNamara, 2001; McNamara, Louwerse, McCarthy, & Graesser, 2010; Ozuru, Dempsey, & McNamara, 2009).

Research on reading comprehension has also shown that the presence of textual cohesion (or lack thereof) is a good indicator of the difficulty of a text (see Duran, Bellissens, Taylor, & McNamara, 2007). McNamara and colleagues claim that the readability formulae (e.g., Flesch-Kincaid Grade Level) that are most commonly used to select appropriate textbooks for students are not optimal due to their incorporation of only surface-structure textual characteristics (e.g., number of syllables per word). On the other hand, cohesion and other deep-structure characteristics (e.g., word frequency measures, semantic features, word familiarity) significantly correlate with ratings of text coherence and difficulty (Duran et al., 2007; Graesser & McNamara, in press).

Coh-Metrix

Coh-Metrix is a computational tool that measures both the surface-structure and deep-structure linguistic characteristics of words, sentences, and discourse. This assessment is achieved by combining several linguistic databases, a syntactic parser, and a broad range of state-of-the-art textual analysis approaches. For instance, the MRC database provides psycholinguistic information about word meaningfulness and familiarity (Coltheart, 1981), and Latent Semantic Analysis (LSA) uses a statistical representation of world knowledge based on corpus analysis to calculate the semantic similarities between units of texts (Landauer, McNamara, Dennis, & Kintsch, 2007). Graesser and colleagues provide an extensive overview of many of the linguistic indices supported by Coh-Metrix (Graesser, McNamara, Louwerse, & Cai, 2004; McNamara et al., 2010).

Numerous studies have shown that Coh-Metrix indices are powerful enough to detect even subtle differences in text and discourse, and many of these studies use Coh-Metrix to distinguish between different types of texts. For instance, Louwerse, McCarthy, McNamara, and Graesser (2004) used Coh-Metrix to find significant differences between spoken and written English. McCarthy, Lewis, Dufty, and McNamara (2006) showed that Coh-Metrix indices can successfully predict original authorship, even while considering significant shifts in authors’ writing styles over time. Finally, Lightman, McCarthy, Dufty, and McNamara (2007) were able to distinguish between the beginnings, middles, and ends of chapters in a corpus of expository textbooks for high school students. Studies such as these indicate that Coh-Metrix is a valuable text analysis tool capable of analyzing and differentiating a variety of text types.

The current study includes a subset of 13 Coh-Metrix indices (see Table 1) that have been shown to effectively represent the different levels of textual and semantic cohesion and difficulty (see McCarthy, Guess, & McNamara, 2009; McNamara et al., 2009, 2010). The first five Coh-Metrix indices in Table 1 are indicators of cohesion. Noun Overlap measures the repetition of nouns across consecutive sentences; more cohesive texts tend to repeat the same nouns across sentences (McNamara et al., 2010). MED (Minimal Edit Distance) is an approach to measuring differences in the sentential positioning of content words. This measure produces values in the opposite direction of most measurements of cohesion, because a high value for MED indicates that content words are located in different places within sentences across the text (e.g., “Elizabeth is the queen of England.” vs. “This castle belongs to Elizabeth.”), suggesting lower structural cohesion (see McCarthy et al., 2009). The Measure of Textual Lexical Diversity (MTLD: McCarthy & Jarvis, 2010) evaluates the degree to which a text varies in terms of lexical deployment. That is, texts that use many different words (with little repetition) receive higher MTLD values than texts with greater repetition, and texts that have lower lexical diversity (those that use the same words throughout the text) tend to be more cohesive (McNamara et al., 2010). As previously mentioned, LSA uses a statistical representation of world knowledge to measure semantic similarities between units of texts (Landauer et al., 2007). The incidence of causal connectives (e.g., so, because) reflects the degree to which ideas are connected in the text using such causal connectives. Because understanding the causal relationships between objects within a text is integral for comprehension, a higher incidence of causal connectives suggests a repetition of causal relationships and serves as a measure of cohesion.
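As a rough illustration of the Noun Overlap idea described above, the following sketch computes the proportion of adjacent sentence pairs that share at least one noun. It is not the Coh-Metrix implementation: the function name `adjacent_noun_overlap` is hypothetical, and the sketch assumes that nouns have already been extracted and lemmatized upstream (for example, by a part-of-speech tagger).

```python
def adjacent_noun_overlap(sentences):
    """Proportion of adjacent sentence pairs sharing at least one noun.

    `sentences` is a list of sets, one set of (lemmatized) nouns per sentence.
    Noun extraction is assumed to have been done beforehand.
    """
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0  # a single sentence has no adjacent pair to compare
    shared = sum(1 for first, second in pairs if first & second)
    return shared / len(pairs)


# Made-up example: the first two sentences share "elizabeth"; the last two share nothing.
paragraph = [
    {"elizabeth", "queen", "england"},
    {"castle", "elizabeth"},
    {"garden", "tourist"},
]
overlap = adjacent_noun_overlap(paragraph)
print(overlap)
```

Higher values indicate that consecutive sentences keep returning to the same nouns, which is the sense in which noun repetition is taken here as a marker of cohesion.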

Table 1 Descriptive statistics of middle paragraphs as a function of the total number of paragraphs in the essay

The next eight Coh-Metrix indices in Table 1 are indicators of text difficulty, while the last two are measures of text length (i.e., number of words and number of sentences). As opposed to the incidence of causal connectives, the incidence of causal verbs is inversely related to text difficulty (Graesser, McNamara, & Kulikowich, 2010). Causal verbs (e.g., pour, break) represent state changes in text (e.g., from intact to broken), and they can be associated with narrativity (i.e., the extent to which a text expresses events rather than pure information), which is easier to process than informative text (Graesser & McNamara, in press; Haberlandt & Graesser, 1985). Among the indicators of text difficulty, Maximum Words before Main Verb is a measure of syntactic complexity. Typically, basic sentences in English express one idea and consist of a subject, followed by a verb, followed by an object (e.g., “The dog ate the bone.”). More complex sentences (e.g., “The dog that we saw in the park yesterday ate the bone.”) tend to contain more words before the mention of the main verb, increasing working memory load and, consequently, the difficulty of the text to be comprehended (Just & Carpenter, 1992). Word Concreteness measures the extent to which the meanings of the words in a text are clear and objective (McNamara et al., 2010). For instance, the word disk is more concrete than the word pleasure. Texts that feature fewer concrete terms than abstract ones tend to receive higher values in terms of difficulty. Word Concreteness Minimum is also a measure of lexical concreteness within a text. However, this measures the minimum lexical concreteness within each sentence. Together, these indices provide a more comprehensive assessment of word concreteness within the texts, without being highly inter-correlated. 
CELEX (Content) Word Frequency calculates the likelihood of occurrence for content words (e.g., table = high frequency; cognition = low frequency) within the CELEX corpus. Texts with many low frequency words are likely to be more difficult to read. The MRC database derives its word familiarity minimum scores from Toglia and Battig (1978) and Gilhooly and Logie (1980). Higher scores, based on human ratings, indicate words with greater familiarity (e.g., hat), as opposed to lower scores, which indicate less familiar words (e.g., abdicate). Texts composed of more familiar words are likely to be more easily comprehended than texts composed of less familiar words. The Flesch-Kincaid Grade Level measure is commonly used to estimate the difficulty of student textbooks. The formula is based on the number of words per sentence and the number of syllables per word. Texts with longer sentences often present readers with more ideas, and texts with longer words imply higher semantic complexity, jointly adding more difficulty to the text overall.
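The Flesch-Kincaid Grade Level can be sketched in a few lines using the standard published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter below is a crude vowel-run heuristic adopted only for illustration (it overcounts silent e, for instance), and the function names are hypothetical; Coh-Metrix may count syllables differently.

```python
import re


def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level from sentence length and word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


simple = flesch_kincaid_grade("The dog ate the bone.")
complex_ = flesch_kincaid_grade(
    "The dog that we saw in the park yesterday ate the bone."
)
print(simple, complex_)
```

With this heuristic, the longer of the two example sentences receives the higher grade level, reflecting the contribution of sentence length to the formula.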

As mentioned previously, the 13 Coh-Metrix indices used in this study were selected based on prior research indicative of their importance. According to that research, the variables accurately represent their respective linguistic constructs (e.g., LSA as a measure of semantic overlap), without being highly inter-correlated. Table 2 provides correlations between the 13 variables of this study. Although many of the correlations between the variables are significant, none of them reaches the threshold at which a variable would be excluded (|r| ≥ 0.70).
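The collinearity screen described above can be sketched as follows: compute Pearson correlations for every pair of variables and flag any pair with |r| ≥ 0.70. The data and variable names below are invented for illustration and are not values from Table 2.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy columns; "index_a" and "index_b" are nearly collinear.
toy = {
    "index_a": [1, 2, 3, 4],
    "index_b": [2, 4, 6, 8.5],
    "index_c": [1, -1, 1, -1],
}
flagged = highly_correlated_pairs(toy)
print(flagged)
```

An empty result would correspond to the situation reported for the 13 indices, where no pair of variables is correlated strongly enough to warrant exclusion.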

Table 2 Correlations among Coh-Metrix indices

Corpus

Our corpus consists of 811 body (middle) paragraphs that were extracted from 1,418 paragraphs of 311 essays. The essays were collected from introductory English composition courses at Mississippi State University (n = 189) (Crossley & McNamara, 2009; McNamara et al., 2010), an introductory psychology course at Northern Illinois University (n = 60) (Wolfe, Britt, & Butler, 2009), and College Board (n = 62). Each essay was written by a different student, and each student wrote in response to one of five essay prompts (e.g., “Does fame bring happiness, or are people who are not famous more likely to be happy?”). Four of the five prompts were designed to mimic prompts created by College Board, while the fifth was an actual College Board prompt (i.e., “Is the world changing for the better?”). The prompts addressed creativity, television, equal rights, cell phone usage, and whether the world is changing for the better or worse. The students were instructed to persuade their readers to take a certain position on the respective prompts.

We used Coh-Metrix to analyze each paragraph. It is important to emphasize that each paragraph was analyzed as an individual text unit. That is, the unit of analysis is the paragraph, not the essay. Each paragraph was categorized according to the total number of paragraphs of the source essay (i.e., the essay in which the paragraph occurred). We refer to this variable as the total number of paragraphs. Paragraphs from texts with two or fewer paragraphs were excluded from the analysis because they did not have enough paragraphs with which to occupy all three positions of an essay (i.e., first, middle, last), and thus there were no middle paragraphs in such essays. Additionally, paragraphs from texts with totals of 10 or more paragraphs were excluded due to their infrequency (n = 8, approximately 1% of the data). Table 3 provides the frequency distribution of the middle paragraphs. The table shows that essays with five paragraphs were the most frequent, suggesting the continuing prominence of the five-paragraph essay. The 345 middle paragraphs from five-paragraph essays make up 42.54% of the 811 middle paragraphs in the corpus.
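The selection rule for middle paragraphs can be sketched as below. The function name and the representation of an essay as a list of paragraph strings are assumptions made for illustration; the bounds of 3 and 9 total paragraphs follow the exclusion criteria stated above.

```python
def middle_paragraphs(essay_paragraphs, min_total=3, max_total=9):
    """Return (total_paragraphs, paragraph) pairs for the middle paragraphs.

    Essays with fewer than `min_total` paragraphs have no middle paragraph,
    and essays with more than `max_total` are excluded as too infrequent.
    """
    total = len(essay_paragraphs)
    if total < min_total or total > max_total:
        return []
    # Drop the first (introduction) and last (conclusion) paragraphs.
    return [(total, paragraph) for paragraph in essay_paragraphs[1:-1]]


five = middle_paragraphs(["intro", "body 1", "body 2", "body 3", "conclusion"])
two = middle_paragraphs(["intro", "conclusion"])
ten = middle_paragraphs(["p"] * 10)
print(five, two, ten)

# Share of middle paragraphs that come from five-paragraph essays.
share_five = round(345 / 811 * 100, 2)
print(share_five)
```

Each retained paragraph carries the total paragraph count of its source essay, which is the grouping variable used in the analyses.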

Table 3 Frequency distribution of middle paragraphs

Analysis 1

Table 1 provides the Coh-Metrix descriptive statistics for the middle paragraphs as a function of the number of paragraphs in the essay. We conducted a principal component analysis (PCA) to extract the predicted underlying factors of cohesion and text difficulty. PCA is designed to extract a specified number of latent, unobserved constructs from a set of observed variables. The variables in Table 1 were analyzed using the extraction method ‘Principal component’ (SPSS), with orthogonal varimax rotation. Varimax rotation is a commonly used statistical technique that attempts to reduce the complexity of extracted factor(s) by maximizing the variance of each variable that contributes to the factor(s) (hence the term ‘varimax’). This technique results in a simpler interpretation of the relationship(s) between the observed variables and the extracted factor(s), and it ensures that the extracted factors are not inter-correlated (StatSoft, 2010).

Based on the contributions of the variables in Table 1, we extracted two factors through PCA. The rotated (varimax) component matrix in Table 4 shows the correlations between each variable and the two extracted factors (i.e., factor loadings). Table 4 also provides the eigenvalues and the percentage of overall variance explained by the factors. The factor loadings aid in interpreting the latent constructs extracted from the variables, and they offer insight into how a variable contributes to a factor. Conventionally, variables that correlate at least moderately with their respective factor (|r| ≥ 0.400) are considered to be important contributors to that factor, while eigenvalues above 1.000 and the percentage of variance explained indicate the ‘significance’ of the extracted factor. Factor 1 shows moderate correlations with many of the cohesion variables from Table 1, suggesting that those particular variables contribute to the underlying factor of cohesion. Factor 2 shows moderate correlations with the variables predicted to indicate text difficulty, suggesting that those variables contribute to the underlying factor of difficulty.
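For readers who wish to see the mechanics, the following is a minimal sketch of principal component extraction followed by varimax rotation, written with NumPy. It does not reproduce the SPSS procedure used in this study: the data are simulated, the function names are hypothetical, and the rotation is the plain varimax criterion without the row normalization that some statistical packages apply.

```python
import numpy as np


def pca_loadings(data, n_factors):
    """Unrotated component loadings and all eigenvalues of the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via the common SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - rotated @ column_ss / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):    # stop once the criterion stalls
            break
        criterion = s.sum()
    return loadings @ rotation


# Simulated data: 200 observations of 6 made-up variables driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                    [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
data = latent @ weights.T + 0.5 * rng.normal(size=(200, 6))

loadings, eigvals = pca_loadings(data, n_factors=2)
rotated = varimax(loadings)
n_kaiser = int((eigvals > 1.0).sum())   # components with eigenvalue above 1.000
salient = np.abs(rotated) >= 0.40       # loadings treated as important contributors
print(n_kaiser)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged; only the distribution of loadings across factors shifts, which is what simplifies interpretation.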

Table 4 Rotated (varimax) component matrix (correlations with extracted factors 1 and 2)

Analysis 2

To examine the effect of the total number of paragraphs in an essay on the cohesion and difficulty of its middle paragraphs, we used the two extracted factors (the standardized variable scores multiplied by their respective loading coefficients for each factor) as dependent variables in two Hierarchical Linear Models (HLMs) using the SPSS procedure ‘MIXED’. This statistical technique hierarchically structures independent variables to account for variance and interactions at multiple levels simultaneously. HLM is quite often used in classroom studies (with students embedded in classrooms, and classrooms within schools), and its usage on such occasions better accounts for multi-level variance and interactions than does standard multiple regression and analysis of variance (Richter, 2006; Baayen, 2008). Furthermore, HLM is appropriate because it is designed to accommodate unequal observations, which is the case in our corpus. In the HLM, we controlled for random variance in essay topic, participant (student who wrote the essay), and the interaction between topic and participant. We also included text length (number of words in each paragraph) as a covariate in order to account for the variance associated with the length of the paragraphs. Factor 1 (cohesion) produced a significant result: F(6, 221.508) = 2.986, p = 0.008, whereas Factor 2 (difficulty) yielded a marginally significant result: F(6, 216.427) = 2.051, p = 0.060.
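The construction of the dependent variables, described above as standardized variable scores multiplied by their loading coefficients, can be sketched as a weighted sum of z-scores. This is an interpretation of that description rather than the exact SPSS computation, and the variable names, values, and loadings below are invented.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def factor_scores(columns, loadings):
    """Sum of z-scores weighted by each variable's loading on one factor."""
    z = {name: standardize(values) for name, values in columns.items()}
    n_rows = len(next(iter(columns.values())))
    return [
        sum(loadings[name] * z[name][i] for name in columns)
        for i in range(n_rows)
    ]


# Hypothetical toy data: three paragraphs measured on two variables.
columns = {"var_x": [1, 2, 3], "var_y": [3, 2, 1]}
loadings = {"var_x": 0.8, "var_y": -0.5}
scores = factor_scores(columns, loadings)
print(scores)
```

One score per paragraph is produced for each factor; scores of this kind would then serve as the outcome in a mixed model with the total number of paragraphs as the predictor of interest.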

Table 5 displays pairwise comparisons using Least Significant Difference (LSD), a post hoc test that shows patterns in data by comparing the means of two groups at a time. LSD tests were conducted for each dependent variable (cohesion, difficulty) used in the HLM. The tests revealed that the differences in the linguistic characteristics of middle paragraphs were most acute in essays that featured 3, 4, and 9 total paragraphs. In other words, essays with 5 to 8 total paragraphs tended to demonstrate no significant differences from one another in terms of cohesion. Thus, these essay lengths (i.e., 5-8 paragraphs) might be deemed prototypical, with respect to the variables of analysis and the frequency of essays within this range (see Tables 1 and 3).

Table 5 Least significant difference (LSD) post hoc tests

The results suggest that students who write more than 8 paragraphs in response to College Board prompts may be over-explaining their ideas. Example Paragraph 1 in the Appendix illustrates this notion. The example is a body paragraph from a student essay with 9 paragraphs. The paragraph contains 129 words, 54 of which are repeated at least once, and of the 62 content lemmas, 19 are repeated (about 1:3). This repetition results in high overlap values and low lexical diversity values, which are both related to higher cohesion.

In contrast to Example Paragraph 1, Example Paragraph 2 is from an essay with 3 paragraphs. The student does not seem to have sufficiently explained the ideas in the text. The paragraph might benefit from greater cohesion and perhaps an expansion upon the multiple ideas presented in such a short text. Example Paragraph 2 features 89 words, only 27 of which are repeated; in terms of content lemmas, 52 are unique with a mere 8 repeated (about 1:7).
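Repetition counts of the kind reported for the two example paragraphs can be approximated with a simple tally. The sketch below works on lowercased surface word forms; it does not lemmatize or separate content words from function words, so it would not reproduce the content-lemma figures given above. The function name and the sample text are illustrative only.

```python
import re
from collections import Counter


def repetition_profile(text):
    """Count tokens, distinct word types, and word types used more than once."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    repeated_types = [word for word, count in counts.items() if count > 1]
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "repeated_types": len(repeated_types),
        "tokens_of_repeated_types": sum(counts[w] for w in repeated_types),
    }


profile = repetition_profile("The dog saw the cat. The cat ran.")
print(profile)
```

A higher share of repeated types corresponds to greater word overlap and lower lexical diversity, the pattern associated with Example Paragraph 1.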

The results from the second analysis offer some evidence that, in order to write cohesively, student writers may benefit by seeking to write essays of between 5 and 8 paragraphs in length. These findings suggest that essay prompts may only warrant 5-8 paragraphs of information, a concept that largely corresponds with the expectations of the College Board and similar organizations.

Discussion

This study provides writing researchers and educators with valuable information concerning the relationship between the total number of paragraphs in an essay and the cohesion and difficulty of its middle paragraphs. The relationships suggest that the cohesion and difficulty of middle paragraphs are not consistent. That is, when providing instruction to students on how to write cohesive paragraphs or when grading student essays, it may be advantageous to consider how much they plan to write, or how much they have written.

Differences in cohesion between the middle paragraphs across student essays seem to be most prominent when the total number of paragraphs is 3, 4, or 9. These differences suggest that students who write fewer than 5 paragraphs may not be writing enough to adequately convey the ideas about the prompts and that students who write more than 8 paragraphs are being repetitive (see Example Paragraphs 1 and 2).

McNamara et al. (2009) found that ratings of writing proficiency are not related to indices of cohesion. Interestingly though, text difficulty variables (e.g., Words before Main Verb) were significantly predictive of writing proficiency ratings. It is possible that the results of McNamara and colleagues were slightly altered by an interference of the linguistic features of first and last paragraphs; if differing linguistic features across paragraphs are considered, then cohesion may be found to play an important role in assessments of writing proficiency.

The results of this study also suggest that computational assessments of student essays would benefit by considering not only how much the student has written in terms of the number of paragraphs, but what the student has written in terms of linguistic characteristics. That is, analyzing essays by parts (i.e., first, middle, last paragraphs) may increase the accuracy of computational essay assessment by considering and accommodating differing rhetorical functions across paragraphs based on their linguistic characteristics.

Our results also confirmed that student essays most commonly include five paragraphs (42.54% of the middle paragraphs in the corpus came from five-paragraph essays). To some extent, this is to be expected because students are often encouraged to write five paragraphs in response to essay prompts, whether by teachers or by organizations such as College Board (Nunnally, 1991; Smith, 2006). Including five paragraphs in an essay seems optimal in relation to the goal presented by the task, which is to write brief essays that address a particular prompt. Unfortunately, the unwavering predominance of the 5-paragraph essay results in an unequally distributed corpus. Nonetheless, it is an aspect of our data that cannot be remedied, even given that HLM better accommodates the data set. That is, if the ecological validity of the data set is to be maintained, the number of paragraphs will necessarily vary. Given the target task examined in this study, it is likely that the number of paragraphs of essays would have an unequal distribution (with a predominance of 5-paragraph essays).

Our hope is that having a clearer view of the relationship between the total number of paragraphs of an essay and the linguistic characteristics that affect essay scores may better equip writing researchers and educators to pinpoint specific issues underlying student difficulties and to subsequently provide them with more appropriate feedback. Moreover, an examination of the relationship between the total number of paragraphs and essay scores sheds light on the constituents of essay quality and the rhetorical values (i.e., preferences) of essay graders.